summaryrefslogtreecommitdiffhomepage
path: root/sisu-markup_tree-sitter.md
diff options
context:
space:
mode:
Diffstat (limited to 'sisu-markup_tree-sitter.md')
-rw-r--r--sisu-markup_tree-sitter.md32
1 files changed, 31 insertions, 1 deletions
diff --git a/sisu-markup_tree-sitter.md b/sisu-markup_tree-sitter.md
index a76b623..65297f3 100644
--- a/sisu-markup_tree-sitter.md
+++ b/sisu-markup_tree-sitter.md
@@ -18,7 +18,15 @@
### What it parses
-- **Headers**: Version comment, YAML-like fields with continuations, comments
+- **Headers** (both dialects):
+ - YAML dialect (sisudoc-spine): `title:` / `creator:` mapping,
+ 2-space indented continuation lines, `#` line comments, `# SiSU 8.0`
+ style banner.
+ - Bespoke dialect (original SiSU): `@title:` / `@creator:` keys with
+ 1-space ` :sub-key:` continuations (and 1-space wrap continuations
+ under `@links:` / `@classify:`), `%` line comments, `% SiSU 4.0.0`
+ style banner. The two dialects share a single grammar; the
+ yaml-vs-bespoke distinction is purely the header preamble.
- **Headings**: All levels (`:A~` through `:D~`, `1~` through `3~`), named segments, numbering suppression
- **Block elements**: code, poem, block, group, table (both `curly{}` and `` ``` tic `` syntax), quote blocks
- **Inline formatting**: emphasis `*{}*`, bold `!{}!`, italic `/{}/`, underline `_{}_`, citation `"{}"`, superscript `^{}^`, subscript `,{},`, inserted `+{}+`, strikethrough `-{}-`, monospace `#{}#` - all recursive/nestable
@@ -30,6 +38,8 @@
### Test results against real documents
+#### sisudoc-spine corpus (yaml-header dialect)
+
Across the full pod-samples corpus (26 `.sst` files): **22 / 26 parse with
zero errors**. The remaining four are summarised below.
@@ -49,6 +59,26 @@ regex highlighter masked (crossed nesting, dropped close markers, missing
channel markers); leaving them as parse errors lets editors flag the
problems instead of silently mis-colouring the surrounding text.
+#### original sisu corpus (bespoke-header dialect)
+
+Run against `sisu-markup-samples/data/samples/`:
+
+| Layout | Pass / Total | Note |
+|-----------|--------------|-------------------------------------------------------|
+| minimal | 2 / 2 | |
+| current | 20 / 21 | one residual source-markup edge case |
+| generic | 15 / 21 | older / pre-cleanup variants |
+| wrapped | 7 / 21 | physically hard-wrapped paragraphs; known limitation |
+| **total** | **44 / 65** | |
+
+The `wrapped/` variants are the dominant failure mode. They hard-wrap
+each paragraph across multiple physical lines; the grammar currently
+treats each physical line as its own paragraph node, so wrapped lines
+trip later structural rules. This is the same "multi-line paragraphs"
+limitation listed below; it affects sisu's `wrapped/` layout heavily
+because that layout was produced by a text-formatting pass. Documents
+in `current/` (one paragraph per physical line) parse at 95 percent.
+
### Known limitations
- **Crossed inline nesting**: Sources that interleave inline markers