diff options
Diffstat (limited to 'sisu-markup_tree-sitter.md')
| -rw-r--r-- | sisu-markup_tree-sitter.md | 32 |
1 files changed, 31 insertions, 1 deletions
diff --git a/sisu-markup_tree-sitter.md b/sisu-markup_tree-sitter.md index a76b623..65297f3 100644 --- a/sisu-markup_tree-sitter.md +++ b/sisu-markup_tree-sitter.md @@ -18,7 +18,15 @@ ### What it parses -- **Headers**: Version comment, YAML-like fields with continuations, comments +- **Headers** (both dialects): + - YAML dialect (sisudoc-spine): `title:` / `creator:` mapping, + 2-space indented continuation lines, `#` line comments, `# SiSU 8.0` + style banner. + - Bespoke dialect (original SiSU): `@title:` / `@creator:` keys with + 1-space ` :sub-key:` continuations (and 1-space wrap continuations + under `@links:` / `@classify:`), `%` line comments, `% SiSU 4.0.0` + style banner. The two dialects share a single grammar; the + yaml-vs-bespoke distinction is purely the header preamble. - **Headings**: All levels (`:A~` through `:D~`, `1~` through `3~`), named segments, numbering suppression - **Block elements**: code, poem, block, group, table (both `curly{}` and `` ``` tic `` syntax), quote blocks - **Inline formatting**: emphasis `*{}*`, bold `!{}!`, italic `/{}/`, underline `_{}_`, citation `"{}"`, superscript `^{}^`, subscript `,{},`, inserted `+{}+`, strikethrough `-{}-`, monospace `#{}#` - all recursive/nestable @@ -30,6 +38,8 @@ ### Test results against real documents +#### sisudoc-spine corpus (yaml-header dialect) + Across the full pod-samples corpus (26 `.sst` files): **22 / 26 parse with zero errors**. The remaining four are summarised below. @@ -49,6 +59,26 @@ regex highlighter masked (crossed nesting, dropped close markers, missing channel markers); leaving them as parse errors lets editors flag the problems instead of silently mis-colouring the surrounding text. +#### original sisu corpus (bespoke-header dialect) + +Run against `sisu-markup-samples/data/samples/`: + +| Layout | Pass / Total | Note | +|-----------|--------------|-------------------------------------------------------| +| minimal | 2 / 2 | | +| current | 20 / 21 | one residual source-markup edge case | +| generic | 15 / 21 | older / pre-cleanup variants | +| wrapped | 7 / 21 | physically hard-wrapped paragraphs; known limitation | +| **total** | **44 / 65** | | + +The `wrapped/` variants are the dominant failure mode. They hard-wrap +each paragraph across multiple physical lines; the grammar currently +treats each physical line as its own paragraph node, so wrapped lines +trip later structural rules. This is the same "multi-line paragraphs" +limitation listed below; it affects sisu's `wrapped/` layout heavily +because that layout was produced by a text-formatting pass. Documents +in `current/` (one paragraph per physical line) parse at 95 percent. + ### Known limitations - **Crossed inline nesting**: Sources that interleave inline markers |
