| -rw-r--r-- | sisu-markup_tree-sitter.md | 80 |
1 files changed, 66 insertions, 14 deletions
diff --git a/sisu-markup_tree-sitter.md b/sisu-markup_tree-sitter.md
index 7b8c51d..a76b623 100644
--- a/sisu-markup_tree-sitter.md
+++ b/sisu-markup_tree-sitter.md
@@ -30,20 +30,72 @@
 
 ### Test results against real documents
 
-| Document | Lines | Result |
-|---|---|---|
-| Alice in Wonderland | 1,877 | 0 errors |
-| Wealth of Networks | 5,833 | 0 errors |
-| GPL v3 | 297 | 0 errors |
-| Free Culture | 5,690 | 1 error (multi-line footnote) |
-| Autonomous Contract | 329 | 1 error (multi-line footnote) |
-
-### Known limitations (v1)
-
-- **Multi-line footnotes**: Footnote content that wraps across physical lines causes parse errors. Most footnotes are single-line and work correctly.
-- **Book index content**: Parsed as a blob rather than structured entries (terms, sub-terms, spans). Still highlighted correctly as a unit.
-- **Block content**: Inline markup inside poem/group/table blocks is not parsed (treated as raw content).
-- **Multi-line paragraphs**: Each physical line is a separate paragraph node (doesn't merge continuation lines).
+Across the full pod-samples corpus (26 `.sst` files): **22 / 26 parse with
+zero errors**. The remaining four are summarised below.
+
+| Document | Result |
+|---|---|
+| Alice in Wonderland (1,877 lines) | 0 errors |
+| Wealth of Networks (5,833 lines) | 0 errors |
+| GPL v3 (297 lines) | 0 errors |
+| Free Culture (5,690 lines) | 0 errors |
+| Autonomous Contract | 1 error - **source markup bug**: crossed inline nesting `_{ ... /{ ... }_ ... }/` on line 103 |
+| The Public Domain (Boyle) | 1 error - **source markup bug**: footnote close missing `~` (`200.}` instead of `200.}~`) on line 355 |
+| Not Without Help (Amissah) | 1 error - **source markup bug**: bare `~[ ... ]~` editor note without an explicit channel marker (`*` or `+`) on line 695 |
+| sisu-manual (`sisu_markup.sst`) | 1 error - **grammar gap**: bare collection-link form `{text}filename.sst` (no `:` prefix) is a documented SiSU shorthand the grammar does not yet model |
+
+The structural grammar reports source-side markup bugs that the legacy
+regex highlighter masked (crossed nesting, dropped close markers, missing
+channel markers); leaving them as parse errors lets editors flag the
+problems instead of silently mis-colouring the surrounding text.
+
+### Known limitations
+
+- **Crossed inline nesting**: Sources that interleave inline markers
+  improperly (e.g. `_{ ... /{ ... }_ ... }/`) parse as ERROR. This is by
+  design - the regex highlighter tolerated this because it did not enforce
+  nesting; a structural grammar cannot.
+- **Bare editor note**: `~[ ... ]~` without `*` or `+` is rejected.
+  Authors must pick the asterisk channel (`~[* ... ]~`) or plus channel
+  (`~[+ ... ]~`).
+- **Bare collection-link form**: `{text}filename.{sst,ssm,ssi}` (no `:`
+  prefix) is not yet recognised; only `{text}:filename` is. Worth adding
+  as a follow-up.
+- **Book index content**: Parsed as a blob rather than structured entries
+  (terms, sub-terms, spans). Still highlighted correctly as a unit.
+- **Block content**: Inline markup inside poem/group/table blocks is not
+  parsed (treated as raw content).
+- **Multi-line paragraphs**: Each physical line is a separate paragraph
+  node (doesn't merge continuation lines).
+
+### v2 changes
+
+- **Multi-line footnotes**: Footnotes and editor notes whose body wraps
+  across physical lines now parse cleanly, via a dedicated `_note_inline`
+  rule that admits newlines in text runs.
+- **Editor note channel marker**: `~[*` / `~[+` are now captured as
+  `editor_note_marker` (a child node), so themes can colour the asterisk
+  and plus channels distinctly via the `@attribute` capture in
+  `highlights.scm`.
+- **Numeric segment names**: `1~name` segment ids may now begin with a
+  digit (e.g. `2~1 ...` as used in /Free Culture/).
+- **Header keys**: restricted to the recognised top-level SiSU YAML keys
+  (`title:`, `creator:`, `date:`, `rights:`, `classify:`, `identifier:`,
+  `original:`, `notes:`, `links:`, `make:`, `publisher:`, `license:`).
+  Previous regex `[a-z][a-z_]*:` accidentally matched URL prefixes such as
+  `http:` on a line by themselves, and sentence fragments mid-document.
+
+### New query files
+
+In addition to `highlights.scm`, `folds.scm`, and `injections.scm`, this
+release adds:
+
+- **`textobjects.scm`** - heading section, block body, footnote, link,
+  paragraph, inline-formatting and book-index objects, in `.outer` and
+  `.inner` flavours. Compatible with `nvim-treesitter-textobjects` and
+  Emacs `treesit-thing-settings`.
+- **`indents.scm`** - indentation hints. SiSU is largely flat-indented;
+  these mostly anchor top-level structures at column 0.
 
 ### Usage
 
