author: Ralph Amissah <ralph.amissah@gmail.com> 2026-05-10 00:05:07 -0400
committer: Ralph Amissah <ralph.amissah@gmail.com> 2026-05-10 00:05:07 -0400
commit: 53052a79504bc99e18856d1106faf3c8b324afec
tree: aef93838ec6361d84672a05398b5028f1e870272
parent: queries: add textobjects.scm and indents.scm
docs: update README for v2 grammar changes

Reflect the current state of the grammar, queries, and corpus results:

- v2 changes section listing the four grammar fixes (multi-line note
  bodies, digit-leading segment names, header_key keyset restriction,
  editor_note_marker promotion).
- Test results across the full pod-samples corpus (22 / 26 docs parse
  with zero errors). The remaining four are itemised: three source
  markup bugs the grammar now correctly surfaces (crossed inline
  nesting, missing footnote close `~`, bare editor note without channel
  marker) and one acknowledged grammar gap (bare collection-link form
  `{text}filename.sst`).
- Known limitations updated: removed the multi-line footnote entry,
  added the bare-collection-link gap and the bare-editor-note rejection
  rule.
- New query files (textobjects.scm, indents.scm) listed.

(assisted by Claude-Code)

-rw-r--r-- sisu-markup_tree-sitter.md 80
1 file changed, 66 insertions, 14 deletions

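Reviewer note: the header-key restriction documented in this commit can be illustrated with a minimal Python sketch that contrasts the old pattern `[a-z][a-z_]*:` with a closed key set. This is not the grammar's implementation (tree-sitter grammars are not written in Python). The names `OLD_HEADER_KEY`, `NEW_HEADER_KEY`, `is_header_key` and the sample lines are assumptions made for illustration; the key list is copied from the README text in the diff.

```python
import re

# Old pattern quoted in the README: any lowercase word followed by a colon.
OLD_HEADER_KEY = re.compile(r"[a-z][a-z_]*:")

# Key set copied from the "Header keys" bullet in the README diff.
RECOGNISED_KEYS = (
    "title", "creator", "date", "rights", "classify", "identifier",
    "original", "notes", "links", "make", "publisher", "license",
)

# Hypothetical stand-in for the restricted rule: only listed keys match.
NEW_HEADER_KEY = re.compile(r"(?:%s):" % "|".join(RECOGNISED_KEYS))


def is_header_key(line, pattern):
    """Return True if the whole line (ignoring surrounding whitespace) matches."""
    return pattern.fullmatch(line.strip()) is not None


# Sample lines (assumed): two real keys, a URL prefix, and a near-miss key.
samples = ["title:", "creator:", "http:", "note:"]
old_hits = [s for s in samples if is_header_key(s, OLD_HEADER_KEY)]
new_hits = [s for s in samples if is_header_key(s, NEW_HEADER_KEY)]

print("old pattern accepts:", old_hits)
print("key set accepts:   ", new_hits)
```

Under these assumptions the open-ended pattern accepts a lone `http:` line as a header key, while the closed key set accepts only the listed keys, which is the behaviour the "Header keys" entry describes.
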
diff --git a/sisu-markup_tree-sitter.md b/sisu-markup_tree-sitter.md
index 7b8c51d..a76b623 100644
--- a/sisu-markup_tree-sitter.md
+++ b/sisu-markup_tree-sitter.md
@@ -30,20 +30,72 @@
### Test results against real documents
-| Document | Lines | Result |
-|---|---|---|
-| Alice in Wonderland | 1,877 | 0 errors |
-| Wealth of Networks | 5,833 | 0 errors |
-| GPL v3 | 297 | 0 errors |
-| Free Culture | 5,690 | 1 error (multi-line footnote) |
-| Autonomous Contract | 329 | 1 error (multi-line footnote) |
-
-### Known limitations (v1)
-
-- **Multi-line footnotes**: Footnote content that wraps across physical lines causes parse errors. Most footnotes are single-line and work correctly.
-- **Book index content**: Parsed as a blob rather than structured entries (terms, sub-terms, spans). Still highlighted correctly as a unit.
-- **Block content**: Inline markup inside poem/group/table blocks is not parsed (treated as raw content).
-- **Multi-line paragraphs**: Each physical line is a separate paragraph node (doesn't merge continuation lines).
+Across the full pod-samples corpus (26 `.sst` files): **22 / 26 parse with
+zero errors**. The remaining four are summarised below.
+
+| Document | Result |
+|---|---|
+| Alice in Wonderland (1,877 lines) | 0 errors |
+| Wealth of Networks (5,833 lines) | 0 errors |
+| GPL v3 (297 lines) | 0 errors |
+| Free Culture (5,690 lines) | 0 errors |
+| Autonomous Contract | 1 error - **source markup bug**: crossed inline nesting `_{ ... /{ ... }_ ... }/` on line 103 |
+| The Public Domain (Boyle) | 1 error - **source markup bug**: footnote close missing `~` (`200.}` instead of `200.}~`) on line 355 |
+| Not Without Help (Amissah) | 1 error - **source markup bug**: bare `~[ ... ]~` editor note without an explicit channel marker (`*` or `+`) on line 695 |
+| sisu-manual (`sisu_markup.sst`) | 1 error - **grammar gap**: bare collection-link form `{text}filename.sst` (no `:` prefix) is a documented SiSU shorthand the grammar does not yet model |
+
+The structural grammar reports source-side markup bugs that the legacy
+regex highlighter masked (crossed nesting, dropped close markers, missing
+channel markers); leaving them as parse errors lets editors flag the
+problems instead of silently mis-colouring the surrounding text.
+
+### Known limitations
+
+- **Crossed inline nesting**: Sources that interleave inline markers
+ improperly (e.g. `_{ ... /{ ... }_ ... }/`) parse as ERROR. This is by
+ design - the regex highlighter tolerated this because it did not enforce
+ nesting; a structural grammar cannot.
+- **Bare editor note**: `~[ ... ]~` without `*` or `+` is rejected.
+ Authors must pick the asterisk channel (`~[* ... ]~`) or plus channel
+ (`~[+ ... ]~`).
+- **Bare collection-link form**: `{text}filename.{sst,ssm,ssi}` (no `:`
+ prefix) is not yet recognised; only `{text}:filename` is. Worth adding
+ as a follow-up.
+- **Book index content**: Parsed as a blob rather than structured entries
+ (terms, sub-terms, spans). Still highlighted correctly as a unit.
+- **Block content**: Inline markup inside poem/group/table blocks is not
+ parsed (treated as raw content).
+- **Multi-line paragraphs**: Each physical line is a separate paragraph
+ node (doesn't merge continuation lines).
+
+### v2 changes
+
+- **Multi-line footnotes**: Footnotes and editor notes whose body wraps
+ across physical lines now parse cleanly, via a dedicated `_note_inline`
+ rule that admits newlines in text runs.
+- **Editor note channel marker**: `~[*` / `~[+` are now captured as
+ `editor_note_marker` (a child node), so themes can colour the asterisk
+ and plus channels distinctly via the `@attribute` capture in
+ `highlights.scm`.
+- **Numeric segment names**: `1~name` segment ids may now begin with a
+ digit (e.g. `2~1 ...` as used in /Free Culture/).
+- **Header keys**: restricted to the recognised top-level SiSU YAML keys
+ (`title:`, `creator:`, `date:`, `rights:`, `classify:`, `identifier:`,
+ `original:`, `notes:`, `links:`, `make:`, `publisher:`, `license:`).
+ Previous regex `[a-z][a-z_]*:` accidentally matched URL prefixes such as
+ `http:` on a line by themselves, and sentence fragments mid-document.
+
+### New query files
+
+In addition to `highlights.scm`, `folds.scm`, and `injections.scm`, this
+release adds:
+
+- **`textobjects.scm`** - heading section, block body, footnote, link,
+ paragraph, inline-formatting and book-index objects, in `.outer` and
+ `.inner` flavours. Compatible with `nvim-treesitter-textobjects` and
+ Emacs `treesit-thing-settings`.
+- **`indents.scm`** - indentation hints. SiSU is largely flat-indented;
+ these mostly anchor top-level structures at column 0.
### Usage