path: root/org/out_src_abstraction_peg_text.org
Commit message | Author | Age | Files | Lines
* org headers rearranged (& odd highlighting issue) [HEAD, main] | Ralph Amissah | 11 days | 1 | -5/+5
  - odd highlighting issue ... must result from my org config, but "fix" makes things easier for me.
* org/ out of sync with ./src (sync) | Ralph Amissah | 11 days | 1 | -12/+2
* .ssp: omit empty-value array property entries | Ralph Amissah | 2026-04-22 | 1 | -3/+6
  Add empty-string guards to array property loops (.stow_link, .lev4_subtoc, .anchor_tag) so entries with zero-length values are not emitted. Empty properties have no value for PEG parsing: absent lines are faster to skip than matching a property name to find an empty value. Removes 1488 empty .anchor_tag: lines from Wealth of Networks .ssp alone.
  Co-Authored-By: Anthropic Claude Opus 4.6 (1M context)
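The guard described above can be sketched as follows. This is an illustration in Python, not the project's D code; the helper name `emit_array_property` is hypothetical, and the one-value-per-line layout is taken from the field descriptions elsewhere in this log.

```python
def emit_array_property(name, values):
    """Return one '.name: value' line per non-empty value.

    Hypothetical helper: zero-length strings are skipped, so no
    empty '.name:' lines reach the output.
    """
    return [f".{name}: {value}" for value in values if value != ""]


# Only the non-empty anchor tag is emitted.
lines = emit_array_property("anchor_tag", ["", "ch1", ""])
print(lines)
```

An empty input list, or a list holding only empty strings, produces no lines at all, which is the behaviour the commit aims for.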
* .ssp: add .children property for heading tree navigation | Ralph Amissah | 2026-04-22 | 1 | -0/+19
  - Add explicit child heading OCN lists to heading objects, pre-computed in a single O(n) pass over the body section before serialization. This makes the document tree directly navigable without scanning: each heading lists its direct sub-heading OCNs.
  - Example output for a chapter heading:
      [10] heading :1
      .last_descendant: 65
      .children: 14 24 42 57
  - Implementation: builds an int[][int] map (parent_ocn -> child heading OCNs) from one pass over the body objects, then emits .children: during serialization for headings that have entries in the map.
  - The tree was already reconstructable from parent_ocn + last_descendant_ocn, but .children makes it immediate: no scanning required to find a heading's sub-structure.
  - Tested against all 35 sample documents: zero failures.
  Co-Authored-By: Anthropic Claude Opus 4.6 (1M context)
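The single-pass map construction described above might look roughly like this. It is a Python sketch rather than the D implementation; the dictionary keys `ocn`, `parent_ocn` and `is_heading`, and both function names, are assumptions made for illustration.

```python
from collections import defaultdict


def build_children_map(body_objects):
    """One pass over the body: map parent OCN -> list of child heading OCNs."""
    children = defaultdict(list)
    for obj in body_objects:
        if obj["is_heading"]:  # only headings are recorded as children
            children[obj["parent_ocn"]].append(obj["ocn"])
    return dict(children)


def children_line(ocn, children):
    """Return the '.children:' line for a heading, or None if it has none."""
    kids = children.get(ocn)
    if not kids:
        return None
    return ".children: " + " ".join(str(k) for k in kids)


# Hypothetical body: heading 10 with a paragraph and two sub-headings.
body = [
    {"ocn": 10, "parent_ocn": 0, "is_heading": True},
    {"ocn": 11, "parent_ocn": 10, "is_heading": False},
    {"ocn": 14, "parent_ocn": 10, "is_heading": True},
    {"ocn": 24, "parent_ocn": 10, "is_heading": True},
]
children = build_children_map(body)
print(children_line(10, children))
```

Because the map is filled in document order, each child list comes out sorted by OCN without a separate sort, assuming OCNs increase through the body.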
* .ssp serializer: include all ObjGenericComposite fields | Ralph Amissah | 2026-04-22 | 1 | -9/+120
  - Make the .ssp format a complete representation of the document abstraction by serializing all remaining fields from ObjGenericComposite (only omitting ptr.* runtime indices which are meaningless outside the in-memory context).
  - New fields added:
      .ancestors_collapsed:     - collapsed level ancestor chain
      .dom_status:              - DOM structure markedup tags status[8]
      .dom_status_collapsed:    - DOM structure collapsed status[8]
      .heading_lev_collapsed:   - collapsed heading level
      .parent_lev:              - parent heading level (markup)
      .o_n_type:                - object numbering type (0=ocn, 1=non, 2=bkidx)
      .is_of_type:              - para/block type classification
      .attrib:                  - general attributes string
      .meta_lang:               - block language (group/block/quote)
      .meta_syntax:             - codeblock syntax from metainfo
      .sha256:                  - hex-encoded SHA-256 digest of object content
      .has: images_no_dim       - image without dimensions flag
      .table_aligns:            - column alignment array
      .table_walls:             - table walls/borders flag
      .stow_link:               - extracted URLs (one per line)
      .heading_lev_anchor:      - heading level anchor tag
      .segment_epub:            - EPUB segment anchor tag
      .heading_ancestors_text:  - pipe-separated ancestor headings
      .lev4_subtoc:             - sub-table-of-contents entries (one per line)
      .anchor_tag:              - additional anchor tags (one per line)
  - Tested against all 35 sample documents: zero failures.
  Co-Authored-By: Anthropic Claude Opus 4.6 (1M context)
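As an illustration of the .sha256: field, a hex-encoded SHA-256 digest can be produced as below. This is a Python sketch only: the log does not say exactly which bytes of an object are hashed, so hashing the UTF-8 encoding of the object's text is an assumption, and `sha256_property` is a hypothetical name.

```python
import hashlib


def sha256_property(content):
    """Return a '.sha256:' line for the given object text.

    Assumption: the digest is taken over the UTF-8 bytes of the text.
    """
    digest = hashlib.sha256(content.encode("utf-8")).hexdigest()
    return f".sha256: {digest}"


print(sha256_property("An example paragraph."))
```

The hex form is always 64 characters long, so the property stays on a single line regardless of the size of the object.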
* .ssp serializer: omit identifier when it equals OCN | Ralph Amissah | 2026-04-22 | 1 | -3/+6
  - For heading objects, the identifier was always emitted on the declaration line (e.g. "[10] heading :1 10") even when it was just the OCN repeated. Now only emits the identifier when it differs from the OCN (i.e. when there is a named segment like "acknowledgments" or "a1"), reducing redundancy.
      Before: [10] heading :1 10
      After:  [10] heading :1
    Named segments still appear:
      [0] heading :1 a1
  Co-Authored-By: Anthropic Claude Opus 4.6 (1M context)
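The rule above amounts to a single comparison when building the declaration line. The following Python sketch is illustrative only; `heading_declaration` is a hypothetical name, and treating the `:1` field as a heading level is an assumption.

```python
def heading_declaration(ocn, level, identifier):
    """Build a heading declaration line, omitting a redundant identifier.

    Assumption: 'level' stands for the ':1' field shown in the examples.
    """
    line = f"[{ocn}] heading :{level}"
    if identifier != str(ocn):  # emit only named segments such as "a1"
        line += f" {identifier}"
    return line


print(heading_declaration(10, 1, "10"))  # identifier repeats the OCN
print(heading_declaration(0, 1, "a1"))   # named segment is kept
```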
* .ssp document abstraction as PEG parsable text | Ralph Amissah | 2026-04-22 | 1 | -0/+314
  --show-abstraction flag to write .ssp document abstraction files
  - Add a new output mode that serializes the in-memory document abstraction (produced by spineAbstraction) to a human-readable, line-oriented text format (.ssp). This captures the full object model after parsing and abstraction but before output generation.
  - The .ssp format uses unambiguous line prefixes:
      @section { }   - section boundaries (head/toc/body/endnotes/...)
      [N] type       - object declaration with OCN
      .name: value   - object properties (only non-defaults)
      | content      - text content lines
      % comment      - comments
  - New files:
      src/sisudoc/io_out/create_abstraction_txt.d
        Serializer module following the same template pattern as metadoc_show_summary.d. Walks doc.abstraction() section by section, writing metadata preamble (@meta, @make, @doc_has) then each object with its properties and text content. Output goes to {output_path}/{lang}/abstraction/{doc}.ssp
  - Changes to spine.d:
      - Add "show-abstraction" to opts initialization, getopt, and OptActions struct
      - Add show_abstraction to abstraction(), require_processing_files(), and meta_processing_general() so the flag triggers full document processing
      - Insert call at both spineAbstraction sites (parallel and serial branches), gated by show_abstraction flag, following the same pattern as show_config/show_summary/show_make
  - Tested against all 35 sample documents (including multilingual live-manual in 9 languages): zero failures. Works standalone (--show-abstraction) or combined with other output flags (--show-abstraction --html --text). No effect on existing code paths when the flag is not used.
  Co-Authored-By: Anthropic Claude Opus 4.6 (1M context)
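Because every line kind has its own leading character, a reader of the format can dispatch on the first character alone. The sketch below shows that idea in Python; it is not the project's PEG grammar, and the function name, the category labels and the sample lines are all hypothetical.

```python
# Leading character -> line kind, following the prefix list above.
PREFIX_KINDS = {
    "@": "section",   # section boundaries and the @meta/@make/@doc_has preamble
    "[": "object",    # object declaration with OCN
    ".": "property",  # object property
    "|": "content",   # text content line
    "%": "comment",   # comment
}


def classify_line(line):
    """Classify one .ssp line by its first character; unknown lines are 'other'."""
    if not line:
        return "other"
    return PREFIX_KINDS.get(line[0], "other")


sample = [
    "% hypothetical sample",
    "[10] heading :1",
    ".children: 14 24 42 57",
    "| Some heading text",
]
for text in sample:
    print(classify_line(text), "<-", text)
```

Lines that match none of the five prefixes, such as a blank line, fall through to "other" rather than raising an error, which keeps the sketch tolerant of details the log does not specify.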