| Commit message (Collapse) | Author | Age | Files | Lines |
| |
| |
- include all (doc abstraction) .ssp in pod zip and in digests
- fixed: for multi-language pods built with --pod2, only the last
language's .ssp file was being written into pod.zip and listed in
.digests.txt. Each language's .ssp file was on disk in the pod
directory (copied during its own per-language pass) but was missing
from the final zip, because the zip was being built once per language,
each build overwriting the previous one, so only the last remained.
The solution is to follow the pattern already used to avoid this for
.sstm and .ssi, namely wait for the last language and iterate the
manifest_list_of_languages internally.
(assisted by Claude-Code)
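
The fix above can be illustrated with a minimal Python sketch (spine itself is written in D; the function name `build_pod_zip` and the `read_ssp` callback are hypothetical, and only the `media/abstraction/{doc_uid}.{lang}.ssp` layout is taken from this log). It assumes the zip is written once, after every language has been processed, and that the language list is iterated inside that single build:

```python
import hashlib
import io
import zipfile


def build_pod_zip(doc_uid, languages, read_ssp):
    """Build the pod zip once, adding every language's .ssp file.

    read_ssp(lang) is a hypothetical callback returning the bytes of that
    language's .ssp file. Returns (zip_bytes, digests), where digests maps
    each archive member name to its hex SHA-256 digest.
    """
    digests = {}
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        # Iterate all languages inside one build, instead of rebuilding
        # (and overwriting) the archive during each per-language pass.
        for lang in languages:
            name = f"pod/media/abstraction/{doc_uid}.{lang}.ssp"
            content = read_ssp(lang)
            zf.writestr(name, content)
            digests[name] = hashlib.sha256(content).hexdigest()
    return buf.getvalue(), digests
```

Building per language and writing to the same zip path each time would leave only the final language's entry, which is the behaviour the commit describes fixing.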
|
| |
|
|
| |
(assisted by Claude-Code)
|
| |
|
|
|
|
| |
- fatal error on missing/unwritable --sqlite-db-path
(assisted by Claude-Code)
|
| |
|
|
|
| |
- odd highlighting issue ... must result from my org config, but "fix"
makes things easier for me.
|
| |
| |
Add int[] children_headings field to DocObj_MetaInfo_ and
compute it in the post-processing pass of metadoc_from_src.d,
right after last_descendant_ocn. Single O(n) pass builds a
parent_ocn -> child heading OCNs map, then assigns to each
heading object. Useful for tree-structured output.
The .ssp serializer now reads directly from the abstraction
field instead of pre-computing its own map.
metadoc_object_setter.d: +1 line (field declaration)
metadoc_from_src.d: +17 lines (computation)
create_abstraction_txt.d: -10 lines (simplified)
Co-Authored-By: Anthropic Claude Opus 4.6 (1M context)
|
| |
| |
Finer-grained control over when .ssp files are produced:
--show-abstraction writes .ssp to OUTPUT/lang/abstraction/
independently of any pod flag
--pod builds pod without .ssp bundled
--pod2 builds pod with .ssp in media/abstraction/
Changes to spine.d:
- show_abstraction() now only responds to its own flag and
pod2, no longer triggered by source_or_pod
- Add pod2 to opts init, getopt, OptActions
- pod() returns true for both --pod and --pod2
- source_or_pod() includes pod2
Changes to source_pod.d:
- Remove per-document pod directory (rmdirRecurse) before
regeneration, ensuring clean slate on every run. This
prevents stale content from previous runs (e.g. a --pod2
run followed by --pod would otherwise leave an outdated
media/abstraction/ directory)
- Gate abstraction directory creation and .ssp bundling on
pod2 flag specifically
Tested: --pod (no .ssp), --pod2 (.ssp in pod + zip),
--show-abstraction (standalone .ssp), --pod after --pod2
(stale abstraction cleaned up). All 35 sample documents pass.
Co-Authored-By: Anthropic Claude Opus 4.6 (1M context)
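
The flag gating described above can be sketched as follows. This is an illustrative Python model, not the D code in spine.d: the `Opts` class, the `flag_*` field names, and the `bundle_ssp` helper are hypothetical, while the predicate names follow the commit message.

```python
from dataclasses import dataclass


@dataclass
class Opts:
    # Hypothetical field names standing in for the parsed command-line flags.
    flag_source: bool = False
    flag_pod: bool = False
    flag_pod2: bool = False
    flag_show_abstraction: bool = False

    def show_abstraction(self):
        # Responds only to its own flag and to --pod2.
        return self.flag_show_abstraction or self.flag_pod2

    def pod(self):
        # True for both --pod and --pod2.
        return self.flag_pod or self.flag_pod2

    def source_or_pod(self):
        # Includes pod2 alongside --source and --pod.
        return self.flag_source or self.flag_pod or self.flag_pod2

    def bundle_ssp(self):
        # Hypothetical helper: .ssp bundling is gated on --pod2 specifically.
        return self.flag_pod2
```

With this model, `Opts(flag_pod=True)` builds a pod without generating or bundling the .ssp, while `Opts(flag_pod2=True)` does both.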
|
| |
| |
Add empty-string guards to array property loops
(.stow_link, .lev4_subtoc, .anchor_tag) so entries with
zero-length values are not emitted. Empty properties have
no value for PEG parsing - absent lines are faster to skip
than matching a property name to find an empty value.
Removes 1488 empty .anchor_tag: lines from Wealth of
Networks .ssp alone.
Co-Authored-By: Anthropic Claude Opus 4.6 (1M context)
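
A minimal sketch of the empty-string guard, in Python for illustration only (the function name `emit_array_property` is hypothetical; the `.name: value` line shape and the one-entry-per-line layout are taken from this log):

```python
def emit_array_property(name, values):
    """Return one '.name: value' line per non-empty entry.

    Zero-length values are skipped, so no bare '.name:' lines are emitted.
    """
    return [f".{name}: {value}" for value in values if value]
```

For example, `emit_array_property("anchor_tag", ["", "a1", ""])` yields a single line for `a1` instead of three lines, two of them empty.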
|
| |
| |
- Add explicit child heading OCN lists to heading objects,
pre-computed in a single O(n) pass over the body section
before serialization. This makes the document tree directly
navigable without scanning - each heading lists its direct
sub-heading OCNs.
- Example output for a chapter heading:
[10] heading :1
.last_descendant: 65
.children: 14 24 42 57
- Implementation: builds an int[][int] map (parent_ocn ->
child heading OCNs) from one pass over the body objects,
then emits .children: during serialization for headings
that have entries in the map.
- The tree was already reconstructable from parent_ocn +
last_descendant_ocn, but .children makes it immediate -
no scanning required to find a heading's sub-structure.
- Tested against all 35 sample documents - zero failures.
Co-Authored-By: Anthropic Claude Opus 4.6 (1M context)
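
The single-pass construction of the parent-to-children map can be sketched as below. This is an illustrative Python version of the idea, not the D implementation: the dictionary-based object records and both function names are assumptions, while `ocn`, `parent_ocn`, `is_a`, and the `.children:` line come from this log.

```python
from collections import defaultdict


def build_children_map(body_objects):
    """One O(n) pass: map each parent OCN to its direct child heading OCNs."""
    children = defaultdict(list)
    for obj in body_objects:
        if obj["is_a"] == "heading":
            children[obj["parent_ocn"]].append(obj["ocn"])
    return dict(children)


def children_line(ocn, children):
    """Return the '.children:' line for a heading, or None if it has none."""
    kids = children.get(ocn)
    if not kids:
        return None
    return ".children: " + " ".join(str(k) for k in kids)
```

Because body objects are visited in document order, each child list comes out in document order without any sorting step.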
|
| |
| |
- Make the .ssp format a complete representation of the
document abstraction by serializing all remaining fields
from ObjGenericComposite (only omitting ptr.* runtime
indices which are meaningless outside the in-memory context).
- New fields added:
.ancestors_collapsed: - collapsed level ancestor chain
.dom_status: - DOM structure marked-up tags status[8]
.dom_status_collapsed: - DOM structure collapsed status[8]
.heading_lev_collapsed: - collapsed heading level
.parent_lev: - parent heading level (markup)
.o_n_type: - object numbering type (0=ocn, 1=non, 2=bkidx)
.is_of_type: - para/block type classification
.attrib: - general attributes string
.meta_lang: - block language (group/block/quote)
.meta_syntax: - codeblock syntax from metainfo
.sha256: - hex-encoded SHA-256 digest of object content
.has: images_no_dim - image without dimensions flag
.table_aligns: - column alignment array
.table_walls: - table walls/borders flag
.stow_link: - extracted URLs (one per line)
.heading_lev_anchor: - heading level anchor tag
.segment_epub: - EPUB segment anchor tag
.heading_ancestors_text: - pipe-separated ancestor headings
.lev4_subtoc: - sub-table-of-contents entries (one per line)
.anchor_tag: - additional anchor tags (one per line)
- Tested against all 35 sample documents - zero failures.
Co-Authored-By: Anthropic Claude Opus 4.6 (1M context)
|
| |
| |
- For heading objects, the identifier was always emitted on the
declaration line (e.g. "[10] heading :1 10") even when it was
just the OCN repeated. Now only emits the identifier when it
differs from the OCN (i.e. when there is a named segment like
"acknowledgments" or "a1"), reducing redundancy.
Before: [10] heading :1 10
After: [10] heading :1
Named segments still appear: [0] heading :1 a1
Co-Authored-By: Anthropic Claude Opus 4.6 (1M context)
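
The rule above amounts to a small conditional when the declaration line is written. A Python sketch for illustration (the function name `heading_declaration` is hypothetical; the line shapes are the ones shown in the Before/After examples):

```python
def heading_declaration(ocn, level, identifier):
    """Build a heading declaration line.

    The identifier is appended only when it differs from the OCN,
    i.e. when the heading has a named segment.
    """
    line = f"[{ocn}] heading :{level}"
    if identifier and identifier != str(ocn):
        line += f" {identifier}"
    return line
```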
|
| |
| |
- When --source/--pod is used, automatically generate the .ssp
document abstraction and bundle it into the pod at
media/abstraction/{doc_uid}.{lang}.ssp
- This makes show_abstraction implicitly true when source_or_pod
is active, so the .ssp file is generated before the pod
assembler runs (abstraction runs before outputHub, and
source_or_pod is the first task in outputHub).
- Changes:
paths_source.d:
Add abstraction_root() path helper to _PodPaths struct,
following the same pattern as image_root(). Produces
paths like pod/media/abstraction/ for both zpod (inside
zip) and filesystem_open_zpod (open directory).
source_pod.d:
- Create media/abstraction/ directory in
podArchive_directory_tree
- Bundle .ssp file in pod_zipMakeReady: reads from the
abstraction output directory, copies to open pod
directory, adds to zip archive, computes SHA-256 digest
- Write .ssp digest in zipArchiveDigest alongside sstm
and ssi digests
spine.d:
Make show_abstraction() return true when source_or_pod is
active (previously only returned true for explicit
--show-abstraction flag).
- The .ssp is always included when building pods - no exclusion
flag for this experimental feature to keep things simple.
Not generated for non-pod outputs (--text, --html, etc.)
unless --show-abstraction is explicitly passed.
- Tested against all 35 sample documents - zero failures.
Co-Authored-By: Anthropic Claude Opus 4.6 (1M context)
|
| |
| |
--show-abstraction-db flag to write per-document SQLite database of
document abstraction
(Claude-Code primary assist)
- Add a new output mode that serializes the in-memory document
abstraction to a per-document SQLite database. This complements
the .ssp text format (--show-abstraction) with a queryable
database representation of the same data.
- Schema:
metadata table - key/value pairs for document metadata
(title, creator, dates, rights, classify, identifiers,
language, notes, make settings, doc_has counts)
objects table - one row per document object with columns:
section, seq (position within section), ocn, is_a,
is_of_part, is_of_type, heading_level, identifier,
parent_ocn, last_descendant_ocn, ancestors,
indent/bullet/lang, has_* flags, segment/anchor tags,
table/code properties, text content
Indexed on: section, ocn, parent_ocn, is_a, heading_level
- Uses prepared statements via d2sqlite3 (existing dependency)
for safe and efficient insertion. Each document produces a
standalone .abstraction.db file in the abstraction/ output
directory.
- New files:
src/sisudoc/io_out/create_abstraction_db.d
Follows the same pattern as create_abstraction_txt.d.
Creates schema, populates metadata via key/value inserts,
then iterates all sections writing objects with prepared
statements within a single transaction.
- Changes to spine.d:
- Add "show-abstraction-db" to opts init, getopt, OptActions
- Add to abstraction(), require_processing_files(), and
meta_processing_general() gates
- Insert call at both spineAbstraction sites
- Tested against all 35 sample documents (including 9-language
live-manual) - zero failures. Works standalone or combined
with --show-abstraction and other output flags.
- Example queries the database supports:
SELECT ocn, heading_level, text FROM objects
WHERE is_a = 'heading' AND section = 'body';
SELECT * FROM objects WHERE parent_ocn = 10;
SELECT key, value FROM metadata WHERE key LIKE 'title.%';
Co-Authored-By: Anthropic Claude Opus 4.6 (1M context)
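
To make the schema and queries concrete, here is a reduced sketch using Python's standard `sqlite3` module (spine uses d2sqlite3 from D). The column subset, the column types, the index names, and the function name `write_abstraction_db` are assumptions; the real objects table has many more columns, as listed above.

```python
import sqlite3

# Reduced schema: only a subset of the columns named in the commit message.
SCHEMA = """
CREATE TABLE metadata (key TEXT PRIMARY KEY, value TEXT);
CREATE TABLE objects (
    section TEXT, seq INTEGER, ocn INTEGER, is_a TEXT,
    heading_level INTEGER, parent_ocn INTEGER, text TEXT
);
CREATE INDEX idx_objects_section ON objects(section);
CREATE INDEX idx_objects_ocn ON objects(ocn);
CREATE INDEX idx_objects_parent_ocn ON objects(parent_ocn);
CREATE INDEX idx_objects_is_a ON objects(is_a);
CREATE INDEX idx_objects_heading_level ON objects(heading_level);
"""


def write_abstraction_db(path, metadata, objects):
    """Create the schema, then insert all rows inside one transaction."""
    con = sqlite3.connect(path)
    con.executescript(SCHEMA)
    with con:  # commits on success, rolls back on error
        con.executemany(
            "INSERT INTO metadata (key, value) VALUES (?, ?)",
            list(metadata.items()),
        )
        con.executemany(
            "INSERT INTO objects VALUES (?, ?, ?, ?, ?, ?, ?)",
            objects,
        )
    return con
```

Parameterized statements play the role of the prepared statements mentioned above: values are bound rather than spliced into the SQL text.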
|
| |
| |
--show-abstraction flag to write .ssp document abstraction files
- Add a new output mode that serializes the in-memory document
abstraction (produced by spineAbstraction) to a human-readable,
line-oriented text format (.ssp). This captures the full object
model after parsing and abstraction but before output generation.
- The .ssp format uses unambiguous line prefixes:
@section { } - section boundaries (head/toc/body/endnotes/...)
[N] type - object declaration with OCN
.name: value - object properties (only non-defaults)
| content - text content lines
% comment - comments
- New files:
src/sisudoc/io_out/create_abstraction_txt.d
Serializer module following the same template pattern as
metadoc_show_summary.d. Walks doc.abstraction() section by
section, writing metadata preamble (@meta, @make, @doc_has)
then each object with its properties and text content.
Output goes to {output_path}/{lang}/abstraction/{doc}.ssp
- Changes to spine.d:
- Add "show-abstraction" to opts initialization, getopt, and
OptActions struct
- Add show_abstraction to abstraction(), require_processing_files(),
and meta_processing_general() so the flag triggers full document
processing
- Insert call at both spineAbstraction sites (parallel and serial
branches), gated by show_abstraction flag, following the same
pattern as show_config/show_summary/show_make
- Tested against all 35 sample documents (including multilingual
live-manual in 9 languages) - zero failures. Works standalone
(--show-abstraction) or combined with other output flags
(--show-abstraction --html --text). No effect on existing code
paths when the flag is not used.
Co-Authored-By: Anthropic Claude Opus 4.6 (1M context)
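
Because each line kind has its own prefix, a reader of .ssp files can classify lines with a simple dispatch. The Python sketch below is a guess at a minimal reader, not a specification of the format: it assumes lines are not indented, that one space follows the `|` content marker, and the function name `classify_line` is hypothetical.

```python
import re

# "[N] type [rest]" object declarations and ".name: value" properties.
OBJECT_RE = re.compile(r"^\[(\d+)\]\s+(\S+)(?:\s+(.*))?$")
PROPERTY_RE = re.compile(r"^\.([A-Za-z0-9_]+):\s*(.*)$")


def classify_line(line):
    """Classify one .ssp line by its prefix and return a tagged tuple."""
    if line.startswith("@"):
        return ("section", line[1:].strip())
    match = OBJECT_RE.match(line)
    if match:
        return ("object", int(match.group(1)), match.group(2), match.group(3) or "")
    match = PROPERTY_RE.match(line)
    if match:
        return ("property", match.group(1), match.group(2))
    if line.startswith("|"):
        # Drop the marker and, if present, the single space after it.
        return ("content", line[2:] if line.startswith("| ") else line[1:])
    if line.startswith("%"):
        return ("comment", line[1:].strip())
    return ("other", line)
```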
|
| | |
|
| |
|
|
|
|
| |
- claude contributed src
- processes a zip from a URL, using the (system-installed) curl
for the download
|
| |
| |
- claude contributed src
- Opens the zip with std.zip.ZipArchive (reads the whole file into
memory)
- Locates pod.manifest inside the archive to discover document paths
and languages
- Extracts markup files (.sst/.ssm/.ssi) as in-memory strings
- Extracts images as in-memory byte arrays
- Extracts conf/dr_document_make if present
- Presents these to the existing pipeline as if they were read from
the filesystem
- Some security mitigations:
- Zip Slip / Path Traversal: Reject entries containing `..` or
starting with `/`; canonicalize resolved paths and verify they
fall within extraction root
- Zip Bomb: Check `ArchiveMember.size` before extracting; enforce
per-file (50MB) and total size limits (500MB)
- Entry Count: Limit number of entries (a pod should have at most
~100 files)
- Path Depth: Limit paths to a maximum of 10 components
- Symlinks: Verify no symlinks in extracted content before
processing (post-extraction recursive scan)
- Filename Validation: Only allow expected characters; reject null
bytes
- Malformed Zips: Catch `ZipException` from `std.zip.ZipArchive`
constructor
- Cleanup on error
|
| |
|
|
|
| |
- FIXES issue with .tex files and xetex finding image paths when run
within latex/ output directory
|
| | |
|
| |
|
|
| |
- revisit links (fix later)
|
| |
|
|
|
|
| |
- preferable, endnote parent object number
available for use (as here in text output,
compare "endnotes, add caller ocn" commit)
|
| | |
|
| | |
|
| |
|
|
| |
- spine --text [--output=output path] [markup source]
|
| | |
|
| | |
|
| | |
|
| | |
|
| | |
|
| |
|
|
| |
- appears to work, but needs review
|
| |
|
|
| |
- plus minor housekeeping/tidy
|
| | |
|
| | |
|
| |
|
|
|
|
|
|
| |
- ticks are a bit cumbersome where single quotes work
just as well
- testing required (special cases not covered)
- diverges from sisu markup which will need an
update sometime
|
| | |
|
| |
|
|
|
|
| |
- struct replaces tuple
- some direct naming of structs returned
(instead of use of auto) - minor
|
| | |
|
| | |
|
| | |
|
| |
|
|
|
| |
- serial processing (pods need to be built serially)
- multilingual pods, copy all languages before zip
|
| | |
|
| | |
|
| | |
|
| | |
|
| | |
|
| | |
|
| | |
|
| |
|
|
| |
- used e.g. in html text home button
|
| | |
|
| |
|
|
|
| |
- src/sisudoc (replaces src/doc_reform)
- sisudoc spine (used more)
|
| | |
|