SpecCompiler Core Design
This document describes the software architecture and design of SpecCompiler Core.
SpecCompiler uses a layered architecture in which the engine orchestrates a five-phase pipeline that processes documents through registered handlers.
The architecture comprises 30 CSCs organized in four source layers and two model packages. For the complete decomposition with all 162 CSUs, see the Software Decomposition chapter.
SpecCompiler uses a dynamic type system in which models define the available types for specifications, objects, floats, relations, and views.
Each type module exports a category-specific definition:
```lua
-- Example: models/default/types/objects/hlr.lua
local M = {}

M.object = {
  id = "HLR",
  long_name = "High-Level Requirement",
  description = "System-level requirement",
  extends = "TEXTUAL",
  attributes = {
    { name = "status", datatype = "STATUS_ENUM", min_occurs = 1 },
    { name = "rationale", datatype = "STRING" },
  },
}

-- Optional handler for custom rendering
M.handler = {
  name = "hlr_render",
  prerequisites = { "spec_objects" },
  on_transform = function(data, contexts, diagnostics)
    -- type-specific rendering logic goes here
  end,
}

return M
```

| Category | Database Table | Key Fields |
|---|---|---|
| Specifications | spec_specification_types | id, long_name, extends, is_default |
| Objects | spec_object_types | id, long_name, extends, is_default (HLR, FD, CSC, CSU, VC, etc.) |
| Floats | spec_float_types | id, long_name, counter_group, needs_external_render |
| Relations | spec_relation_types | id, source_type_ref, target_type_ref, link_selector |
| Views | spec_view_types | id, materializer_type, counter_group |
The Specification Intermediate Representation (SpecIR) is implemented as a SQLite database schema composed from:
src/db/schema/types.lua
src/db/schema/content.lua
src/db/schema/build.lua
src/db/schema/search.lua
src/db/schema/init.lua (combines all schema modules, initializes EAV pivot views)
The schema has four domains:
| Domain | Tables |
|---|---|
| Type system | spec_specification_types, spec_object_types, spec_float_types, spec_relation_types, spec_view_types, datatype_definitions, spec_attribute_types, enum_values |
| Content | specifications, spec_objects, spec_floats, spec_relations, spec_views, spec_attribute_values |
| Build cache | build_graph, source_files, output_cache |
| Search (FTS5) | fts_objects, fts_attributes, fts_floats |
SpecIR is a ReqIF-inspired relational metamodel that lowers textual specifications into a typed intermediate representation against which structural validity is evaluated.
Where Pandoc provides the syntactic bridge from Markdown to a structured AST, SpecIR provides the semantic layer by separating a type layer (Γ), which defines what may exist, from a content layer, which records what does exist.
Validation reduces to relational set operations: constraints are expressed as queries over finite sets of entities and relations, and violations emerge as counterexamples (e.g., anti-joins between expected and actual structures).
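To make the anti-join idea concrete, here is a minimal sketch in plain Lua. It assumes hypothetical in-memory row tables (`required`, `objects`, `attribute_values`) standing in for the type and content tables, and evaluates a single `min_occurs = 1` constraint; it is an illustration of the technique, not SpecCompiler's actual proof-view SQL.

```lua
-- Type layer (hypothetical): attributes each object type must carry.
local required = {
  { type_ref = "HLR", attr = "status" },
}

-- Content layer (hypothetical): objects and the attribute rows that exist.
local objects = {
  { id = "HLR-001", type_ref = "HLR" },
  { id = "HLR-002", type_ref = "HLR" },
}
local attribute_values = {
  { owner = "HLR-001", attr = "status" },
}

-- Expected pairs = objects joined with their type's required attributes.
-- Violations = expected pairs with no matching actual row (anti-join).
local function missing_required(objs, reqs, values)
  local actual = {}
  for _, v in ipairs(values) do
    actual[v.owner .. "|" .. v.attr] = true
  end
  local out = {}
  for _, obj in ipairs(objs) do
    for _, req in ipairs(reqs) do
      if req.type_ref == obj.type_ref and not actual[obj.id .. "|" .. req.attr] then
        out[#out + 1] = { object = obj.id, attr = req.attr }
      end
    end
  end
  return out
end

local violations = missing_required(objects, required, attribute_values)
for _, v in ipairs(violations) do
  print(("%s is missing required attribute '%s'"):format(v.object, v.attr))
end
```

In SQL the same check is typically a `LEFT JOIN ... WHERE ... IS NULL` or a `NOT EXISTS` subquery; each surviving row is a counterexample.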
Allocation: Realized by CSC Core Runtime (Core Runtime) through CSU Build Engine (Build Engine) and CSU Pipeline Orchestrator (Pipeline Orchestrator), with handlers registered via CSC Pipeline Handlers (Pipeline Handlers). Phase-specific handlers are organized in CSC Analyze Handlers (Analyze Handlers), CSC Initialize Handlers (Initialize Handlers), and CSC Transform Handlers (Transform Handlers), with shared utilities in CSC Shared Pipeline Utilities (Shared Pipeline Utilities).
The pipeline execution orchestration function manages the five-phase document processing lifecycle from initial Pandoc hook entry through final output generation. It encompasses handler registration, dependency-based execution ordering, context propagation, and phase abort logic.
Entry Point: The Pandoc filter (CSU Pandoc Filter Entry Point) hooks into the Pandoc Meta callback, extracts project metadata via CSU-009, and invokes the build engine. The engine creates the database, initializes the data manager, loads the model via CSU-008 (Type Loader), processes document files (with build cache checks), and delegates to the pipeline orchestrator.
Handler Registration: During model loading, CSU-008 (Type Loader) registers each handler with the Pipeline Orchestrator (CSU Pipeline Orchestrator), which enforces the registration contract by validating that every handler declares both a name and a [TERM-24](@) field before accepting registration. The orchestrator rejects duplicate handler names: attempting to register a handler whose name already exists raises an immediate error. Accepted handlers are stored in a lookup table keyed by name for O(1) retrieval during phase execution. Each handler implements phase hooks using the naming convention on_{phase} (e.g., on_initialize, on_analyze, on_transform, on_verify, on_emit). All hooks receive the full contexts array: on_{phase}(data, contexts, diagnostics).
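The registration contract can be sketched as a small table-backed registry. This is a hypothetical minimal orchestrator, not the CSU's real API: it assumes the second required field is `prerequisites` (matching the M.handler example earlier), and the method names `register` and `hook` are illustrative.

```lua
-- Hypothetical registry illustrating the registration contract.
local Orchestrator = {}
Orchestrator.__index = Orchestrator

function Orchestrator.new()
  return setmetatable({ handlers = {} }, Orchestrator)
end

function Orchestrator:register(handler)
  if type(handler.name) ~= "string" then
    error("handler must declare a name", 2)
  end
  -- Assumption: the required dependency field is `prerequisites`.
  if type(handler.prerequisites) ~= "table" then
    error(("handler '%s' must declare prerequisites"):format(handler.name), 2)
  end
  if self.handlers[handler.name] then
    error(("duplicate handler name '%s'"):format(handler.name), 2)
  end
  self.handlers[handler.name] = handler  -- keyed by name for O(1) lookup
end

-- Returns the hook for a phase (e.g. "initialize" -> on_initialize), or nil.
function Orchestrator:hook(name, phase)
  local h = self.handlers[name]
  return h and h["on_" .. phase]
end

local orch = Orchestrator.new()
orch:register({
  name = "specifications",
  prerequisites = {},
  on_initialize = function(data, contexts, diagnostics) end,
})

-- A second registration under the same name is rejected immediately.
local ok, err = pcall(orch.register, orch, { name = "specifications", prerequisites = {} })
print(ok, err)
```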
Topological Sort: Before executing each phase, the Pipeline Orchestrator (CSU Pipeline Orchestrator) applies Kahn’s algorithm to produce a deterministic handler execution order. Only handlers that implement an on_{phase} hook for the current phase participate in the sort; handlers without a relevant hook are skipped entirely. The algorithm begins by building a dependency graph restricted to participants and initializing an in-degree count for each node. Nodes with zero in-degree (handlers whose prerequisites are already satisfied) seed a processing queue. At each step the algorithm dequeues the first node, appends it to the sorted output, and decrements the in-degree of all its dependents; any dependent whose in-degree reaches zero is enqueued. Alphabetical tie-breaking is applied at every dequeue step so that handlers at the same dependency depth are always emitted in the same order, guaranteeing deterministic output across runs. After the queue is exhausted, if the sorted list is shorter than the participant count, a dependency cycle exists and is reported as an error.
For example, suppose INITIALIZE has three handlers: specifications (no prerequisites), spec_objects (prerequisite: specifications), and spec_floats (prerequisite: specifications). The sort produces [specifications, spec_floats, spec_objects], with spec_floats and spec_objects ordered alphabetically because both depend only on specifications.
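A minimal Lua sketch of the sort as described: participants are filtered by hook, the graph is restricted to participants, and the ready set is sorted alphabetically before each dequeue. The function name `sort_handlers` and the handler table shape (keyed by name, with `prerequisites` and `on_{phase}` fields) are assumptions modeled on the M.handler example.

```lua
-- Kahn's algorithm with alphabetical tie-breaking (illustrative sketch).
local function sort_handlers(handlers, phase)
  local hook = "on_" .. phase
  -- Only handlers implementing the phase hook participate.
  local participants, count = {}, 0
  for name, h in pairs(handlers) do
    if h[hook] then
      participants[name] = h
      count = count + 1
    end
  end

  -- In-degrees and prerequisite -> dependents edges, restricted to participants.
  local indegree, dependents = {}, {}
  for name in pairs(participants) do
    indegree[name], dependents[name] = 0, {}
  end
  for name, h in pairs(participants) do
    for _, pre in ipairs(h.prerequisites or {}) do
      if participants[pre] then
        indegree[name] = indegree[name] + 1
        table.insert(dependents[pre], name)
      end
    end
  end

  -- Seed the queue with zero in-degree nodes.
  local ready = {}
  for name, d in pairs(indegree) do
    if d == 0 then ready[#ready + 1] = name end
  end

  local order = {}
  while #ready > 0 do
    table.sort(ready)                    -- alphabetical tie-break per dequeue
    local name = table.remove(ready, 1)
    order[#order + 1] = name
    for _, dep in ipairs(dependents[name]) do
      indegree[dep] = indegree[dep] - 1
      if indegree[dep] == 0 then ready[#ready + 1] = dep end
    end
  end

  if #order < count then
    return nil, "dependency cycle among " .. phase .. " handlers"
  end
  return order
end

local noop = function() end
local handlers = {
  specifications = { prerequisites = {}, on_initialize = noop },
  spec_objects   = { prerequisites = { "specifications" }, on_initialize = noop },
  spec_floats    = { prerequisites = { "specifications" }, on_initialize = noop },
  hlr_render     = { prerequisites = { "spec_objects" }, on_transform = noop },
}
print(table.concat(sort_handlers(handlers, "initialize"), ", "))
-- prints: specifications, spec_floats, spec_objects
```

Re-sorting the ready list keeps the sketch short; a binary heap keyed by name would give the same order with better scaling for large handler sets. Note that hlr_render is excluded from the INITIALIZE sort because it has no on_initialize hook.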
Phase Execution: The pipeline executes five phases in order: INITIALIZE, ANALYZE, TRANSFORM, VERIFY, and EMIT.
All phases use the same dispatch model: for each handler in sorted order, the Pipeline Orchestrator (CSU Pipeline Orchestrator) calls the handler’s on_{phase}(data, contexts, diagnostics) hook once with the full set of contexts. Handlers are responsible for iterating over contexts internally. Each handler invocation is bracketed by uv.hrtime() calls at nanosecond precision to record its duration, and the orchestrator also records the aggregate duration of each phase, providing two-level performance visibility (handler-level and phase-level).
Phase Abort: After VERIFY, the Pipeline Orchestrator (CSU Pipeline Orchestrator) inspects the diagnostics collector for any error-level entries. If errors exist, the pipeline aborts before EMIT; only output generation is skipped, since TRANSFORM has already completed. This design allows proof views to validate transform results before committing to output.
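The dispatch, timing, and abort rules above can be combined into one short sketch. It uses hypothetical names (`run_pipeline`, a `sorted` table mapping each phase to its ordered handlers, diagnostics as a flat list of `{ level = ... }` entries) and substitutes os.clock() for uv.hrtime() so it runs under plain Lua; note that os.clock() measures CPU time, not wall-clock time.

```lua
local PHASES = { "initialize", "analyze", "transform", "verify", "emit" }

local function has_errors(diagnostics)
  for _, d in ipairs(diagnostics) do
    if d.level == "error" then return true end
  end
  return false
end

-- `sorted` maps each phase name to its handlers in topological order.
local function run_pipeline(sorted, data, contexts, diagnostics)
  local timings = { handlers = {}, phases = {} }
  for _, phase in ipairs(PHASES) do
    local phase_start = os.clock()        -- stand-in for uv.hrtime()
    for _, h in ipairs(sorted[phase] or {}) do
      local t0 = os.clock()
      -- One call per handler, passing the full contexts array.
      h["on_" .. phase](data, contexts, diagnostics)
      timings.handlers[phase .. "/" .. h.name] = os.clock() - t0
    end
    timings.phases[phase] = os.clock() - phase_start
    -- After VERIFY, any error-level diagnostic skips EMIT.
    if phase == "verify" and has_errors(diagnostics) then
      return false, timings
    end
  end
  return true, timings
end

-- Usage: a hypothetical proof handler reports an error, so EMIT never runs.
local emitted = false
local sorted = {
  verify = { { name = "proof_views", on_verify = function(_, _, diags)
    diags[#diags + 1] = { level = "error", message = "HLR-002 lacks status" }
  end } },
  emit = { { name = "emitter", on_emit = function() emitted = true end } },
}
local ok, timings = run_pipeline(sorted, {}, {}, {})
print(ok, emitted)
```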
Context Propagation: Each document file produces a context containing the parsed AST, file path, specification ID, and walker state. Contexts are passed through all phases, accumulating state. The diagnostics collector aggregates errors and warnings across all handlers and contexts.
Component Interaction
The pipeline is realized through the core runtime and four handler packages that correspond to pipeline phases.
CSC Core Runtime (Core Runtime) provides the entry point and orchestration layer. CSU Pandoc Filter Entry Point (Pandoc Filter Entry Point) hooks into the Pandoc callback to launch the build. CSU Configuration Parser (Configuration Parser) reads project.yaml and resolves model, output, and logging settings. CSU Data Loader (Data Loader) loads external data files referenced by specifications. CSU Build Engine (Build Engine) coordinates the full build lifecycle: creating the database, loading the model, processing document files with cache checks, and delegating to CSU Pipeline Orchestrator (Pipeline Orchestrator) for phase execution. The orchestrator drives topological sort, handler dispatch, timing, and abort logic across all five phases.
CSC Pipeline Handlers (Pipeline Handlers) registers cross-cutting handlers. CSU Include Expansion Filter (Include Expansion Filter) resolves include directives during INITIALIZE, expanding referenced files into the document AST before entity parsing begins.
CSC Initialize Handlers (Initialize Handlers) parses the Pandoc AST into SpecIR entities. CSU Specification Parser (Specification Parser) extracts document-level metadata, CSU Object Parser (Object Parser) identifies typed content blocks, CSU Attribute Parser (Attribute Parser) extracts key-value attributes from object paragraphs, CSU Float Parser (Float Parser) detects embedded figures, tables, and listings, CSU Relation Parser (Relation Parser) captures cross-reference links, and CSU View Parser (View Parser) identifies view directives.
CSC Analyze Handlers (Analyze Handlers) resolves cross-references and infers types. CSU Relation Resolver (Relation Resolver) matches link targets to spec objects using selector-based resolution. CSU Relation Type Inferrer (Relation Type Inferrer) assigns relation type_refs based on source and target object types. CSU Attribute Caster (Attribute Caster) validates and casts attribute values against their declared datatypes.
CSC Shared Pipeline Utilities (Shared Pipeline Utilities) provides reusable base modules consumed by handlers across phases. CSU Spec Object Base (Spec Object Base) and CSU Specification Base (Specification Base) provide shared parsing logic for objects and specifications. CSU Float Base (Float Base) provides float detection and extraction. CSU Attribute Paragraph Utilities (Attribute Paragraph Utilities) parses attribute blocks from definition-list paragraphs. CSU Include Handler (Include Handler) and CSU Include Utilities (Include Utilities) manage file inclusion and path resolution. CSU Render Utilities (Render Utilities) and CSU Math Render Utilities (Math Render Utilities) provide AST-to-output conversion helpers. CSU View Utilities (View Utilities) supports view parsing and materialization. CSU Source Position Compatibility (Source Position Compatibility) normalizes Pandoc source position data across API versions.
Selected SQLite as the persistence engine for the Specification Intermediate Representation.
SQLite provides:
Selected Entity-Attribute-Value storage for dynamic object and float attributes.
The EAV model enables runtime schema extension through model definitions:
Selected a five-phase sequential pipeline (INITIALIZE, ANALYZE, TRANSFORM, VERIFY, EMIT) for document processing.
Five phases separate concerns and enable verification before output:
Selected Kahn’s algorithm with declarative prerequisite arrays for handler execution ordering within each pipeline phase.
Declarative prerequisites with topological sort enable:
prerequisites = {"handler_a", "handler_b"} rather than manual sequence numbers
Allocation: Realized by CSC Core Runtime (Core Runtime) through CSU Type Loader (Type Loader). The foundational type definitions are provided by CSC Default Model (Default Model), which all domain models extend.
The type model discovery function loads model definitions from the filesystem and registers them with the pipeline and data manager. It enables domain-specific extensibility by allowing models to define custom specification types, object types, float types, relation types, view types, and pipeline handlers.
Model Path Resolution: The Type Loader (CSU Type Loader) resolves the
model directory by checking SPECCOMPILER_HOME/models/{model} first,
then falling back to the project root. This two-stage resolution allows
global model installation while supporting project-local overrides.
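A sketch of the two-stage lookup follows. It assumes the project-root fallback is `{project_root}/models/{model}` and that a directory check is injected as `dir_exists`; both are assumptions, and the Type Loader's real checks may differ. In practice `env` could be built from `os.getenv("SPECCOMPILER_HOME")`.

```lua
-- Hypothetical two-stage model directory resolution.
local function resolve_model_dir(model, project_root, env, dir_exists)
  local home = env.SPECCOMPILER_HOME
  if home and home ~= "" then
    local global_dir = home .. "/models/" .. model   -- stage 1: global install
    if dir_exists(global_dir) then
      return global_dir
    end
  end
  local project_dir = project_root .. "/models/" .. model  -- stage 2 (assumed layout)
  if dir_exists(project_dir) then
    return project_dir
  end
  return nil, ("model '%s' not found"):format(model)
end

-- Fake filesystem for illustration.
local present = {
  ["/opt/speccompiler/models/default"] = true,
  ["/work/proj/models/acme"] = true,
}
local function dir_exists(path) return present[path] == true end
local env = { SPECCOMPILER_HOME = "/opt/speccompiler" }

print(resolve_model_dir("default", "/work/proj", env, dir_exists))
print(resolve_model_dir("acme", "/work/proj", env, dir_exists))
```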
Directory Scanning: The loader scans models/{model}/types/ for category
directories matching the five type categories:
| Category | Directory | Export Field | Database Table |
|---|---|---|---|
| Specifications | specifications/ | M.specification | spec_specification_types |
| Objects | objects/ | M.object | spec_object_types |
| Floats | floats/ | M.float | spec_float_types |
| Relations | relations/ | M.relation | spec_relation_types |
| Views | views/ | M.view | spec_view_types |
A typical model directory layout for the default model:
models/default/types/
├── specifications/
│ └── srs.lua -- exports M.specification
├── objects/
│ ├── hlr.lua -- exports M.object
│ └── section.lua -- exports M.object
├── floats/
│ ├── figure.lua -- exports M.float
│ └── plantuml.lua -- exports M.float + M.handler
├── relations/
│ └── traces_to.lua -- exports M.relation
└── views/
└── toc.lua -- exports M.view
Module Loading: Each .lua file in a category directory is
loaded via require(). The loader
inspects the module’s export fields to determine the type category and
calls the appropriate registration method on the data manager. If a
module exports M.handler, the
handler is also registered with the pipeline for phase-specific
processing.
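The export-field dispatch can be sketched as follows. It assumes hypothetical `dm:register_type(category, def)` and `pipeline:register(handler)` methods (the real data manager exposes its own registration methods, which are not named here); the stubs simply record what was registered, and the handler name `plantuml_render` is illustrative.

```lua
-- Hypothetical dispatch from a loaded module's export field to registration.
local EXPORT_FIELDS = { "specification", "object", "float", "relation", "view" }

local function register_type_module(mod, dm, pipeline)
  local category
  for _, field in ipairs(EXPORT_FIELDS) do
    if mod[field] ~= nil then
      category = field
      dm:register_type(field, mod[field])
      break
    end
  end
  -- A module may additionally contribute a pipeline handler.
  if mod.handler ~= nil then
    pipeline:register(mod.handler)
  end
  return category
end

-- Recording stubs standing in for the data manager and the pipeline.
local registered = { types = {}, handlers = {} }
local dm = {
  register_type = function(_, cat, def)
    registered.types[#registered.types + 1] = cat .. ":" .. def.id
  end,
}
local pipeline = {
  register = function(_, handler)
    registered.handlers[#registered.handlers + 1] = handler.name
  end,
}

-- e.g. floats/plantuml.lua exports both M.float and M.handler
local plantuml = {
  float = { id = "PLANTUML" },
  handler = { name = "plantuml_render", prerequisites = {} },
}
local category = register_type_module(plantuml, dm, pipeline)
print(category, registered.types[1], registered.handlers[1])
```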
Attribute Registration: Object and float types may
declare attributes tables describing
their data schema. The loader registers these with the data manager,
creating datatype definitions and attribute constraints (name, datatype,
min/max occurs, enum values, bounds) in the spec_attribute_defs and spec_datatype_defs tables.
Handler Registration: Type modules may export a M.handler table with phase hooks. These handlers extend the pipeline’s processing capabilities with type-specific logic (e.g., PlantUML float rendering, traceability matrix view materialization). Handlers declare prerequisites to control execution ordering via topological sort.
Base Types: Relation type modules may export M.base instead of M.relation to act as base types that own
resolution logic. Base types are not registered in the database. The
type loader registers their M.resolve function into _selector_resolvers (keyed by M.base.link_selector), which the relation
resolver dispatches to during the ANALYZE phase. Concrete relation types
use M.extend(overrides) to inherit
base properties (e.g., link_selector)
while adding their own constraints. This follows the same delegation
pattern as float type modules (src/pipeline/transform/spec_floats.lua).
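A hypothetical sketch of the base-type delegation pattern. Only `link_selector`, `M.base`, `M.resolve`, `M.extend`, and `_selector_resolvers` come from the description above; the resolver body (an exact-id lookup), the `register_base` helper, and the concrete XREF_FIGURE overrides are illustrative assumptions.

```lua
-- Hypothetical base relation module.
local Base = {}
Base.base = { link_selector = "#" }

-- Resolution logic owned by the base type (illustrative exact-id lookup).
function Base.resolve(target, candidates)
  return candidates[target]
end

-- Concrete types copy base properties, then apply their own overrides.
function Base.extend(overrides)
  local rel = {}
  for k, v in pairs(Base.base) do rel[k] = v end
  for k, v in pairs(overrides) do rel[k] = v end
  return rel
end

-- Type-loader side: base modules register a resolver, not a database row.
local _selector_resolvers = {}
local function register_base(mod)
  _selector_resolvers[mod.base.link_selector] = mod.resolve
end
register_base(Base)

-- A concrete relation type module inheriting link_selector.
local XrefFigure = {}
XrefFigure.relation = Base.extend({ id = "XREF_FIGURE", target_type_ref = "FIGURE" })

-- Relation resolver side (ANALYZE): dispatch by the link's selector.
local resolver = _selector_resolvers[XrefFigure.relation.link_selector]
print(resolver("fig-arch", { ["fig-arch"] = "FLOAT-12" }))
```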
Custom Display Text: Relation types that need custom
link display text (e.g., showing title instead of PID) export a standard
M.handler with on_transform using the shared link_rewrite_utils.rewrite_display_for_type()
utility.
Component Interaction
The type discovery function is realized by the core type loader and the default model that provides foundational type definitions.
CSC Core Runtime (Core Runtime) provides CSU Type Loader (Type Loader), which drives the entire model discovery lifecycle: resolving model paths, scanning category directories, loading modules via require(), registering types and handlers with the data manager and pipeline, and propagating inherited attributes and relation properties.
CSC Default Model (Default Model) provides the two foundational types that every domain model inherits. CSU SECTION Object Type (SECTION Object Type) defines the implicit structural type for untitled content sections, enabling document structure representation without requiring explicit type declarations. CSU SPEC Specification Type (SPEC Specification Type) defines the base specification type with version, status, and date attributes that all domain-specific specification types extend.
Selected Lua modules with extends chains for type definitions.
Lua modules as type definitions enable:
The extends field enables single inheritance (e.g., HLR extends TRACEABLE) with automatic attribute propagation
require() loading reuses Pandoc’s built-in Lua module system without additional dependencies
Allocation: Realized by CSC Database Persistence (Database Persistence) through CSU Data Manager (Data Manager) and CSU Build Cache (Build Cache). The database schema is defined in CSC DB Schema (DB Schema), queries in CSC DB Queries (DB Queries), and materialized views in CSC DB Views (DB Views).
The Intermediate Representation persistence function manages the SQLite database that stores all parsed specification content and provides cache coherency for incremental builds. It encompasses schema management, the EAV storage model, build caching, and output caching.
SQLite Persistence: The data manager (CSU Data Manager) wraps all
database operations, providing a query API over the SpecIR schema. The
database file is created in the project’s output directory and persists
across builds. Schema creation is executed inside DataManager.new(), establishing all content
and type tables.
SpecIR Schema: The content schema (CSU Content Schema) defines the core entity tables:
specifications: document-level containers with header AST and metadata
spec_objects: typed content blocks with AST, file position, and specification scope
spec_floats: embedded figures, listings, and tables with render state
spec_views: materialized data views (TOC, LOF, traceability matrices)
spec_relations: links between objects with type inference results
spec_attribute_values: EAV-model attribute storage for object and float properties
The type schema (CSU Type System Schema) defines the metamodel tables that describe valid types, attribute definitions, and datatype constraints.
EAV Attribute Model: Attributes are stored as
individual rows in spec_attribute_values rather than as
columns, enabling dynamic schema extension through model definitions.
Each attribute row references its parent entity (object or float),
attribute definition, and stores the value as text with type casting at
query time.
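To illustrate storing values as text and casting on read, here is a small in-memory sketch. The row shape, the datatype names, and the `pivot` helper are assumptions; in SpecIR the equivalent work happens in SQL over spec_attribute_values.

```lua
-- Hypothetical EAV rows: every value is stored as text.
local attribute_values = {
  { owner = "HLR-001", name = "status",   value = "Approved" },
  { owner = "HLR-001", name = "priority", value = "2" },
  { owner = "HLR-001", name = "safety",   value = "true" },
  { owner = "HLR-002", name = "safety",   value = "false" },
}

-- Declared datatypes from attribute definitions (names are illustrative).
local datatype_of = { status = "STATUS_ENUM", priority = "INTEGER", safety = "BOOLEAN" }

local casters = {
  INTEGER = function(s) local n = tonumber(s); return n and math.floor(n) end,
  REAL    = function(s) return tonumber(s) end,
  BOOLEAN = function(s) return s == "true" end,
}

-- Collect one entity's rows into a record, casting each value on read.
local function pivot(owner)
  local record = {}
  for _, row in ipairs(attribute_values) do
    if row.owner == owner then
      local cast = casters[datatype_of[row.name]]
      if cast then
        record[row.name] = cast(row.value)
      else
        record[row.name] = row.value      -- enums and strings stay as text
      end
    end
  end
  return record
end

local hlr1 = pivot("HLR-001")
print(hlr1.status, hlr1.priority, hlr1.safety)
```

The explicit `if cast then` branch matters for BOOLEAN: an `a and b or c` shortcut would turn a cast `false` back into the string "false".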
Build Cache: The build cache (CSU Build Cache) tracks document content hashes in the source_files table. Before parsing a document, the engine computes its SHA1 hash and compares it against the stored value. Unchanged documents skip parsing entirely and reuse their cached SpecIR state. To support incremental builds, the build_graph table tracks include dependencies so that changes to included files also trigger rebuilds. When the document hash matches, the Build Engine (CSU-005) queries the build_graph table for all include dependencies recorded during the previous build. It then computes a SHA1 hash for each include file via the Hash Utilities (CSU-065). If an include file is missing, the hash function returns nil, which forces a cache miss and a full document rebuild. The resulting path-to-hash map is compared against the stored values; any mismatch invalidates the cache and triggers reparsing of the root document.
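The cache decision reduces to a small predicate. In this sketch, hypothetical tables stand in for source_files (`stored`: path to hash) and build_graph (`includes`: root to include paths), and `hash_file` stands in for the Hash Utilities, returning nil for a missing file.

```lua
-- Hypothetical cache-validity check for one root document.
local function cache_valid(doc, stored, includes, hash_file)
  local doc_hash = hash_file(doc)
  if doc_hash == nil or doc_hash ~= stored[doc] then
    return false                       -- document changed (or vanished)
  end
  for _, inc in ipairs(includes[doc] or {}) do
    local h = hash_file(inc)
    if h == nil or h ~= stored[inc] then
      return false                     -- include missing or changed: rebuild
    end
  end
  return true                          -- reuse cached SpecIR state
end

-- Fake hashes for illustration.
local current = { ["srs.md"] = "a1", ["inc/reqs.md"] = "b2" }
local function hash_file(path) return current[path] end
local stored = { ["srs.md"] = "a1", ["inc/reqs.md"] = "b2" }
local includes = { ["srs.md"] = { "inc/reqs.md" } }

print(cache_valid("srs.md", stored, includes, hash_file))  -- true
current["inc/reqs.md"] = nil                                -- include deleted
print(cache_valid("srs.md", stored, includes, hash_file))  -- false
```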
Output Cache: The output cache tracks generated output files and their input hashes. Before generating an output file, the emitter checks whether the input hash matches the stored value. If current, the output generation is skipped. This provides incremental output generation independent of the build cache.
Component Interaction
The storage subsystem is realized through four packages that separate runtime operations from static definitions.
CSC Database Persistence (Database Persistence) provides the runtime database layer. CSU Database Handler (Database Handler) wraps raw SQLite operations and connection management with DELETE journal mode for single-file reliability. CSU Data Manager (Data Manager) builds on the handler to provide the high-level query API used by all pipeline phases: inserting spec entities during INITIALIZE, updating references during ANALYZE, and reading assembled content during EMIT. CSU Build Cache (Build Cache) queries source_files and build_graph to detect changed documents via SHA1 comparison. CSU Output Cache (Output Cache) tracks generated output files and their input hashes to skip redundant generation. CSU Proof View Definitions (Proof View Definitions) materializes SQL proof views at build time for the VERIFY phase.
CSC DB Schema (DB Schema) defines the database structure through composable modules. CSU Schema Aggregator (Schema Aggregator) is the entry point, composing: CSU Content Schema (Content Schema) for the core SpecIR tables, CSU Type System Schema (Type System Schema) for attribute and datatype definitions, CSU Build Schema (Build Schema) for source file and dependency tracking, and CSU Search Schema (Search Schema) for FTS5 virtual tables.
CSC DB Queries (DB Queries) mirrors the schema structure with composable query modules. CSU Query Aggregator (Query Aggregator) combines: CSU Content Queries (Content Queries) for spec entity CRUD, CSU Resolution Queries (Resolution Queries) for cross-reference and relation resolution, CSU Build Queries (Build Queries) for cache and dependency lookups, CSU Type Queries (Type Queries) for type definitions and attribute constraints, and CSU Search Queries (Search Queries) for FTS5 population and search.
CSC DB Views (DB Views) provides materialized SQL views over the SpecIR data. CSU Views Aggregator (Views Aggregator) composes: CSU EAV Pivot Views (EAV Pivot Views) for pivoting attribute values into typed columns, CSU Resolution Views (Resolution Views) for joining relations with resolved targets, and CSU Public API Views (Public API Views) for stable query interfaces used by pipeline handlers and external tools.
Selected SHA1 content hashing for incremental build detection.
Content-addressed hashing provides deterministic cache invalidation:
build_graph table ensures changes to included files trigger rebuilds
pandoc.sha1() when available, falling back to vendor/sha2.lua in standalone mode
Selected runtime-generated CREATE VIEW statements per object type to pivot EAV attributes into typed columns.
Dynamic view generation bridges EAV flexibility with query usability:
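As an illustration of runtime view generation for one object type, the sketch below builds a `CREATE VIEW` statement using the common `MAX(CASE ...)` pivot idiom with `GROUP BY`. The column names (owner_ref, name, value, type_ref), the `view_{type}` naming, and the datatype-to-CAST mapping are assumptions rather than the actual SpecIR schema. Attribute names are interpolated directly, so the sketch assumes they are trusted identifiers from model files.

```lua
-- Hypothetical pivot-view generator for one object type.
local SQL_CAST = { INTEGER = "INTEGER", REAL = "REAL" }

local function pivot_view_sql(type_id, attributes)
  local cols = { "o.id AS id" }
  for _, attr in ipairs(attributes) do
    -- One column per attribute: pick that attribute's value per object.
    local expr = ("MAX(CASE WHEN av.name = '%s' THEN av.value END)"):format(attr.name)
    local cast = SQL_CAST[attr.datatype]
    if cast then
      expr = ("CAST(%s AS %s)"):format(expr, cast)   -- typed column
    end
    cols[#cols + 1] = expr .. " AS " .. attr.name
  end
  return table.concat({
    ("CREATE VIEW IF NOT EXISTS view_%s AS"):format(type_id:lower()),
    "SELECT " .. table.concat(cols, ", "),
    "FROM spec_objects o",
    "LEFT JOIN spec_attribute_values av ON av.owner_ref = o.id",
    ("WHERE o.type_ref = '%s'"):format(type_id),
    "GROUP BY o.id;",
  }, "\n")
end

local sql = pivot_view_sql("HLR", {
  { name = "status", datatype = "STATUS_ENUM" },
  { name = "priority", datatype = "INTEGER" },
})
print(sql)
```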
Allocation: Realized by CSC Transform Handlers (Transform Handlers) and CSC Emit Handlers (Emit Handlers) through CSU External Render Handler (External Render Handler) and CSU Emitter Orchestrator (Emitter Orchestrator). Output infrastructure is provided by CSC Infrastructure (Infrastructure), with format-specific utilities in CSC Format Utilities (Format Utilities), DOCX generation in CSC DOCX Generation (DOCX Generation), I/O operations in CSC I/O Utilities (I/O Utilities), and external process management in CSC Process Management (Process Management).
The multi-format publication function encompasses the TRANSFORM and EMIT pipeline phases, covering content rendering, view materialization, external subprocess rendering, document assembly from the Intermediate Representation database, float numbering and resolution, and parallel output generation via Pandoc.
TRANSFORM Phase: Prepares content for output through the following handler stages:
CSU View Materializer: Pre-computes view data (table of contents, list of figures/tables, abbreviation lists) and stores it as JSON in spec_views.resolved_data. Dispatches by view type: toc queries spec_objects ordered by file_seq, lof/lot query floats by counter_group, and abbrev_list queries view content.
CSU Float Transformer: Resolves float content that does not require external rendering (e.g., CSV parsing, text processing) and updates spec_floats.resolved_ast.
CSU External Render Handler: Coordinates parallel subprocess rendering for float types requiring external tools (PlantUML, ECharts, Math). Prepares tasks via renderer callbacks, checks the output cache for hits, and spawns the remaining tasks in parallel via luv spawn_batch(). Results update resolved_ast with output paths.
CSU Object Render Handler: Invokes type-specific handlers for each spec_object (ordered by file_seq). Type handlers provide header() and body() functions dispatched through the base handler wrapper. The rendered AST is merged back into spec_objects.ast.
CSU Specification Render Handler: Renders document title headers via specification type handlers, storing the result in specifications.header_ast.
EMIT Phase: Assembles and generates output documents in multiple formats, fulfilling multi-format output requirements:
CSU Float Numbering: Assigns sequential numbers to floats by counter_group (e.g., FIGURE, TABLE, LISTING, EQUATION) across all documents. Shared counter groups enable natural numbering where related visual content types form a single sequence.
CSU Document Assembler: Reconstructs a complete Pandoc AST from the SpecIR database. Queries spec_objects ordered by file_seq, decodes JSON AST fragments, normalizes header levels, and assembles floats and views into the document structure. Metadata is built from specification attributes, and the assembler returns a complete pandoc.Pandoc document.
CSU Float Resolver: Builds a lookup map of rendered float results (ast, number, caption, type_ref) from spec_floats entries with a resolved_ast. The resolver queries the database for all floats belonging to the current specification and indexes them for efficient lookup during document traversal.
CSU Float Emitter: Walks assembled document blocks and replaces float placeholder CodeBlocks with rendered Div elements containing captions and semantic CSS classes for format-specific styling.
CSU View Emitter: Replaces view placeholder blocks with materialized content from the TRANSFORM phase. Views are expanded inline as formatted tables, lists, or custom structures depending on view type.
CSU FTS Indexer: Populates the FTS5 virtual tables (fts_objects, fts_attributes, fts_floats) for full-text search in the web application, converting AST to plain text and indexing searchable fields with Porter stemming.
CSU Inline Handler Dispatcher: Processes inline elements during emit: resolves (@) links to #anchor references, processes citations, and renders inline math expressions.
CSU Emitter Orchestrator: Format-agnostic orchestration for batch mode execution: (1) assigns float numbers globally, (2) assembles documents per specification, (3) resolves and transforms floats, (4) applies format-specific filters (docx, html), (5) serializes assembled documents to intermediate JSON via pandoc.write(doc, "json") to temporary files, (6) checks output_cache for staleness and skips generation when outputs are current, (7) spawns parallel Pandoc processes via luv for concurrent format conversion (docx, html5), and (8) cleans up intermediate JSON files after generation completes. Format-specific postprocessors run after Pandoc generation: DOCX postprocessing applies style fixups via OOXML manipulation, and HTML5 postprocessing bundles assets for web application deployment.
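The counter-group numbering performed by CSU Float Numbering can be sketched as a single pass over floats in global document order. The group assignment used here (PLANTUML sharing the FIGURE group) is an illustrative assumption, not the default model's actual configuration.

```lua
-- Hypothetical counter-group numbering sketch.
local function number_floats(floats, counter_group_of)
  local counters = {}
  for _, f in ipairs(floats) do            -- floats in global document order
    local group = counter_group_of[f.type_ref] or f.type_ref
    counters[group] = (counters[group] or 0) + 1
    f.number = counters[group]
  end
  return floats
end

-- Illustrative mapping: PLANTUML diagrams continue the FIGURE sequence.
local groups = { FIGURE = "FIGURE", PLANTUML = "FIGURE", TABLE = "TABLE" }
local floats = number_floats({
  { id = "fig-a", type_ref = "FIGURE" },
  { id = "uml-b", type_ref = "PLANTUML" },
  { id = "tab-c", type_ref = "TABLE" },
  { id = "fig-d", type_ref = "FIGURE" },
}, groups)
for _, f in ipairs(floats) do print(f.id, f.number) end
```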
Infrastructure Support: The Pandoc CLI wrapper (CSU Pandoc CLI Builder)
builds command arguments for multiple output formats (docx, html5,
markdown, json) with reference document, bibliography, and filter
support. The reference generator (CSU Reference Generator) creates reference.docx from style presets for DOCX
output styling.
Component Interaction
The output subsystem spans the TRANSFORM and EMIT pipeline phases, realized through handler packages and infrastructure components.
CSC Transform Handlers (Transform Handlers) prepares content for output. CSU View Materializer (View Materializer) pre-computes view data as JSON. CSU Float Transformer (Float Transformer) resolves float content not requiring external tools. CSU External Render Handler (External Render Handler) coordinates parallel subprocess rendering via luv. CSU Object Render Handler (Object Render Handler) invokes type-specific header/body renderers. CSU Specification Render Handler (Specification Render Handler) renders document title headers. CSU Relation Link Rewriter (Relation Link Rewriter) rewrites (@) link targets to resolved anchors in the final AST.
CSC Emit Handlers (Emit Handlers) assembles and generates output documents. CSU FTS Indexer (FTS Indexer) populates full-text search tables. CSU Float Numbering (Float Numbering) assigns sequential numbers by counter group. CSU Document Assembler (Document Assembler) reconstructs complete Pandoc AST from SpecIR. CSU Float Resolver (Float Resolver) builds the rendered float lookup map. CSU Float Emitter (Float Emitter) replaces float placeholders with rendered Divs. CSU View Emitter (View Emitter) expands view placeholders with materialized content. CSU Inline Handler Dispatcher (Inline Handler Dispatcher) processes inline elements — links, citations, math. CSU Float Handler Dispatcher (Float Handler Dispatcher) routes float rendering to type-specific handlers. CSU View Handler Dispatcher (View Handler Dispatcher) routes view expansion to type-specific materializers. CSU Emitter Orchestrator (Emitter Orchestrator) coordinates the full emit sequence: numbering, assembly, resolution, filtering, and parallel Pandoc output.
CSC Infrastructure (Infrastructure) provides cross-cutting utilities. CSU Hash Utilities (Hash Utilities) computes SHA1 hashes for cache coherency. CSU Logger (Logger) implements NDJSON structured logging. CSU JSON Utilities (JSON Utilities) handles JSON serialization for AST interchange. CSU Reference Cache (Reference Cache) caches resolved cross-references for O(1) lookup during emit. CSU MathML to OMML Converter (MathML to OMML Converter) translates MathML equations to Office Math Markup for DOCX output.
CSC Format Utilities (Format Utilities) provides format-specific helpers. CSU Format Writer (Format Writer) serializes AST to intermediate JSON for Pandoc consumption. CSU XML Utilities (XML Utilities) generates and manipulates XML for OOXML postprocessing. CSU ZIP Utilities (ZIP Utilities) handles DOCX archive creation and modification.
CSC DOCX Generation (DOCX Generation) produces Word documents. CSU Preset Loader (Preset Loader) reads style preset definitions from Lua files. CSU Style Builder (Style Builder) generates OOXML style elements from preset values. CSU OOXML Builder (OOXML Builder) assembles the final DOCX package with styles, numbering, and content parts. CSU Reference Generator (Reference Generator) creates reference.docx for Pandoc’s --reference-doc option.
CSC I/O Utilities (I/O Utilities) provides file-system operations. CSU Document Walker (Document Walker) traverses specification documents and their includes to build processing contexts. CSU File Walker (File Walker) scans directories for files matching glob patterns during model discovery.
CSC Process Management (Process Management) manages external subprocesses. CSU Pandoc CLI Builder (Pandoc CLI Builder) constructs Pandoc command-line arguments for each output format. CSU Task Runner (Task Runner) provides the luv-based parallel task executor used by both external float rendering and batch Pandoc output generation.
Selected Newline-Delimited JSON format for structured logging.
NDJSON enables:
Configuration via config.logging.level with env override SPECCOMPILER_LOG_LEVEL.
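A minimal NDJSON logger sketch, assuming debug/info/warn/error level names, an `info` default, and flat fields; the field names `level` and `msg` are illustrative. The tiny encoder escapes quotes, backslashes, and control characters as `\uXXXX` sequences. In real use `getenv` would be `os.getenv` and output would go to a file or stderr.

```lua
local LEVELS = { debug = 1, info = 2, warn = 3, error = 4 }

-- Env override first, then config.logging.level, then a default.
local function resolve_level(config, getenv)
  local lvl = getenv("SPECCOMPILER_LOG_LEVEL")
    or (config.logging and config.logging.level)
    or "info"
  lvl = lvl:lower()
  return LEVELS[lvl] and lvl or "info"
end

local function json_string(s)
  return '"' .. s:gsub('[%c"\\]', function(c)
    return string.format("\\u%04x", c:byte())
  end) .. '"'
end

-- Encodes a flat table as one JSON object on a single line.
local function ndjson_line(fields)
  local keys = {}
  for k in pairs(fields) do keys[#keys + 1] = k end
  table.sort(keys)                      -- stable key order
  local parts = {}
  for _, k in ipairs(keys) do
    local v = fields[k]
    local enc = type(v) == "number" and tostring(v) or json_string(tostring(v))
    parts[#parts + 1] = json_string(k) .. ":" .. enc
  end
  return "{" .. table.concat(parts, ",") .. "}"
end

local function make_logger(threshold, out)
  return function(level, msg)
    if (LEVELS[level] or 0) >= LEVELS[threshold] then
      out(ndjson_line({ level = level, msg = msg }))
    end
  end
end

-- Usage: config says "warn" and no env override is set.
local lines = {}
local log = make_logger(
  resolve_level({ logging = { level = "warn" } }, function() return nil end),
  function(line) lines[#lines + 1] = line end)
log("info", "skipped")
log("error", 'bad "quote"')
print(lines[1])
```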
Selected luv (libuv) for parallel subprocess execution.
Document processing benefits from parallel execution:
Selected preset system for DOCX style customization.
Corporate documents require consistent styling. The preset system:
Selected Deno as the runtime for TypeScript-based external tools.
Deno provides:
npm: specifiers
Tools are spawned via task_runner.spawn_sync() with timeout handling.
Selected Pandoc as the document parsing and output generation engine.
Pandoc serves as both input parser and output generator:
pandoc.read() provides a well-defined AST
--reference-doc support enables DOCX style customization via generated reference.docx
--lua-filter support enables format-specific transformations (docx.lua, html.lua)
Allocation: Realized by CSC Default Filters (Default Filters), CSC Default Postprocessors (Default Postprocessors), CSC Default Styles (Default Styles), CSC Default Float Types (Default Float Types), CSC Default Relation Types (Default Relation Types), and CSC Default View Types (Default View Types).
The type system and domain model definition function encompasses the default model components that provide base type definitions, format-specific processing, and style configuration. These components are loaded by the Type Loader (CSU-008) during the model discovery phase described in FD-002 and collectively define the foundational capabilities that all domain models inherit and extend.
Model Directory Structure: Each model provides a standard directory layout with types/ (objects, floats, relations, views, specifications), filters/, postprocessors/, and styles/ subdirectories. The default model (models/default/) establishes baseline definitions for all five type categories.
Type Definitions: CSC Default Float Types defines float types (FIGURE, TABLE, LISTING, PLANTUML, CHART, MATH) with counter groups for shared numbering. CSC Default Relation Types defines cross-reference relation types (XREF_FIGURE, XREF_TABLE, XREF_LISTING, XREF_MATH, XREF_CITATION) that map # link selectors to target float types; XREF_CITATION instead resolves bibliography citations by BibTeX key. CSC Default View Types defines view types (TOC, LOF, ABBREV, ABBREV_LIST, GAUSS, MATH_INLINE) with inline prefix syntax and materializer strategies.
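Counter groups let several float types draw numbers from one sequence. The following is a minimal sketch of that idea, assuming floats arrive in document order as `{ type, label }` records and that each type's counter_group is available as a lookup table. The function `number_floats` and the sample mapping (PLANTUML sharing the FIGURE sequence) are hypothetical illustrations, not the default model's actual configuration.

```lua
-- Hypothetical sketch: number floats per counter group, in document order.
-- `group_of` maps a float type id to its counter_group; `floats` is an
-- ordered list of { type = ..., label = ... } records.
local function number_floats(group_of, floats)
  local counters = {} -- counter_group -> last number issued
  local numbers = {}  -- float label -> assigned number
  for _, f in ipairs(floats) do
    local group = group_of[f.type] or f.type -- assumed fallback: own sequence
    counters[group] = (counters[group] or 0) + 1
    numbers[f.label] = counters[group]
  end
  return numbers
end

-- Illustrative mapping only: PLANTUML is assumed to share FIGURE numbering.
local numbers = number_floats(
  { FIGURE = "FIGURE", PLANTUML = "FIGURE", TABLE = "TABLE" },
  {
    { type = "FIGURE", label = "fig:arch" },
    { type = "TABLE", label = "tab:types" },
    { type = "PLANTUML", label = "fig:seq" },
  })
-- fig:arch = 1, tab:types = 1, fig:seq = 2
```

Because the counter key is the group rather than the type, any two types declaring the same counter_group interleave into a single numbered sequence.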
Format Processing: CSC Default Filters provides Pandoc Lua filters for DOCX, HTML, and Markdown output that convert speccompiler-format markers (page breaks, bookmarks, captions, equations) into native output elements. CSC Default Postprocessors applies format-specific post-processing after Pandoc output generation, loading template-specific fixup modules for DOCX and LaTeX. CSC Default Styles defines style presets with page layout, typography, and formatting configuration for DOCX (Letter-sized, standard margins) and HTML (Inter/JetBrains Mono fonts, color palette) output.
Component Interaction
The default model is realized through six packages that define the baseline type system, format processing, and style configuration inherited by all domain models.
CSC Default Float Types (Default Float Types) defines the visual content types. CSU FIGURE Float Type (FIGURE) handles image-based floats. CSU TABLE Float Type (TABLE) handles tabular data with CSV parsing. CSU LISTING Float Type (LISTING) handles code blocks with syntax highlighting. CSU PLANTUML Float Type (PLANTUML) renders UML diagrams via external subprocess. CSU CHART Float Type (CHART) renders ECharts visualizations. CSU MATH Float Type (MATH) renders LaTeX equations via KaTeX. Each type declares a counter group for cross-specification numbering.
CSC Default Relation Types (Default Relation Types) defines cross-reference relations that resolve # link selectors to float targets. CSU XREF_FIGURE Relation Type (XREF_FIGURE) targets FIGURE floats. CSU XREF_TABLE Relation Type (XREF_TABLE) targets TABLE floats. CSU XREF_LISTING Relation Type (XREF_LISTING) targets LISTING floats. CSU XREF_MATH Relation Type (XREF_MATH) targets MATH floats. CSU XREF_CITATION Relation Type (XREF_CITATION) resolves bibliography citations via BibTeX keys.
CSC Default View Types (Default View Types) defines data views with inline prefix syntax. CSU TOC View Type (TOC) generates tables of contents from spec objects. CSU LOF View Type (LOF) generates lists of figures, tables, or listings by counter group. CSU ABBREV View Type (ABBREV) renders inline abbreviation expansions. CSU ABBREV_LIST View Type (ABBREV_LIST) generates abbreviation glossaries. CSU GAUSS View Type (GAUSS) renders Gaussian distribution charts. CSU MATH_INLINE View Type (MATH_INLINE) renders inline LaTeX math expressions.
CSC Default Filters (Default Filters) provides format-specific Pandoc Lua filters applied during EMIT. CSU DOCX Filter (DOCX Filter) converts markers to OOXML-compatible elements — page breaks, bookmarks, and custom styles. CSU HTML Filter (HTML Filter) converts markers to semantic HTML5 elements with CSS classes. CSU Markdown Filter (Markdown Filter) normalizes markers for clean Markdown output.
CSC Default Postprocessors (Default Postprocessors) applies fixups after Pandoc generation. CSU DOCX Postprocessor (DOCX Postprocessor) manipulates the OOXML package — injecting custom styles, fixing table widths, and applying numbering overrides. CSU LaTeX Postprocessor (LaTeX Postprocessor) applies template-specific LaTeX fixups for PDF output.
CSC Default Styles (Default Styles) provides output styling presets. CSU DOCX Style Preset (DOCX Style Preset) defines page layout (Letter, standard margins), heading styles, table formatting, and font selections for Word output. CSU HTML Style Preset (HTML Style Preset) defines the web typography (Inter/JetBrains Mono), color palette, and responsive layout for HTML output.
Selected layered model loading where domain models extend and override the default model by type identifier.
ID-based override enables clean domain specialization:
policy_key
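Override by type identifier can be sketched as a merge of two ordered type lists. This is a minimal sketch, assuming each category is loaded as a list of tables carrying an `id` field. The helper `merge_types` is hypothetical and not the Type Loader's actual code: a domain entry whose `id` matches a default entry replaces it in place, and a new `id` is appended.

```lua
-- Hypothetical sketch of ID-based layering: domain definitions replace
-- default definitions that share the same `id`; new ids are appended.
local function merge_types(default_types, domain_types)
  local merged, index = {}, {}
  for _, t in ipairs(default_types) do
    merged[#merged + 1] = t
    index[t.id] = #merged
  end
  for _, t in ipairs(domain_types) do
    local pos = index[t.id]
    if pos then
      merged[pos] = t          -- override: same id, domain definition wins
    else
      merged[#merged + 1] = t  -- extension: previously unknown type id
      index[t.id] = #merged
    end
  end
  return merged
end

local merged = merge_types(
  { { id = "FIGURE", counter_group = "FIGURE" }, { id = "TABLE", counter_group = "TABLE" } },
  { { id = "TABLE", counter_group = "TABLE", source = "domain" }, { id = "HLR" } })
-- merged: FIGURE (default), TABLE (domain override), HLR (new)
```

Keeping the original position of an overridden type preserves the default ordering, which keeps downstream output stable when a domain model only tweaks a few types.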
Allocation: Realized by CSC Core Runtime (Core Runtime) and CSC Default Proof Views (Default Proof Views) through CSU Verify Handler (Verify Handler) and CSU-065 (Hash Utilities).
The audit and integrity function ensures deterministic compilation, reproducible builds, and audit trail integrity. It encompasses content-addressed hashing for incremental build detection, structured logging for audit trails, and include dependency tracking for proper cache invalidation.
Include Hash Computation: The engine (CSU Build Engine) queries the build_graph table for known includes from the previous build, then computes SHA1 hashes for each include file. Missing files cause a cache miss (triggering a full rebuild). The resulting map of path-to-hash is compared against stored values to detect changes. SHA1 hashing uses Pandoc’s built-in pandoc.sha1() when available, falling back to vendor/sha2.lua in standalone worker mode.
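The include check can be sketched as two passes: first build the path-to-hash map, then compare it with the stored map. This is a minimal sketch under stated assumptions. `check_includes` is hypothetical, and the injected `hash_file(path)` stands in for pandoc.sha1() or vendor/sha2.lua: it returns a digest string, or nil when the file is missing.

```lua
-- Hypothetical sketch: recompute include hashes and compare them with the
-- previous build. Returns unchanged (boolean) and the current hash map
-- (nil when an include is missing, which forces a cache miss).
local function check_includes(known_includes, stored_hashes, hash_file)
  local current = {}
  for _, path in ipairs(known_includes) do
    local digest = hash_file(path)
    if digest == nil then
      return false, nil          -- missing include: cache miss, full rebuild
    end
    current[path] = digest
  end
  for path, digest in pairs(current) do
    if stored_hashes[path] ~= digest then
      return false, current      -- at least one include changed
    end
  end
  return true, current
end

-- Usage with an in-memory stand-in for the file system.
local files = { ["a.md"] = "h1", ["b.md"] = "h2" }
local function fake_hash(path) return files[path] end
local ok = check_includes({ "a.md", "b.md" }, { ["a.md"] = "h1", ["b.md"] = "h2" }, fake_hash)
local ok2 = check_includes({ "a.md", "c.md" }, { ["a.md"] = "h1" }, fake_hash)
-- ok is true; ok2 is false because c.md is missing
```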
Document Change Detection: Each document’s content is hashed and compared against the source_files table. Unchanged documents (same content hash and same include hashes) skip parsing and reuse the cached Intermediate Representation state, so large projects avoid re-parsing documents that have not changed.
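The skip decision combines the document hash with the include hashes. A minimal sketch follows, assuming a stored record shaped as `{ content_hash = ..., include_hashes = { [path] = digest } }`. That shape and the name `can_reuse` are hypothetical, since the actual source_files columns are not listed here. The sketch also assumes that an include removed since the last build invalidates the cache.

```lua
-- Hypothetical sketch: reuse cached IR only if the document hash and every
-- include hash match the stored record, and the include sets are equal.
local function can_reuse(doc_hash, include_hashes, stored)
  if stored == nil or stored.content_hash ~= doc_hash then
    return false
  end
  local seen = 0
  for path, digest in pairs(include_hashes) do
    if stored.include_hashes[path] ~= digest then
      return false
    end
    seen = seen + 1
  end
  -- Assumption: an include that disappeared also forces re-parsing.
  local stored_count = 0
  for _ in pairs(stored.include_hashes) do stored_count = stored_count + 1 end
  return seen == stored_count
end
```

For example, `can_reuse("d1", { ["inc.md"] = "i1" }, { content_hash = "d1", include_hashes = { ["inc.md"] = "i1" } })` is true, while changing either hash makes it false.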
Structured Logging: NDJSON (Newline-Delimited JSON) logging provides machine-parseable audit trails with structured data (level, message, timestamp, context). Log level is configurable via config.logging.level with environment override via SPECCOMPILER_LOG_LEVEL. Levels: DEBUG, INFO, WARN, ERROR.
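An NDJSON logger writes exactly one JSON object per line, which is what makes the trail machine-parseable. The following is a minimal sketch, assuming the level is read from `config.logging.level`, that SPECCOMPILER_LOG_LEVEL takes precedence, and that INFO is the default. The INFO default and the names `make_logger`/`json_string` are hypothetical. Context values are written as JSON strings for simplicity.

```lua
local LEVELS = { DEBUG = 1, INFO = 2, WARN = 3, ERROR = 4 }
local ESCAPES = { ['"'] = '\\"', ['\\'] = '\\\\', ['\n'] = '\\n', ['\r'] = '\\r', ['\t'] = '\\t' }

-- Encode any value as a JSON string literal, escaping quotes, backslashes
-- and control characters so each record stays on a single line.
local function json_string(value)
  local escaped = tostring(value):gsub('[%c"\\]', function(c)
    return ESCAPES[c] or string.format("\\u%04x", c:byte())
  end)
  return '"' .. escaped .. '"'
end

-- Environment variable wins over config; INFO is an assumed default.
local function resolve_level(config)
  local level = os.getenv("SPECCOMPILER_LOG_LEVEL")
    or (config.logging and config.logging.level)
    or "INFO"
  level = string.upper(level)
  if LEVELS[level] then return level end
  return "INFO"
end

-- Returns log(level, message, context): writes one JSON object per line via
-- `write` and returns that line, or nil when the level is filtered out.
local function make_logger(config, write)
  local threshold = LEVELS[resolve_level(config)]
  return function(level, message, context)
    assert(LEVELS[level], "unknown log level: " .. tostring(level))
    if LEVELS[level] < threshold then return nil end
    local keys, ctx = {}, {}
    for k in pairs(context or {}) do keys[#keys + 1] = k end
    table.sort(keys) -- sorted keys keep the output deterministic
    for _, k in ipairs(keys) do
      ctx[#ctx + 1] = json_string(k) .. ":" .. json_string(context[k])
    end
    local line = "{" .. table.concat({
      '"level":' .. json_string(level),
      '"message":' .. json_string(message),
      '"timestamp":' .. json_string(os.date("!%Y-%m-%dT%H:%M:%SZ")),
      '"context":{' .. table.concat(ctx, ",") .. "}",
    }, ",") .. "}"
    write(line .. "\n")
    return line
  end
end
```

Usage: `make_logger({ logging = { level = "WARN" } }, io.write)` returns a logger that drops DEBUG and INFO records and emits WARN and ERROR records to standard output.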
Build Reproducibility: Given identical source files (by content hash), project configuration, and tool versions, the system produces identical outputs. Content-addressed hashing of documents, includes, and the P-IR state ensures deterministic compilation.
Verification Execution: The Verify Handler (CSU Verify Handler) executes in batch mode during the VERIFY phase. It iterates over all registered proof views, querying each via the Data Manager (CSU Data Manager). For each violation row returned, the handler consults the Validation Policy (CSU Validation Policy) to determine the configured severity level. Error- and warn-level violations are emitted at their configured severity as structured diagnostics via the Diagnostics Collector (CSU Diagnostics Collector); violations at the ignore level are suppressed entirely. Each proof view enforces constraints declared by the type metamodel, ensuring that registered types satisfy their validation rules.
After verification completes, the handler stores the verification result (error and warning counts) in all pipeline contexts. The Pipeline Orchestrator (CSU Pipeline Orchestrator) checks for errors after VERIFY and aborts before EMIT if any exist.
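The verification loop and the abort decision can be sketched together. This is a minimal sketch, assuming proof views are registered as `{ policy_key, view }` pairs, `query(view)` returns a list of violation rows, and the policy is a plain `policy_key -> "error" | "warn" | "ignore"` table. The names `run_verify` and `should_abort` are hypothetical, as is the choice to treat an unconfigured policy_key as "error".

```lua
-- Hypothetical sketch of the VERIFY pass over registered proof views.
local function run_verify(proof_views, query, policy, emit)
  local result = { errors = 0, warnings = 0 }
  for _, proof in ipairs(proof_views) do
    local severity = policy[proof.policy_key] or "error" -- assumed default
    if severity ~= "ignore" then -- ignored proofs produce no diagnostics
      for _, row in ipairs(query(proof.view)) do
        emit(severity, proof.policy_key, row)
        if severity == "error" then
          result.errors = result.errors + 1
        else
          result.warnings = result.warnings + 1
        end
      end
    end
  end
  return result
end

-- The orchestrator aborts before EMIT when any error was recorded.
local function should_abort(result)
  return result.errors > 0
end
```

Returning counts rather than raising on the first violation lets a single VERIFY run report every violation before the build stops.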
Proof Views — Entity-Based Taxonomy
Proof views follow the SpecIR 5-tuple: S (Specification), O (Object), F (Float), R (Relation), V (View). Each proof is identified by its policy_key.
Specification Proofs (S)
| Policy Key | View Name | Validates |
|---|---|---|
| spec_missing_required | view_spec_missing_required | Required spec attributes present |
| spec_invalid_type | view_spec_invalid_type | Specification type is valid |
Spec Object Proofs (O)
| Policy Key | View Name | Validates |
|---|---|---|
| object_missing_required | view_object_missing_required | Required object attributes present |
| object_cardinality_over | view_object_cardinality_over | Attribute count <= max_occurs |
| object_cast_failures | view_object_cast_failures | Attribute value casts to declared type |
| object_invalid_enum | view_object_invalid_enum | Enum value exists in enum_values |
| object_invalid_date | view_object_invalid_date | Date format is YYYY-MM-DD |
| object_bounds_violation | view_object_bounds_violation | Numeric values within min/max bounds |
| object_duplicate_pid | view_object_duplicate_pid | PID is globally unique |
Spec Float Proofs (F)
| Policy Key | View Name | Validates |
|---|---|---|
| float_orphan | view_float_orphan | Float has a parent object |
| float_duplicate_label | view_float_duplicate_label | Float labels unique per specification |
| float_render_failure | view_float_render_failure | External render succeeded |
| float_invalid_type | view_float_invalid_type | Float type is registered |
Spec Relation Proofs (R)
| Policy Key | View Name | Validates |
|---|---|---|
| relation_unresolved | view_relation_unresolved | Link target resolves |
| relation_dangling | view_relation_dangling | Target ref points to existing object |
| relation_ambiguous | view_relation_ambiguous | Float reference is unambiguous |
Spec View Proofs (V)
| Policy Key | View Name | Validates |
|---|---|---|
| view_materialization_failure | view_view_materialization_failure | View materialization succeeded |
Component Interaction
The audit subsystem is realized through core runtime components and the default proof view package.
CSC Core Runtime (Core Runtime) provides the verification infrastructure. CSU Build Engine (Build Engine) drives the build lifecycle and content-addressed hash computation. CSU Proof Loader (Proof Loader) discovers and loads proof view modules from model directories, registering them with the data manager for VERIFY phase execution. CSU Validation Policy (Validation Policy) maps proof policy_key values to configured severity levels (error, warn, ignore) from project.yaml. CSU Verify Handler (Verify Handler) iterates over registered proof views during VERIFY, querying each via CSU Data Manager (Data Manager) and emitting violations through CSU Diagnostics Collector (Diagnostics Collector). CSU Pipeline Orchestrator (Pipeline Orchestrator) inspects diagnostics after VERIFY and aborts before EMIT if errors exist.
CSC Default Proof Views (Default Proof Views) provides the baseline verification rules organized by the SpecIR 5-tuple. Specification proofs: CSU Spec Missing Required (Spec Missing Required) validates that required specification attributes are present, and CSU Spec Invalid Type (Spec Invalid Type) validates that specification types are registered. Object proofs: CSU Object Missing Required (Object Missing Required) checks required object attributes, CSU Object Cardinality Over (Object Cardinality Over) enforces max_occurs limits, CSU Object Cast Failures (Object Cast Failures) validates attribute type casts, CSU Object Invalid Enum (Object Invalid Enum) checks enum values against allowed sets, CSU Object Invalid Date (Object Invalid Date) validates YYYY-MM-DD date format, and CSU Object Bounds Violation (Object Bounds Violation) checks numeric bounds. Float proofs: CSU Float Orphan (Float Orphan) detects floats without parent objects, CSU Float Duplicate Label (Float Duplicate Label) enforces label uniqueness per specification, CSU Float Render Failure (Float Render Failure) flags failed external renders, and CSU Float Invalid Type (Float Invalid Type) validates float type registration. Relation proofs: CSU Relation Unresolved (Relation Unresolved) detects links whose targets cannot be resolved, CSU Relation Dangling (Relation Dangling) detects resolved references pointing to nonexistent objects, and CSU Relation Ambiguous (Relation Ambiguous) flags ambiguous float references. View proofs: CSU View Materialization Failure (View Materialization Failure) detects failed view computations.
Allocation: Realized by CSC SW Docs Model (SW Docs Model), with domain proof views in CSC SW Docs Proof Views (SW Docs Proof Views), object types in CSC SW Docs Object Types (SW Docs Object Types), relation types in CSC SW Docs Relation Types (SW Docs Relation Types), specification types in CSC SW Docs Specification Types (SW Docs Specification Types), and view types in CSC SW Docs View Types (SW Docs View Types).
The software documentation domain model extends the default model with traceable object types, domain-specific relation semantics, specification document types, and verification views for MIL-STD-498 and DO-178C compliant software documentation workflows.
Object Type Taxonomy: CSC SW Docs Object Types defines the TRACEABLE abstract base type providing an inherited status enum (Draft, Review, Approved, Implemented) that all domain objects extend. The taxonomy includes requirements (HLR, LLR, NFR), design elements (FD, SF, DD), architectural decomposition (CSC, CSU), verification artifacts (VC, TR), and reference types (DIC, SYMBOL). Each type declares its own attributes, PID prefix, and inheritance chain through the extends field.
Relation Types: CSC SW Docs Relation Types defines domain-specific traceability relations: REALIZES (FD traces to SF via the traceability source attribute), BELONGS (HLR membership in SF via belongs_to), TRACES_TO (general-purpose @ link traceability), XREF_DIC (dictionary cross-references), and XREF_DECOMPOSITION (references targeting CSC/CSU elements). These relations enable automated traceability matrix generation and coverage analysis.
Specification Types: CSC SW Docs Specification Types defines document types for the software documentation lifecycle: SRS (Software Requirements Specification), SDD (Software Design Description), SVC (Software Verification Cases), SUM (Software User Manual), and TRR (Test Results Report). Each type declares version, status, and date attributes.
View Types: CSC SW Docs View Types defines domain-specific views for generating traceability matrices (TRACEABILITY_MATRIX, TEST_EXECUTION_MATRIX, TEST_RESULTS_MATRIX) and coverage summaries (COVERAGE_SUMMARY, REQUIREMENTS_SUMMARY).
Proof Views: CSC SW Docs Proof Views provides domain-specific proof view queries that enforce traceability constraints: VC must trace to HLR, TR must trace to VC, every HLR must be covered by at least one VC, every FD must trace to at least one CSC and CSU, and every CSC and CSU must have at least one FD allocated. CSC SW Docs Model provides the HTML5 postprocessor that generates a single-file documentation web app with embedded CSS, JS, and SQLite-WASM full-text search.
Component Interaction
The software documentation domain model is realized through six packages that extend the default model with traceable types, domain relations, document types, verification views, and proof constraints.
CSC SW Docs Object Types (SW Docs Object Types) defines the domain object taxonomy. CSU TRACEABLE Base Object Type (TRACEABLE Base Object Type) provides the abstract base with an inherited status enum (Draft, Review, Approved, Implemented) that all domain objects extend. Requirements: CSU HLR Object Type (HLR) for high-level requirements, CSU LLR Object Type (LLR) for low-level requirements, and CSU NFR Object Type (NFR) for non-functional requirements. Design elements: CSU FD Object Type (FD) for functional descriptions and CSU SF Object Type (SF) for software functions. Architecture: CSU CSC Object Type (CSC) for computer software components and CSU CSU Object Type (CSU) for computer software units. Verification: CSU VC Object Type (VC) for verification cases and CSU TR Object Type (TR) for test results. Design decisions: CSU DD Object Type (DD) for design decision records. Reference types: CSU DIC Object Type (DIC) for dictionary entries and CSU SYMBOL Object Type (SYMBOL) for symbol definitions.
CSC SW Docs Relation Types (SW Docs Relation Types) defines domain-specific traceability relations. CSU REALIZES Relation Type (REALIZES) traces FD to SF via the traceability source attribute. CSU BELONGS Relation Type (BELONGS) establishes HLR membership in SF via belongs_to. CSU TRACES_TO Relation Type (TRACES_TO) provides general-purpose @ link traceability between any traceable objects. CSU XREF_DIC Relation Type (XREF_DIC) resolves dictionary cross-references targeting DIC entries.
CSC SW Docs Specification Types (SW Docs Specification Types) defines document types for the software documentation lifecycle. CSU SRS Specification Type (SRS) for Software Requirements Specifications. CSU SDD Specification Type (SDD) for Software Design Descriptions. CSU SVC Specification Type (SVC) for Software Verification Cases. CSU SUM Specification Type (SUM) for Software User Manuals. CSU TRR Specification Type (TRR) for Test Results Reports. Each type declares version, status, and date attributes.
CSC SW Docs View Types (SW Docs View Types) defines domain-specific views for documentation output. CSU Traceability Matrix View (Traceability Matrix) generates requirement-to-design traceability matrices. CSU Test Execution Matrix View (Test Execution Matrix) generates verification case execution status tables. CSU Test Results Matrix View (Test Results Matrix) generates test result summary tables. CSU Coverage Summary View (Coverage Summary) computes requirement coverage statistics. CSU Requirements Summary View (Requirements Summary) generates requirement status dashboards.
CSC SW Docs Proof Views (SW Docs Proof Views) provides domain-specific traceability verification rules. CSU VC Missing HLR Traceability (VC Missing HLR Traceability) ensures every verification case traces to at least one HLR. CSU TR Missing VC Traceability (TR Missing VC Traceability) ensures every test result traces to a verification case. CSU HLR Missing VC Coverage (HLR Missing VC Coverage) ensures every HLR is covered by at least one VC. CSU FD Missing CSC Traceability (FD Missing CSC Traceability) ensures every FD references at least one CSC. CSU FD Missing CSU Traceability (FD Missing CSU Traceability) ensures every FD references at least one CSU. CSU CSC Missing FD Allocation (CSC Missing FD Allocation) ensures every CSC is allocated to at least one FD. CSU CSU Missing FD Allocation (CSU Missing FD Allocation) ensures every CSU is allocated to at least one FD.
CSC SW Docs Model (SW Docs Model) provides the domain postprocessor. CSU HTML5 Postprocessor (HTML5 Postprocessor) generates a single-file documentation web application with embedded CSS, JS, navigation, and SQLite-WASM full-text search from the SpecIR database.
The following LLRs define implementation-level contracts that are directly verified by executable tests.
name field.
prerequisites array.
validation, build_dir, log, output_format, template, reference_doc, docx, project_root, outputs, html5, bibliography, csl) to handlers.
doc and spec_id for each processed document context passed to handlers.
doc=nil and a derived spec_id.
objects, floats, views, relations, specifications) and register discovered modules.
handler, model loading shall call pipeline:register_handler(handler) and propagate registration errors.
attr_order array in its options, the handler shall render attributes in the specified sequence first; any remaining attributes not listed in attr_order shall be appended alphabetically. When attr_order is absent, all attributes shall render alphabetically.
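The attr_order contract above reduces to a two-stage sort. This is a minimal sketch, assuming attributes arrive as a `name -> value` table. The function `order_attributes` is hypothetical, and skipping names listed in attr_order but absent from the object is an assumption.

```lua
-- Hypothetical sketch of the attr_order rule: listed attributes first, in
-- the given order; all remaining attributes appended alphabetically.
local function order_attributes(attrs, attr_order)
  local result, placed = {}, {}
  for _, name in ipairs(attr_order or {}) do
    -- Assumption: listed names with no value on this object are skipped.
    if attrs[name] ~= nil and not placed[name] then
      result[#result + 1] = name
      placed[name] = true
    end
  end
  local rest = {}
  for name in pairs(attrs) do
    if not placed[name] then rest[#rest + 1] = name end
  end
  table.sort(rest) -- alphabetical tail (and the whole list when attr_order is absent)
  for _, name in ipairs(rest) do result[#result + 1] = name end
  return result
end

local attrs = { status = "Draft", rationale = "x", author = "a", version = "1" }
local ordered = order_attributes(attrs, { "version", "status" })
-- ordered: version, status, author, rationale
```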
id; valid schemas shall receive category defaults and attribute enum values shall be registered.
SPECCOMPILER_HOME/models/{model} before checking {cwd}/models/{model}.
SPECCOMPILER_HOME or project-root models/.
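The model search order in the requirements above can be sketched as an ordered candidate list. This is a minimal sketch that injects a `dir_exists` predicate so the lookup is testable without a file system. `resolve_model_dir`, the predicate, and the model name `sw_docs` in the usage are hypothetical.

```lua
-- Hypothetical sketch: search SPECCOMPILER_HOME/models/{model} first,
-- then {cwd}/models/{model}; return the first directory that exists.
local function resolve_model_dir(model, home, cwd, dir_exists)
  local candidates = {}
  if home and home ~= "" then
    candidates[#candidates + 1] = home .. "/models/" .. model
  end
  candidates[#candidates + 1] = cwd .. "/models/" .. model
  for _, dir in ipairs(candidates) do
    if dir_exists(dir) then return dir end
  end
  return nil, "model not found: " .. model
end

-- Usage with an in-memory directory set instead of real file checks.
local existing = {
  ["/opt/sc/models/default"] = true,
  ["/proj/models/default"] = true,
  ["/proj/models/sw_docs"] = true,
}
local function exists(dir) return existing[dir] == true end
local found = resolve_model_dir("default", "/opt/sc", "/proj", exists)
-- found == "/opt/sc/models/default" (SPECCOMPILER_HOME wins)
```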
models.{requested}.types.views.{view} first and fall back to models.default.types.views.{view} when the requested model module is missing.
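The view-module fallback maps naturally onto Lua's `pcall(require, ...)` idiom. This is a minimal sketch using the module paths named in the requirement; the function name `load_view_module` is hypothetical. A plain `pcall` also swallows errors raised while loading an existing but broken module, so a stricter version might inspect the error message before falling back.

```lua
-- Hypothetical sketch: try the requested model's view module, then fall
-- back to the default model's module of the same name.
local function load_view_module(requested, view)
  local ok, mod = pcall(require, "models." .. requested .. ".types.views." .. view)
  if ok then
    return mod
  end
  -- Requested module missing (or failed to load): use the default model.
  return require("models.default.types.views." .. view)
end

-- Usage: package.preload stands in for module files on package.path.
package.preload["models.default.types.views.toc"] = function() return { name = "default-toc" } end
package.preload["models.sw_docs.types.views.toc"] = function() return { name = "sw-toc" } end
local toc = load_view_module("sw_docs", "toc") -- resolves the domain module
```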
sankey series and view output returns data/links, injection shall write to series[1].data and series[1].links, and clear dataset to prevent conflicts.
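The sankey injection rule can be sketched against a plain ECharts option table. This is a minimal sketch, and `inject_view_output` is hypothetical. The `source` branch assumes the other supported view shape named in these LLRs is written to ECharts' `dataset.source`. Any other shape leaves the option untouched and reports failure.

```lua
-- Hypothetical sketch of chart data injection into an ECharts option table.
-- Returns true when the view output was injected, false otherwise.
local function inject_view_output(option, output)
  local series = option.series and option.series[1]
  if series and series.type == "sankey" and output.data and output.links then
    series.data = output.data
    series.links = output.links
    option.dataset = nil -- clear dataset so it cannot conflict with series data
    return true
  elseif output.source then
    option.dataset = { source = output.source } -- assumed dataset-shaped output
    return true
  end
  return false -- unsupported output shape: option left unchanged
end

local option = { series = { { type = "sankey" } }, dataset = { source = {} } }
inject_view_output(option, {
  data = { { name = "HLR" }, { name = "VC" } },
  links = { { source = "HLR", target = "VC", value = 3 } },
})
-- option.dataset is now nil; option.series[1].links holds one link
```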
DataManager.begin_transaction() followed by facade insert calls and DataManager.rollback() shall leave no persisted staged rows.
insert_specification, insert_object, insert_float, insert_relation, insert_view, insert_attribute_value, query_all, query_one, execute) shall persist and retrieve canonical IR rows in content tables.
string_value, int_value, real_value, bool_value, date_value, enum_ref) and skip updates for invalid casts.
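Casting into the typed value columns listed above can be sketched as a function that returns a column name and value, or nil for an invalid cast so the caller skips the update. This is a minimal sketch: `cast_attribute` and the datatype names INTEGER, REAL, BOOLEAN, and DATE are hypothetical (only STRING and STATUS_ENUM appear in this document). The DATE branch checks only the YYYY-MM-DD shape, not calendar validity.

```lua
-- Hypothetical sketch: pick the typed column for a raw attribute string.
-- `enum_values` (value -> enum id) is consulted for enum datatypes.
local function cast_attribute(datatype, raw, enum_values)
  if datatype == "STRING" then
    return "string_value", raw
  elseif datatype == "INTEGER" then
    if raw:match("^%-?%d+$") then return "int_value", tonumber(raw) end
  elseif datatype == "REAL" then
    local n = tonumber(raw)
    if n then return "real_value", n end
  elseif datatype == "BOOLEAN" then
    if raw == "true" then return "bool_value", 1 end
    if raw == "false" then return "bool_value", 0 end
  elseif datatype == "DATE" then
    if raw:match("^%d%d%d%d%-%d%d%-%d%d$") then return "date_value", raw end
  elseif enum_values then
    if enum_values[raw] then return "enum_ref", enum_values[raw] end
  end
  return nil -- invalid cast: caller skips the update
end

local column, value = cast_attribute("INTEGER", "42")
-- column == "int_value", value == 42
```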
view is provided or when view output does not match supported shapes (source or data+links for sankey).
This chapter defines decomposition and design allocation using MIL-STD-498 nomenclature:
CSC (Computer Software Component) identifies structural subsystems/layers.
CSU (Computer Software Unit) identifies concrete implementation units.
Note: Software Functions (SF) are defined in the SRS alongside their constituent HLRs. Functional Descriptions (FD) trace to SFs via the REALIZES relation and are documented in the SDD design files.
[PID](@) and [PID](#)) from object ASTs and stores them in spec_relations. Type inference and link rewriting are delegated to relation_type_inferrer (ANALYZE) and relation_link_rewriter (TRANSFORM).
@ and # links in stored spec_object AST JSON, replacing them with resolved anchor targets using the relation lookup built from spec_relations.
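Link rewriting can be sketched as a recursive walk over a decoded Pandoc JSON AST. In that AST, a Link node has the shape `{ t = "Link", c = { attr, inlines, { url, title } } }`. This is a minimal sketch, not relation_link_rewriter's actual code. The names `rewrite_links` and `link_text` are hypothetical, and the sketch assumes `lookup` maps a PID (the link text) to an anchor id without a leading `#`. Unresolved links are left unchanged here.

```lua
-- Concatenate the Str inlines of a link's text (e.g. the PID "HLR-001").
local function link_text(inlines)
  local parts = {}
  for _, il in ipairs(inlines) do
    if il.t == "Str" then parts[#parts + 1] = il.c end
  end
  return table.concat(parts)
end

-- Hypothetical sketch: rewrite [PID](@) and [PID](#) targets in place.
local function rewrite_links(node, lookup)
  if type(node) ~= "table" then return end
  if node.t == "Link" then
    local target = node.c[3] -- { url, title }
    if target[1] == "@" or target[1] == "#" then
      local anchor = lookup[link_text(node.c[2])]
      if anchor then target[1] = "#" .. anchor end -- unresolved: left as-is
    end
  end
  for _, child in pairs(node) do rewrite_links(child, lookup) end
end

local function link(text, url)
  return { t = "Link", c = { { "", {}, {} }, { { t = "Str", c = text } }, { url, "" } } }
end
local ast = { blocks = { { t = "Para", c = {
  link("HLR-001", "@"), link("fig:x", "#"), link("site", "https://example.com"),
} } } }
rewrite_links(ast, { ["HLR-001"] = "hlr-001", ["fig:x"] = "fig-x" })
-- first link now targets "#hlr-001"; the external URL is untouched
```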