SpecCompiler Requirements
This document defines the high-level requirements for SpecCompiler, a document processing pipeline for structured specifications.
The document is organized in two parts. The first part defines the SpecIR data model, the six core types that the Pipeline operates on. The second part specifies the functional requirements, grouped into System Features (SF-001 through SF-006), each decomposed into High-Level Requirements.
SpecIR (see sdd: Type + Content (SpecIR)) is the data model that SpecCompiler builds from source Markdown during the INITIALIZE phase. The core task of parsing is to lower Markdown annotations into a set of typed content tables that the Pipeline can analyze, transform, verify, and emit. The entries below define each of these six content tables as a formal tuple and specify the Markdown syntax that produces it.
A Specification is the root document container created from an H1 header. It represents a complete document such as an SRS, SDD, or SVC. Each specification has a type and an optional PID, and contains the attributes and all spec objects within that document.
A Specification is formally defined as a tuple where:
A Spec Object represents a traceable element in a specification document, created from H2-H6 headers. Objects can be requirements (HLR, LLR), non-functional requirements (NFR), or structural sections (SECTION). Each object has a type and a PID, and can contain attributes and body content.
A Spec Object is formally defined as a tuple where:
A Spec Float represents a floating element like a
figure, table, or diagram. Floats are created from fenced code blocks
with a syntax:label pattern. They
are automatically numbered within their counter group and can be
cross-referenced by their label. Some floats require external rendering
(e.g., PlantUML diagrams).
A Spec Float is formally defined as a tuple where:
{type_prefix}:{user_label}).
An Attribute stores metadata for specifications and spec objects using an Entity-Attribute-Value (EAV) pattern. Attributes are defined in blockquotes following headers and support multiple datatypes including strings, integers, dates, enums, and rich XHTML content (Pandoc AST). Relations are extracted from the AST. Attribute definitions constrain which attributes each object type can have.
A Spec Attribute is a triple where:
A Spec Relation represents a traceability link
between specification elements. Relations are created from Markdown
links where the link target (URL) acts as a selector
that drives type inference. The relation type is not authored
explicitly; it is inferred from the selector, the source context, and
the resolved target. Resolution is type-driven: the extends chain of the inferred type
determines how targets are resolved.
A Spec Relation is a 4-tuple where:
The relation type is inferred
A Spec View represents a dynamic query or generated content block. Views are materialized during the TRANSFORM phase and can generate tables of contents (TOC), lists of figures (LOF), custom queries, abbreviations, and inline math. Views enable dynamic document assembly based on specification data.
A Spec View is formally defined as a pair where:
With the data model established, the following sections define the functional requirements for SpecCompiler Core. Requirements are organized into System Features (SF), each covering a distinct functional domain. Every SF is decomposed into High-Level Requirements that state what the system shall do.
Five-phase document processing lifecycle with Handler orchestration and Topological Sort ordering.
The pipeline shall execute handlers in a five-phase lifecycle: INITIALIZE, ANALYZE, TRANSFORM, VERIFY, EMIT.
Each phase serves a distinct purpose in document processing:
The pipeline shall support handler registration with declarative Prerequisites for dependency ordering.
Handlers register via register_handler(handler) with required
fields:
name: Unique string identifier for
the handler
prerequisites: Array of handler names
that must execute before this handler
Handlers declare participation in phases via hook methods (on_initialize, on_analyze, on_transform, on_verify, on_emit). Duplicate handler names cause
a registration error.
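The registration contract above can be sketched as follows. This is a minimal illustration, not the actual implementation: the `Pipeline` table, its `handlers` and `order` fields, and the error messages are assumptions; only `register_handler`, `name`, `prerequisites`, and the `on_{phase}` hooks come from the requirement text.

```lua
-- Hypothetical registry object; field names are illustrative only.
local Pipeline = {}
Pipeline.__index = Pipeline

function Pipeline.new()
  return setmetatable({ handlers = {}, order = {} }, Pipeline)
end

-- Validate the required fields and reject duplicate names.
function Pipeline:register_handler(handler)
  if type(handler) ~= "table" then
    error("handler must be a table")
  end
  if type(handler.name) ~= "string" or handler.name == "" then
    error("handler.name must be a non-empty string")
  end
  if type(handler.prerequisites) ~= "table" then
    error("handler '" .. handler.name .. "' must declare a prerequisites array")
  end
  if self.handlers[handler.name] then
    error("duplicate handler name: " .. handler.name)
  end
  self.handlers[handler.name] = handler
  self.order[#self.order + 1] = handler.name
end

local p = Pipeline.new()
p:register_handler({ name = "spec_objects", prerequisites = {},
                     on_initialize = function() end })
p:register_handler({ name = "spec_relations", prerequisites = { "spec_objects" },
                     on_analyze = function() end })

-- Registering the same name twice fails.
local ok, err = pcall(p.register_handler, p,
                      { name = "spec_objects", prerequisites = {} })
print(ok, err)
```

Raising an error at registration time, rather than at phase execution, keeps a misconfigured handler set from running at all.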
The pipeline shall order handlers within each phase using topological sort with Kahn’s algorithm.
For each phase, the pipeline:
on_{phase} hooks)
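A minimal sketch of per-phase ordering with Kahn's algorithm follows. The function name `sort_handlers`, the alphabetical tie-breaking, and the choice to ignore prerequisites that do not participate in the phase are assumptions made for illustration; the source only states that Kahn's algorithm is used.

```lua
-- `handlers` maps name -> { name = ..., prerequisites = {...}, on_<phase> = fn }.
local function sort_handlers(handlers, phase)
  local hook = "on_" .. phase
  local names, indegree, dependents = {}, {}, {}

  -- Keep only handlers that define the hook for this phase.
  for name, h in pairs(handlers) do
    if type(h[hook]) == "function" then
      names[#names + 1] = name
      indegree[name] = 0
      dependents[name] = {}
    end
  end
  table.sort(names) -- deterministic tie-breaking (an assumption)

  -- Count edges prerequisite -> handler, ignoring non-participants.
  for _, name in ipairs(names) do
    for _, pre in ipairs(handlers[name].prerequisites or {}) do
      if indegree[pre] ~= nil then
        indegree[name] = indegree[name] + 1
        table.insert(dependents[pre], name)
      end
    end
  end

  -- Kahn's algorithm: repeatedly emit handlers with no unmet prerequisites.
  local queue, sorted = {}, {}
  for _, name in ipairs(names) do
    if indegree[name] == 0 then queue[#queue + 1] = name end
  end
  local head = 1
  while head <= #queue do
    local name = queue[head]
    head = head + 1
    sorted[#sorted + 1] = name
    for _, dep in ipairs(dependents[name]) do
      indegree[dep] = indegree[dep] - 1
      if indegree[dep] == 0 then queue[#queue + 1] = dep end
    end
  end

  -- Any handler left unsorted is part of a cycle.
  if #sorted ~= #names then
    error("cyclic prerequisites in phase " .. phase)
  end
  return sorted
end

local noop = function() end
local handlers = {
  objects   = { name = "objects",   prerequisites = {},              on_analyze = noop },
  relations = { name = "relations", prerequisites = { "objects" },   on_analyze = noop },
  floats    = { name = "floats",    prerequisites = { "objects" },   on_analyze = noop },
  emitter   = { name = "emitter",   prerequisites = { "relations" }, on_emit = noop },
}
local order = sort_handlers(handlers, "analyze")
print(table.concat(order, " -> ")) -- objects -> floats -> relations
```

The `emitter` handler is skipped because it defines no `on_analyze` hook, and a cycle is reported when fewer handlers are emitted than participate.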
The pipeline shall abort execution after the VERIFY phase if any errors are recorded.
After the VERIFY phase, the pipeline checks diagnostics:has_errors(). If true,
execution halts before the EMIT phase, with TRANSFORM already completed.
An error message is logged with the error count. This prevents generating
invalid output from documents with specification violations.
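The abort rule can be illustrated with a small driver loop. This is a sketch under stated assumptions: the `Diagnostics` table, the `execute` signature, the injected `run_phase` and `log` callbacks, and the log wording are all hypothetical; only `diagnostics:has_errors()` and the phase order are taken from the text.

```lua
local PHASES = { "initialize", "analyze", "transform", "verify", "emit" }

-- Hypothetical diagnostics collector; only has_errors() is named in the text.
local Diagnostics = {}
Diagnostics.__index = Diagnostics
function Diagnostics.new() return setmetatable({ errors = {} }, Diagnostics) end
function Diagnostics:error(msg) self.errors[#self.errors + 1] = msg end
function Diagnostics:has_errors() return #self.errors > 0 end

-- run_phase and log are injected so the sketch stays independent of
-- how handlers are dispatched and where messages are written.
local function execute(run_phase, diagnostics, log)
  local completed = {}
  for _, phase in ipairs(PHASES) do
    run_phase(phase, diagnostics)
    completed[#completed + 1] = phase
    -- Errors are only checked once VERIFY has finished.
    if phase == "verify" and diagnostics:has_errors() then
      log(string.format("aborting before EMIT: %d error(s)", #diagnostics.errors))
      return false, completed
    end
  end
  return true, completed
end

local diagnostics = Diagnostics.new()
local messages = {}
local ok, completed = execute(
  function(phase, diag)
    if phase == "verify" then diag:error("missing required attribute") end
  end,
  diagnostics,
  function(msg) messages[#messages + 1] = msg end
)
print(ok, table.concat(completed, ","))
```

In this run the first four phases complete and EMIT never starts, which matches the statement that TRANSFORM has already finished when the abort happens.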
The pipeline shall use a single batch dispatch model for all phases where handlers receive all contexts at once.
All handlers implement on_{phase}(data, contexts, diagnostics)
hooks that receive the full contexts array. The pipeline orchestrator
calls each handler’s hook once per phase via run_phase(), passing all document
contexts. Handlers are responsible for iterating over contexts
internally.
This enables cross-document optimizations, transaction batching, and parallel processing within any phase.
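A sketch of the batch dispatch model is shown below. The `run_phase` parameter list and the `titles` handler are assumptions for illustration; the hook signature `on_{phase}(data, contexts, diagnostics)` follows the text.

```lua
-- `ordered` is the handler list for this phase, already sorted.
local function run_phase(phase, ordered, data, contexts, diagnostics)
  local hook = "on_" .. phase
  for _, handler in ipairs(ordered) do
    local fn = handler[hook]
    if fn then
      -- One call per handler per phase; the handler receives every context.
      fn(data, contexts, diagnostics)
    end
  end
end

-- A hypothetical handler that iterates over the contexts itself.
local calls, seen = 0, {}
local title_handler = {
  name = "titles",
  prerequisites = {},
  on_initialize = function(data, contexts, diagnostics)
    calls = calls + 1
    for _, ctx in ipairs(contexts) do
      seen[#seen + 1] = ctx.spec_id
    end
  end,
}

run_phase("initialize", { title_handler }, {},
          { { spec_id = "srs" }, { spec_id = "sdd" } }, {})
print(calls, table.concat(seen, ","))
```

Because the hook runs once for the whole batch, a handler is free to wrap its loop in a single database transaction.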
The pipeline shall create and propagate context objects containing document metadata and configuration through all phases.
The execute(docs) method creates
context objects for each input document with:
doc: Pandoc document AST (via
DocumentWalker)
spec_id: Specification identifier
derived from filename
config: Preset configuration
(styles, captions, validation)
build_dir: Output directory path
output_format: Target format (docx,
html5, etc.)
template: Template name for model
loading
reference_doc: Path to
reference.docx for styling
docx, html5: Format-specific configuration
outputs: Array of {format, path}
for multi-format output
bibliography, csl: Citation configuration
project_root: Root directory for
resolving relative paths
Context flows through all phases, enriched by handlers (e.g., verification results in VERIFY phase).
SQLite database storage with incremental build support and output caching.
The system shall persist all specification data to a SQLite database with ACID guarantees.
The system shall store spec object attributes using an Entity-Attribute-Value pattern.
The system shall maintain a build cache for document hash tracking.
The system shall cache output generation state with P-IR hash and timestamps.
The system shall generate per-object-type SQL views that pivot the EAV attribute model into typed columns for external BI queries.
For each object type, a view named view_{type_lower}_objects is dynamically
generated (e.g., view_hlr_objects,
view_vc_objects). Each view flattens
the EAV join into one row per object with typed attribute columns,
enabling queries like SELECT * FROM view_hlr_objects WHERE status = 'approved'.
These views are NOT used by internal pipeline queries. All internal
code queries the raw EAV tables directly because it needs access to
raw_value, datatype, ast, enum_ref, and other columns that the pivot
views abstract away. Internal queries also frequently operate across types
or need COUNT/EXISTS checks that the MAX()-based pivot cannot provide.
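The MAX()-based pivot can be sketched as a function that builds the view definition for one object type. The function name `pivot_view_sql`, the selected base columns, and the mapping from attribute name to typed column are assumptions; the table and column names follow the container descriptions in this document. Attribute names are assumed to be trusted identifiers from the type registry, so the sketch does no SQL escaping.

```lua
-- `attrs` lists { name = ..., column = ... } pairs, where `column` is the
-- typed spec_attributes column that holds the value (an assumption).
local function pivot_view_sql(type_id, attrs)
  local cols = { "  o.identifier", "  o.pid", "  o.title_text" }
  for _, a in ipairs(attrs) do
    -- One output column per attribute; MAX() collapses the EAV rows.
    cols[#cols + 1] = string.format(
      "  MAX(CASE WHEN a.name = '%s' THEN a.%s END) AS \"%s\"",
      a.name, a.column, a.name)
  end
  return string.format(
    "CREATE VIEW IF NOT EXISTS view_%s_objects AS\n" ..
    "SELECT\n%s\n" ..
    "FROM spec_objects o\n" ..
    "LEFT JOIN spec_attributes a ON a.owner_ref = o.identifier\n" ..
    "WHERE o.type_ref = '%s'\n" ..
    "GROUP BY o.identifier, o.pid, o.title_text;",
    type_id:lower(), table.concat(cols, ",\n"), type_id)
end

local sql = pivot_view_sql("HLR", {
  { name = "status",  column = "string_value" },
  { name = "created", column = "date_value" },
})
print(sql)
```

Grouping by object yields one row per object, which is also why such a view cannot answer COUNT or EXISTS questions about individual attribute rows.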
The system shall support incremental rebuilds via build graph tracking.
Dynamic type system providing typed containers for Specification, Spec Object, Spec Float, Spec View, Spec Relation, and Attribute.
The type system shall provide a specifications container for registering document-level specification records.
The specifications table stores
metadata for each specification document parsed during the INITIALIZE phase:
identifier: Unique specification ID
derived from filename (e.g., “srs-main”)
root_path: Source file path for the
specification
long_name: Human-readable title
extracted from L1 header
type_ref: Specification type
(validated against spec_specification_types)
pid: Optional PID from @PID syntax in
L1 header
L1 headers register as specifications. Type validation checks the spec_specification_types table. Invalid
types fall back to the default type or emit a warning.
The type system shall provide a spec_objects container for hierarchical specification objects.
The spec_objects table stores
structured specification items extracted from L2+ headers:
identifier: SHA1 hash of source path
+ line + title (content-addressable)
specification_ref: Foreign key to
parent specification
type_ref: Object type (validated
against spec_object_types)
from_file: Source file path
file_seq: Document order sequence
number
pid: Project ID from @PID syntax
(e.g., “REQ-001”)
title_text: Header text without type
prefix or PID
label: Unified label for
cross-referencing (format: {type_lower}:{title_slug})
level: Header level (2-6)
start_line, end_line: Source line range
ast: Serialized Pandoc AST (JSON) for
section content
Type resolution order: explicit TYPE: prefix, implicit alias lookup, default type fallback.
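The resolution order can be sketched as a header parser. This is illustrative only: `parse_header`, its lookup tables, and the PID character set are assumptions, and the document does not specify how implicit aliases are matched, so the sketch assumes a lookup on the lowercased title.

```lua
-- `types` is a set of known type ids, `aliases` maps alias -> type id, and
-- `default_type` is the fallback; all stand in for spec_object_types lookups.
local function parse_header(text, types, aliases, default_type)
  local pid
  -- Trailing "@PID" token, e.g. "@REQ-001".
  local rest, found = text:match("^(.-)%s+@([%w%-_%.]+)%s*$")
  if found then text, pid = rest, found end

  -- 1. Explicit "TYPE: title" prefix.
  local prefix, title = text:match("^([%w_]+):%s*(.+)$")
  if prefix and types[prefix:upper()] then
    return { type_ref = prefix:upper(), title_text = title, pid = pid }
  end
  -- 2. Implicit alias lookup (assumed to match the whole title).
  local alias = aliases[text:lower()]
  if alias then
    return { type_ref = alias, title_text = text, pid = pid }
  end
  -- 3. Default type fallback.
  return { type_ref = default_type, title_text = text, pid = pid }
end

local types = { HLR = true, LLR = true }
local aliases = { performance = "NFR" } -- hypothetical alias
local a = parse_header("HLR: Requirement Title @REQ-001", types, aliases, "SECTION")
local b = parse_header("Performance", types, aliases, "SECTION")
local c = parse_header("Introduction", types, aliases, "SECTION")
print(a.type_ref, a.title_text, a.pid)
print(b.type_ref, c.type_ref)
```

A prefix that is not a registered type, such as "Note: details", is treated as part of the title and falls through to the later steps.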
The type system shall provide a spec_floats container for numbered floating content (figures, tables, listings).
The spec_floats table stores
content blocks that receive sequential numbering:
identifier: Short format
“float-{8-char-sha1}” for DOCX compatibility
specification_ref: Foreign key to
parent specification
type_ref: Float type resolved from
aliases (e.g., “csv” -> “TABLE”, “puml” -> “FIGURE”)
from_file: Source file path
file_seq: Document order for
numbering
label: User-provided label for
cross-referencing
number: Sequential number within
counter_group (assigned in the TRANSFORM phase)
caption: Caption text from attributes
raw_content: Original code block text
raw_ast: Serialized Pandoc CodeBlock
(JSON)
parent_object_ref: Foreign key to
containing spec_object
attributes: JSON-serialized
attributes (caption, source, language)
syntax_key: Original class syntax for
backend matching
Counter groups share numbering (e.g., FIGURE, CHART, PLANTUML all increment the “FIGURE” counter).
csv:data instead of TABLE:data). Counter groups enable
semantic grouping of related float types under a single numbering
sequence.
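Counter-group numbering can be sketched as follows for a single specification. The function `assign_numbers` and the fallback to the type id when no counter group is declared are assumptions; `type_ref`, `file_seq`, `number`, and `counter_group` are the fields named above.

```lua
-- `floats` entries carry type_ref and file_seq; `float_types` maps
-- type id -> { counter_group = ... } as registered in spec_float_types.
local function assign_numbers(floats, float_types)
  -- Number in document order (sorts the input array in place).
  table.sort(floats, function(x, y) return x.file_seq < y.file_seq end)
  local counters = {}
  for _, f in ipairs(floats) do
    local ft = float_types[f.type_ref]
    -- Fall back to the type id when no group is declared (an assumption).
    local group = (ft and ft.counter_group) or f.type_ref
    counters[group] = (counters[group] or 0) + 1
    f.number = counters[group]
  end
  return floats
end

local float_types = {
  FIGURE   = { counter_group = "FIGURE" },
  CHART    = { counter_group = "FIGURE" },
  PLANTUML = { counter_group = "FIGURE" },
  TABLE    = { counter_group = "TABLE" },
}
local floats = assign_numbers({
  { label = "arch",   type_ref = "PLANTUML", file_seq = 3 },
  { label = "intro",  type_ref = "FIGURE",   file_seq = 1 },
  { label = "data",   type_ref = "TABLE",    file_seq = 2 },
  { label = "trend",  type_ref = "CHART",    file_seq = 4 },
  { label = "totals", type_ref = "TABLE",    file_seq = 5 },
}, float_types)

for _, f in ipairs(floats) do print(f.label, f.type_ref, f.number) end
```

Here the FIGURE, PLANTUML, and CHART floats receive 1, 2, and 3 from the shared counter, while the two tables are numbered 1 and 2 independently.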
The type system shall provide a spec_views container for data-driven view definitions.
The spec_views table stores view
definitions from code blocks and inline syntax:
identifier: SHA1 hash of
specification + sequence + content
specification_ref: Foreign key to
parent specification
view_type_ref: Uppercase view type
(e.g., “SELECT”, “SYMBOL”, “MATH”, “ABBREV”)
from_file: Source file path
file_seq: Document order sequence
number
raw_ast: View definition content (SQL
query, symbol path, expression)
View types with needs_external_render = 1 in spec_view_types are delegated to
specialized renderers. Inline views use type: content syntax (e.g., symbol: Class.method).
The type system shall provide a spec_relations container for tracking links between specification elements.
The spec_relations table stores
inter-element references:
identifier: SHA1 hash of
specification + target + type + parent
specification_ref: Foreign key to
parent specification
source_ref: Foreign key to source
spec_object
target_text: Raw link target from
syntax (e.g., “REQ-001”, “fig:diagram”)
target_ref: Resolved target
identifier (populated in the ANALYZE phase)
type_ref: Relation type from spec_relation_types (e.g., “TRACES”,
“XREF_FIGURE”)
from_file: Source file path
Link syntax: [PID](@) for PID
references, [type:label](#) for
float references, [@citation] for
bibliographic citations. Default relation types are determined by is_default and link_selector columns in spec_relation_types.
The type system shall provide a spec_attributes container for structured metadata on specification objects.
The spec_attributes table
stores typed attribute values extracted from blockquote syntax:
identifier: SHA1 hash of
specification + owner + name + value
specification_ref: Foreign key to
parent specification
owner_ref: Foreign key to owning
spec_object
name: Attribute name (field name
without colon)
raw_value: Original string value
string_value, int_value, real_value, bool_value, date_value: Type-specific columns
enum_ref: Foreign key to enum_values for ENUM types
ast: JSON-serialized Pandoc AST for
rich content (XHTML type)
datatype: Resolved datatype from
spec_attribute_types
Attribute syntax: > name: value in blockquotes
following headers. Datatypes include STRING, INTEGER, REAL, BOOLEAN,
DATE, ENUM, XHTML. Multi-line attributes use continuation blocks.
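Single-line attribute parsing with datatype coercion can be sketched as below. The function `parse_attribute`, the `datatypes` lookup table, the ISO date format, and the "true"/"false" boolean spelling are assumptions; ENUM and XHTML values are left as raw strings because resolving `enum_ref` and `ast` needs the database and Pandoc.

```lua
-- `datatypes` maps attribute name -> declared datatype and stands in for
-- the lookup against spec_attribute_types.
local function parse_attribute(line, datatypes)
  local name, raw = line:match("^%s*>?%s*([%w_]+):%s*(.-)%s*$")
  if not name then return nil end
  local datatype = datatypes[name] or "STRING"
  local attr = { name = name, raw_value = raw, datatype = datatype }

  if datatype == "STRING" then
    attr.string_value = raw
  elseif datatype == "INTEGER" then
    if raw:match("^[+%-]?%d+$") then attr.int_value = tonumber(raw) end
  elseif datatype == "REAL" then
    attr.real_value = tonumber(raw)
  elseif datatype == "BOOLEAN" then
    local v = raw:lower()
    if v == "true" then attr.bool_value = true
    elseif v == "false" then attr.bool_value = false end
  elseif datatype == "DATE" then
    -- Assumes ISO 8601 calendar dates (YYYY-MM-DD).
    if raw:match("^%d%d%d%d%-%d%d%-%d%d$") then attr.date_value = raw end
  end
  return attr
end

local datatypes = { status = "ENUM", priority = "INTEGER", created = "DATE" }
local s = parse_attribute("> status: approved", datatypes)
local p = parse_attribute("> priority: 3", datatypes)
local d = parse_attribute("> created: 2024-01-15", datatypes)
print(s.datatype, s.raw_value, p.int_value, d.date_value)
```

Keeping `raw_value` next to the typed column mirrors the table layout above and lets a later check report values that failed to coerce.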
The type system shall provide proof views that detect data integrity violations across all specification containers.
Proof views are SQL queries registered in the VERIFY phase that check for constraint violations:
The validation policy (configurable in project.yaml) determines severity: error, warn, or ignore.
The type system described above is not fixed at compile time. The following section defines how model directories extend it with custom object types, float renderers, data view generators, and style presets.
The system shall load type-specific handlers from model directories.
Type handlers control how specification content is rendered during the TRANSFORM phase. The loading mechanism supports:
models/{model}/types/objects/{type}.lua
models/{model}/types/specifications/{type}.lua
models/{model}/types/floats/{type}.lua
models/{model}/types/views/{type}.lua
Module loading uses require()
with path models.{model}.types.{category}.{type}.
Type names are converted to lowercase for file lookup (e.g., “HLR” ->
“hlr.lua”).
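The path convention can be sketched as two small helpers. The names `module_path` and `file_path` and the category whitelist are assumptions for illustration; the path layout and lowercase conversion follow the text.

```lua
-- Categories listed in the loading mechanism above.
local CATEGORIES = { objects = true, specifications = true, floats = true, views = true }

-- Build the dotted module name passed to require().
local function module_path(model, category, type_ref)
  assert(CATEGORIES[category], "unknown category: " .. tostring(category))
  return string.format("models.%s.types.%s.%s", model, category, type_ref:lower())
end

-- Build the corresponding relative file path.
local function file_path(model, category, type_ref)
  return (module_path(model, category, type_ref):gsub("%.", "/")) .. ".lua"
end

print(module_path("default", "objects", "HLR")) -- models.default.types.objects.hlr
print(file_path("default", "floats", "PLANTUML")) -- models/default/types/floats/plantuml.lua
```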
The system shall organize model content in a standardized directory hierarchy.
Each model follows this structure:
    models/{model_name}/
      types/
        objects/         -- Spec object type handlers (HLR, LLR, VC, etc.)
        specifications/  -- Specification type handlers (SRS, SDD, SVC)
        floats/          -- Float type handlers (TABLE, PLANTUML, CHART)
        views/           -- View type handlers (ABBREV, SYMBOL, MATH)
        relations/       -- Relation type definitions (TRACES_TO, etc.)
      data_views/        -- Data view generators for chart data injection
      filters/           -- Pandoc filters (docx.lua, html.lua, markdown.lua)
      postprocessors/    -- Format-specific postprocessors
      styles/            -- Style presets and templates
Model names are referenced via the project configuration's template field or the context's model_name. The “default” model provides
base implementations with fallback behavior.
Type handlers shall provide standardized registration interfaces for pipeline integration.
Each handler category defines specific interfaces:
Object Type Handlers export:
M.object: Type schema with id,
long_name, description, attributes
M.handler.on_render_SpecObject(obj, ctx):
Render function returning Pandoc blocks
Specification Type Handlers export:
M.specification: Type schema with id,
long_name, attributes
M.handler.on_render_Specification(ctx, pandoc, data):
Render document title
Float Type Handlers export:
M.float: Type schema with id,
caption_format, counter_group, aliases, needs_external_render
M.transform(raw_content, type_ref, log):
For internal transforms (TABLE, CSV)
external_render.register_renderer(type_ref, callbacks):
For external renders (PLANTUML, CHART)
View Type Handlers export:
M.view: Type schema with id,
inline_prefix, aliases
M.handler.on_render_Code(code, ctx):
Inline code rendering
Type definitions shall declare metadata schema that controls registration and behavior.
Type schemas provide metadata stored in registry tables:
Object Types (spec_object_types):
    M.object = {
      id = "HLR",                        -- Unique identifier (uppercase)
      long_name = "High-Level Requirement",
      description = "A top-level system requirement",
      extends = "TRACEABLE",             -- Base type for inheritance
      header_unnumbered = true,          -- Exclude from section numbering
      header_style_id = "Heading2",      -- Custom-style for headers
      body_style_id = "Normal",          -- Custom-style for body
      attributes = {                     -- Attribute definitions
        { name = "status", type = "ENUM", values = {...}, min_occurs = 1 },
        { name = "rationale", type = "XHTML" },
        { name = "created", type = "DATE" },
      }
    }
Float Types (spec_float_types):
    M.float = {
      id = "CHART",
      caption_format = "Figure",         -- Caption prefix
      counter_group = "FIGURE",          -- Counter sharing (FIGURE, CHART, PLANTUML)
      aliases = { "echarts" },           -- Alternative syntax identifiers
      needs_external_render = true,      -- Requires external tool
    }
View Types (spec_view_types):
    M.view = {
      id = "ABBREV",
      inline_prefix = "abbrev",          -- Syntax: `abbrev: content`
      aliases = { "sigla", "acronym" },
      needs_external_render = false,
    }
The system shall resolve model paths using environment configuration with fallback.
Model path resolution follows this order:
SPECCOMPILER_HOME
environment variable: $SPECCOMPILER_HOME/models/{model}
./models/{model}
For type modules not found in the specified model, the system falls back to the “default” model:
    -- Try the model-specific path first
    local ok, module = pcall(require, "models." .. model_name .. ".types.floats.table")
    if not ok then
      -- Fall back to the default model
      module = require("models.default.types.floats.table")
    end
This enables partial model customization, where models override only specific types.
External renderers shall register callbacks for task preparation and result handling.
Float types requiring external tools (PlantUML, ECharts, etc.) register with the external render handler:
    external_render.register_renderer("PLANTUML", {
      prepare_task = function(float, build_dir, log, data, model_name)
        -- Return task descriptor with cmd, args, output_path, context
      end,
      handle_result = function(task, success, stdout, stderr, data, log)
        -- Update resolved_ast in database
      end
    })
The core orchestrates: query items -> prepare tasks -> cache filter -> batch spawn -> dispatch results. This enables parallel execution across all external renders.
The system shall load data view generators from model directories for chart data injection.
Data views are Lua modules that generate data for charts:
    -- models/{model}/data_views/{view_name}.lua
    local M = {}

    function M.generate(params, data)
      -- params: user parameters from code block attributes
      -- data: DataManager instance for SQL queries
      return { source = { {"x", "y"}, {1, 10}, {2, 20} } }
    end

    return M
Views are loaded via data_loader.load_view(view_name, model_name, data, params).
Resolution tries the specified model first, then falls back to
default.
Usage in code blocks:
```chart:gaussian{view="gaussian" sigma=2.0}
{...echarts config...}
```
The system shall cache loaded type handlers to avoid repeated module loading.
Each handler loader maintains a cache keyed by {model}:{type_ref}:
    local type_handlers = {}

    local function load_type_handler(type_ref, model_name)
      local cache_key = model_name .. ":" .. type_ref
      local cached = type_handlers[cache_key]
      if cached ~= nil then
        return cached or nil  -- A cached false means a known miss
      end
      -- Load module via require(); the objects loader is shown here, other
      -- categories differ only in the path segment
      local module_path = "models." .. model_name .. ".types.objects." .. type_ref:lower()
      local ok, module = pcall(require, module_path)
      if ok and type(module) == "table" and module.handler then
        type_handlers[cache_key] = module.handler
        return module.handler
      end
      type_handlers[cache_key] = false  -- Cache negative result
      return nil
    end
The cache stores false for failed
lookups to avoid repeated require() calls for non-existent modules.
Assembles transformed content and publishes DOCX/HTML5 outputs with cache-aware emission.
The system shall assemble final documents from database content, reconstructing AST from stored fragments.
The system shall resolve Float references, replacing raw AST with rendered content (SVG, images).
The system shall assign sequential numbers to Floats within Counter Groups across all documents.
The system shall generate outputs in multiple formats (DOCX, HTML5) from a single processed document.
The system shall generate DOCX output via Pandoc with custom reference documents and OOXML post-processing.
The system shall generate HTML5 output via Pandoc with web-specific templates and assets.
Once documents are assembled and published, the system must also guarantee that its builds are reproducible and its processing is auditable. The following section addresses these integrity concerns.
Deterministic compilation, reproducible builds, and audit trail integrity.
The glossary below defines the domain vocabulary used throughout this specification. Each term corresponds to a cross-reference encountered in the requirements above and is defined here with its purpose, scope, and usage context.
The second phase in the pipeline that resolves references and infers types.
Purpose: Resolves cross-references between spec objects and infers missing type information.
Position: Second phase after INITIALIZE, before TRANSFORM.
SHA1 hashes for detecting document changes.
Purpose: Stores content hashes to detect which documents have changed since last build.
Implementation: Compares current file hash against cached hash to skip unchanged files.
Float types sharing a numbering sequence.
Purpose: Groups related float types to share sequential numbering.
Example: FIG and DIAGRAM types may share a counter, producing Figure 1, Figure 2, etc.
A MIL-STD-498 architectural decomposition element representing a subsystem, layer, package, or service.
Purpose: Groups software units into higher-level structural components for design allocation.
Examples: src/core, src/db, src/infra.
A MIL-STD-498 implementation decomposition element representing a source file or code unit.
Purpose: Captures file-level implementation units allocated to functional descriptions.
Examples: src/core/pipeline.lua, src/db/manager.lua.
A Lua module generating data for chart injection.
Purpose: Produces structured data that can be injected into chart floats.
Implementation: Lua scripts that query database and return chart-compatible data structures.
Entity-Attribute-Value pattern for typed attribute storage.
Purpose: Flexible schema for storing typed attributes on spec objects.
Structure: Entity (spec object), Attribute (key name), Value (typed content).
The final phase in the pipeline that assembles and outputs documents.
Purpose: Assembles transformed content and writes final output documents.
Position: Final phase after VERIFY.
A numbered element (table, figure, diagram) with caption and cross-reference. See Spec Float for full definition.
Subprocess-based rendering for types like PLANTUML, CHART.
Purpose: Delegates rendering to external tools via subprocess execution.
Examples: PlantUML JAR for diagrams, chart libraries for data visualization.
A modular component that processes specific content types through pipeline phases.
Purpose: Encapsulates processing logic for a content type across all pipeline phases.
Structure: Implements phase hooks (on_initialize, on_analyze, on_transform, on_verify, on_emit) for its content type.
The first phase in the pipeline that parses AST and populates IR containers.
Purpose: Parses markdown AST and populates intermediate representation containers.
Position: First phase, entry point for document processing.
A collection of type definitions, handlers, and styles for a domain.
Purpose: Bundles related type definitions, handlers, and styling for specific documentation domains.
Examples: SRS model for software requirements, HRS model for hardware requirements.
Timestamps for incremental output generation.
Purpose: Tracks when outputs were last generated to enable incremental builds.
Implementation: Compares source modification time against cached output timestamp.
A distinct stage in document processing with specific responsibilities.
Purpose: Separates document processing into well-defined sequential stages.
Phases: INITIALIZE, ANALYZE, TRANSFORM, VERIFY, EMIT.
The 5-phase processing system (INITIALIZE -> ANALYZE -> TRANSFORM -> VERIFY -> EMIT).
Purpose: Orchestrates document processing through sequential phases.
Flow: Each phase completes for all handlers before the next phase begins.
Handler dependencies that determine execution order.
Purpose: Declares which handlers must complete before a given handler can execute.
Usage: Handlers declare prerequisites to ensure data dependencies are satisfied.
Kahn’s algorithm for ordering handlers by prerequisites.
Purpose: Determines valid execution order for handlers based on dependencies.
Algorithm: Uses Kahn’s algorithm to produce a topologically sorted handler sequence.
The third phase in the pipeline that materializes views and rewrites content.
Purpose: Materializes database views into content and applies content transformations.
Position: Third phase after ANALYZE, before VERIFY.
Alternative syntax identifier for a type (e.g., “csv” -> “TABLE”).
Purpose: Provides shorthand or alternative names for types.
Example: csv is an alias for the TABLE type in float definitions.
System that discovers and loads type handlers from model directories.
Purpose: Dynamically discovers and instantiates type handlers from model definitions.
Implementation: Scans model directories for handler definitions and registers them.
Database tables (spec_*_types) storing type definitions.
Purpose: Stores type definitions including attributes, aliases, and validation rules.
Tables: spec_object_types, spec_float_types, spec_attribute_types, etc.
The fourth phase in the pipeline that validates content via proof views.
Purpose: Validates document content using proof views and constraint checking.
Position: Fourth phase after TRANSFORM, before EMIT.
The tree representation of document structure produced by Pandoc.
Purpose: Represents document structure as a hierarchical tree of elements.
Source: Pandoc parses Markdown and produces JSON AST.
Usage: Handlers walk the AST to extract spec objects, floats, and relations.
FTS5 virtual tables enabling search across specification content.
Purpose: Indexes specification text for fast full-text search queries.
Implementation: SQLite FTS5 virtual tables populated during EMIT phase.
Usage: Web application uses FTS for search functionality.
A top-level functional or non-functional requirement that captures what the system must do or satisfy.
Purpose: Defines system-level requirements that guide design and implementation.
Traceability: HLRs trace to verification cases (VC) and are realized by functional descriptions (FD).
The database-backed representation of parsed document content.
Purpose: Stores parsed specification content in queryable form.
Storage: SQLite database with spec_objects, spec_floats, spec_relations tables.
Lifecycle: Populated during INITIALIZE, queried and modified through remaining phases.
A unique identifier assigned to spec objects for
cross-referencing (e.g., @REQ-001).
Purpose: Provides unique, human-readable identifiers for traceability and cross-referencing.
Syntax: Written as @PID in header text (e.g., ## HLR: Requirement Title @REQ-001).
Auto-generation: PIDs can be auto-generated from type prefix and sequence number.
A SQL query that validates data integrity constraints during the VERIFY phase.
Purpose: Defines validation rules as SQL queries that detect specification errors.
Execution: Run during the VERIFY phase; violations are reported as diagnostics.
Examples: Missing required attributes, unresolved relations, cardinality violations.
The embedded database engine storing the IR and build cache.
Purpose: Provides persistent, portable storage for the intermediate representation.
Benefits: Single-file storage, ACID transactions, SQL query capability.
Usage: All pipeline phases read/write to SQLite via the database manager.
A specification object that participates in traceability relationships.
Purpose: Base type for objects that can be linked via traceability relations.
Types: Any spec object type registered in the model (e.g., HLR, LLR, SECTION).
Relations: Model-defined relation types (e.g., XREF_FIGURE, XREF_CITATION) inferred by specificity matching.
A category definition that governs behavior for objects, floats, relations, or views.
Purpose: Defines the schema, validation rules, and rendering behavior for a category of elements.
Categories: Object types (HLR, SECTION), float types (FIGURE, TABLE), relation types (TRACES_TO), view types (TOC, LOF).
Registration: Types are loaded from model directories and stored in the type registry.
A test specification that verifies a requirement or set of requirements.
Purpose: Defines how requirements are verified through test procedures and expected results.
Traceability: VCs trace to HLRs via traceability attribute links.
Naming: VC PIDs follow the pattern VC-{category}-{seq} (e.g., VC-PIPE-001).