SpecCompiler Requirements

1 Scope

This document defines the high-level requirements for SpecCompiler, a document processing pipeline for structured specifications.

The document is organized in two parts. The first part defines the SpecIR data model, the six core types that the Pipeline operates on. The second part specifies the functional requirements, grouped into System Features (SF-001 through SF-006), each decomposed into High-Level Requirements.

2 SpecIR Types

SpecIR (see SDD: Type + Content (SpecIR)) is the data model that SpecCompiler builds from source Markdown during the INITIALIZE phase. The core task of parsing is to lower Markdown annotations into a set of typed content tables that the Pipeline can analyze, transform, verify, and emit. The entries below define each of these six content tables as a formal tuple and give the Markdown syntax that produces it.

2.1 Specification

A Specification is the root document container created from an H1 header. It represents a complete document such as an SRS, SDD, or SVC. Each specification has a type and an optional PID, and contains attributes and all spec objects within that document.

2.1.1 Declaration

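A Specification is formally defined as a tuple $S = (\tau, n, \mathit{pid}, A, O)$, where $\tau$ is the specification type, $n$ its name, $\mathit{pid}$ its optional PID, $A$ its attributes, and $O$ the spec objects it contains.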

2.1.2 Syntax

Figure Document and Specification productions

2.2 Spec Object

A Spec Object represents a traceable element in a specification document, created from H2-H6 headers. Objects can be requirements (HLR, LLR), non-functional requirements (NFR), or structural sections (SECTION). Each object has a type and a PID, and can contain attributes and body content.

2.2.1 Declaration

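A Spec Object is formally defined as a tuple $O = (\tau, \mathit{title}, \mathit{pid}, \beta, A, F, R, V, O)$, where $\tau$ is the object type, $\mathit{title}$ the header title, $\mathit{pid}$ the PID, $\beta$ the body content, $A$ the attributes, $F$ the floats, $R$ the relations, $V$ the views, and the final $O$ the nested child spec objects.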

2.2.2 Syntax

Figure Spec Object productions

2.3 Spec Float

A Spec Float represents a floating element like a figure, table, or diagram. Floats are created from fenced code blocks with a syntax:label pattern. They are automatically numbered within their counter group and can be cross-referenced by their label. Some floats require external rendering (e.g., PlantUML diagrams).

2.3.1 Declaration

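A Spec Float is formally defined as a tuple $F = (\tau, \mathit{label}, \mathit{kv}, \mathit{content})$, where $\tau$ is the float type, $\mathit{label}$ the cross-reference label, $\mathit{kv}$ the key-value attributes of the code block, and $\mathit{content}$ the raw block content.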

2.3.2 Syntax

Figure Spec Float productions

2.4 Attribute

An Attribute stores metadata for specifications and spec objects using an Entity-Attribute-Value (EAV) pattern. Attributes are defined in blockquotes following headers and support multiple datatypes including strings, integers, dates, enums, and rich XHTML content (Pandoc AST). Relations are extracted from the AST. Attribute definitions constrain which attributes each object type can have.

2.4.1 Declaration

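A Spec Attribute is a triple $A = (\tau, \beta, R)$, where $\tau$ is the attribute type, $\beta$ the value content (Pandoc AST), and $R$ the relations extracted from that AST.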

2.4.2 Syntax

Figure Attribute productions

2.5 Spec Relation

A Spec Relation represents a traceability link between specification elements. Relations are created from Markdown links where the link target (URL) acts as a selector that drives type inference. The relation type is not authored explicitly; it is inferred from the selector, the source context, and the resolved target. Resolution is type-driven: the extends chain of the inferred type determines how targets are resolved.

2.5.1 Declaration

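A Spec Relation is a 4-tuple $R = (s, t, \sigma, \alpha)$, where $s$ is the source element, $t$ the target element, $\sigma$ the selector (the link target), and $\alpha$ the source context.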

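The relation type $\rho \in \Gamma.R$ is inferred as $\rho = \mathit{infer}(\sigma, \alpha, \tau_s, \tau_t)$, where $\tau_s$ and $\tau_t$ are the types of the source and target.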

2.5.2 Syntax

Figure Spec Relation productions

3 Spec View

A Spec View represents a dynamic query or generated content block. Views are materialized during the TRANSFORM phase and can generate tables of contents (TOC), lists of figures (LOF), custom query results, abbreviations, and inline math. Views enable dynamic document assembly based on specification data.

3.1 Declaration

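A Spec View is formally defined as a pair $V = (\tau, \omega)$, where $\tau$ is the view type and $\omega$ the view definition (for example, an SQL query, symbol path, or expression).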

3.2 Syntax

Figure Spec View productions

4 Functional Requirements

With the data model established, the following sections define the functional requirements for SpecCompiler Core. Requirements are organized into System Features (SF), each covering a distinct functional domain. Every SF is decomposed into High-Level Requirements that state what the system shall do.

5 Pipeline Requirements

SF-001: Pipeline Execution

Five-phase document processing lifecycle with Handler orchestration and Topological Sort ordering.

DESCRIPTION:
Groups requirements for the core Pipeline that drives document processing through the INITIALIZE, ANALYZE, TRANSFORM, VERIFY, and EMIT phases with declarative handler dependencies.
RATIONALE:
A structured processing pipeline enables separation of concerns, validation gates, and deterministic handler ordering.

HLR-PIPE-001: Five-Phase Lifecycle

The pipeline shall execute handlers in a five-phase lifecycle: INITIALIZE, ANALYZE, TRANSFORM, VERIFY, EMIT.

BELONGS TO:
SF-001
DESCRIPTION:

Each phase serves a distinct purpose in document processing:

  1. INITIALIZE: Parse document AST and populate database with specifications, spec_objects, floats, relations, views, and attributes
  2. ANALYZE: Resolve relations between objects (link target resolution, type inference)
  3. TRANSFORM: Pre-compute views, render external content (PlantUML, charts), prepare for output
  4. VERIFY: Run proof views to validate data integrity, type constraints, cardinality rules
  5. EMIT: Assemble final documents and write to output formats (docx, html5, markdown, json)
RATIONALE:
Separation of concerns enables validation between phases, allows early abort on errors, and supports format-agnostic processing until the final output stage.
STATUS:
Approved

HLR-PIPE-002: Handler Registration and Prerequisites

The pipeline shall support handler registration with declarative Prerequisites for dependency ordering.

BELONGS TO:
SF-001
DESCRIPTION:

Handlers register via register_handler(handler) with required fields:

  • name: Unique string identifier for the handler
  • prerequisites: Array of handler names that must execute before this handler

Handlers declare participation in phases via hook methods (on_initialize, on_analyze, on_transform, on_verify, on_emit). Registering a duplicate handler name raises a registration error.
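A minimal Lua sketch of these registration rules follows. It assumes handlers are plain tables and that registration errors are raised with error(). The registry table and the handlers_for_phase helper are hypothetical names, not the pipeline's actual API.

```lua
-- Hypothetical registry illustrating HLR-PIPE-002 (not the real pipeline module).
local registry = { handlers = {}, by_name = {} }

local function register_handler(handler)
  assert(type(handler) == "table", "handler must be a table")
  assert(type(handler.name) == "string", "handler.name is required")
  assert(type(handler.prerequisites) == "table", "handler.prerequisites is required")
  if registry.by_name[handler.name] then
    error("duplicate handler name: " .. handler.name)
  end
  registry.by_name[handler.name] = handler
  table.insert(registry.handlers, handler) -- registration order is not execution order
end

-- A handler participates in a phase only if it defines the on_{phase} hook.
local function handlers_for_phase(phase)
  local result = {}
  for _, h in ipairs(registry.handlers) do
    if type(h["on_" .. phase]) == "function" then
      table.insert(result, h)
    end
  end
  return result
end
```

Execution order within a phase is then derived from prerequisites, not from the order of register_handler calls.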

RATIONALE:
Declarative prerequisites decouple handler ordering from registration order, enabling modular handler development and preventing implicit ordering dependencies.
STATUS:
Approved

HLR-PIPE-003: Topological Ordering via Kahn’s Algorithm

The pipeline shall order handlers within each phase using topological sort with Kahn’s algorithm.

BELONGS TO:
SF-001
DESCRIPTION:

For each phase, the pipeline:

  1. Identifies handlers participating in the phase (those with on_{phase} hooks)
  2. Builds dependency graph from prerequisites (only for participating handlers)
  3. Executes Kahn’s algorithm to produce execution order
  4. Sorts alphabetically at each level for deterministic output
  5. Detects circular dependencies and reports an error listing the remaining nodes
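The steps above can be sketched in Lua as a level-by-level variant of Kahn's algorithm. This is an illustration under stated assumptions: each handler is a table with name and prerequisites, the list is already filtered to the phase's participants, and topo_sort is a hypothetical name.

```lua
-- Level-by-level Kahn's algorithm with alphabetical tie-breaking (sketch).
-- Returns an ordered list of names, or nil plus an error listing remaining nodes.
local function topo_sort(handlers)
  local indegree, dependents = {}, {}
  for _, h in ipairs(handlers) do
    indegree[h.name] = 0
    dependents[h.name] = {}
  end
  -- Edge prerequisite -> handler; prerequisites outside this phase are ignored.
  for _, h in ipairs(handlers) do
    for _, pre in ipairs(h.prerequisites or {}) do
      if indegree[pre] ~= nil then
        indegree[h.name] = indegree[h.name] + 1
        table.insert(dependents[pre], h.name)
      end
    end
  end
  local order, level = {}, {}
  for name, d in pairs(indegree) do
    if d == 0 then table.insert(level, name) end
  end
  while #level > 0 do
    table.sort(level) -- alphabetical order within a level keeps output deterministic
    local next_level = {}
    for _, name in ipairs(level) do
      table.insert(order, name)
      for _, dep in ipairs(dependents[name]) do
        indegree[dep] = indegree[dep] - 1
        if indegree[dep] == 0 then table.insert(next_level, dep) end
      end
    end
    level = next_level
  end
  if #order < #handlers then
    -- Nodes that never reached in-degree 0 are in a cycle or blocked by one.
    local remaining = {}
    for name, d in pairs(indegree) do
      if d > 0 then table.insert(remaining, name) end
    end
    table.sort(remaining)
    return nil, "circular dependency among: " .. table.concat(remaining, ", ")
  end
  return order
end
```

Sorting whole levels rather than using a priority queue keeps the code short. Either choice gives a deterministic order.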
RATIONALE:
Kahn’s algorithm provides O(V+E) complexity (plus the cost of sorting each level), clear cycle detection, and deterministic ordering through alphabetical tie-breaking.
STATUS:
Approved

HLR-PIPE-004: Phase Abort on VERIFY Errors

The pipeline shall abort execution after the VERIFY phase if any errors are recorded.

BELONGS TO:
SF-001
DESCRIPTION:
After running the VERIFY phase, the pipeline checks diagnostics:has_errors(). If it returns true, execution halts before the EMIT phase (TRANSFORM has already completed) and an error message with the error count is logged. This prevents generating invalid output from documents with specification violations.
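The gate can be sketched as a phase loop with the check placed after VERIFY. This sketch assumes run_phase, diagnostics, and log are injected. execute_phases and the error_count() method are hypothetical names; only has_errors() comes from the text.

```lua
-- Hypothetical phase loop showing the VERIFY gate (HLR-PIPE-004).
local PHASES = { "initialize", "analyze", "transform", "verify", "emit" }

local function execute_phases(run_phase, diagnostics, log)
  for _, phase in ipairs(PHASES) do
    run_phase(phase)
    -- Only VERIFY gates execution; earlier phases always run to completion.
    if phase == "verify" and diagnostics:has_errors() then
      log(string.format("Aborting before EMIT: %d error(s) recorded",
        diagnostics:error_count()))
      return false
    end
  end
  return true
end
```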
RATIONALE:
Early abort on verification failures saves computation and prevents distribution of invalid specification documents. Errors in VERIFY indicate data integrity issues that would produce incorrect outputs.
STATUS:
Approved

HLR-PIPE-005: Batch Dispatch for All Phases

The pipeline shall use a single batch dispatch model for all phases, in which handlers receive all contexts at once.

BELONGS TO:
SF-001
DESCRIPTION:

All handlers implement on_{phase}(data, contexts, diagnostics) hooks that receive the full contexts array. The pipeline orchestrator calls each handler’s hook once per phase via run_phase(), passing all document contexts. Handlers are responsible for iterating over contexts internally.

This enables cross-document optimizations, transaction batching, and parallel processing within any phase.
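The name run_phase() comes from the text above. The signature below is an assumption, and the sketch only illustrates the dispatch shape: each participating hook is called exactly once with the whole contexts array, and the handler does its own iteration.

```lua
-- Sketch of batch dispatch (HLR-PIPE-005). sorted_handlers is assumed to be
-- the phase's execution order produced by the topological sort.
local function run_phase(phase, sorted_handlers, data, contexts, diagnostics)
  local hook_name = "on_" .. phase
  for _, handler in ipairs(sorted_handlers) do
    local hook = handler[hook_name]
    if hook then
      hook(data, contexts, diagnostics) -- one call per phase, all documents at once
    end
  end
end
```

A handler that wants per-document work simply loops over contexts inside its hook, for example within a single database transaction.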

RATIONALE:
A uniform dispatch model simplifies the pipeline engine, eliminates the dual-path batch/per-doc dispatch, and allows handlers in any phase to optimize across all documents (e.g., wrapping DB operations in a single transaction, parallel output generation in EMIT).
STATUS:
Approved

HLR-PIPE-006: Context Creation and Propagation

The pipeline shall create and propagate context objects containing document metadata and configuration through all phases.

BELONGS TO:
SF-001
DESCRIPTION:

The execute(docs) method creates context objects for each input document with:

  • doc: Pandoc document AST (via DocumentWalker)
  • spec_id: Specification identifier derived from filename
  • config: Preset configuration (styles, captions, validation)
  • build_dir: Output directory path
  • output_format: Target format (docx, html5, etc.)
  • template: Template name for model loading
  • reference_doc: Path to reference.docx for styling
  • docx, html5: Format-specific configuration
  • outputs: Array of {format, path} for multi-format output
  • bibliography, csl: Citation configuration
  • project_root: Root directory for resolving relative paths

Context flows through all phases, enriched by handlers (e.g., verification results in VERIFY phase).
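A minimal sketch of context creation follows, showing only a subset of the fields above. It assumes each input doc is a table with path and ast and that config is a plain table. The rule for deriving spec_id (drop directories and the final extension) is an assumption consistent with the “srs-main” example used elsewhere in this document. Both helper names are hypothetical.

```lua
-- Hypothetical: "docs/srs-main.md" -> "srs-main".
local function spec_id_from_path(path)
  local base = path:match("([^/\\]+)$") or path
  return (base:gsub("%.[^%.]+$", "")) -- parentheses keep only the string result
end

-- Build one context per input document (subset of fields shown).
local function create_contexts(docs, config)
  local contexts = {}
  for _, doc in ipairs(docs) do
    table.insert(contexts, {
      doc = doc.ast,
      spec_id = spec_id_from_path(doc.path),
      config = config.preset,
      build_dir = config.build_dir,
      output_format = config.output_format,
      outputs = config.outputs or {},
      project_root = config.project_root,
    })
  end
  return contexts
end
```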

RATIONALE:
Unified context object provides handlers with consistent access to document metadata and build configuration without global state, enabling testable and isolated handler implementations.
STATUS:
Approved

6 Storage Requirements

SF-002: Specification Persistence

SQLite-based storage with incremental build support and output caching.

DESCRIPTION:
Groups requirements for the persistence layer, including ACID-compliant storage, the EAV attribute model, the build cache, the output cache, and incremental rebuild support.
RATIONALE:
Reliable persistence with change detection enables efficient rebuilds for large specification projects.

HLR-STOR-001: SQLite Persistence

The system shall persist all specification data to an SQLite database with ACID guarantees.

BELONGS TO:
SF-002
DESCRIPTION:
All specifications, spec_objects, floats, relations, views, and attribute values stored in SQLite database. Database operations wrapped in transactions to ensure atomicity, consistency, isolation, and durability.
RATIONALE:
SQLite provides a reliable, single-file persistence layer suitable for specification documents. ACID guarantees prevent data corruption during concurrent access or system failures.
STATUS:
Approved

HLR-STOR-002: EAV Attribute Model

The system shall store spec object attributes using Entity-Attribute-Value pattern.

BELONGS TO:
SF-002
DESCRIPTION:
Attribute values stored in attribute_values table with polymorphic typed columns (string_value, int_value, real_value, bool_value, date_value, enum_ref). Each attribute record links to owner object via owner_ref and stores datatype for proper retrieval.
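To illustrate how a datatype selects one of the polymorphic columns, here is a hypothetical Lua helper. The column names come from the description above. The conversion rules (ISO dates, booleans stored as 1/0, ENUM kept as its raw name rather than resolved to enum_ref) are assumptions.

```lua
-- Sketch of typed-column selection for the EAV model (HLR-STOR-002).
-- XHTML values are stored as a serialized AST and are not handled here.
local TYPED_COLUMN = {
  STRING = "string_value", INTEGER = "int_value", REAL = "real_value",
  BOOLEAN = "bool_value", DATE = "date_value", ENUM = "enum_ref",
}

-- Returns the target column and converted value, or nil plus an error message.
local function typed_value(datatype, raw)
  local column = TYPED_COLUMN[datatype]
  if not column then
    return nil, "unsupported datatype: " .. tostring(datatype)
  end
  if datatype == "INTEGER" then
    local n = tonumber(raw)
    if not n or n % 1 ~= 0 then return nil, "not an integer: " .. raw end
    return column, n
  elseif datatype == "REAL" then
    local n = tonumber(raw)
    if not n then return nil, "not a number: " .. raw end
    return column, n
  elseif datatype == "BOOLEAN" then
    local b = raw:lower()
    if b == "true" then return column, 1 end
    if b == "false" then return column, 0 end
    return nil, "not a boolean: " .. raw
  elseif datatype == "DATE" then
    if not raw:match("^%d%d%d%d%-%d%d%-%d%d$") then
      return nil, "not an ISO date: " .. raw
    end
    return column, raw
  end
  -- STRING is stored as-is; ENUM keeps the raw name here (lookup not shown).
  return column, raw
end
```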
RATIONALE:
EAV pattern enables flexible attribute schemas without database migrations. Different spec object types (HLR, LLR, VC) have different attributes that can evolve independently.
STATUS:
Approved

HLR-STOR-003: Build Cache

The system shall maintain a build cache for document hash tracking.

BELONGS TO:
SF-002
DESCRIPTION:
Source file hashes stored in source_files table (path, sha1). Build cache module provides is_document_dirty() to check if document content has changed since last build. Hash comparison enables change detection.
RATIONALE:
Hash-based change detection allows the build system to skip unchanged documents, reducing rebuild times for large specification sets.
STATUS:
Approved

HLR-STOR-004: Output Cache

The system shall cache output generation state with P-IR hash and timestamps.

BELONGS TO:
SF-002
DESCRIPTION:
Output cache stored in output_cache table (spec_id, output_path, pir_hash, generated_at). P-IR (Processed Intermediate Representation) hash captures complete specification state. is_output_current() checks if output file exists and P-IR hash matches cached value.
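A minimal sketch of the is_output_current() check, assuming the output_cache rows are available as a Lua table keyed by output_path. The real function's signature is not specified here, so this version is hypothetical.

```lua
-- Sketch of is_output_current (HLR-STOR-004) over an in-memory cache.
local function file_exists(path)
  local f = io.open(path, "rb")
  if f then f:close() return true end
  return false
end

-- Current only if a cache row exists for this spec, the P-IR hash matches,
-- and the output file is still on disk.
local function is_output_current(cache, spec_id, output_path, pir_hash)
  local row = cache[output_path]
  return row ~= nil
    and row.spec_id == spec_id
    and row.pir_hash == pir_hash
    and file_exists(output_path)
end
```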
RATIONALE:
Output caching avoids regenerating unchanged outputs (docx, html5). P-IR hash ensures output is regenerated when any upstream data changes, not just source file changes.
STATUS:
Approved

HLR-STOR-006: EAV Pivot Views for External Queries

The system shall generate per-object-type SQL views that pivot the EAV attribute model into typed columns for external BI queries.

BELONGS TO:
SF-002
DESCRIPTION:
For each non-composite spec_object_type, a view named view_{type_lower}_objects is dynamically generated (e.g., view_hlr_objects, view_vc_objects). Each view flattens the EAV join into one row per object with typed attribute columns, enabling queries like SELECT * FROM view_hlr_objects WHERE status = 'approved'. These views are NOT used by internal pipeline queries: internal code queries the raw EAV tables directly because it needs raw_value, datatype, ast, enum_ref, and other columns that the pivot views abstract away. Internal queries also frequently operate across types or need COUNT/EXISTS checks that the MAX()-based pivot cannot provide.
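The following hypothetical generator shows the MAX(CASE ...) pivot shape. Note that this document names the attribute table both attribute_values (HLR-STOR-002) and spec_attributes (HLR-TYPE-006). The sketch assumes spec_attributes, reads ENUM values from raw_value, and assumes attribute names are valid SQL identifiers. All three choices are assumptions.

```lua
-- Hypothetical pivot-view SQL generator (HLR-STOR-006).
local VALUE_COLUMN = {
  STRING = "string_value", INTEGER = "int_value", REAL = "real_value",
  BOOLEAN = "bool_value", DATE = "date_value", ENUM = "raw_value",
}

local function pivot_view_sql(type_id, attributes)
  local cols = { "o.identifier", "o.pid", "o.title_text" }
  for _, attr in ipairs(attributes) do
    local col = VALUE_COLUMN[attr.type] or "raw_value"
    -- MAX() collapses the matching EAV row for each object into one column.
    table.insert(cols, string.format(
      "MAX(CASE WHEN a.name = '%s' THEN a.%s END) AS %s", attr.name, col, attr.name))
  end
  return string.format(
    "CREATE VIEW IF NOT EXISTS view_%s_objects AS\n" ..
    "SELECT %s\n" ..
    "FROM spec_objects o\n" ..
    "LEFT JOIN spec_attributes a ON a.owner_ref = o.identifier\n" ..
    "WHERE o.type_ref = '%s'\n" ..
    "GROUP BY o.identifier",
    type_id:lower(), table.concat(cols, ",\n       "), type_id)
end
```

The MAX() aggregate is what limits these views to one value per attribute. The same limitation is why internal code uses the raw tables.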
RATIONALE:
External BI tools, ad-hoc SQL queries, and custom model scripts benefit from a flat relational interface over the EAV model. Generating views at runtime from the type system ensures the columns always match the current model configuration without manual maintenance.
STATUS:
Approved

HLR-STOR-005: Incremental Rebuild Support

The system shall support incremental rebuilds via build graph tracking.

BELONGS TO:
SF-002
DESCRIPTION:
Build graph stored in build_graph table (root_path, node_path, node_sha1). Tracks include file dependencies for each root document. is_document_dirty_with_includes() checks root document and all includes. update_build_graph() refreshes dependency tree after successful build.
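A sketch of the include-aware dirty check over in-memory data follows. It assumes the build_graph rows for a root are loaded as a node_path to node_sha1 table that includes the root itself, and that current file hashes are computed elsewhere. is_dirty_with_includes is a simplified stand-in for is_document_dirty_with_includes(), whose real signature is not given here.

```lua
-- graph[root] maps node_path -> node_sha1 recorded by the last successful build.
-- current_sha1 maps path -> hash of the file as it is now.
local function is_dirty_with_includes(root, graph, current_sha1)
  local recorded = graph[root]
  if not recorded or recorded[root] == nil then
    return true -- never built: always dirty
  end
  for node_path, node_sha1 in pairs(recorded) do
    if current_sha1[node_path] ~= node_sha1 then
      return true -- root or an include changed, or a file disappeared
    end
  end
  -- Adding an include edits a recorded file, so its hash change is caught above.
  return false
end
```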
RATIONALE:
Specification documents often include sub-files. Incremental builds must detect changes in any included file to trigger rebuild of parent document. Build graph captures this dependency structure.
STATUS:
Approved

7 Types Domain Requirements

SF-003: Type System

Dynamic type system providing typed containers for Specification, Spec Object, Spec Float, Spec View, Spec Relation, and Attribute.

DESCRIPTION:
Groups requirements for the six core containers that store parsed specification data plus the proof-view validation framework.
RATIONALE:
A typed container model enables schema validation, type-specific rendering, and data integrity checking through SQL proof views.

HLR-TYPE-001: Specifications Container

The type system shall provide a specifications container for registering document-level specification records.

BELONGS TO:
SF-003
DESCRIPTION:

The specifications table stores metadata for each specification document parsed during the INITIALIZE phase:

  • identifier: Unique specification ID derived from filename (e.g., “srs-main”)
  • root_path: Source file path for the specification
  • long_name: Human-readable title extracted from L1 header
  • type_ref: Specification type (validated against spec_specification_types)
  • pid: Optional PID from @PID syntax in L1 header

L1 headers register as specifications. Type validation checks spec_specification_types table. Invalid types fall back to default or emit warning.

RATIONALE:
Specifications represent the top-level organizational unit for document hierarchies. Storing specification metadata enables cross-document linking and multi-document project support.
STATUS:
Approved

HLR-TYPE-002: Spec Objects Container

The type system shall provide a spec_objects container for hierarchical specification objects.

BELONGS TO:
SF-003
DESCRIPTION:

The spec_objects table stores structured specification items extracted from L2+ headers:

  • identifier: SHA1 hash of source path + line + title (content-addressable)
  • specification_ref: Foreign key to parent specification
  • type_ref: Object type (validated against spec_object_types)
  • from_file: Source file path
  • file_seq: Document order sequence number
  • pid: Project ID from @PID syntax (e.g., “REQ-001”)
  • title_text: Header text without type prefix or PID
  • label: Unified label for cross-referencing (format: {type_lower}:{title_slug})
  • level: Header level (2-6)
  • start_line, end_line: Source line range
  • ast: Serialized Pandoc AST (JSON) for section content

Type resolution order: explicit TYPE: prefix, implicit alias lookup, default type fallback.
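The label format {type_lower}:{title_slug} can be illustrated with a small helper. The slug rules here are assumptions and may differ from the actual slugifier: lowercase the title, collapse runs of non-alphanumeric characters to "-", and trim leading and trailing dashes. slugify and make_label are hypothetical names.

```lua
-- Hypothetical label construction for spec_objects.label.
local function slugify(title)
  local s = title:lower():gsub("[^%w]+", "-")
  s = s:gsub("^%-+", ""):gsub("%-+$", "") -- trim leading/trailing dashes
  return s
end

local function make_label(type_ref, title)
  return type_ref:lower() .. ":" .. slugify(title)
end
```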

RATIONALE:
Content-addressable identifiers enable change detection for incremental builds. PID-based anchors provide stable cross-references independent of title changes.
STATUS:
Approved

HLR-TYPE-003: Spec Floats Container

The type system shall provide a spec_floats container for numbered floating content (figures, tables, listings).

BELONGS TO:
SF-003
DESCRIPTION:

The spec_floats table stores content blocks that receive sequential numbering:

  • identifier: Short format “float-{8-char-sha1}” for DOCX compatibility
  • specification_ref: Foreign key to parent specification
  • type_ref: Float type resolved from aliases (e.g., “csv” -> “TABLE”, “puml” -> “FIGURE”)
  • from_file: Source file path
  • file_seq: Document order for numbering
  • label: User-provided label for cross-referencing
  • number: Sequential number within counter_group (assigned in the TRANSFORM phase)
  • caption: Caption text from attributes
  • raw_content: Original code block text
  • raw_ast: Serialized Pandoc CodeBlock (JSON)
  • parent_object_ref: Foreign key to containing spec_object
  • attributes: JSON-serialized attributes (caption, source, language)
  • syntax_key: Original class syntax for backend matching

Float types in the same counter group share numbering (e.g., FIGURE, CHART, and PLANTUML all increment the “FIGURE” counter).
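Shared counters can be sketched as follows. The sketch assumes the floats come from a single document, each with type_ref and file_seq, and that a counter_group_of lookup has been built from the float type schemas. number_floats is hypothetical. Cross-document numbering would also need an ordering across specifications, which is not shown.

```lua
-- Sketch: number floats in document order, one counter per counter group.
local function number_floats(floats, counter_group_of)
  table.sort(floats, function(a, b) return a.file_seq < b.file_seq end)
  local counters = {}
  for _, f in ipairs(floats) do
    local group = counter_group_of[f.type_ref] or f.type_ref -- ungrouped types count alone
    counters[group] = (counters[group] or 0) + 1
    f.number = counters[group]
  end
  return floats
end
```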

RATIONALE:
Type aliasing supports user-friendly syntax (e.g., csv:data instead of TABLE:data). Counter groups enable semantic grouping of related float types under a single numbering sequence.
STATUS:
Approved

HLR-TYPE-004: Spec Views Container

The type system shall provide a spec_views container for data-driven view definitions.

BELONGS TO:
SF-003
DESCRIPTION:

The spec_views table stores view definitions from code blocks and inline syntax:

  • identifier: SHA1 hash of specification + sequence + content
  • specification_ref: Foreign key to parent specification
  • view_type_ref: Uppercase view type (e.g., “SELECT”, “SYMBOL”, “MATH”, “ABBREV”)
  • from_file: Source file path
  • file_seq: Document order sequence number
  • raw_ast: View definition content (SQL query, symbol path, expression)

View types with needs_external_render = 1 in spec_view_types are delegated to specialized renderers. Inline views use type: content syntax (e.g., symbol: Class.method).

RATIONALE:
Separating view definitions from rendering enables format-agnostic processing. External render delegation supports complex transformations (PlantUML, charts) without core handler changes.
STATUS:
Approved

HLR-TYPE-005: Spec Relations Container

The type system shall provide a spec_relations container for tracking links between specification elements.

BELONGS TO:
SF-003
DESCRIPTION:

The spec_relations table stores inter-element references:

  • identifier: SHA1 hash of specification + target + type + parent
  • specification_ref: Foreign key to parent specification
  • source_ref: Foreign key to source spec_object
  • target_text: Raw link target from syntax (e.g., “REQ-001”, “fig:diagram”)
  • target_ref: Resolved target identifier (populated in the ANALYZE phase)
  • type_ref: Relation type from spec_relation_types (e.g., “TRACES”, “XREF_FIGURE”)
  • from_file: Source file path

Link syntax: [PID](@) for PID references, [type:label](#) for float references, [@citation] for bibliographic citations. Default relation types are determined by the is_default and link_selector columns in spec_relation_types.
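A hypothetical classifier for the two link forms above is sketched below. It recognizes only the surface syntax from the link text and target. Actual relation types are inferred later from spec_relation_types. Citations ([@citation]) are parsed by Pandoc as citations rather than links, so they are out of scope here.

```lua
-- Returns kind, float type (or nil), key; or nil for ordinary links.
local function classify_link(text, target)
  if target == "@" then
    return "pid", nil, text                       -- [PID](@)
  elseif target == "#" then
    local ftype, label = text:match("^([%w_]+):(.+)$")
    if ftype then return "float", ftype, label end -- [type:label](#)
  end
  return nil
end
```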

RATIONALE:
Deferred resolution (target_text -> target_ref) enables forward references and cross-document linking. Type inference from source/target context reduces explicit markup requirements.
STATUS:
Approved

HLR-TYPE-006: Spec Attributes Container

The type system shall provide a spec_attributes container for structured metadata on specification objects.

BELONGS TO:
SF-003
DESCRIPTION:

The spec_attributes table stores typed attribute values extracted from blockquote syntax:

  • identifier: SHA1 hash of specification + owner + name + value
  • specification_ref: Foreign key to parent specification
  • owner_ref: Foreign key to owning spec_object
  • name: Attribute name (field name without colon)
  • raw_value: Original string value
  • string_value, int_value, real_value, bool_value, date_value: Type-specific columns
  • enum_ref: Foreign key to enum_values for ENUM types
  • ast: JSON-serialized Pandoc AST for rich content (XHTML type)
  • datatype: Resolved datatype from spec_attribute_types

Attribute syntax: > name: value in blockquotes following headers. Datatypes include STRING, INTEGER, REAL, BOOLEAN, DATE, ENUM, XHTML. Multi-line attributes use continuation blocks.
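Real extraction works on the Pandoc AST of the blockquote. The plain-text sketch below only illustrates the single-line "> name: value" form, splitting at the first colon. It ignores continuation blocks and XHTML content. parse_attribute_lines is a hypothetical name.

```lua
-- Extract { name, raw_value } pairs from "> name: value" lines.
local function parse_attribute_lines(text)
  local attrs = {}
  for line in text:gmatch("[^\n]+") do
    -- Lazy name match stops at the first colon; trailing spaces are trimmed.
    local name, value = line:match("^>%s*([%w_][%w_ ]-)%s*:%s*(.-)%s*$")
    if name then
      table.insert(attrs, { name = name, raw_value = value })
    end
  end
  return attrs
end
```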

RATIONALE:
Multi-column typed storage enables SQL queries with type-appropriate comparisons. Storing original AST preserves formatting for XHTML attributes with links, emphasis, or lists.
STATUS:
Approved

HLR-TYPE-007: Type Validation

The type system shall provide proof views that detect data integrity violations across all specification containers.

BELONGS TO:
SF-003
DESCRIPTION:

Proof views are SQL queries registered in the VERIFY phase that check for constraint violations:

  • Specification-level (missing required attributes, invalid types)
  • Object-level (missing required attributes, cardinality, cast failures, invalid enum/date values, bounds)
  • Float-level (orphans, duplicate labels, render failures, invalid types)
  • Relation-level (unresolved, dangling, ambiguous)
  • View-level (materialization failures, query errors)

The validation policy (configurable in project.yaml) determines severity: error, warn, or ignore.

RATIONALE:
Automated validation enables early detection of specification errors before document generation. Configurable severity allows projects to gradually enforce stricter quality standards.
STATUS:
Approved

The type system described above is not fixed at compile time. The following section defines how model directories extend it with custom object types, float renderers, data view generators, and style presets.

8 Extension Requirements

SF-005: Extension Framework

Model-based extensibility for type handlers, renderers, and data views.

DESCRIPTION:
Groups requirements for the extension mechanism that enables custom models to provide type handlers, external renderers, data view generators, and style presets.
RATIONALE:
Extensibility through model directories enables domain-specific customization without modifying the core pipeline.

HLR-EXT-001: Model-Specific Type Handler Loading

The system shall load type-specific handlers from model directories.

BELONGS TO:
SF-005
DESCRIPTION:

Type handlers control how specification content is rendered during the TRANSFORM phase. The loading mechanism supports:

  • Object types: Loaded from models/{model}/types/objects/{type}.lua
  • Specification types: Loaded from models/{model}/types/specifications/{type}.lua
  • Float types: Loaded from models/{model}/types/floats/{type}.lua
  • View types: Loaded from models/{model}/types/views/{type}.lua

Module loading uses require() with path models.{model}.types.{category}.{type}. Type names are converted to lowercase for file lookup (e.g., “HLR” -> “hlr.lua”).
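The path convention can be expressed as a small helper. type_module_path and its category check are illustrative assumptions, not the loader's actual code.

```lua
-- Sketch of the require() path convention in HLR-EXT-001.
local CATEGORIES = { objects = true, specifications = true, floats = true, views = true }

local function type_module_path(model, category, type_ref)
  if not CATEGORIES[category] then
    error("unknown type category: " .. tostring(category))
  end
  -- Type names are lowercased for file lookup: "HLR" -> ".../objects/hlr.lua".
  return string.format("models.%s.types.%s.%s", model, category, type_ref:lower())
end
```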

RATIONALE:
Separating type handlers into model directories enables domain-specific customization. Organizations can define their own requirement types, document types, and rendering behavior without modifying core code.
STATUS:
Approved

HLR-EXT-002: Model Directory Structure

The system shall organize model content in a standardized directory hierarchy.

BELONGS TO:
SF-005
DESCRIPTION:

Each model follows this structure:

models/{model_name}/
  types/
    objects/       -- Spec object type handlers (HLR, LLR, VC, etc.)
    specifications/ -- Specification type handlers (SRS, SDD, SVC)
    floats/        -- Float type handlers (TABLE, PLANTUML, CHART)
    views/         -- View type handlers (ABBREV, SYMBOL, MATH)
    relations/     -- Relation type definitions (TRACES_TO, etc.)
  data_views/      -- Data view generators for chart data injection
  filters/         -- Pandoc filters (docx.lua, html.lua, markdown.lua)
  postprocessors/  -- Format-specific postprocessors
  styles/          -- Style presets and templates

Model names are referenced via the template field of the project configuration or the model_name field of the context. The “default” model provides base implementations with fallback behavior.

RATIONALE:
Standardized structure enables consistent discovery of type modules across models and provides clear extension points for each content category.
STATUS:
Approved

HLR-EXT-003: Handler Registration Interface

Type handlers shall provide standardized registration interfaces for pipeline integration.

BELONGS TO:
SF-005
DESCRIPTION:

Each handler category defines specific interfaces:

Object Type Handlers export:

  • M.object: Type schema with id, long_name, description, attributes
  • M.handler.on_render_SpecObject(obj, ctx): Render function returning Pandoc blocks

Specification Type Handlers export:

  • M.specification: Type schema with id, long_name, attributes
  • M.handler.on_render_Specification(ctx, pandoc, data): Render document title

Float Type Handlers export:

  • M.float: Type schema with id, caption_format, counter_group, aliases, needs_external_render
  • M.transform(raw_content, type_ref, log): For internal transforms (TABLE, CSV)
  • external_render.register_renderer(type_ref, callbacks): For external renders (PLANTUML, CHART)

View Type Handlers export:

  • M.view: Type schema with id, inline_prefix, aliases
  • M.handler.on_render_Code(code, ctx): Inline code rendering
RATIONALE:
Consistent interfaces enable the core pipeline to discover and invoke handlers without knowledge of specific type implementations. This separation maintains extensibility.
STATUS:
Approved

HLR-EXT-004: Type Definition Schema

Type definitions shall declare metadata schema that controls registration and behavior.

BELONGS TO:
SF-005
DESCRIPTION:

Type schemas provide metadata stored in registry tables:

Object Types (spec_object_types):

M.object = {
    id = "HLR",                    -- Unique identifier (uppercase)
    long_name = "High-Level Requirement",
    description = "A top-level system requirement",
    extends = "TRACEABLE",         -- Base type for inheritance
    header_unnumbered = true,      -- Exclude from section numbering
    header_style_id = "Heading2",  -- Custom-style for headers
    body_style_id = "Normal",      -- Custom-style for body
    attributes = {                 -- Attribute definitions
        { name = "status", type = "ENUM", values = {...}, min_occurs = 1 },
        { name = "rationale", type = "XHTML" },
        { name = "created", type = "DATE" },
    }
}

Float Types (spec_float_types):

M.float = {
    id = "CHART",
    caption_format = "Figure",     -- Caption prefix
    counter_group = "FIGURE",      -- Counter sharing (FIGURE, CHART, PLANTUML)
    aliases = { "echarts" },       -- Alternative syntax identifiers
    needs_external_render = true,  -- Requires external tool
}

View Types (spec_view_types):

M.view = {
    id = "ABBREV",
    inline_prefix = "abbrev",      -- Syntax: `abbrev: content`
    aliases = { "sigla", "acronym" },
    needs_external_render = false,
}
RATIONALE:
Declarative schemas enable automatic registration into database tables during initialization, provide validation rules for content, and configure rendering behavior without procedural code.
STATUS:
Approved

HLR-EXT-005: Model Path Resolution

The system shall resolve model paths using environment configuration with fallback.

BELONGS TO:
SF-005
DESCRIPTION:

Model path resolution follows this order:

  1. Check SPECCOMPILER_HOME environment variable: $SPECCOMPILER_HOME/models/{model}
  2. Fall back to current working directory: ./models/{model}
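The lookup order can be sketched as below. The function takes the directory probe and environment reader as parameters because plain Lua has no portable directory-existence check. resolve_model_dir and its parameters are hypothetical. Only SPECCOMPILER_HOME and the two path forms come from the list above.

```lua
-- Sketch of model directory resolution (HLR-EXT-005).
-- dir_exists(path) -> boolean must be supplied; getenv defaults to os.getenv.
local function resolve_model_dir(model, dir_exists, getenv)
  getenv = getenv or os.getenv
  local candidates = {}
  local home = getenv("SPECCOMPILER_HOME")
  if home and home ~= "" then
    table.insert(candidates, home .. "/models/" .. model)
  end
  table.insert(candidates, "./models/" .. model)
  for _, dir in ipairs(candidates) do
    if dir_exists(dir) then return dir end
  end
  return nil
end
```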

For type modules not found in the specified model, the system falls back to the “default” model:

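-- Try the model-specific module first; require() raises an error when a
-- module is missing, so the attempt is wrapped in pcall
local ok, module = pcall(require, "models." .. model_name .. ".types.floats.table")
if not ok then
    -- Fall back to the default model
    module = require("models.default.types.floats.table")
end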

This enables partial model customization where models only override specific types.

RATIONALE:
Environment-based configuration supports deployment flexibility. Fallback to default model reduces duplication by allowing models to inherit base implementations.
STATUS:
Approved

HLR-EXT-006: External Renderer Registration

External renderers shall register callbacks for task preparation and result handling.

BELONGS TO:
SF-005
DESCRIPTION:

Float types requiring external tools (PlantUML, ECharts, etc.) register with the external render handler:

external_render.register_renderer("PLANTUML", {
    prepare_task = function(float, build_dir, log, data, model_name)
        -- Return task descriptor with cmd, args, output_path, context
    end,
    handle_result = function(task, success, stdout, stderr, data, log)
        -- Update resolved_ast in database
    end
})

The core orchestrates: query items -> prepare tasks -> cache filter -> batch spawn -> dispatch results. This enables parallel execution across all external renders.
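The five orchestration steps can be sketched as follows. This is a hypothetical illustration. ctx.is_cached and ctx.spawn_batch stand in for the output cache and the parallel process runner, and their names and shapes are assumptions. The local register_renderer mirrors external_render.register_renderer only in shape.

```lua
-- Hypothetical orchestration sketch for HLR-EXT-006.
local renderers = {}

local function register_renderer(type_ref, callbacks)
  renderers[type_ref] = callbacks
end

local function render_all(floats, ctx)
  local tasks, owners = {}, {}
  for _, float in ipairs(floats) do                    -- 1. query items
    local r = renderers[float.type_ref]
    if r then                                           -- 2. prepare tasks
      local task = r.prepare_task(float, ctx.build_dir, ctx.log, ctx.data, ctx.model_name)
      if task and not ctx.is_cached(task) then         -- 3. cache filter
        table.insert(tasks, task)
        owners[#tasks] = r
      end
    end
  end
  local results = ctx.spawn_batch(tasks)               -- 4. batch spawn, one result per task
  for i, task in ipairs(tasks) do                      -- 5. dispatch results
    local res = results[i]
    owners[i].handle_result(task, res.success, res.stdout, res.stderr, ctx.data, ctx.log)
  end
  return #tasks
end
```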

RATIONALE:
Registration pattern decouples type-specific rendering logic from core orchestration. Callbacks enable types to control task preparation and result interpretation while core handles parallelization and caching.
STATUS:
Approved

HLR-EXT-007: Data View Generator Loading

The system shall load data view generators from model directories for chart data injection.

BELONGS TO:
SF-005
DESCRIPTION:

Data views are Lua modules that generate data for charts:

-- models/{model}/data_views/{view_name}.lua
local M = {}

function M.generate(params, data)
    -- params: user parameters from code block attributes
    -- data: DataManager instance for SQL queries
    return { source = { {"x", "y"}, {1, 10}, {2, 20} } }
end

return M

Views are loaded via data_loader.load_view(view_name, model_name, data, params). Resolution tries the specified model first, then falls back to default.

Usage in code blocks:

```chart:gaussian{view="gaussian" sigma=2.0}
{...echarts config...}
```
RATIONALE:
Data views separate data generation from chart configuration. This enables reusable data sources and database-driven visualizations without embedding SQL in markdown.
STATUS:
Approved

HLR-EXT-008: Handler Caching

The system shall cache loaded type handlers to avoid repeated module loading.

BELONGS TO:
SF-005
DESCRIPTION:

Each handler loader maintains a cache keyed by {model}:{type_ref}:

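local type_handlers = {}

local function load_type_handler(type_ref, model_name)
    local cache_key = model_name .. ":" .. type_ref
    local cached = type_handlers[cache_key]
    if cached ~= nil then
        return cached or nil  -- false marks a cached miss
    end

    -- Load module via require() (object types shown, e.g. models.default.types.objects.hlr)
    local module_path = "models." .. model_name .. ".types.objects." .. type_ref:lower()
    local ok, module = pcall(require, module_path)
    if ok and type(module) == "table" and module.handler then
        type_handlers[cache_key] = module.handler
        return module.handler
    end

    type_handlers[cache_key] = false  -- Cache negative result
    return nil
end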

Cache stores false for failed lookups to avoid repeated require() calls for non-existent modules.

RATIONALE:
Caching improves performance for documents with many objects of the same type. Negative caching prevents repeated filesystem access for types without custom handlers.
STATUS:
Approved

9 Output Requirements

SF-004: Multi-Format Publication

Assembles transformed content and publishes DOCX/HTML5 outputs with cache-aware emission.

DESCRIPTION:
Single-source, multi-target publication. Groups requirements for document assembly, float resolution/numbering, and format-specific output generation.
RATIONALE:
Technical documentation must be publishable in multiple formats from a single Markdown source.

HLR-OUT-001: Document Assembly

The system shall assemble final documents from database content, reconstructing AST from stored fragments.

BELONGS TO:
SF-004
RATIONALE:
Decouples parsing from rendering, so final documents can be assembled from stored content without re-parsing the source

HLR-OUT-002: Float Resolution

The system shall resolve Float references, replacing raw AST with rendered content (SVG, images).

BELONGS TO:
SF-004
RATIONALE:
Integrates external rendering results into final output

HLR-OUT-003: Float Numbering

The system shall assign sequential numbers to Floats within Counter Groups across all documents.

BELONGS TO:
SF-004
RATIONALE:
Ensures consistent cross-document numbering (Figure 1, 2, 3…)
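The following is a minimal sketch of cross-document numbering. It assumes that floats arrive as one list in document order. It also assumes a hypothetical `counter_group_of` table that maps each float type to its counter group; here FIGURE, PLANTUML, and CHART share one sequence.

```lua
-- Hypothetical mapping from float type to counter group; types in the
-- same group share one numbering sequence.
local counter_group_of = {
    FIGURE = "FIGURE",
    PLANTUML = "FIGURE",
    CHART = "FIGURE",
    TABLE = "TABLE",
}

-- Assign sequential numbers per counter group across all documents.
-- `floats` is assumed to be ordered by document, then by position.
local function number_floats(floats)
    local counters = {}
    for _, f in ipairs(floats) do
        local group = counter_group_of[f.type] or f.type  -- ungrouped types count alone
        counters[group] = (counters[group] or 0) + 1
        f.number = counters[group]
    end
    return floats
end

local floats = number_floats({
    { doc = "SRS", type = "FIGURE" },
    { doc = "SRS", type = "TABLE" },
    { doc = "SDD", type = "PLANTUML" },  -- continues the figure sequence
})
-- floats[1].number == 1, floats[2].number == 1, floats[3].number == 2
```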

HLR-OUT-004: Multi-Format Output

The system shall generate outputs in multiple formats (DOCX, HTML5) from a single processed document.

BELONGS TO:
SF-004
RATIONALE:
Single-source publishing to multiple targets

HLR-OUT-005: DOCX Generation

The system shall generate DOCX output via Pandoc with custom reference documents and OOXML post-processing.

BELONGS TO:
SF-004
RATIONALE:
Produces Word documents with proper styling
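The following sketches the Pandoc invocation for DOCX output. It assumes that the assembled document is first written to a Pandoc JSON AST file. `--from`, `--to`, `--reference-doc`, and `--output` are standard Pandoc options. The file names and the `docx_command` helper are hypothetical, and the OOXML post-processing step is not shown.

```lua
-- Quote one argument for a POSIX shell: wrap in single quotes and
-- escape embedded single quotes as '\''.
local function shell_quote(s)
    return "'" .. (tostring(s):gsub("'", "'\\''")) .. "'"
end

-- Hypothetical helper: build the Pandoc command line for DOCX output.
local function docx_command(ast_json_path, reference_doc, output_path)
    local args = {
        "pandoc", ast_json_path,
        "--from=json",                        -- input is a Pandoc JSON AST
        "--to=docx",
        "--reference-doc=" .. reference_doc,  -- styles are taken from this .docx
        "--output=" .. output_path,
    }
    for i, arg in ipairs(args) do
        args[i] = shell_quote(arg)
    end
    return table.concat(args, " ")
end

local cmd = docx_command("srs.json", "templates/reference.docx", "build/srs.docx")
-- Run with: os.execute(cmd)
```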

HLR-OUT-006: HTML5 Generation

The system shall generate HTML5 output via Pandoc with web-specific templates and assets.

BELONGS TO:
SF-004
RATIONALE:
Produces web-ready documentation

Once documents are assembled and published, the system must also guarantee that its builds are reproducible and its processing is auditable. The following section addresses these integrity concerns.

10 Audit & Integrity Requirements

SF-006: Audit and Integrity

Deterministic compilation, reproducible builds, and audit trail integrity.

DESCRIPTION:
Encompasses content-addressed hashing for incremental build detection, structured NDJSON logging for audit trails, and include dependency tracking for proper cache invalidation.
RATIONALE:
Certification environments require reproducible builds and auditable processing trails for traceability evidence.

The glossary below defines the domain vocabulary used throughout this specification. Each term corresponds to a cross-reference encountered in the requirements above and is defined here with its purpose, scope, and usage context.

11 System Concepts

TERM-20: ANALYZE Phase

The second phase in the pipeline that resolves references and infers types.

DESCRIPTION:

Purpose: Resolves cross-references between spec objects and infers missing type information.

Position: Second phase after INITIALIZE, before TRANSFORM.

TERM-30: Build Cache

SHA-1 hashes for detecting document changes.

DESCRIPTION:

Purpose: Stores content hashes to detect which documents have changed since last build.

Implementation: Compares current file hash against cached hash to skip unchanged files.
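The following is a minimal sketch of the change check. It uses an in-memory table in place of the persistent cache. The hash function is injected: inside Pandoc's Lua environment, `pandoc.utils.sha1` returns the SHA-1 hex digest of a string. The `BuildCache` name and its methods are hypothetical.

```lua
-- Hypothetical build cache: remembers the last content hash per file path.
local BuildCache = {}
BuildCache.__index = BuildCache

-- hash_fn: function(string) -> string, e.g. pandoc.utils.sha1 under Pandoc.
function BuildCache.new(hash_fn)
    return setmetatable({ hashes = {}, hash_fn = hash_fn }, BuildCache)
end

-- Returns true when the file is new or its content hash changed since the
-- last call, recording the new hash; returns false for unchanged files.
function BuildCache:needs_rebuild(path, content)
    local hash = self.hash_fn(content)
    if self.hashes[path] == hash then
        return false
    end
    self.hashes[path] = hash
    return true
end

-- Usage outside Pandoc with a stand-in hash (identity, for illustration only).
local cache = BuildCache.new(function(s) return s end)
local first = cache:needs_rebuild("srs.md", "# SRS")     -- true: not seen before
local second = cache:needs_rebuild("srs.md", "# SRS")    -- false: unchanged
local third = cache:needs_rebuild("srs.md", "# SRS v2")  -- true: content changed
```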

TERM-28: Counter Group

Float types sharing a numbering sequence.

DESCRIPTION:

Purpose: Groups related float types to share sequential numbering.

Example: FIG and DIAGRAM types may share a counter, producing Figure 1, Figure 2, etc.

TERM-36: CSC (Computer Software Component)

A MIL-STD-498 architectural decomposition element representing a subsystem, layer, package, or service.

DESCRIPTION:

Purpose: Groups software units into higher-level structural components for design allocation.

Examples: src/core, src/db, src/infra.

TERM-37: CSU (Computer Software Unit)

A MIL-STD-498 implementation decomposition element representing a source file or code unit.

DESCRIPTION:

Purpose: Captures file-level implementation units allocated to functional descriptions.

Examples: src/core/pipeline.lua, src/db/manager.lua.

TERM-35: Data View

A Lua module generating data for chart injection.

DESCRIPTION:

Purpose: Produces structured data that can be injected into chart floats.

Implementation: Lua scripts that query the database and return chart-compatible data structures.

TERM-EAV: EAV Model

Entity-Attribute-Value pattern for typed attribute storage.

DESCRIPTION:

Purpose: Flexible schema for storing typed attributes on spec objects.

Structure: Entity (spec object), Attribute (key name), Value (typed content).
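The following is a minimal in-memory sketch of the pattern. It stores one row per (entity, attribute) pair and records the value's type alongside it. In SpecCompiler the rows live in database tables. The names `set_attribute` and `get_attributes`, the attribute name `belongs_to`, and the use of Lua's `type()` as the value type are assumptions for illustration.

```lua
-- EAV rows: one record per (entity, attribute) pair.
local eav_rows = {}

-- Store or overwrite one typed attribute value for an entity.
local function set_attribute(entity, attribute, value)
    for _, row in ipairs(eav_rows) do
        if row.entity == entity and row.attribute == attribute then
            row.value, row.value_type = value, type(value)
            return
        end
    end
    eav_rows[#eav_rows + 1] = {
        entity = entity, attribute = attribute,
        value = value, value_type = type(value),  -- e.g. "string", "number"
    }
end

-- Pivot an entity's rows back into a plain attribute table.
local function get_attributes(entity)
    local attrs = {}
    for _, row in ipairs(eav_rows) do
        if row.entity == entity then
            attrs[row.attribute] = row.value
        end
    end
    return attrs
end

set_attribute("HLR-EXT-008", "status", "Approved")
set_attribute("HLR-EXT-008", "belongs_to", "SF-005")
local attrs = get_attributes("HLR-EXT-008")
```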

TERM-23: EMIT Phase

The final phase in the pipeline that assembles and outputs documents.

DESCRIPTION:

Purpose: Assembles transformed content and writes final output documents.

Position: Final phase after VERIFY.

TERM-04: Float

A numbered element (table, figure, diagram) with caption and cross-reference. See Spec Float for full definition.

TERM-34: External Renderer

Subprocess-based rendering for types such as PLANTUML and CHART.

DESCRIPTION:

Purpose: Delegates rendering to external tools via subprocess execution.

Examples: PlantUML JAR for diagrams, chart libraries for data visualization.

TERM-16: Handler

A modular component that processes specific content types through pipeline phases.

DESCRIPTION:

Purpose: Encapsulates processing logic for a content type across all pipeline phases.

Structure: Implements phase methods (initialize, analyze, transform, verify, emit) for its content type.

TERM-19: INITIALIZE Phase

The first phase in the pipeline that parses AST and populates IR containers.

DESCRIPTION:

Purpose: Parses Markdown AST and populates intermediate representation containers.

Position: First phase, entry point for document processing.

TERM-33: Model

A collection of type definitions, handlers, and styles for a domain.

DESCRIPTION:

Purpose: Bundles related type definitions, handlers, and styling for specific documentation domains.

Examples: SRS model for software requirements, HRS model for hardware requirements.

TERM-31: Output Cache

Timestamps for incremental output generation.

DESCRIPTION:

Purpose: Tracks when outputs were last generated to enable incremental builds.

Implementation: Compares source modification time against cached output timestamp.

TERM-17: Phase

A distinct stage in document processing with specific responsibilities.

DESCRIPTION:

Purpose: Separates document processing into well-defined sequential stages.

Phases: INITIALIZE, ANALYZE, TRANSFORM, VERIFY, EMIT.

TERM-15: Pipeline

The 5-phase processing system (INITIALIZE -> ANALYZE -> TRANSFORM -> VERIFY -> EMIT).

DESCRIPTION:

Purpose: Orchestrates document processing through sequential phases.

Flow: Each phase completes for all handlers before the next phase begins.
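The phase-major ordering can be sketched as two nested loops. The sketch assumes that each handler is a table with optional phase methods and that the handler list is already sorted by prerequisites. `run_pipeline` and the handler names are hypothetical.

```lua
local PHASES = { "initialize", "analyze", "transform", "verify", "emit" }

-- Run every handler through a phase before starting the next phase.
-- `handlers` is assumed to be in prerequisite (topological) order.
local function run_pipeline(handlers, ctx)
    for _, phase in ipairs(PHASES) do
        for _, handler in ipairs(handlers) do
            local method = handler[phase]
            if method then
                method(handler, ctx)  -- handlers may omit phases they do not need
            end
        end
    end
end

-- Usage: record the call order of two hypothetical handlers.
local log = {}
local function make_handler(name)
    local h = { name = name }
    for _, phase in ipairs(PHASES) do
        h[phase] = function(self, ctx)
            ctx[#ctx + 1] = self.name .. ":" .. phase
        end
    end
    return h
end
run_pipeline({ make_handler("objects"), make_handler("floats") }, log)
-- log[1] == "objects:initialize", log[2] == "floats:initialize",
-- log[3] == "objects:analyze", ..., log[10] == "floats:emit"
```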

TERM-24: Prerequisites

Handler dependencies that determine execution order.

DESCRIPTION:

Purpose: Declares which handlers must complete before a given handler can execute.

Usage: Handlers declare prerequisites to ensure data dependencies are satisfied.

TERM-25: Topological Sort

Kahn’s algorithm for ordering handlers by prerequisites.

DESCRIPTION:

Purpose: Determines valid execution order for handlers based on dependencies.

Algorithm: Uses Kahn’s algorithm to produce a topologically sorted handler sequence.
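The following sketches Kahn's algorithm over handler prerequisites. It assumes that each handler is a table with a `name` and an optional `prerequisites` list of handler names. Seeding the queue in input order keeps the result deterministic. The function name and error messages are hypothetical.

```lua
-- Order handlers so that every prerequisite runs before its dependents.
-- Returns the ordered list of names, or nil plus an error message.
local function topo_sort(handlers)
    local indegree, dependents, known = {}, {}, {}
    for _, h in ipairs(handlers) do
        known[h.name] = true
        indegree[h.name] = 0
        dependents[h.name] = {}
    end
    for _, h in ipairs(handlers) do
        for _, pre in ipairs(h.prerequisites or {}) do
            if not known[pre] then
                return nil, "unknown prerequisite '" .. pre .. "' for " .. h.name
            end
            indegree[h.name] = indegree[h.name] + 1
            table.insert(dependents[pre], h.name)
        end
    end
    -- Start with handlers that have no prerequisites, in input order.
    local queue, head = {}, 1
    for _, h in ipairs(handlers) do
        if indegree[h.name] == 0 then queue[#queue + 1] = h.name end
    end
    local order = {}
    while head <= #queue do
        local name = queue[head]
        head = head + 1
        order[#order + 1] = name
        -- Removing this node may free its dependents.
        for _, dep in ipairs(dependents[name]) do
            indegree[dep] = indegree[dep] - 1
            if indegree[dep] == 0 then queue[#queue + 1] = dep end
        end
    end
    if #order < #handlers then
        return nil, "cycle detected among handler prerequisites"
    end
    return order
end

local order = topo_sort({
    { name = "emit_docx", prerequisites = { "floats" } },
    { name = "floats", prerequisites = { "objects" } },
    { name = "objects" },
})
-- order: { "objects", "floats", "emit_docx" }
```

A cycle leaves some handlers with a nonzero in-degree, so they never enter the queue; the length check turns that into an error instead of a silently incomplete order.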

TERM-22: TRANSFORM Phase

The third phase in the pipeline that materializes views and rewrites content.

DESCRIPTION:

Purpose: Materializes database views into content and applies content transformations.

Position: Third phase after ANALYZE, before VERIFY.

TERM-27: Type Alias

Alternative syntax identifier for a type (e.g., “csv” -> “TABLE”).

DESCRIPTION:

Purpose: Provides shorthand or alternative names for types.

Example: csv is an alias for the TABLE type in float definitions.
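The following sketches how a fence class following the `syntax:label` pattern might be resolved through an alias table. It assumes a hypothetical `float_type_aliases` table, seeded with the `csv` example above. It also assumes that a syntax name without an alias maps to its upper-cased type name.

```lua
-- Hypothetical alias table: syntax identifier -> registered float type.
local float_type_aliases = { csv = "TABLE" }

-- Split a "syntax:label" fence class and resolve the syntax to a float type.
-- Returns nil when the class does not follow the syntax:label pattern.
local function resolve_float(class)
    local syntax, label = class:match("^([%w_%-]+):(.+)$")
    if not syntax then
        return nil
    end
    local float_type = float_type_aliases[syntax] or syntax:upper()  -- assumed fallback
    return { type = float_type, label = label }
end

local t = resolve_float("csv:quarterly-results")  -- type "TABLE", label "quarterly-results"
local c = resolve_float("chart:gaussian")          -- type "CHART", label "gaussian"
```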

TERM-38: Type Loader

System that discovers and loads type handlers from model directories.

DESCRIPTION:

Purpose: Dynamically discovers and instantiates type handlers from model definitions.

Implementation: Scans model directories for handler definitions and registers them.

TERM-26: Type Registry

Database tables (spec_*_types) storing type definitions.

DESCRIPTION:

Purpose: Stores type definitions including attributes, aliases, and validation rules.

Tables: spec_object_types, spec_float_types, spec_attribute_types, etc.

TERM-21: VERIFY Phase

The fourth phase in the pipeline that validates content via proof views.

DESCRIPTION:

Purpose: Validates document content using proof views and constraint checking.

Position: Fourth phase after TRANSFORM, before EMIT.

TERM-AST: Abstract Syntax Tree

The tree representation of document structure produced by Pandoc.

ACRONYM:
AST
DESCRIPTION:

Purpose: Represents document structure as a hierarchical tree of elements.

Source: Pandoc parses Markdown and produces JSON AST.

Usage: Handlers walk the AST to extract spec objects, floats, and relations.

DOMAIN:
Core
TERM:
Abstract Syntax Tree
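The following sketches a handler-style walk over Pandoc's JSON AST. It assumes that the JSON has already been decoded into Lua tables; Pandoc encodes a header as `{"t": "Header", "c": [level, attr, inlines]}`. The walk collects top-level H2-H6 headers as spec object candidates. `collect_headers` is hypothetical, and for brevity only `Str` and `Space` inlines are stringified.

```lua
-- Flatten inline elements to plain text (only Str and Space handled here).
local function stringify(inlines)
    local parts = {}
    for _, inline in ipairs(inlines) do
        if inline.t == "Str" then
            parts[#parts + 1] = inline.c
        elseif inline.t == "Space" then
            parts[#parts + 1] = " "
        end
    end
    return table.concat(parts)
end

-- Collect top-level H2-H6 headers from a decoded Pandoc JSON document.
local function collect_headers(doc)
    local found = {}
    for _, block in ipairs(doc.blocks) do
        if block.t == "Header" then
            local level, inlines = block.c[1], block.c[3]
            if level >= 2 and level <= 6 then
                found[#found + 1] = { level = level, text = stringify(inlines) }
            end
        end
    end
    return found
end

-- A hand-written stand-in for decoded JSON: an H1 specification and an H2 object.
local doc = {
    blocks = {
        { t = "Header", c = { 1, { "srs", {}, {} }, { { t = "Str", c = "SRS" } } } },
        { t = "Header", c = { 2, { "", {}, {} }, {
            { t = "Str", c = "HLR:" }, { t = "Space" }, { t = "Str", c = "Title" },
        } } },
        { t = "Para", c = { { t = "Str", c = "Body" } } },
    },
}
local headers = collect_headers(doc)
-- headers[1].level == 2, headers[1].text == "HLR: Title"
```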

TERM-FTS: Full-Text Search

FTS5 virtual tables enabling search across specification content.

ACRONYM:
FTS
DESCRIPTION:

Purpose: Indexes specification text for fast full-text search queries.

Implementation: SQLite FTS5 virtual tables populated during the EMIT phase.

Usage: The web application uses FTS for search functionality.

DOMAIN:
Database
TERM:
Full-Text Search

TERM-HLR: High-Level Requirement

A top-level functional or non-functional requirement that captures what the system must do or satisfy.

ACRONYM:
HLR
DESCRIPTION:

Purpose: Defines system-level requirements that guide design and implementation.

Traceability: HLRs trace to verification cases (VC) and are realized by functional descriptions (FD).

DOMAIN:
Core
TERM:
High-Level Requirement

TERM-IR: Intermediate Representation

The database-backed representation of parsed document content.

ACRONYM:
IR
DESCRIPTION:

Purpose: Stores parsed specification content in queryable form.

Storage: SQLite database with spec_objects, spec_floats, spec_relations tables.

Lifecycle: Populated during INITIALIZE, queried and modified through remaining phases.

DOMAIN:
Core
TERM:
Intermediate Representation

TERM-PID: Project Identifier

A unique identifier assigned to spec objects for cross-referencing (e.g., @REQ-001).

ACRONYM:
PID
DESCRIPTION:

Purpose: Provides unique, human-readable identifiers for traceability and cross-referencing.

Syntax: Written as @PID in header text (e.g., ## HLR: Requirement Title @REQ-001).

Auto-generation: PIDs can be auto-generated from type prefix and sequence number.

DOMAIN:
Core
TERM:
Project Identifier
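The following sketches PID extraction and auto-generation. It assumes the `@PID` suffix syntax shown above and a zero-padded three-digit sequence; the padding width is an assumption based on PIDs such as REQ-001. `extract_pid` and `auto_pid` are hypothetical, and a real generator would also skip PIDs that are already in use.

```lua
-- Split "Title @PID" header text into title and PID (PID may be absent).
local function extract_pid(header_text)
    local title, pid = header_text:match("^(.-)%s+@([%w_%-]+)%s*$")
    if pid then
        return title, pid
    end
    return header_text, nil
end

-- Generate the next PID for a type prefix, e.g. HLR-001, HLR-002, ...
local next_seq = {}
local function auto_pid(prefix)
    next_seq[prefix] = (next_seq[prefix] or 0) + 1
    return string.format("%s-%03d", prefix, next_seq[prefix])
end

local title, pid = extract_pid("HLR: Requirement Title @REQ-001")
-- title == "HLR: Requirement Title", pid == "REQ-001"
local first = auto_pid("HLR")   -- "HLR-001"
local second = auto_pid("HLR")  -- "HLR-002"
```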

TERM-PROOF: Proof View

A SQL query that validates data integrity constraints during the VERIFY phase.

ACRONYM:
-
DESCRIPTION:

Purpose: Defines validation rules as SQL queries that detect specification errors.

Execution: Run during the VERIFY phase; violations are reported as diagnostics.

Examples: Missing required attributes, unresolved relations, cardinality violations.

DOMAIN:
Core
TERM:
Proof View

TERM-SQLITE: SQLite Database

The embedded database engine storing the IR and build cache.

ACRONYM:
-
DESCRIPTION:

Purpose: Provides persistent, portable storage for the intermediate representation.

Benefits: Single-file storage, ACID transactions, SQL query capability.

Usage: All pipeline phases read/write to SQLite via the database manager.

DOMAIN:
Database
TERM:
SQLite Database

TERM-TRACEABLE: Traceable Object

A specification object that participates in traceability relationships.

ACRONYM:
-
DESCRIPTION:

Purpose: Base type for objects that can be linked via traceability relations.

Types: Any spec object type registered in the model (e.g., HLR, LLR, SECTION).

Relations: Model-defined relation types (e.g., XREF_FIGURE, XREF_CITATION) inferred by specificity matching.

DOMAIN:
Core
TERM:
Traceable Object

TERM-TYPE: Type

A category definition that governs behavior for objects, floats, relations, or views.

ACRONYM:
-
DESCRIPTION:

Purpose: Defines the schema, validation rules, and rendering behavior for a category of elements.

Categories: Object types (HLR, SECTION), float types (FIGURE, TABLE), relation types (TRACES_TO), view types (TOC, LOF).

Registration: Types are loaded from model directories and stored in the type registry.

DOMAIN:
Core
TERM:
Type

TERM-VC: Verification Case

A test specification that verifies a requirement or set of requirements.

ACRONYM:
VC
DESCRIPTION:

Purpose: Defines how requirements are verified through test procedures and expected results.

Traceability: VCs trace to HLRs via traceability attribute links.

Naming: VC PIDs follow the pattern VC-{category}-{seq} (e.g., VC-PIPE-001).

DOMAIN:
Core
TERM:
Verification Case
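The following sketches a naming check for VC PIDs. It assumes that `{category}` is uppercase letters and `{seq}` is a run of digits, as in VC-PIPE-001. `parse_vc_pid` is hypothetical.

```lua
-- Parse a VC PID of the form VC-{category}-{seq}; returns nil if malformed.
local function parse_vc_pid(pid)
    local category, seq = pid:match("^VC%-(%u+)%-(%d+)$")
    if not category then
        return nil
    end
    return { category = category, seq = tonumber(seq) }
end

local vc = parse_vc_pid("VC-PIPE-001")   -- category "PIPE", seq 1
local bad = parse_vc_pid("HLR-OUT-001")  -- nil: not a VC PID
```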