1 Scope

DESCRIPTION:

Each phase serves a distinct purpose in document processing:

INITIALIZE: Parse document AST and populate database with specifications, spec_objects, floats, relations, views, and attributes
ANALYZE: Resolve relations between objects (link target resolution, type inference)
TRANSFORM: Pre-compute views, render external content (PlantUML, charts), prepare for output
VERIFY: Run proof views to validate data integrity, type constraints, cardinality rules
EMIT: Assemble final documents and write to output formats (docx, html5, markdown, json)

RATIONALE:

Separation of concerns enables validation between phases, allows early abort on errors, and supports format-agnostic processing until the final output stage.

STATUS:

Approved

HLR-PIPE-002: Handler Registration and Prerequisites

The pipeline shall support handler registration with declarative prerequisites for dependency ordering.

BELONGS TO:

DESCRIPTION:

Handlers register via register_handler(handler) with required fields:

name: Unique string identifier for the handler
prerequisites: Array of handler names that must execute before this handler

Handlers declare participation in phases via hook methods (on_initialize, on_analyze, on_transform, on_verify, on_emit). Duplicate handler names cause a registration error.
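
The registration contract above can be sketched as follows. This is a minimal illustration, assuming handlers are plain Lua tables kept in a registry keyed by name; `Pipeline.new` and the error messages are hypothetical, and only `register_handler`, `name`, `prerequisites`, and the hook names come from the requirement text.

```lua
-- Hypothetical pipeline object used only for this sketch.
local Pipeline = {}
Pipeline.__index = Pipeline

function Pipeline.new()
    return setmetatable({ handlers = {} }, Pipeline)
end

-- Validate the required fields and reject duplicate names.
function Pipeline:register_handler(handler)
    if type(handler.name) ~= "string" or handler.name == "" then
        error("handler must have a non-empty string 'name'")
    end
    if type(handler.prerequisites) ~= "table" then
        error("handler '" .. handler.name .. "' must declare 'prerequisites'")
    end
    if self.handlers[handler.name] then
        error("duplicate handler name: " .. handler.name)
    end
    self.handlers[handler.name] = handler
end

-- Example: a handler that takes part in the ANALYZE phase only.
local pipeline = Pipeline.new()
pipeline:register_handler({
    name = "relations",
    prerequisites = { "spec_objects" },
    on_analyze = function(data, contexts, diagnostics) end,
})
```

Registering a second handler named "relations" on the same pipeline would raise an error in this sketch, which mirrors the duplicate-name rule.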

RATIONALE:

Declarative prerequisites decouple handler ordering from registration order, enabling modular handler development and preventing implicit ordering dependencies.

STATUS:

Approved

HLR-PIPE-003: Topological Ordering via Kahn’s Algorithm

The pipeline shall order handlers within each phase using topological sort with Kahn’s algorithm.

BELONGS TO:

DESCRIPTION:

For each phase, the pipeline:

Identifies handlers participating in the phase (those with on_{phase} hooks)
Builds dependency graph from prerequisites (only for participating handlers)
Executes Kahn’s algorithm to produce execution order
Sorts alphabetically at each level for deterministic output
Detects and reports circular dependencies with error listing remaining nodes
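
The steps above can be sketched as a single function. This is an illustrative version of Kahn's algorithm, not the pipeline's actual code: the name `order_for_phase`, the shape of the `handlers` map, and the wording of the error message are assumptions.

```lua
-- handlers: map of name -> { prerequisites = {...}, on_<phase> = fn }
-- Returns an ordered list of names, or nil plus a message on a cycle.
local function order_for_phase(handlers, phase)
    local hook = "on_" .. phase
    local indegree, dependents = {}, {}

    -- 1. Keep only handlers that participate in this phase.
    for name, h in pairs(handlers) do
        if h[hook] then
            indegree[name], dependents[name] = 0, {}
        end
    end

    -- 2. Add edges only between participating handlers.
    for name in pairs(indegree) do
        for _, pre in ipairs(handlers[name].prerequisites or {}) do
            if indegree[pre] then
                indegree[name] = indegree[name] + 1
                table.insert(dependents[pre], name)
            end
        end
    end

    -- 3. Kahn's algorithm, one level at a time.
    local ready, remaining = {}, 0
    for name, deg in pairs(indegree) do
        remaining = remaining + 1
        if deg == 0 then ready[#ready + 1] = name end
    end

    local order = {}
    while #ready > 0 do
        table.sort(ready)  -- alphabetical tie-break within the level
        local next_ready = {}
        for _, name in ipairs(ready) do
            order[#order + 1] = name
            remaining = remaining - 1
            for _, dep in ipairs(dependents[name]) do
                indegree[dep] = indegree[dep] - 1
                if indegree[dep] == 0 then
                    next_ready[#next_ready + 1] = dep
                end
            end
        end
        ready = next_ready
    end

    -- 4. Nodes that never reached in-degree zero are in, or behind, a cycle.
    if remaining > 0 then
        local stuck = {}
        for name, deg in pairs(indegree) do
            if deg > 0 then stuck[#stuck + 1] = name end
        end
        table.sort(stuck)
        return nil, "circular dependency among: " .. table.concat(stuck, ", ")
    end
    return order
end
```

Prerequisites that name a handler without a hook for the current phase are ignored here, which matches the rule that the graph is built only from participating handlers.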

RATIONALE:

Kahn’s algorithm provides O(V+E) complexity, clear cycle detection, and deterministic ordering through alphabetic tie-breaking.

STATUS:

Approved

HLR-PIPE-004: Phase Abort on VERIFY Errors

The pipeline shall abort execution after the VERIFY phase if any errors are recorded.

BELONGS TO:: SF-001
DESCRIPTION:: After running the VERIFY phase, the pipeline checks diagnostics:has_errors(). If it returns true, execution halts before the EMIT phase, with TRANSFORM already completed, and an error message containing the error count is logged. This prevents generating invalid output from documents with specification violations.
RATIONALE:: Early abort on verification failures saves computation and prevents distribution of invalid specification documents. Errors in VERIFY indicate data integrity issues that would produce incorrect outputs.
STATUS:: Approved
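
The abort rule of HLR-PIPE-004 can be sketched as a phase loop. This is a hedged illustration: `run_pipeline`, the injected `run_phase` and `log` callbacks, and the `error_count()` accessor are hypothetical; only `diagnostics:has_errors()` and the phase order are taken from this document.

```lua
-- Phase order as listed in this document.
local PHASES = { "initialize", "analyze", "transform", "verify", "emit" }

-- run_phase and log are passed in so the sketch stays independent of
-- the real orchestrator. Returns false when EMIT was skipped.
local function run_pipeline(run_phase, diagnostics, log)
    for _, phase in ipairs(PHASES) do
        run_phase(phase)
        if phase == "verify" and diagnostics:has_errors() then
            -- error_count() is a hypothetical accessor for the log line.
            log("VERIFY failed with " .. diagnostics:error_count() ..
                " error(s); skipping EMIT")
            return false
        end
    end
    return true
end
```

Because the check sits directly after VERIFY, TRANSFORM has already run by the time the pipeline stops, as the description states.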

HLR-PIPE-005: Batch Dispatch for All Phases

The pipeline shall use a single batch dispatch model for all phases where handlers receive all contexts at once.

BELONGS TO:

DESCRIPTION:

All handlers implement on_{phase}(data, contexts, diagnostics) hooks that receive the full contexts array. The pipeline orchestrator calls each handler’s hook once per phase via run_phase(), passing all document contexts. Handlers are responsible for iterating over contexts internally.

This enables cross-document optimizations, transaction batching, and parallel processing within any phase.
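
A minimal sketch of the batch dispatch model follows. The signature of `run_phase` and the example handler are assumptions made for illustration; the hook signature `on_{phase}(data, contexts, diagnostics)` is the one given above.

```lua
-- ordered_handlers: handlers already sorted for this phase.
-- Each hook is called exactly once and receives every document context.
local function run_phase(phase, ordered_handlers, data, contexts, diagnostics)
    local hook = "on_" .. phase
    for _, handler in ipairs(ordered_handlers) do
        if handler[hook] then
            handler[hook](data, contexts, diagnostics)
        end
    end
end

-- Example handler: it iterates over the contexts itself.
local calls, seen = 0, {}
local title_handler = {
    name = "titles",
    prerequisites = {},
    on_initialize = function(data, contexts, diagnostics)
        calls = calls + 1
        for _, ctx in ipairs(contexts) do
            seen[#seen + 1] = ctx.spec_id
        end
    end,
}

run_phase("initialize", { title_handler }, {},
    { { spec_id = "srs" }, { spec_id = "sdd" } }, {})
```

With two contexts the hook still runs once, so a handler is free to open a single transaction around its own loop.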

RATIONALE:

A uniform dispatch model simplifies the pipeline engine, eliminates the dual-path batch/per-doc dispatch, and allows handlers in any phase to optimize across all documents (e.g., wrapping DB operations in a single transaction, parallel output generation in EMIT).

STATUS:

Approved

HLR-PIPE-006: Context Creation and Propagation

The pipeline shall create and propagate context objects containing document metadata and configuration through all phases.

BELONGS TO:

DESCRIPTION:

The execute(docs) method creates context objects for each input document with:

doc: Pandoc document AST (via DocumentWalker)
spec_id: Specification identifier derived from filename
config: Preset configuration (styles, captions, validation)
build_dir: Output directory path
output_format: Target format (docx, html5, etc.)
template: Template name for model loading
reference_doc: Path to reference.docx for styling
docx, html5: Format-specific configuration
outputs: Array of {format, path} for multi-format output
bibliography, csl: Citation configuration
project_root: Root directory for resolving relative paths

Context flows through all phases, enriched by handlers (e.g., verification results in VERIFY phase).
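
As an illustration, a context could be assembled like this. The helper names `spec_id_from_path` and `make_context`, the `project` table, and the exact rule for deriving `spec_id` (strip directory and extension) are assumptions; the field names are those listed above.

```lua
-- Derive a spec_id from a file path by stripping directory and extension.
-- The exact derivation rule is an assumption.
local function spec_id_from_path(path)
    local base = path:match("([^/\\]+)$") or path
    return (base:gsub("%.[^.]*$", ""))
end

-- Build one context per input document from shared project settings.
local function make_context(path, doc, project)
    return {
        doc = doc,
        spec_id = spec_id_from_path(path),
        config = project.config,
        build_dir = project.build_dir,
        output_format = project.output_format,
        template = project.template,
        outputs = project.outputs,
        project_root = project.project_root,
    }
end

local ctx = make_context("docs/srs-main.md", {}, {
    build_dir = "build",
    output_format = "docx",
    outputs = { { format = "docx", path = "build/srs-main.docx" } },
})
```

Handlers receive tables like `ctx` and may add fields to them as the phases progress.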

RATIONALE:

Unified context object provides handlers with consistent access to document metadata and build configuration without global state, enabling testable and isolated handler implementations.

STATUS:

Approved

6 Storage Requirements

SF-002: Specification Persistence

SQLite-based storage with incremental build support and output caching.

DESCRIPTION:: Groups requirements for the persistence layer, including ACID-compliant storage, the EAV attribute model, the build cache, the output cache, and incremental rebuild support.
RATIONALE:: Reliable persistence with change detection enables efficient rebuilds for large specification projects.

HLR-STOR-001: SQLite Persistence

The system shall persist all specification data to a SQLite database with ACID guarantees.

BELONGS TO:: SF-002
DESCRIPTION:: All specifications, spec_objects, floats, relations, views, and attribute values are stored in a SQLite database. Database operations are wrapped in transactions to ensure atomicity, consistency, isolation, and durability.
RATIONALE:: SQLite provides a reliable, single-file persistence layer suitable for specification documents. ACID guarantees prevent data corruption during concurrent access or system failures.
STATUS:: Approved

HLR-STOR-002: EAV Attribute Model

The system shall store spec object attributes using the Entity-Attribute-Value (EAV) pattern.

BELONGS TO:: SF-002
DESCRIPTION:: Attribute values are stored in the attribute_values table with polymorphic typed columns (string_value, int_value, real_value, bool_value, date_value, enum_ref). Each attribute record links to its owner object via owner_ref and stores its datatype for proper retrieval.
RATIONALE:: EAV pattern enables flexible attribute schemas without database migrations. Different spec object types (HLR, LLR, VC) have different attributes that can evolve independently.
STATUS:: Approved

HLR-STOR-003: Build Cache

The system shall maintain a build cache for document hash tracking.

BELONGS TO:: SF-002
DESCRIPTION:: Source file hashes are stored in the source_files table (path, sha1). The build cache module provides is_document_dirty() to check whether a document's content has changed since the last build by comparing its current hash with the stored one.
RATIONALE:: Hash-based change detection allows the build system to skip unchanged documents, reducing rebuild times for large specification sets.
STATUS:: Approved

HLR-STOR-004: Output Cache

The system shall cache output generation state with P-IR hash and timestamps.

BELONGS TO:: SF-002
DESCRIPTION:: The output cache is stored in the output_cache table (spec_id, output_path, pir_hash, generated_at). The P-IR (Processed Intermediate Representation) hash captures the complete specification state. is_output_current() checks that the output file exists and that the P-IR hash matches the cached value.
RATIONALE:: Output caching avoids regenerating unchanged outputs (docx, html5). P-IR hash ensures output is regenerated when any upstream data changes, not just source file changes.
STATUS:: Approved

HLR-STOR-006: EAV Pivot Views for External Queries

The system shall generate per-object-type SQL views that pivot the EAV attribute model into typed columns for external BI queries.

BELONGS TO:: SF-002
DESCRIPTION:: For each non-composite spec_object_type, a view named view_{type_lower}_objects is dynamically generated (e.g., view_hlr_objects, view_vc_objects). Each view flattens the EAV join into one row per object with typed attribute columns, enabling queries like SELECT * FROM view_hlr_objects WHERE status = 'approved'. These views are NOT used by internal pipeline queries — all internal code queries the raw EAV tables directly because it needs access to raw_value, datatype, ast, enum_ref, and other columns that the pivot views abstract away. Internal queries also frequently operate cross-type or need COUNT/EXISTS checks that the MAX()-based pivot cannot provide.
RATIONALE:: External BI tools, ad-hoc SQL queries, and custom model scripts benefit from a flat relational interface over the EAV model. Generating views at runtime from the type system ensures the columns always match the current model configuration without manual maintenance.
STATUS:: Approved
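
The MAX()-based pivot described in HLR-STOR-006 can be illustrated by generating the view's SQL text. This is a sketch under stated assumptions: `pivot_view_sql` is a hypothetical helper, the join uses spec_attributes (the table name from the Types section) on owner_ref, and the fixed columns and the choice of typed column per attribute are illustrative. Names are interpolated without escaping, which is only acceptable because they would come from the model configuration.

```lua
-- type_id: e.g. "HLR"
-- attrs: list of { name = ..., column = ... }, where column is the typed
-- EAV column to expose for that attribute.
local function pivot_view_sql(type_id, attrs)
    local cols = { "o.identifier", "o.pid", "o.title_text" }
    for _, attr in ipairs(attrs) do
        cols[#cols + 1] = string.format(
            "MAX(CASE WHEN a.name = '%s' THEN a.%s END) AS \"%s\"",
            attr.name, attr.column, attr.name)
    end
    return string.format(
        "CREATE VIEW IF NOT EXISTS view_%s_objects AS\n" ..
        "SELECT %s\n" ..
        "FROM spec_objects o\n" ..
        "LEFT JOIN spec_attributes a ON a.owner_ref = o.identifier\n" ..
        "WHERE o.type_ref = '%s'\n" ..
        "GROUP BY o.identifier",
        type_id:lower(), table.concat(cols, ",\n       "), type_id)
end

local sql = pivot_view_sql("HLR", {
    { name = "status", column = "string_value" },
    { name = "created", column = "date_value" },
})
```

Grouping by object identifier and taking MAX() over a CASE expression collapses the attribute rows into one row per object, which is why such a view cannot answer COUNT or EXISTS questions about individual attribute records.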

HLR-STOR-005: Incremental Rebuild Support

The system shall support incremental rebuilds via build graph tracking.

BELONGS TO:: SF-002
DESCRIPTION:: The build graph is stored in the build_graph table (root_path, node_path, node_sha1) and tracks include-file dependencies for each root document. is_document_dirty_with_includes() checks the root document and all of its includes. update_build_graph() refreshes the dependency tree after a successful build.
RATIONALE:: Specification documents often include sub-files. Incremental builds must detect changes in any included file to trigger rebuild of parent document. Build graph captures this dependency structure.
STATUS:: Approved
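
The dirty check of HLR-STOR-005 can be sketched over in-memory tables. The parameter layout is an assumption: `source_hashes` stands in for rows of source_files, `build_graph` for rows of build_graph grouped by root, and `hash_file` is injected because Lua has no built-in SHA-1.

```lua
-- source_hashes: path -> sha1, as read from source_files.
-- build_graph: root_path -> list of { node_path = ..., node_sha1 = ... }.
-- hash_file: function(path) -> digest string (supplied by the caller).
local function is_document_dirty_with_includes(root_path, source_hashes,
                                               build_graph, hash_file)
    -- A root that was never built, or whose hash changed, is dirty.
    local stored = source_hashes[root_path]
    if stored == nil or hash_file(root_path) ~= stored then
        return true
    end
    -- Any changed include also makes the root dirty.
    for _, node in ipairs(build_graph[root_path] or {}) do
        if hash_file(node.node_path) ~= node.node_sha1 then
            return true
        end
    end
    return false
end
```

A change in any included file therefore triggers a rebuild of the parent document even when the root file itself is untouched.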

7 Types Domain Requirements

SF-003: Type System

Dynamic type system providing typed containers for Specification, Spec Object, Spec Float, Spec View, Spec Relation, and Attribute.

DESCRIPTION:: Groups requirements for the six core containers that store parsed specification data plus the proof-view validation framework.
RATIONALE:: A typed container model enables schema validation, type-specific rendering, and data integrity checking through SQL proof views.

HLR-TYPE-001: Specifications Container

The type system shall provide a specifications container for registering document-level specification records.

BELONGS TO:

DESCRIPTION:

The specifications table stores metadata for each specification document parsed during the INITIALIZE phase:

identifier: Unique specification ID derived from filename (e.g., “srs-main”)
root_path: Source file path for the specification
long_name: Human-readable title extracted from L1 header
type_ref: Specification type (validated against spec_specification_types)
pid: Optional PID from @PID syntax in L1 header

L1 headers register as specifications. Type validation checks the spec_specification_types table. Invalid types fall back to the default or emit a warning.

RATIONALE:

Specifications represent the top-level organizational unit for document hierarchies. Storing specification metadata enables cross-document linking and multi-document project support.

STATUS:

Approved

HLR-TYPE-002: Spec Objects Container

The type system shall provide a spec_objects container for hierarchical specification objects.

BELONGS TO:

DESCRIPTION:

The spec_objects table stores structured specification items extracted from L2+ headers:

identifier: SHA1 hash of source path + line + title (content-addressable)
specification_ref: Foreign key to parent specification
type_ref: Object type (validated against spec_object_types)
from_file: Source file path
file_seq: Document order sequence number
pid: Project ID from @PID syntax (e.g., “REQ-001”)
title_text: Header text without type prefix or PID
label: Unified label for cross-referencing (format: {type_lower}:{title_slug})
level: Header level (2-6)
start_line, end_line: Source line range
ast: Serialized Pandoc AST (JSON) for section content

Type resolution order: explicit TYPE: prefix, implicit alias lookup, default type fallback.
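
The label format `{type_lower}:{title_slug}` can be illustrated with a small helper. The slug rule used here (lowercase, runs of non-alphanumeric characters collapsed to one hyphen, edges trimmed) is an assumption, as are the names `slugify` and `make_label`.

```lua
-- Assumed slug rule: lowercase, non-alphanumeric runs become "-",
-- leading and trailing hyphens are removed.
local function slugify(title)
    local slug = title:lower():gsub("[^%w]+", "-")
    return (slug:gsub("^%-+", ""):gsub("%-+$", ""))
end

-- Build the unified cross-reference label for a spec object.
local function make_label(type_ref, title_text)
    return type_ref:lower() .. ":" .. slugify(title_text)
end

local label = make_label("HLR", "SQLite Persistence")
```

Under this assumed rule the example yields a label starting with the lowercased type, followed by a colon and the hyphenated title.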

RATIONALE:

Content-addressable identifiers enable change detection for incremental builds. PID-based anchors provide stable cross-references independent of title changes.

STATUS:

Approved

HLR-TYPE-003: Spec Floats Container

The type system shall provide a spec_floats container for numbered floating content (figures, tables, listings).

BELONGS TO:

DESCRIPTION:

The spec_floats table stores content blocks that receive sequential numbering:

identifier: Short format “float-{8-char-sha1}” for DOCX compatibility
specification_ref: Foreign key to parent specification
type_ref: Float type resolved from aliases (e.g., “csv” -> “TABLE”, “puml” -> “FIGURE”)
from_file: Source file path
file_seq: Document order for numbering
label: User-provided label for cross-referencing
number: Sequential number within counter_group (assigned in the TRANSFORM phase)
caption: Caption text from attributes
raw_content: Original code block text
raw_ast: Serialized Pandoc CodeBlock (JSON)
parent_object_ref: Foreign key to containing spec_object
attributes: JSON-serialized attributes (caption, source, language)
syntax_key: Original class syntax for backend matching

Counter groups share numbering (e.g., FIGURE, CHART, PLANTUML all increment the “FIGURE” counter).
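
Numbering by counter group can be sketched as follows. The function name `assign_numbers` and the `counter_group_of` map are hypothetical, and the sketch assumes all floats belong to one specification; `file_seq`, `type_ref`, and `number` are the columns described above.

```lua
-- floats: list of { label = ..., type_ref = ..., file_seq = ... }
-- counter_group_of: map of float type -> counter group name
local function assign_numbers(floats, counter_group_of)
    -- Number in document order without reordering the caller's list.
    local ordered = {}
    for i, f in ipairs(floats) do ordered[i] = f end
    table.sort(ordered, function(a, b) return a.file_seq < b.file_seq end)

    local counters = {}
    for _, f in ipairs(ordered) do
        -- Types without a declared group count on their own.
        local group = counter_group_of[f.type_ref] or f.type_ref
        counters[group] = (counters[group] or 0) + 1
        f.number = counters[group]
    end
    return floats
end

local floats = assign_numbers({
    { label = "arch", type_ref = "PLANTUML", file_seq = 4 },
    { label = "photo", type_ref = "FIGURE", file_seq = 1 },
    { label = "data", type_ref = "TABLE", file_seq = 2 },
    { label = "trend", type_ref = "CHART", file_seq = 3 },
}, { FIGURE = "FIGURE", CHART = "FIGURE", PLANTUML = "FIGURE", TABLE = "TABLE" })
```

The three figure-like floats share one sequence while the table starts its own.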

RATIONALE:

Type aliasing supports user-friendly syntax (e.g., csv:data instead of TABLE:data). Counter groups enable semantic grouping of related float types under a single numbering sequence.

STATUS:

Approved

HLR-TYPE-004: Spec Views Container

The type system shall provide a spec_views container for data-driven view definitions.

BELONGS TO:

DESCRIPTION:

The spec_views table stores view definitions from code blocks and inline syntax:

identifier: SHA1 hash of specification + sequence + content
specification_ref: Foreign key to parent specification
view_type_ref: Uppercase view type (e.g., “SELECT”, “SYMBOL”, “MATH”, “ABBREV”)
from_file: Source file path
file_seq: Document order sequence number
raw_ast: View definition content (SQL query, symbol path, expression)

View types with needs_external_render = 1 in spec_view_types are delegated to specialized renderers. Inline views use type: content syntax (e.g., symbol: Class.method).

RATIONALE:

Separating view definitions from rendering enables format-agnostic processing. External render delegation supports complex transformations (PlantUML, charts) without core handler changes.

STATUS:

Approved

HLR-TYPE-005: Spec Relations Container

The type system shall provide a spec_relations container for tracking links between specification elements.

BELONGS TO:

DESCRIPTION:

The spec_relations table stores inter-element references:

identifier: SHA1 hash of specification + target + type + parent
specification_ref: Foreign key to parent specification
source_ref: Foreign key to source spec_object
target_text: Raw link target from syntax (e.g., “REQ-001”, “fig:diagram”)
target_ref: Resolved target identifier (populated in the ANALYZE phase)
type_ref: Relation type from spec_relation_types (e.g., “TRACES”, “XREF_FIGURE”)
from_file: Source file path

Link syntax: [PID](@) for PID references, [type:label](#) for float references, [@citation] for bibliographic citations. Default relation types are determined by is_default and link_selector columns in spec_relation_types.
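
The two link forms can be told apart by their target, as in this sketch. `classify_link` and the returned kind names are hypothetical; bibliographic citations are not covered because Pandoc parses `[@citation]` as a citation rather than a link.

```lua
-- text: the link text, e.g. "REQ-001" or "fig:diagram"
-- target: the link target, "@" for PID links or "#" for float links
-- Returns kind, key and (for floats) the type prefix, or nil if unknown.
local function classify_link(text, target)
    if target == "@" then
        return "pid", text
    elseif target == "#" then
        local float_type, label = text:match("^([^:]+):(.+)$")
        if float_type then
            return "float", label, float_type
        end
    end
    return nil
end

local kind, key = classify_link("REQ-001", "@")
local fkind, flabel, ftype = classify_link("fig:diagram", "#")
```

The raw text is what would be stored in target_text; resolving it to target_ref is left to the ANALYZE phase.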

RATIONALE:

Deferred resolution (target_text -> target_ref) enables forward references and cross-document linking. Type inference from source/target context reduces explicit markup requirements.

STATUS:

Approved

HLR-TYPE-006: Spec Attributes Container

The type system shall provide a spec_attributes container for structured metadata on specification objects.

BELONGS TO:

DESCRIPTION:

The spec_attributes table stores typed attribute values extracted from blockquote syntax:

identifier: SHA1 hash of specification + owner + name + value
specification_ref: Foreign key to parent specification
owner_ref: Foreign key to owning spec_object
name: Attribute name (field name without colon)
raw_value: Original string value
string_value, int_value, real_value, bool_value, date_value: Type-specific columns
enum_ref: Foreign key to enum_values for ENUM types
ast: JSON-serialized Pandoc AST for rich content (XHTML type)
datatype: Resolved datatype from spec_attribute_types

Attribute syntax: > name: value in blockquotes following headers. Datatypes include STRING, INTEGER, REAL, BOOLEAN, DATE, ENUM, XHTML. Multi-line attributes use continuation blocks.
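
Parsing one attribute line and casting it into a typed column might look like the sketch below. The helper names, the accepted boolean literals, and the ISO date pattern are assumptions; ENUM and XHTML are left out because they need lookups and AST handling that are not described here.

```lua
-- Parse "name: value" (one blockquote line without the leading ">").
local function parse_attribute(line)
    local name, value = line:match("^%s*([%w_%-]+):%s*(.-)%s*$")
    return name, value
end

-- Cast a raw string into the typed column for its datatype.
-- Returns column name and value, or nil when the cast fails.
local function cast(raw_value, datatype)
    if datatype == "STRING" then
        return "string_value", raw_value
    elseif datatype == "INTEGER" then
        local n = tonumber(raw_value)
        if n and n == math.floor(n) then return "int_value", n end
    elseif datatype == "REAL" then
        local n = tonumber(raw_value)
        if n then return "real_value", n end
    elseif datatype == "BOOLEAN" then
        local v = raw_value:lower()
        if v == "true" then return "bool_value", true end
        if v == "false" then return "bool_value", false end
    elseif datatype == "DATE" then
        -- Assumed format: YYYY-MM-DD
        if raw_value:match("^%d%d%d%d%-%d%d%-%d%d$") then
            return "date_value", raw_value
        end
    end
    return nil
end

local name, raw = parse_attribute("priority: 3")
local column, value = cast(raw, "INTEGER")
```

A nil result from `cast` corresponds to the cast failures that the proof views report in the VERIFY phase.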

RATIONALE:

Multi-column typed storage enables SQL queries with type-appropriate comparisons. Storing original AST preserves formatting for XHTML attributes with links, emphasis, or lists.

STATUS:

Approved

HLR-TYPE-007: Type Validation

The type system shall provide proof views that detect data integrity violations across all specification containers.

The type system described above is not fixed at compile time. The following section defines how model directories extend it with custom object types, float renderers, data view generators, and style presets.

BELONGS TO:

DESCRIPTION:

Proof views are SQL queries registered in the VERIFY phase that check for constraint violations:

Specification-level (missing required attributes, invalid types)
Object-level (missing required, cardinality, cast failures, invalid enum/date, bounds)
Float-level (orphans, duplicate labels, render failures, invalid types)
Relation-level (unresolved, dangling, ambiguous)
View-level (materialization failures, query errors)

The validation policy (configurable in project.yaml) determines severity: error, warn, or ignore.

RATIONALE:

Automated validation enables early detection of specification errors before document generation. Configurable severity allows projects to gradually enforce stricter quality standards.

STATUS:

Approved

8 Extension Requirements

SF-005: Extension Framework

Model-based extensibility for type handlers, renderers, and data views.

DESCRIPTION:: Groups requirements for the extension mechanism that enables custom models to provide type handlers, external renderers, data view generators, and style presets.
RATIONALE:: Extensibility through model directories enables domain-specific customization without modifying the core pipeline.

HLR-EXT-001: Model-Specific Type Handler Loading

The system shall load type-specific handlers from model directories.

BELONGS TO:

DESCRIPTION:

Type handlers control how specification content is rendered during the TRANSFORM phase. The loading mechanism supports:

Object types: Loaded from models/{model}/types/objects/{type}.lua
Specification types: Loaded from models/{model}/types/specifications/{type}.lua
Float types: Loaded from models/{model}/types/floats/{type}.lua
View types: Loaded from models/{model}/types/views/{type}.lua

Module loading uses require() with path models.{model}.types.{category}.{type}. Type names are converted to lowercase for file lookup (e.g., “HLR” -> “hlr.lua”).
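
The module name passed to require() can be built as shown here. The helper name `type_module_name` is hypothetical; the path pattern and the lowercase conversion are the ones stated above.

```lua
-- category is one of "objects", "specifications", "floats", "views".
local function type_module_name(model, category, type_name)
    return string.format("models.%s.types.%s.%s",
        model, category, type_name:lower())
end

local module_name = type_module_name("default", "objects", "HLR")
```

The resulting dotted name maps to `hlr.lua` under the model's `types/objects/` directory when resolved through the Lua package path.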

RATIONALE:

Separating type handlers into model directories enables domain-specific customization. Organizations can define their own requirement types, document types, and rendering behavior without modifying core code.

STATUS:

Approved

HLR-EXT-002: Model Directory Structure

The system shall organize model content in a standardized directory hierarchy.

BELONGS TO:

DESCRIPTION:

Each model follows this structure:

models/{model_name}/
  types/
    objects/       -- Spec object type handlers (HLR, LLR, VC, etc.)
    specifications/-- Specification type handlers (SRS, SDD, SVC)
    floats/        -- Float type handlers (TABLE, PLANTUML, CHART)
    views/         -- View type handlers (ABBREV, SYMBOL, MATH)
    relations/     -- Relation type definitions (TRACES_TO, etc.)
  data_views/      -- Data view generators for chart data injection
  filters/         -- Pandoc filters (docx.lua, html.lua, markdown.lua)
  postprocessors/  -- Format-specific postprocessors
  styles/          -- Style presets and templates

Model names are referenced via the template field of the project configuration or via model_name in the context. The “default” model provides base implementations with fallback behavior.

RATIONALE:

Standardized structure enables consistent discovery of type modules across models and provides clear extension points for each content category.

STATUS:

Approved

HLR-EXT-003: Handler Registration Interface

Type handlers shall provide standardized registration interfaces for pipeline integration.

BELONGS TO:

DESCRIPTION:

Each handler category defines specific interfaces:

Object Type Handlers export:

M.object: Type schema with id, long_name, description, attributes
M.handler.on_render_SpecObject(obj, ctx): Render function returning Pandoc blocks

Specification Type Handlers export:

M.specification: Type schema with id, long_name, attributes
M.handler.on_render_Specification(ctx, pandoc, data): Render document title

Float Type Handlers export:

M.float: Type schema with id, caption_format, counter_group, aliases, needs_external_render
M.transform(raw_content, type_ref, log): For internal transforms (TABLE, CSV)
external_render.register_renderer(type_ref, callbacks): For external renders (PLANTUML, CHART)

View Type Handlers export:

M.view: Type schema with id, inline_prefix, aliases
M.handler.on_render_Code(code, ctx): Inline code rendering

RATIONALE:

Consistent interfaces enable the core pipeline to discover and invoke handlers without knowledge of specific type implementations. This separation maintains extensibility.

STATUS:

Approved

HLR-EXT-004: Type Definition Schema

Type definitions shall declare metadata schema that controls registration and behavior.

BELONGS TO:

DESCRIPTION:

Type schemas provide metadata stored in registry tables:

Object Types (spec_object_types):

M.object = {
    id = "HLR",                    -- Unique identifier (uppercase)
    long_name = "High-Level Requirement",
    description = "A top-level system requirement",
    extends = "TRACEABLE",         -- Base type for inheritance
    header_unnumbered = true,      -- Exclude from section numbering
    header_style_id = "Heading2",  -- Custom-style for headers
    body_style_id = "Normal",      -- Custom-style for body
    attributes = {                 -- Attribute definitions
        { name = "status", type = "ENUM", values = {...}, min_occurs = 1 },
        { name = "rationale", type = "XHTML" },
        { name = "created", type = "DATE" },
    }
}

Float Types (spec_float_types):

M.float = {
    id = "CHART",
    caption_format = "Figure",     -- Caption prefix
    counter_group = "FIGURE",      -- Counter sharing (FIGURE, CHART, PLANTUML)
    aliases = { "echarts" },       -- Alternative syntax identifiers
    needs_external_render = true,  -- Requires external tool
}

View Types (spec_view_types):

M.view = {
    id = "ABBREV",
    inline_prefix = "abbrev",      -- Syntax: `abbrev: content`
    aliases = { "sigla", "acronym" },
    needs_external_render = false,
}

RATIONALE:

Declarative schemas enable automatic registration into database tables during initialization, provide validation rules for content, and configure rendering behavior without procedural code.

STATUS:

Approved

HLR-EXT-005: Model Path Resolution

The system shall resolve model paths using environment configuration with fallback.

BELONGS TO:

DESCRIPTION:

Model path resolution follows this order:

Check SPECCOMPILER_HOME environment variable: $SPECCOMPILER_HOME/models/{model}
Fall back to current working directory: ./models/{model}

For type modules not found in the specified model, the system falls back to the “default” model:

-- Try the model-specific path first
local ok, module = pcall(require, "models." .. model_name .. ".types.floats.table")
if not ok then
    -- Fall back to the default model
    module = require("models.default.types.floats.table")
end

This enables partial model customization where models only override specific types.
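
The directory lookup order can be sketched as follows. `model_dir` is a hypothetical helper, and the sketch assumes the fallback applies when SPECCOMPILER_HOME is unset or empty; the `getenv` parameter exists only so the function can be exercised without changing the real environment.

```lua
-- Returns the directory for a model, preferring SPECCOMPILER_HOME.
-- getenv defaults to os.getenv and can be replaced in tests.
local function model_dir(model, getenv)
    getenv = getenv or os.getenv
    local home = getenv("SPECCOMPILER_HOME")
    if home and home ~= "" then
        return home .. "/models/" .. model
    end
    return "./models/" .. model
end

local dir = model_dir("default", function() return "/opt/speccompiler" end)
```

Keeping the environment lookup behind one function makes the two-step resolution easy to test in isolation.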

RATIONALE:

Environment-based configuration supports deployment flexibility. Fallback to default model reduces duplication by allowing models to inherit base implementations.

STATUS:

Approved

HLR-EXT-006: External Renderer Registration

External renderers shall register callbacks for task preparation and result handling.

BELONGS TO:

DESCRIPTION:

Float types requiring external tools (PlantUML, ECharts, etc.) register with the external render handler:

external_render.register_renderer("PLANTUML", {
    prepare_task = function(float, build_dir, log, data, model_name)
        -- Return task descriptor with cmd, args, output_path, context
    end,
    handle_result = function(task, success, stdout, stderr, data, log)
        -- Update resolved_ast in database
    end
})

The core orchestrates: query items -> prepare tasks -> cache filter -> batch spawn -> dispatch results. This enables parallel execution across all external renders.

RATIONALE:

Registration pattern decouples type-specific rendering logic from core orchestration. Callbacks enable types to control task preparation and result interpretation while core handles parallelization and caching.

STATUS:

Approved

HLR-EXT-007: Data View Generator Loading

The system shall load data view generators from model directories for chart data injection.

BELONGS TO:

DESCRIPTION:

Data views are Lua modules that generate data for charts:

-- models/{model}/data_views/{view_name}.lua
local M = {}

function M.generate(params, data)
    -- params: user parameters from code block attributes
    -- data: DataManager instance for SQL queries
    return { source = { {"x", "y"}, {1, 10}, {2, 20} } }
end

return M

Views are loaded via data_loader.load_view(view_name, model_name, data, params). Resolution tries the specified model first, then falls back to default.

Usage in code blocks:

```chart:gaussian{view="gaussian" sigma=2.0}
{...echarts config...}
```

RATIONALE:

Data views separate data generation from chart configuration. This enables reusable data sources and database-driven visualizations without embedding SQL in markdown.

STATUS:

Approved

HLR-EXT-008: Handler Caching

The system shall cache loaded type handlers to avoid repeated module loading.

BELONGS TO: