Guide: Creating a Custom Model

1 Introduction

A model defines the vocabulary and behavior of your specification documents. It declares what types of spec objects exist (requirements, design items, test cases), what floats are available (diagrams, tables, code listings), how cross-references resolve, and what validation rules apply.

SpecCompiler ships with a default model that provides base types (SECTION, FIGURE, TABLE, PLANTUML, etc.). You create a custom model when your domain needs additional types, specialized validation, or custom rendering.

1.1 Overlay

Models work as overlays on top of the default model. When you set template: mymodel in project.yaml, the engine loads types in this order:

  1. models/default/types/ – Always loaded first.
  2. models/mymodel/types/ – Loaded second; types with the same id override the default.

This means your custom model only needs to define the types it adds or overrides. Everything else inherits from default.
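For instance, a hypothetical override of the default FIGURE type might look like this (a sketch; because an override with the same id replaces the default definition entirely, it must restate every field it still needs):

```lua
-- models/mymodel/types/floats/figure.lua (hypothetical override sketch)
local M = {}

-- Same id as the default FIGURE type, so this definition replaces it.
M.float = {
    id = "FIGURE",
    long_name = "Figure",
    caption_format = "Fig.",    -- changed: captions read "Fig. 1" instead of "Figure 1"
    counter_group = "FIGURE",   -- restated: overrides replace, they do not merge
}

return M
```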

2 Model Directory Layout

Listing : Model directory structure
models/{name}/
  types/
    objects/          -- Spec object types (e.g., hlr.lua, vc.lua)
    specifications/   -- Specification types (e.g., srs.lua)
    floats/           -- Float types (e.g., figure.lua, chart.lua)
    views/            -- View types (e.g., abbrev.lua, math_inline.lua)
    relations/        -- Relation types (e.g., xref_decomposition.lua)
  proofs/             -- Validation proof queries (e.g., sd_601_*.lua)
  postprocessors/     -- Format post-processing (docx.lua, html5.lua)
  filters/            -- Pandoc Lua filters per output format
  styles/             -- Style presets (preset.lua, docx.lua, html.lua)
  data_views/         -- Chart data generators
  handlers/           -- Custom pipeline handlers

Only types/ is required. All other directories are optional.

3 Type Definition Pattern

Every type module is a Lua file that returns a table with up to two keys:

  • A schema key (M.object, M.float, M.relation, M.view, or M.specification) that declares the type’s metadata and is registered into the database.
  • An optional M.handler table that hooks into the pipeline lifecycle.

3.1 Schema Keys by Category

Table : Schema keys by type category
Category       | Schema Key      | Example File
Spec Objects   | M.object        | types/objects/section.lua
Floats         | M.float         | types/floats/figure.lua
Relations      | M.relation      | types/relations/xref_citation.lua
Views          | M.view          | types/views/abbrev.lua
Specifications | M.specification | types/specifications/srs.lua

3.2 Handler Lifecycle

Handlers hook into pipeline phases via callback functions:

Table : Handler callback functions
Callback             | Phase      | Purpose
on_initialize        | INITIALIZE | Parse content from Pandoc AST, store in database
on_analyze           | ANALYZE    | Validate, resolve references, generate PIDs
on_transform         | TRANSFORM  | Render content, resolve external resources
on_render_SpecObject | EMIT       | Convert spec object to Pandoc blocks for output
on_render_Code       | EMIT       | Convert inline code to Pandoc inlines (views)
on_render_CodeBlock  | EMIT       | Convert code block to Pandoc blocks (floats)

The prerequisites field controls execution order: a handler with prerequisites = {"spec_views"} runs after the spec_views handler.
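For example (a schematic fragment, not a complete module), a handler that must see the view records stored by the built-in spec_views handler would declare it as a prerequisite:

```lua
-- Schematic: this handler runs after the built-in spec_views handler.
M.handler = {
    name = "my_handler",
    prerequisites = { "spec_views" },  -- spec_views completes first in each phase

    on_analyze = function(data, contexts, diagnostics)
        -- By this point, spec_views has already populated its tables.
    end,
}
```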

4 Walkthrough: Custom Object Type

This example creates a High-Level Requirement (HLR) type with required attributes.

4.1 Step 1: Create the Type File

Create models/mymodel/types/objects/hlr.lua:

Listing : Custom object type: hlr.lua
local M = {}

M.object = {
    id = "HLR",
    long_name = "High-Level Requirement",
    description = "A top-level system requirement",
    pid_prefix = "HLR",           -- Auto-PID prefix
    pid_format = "%s-%03d",       -- Produces HLR-001, HLR-002, etc.
    attributes = {
        {
            name = "priority",
            type = "ENUM",
            values = { "High", "Medium", "Low" },
            min_occurs = 1,       -- Required
            max_occurs = 1,
        },
        {
            name = "status",
            type = "ENUM",
            values = { "Draft", "Approved", "Implemented" },
            min_occurs = 1,
            max_occurs = 1,
        },
        {
            name = "rationale",
            type = "XHTML",       -- Rich text
            min_occurs = 0,       -- Optional
        },
    },
}

return M

4.2 Step 2: Use in Markdown

Listing : Using the custom object type
## hlr: User Authentication @HLR-001

> priority: High

> status: Draft

> rationale: Required by security policy section 4.2

The system shall authenticate users via username and password.

4.3 Step 3: Add a Handler (Optional)

If the type needs custom behavior during pipeline phases, add M.handler:

Listing : Object type with handler
-- Added to hlr.lua at module level, alongside M.object
local Queries = require("db.queries")

M.handler = {
    name = "hlr_handler",
    prerequisites = {},

    on_analyze = function(data, contexts, diagnostics)
        for _, ctx in ipairs(contexts) do
            local spec_id = ctx.spec_id or "default"
            local objects = data:query_all(
                Queries.content.objects_by_spec_type,
                { spec_id = spec_id, type_ref = "HLR" }
            )
            for _, obj in ipairs(objects or {}) do
                -- Custom validation logic here
            end
        end
    end,
}

4.4 Object Schema Fields Reference

Table : Object schema fields
Field        | Type    | Default    | Description
id           | string  | required   | Unique identifier (uppercase convention)
long_name    | string  | same as id | Human-readable name
description  | string  | ""         | Description text
extends      | string  | nil        | Base type for inheritance
is_default   | boolean | false      | If true, headers without explicit type match this
is_composite | boolean | false      | Composite object flag
pid_prefix   | string  | nil        | Prefix for auto-generated PIDs
pid_format   | string  | nil        | Printf format string for PIDs
aliases      | list    | nil        | Alternative identifiers for syntax matching
attributes   | list    | nil        | Attribute definitions (see Attribute Schema)

4.5 Attribute Schema

Table : Attribute definition fields
Field        | Type    | Default  | Description
name         | string  | required | Attribute identifier
type         | string  | "STRING" | Datatype: STRING, INTEGER, REAL, BOOLEAN, DATE, ENUM, XHTML
min_occurs   | integer | 0        | Minimum values (0 = optional, 1 = required)
max_occurs   | integer | 1        | Maximum values
min_value    | number  | nil      | Lower bound for numeric types
max_value    | number  | nil      | Upper bound for numeric types
values       | list    | nil      | Valid enum values (required when type = "ENUM")
datatype_ref | string  | nil      | Explicit datatype ID (overrides auto-generated)
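As an illustrative fragment (the attribute names here are hypothetical), a bounded numeric attribute combines type with min_value/max_value, while a max_occurs above 1 permits multiple values:

```lua
attributes = {
    {
        name = "timeout_ms",        -- hypothetical attribute
        type = "INTEGER",
        min_occurs = 1,             -- required
        min_value = 0,              -- reject negative values
        max_value = 60000,          -- reject values above one minute
    },
    {
        name = "verified_by",       -- hypothetical attribute
        type = "STRING",
        min_occurs = 0,             -- optional
        max_occurs = 5,             -- up to five values allowed
    },
}
```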

5 Walkthrough: Custom Float Type

Floats are numbered elements declared in fenced code blocks. This example creates a custom float type for diagrams.

5.1 Step 1: Create the Type File

Create models/mymodel/types/floats/sequence_diagram.lua:

Listing : Custom float type: sequence_diagram.lua
local M = {}

M.float = {
    id = "SEQUENCE",
    long_name = "Sequence Diagram",
    description = "UML Sequence Diagram rendered via PlantUML",
    caption_format = "Figure",        -- Caption prefix in output
    counter_group = "FIGURE",         -- Shares counter with FIGURE, PLANTUML
    aliases = { "seq", "sequence" },  -- Syntax: ```seq:label or ```sequence:label
    needs_external_render = true,     -- Requires external tool
}

return M

5.2 Float Schema Fields Reference

Table : Float schema fields
Field                 | Type    | Default    | Description
id                    | string  | required   | Unique identifier (uppercase)
caption_format        | string  | same as id | Prefix used in output captions
counter_group         | string  | same as id | Counter sharing group (e.g., FIGURE, TABLE)
aliases               | list    | nil        | Alternative syntax identifiers
needs_external_render | boolean | false      | Whether rendering requires an external tool
style_id              | string  | nil        | Custom style identifier for output formatting

5.3 Counter Groups

Multiple float types can share a numbering sequence by using the same counter_group. For example, FIGURE, PLANTUML, and CHART all use counter_group = "FIGURE", so they are numbered sequentially as Figure 1, Figure 2, Figure 3 regardless of which specific type each is.

6 Walkthrough: Float with External Rendering

When a float type needs an external tool to produce its output (PlantUML for diagrams, Deno for charts, etc.), it uses the external render handler. This handler collects all items that need rendering, spawns external processes in parallel, and dispatches results back to type-specific callbacks.

6.1 How External Rendering Works

The pipeline flow for external renders:

  1. INITIALIZE – The float is parsed from the Markdown code block and stored in spec_floats with raw_content.
  2. TRANSFORM – The external render handler (src/pipeline/transform/external_render_handler.lua) queries all floats where needs_external_render = 1 and resolved_ast IS NULL.
  3. Prepare – For each float, the handler calls the registered prepare_task callback, which writes input files and builds a command descriptor.
  4. Cache check – If output_path exists on disk (from a previous build), the task is skipped and handle_result is called immediately with the cached path.
  5. Batch spawn – All non-cached tasks are spawned in parallel via task_runner.spawn_batch.
  6. Dispatch – Results (stdout, stderr, exit code) are dispatched to each type’s handle_result callback, which updates resolved_ast in the database.

6.2 Registering a Renderer

External renderers are registered at module load time by calling external_render.register_renderer(type_ref, callbacks). The callbacks table must provide two functions:

Table : External render callback functions
Callback      | Signature and Purpose
prepare_task  | function(float, build_dir, log, data, model_name) -> task|nil – Writes input files, builds command descriptor. Returns nil to skip rendering.
handle_result | function(task, success, stdout, stderr, data, log) – Processes output. Updates resolved_ast in the database via float_base.update_resolved_ast.

6.3 Task Descriptor

The prepare_task callback returns a task descriptor table:

Table : Task descriptor fields
Field       | Type   | Description
cmd         | string | Command to execute (e.g., "plantuml", "deno")
args        | list   | Command arguments
opts        | table  | Options: cwd (working directory), timeout (milliseconds)
output_path | string | Expected output file path; if it exists, the task is skipped (cache hit)
context     | table  | Arbitrary data passed through to handle_result (float record, hash, paths, etc.)

6.4 Example: PlantUML Renderer

The built-in PlantUML renderer demonstrates the full pattern:

Listing : PlantUML external renderer (simplified)
local float_base = require("pipeline.shared.float_base")
local task_runner = require("infra.process.task_runner")
local external_render = require("pipeline.transform.external_render_handler")

local M = {}

M.float = {
    id = "PLANTUML",
    long_name = "PlantUML Diagram",
    caption_format = "Figure",
    counter_group = "FIGURE",
    aliases = { "puml", "plantuml", "uml" },
    needs_external_render = true,   -- Enables external render pipeline
}

external_render.register_renderer("PLANTUML", {
    prepare_task = function(float, build_dir, log)
        local content = float.raw_content or ''
        -- Ensure @startuml/@enduml wrapper
        if not content:match('@startuml') then
            content = '@startuml\n' .. content .. '\n@enduml'
        end

        local hash = pandoc.sha1(content)
        local diagrams_path = build_dir .. "/diagrams"
        local puml_file = diagrams_path .. "/" .. hash .. ".puml"
        local png_file = diagrams_path .. "/" .. hash .. ".png"

        task_runner.ensure_dir(diagrams_path)
        task_runner.write_file(puml_file, content)

        return {
            cmd = "plantuml",
            args = { "-tpng", puml_file },
            opts = { timeout = 30000 },
            output_path = png_file,       -- Cache key: skip if PNG exists
            context = {
                hash = hash,
                float = float,
                relative_path = "diagrams/" .. hash .. ".png",
            }
        }
    end,

    handle_result = function(task, success, stdout, stderr, data, log)
        local ctx = task.context
        if not success then
            log.warn("PlantUML failed for %s: %s",
                ctx.float.identifier:sub(1,12), stderr)
            return
        end

        -- Store resolved path as JSON in resolved_ast
        local json = string.format(
            '{"png_paths":["%s"]}',
            ctx.relative_path
        )
        float_base.update_resolved_ast(data, ctx.float.identifier, json)
    end
})

return M

6.5 Example: Chart Renderer with Data Injection

The chart renderer adds a data injection step before rendering, loading data views from models/{model}/data_views/:

Listing : Chart external renderer (simplified)
local float_base = require("pipeline.shared.float_base")
local task_runner = require("infra.process.task_runner")
local data_loader = require("core.data_loader")
local external_render = require("pipeline.transform.external_render_handler")

local M = {}

M.float = {
    id = "CHART",
    long_name = "Chart",
    caption_format = "Figure",
    counter_group = "FIGURE",
    aliases = { "echarts", "echart" },
    needs_external_render = true,
}

external_render.register_renderer("CHART", {
    prepare_task = function(float, build_dir, log, data, model_name)
        local attrs = float_base.decode_attributes(float)
        local json_content = float.raw_content or '{}'

        -- Data injection: load view module and merge data into ECharts config
        local view_name = attrs.view
        if view_name and data then
            local inject_attrs = { view = view_name, model = model_name }
            local config = pandoc.json.decode(json_content)
            local injected = data_loader.inject_chart_data(
                config, inject_attrs, data, log)
            if injected then
                json_content = pandoc.json.encode(injected)
            end
        end

        local hash = pandoc.sha1(json_content)
        local charts_path = build_dir .. "/charts"
        local json_file = charts_path .. "/" .. hash .. ".json"
        local png_file = charts_path .. "/" .. hash .. ".png"

        task_runner.ensure_dir(charts_path)
        task_runner.write_file(json_file, json_content)

        return {
            cmd = "deno",
            args = {
                "run", "--allow-read", "--allow-write", "--allow-env",
                "echarts-render.ts", json_file, png_file,
                tostring(attrs.width or 600),
                tostring(attrs.height or 400)
            },
            opts = { timeout = 60000 },
            output_path = png_file,
            context = {
                hash = hash,
                float = float,
                relative_path = "charts/" .. hash .. ".png",
            }
        }
    end,

    handle_result = function(task, success, stdout, stderr, data, log)
        local ctx = task.context
        if not success then
            log.warn("Chart render failed: %s", stderr)
            return
        end

        local json = string.format('{"png_path":"%s"}', ctx.relative_path)
        float_base.update_resolved_ast(data, ctx.float.identifier, json)
    end
})

return M

6.6 Creating Your Own External Renderer

To create a float type that uses an external tool:

  1. Set needs_external_render = true in the float schema.
  2. Register callbacks with external_render.register_renderer("YOUR_TYPE", { ... }) at module load time (top-level code, not inside a function).
  3. In prepare_task: Write input content to a temporary file, build the command and arguments, and return a task descriptor with output_path for file-based caching.
  4. In handle_result: Parse the output (stdout, generated files), serialize the result as JSON, and call float_base.update_resolved_ast(data, identifier, json) to store it.
  5. Do not define M.handler.on_transform – the external render handler orchestrates the TRANSFORM phase for all registered types. Defining your own on_transform would bypass the parallel batch execution.

Key utilities available:

Table : Utility functions for external renderers
Function                                       | Purpose
task_runner.ensure_dir(path)                   | Create directory if it does not exist
task_runner.write_file(path, content)          | Write content to a file; returns ok, err
task_runner.file_exists(path)                  | Check if a file exists on disk
task_runner.command_exists(cmd)                | Check if a command is available in PATH
float_base.decode_attributes(float)            | Parse float's pandoc_attributes JSON into a Lua table
float_base.update_resolved_ast(data, id, json) | Store the rendering result in the database

6.7 File-Based Caching

The external render handler provides automatic file-based caching via the output_path field in the task descriptor. If the output file already exists on disk when prepare_task returns, the handler skips spawning the external process and immediately calls handle_result with empty stdout/stderr. This means:

  • The content hash should be part of the output filename (e.g., diagrams/{sha1}.png) so that content changes produce a new filename and trigger re-rendering.
  • The handle_result callback should work correctly whether called after a fresh render or a cache hit (it receives the same task.context).
  • Deleting the output files forces re-rendering on the next build (the database resolved_ast is also cleared during INITIALIZE).

6.8 External Rendering for Views

The same external_render.register_renderer mechanism works for views that need external tools. For example, math_inline.lua registers a renderer for the MATH_INLINE view type to convert AsciiMath to MathML/OMML via an external script. The handler queries spec_views (instead of spec_floats) with needs_external_render = 1 and dispatches to the same callback interface.

7 Walkthrough: Custom Relation with Inference Rules

Relations connect spec objects via link syntax. The relation resolver uses specificity scoring to infer the relation type.

7.1 Step 1: Create the Type File

Create models/mymodel/types/relations/traces_to.lua:

Listing : Custom relation type: traces_to.lua
local M = {}

M.relation = {
    id = "TRACES_TO",
    long_name = "Traces To",
    description = "Traceability link from LLR to HLR",
    link_selector = "@",             -- Uses [PID](@) syntax
    source_type_ref = "LLR",        -- Only from LLR objects
    target_type_ref = "HLR",        -- Only to HLR objects
    aliases = nil,                   -- No alias prefix
    is_default = false,
}

return M

7.2 Step 2: Use in Markdown

Listing : Using the relation type
### llr: Password Length Check @LLR-001

Passwords must be at least 8 characters. Traces to [HLR-001](@).

7.3 Inference Scoring

When multiple relation types could match a link, the resolver scores each candidate:

Table : Inference scoring dimensions
Dimension         | Match | Constraint mismatch | No constraint (NULL)
Selector (@ or #) | +1    | Eliminated          | +0
Source attribute  | +1    | Eliminated          | +0
Source type       | +1    | Eliminated          | +0
Target type       | +1    | Eliminated          | +0

The highest-scoring candidate wins. If two candidates tie, the relation is flagged as ambiguous (relation_ambiguous). Constraints set to nil act as wildcards (+0) rather than eliminating the candidate.
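The scoring rule can be sketched in plain Lua (an illustration of the algorithm described above, not the engine's actual code; the field names mirror the relation schema):

```lua
-- Illustration only: score one candidate relation type against a link.
-- Returns nil if any declared constraint mismatches (candidate eliminated),
-- otherwise the specificity score (+1 per matching constraint, +0 per nil).
local function score_candidate(candidate, link)
    local score = 0
    local checks = {
        { candidate.link_selector,    link.selector },
        { candidate.source_attribute, link.source_attribute },
        { candidate.source_type_ref,  link.source_type },
        { candidate.target_type_ref,  link.target_type },
    }
    for _, pair in ipairs(checks) do
        local constraint, actual = pair[1], pair[2]
        if constraint ~= nil then
            if constraint ~= actual then
                return nil                  -- hard mismatch: eliminated
            end
            score = score + 1               -- matching constraint
        end
        -- nil constraint: wildcard, contributes +0
    end
    return score
end

-- A fully constrained candidate (score 2) beats a selector-only one (score 1):
-- score_candidate({ link_selector = "@", target_type_ref = "HLR" },
--                 { selector = "@", target_type = "HLR" })  --> 2
```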

7.4 Relation Schema Fields Reference

Table : Relation schema fields
Field            | Type    | Default  | Description
id               | string  | required | Unique identifier (uppercase)
link_selector    | string  | nil      | Required selector: "@" for PID refs, "#" for label refs
source_type_ref  | string  | nil      | Constrain source to this object type (nil = any)
target_type_ref  | string  | nil      | Constrain target to this object type (nil = any)
source_attribute | string  | nil      | Constrain to links within this attribute context
aliases          | list    | nil      | Prefix aliases for [alias:key](#) syntax
is_default       | boolean | false    | Default relation for its selector when no better match

7.5 Relations with Handlers

A relation type can include a handler for custom transform behavior. For example, xref_citation.lua rewrites citation links to Pandoc Cite elements during the TRANSFORM phase:

Listing : Relation type with handler
M.handler = {
    name = "my_relation_handler",
    prerequisites = {"spec_relations"},  -- Run after relations are stored

    on_transform = function(data, contexts, diagnostics)
        for _, ctx in ipairs(contexts) do
            -- Custom transform logic
        end
    end
}

7.6 Base Types and Inheritance

Relation types support inheritance via base types. Instead of repeating link_selector and resolution logic in every type, you extend a base type:

  • traceable (models/default/types/relations/traceable.lua) — base for @ (PID) selector
  • xref (models/default/types/relations/xref.lua) — base for # (label) selector

Use extend() to create a concrete type:

Listing : Extending the traceable base type
local traceable = require("models.default.types.relations.traceable")
local M = {}

M.relation = traceable.extend({
    id = "TRACES_TO",
    long_name = "Traces To",
    description = "Traceability link from one object to another",
})

return M

The extend() call inherits link_selector = "@" from the base and merges your overrides. For # selector types, use xref.extend() instead.
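Correspondingly, a label-based relation would extend the xref base (a sketch; the id here is hypothetical):

```lua
local xref = require("models.default.types.relations.xref")
local M = {}

-- Inherits link_selector = "#" from the xref base type.
M.relation = xref.extend({
    id = "XREF_TERM",                 -- hypothetical id
    long_name = "Term Reference",
    description = "Cross-reference to a term by label",
})

return M
```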

By default, object references display the target’s PID and float references display the caption format with the float number (e.g., “Figure 3”). To customize display text, add a standard M.handler with an on_transform hook using the shared link_rewrite_utils utility:

Listing : Custom display text via on_transform
local traceable = require("models.default.types.relations.traceable")
local link_rewrite = require("pipeline.shared.link_rewrite_utils")
local M = {}

M.relation = traceable.extend({
    id = "XREF_DIC",
    long_name = "Dictionary Reference",
    description = "Cross-reference to a dictionary entry",
    target_type_ref = "DIC",
})

M.handler = {
    name = "xref_dic_handler",
    prerequisites = {"spec_relations"},
    on_transform = function(data, contexts, _diagnostics)
        link_rewrite.rewrite_display_for_type(data, contexts, "XREF_DIC", function(target)
            if target.title_text and target.title_text ~= "" then
                return target.title_text
            end
        end)
    end
}

return M

A link [DIC-AUTH-001](@) would display as “Authentication” instead of “DIC-AUTH-001”.

The display callback (the function passed to rewrite_display_for_type) receives a target table with fields pid, type_ref, and title_text. Return a string for custom display text, or nil to keep the default.

8 Walkthrough: Custom View

Views are inline elements declared with backtick syntax (`prefix: content`).

8.1 Step 1: Create the Type File

Create models/mymodel/types/views/symbol.lua:

Listing : Custom view type: symbol.lua
local M = {}
local Queries = require("db.queries")

M.view = {
    id = "SYMBOL",
    long_name = "Symbol",
    description = "Engineering symbol with unit definition",
    aliases = { "sym" },
    inline_prefix = "symbol",         -- Enables `symbol: content` syntax
    needs_external_render = false,
}

M.handler = {
    name = "symbol_handler",
    prerequisites = {"spec_views"},

    on_initialize = function(data, contexts, diagnostics)
        for _, ctx in ipairs(contexts) do
            local doc = ctx.doc
            if not doc or not doc.blocks then goto continue end

            local spec_id = ctx.spec_id or "default"
            local file_seq = 0

            local visitor = {
                Code = function(c)
                    local content = (c.text or ""):match("^symbol:%s*(.+)$")
                        or (c.text or ""):match("^sym:%s*(.+)$")
                    if not content then return nil end

                    file_seq = file_seq + 1
                    local identifier = pandoc.sha1(spec_id .. ":" .. file_seq .. ":" .. content)

                    data:execute(Queries.content.insert_view, {
                        identifier = identifier,
                        specification_ref = spec_id,
                        view_type_ref = "SYMBOL",
                        from_file = ctx.source_path or "unknown",
                        file_seq = file_seq,
                        raw_ast = content
                    })
                end
            }

            for _, block in ipairs(doc.blocks) do
                pandoc.walk_block(block, visitor)
            end
            ::continue::
        end
    end,

    on_render_Code = function(code, ctx)
        local content = (code.text or ""):match("^symbol:%s*(.+)$")
            or (code.text or ""):match("^sym:%s*(.+)$")
        if not content then return nil end

        -- Render as emphasized text
        return { pandoc.Emph({ pandoc.Str(content) }) }
    end,
}

return M

8.2 Step 2: Use in Markdown

Listing : Using the custom view type
The force is defined as `symbol: F = ma` where `symbol: F` is force in Newtons.

8.3 View Schema Fields Reference

Table : View schema fields
Field                 | Type    | Default  | Description
id                    | string  | required | Unique identifier (uppercase)
inline_prefix         | string  | nil      | Prefix for inline code dispatch (e.g., "math" enables math: syntax)
aliases               | list    | nil      | Alternative prefixes for the same view type
needs_external_render | boolean | false    | Whether rendering requires an external tool (batch processing)
materializer_type     | string  | nil      | Materializer strategy (e.g., "toc", "lof", "custom")
counter_group         | string  | nil      | Counter group for numbered views

9 Validation Proofs

Proofs are SQL-based validation rules that run during the VERIFY phase. Each proof creates a SQL view; if the view returns any rows, those rows represent violations.

9.1 Proof File Pattern

Create models/mymodel/proofs/vc_missing_hlr_traceability.lua:

Listing : Validation proof module
local M = {}

M.proof = {
    view = "view_traceability_vc_missing_hlr",
    policy_key = "traceability_vc_to_hlr", -- Key in project.yaml validation section
    sql = [[
CREATE VIEW IF NOT EXISTS view_traceability_vc_missing_hlr AS
SELECT
  vc.identifier AS object_id,
  vc.pid AS object_pid,
  vc.title_text AS object_title,
  vc.from_file,
  vc.start_line
FROM spec_objects vc
WHERE vc.type_ref = 'VC'
  AND NOT EXISTS (
    SELECT 1
    FROM spec_relations r
    JOIN spec_objects target ON target.identifier = r.target_ref
    WHERE r.source_ref = vc.identifier
      AND target.type_ref = 'HLR'
  );
]],
    message = function(row)
        local label = row.object_pid or row.object_title or row.object_id
        return string.format(
            "Verification case '%s' has no traceability link to an HLR",
            label
        )
    end
}

return M

9.2 Proof Schema

Table : Proof module fields
Field      | Type     | Description
view       | string   | SQL view name (must match the CREATE VIEW name)
policy_key | string   | Key for suppression in project.yaml validation section
sql        | string   | SQL CREATE VIEW statement; rows returned = violations
message    | function | Takes a row table, returns a diagnostic message string

9.3 Suppressing Proofs

Users suppress proofs in project.yaml using the policy_key:

Listing : Suppressing a validation proof
validation:
  traceability_vc_to_hlr: ignore   # Suppress this proof

10 Model Overlay/Extension Pattern

10.1 How Overlays Work

The type loader (src/core/type_loader.lua) loads models in two passes:

  1. Default model: Scans models/default/types/{category}/ and registers all types.
  2. Custom model: Scans models/{template}/types/{category}/ and registers all types.

Since type registration uses INSERT OR REPLACE, a custom model type with the same id as a default type replaces it entirely. Types with new IDs are added alongside the defaults.

10.2 Path Resolution

The loader resolves model paths in order:

  1. $SPECCOMPILER_HOME/models/{name}/types/ (Docker/production)
  2. ./models/{name}/types/ (local development)

10.3 Partial Customization Example

A model that only adds an HLR type and a custom proof:

Listing : Minimal overlay model
models/mymodel/
  types/
    objects/
      hlr.lua         -- Adds HLR type (default has no HLR)
    relations/
      traces_to.lua   -- Adds traceability relation
  proofs/
    sd_601_vc_missing_hlr.lua  -- Domain-specific validation

All other types (SECTION, FIGURE, TABLE, etc.) are inherited from default.

11 project.yaml Integration

Set the template field to use your custom model:

Listing : Using a custom model in project.yaml
project:
  code: MYPROJ
  name: My Project

template: mymodel   # Loads models/default/ then models/mymodel/

doc_files:
  - srs.md

Guide: DOCX Customization

1 Introduction

SpecCompiler generates DOCX output through a multi-stage pipeline:

  1. SpecIR – Structured data in SQLite (objects, relations, floats, attributes).
  2. Pandoc AST – The emitter assembles a Pandoc document from the SpecIR.
  3. Pandoc DOCX Writer – Pandoc converts the AST to DOCX using a reference.docx for styles.
  4. Lua Filter – A format-specific filter converts SpecCompiler markers to OOXML (captions, bookmarks, math).
  5. Postprocessor – Manipulates the generated DOCX ZIP archive (positioned floats, caption orphan prevention, template-specific OOXML).

Customization is available at three levels: style presets (fonts, spacing, page layout), filters (AST-to-OOXML conversion), and postprocessors (raw OOXML manipulation).

2 How Pandoc reference.docx Works

Pandoc uses a reference document as a style template for DOCX output. The reference document defines paragraph styles (Normal, Heading 1, Caption, etc.), page dimensions, margins, and default formatting. Pandoc does not copy content from the reference document – only styles and settings.

SpecCompiler manages the reference document in two ways:

  1. Auto-generated from presets (default): SpecCompiler builds a reference.docx from Lua style presets, storing it at {output_dir}/reference.docx.
  2. User-provided: Set docx.reference_doc in project.yaml to use your own Word template.
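A sketch of the second option (the file path here is hypothetical):

```yaml
# project.yaml: use your own Word template instead of the preset-generated one
docx:
  reference_doc: templates/corporate-reference.docx   # hypothetical path
```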

3 Style Presets

Style presets are Lua files that declaratively define DOCX styles. They are located at:

models/{template}/styles/{preset}/preset.lua

3.1 Preset Table Structure

A preset file returns a Lua table with the following top-level keys:

Listing : Preset table structure
return {
    name = "My Preset",
    description = "Custom document styles",

    -- Page configuration
    page = {
        size = "A4",              -- "Letter" or "A4"
        orientation = "portrait", -- "portrait" or "landscape"
        margins = {
            top = "2.5cm",
            bottom = "2.5cm",
            left = "3cm",
            right = "2cm",
        },
    },

    -- Paragraph styles (array of style definitions)
    paragraph_styles = { ... },

    -- Table styles (array of table style definitions)
    table_styles = { ... },

    -- Caption formats per float type
    captions = { ... },

    -- Document settings
    settings = {
        default_tab_stop = 720,   -- In twips (720 = 0.5 inch)
        language = "en-US",
    },

    -- Optional: inherit from another preset
    extends = {
        template = "default",     -- Base template
        preset = "base",          -- Base preset name
    },
}

3.2 Paragraph Style Fields

Table : Paragraph style fields
Field          | Type    | Default  | Description
id             | string  | required | Internal style ID (e.g., "Heading1")
name           | string  | required | Display name in Word (e.g., "Heading 1")
based_on       | string  | nil      | Parent style ID for inheritance
next           | string  | nil      | Style to apply to the next paragraph
font.name      | string  | nil      | Font family name
font.size      | number  | nil      | Font size in points
font.color     | string  | nil      | Hex color without # (e.g., "2F5496")
font.bold      | boolean | nil      | Bold text
font.italic    | boolean | nil      | Italic text
spacing.line   | number  | nil      | Line spacing multiplier (1.0 = single, 1.15, 2.0, etc.)
spacing.before | number  | nil      | Space before paragraph in points
spacing.after  | number  | nil      | Space after paragraph in points
alignment      | string  | nil      | Text alignment: "left", "center", "right", "both" (justified)
indent.left    | string  | nil      | Left indent (e.g., "0.5in", "1cm")
indent.right   | string  | nil      | Right indent
keep_next      | boolean | nil      | Keep with next paragraph (prevent orphaning)
outline_level  | integer | nil      | Outline level for TOC (0 = Heading 1, 1 = Heading 2, etc.)

3.3 Paragraph Style Example

Listing : Paragraph style definitions
paragraph_styles = {
    {
        id = "Normal",
        name = "Normal",
        font = { name = "Calibri", size = 11 },
        spacing = { line = 1.15, after = 8 },
        alignment = "left",
    },
    {
        id = "Heading1",
        name = "Heading 1",
        based_on = "Normal",
        next = "Normal",
        font = { name = "Calibri Light", size = 16, color = "2F5496" },
        spacing = { before = 12, after = 0, line = 1.15 },
        keep_next = true,
        outline_level = 0,
    },
    {
        id = "Caption",
        name = "Caption",
        based_on = "Normal",
        font = { name = "Calibri", size = 9, italic = true },
        spacing = { before = 0, after = 10, line = 1.15 },
    },
}

3.4 Table Styles

Listing : Table style definition
table_styles = {
    {
        id = "TableGrid",
        name = "Table Grid",
        borders = {
            top    = { style = "single", width = 0.5, color = "000000" },
            bottom = { style = "single", width = 0.5, color = "000000" },
            left   = { style = "single", width = 0.5, color = "000000" },
            right  = { style = "single", width = 0.5, color = "000000" },
            inside_h = { style = "single", width = 0.5, color = "000000" },
            inside_v = { style = "single", width = 0.5, color = "000000" },
        },
        cell_margins = {
            top = "0.05in",
            bottom = "0.05in",
            left = "0.08in",
            right = "0.08in",
        },
        autofit = true,
    },
}

3.5 Caption Configuration

Listing : Caption configuration per float type
captions = {
    figure = {
        template = "{prefix} {number}: {title}",
        prefix = "Figure",
        separator = ": ",
        style = "Caption",
    },
    table = {
        template = "{prefix} {number}: {title}",
        prefix = "Table",
        separator = ": ",
        style = "Caption",
    },
    listing = {
        template = "{prefix} {number}: {title}",
        prefix = "Listing",
        separator = ": ",
        style = "Caption",
    },
}

3.6 Preset Inheritance

Presets can extend other presets using the extends field. The child preset is deep-merged with the base, with child values taking precedence:

Listing : Preset inheritance
-- models/mymodel/styles/academic/preset.lua
return {
    name = "Academic",
    description = "Academic paper styles",

    extends = {
        template = "default",    -- Base template
        preset = "default",      -- Base preset name
    },

    -- Override only what changes
    page = {
        size = "A4",
        margins = { top = "2.5cm", bottom = "2.5cm", left = "3cm", right = "2cm" },
    },

    paragraph_styles = {
        {
            id = "Normal",
            name = "Normal",
            font = { name = "Times New Roman", size = 12 },
            spacing = { line = 1.5, after = 0 },
            alignment = "both",   -- Justified
        },
    },
}

The loader detects circular dependencies and reports them as errors.
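
The merge semantics can be pictured with a short sketch. This is an illustrative deep-merge, not the loader's actual implementation, and the helper name is hypothetical; note in particular that how array-valued fields such as paragraph_styles are combined (by position or by style id) is up to the loader.

```lua
-- Illustrative sketch of preset deep-merging (hypothetical helper).
-- Child values win; nested tables merge recursively.
local function deep_merge(base, child)
    local result = {}
    for k, v in pairs(base) do
        result[k] = v
    end
    for k, v in pairs(child) do
        if type(v) == "table" and type(result[k]) == "table" then
            result[k] = deep_merge(result[k], v)  -- recurse into subtables
        else
            result[k] = v                         -- child value takes precedence
        end
    end
    return result
end
```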

3.7 Format-Specific Style Overrides

Beyond the main preset.lua, you can provide format-specific style files:

  • models/{template}/styles/{preset}/docx.lua – DOCX-specific overrides
  • models/{template}/styles/{preset}/html.lua – HTML-specific overrides

These files return tables with keys like float_styles and object_styles that are merged with the base preset at emit time.
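
An HTML override file might look like the following sketch. The float_styles and object_styles keys come from the description above, but the entries inside them are hypothetical illustrations, not a documented schema:

```lua
-- models/mymodel/styles/academic/html.lua (illustrative sketch)
return {
    float_styles = {
        -- hypothetical entry: style hints for FIGURE floats in HTML output
        figure = { css_class = "figure-academic" },
    },
    object_styles = {
        -- hypothetical entry: style hints for requirement objects
        hlr = { css_class = "requirement" },
    },
}
```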

4 Postprocessors

Postprocessors manipulate the generated DOCX file after Pandoc produces it. They operate on raw OOXML inside the ZIP archive.

4.1 Loading

The base postprocessor (models/default/postprocessors/docx.lua) is always loaded. It handles:

  • Positioned floats – Converts inline images to anchored format with margin-relative positioning.
  • Caption orphan prevention – Adds keepNext to Caption-styled paragraphs.

Template-specific postprocessors are loaded from models/{template}/postprocessors/docx.lua.

4.2 Hook Interface

A template postprocessor exports functions that are called in sequence:

Table : Postprocessor hook functions
Hook Input Purpose
process_document(content, config, log) document.xml content Modify main document body
process_styles(content, log, config) styles.xml content Modify or inject style definitions
process_numbering(content, log) numbering.xml content Modify list numbering definitions
process_content_types(content, log) [Content_Types].xml content Add content type declarations
process_settings(content, log) settings.xml content Modify document settings
process_rels(content, log) document.xml.rels content Add/modify relationship entries
create_additional_parts(temp_dir, log, config) Temp directory path Create new parts (headers, footers)

All hooks are optional. Each XML hook receives the current file content as a string and returns the modified content; create_additional_parts instead receives the path of the temporary extraction directory and creates new files there.

4.3 Writing a Custom Postprocessor

Create models/mymodel/postprocessors/docx.lua:

Listing : Custom DOCX postprocessor
local M = {}

function M.process_document(content, config, log)
    local modified = content

    -- Example: Add custom watermark text to every paragraph
    -- (Real implementations would use proper OOXML patterns)

    log.debug("[MYMODEL-POST] Processing document.xml")
    return modified
end

function M.process_styles(content, log, config)
    local modified = content

    -- Example: Inject a custom paragraph style
    log.debug("[MYMODEL-POST] Processing styles.xml")
    return modified
end

return M

4.4 The config Parameter

The config table passed to hooks contains:

  • template – The template name
  • docx – DOCX configuration from project.yaml
  • spec_metadata – Specification-level attributes (in create_additional_parts)
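
As a concrete, if simplified, illustration of what a hook body can do, the following sketch injects w:keepNext into Caption-styled paragraphs via string substitution, similar in spirit to the base postprocessor's caption orphan prevention. The gsub pattern is an assumption: real OOXML paragraph properties vary more than this single pattern captures.

```lua
function M.process_document(content, config, log)
    -- Simplified sketch: insert <w:keepNext/> into the paragraph properties
    -- of every Caption-styled paragraph. Treat the pattern as illustrative;
    -- production code must handle additional pPr variations.
    local modified, count = content:gsub(
        '(<w:pPr><w:pStyle w:val="Caption"/>)',
        '%1<w:keepNext/>')
    log.debug("[MYMODEL-POST] Added keepNext to " .. count .. " captions")
    return modified
end
```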

5 Filters

Pandoc Lua filters run during the DOCX write phase and convert SpecCompiler format markers to OOXML. The default filter (models/default/filters/docx.lua) handles:

Table : Default DOCX filter conversions
Input Marker Output
RawBlock("speccompiler", "page-break") OOXML page break
RawBlock("speccompiler", "vertical-space:NNNN") OOXML spacing (in twips)
RawBlock("speccompiler", "bookmark-start:ID:NAME") OOXML bookmark start
RawBlock("speccompiler", "math-omml:OMML") OOXML math element
Div.speccompiler-caption OOXML caption with SEQ field
Div.speccompiler-numbered-equation OOXML numbered equation with tab layout
Div.speccompiler-positioned-float Position markers for postprocessor
Link with .ext target Rewritten to .docx target

5.1 When to Use Filters vs Postprocessors

  • Filters operate on the Pandoc AST before DOCX generation. Use them when you need to convert SpecCompiler markers to OOXML elements that Pandoc will then place in the document.
  • Postprocessors operate on the raw OOXML after DOCX generation. Use them when you need to manipulate the final XML directly (style injection, image positioning, headers/footers).
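
For orientation, a minimal filter handling the page-break marker from the table above might look like this sketch; it uses the standard Pandoc Lua filter interface, and the OOXML snippet is the conventional page-break run:

```lua
-- Minimal sketch of a Pandoc Lua filter converting a SpecCompiler
-- marker into raw OOXML. The default filter handles many more markers.
function RawBlock(el)
    if el.format == "speccompiler" and el.text == "page-break" then
        return pandoc.RawBlock("openxml",
            '<w:p><w:r><w:br w:type="page"/></w:r></w:p>')
    end
    -- other markers fall through unchanged
end
```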

6 project.yaml Configuration

6.1 DOCX-Specific Settings

Listing : DOCX configuration in project.yaml
# Output format entry
outputs:
  - format: docx
    path: build/docx/{spec_id}.docx

# DOCX-specific configuration
docx:
  preset: default              # Style preset name
  # reference_doc: assets/reference.docx  # Custom reference (overrides preset)

6.2 Configuration Precedence

  1. If docx.reference_doc is set, that file is used directly as the Pandoc reference document.
  2. If docx.preset is set (or defaults to the model’s styles), SpecCompiler generates {output_dir}/reference.docx from the preset.
  3. If neither is set, Pandoc uses its built-in default styles.

7 Reference Document Cache

When using presets, SpecCompiler caches the generated reference.docx to avoid regenerating it on every build.

The cache works as follows:

  1. Compute SHA-1 hash of the preset file content.
  2. Compare against the stored hash in the build_meta table (key-value store in specir.db).
  3. If the hashes match and reference.docx exists on disk, skip generation.
  4. If the preset changed or reference.docx is missing, regenerate and update the cache.
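
In Lua-flavored pseudocode, the check amounts to the following sketch; all function and key names here are hypothetical stand-ins for the real implementation:

```lua
-- Illustrative sketch of the reference.docx cache check.
-- sha1(), read_file(), file_exists(), build_meta_get/set(), and
-- generate_reference_docx() are hypothetical helpers.
local preset_hash = sha1(read_file(preset_path))
local cached_hash = build_meta_get("reference_docx_hash")

if preset_hash == cached_hash and file_exists(reference_docx_path) then
    -- Hashes match and the file exists: skip generation.
else
    generate_reference_docx(preset, reference_docx_path)
    build_meta_set("reference_docx_hash", preset_hash)
end
```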

To force regeneration of the reference document, delete it:

Listing : Force reference document regeneration
rm -f build/reference.docx
./bin/speccompiler-core

SpecCompiler Core User Manual

1 Introduction

1.1 What is SpecCompiler?

SpecCompiler is a document processing pipeline that transforms structured Markdown specifications into multiple output formats (DOCX, HTML5). It provides:

  • Structured authoring: Define requirements, designs, and verification cases using a consistent syntax.
  • Traceability: Link objects together with Project Identifier (PID) and #label references.
  • Validation: Automatically verify data integrity through proof views backed by Structured Query Language (SQL) queries against the Specification Intermediate Representation (SpecIR).
  • Multi-format output: Generate Word documents and web content from a single source.

SpecCompiler processes documents through a five-phase pipeline: INITIALIZE, ANALYZE, TRANSFORM, VERIFY, and EMIT, as illustrated in Figure 1.

1.2 Scope

This manual covers:

  • Installation and verification of the SpecCompiler-Core Docker image (see Introduction).
  • Configuration of project files (project.yaml) as described in Project Configuration.
  • Authoring specification documents using the SpecCompiler Markdown syntax (Project Configuration).
  • Invocation of the tool and interpretation of its outputs (Invocation).
  • Verification diagnostics and error code reference.
  • Incremental build behavior and cache management.
  • Type system configuration and custom model creation; for a detailed walkthrough, see creating-a-model: Model Directory Layout in the companion model guide.
  • Troubleshooting common problems.

1.3 Pipeline Summary

The processing pipeline consists of five phases:

  1. INITIALIZE – Parse Markdown input via the Pandoc Abstract Syntax Tree (AST), extract specifications, spec objects, attributes, floats, relations, and views into the SpecIR, stored in an SQLite database.
  2. ANALYZE – Resolve relation types and cross-references using specificity-based inference rules (see Equation 1).
  3. TRANSFORM – Resolve floats (render PlantUML, charts, tables), materialize views, rewrite links, and render spec objects using type-specific handlers.
  4. VERIFY – Execute proof views (SQL queries) against the SpecIR database to detect constraint violations; see creating-a-model: Validation Proofs in the model guide.
  5. EMIT – Assemble Pandoc documents from the SpecIR and generate output files in configured formats via parallel Pandoc subprocess invocations.
Figure Processing Pipeline

Figure Operations per Pipeline Phase

The handler counts shown in Figure 2 reflect the default model. Custom models may add handlers in any phase.

2 Installation

2.1 Prerequisites

The following are required to run SpecCompiler-Core:

Table : Runtime prerequisites
Prerequisite Minimum Version Notes
Docker runtime 20.10+ Docker Desktop or Docker Engine (daemon must be running)
Disk space 2 GB For the Docker image and build artifacts
Host OS Linux, macOS, or Windows (with WSL2) The container runs Debian Bookworm (slim)

All dependency versions are pinned in scripts/versions.env.

2.2 Building the Image

Run the Docker installer from the repository root:

Listing : Build and install via Docker
bash scripts/install.sh

This performs three steps:

  1. Docker build – Executes a multi-stage Docker build:
    • Toolchain – Compiles Lua, Pandoc (with GHC), and native Lua extensions (luv, lsqlite3, zip) from source. Builds Deno TypeScript utilities. Downloads and wraps PlantUML with a minimal JRE. This stage is cached and only rebuilt when scripts/versions.env, scripts/build.sh, or src/tools/ change.
    • Runtime-base – Copies only runtime artifacts into a lean Debian Bookworm image without build tools. Installs runtime dependencies (Python/reqif, graphviz, lcov). This is the stable base for code-only updates.
    • Runtime – Overlays src/ and models/ onto runtime-base to produce the final image.
  2. Wrapper generation – Creates a specc Command-Line Interface (CLI) command at ~/.local/bin/specc.
  3. Config – Writes the image reference to ~/.config/speccompiler/env.

The installer supports three modes:

  • Default (bash scripts/install.sh) – Builds the full image if not present.
  • Force (bash scripts/install.sh --force) – Rebuilds everything from scratch, including the toolchain.
  • Code-only (bash scripts/install.sh --code-only) – Updates only src/ and models/ layers without recompiling the toolchain. Always builds from the stable runtime-base image, preventing Docker layer accumulation. Dangling images from previous builds are automatically pruned.

2.3 Verifying Installation

After building, verify the image is available:

Listing : Verify Docker image availability
docker images speccompiler-core

To verify the tool runs correctly, navigate to a directory containing a project.yaml file and run:

Listing : Run wrapper command
specc build

2.4 The specc Wrapper

The specc command is a Docker wrapper generated by the installer. It supports three subcommands:

Table : Wrapper subcommands
Command Description
specc build [project.yaml] Build the project (default file: project.yaml)
specc clean Remove build/ directory and specir.db
specc shell Open an interactive Bash shell inside the container

The build subcommand runs docker run --rm with:

  • --user "$(id -u):$(id -g)" – Preserves host UID/GID.
  • -v "$(pwd):/workspace" – Mounts current directory.
  • -e "SPECCOMPILER_HOME=/opt/speccompiler" – Sets installation root.
  • -e "SPECCOMPILER_DIST=/opt/speccompiler" – Sets distribution root.
  • -e "SPECCOMPILER_LOG_LEVEL=${SPECCOMPILER_LOG_LEVEL:-INFO}" – Passes log level.
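
Put together, the generated wrapper invokes roughly the following. This is a simplified sketch assembled from the flags above; the trailing container command and image-reference handling (read from ~/.config/speccompiler/env) are assumptions about the wrapper's internals:

```shell
docker run --rm \
  --user "$(id -u):$(id -g)" \
  -v "$(pwd):/workspace" \
  -e "SPECCOMPILER_HOME=/opt/speccompiler" \
  -e "SPECCOMPILER_DIST=/opt/speccompiler" \
  -e "SPECCOMPILER_LOG_LEVEL=${SPECCOMPILER_LOG_LEVEL:-INFO}" \
  speccompiler-core:latest build project.yaml
```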

Inside the container, the speccompiler-core entry point invokes Pandoc with the SpecCompiler Lua filter. The -o /dev/null Pandoc flag is intentional – actual output files are generated by the EMIT phase.

3 Project Configuration

All project configuration is specified in a project.yaml file located in the project root directory.

3.1 Complete Configuration Reference

Listing : project.yaml reference
# ============================================================================
# Project Identification (REQUIRED)
# ============================================================================
project:
  code: MYPROJ          # Project code identifier (string, required)
  name: My Project SRS  # Human-readable project name (string, required)

# ============================================================================
# Type Model (REQUIRED)
# ============================================================================
template: default        # Type model name (string, default: "default")
                         # Must match a directory under models/

# ============================================================================
# Logging Configuration (OPTIONAL)
# ============================================================================
logging:
  level: info            # DEBUG | INFO | WARN | ERROR (default: "INFO")
  format: auto           # auto | json | text (default: "auto")
  color: true            # ANSI color codes (default: true)

# ============================================================================
# Validation Policy (OPTIONAL)
# ============================================================================
validation:
  missing_required: ignore
  cardinality_over: ignore
  invalid_cast: ignore
  invalid_enum: ignore
  invalid_date: ignore
  bounds_violation: ignore
  dangling_relation: ignore
  unresolved_relation: ignore

# ============================================================================
# Input Files (REQUIRED)
# ============================================================================
output_dir: build/       # Base output directory (default: "build")

doc_files:               # Markdown files to process, in order
  - srs.md
  - sdd.md

# ============================================================================
# Output Format Configurations (OPTIONAL)
# ============================================================================
outputs:
  - format: docx
    path: build/docx/{spec_id}.docx
  - format: html5
    path: build/www/{spec_id}.html

# ============================================================================
# DOCX Configuration (OPTIONAL)
# ============================================================================
docx:
  preset: null           # Style preset name (models/{template}/presets/)
  # reference_doc: assets/reference.docx  # Custom Word reference

# ============================================================================
# HTML5 Configuration (OPTIONAL)
# ============================================================================
html5:
  number_sections: true
  table_of_contents: true
  toc_depth: 3
  standalone: true
  embed_resources: true
  resource_path: build

# ============================================================================
# Bibliography and Citations (OPTIONAL)
# ============================================================================
bibliography: refs.bib
csl: ieee.csl

3.2 Required Fields

Table : Required project fields
Field Type Description
project.code string Project code identifier
project.name string Human-readable project name
doc_files list One or more Markdown file paths to process

3.3 Default Values

Table : Default configuration values
Field Default Notes
template default Built-in base model is always loaded
output_dir build Also stores specir.db
logging.level INFO Overridden by SPECCOMPILER_LOG_LEVEL env var

4 Document Authoring

SpecCompiler extends standard Markdown with a structured overlay for specification documents. The syntax uses existing Markdown constructs (headers, blockquotes, code blocks, links) with specific patterns that the pipeline recognizes.

4.1 Specifications

Level 1 headers declare the top-level document container.

Pattern: # type: Title @PID

Listing : Specification declaration
# srs: Software Requirements Specification @SRS-001

4.2 Spec Objects

Level 2-6 headers declare requirements, design elements, sections, or any typed element.

Pattern: ## type: Title @PID

Listing : Spec object declaration
## hlr: User Authentication @HLR-001
### llr: Password Validation @LLR-001
#### section: Implementation Notes

If @PID is omitted, a PID is auto-generated using the type’s pid_prefix and pid_format.
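
For example, assuming an hlr type whose pid_prefix is HLR (the exact numbering depends on the type's pid_format), a header without an explicit PID:

```markdown
## hlr: Session Timeout
```

would receive an auto-generated PID such as HLR-002.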

4.3 Attributes

Blockquotes declare attributes using the key: value pattern. They belong to the most recently opened Specification or SpecObject header and do not need to appear immediately after it:

Listing : Attribute declaration
## hlr: User Authentication @HLR-001

> priority: High

> status: Draft

> rationale: Required by security policy

Rules:

  • Each attribute blockquote must be separated by a blank line.
  • The first line must match key: value (where key is [A-Za-z0-9_]+). If not, the blockquote is treated as prose. The key does not need to be a registered attribute type; unregistered keys default to STRING datatype.
  • Multi-line values are supported: continuation lines append to the preceding attribute.
  • Supported datatypes: STRING, INTEGER, REAL, BOOLEAN, DATE (YYYY-MM-DD), ENUM, XHTML.
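
Building on the rules above, a multi-line value is written as one blockquote whose continuation lines append to the attribute; the wording here is purely illustrative:

```markdown
> rationale: Required by security policy.
> The mechanism must also satisfy the audit requirements
> defined by the applicable security standard.
```

All three lines form the value of the single rationale attribute.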

4.4 Floats

Fenced code blocks with a typed first class declare numbered elements.

Pattern: ```type.lang:label{key="val"}

4.4.1 PlantUML Diagram

Listing : PlantUML float syntax
```plantuml:diag-state{caption="State Machine"}
@startuml
[*] --> Active
Active --> Inactive
@enduml
```

4.4.2 Table

Listing : Table float syntax
```list-table:tbl-interfaces{caption="External Interfaces"}
> header-rows: 1
> aligns: l,l,l

* - Interface
  - Protocol
  - Direction
* - GPS
  - ARINC-429
  - Input
```

4.4.3 CSV Table

The Comma-Separated Values (CSV) float alias provides a compact syntax for tabular data:

Listing : CSV float syntax
```csv:tbl-data{caption="Sample Data"}
Name,Value,Unit
Temperature,72.5,F
Pressure,1013.25,hPa
```

Both csv and list-table produce TABLE floats. Use csv for simple tabular data and list-table for tables with rich Markdown content in cells. See Floats in Practice for live examples of each.

4.4.4 Listing (Code)

Listing : Listing float syntax
```listing.c:lst-init{caption="Initialization Routine"}
void init(void) {
    setup_hardware();
}
```

4.4.5 Chart (ECharts)

Listing : Chart float syntax
```chart:chart-coverage{caption="Test Coverage"}
{
  "xAxis": { "data": ["Module A", "Module B"] },
  "series": [{ "type": "bar", "data": [95, 87] }]
}
```

Charts support data injection via view modules. Add view="gauss" and params="mean=0,sigma=1" to the code fence attributes to inject generated data into the ECharts configuration at render time. See Figure 5 for a working example.
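
Combining these attributes, a fence that injects Gaussian data might look like the following sketch. The ECharts skeleton and the empty placeholder series are illustrative; the gauss view replaces the placeholder data at render time, and the space-separated attribute syntax is assumed to follow the fence-attribute pattern shown earlier:

````markdown
```chart:chart-gauss{caption="Normal Distribution" view="gauss" params="mean=0,sigma=1"}
{
  "xAxis": { "type": "value" },
  "series": [{ "type": "line", "data": [] }]
}
```
````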

4.4.6 Math

Listing : Math float syntax
```math:eq-force{caption="Newton's Second Law"}
F = ma
```

Math floats use AsciiMath notation and are rendered to MathML for HTML5 output and OMML for DOCX. See Equation 1 and Equation 2 for live examples in this manual.

4.4.7 Float Syntax Summary

Table : Float syntax components
Component Description
type Float type identifier (for example figure, plantuml, csv, list-table, listing, chart, math)
.lang Optional language hint for syntax highlighting
:label Float label for cross-referencing; must be unique within the specification
{key="val"} Key-value attributes; common attribute: caption

4.5 Relations (Links)

Links use the pattern [content](selector). Selectors are not hardcoded – they are registered by relation types in the model’s type system. Each relation type declares a link_selector field, and the pipeline uses it for resolution and type inference. The default model registers the following selectors:

Table : Default model selectors
Selector Registered by Resolution
@ traceable base (XREF_SEC, and model-specific types) PID lookup: same-spec first, then cross-document fallback
# xref base (XREF_FIGURE, XREF_TABLE, XREF_LISTING, XREF_MATH, XREF_SECP) Scoped label resolution: local scope, then same-spec, then global
@cite XREF_CITATION Rewritten to pandoc Cite element (parenthetical)
@citep XREF_CITATION Rewritten to pandoc Cite element (in-text)

Custom models can register additional selectors by defining relation types with new link_selector values.

Table : Relation syntax patterns
Syntax Example Description
[PID](@) [HLR-001](@) Reference by PID
[type:label](#) [fig:diagram](#) Typed float reference
[scope:type:label](#) [REQ-001:fig:detail](#) Scoped float reference
[key](@cite) [smith2024](@cite) Parenthetical citation
[key](@citep) [smith2024](@citep) In-text citation

4.5.1 Type Inference

After a link is resolved, the inference algorithm scores it against all registered relation types across four unweighted dimensions. Each matching dimension adds +1 to the specificity score, and a constraint mismatch eliminates the candidate entirely. The total score for a candidate is computed as:

S = sum_(i=1)^(4) d_i,   d_i in {0, 1}
(1)

The four dimensions (d_1 through d_4) correspond to the selector, source attribute, source type, and target type, as shown in Table 8:

Table : Type inference scoring dimensions
Dimension Match Constraint mismatch No constraint (NULL)
Selector (@, #, @cite, etc.) +1 Eliminated +0
Source attribute +1 Eliminated +0
Source type +1 Eliminated +0
Target type +1 Eliminated +0

The highest-scoring candidate wins. If two candidates tie, the relation is marked ambiguous. For example, [fig:diagram](#) resolving to a FIGURE float will match XREF_FIGURE (selector # + target type FIGURE = specificity S=2) over the generic xref base (selector # only = specificity S=1).
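
The scoring rule can be sketched as a small function; the field names on the candidate and link records are hypothetical, but the logic mirrors the table above:

```lua
-- Illustrative sketch of specificity scoring. Returns the score, or nil
-- if any constrained dimension mismatches (candidate eliminated).
local function specificity(candidate, link)
    local dims = {
        { candidate.link_selector,    link.selector },
        { candidate.source_attribute, link.source_attribute },
        { candidate.source_type,      link.source_type },
        { candidate.target_type,      link.target_type },
    }
    local score = 0
    for _, d in ipairs(dims) do
        local constraint, actual = d[1], d[2]
        if constraint ~= nil then
            if constraint == actual then
                score = score + 1   -- match: +1
            else
                return nil          -- constraint mismatch: eliminated
            end
        end                         -- no constraint (NULL): +0
    end
    return score
end
```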

4.6 Views

Inline code with a specific prefix declares view placeholders:

Listing : Inline view placeholder
`toc:`

Default model view types:

Table : Default view types
Type Aliases Description
toc - Table of Contents (TOC) from spec object headings
lof lot List of floats (figures, tables, etc.)
abbrev sigla, acronym Define an abbreviation inline: Full Meaning (ABBR)
abbrev_list sigla_list, acronym_list Render a sorted table of all abbreviations defined via abbrev:
math_inline eq, formula Inline math expression rendered to MathML/OMML
gauss gaussian, normal Generate Gaussian distribution data for chart floats

4.7 Body Content

Prose paragraphs, lists, and tables between headers accumulate to the most recently opened Specification or Spec Object.

4.8 File Includes

Split large documents into multiple files using fenced code blocks with the include class:

Listing : Include directive syntax
```include
path/to/chapter1.md
path/to/chapter2.md
```

Each line is a file path relative to the including document’s directory. Absolute paths are also supported. Lines starting with # are treated as comments and ignored.

Include blocks are expanded recursively before the pipeline runs. Circular includes are detected and produce an error. The maximum nesting depth is 100 levels.

Included files are tracked in the build graph for incremental builds – a change to any included file triggers a rebuild.

5 Using the Default Model

5.1 Rationale

The default model ships a complete document authoring toolkit so that authors can write structured technical documents without defining custom types. It provides:

  • Numbered floats – figures, tables, code listings, math equations, PlantUML diagrams, and ECharts charts, each with automatic numbering and captions.
  • Typed cross-references – relation types that resolve @ and # links to specific float and object categories, enabling the pipeline to render appropriate display text (for example, “Figure 3” or “Table 1”).
  • Bibliography citations – integration with Pandoc’s citeproc for parenthetical and in-text citation rendering from BibTeX files.
  • Content views – generated content blocks such as TOC, list of figures, abbreviation tables, and inline math.

The following subsections demonstrate these features with live floats and cross-references. Every float, view, and link shown below is processed by SpecCompiler when this manual is built.

5.2 Floats in Practice

A SpecCompiler document can use all default float types. Each float has a type prefix, a label for cross-referencing, and a caption. The examples below are live – they are rendered when this manual is processed.

5.2.1 Architecture Diagram (PlantUML)

Figure Layered Architecture

5.2.2 Component Table (list-table)

Table : System Components
Component Layer Technology
Web UI Presentation React
Auth Service Business Logic Node.js
Data Service Business Logic Python
Database Persistence PostgreSQL

5.2.3 Performance Metrics (CSV)

Table : Performance Metrics
Metric Target Actual Status
Response time (ms) 200 185 Pass
Throughput (req/s) 1000 1120 Pass
Error rate (%) 1.0 0.3 Pass
Memory usage (MB) 512 487 Pass

5.2.4 Initialization Code (Listing)

Listing : Service Initialization
def initialize(config):
    db = connect(config.db_url)
    auth = AuthService(db)
    return Application(auth, db)

5.2.5 Latency Model (Math)

L = T_network + T_processing + T_db
(2)

5.2.6 Throughput Chart (ECharts)

Figure Throughput by Module

5.3 Cross-References

Every float and object defined above can be referenced from prose. The following paragraph demonstrates cross-reference resolution using the # selector.

The system architecture is depicted in Figure 3. Component details, including the technology stack for each layer, are listed in Table 10. Performance targets and actuals are compared in Table 11 – all four metrics pass their thresholds. The initialization logic is shown in Listing 16, and the latency model driving performance requirements is defined by Equation 2. Finally, throughput measurements by module are visualized in Figure 4.

The @ selector resolves by PID and works across documents. For example, this sentence references the introduction of this manual: INDEX. Cross-document references to the companion guides also work; see creating-a-model: Model Directory Layout for the model directory layout and docx-customization: Style Presets for DOCX style presets.

Table : Cross-reference selector comparison
Selector Syntax Resolution
@ (PID) [PID](@) Exact PID lookup. Same-spec first, then cross-document fallback. Never ambiguous.
# (Label) [type:label](#) Scoped resolution: local scope, then same specification, then global. May be ambiguous if multiple matches at the same scope level.

5.4 Section References

Headers without an explicit TYPE: prefix default to the SECTION type. Sections receive auto-generated PIDs and labels that can be used for cross-referencing:

  • PID format: {spec_pid}-sec{depth.numbers} – for example, SRS-sec1, SRS-sec1.2, SRS-sec2.3.1. Use the @ selector: [SRS-sec1.2](@).
  • Label format: section:{title-slug} – for example, ## Introduction produces the label section:introduction. Use the # selector: [section:introduction](#).

The @ selector performs an exact PID lookup and is never ambiguous. The # selector uses scoped resolution (closest scope wins), which is useful when multiple specifications have sections with similar names.

For cross-document section references with the # selector, use the explicit scope syntax: [SPEC-A:section:design](#) to target a section labeled “design” within the specification whose PID is SPEC-A.

This manual references its own sections using both selectors; the example links resolve within this document at build time.

Cross-document references work identically: because the companion guides are listed in the same project.yaml, links to them also resolve at build time.

5.5 Citations and Bibliography

SpecCompiler integrates with Pandoc’s citeproc processor for scholarly citations.

Step 1. Add bibliography configuration to project.yaml:

Listing : Bibliography configuration in project.yaml
bibliography: refs.bib
csl: ieee.csl

Step 2. Create a BibTeX file (refs.bib):

Listing : Example BibTeX file
@article{smith2024,
  author  = {Smith, John},
  title   = {Advances in Systems Engineering},
  journal = {IEEE Transactions},
  year    = {2024}
}
@book{jones2023,
  author    = {Jones, Alice},
  title     = {Software Architecture Patterns},
  publisher = {O'Reilly},
  year      = {2023}
}

Step 3. Use citation syntax in your document:

Listing : Citation syntax examples
Recent work [smith2024](@cite) demonstrates the approach.

As Smith [smith2024](@citep) argues, the method is effective.

Multiple sources support this [smith2024;jones2023](@cite).
  • [key](@cite) produces a parenthetical citation – for example, “(Smith, 2024)” in author-date styles or “[1]” in numeric styles.
  • [key](@citep) produces an in-text citation – for example, “Smith (2024)” or “Smith [1]”.
  • Multiple keys separated by ; produce a grouped citation.

Processing pipeline: During the TRANSFORM phase, citation links are rewritten to Pandoc Cite elements. During EMIT, Pandoc’s citeproc processor formats citations and appends a bibliography list to the document according to the configured CSL style.

5.6 Views in Practice

Views generate content blocks from the SpecIR.

5.6.1 Abbreviations

The abbrev: view defines abbreviations inline. On first use, the full meaning is displayed alongside the abbreviation. All definitions are collected for the abbrev_list view shown in the List of Abbreviations appendix.

This manual defines abbreviations on first use throughout the text. For example, Entity-Attribute-Value (EAV) is the database pattern used for flexible attributes, and Newline-Delimited JSON (NDJSON) is the format used for diagnostic output.

The syntax is: `abbrev: Full Meaning Text (ABBREVIATION)`. The abbreviation goes in parentheses at the end.

5.6.2 Inline Math

The eq: prefix renders inline math expressions using AsciiMath notation. For example, the quadratic formula is x = (-b ± sqrt(b^2 - 4ac)) / (2a), and Euler's identity is e^(i pi) + 1 = 0.

Inline math is useful for formulas within prose paragraphs, while block math: floats (like Equation 1 and Equation 2) provide numbered equations with captions.

5.6.3 Chart with Data View Injection (Gauss)

Charts can load data dynamically from view modules using the view attribute. The gauss view generates a Gaussian probability density function and injects it into the ECharts dataset. The chart below demonstrates this – the view="gauss" attribute triggers the data injection pipeline:

Figure Standard Normal Distribution

The params attribute passes mean, sigma, xmin, xmax, and points to the Gauss view’s generate() function. The function returns an ECharts dataset that replaces the chart’s placeholder data at render time. This same mechanism supports custom data views that query the SpecIR database; see creating-a-model: Walkthrough: Custom View in the model guide for details on creating view modules.
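A data view's generate() function can be sketched as follows. This assumes only what the text above states – the function receives the params attributes (mean, sigma, xmin, xmax, points) and returns an ECharts dataset; the defaults and field names beyond that are invented for the example:

```lua
-- Sketch of a Gaussian data view's generate(). Receives the params
-- attributes described above and returns an ECharts-style dataset.
-- Defaults and the exact dataset shape are illustrative assumptions.
function generate(params)
    local mean   = params.mean   or 0
    local sigma  = params.sigma  or 1
    local xmin   = params.xmin   or -4
    local xmax   = params.xmax   or 4
    local points = params.points or 101

    local rows = { { "x", "pdf" } }   -- header row for the dataset
    local step = (xmax - xmin) / (points - 1)
    for i = 0, points - 1 do
        local x = xmin + i * step
        -- Gaussian probability density function
        local y = math.exp(-((x - mean) ^ 2) / (2 * sigma ^ 2))
                  / (sigma * math.sqrt(2 * math.pi))
        rows[#rows + 1] = { x, y }
    end
    return { source = rows }          -- replaces the chart's placeholder data
end
```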

5.6.4 Generated Lists

The [LOF] and [LOT] views produce navigable lists of figures and tables. These are rendered in the List of Figures and List of Tables appendices of this manual.

6 Invocation

6.1 Basic Usage

Listing : Basic invocation
specc build

Processes all files from doc_files in the current directory’s project.yaml. An alternative project file can be specified: specc build my-project.yaml.

6.2 Environment Variables

Table : Environment variables
Variable Default Description
SPECCOMPILER_LOG_LEVEL INFO Override log level: DEBUG, INFO, WARN, ERROR
SPECCOMPILER_HOME /opt/speccompiler SpecCompiler installation root (model and binary lookup)
SPECCOMPILER_DIST /opt/speccompiler Distribution root (used internally for external renderers)
SPECCOMPILER_IMAGE speccompiler-core:latest Docker image reference (overrides default in wrapper)
NO_COLOR (unset) Disable ANSI color codes in output

6.3 Exit Codes

Table : Exit Codes
Code Meaning
0 Success: all documents processed and outputs generated
1 Failure: Docker not running, missing configuration, or pipeline error

7 Output Formats

Four output formats are supported. Multiple formats can be generated in a single run.

7.1 DOCX (Microsoft Word)

  • Style presets via docx.preset or custom docx.reference_doc.
  • Model-specific postprocessors for format transformations.

For a complete guide on customizing DOCX output – including paragraph styles, table styles, caption configuration, and postprocessors – see docx-customization: Style Presets and docx-customization: Postprocessors in the companion DOCX Customization guide.

7.2 HTML5

Table : HTML5 output options
Option Type Default Description
number_sections boolean false Add section numbering
table_of_contents boolean false Generate table of contents
toc_depth integer 3 Heading depth for TOC
standalone boolean false Produce complete HTML document
embed_resources boolean false Embed CSS and images inline

7.3 Markdown (GitHub-Flavored Markdown (GFM))

GitHub-Flavored Markdown output, useful for review platforms and static site generators.

7.4 JSON (Pandoc AST)

Full Pandoc AST for programmatic integration with other tools.

8 Verification and Diagnostics

8.1 Diagnostic Output

Diagnostics are emitted in NDJSON format to stderr:

Listing : Diagnostic NDJSON example
{"level":"error","message":"[object_missing_required] Object missing required attribute 'priority' on HLR-001","file":"srs.md","line":42}

8.2 Diagnostic Reference

Table : Validation diagnostics
Policy Key Description
spec_missing_required Specification missing required attribute
spec_invalid_type Invalid specification type reference
object_missing_required Spec object missing required attribute
object_cardinality_over Attribute cardinality exceeded
object_cast_failures Attribute type cast failure
object_invalid_enum Invalid enum value
object_invalid_date Invalid date format (expected YYYY-MM-DD)
object_bounds_violation Value outside declared bounds
object_duplicate_pid Duplicate PID across spec objects
float_orphan Float has no parent object (orphan)
float_duplicate_label Duplicate float label in specification
float_render_failure External render failure
float_invalid_type Invalid float type reference
relation_unresolved Unresolved link (PIDs are case-sensitive)
relation_dangling Dangling relation (target not found)
relation_ambiguous Ambiguous float reference
view_materialization_failure View materialization failure

8.3 Suppressing Validation Rules

Every diagnostic listed in Table 16 can be suppressed or downgraded in project.yaml using its policy key:

Listing : Suppressing a validation rule
validation:
  float_orphan: ignore              # suppress entirely
  relation_unresolved: warn         # downgrade to warning

All proofs default to error (halt the build). Set a key to warn to emit a warning without halting, or ignore to suppress the diagnostic entirely. Custom proofs can define their own policy keys; see creating-a-model: Validation Proofs in the model guide.

9 Incremental Builds

9.1 Build Cache Mechanism

  1. File hashing – SHA-1 hash of each input file.
  2. Include dependency tracking – Tracked in build_graph table.
  3. Cache comparison – Current hashes vs build_cache table.
  4. Skip decision – Unchanged documents reuse cached SpecIR data.
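The skip decision in step 4 amounts to a hash comparison, sketched below with illustrative names (needs_rebuild and the flat cache table are not the engine's actual build_cache schema):

```lua
-- Minimal sketch of the cache comparison: a document is rebuilt when
-- its current hash differs from the cached one, or when no cache
-- entry exists. Names are illustrative, not the engine's API.
function needs_rebuild(path, current_hash, build_cache)
    return build_cache[path] ~= current_hash
end
```

An unchanged document (same hash) reuses its cached SpecIR data; a changed or new document is reprocessed.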

9.2 Forcing a Full Rebuild

Listing : Force full rebuild
specc clean
specc build

10 Type System and Models

10.1 Custom Models

Set template: mymodel in project.yaml. Types load in order:

  1. models/default/types/ – Always loaded first.
  2. models/mymodel/types/ – Loaded as overlay.
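The override semantics of this load order can be sketched as a table merge keyed by type id (overlay() is a hypothetical helper, not the engine's API):

```lua
-- Sketch of the overlay load order: default types are loaded first,
-- then the custom model's types, with same-id entries overriding the
-- default. overlay() is a hypothetical helper, not the engine's API.
function overlay(default_types, custom_types)
    local merged = {}
    for id, t in pairs(default_types) do merged[id] = t end
    for id, t in pairs(custom_types)  do merged[id] = t end  -- same id wins
    return merged
end
```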

For a complete walkthrough on creating custom types – including object types, float types, relation types with inference rules, and view types – see the companion Creating a Custom Model guide.

10.2 Built-in Models

SpecCompiler ships with default and sw_docs. The default model provides general-purpose types (specifications, sections, floats, cross-references, views). The sw_docs model overlays default with types for requirements engineering and traceability:

  • Object types: HLR, LLR, NFR, VC, TR, FD, CSC, CSU, DIC, DD, SF (all extend a common TRACEABLE base with status attribute and PID auto-generation)
  • Specification types: SRS, SDD, SVC, SUM, TRR (document templates with version, status, date)
  • Relation types: TRACES_TO, BELONGS, REALIZES, XREF_DECOMPOSITION, XREF_DIC (traceability links with specificity-based inference)
  • View types: TRACEABILITY_MATRIX, TEST_RESULTS_MATRIX, TEST_EXECUTION_MATRIX, COVERAGE_SUMMARY, REQUIREMENTS_SUMMARY (query-based tables materialized from the SpecIR)
  • Proofs: Traceability chain validation (VC-HLR, TR-VC, FD-CSC/CSU coverage)
  • Postprocessor: Interactive single-file HTML5 web application

The docs/engineering_docs/ directory in this repository uses sw_docs and serves as a living example of the model in practice.

10.3 Type Directory Structure

Listing : Type model directory structure
models/{template}/
  types/
    objects/          # Spec object types
    floats/           # Float types
    relations/        # Relation types
    views/            # View types
    specifications/   # Specification types
  postprocessors/     # Format-specific post-processing
  styles/             # DOCX style presets
  filters/            # Pandoc Lua filters per output format

10.4 Type Module Structure

Object type:

Listing : Object type module example
local M = {}
M.object = {
    id = "HLR",
    long_name = "High-Level Requirement",
    pid_prefix = "HLR",
    pid_format = "%s-%03d",
    attributes = {
        { name = "priority", datatype_ref = "PRIORITY_ENUM",
          min_occurs = 1, max_occurs = 1,
          values = {"High", "Medium", "Low"} },
    }
}
return M

Relation type:

Listing : Relation type module example
local M = {}
M.relation = {
    id = "TRACES_TO",
    link_selector = "@",
    source_type_ref = "LLR",
    target_type_ref = "HLR",
}
return M

10.5 Attribute Schema Fields

Table : Attribute schema fields
Field Type Default Description
name string required Attribute identifier
datatype_ref string STRING One of STRING, INTEGER, REAL, BOOLEAN, DATE, ENUM, XHTML
min_occurs integer 0 Minimum values (0 optional, 1 required)
max_occurs integer 1 Maximum values
min_value number nil Lower bound for numeric values
max_value number nil Upper bound for numeric values
values list nil Valid enum values
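As an illustration of how the bounds and enum fields combine, an attribute list might look like this (the attribute names are invented for the example, not shipped types):

```lua
-- Illustrative attribute entries exercising the schema fields above.
-- Attribute names are examples, not part of any shipped model.
attributes = {
    { name = "risk_weight", datatype_ref = "REAL",
      min_occurs = 0, max_occurs = 1,
      min_value = 0.0, max_value = 1.0 },              -- optional, bounded numeric
    { name = "review_date", datatype_ref = "DATE",
      min_occurs = 1, max_occurs = 1 },                -- required date (YYYY-MM-DD)
    { name = "status", datatype_ref = "ENUM",
      min_occurs = 1, max_occurs = 1,
      values = { "Draft", "Approved", "Retired" } },   -- required enum
}
```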

11 Troubleshooting

11.1 Docker Not Running

Error: Docker is not running – Start Docker daemon, verify with docker info.

11.2 No project.yaml Found

Run specc build from the directory containing project.yaml.

11.3 PlantUML Render Failure

Verify PlantUML syntax, ensure Docker image has Java JRE, check @startuml/@enduml markers.

11.4 Unresolved Relations

PIDs are case-sensitive. Verify target PID exists in doc_files. For cross-document references, ensure both documents are listed in the same project.yaml.

11.5 Build Seems Stale

Listing : Clean stale build cache
specc clean
specc build

11.6 Debugging

Listing : Enable debug logging
SPECCOMPILER_LOG_LEVEL=DEBUG specc build

12 Known Limitations

  • No interactive validation – Batch mode only, no LSP or watch mode.
  • Docker-only distribution – Native install requires replicating the full build environment (scripts/build.sh --install is provided but requires all system dependencies).
  • Single-writer SQLite – Concurrent builds cause locking errors; use separate output directories.
  • Float labels per-specification – Same label can exist across specs; use scoped syntax for cross-spec references.
  • PID case sensitivity – [hlr-001](@) will not match @HLR-001.

13 List of Figures

14 List of Tables

15 List of Listings

16 List of Abbreviations

AST Pandoc Abstract Syntax Tree
CLI Command-Line Interface
CSV Comma-Separated Values
EAV Entity-Attribute-Value
GFM GitHub-Flavored Markdown
NDJSON Newline-Delimited JSON
PID Project Identifier
SpecIR Specification Intermediate Representation
SQL Structured Query Language
SQLite SQLite Database