Interpretive Provenance Chain (IPC) and Hashing System

Introduction

The Axis Descriptor Lab generates descriptive text from deterministic axis payloads using a constrained LLM layer (Ollama). The system is built on a simple premise: a set of numerical scores and labels (the axes) are authoritative facts about an entity, and the LLM produces prose that interprets those facts without overriding them.

This raises a question that is deceptively difficult to answer:

When the output changes, what caused the change?

Was it a different axis score? A modified system prompt? A model upgrade? A different sampling temperature? Or just the inherent randomness of language model inference?

Without a systematic way to fingerprint every variable that influences a generation, any experiment with the lab is observational – you can see what happened, but you cannot determine why. Behavioural shifts may be misattributed to axis changes when the real cause was a prompt edit. Model upgrades may alter output in ways that are invisible without a baseline. Seed variance may introduce noise that masks real signal.

The Interpretive Provenance Chain (IPC) is the project’s answer to this problem. It is a composite SHA-256 fingerprint of every variable that influences a generation, implemented as a chain of individual hashes combined into a single identifier. The guarantee is simple:

Two generations with the same IPC ID used identical inputs in every respect. If their outputs differ, the difference is attributable solely to LLM stochasticity.

The IPC upgrades the lab from an observational tool to a reproducible scientific instrument.

Pipe-Works Design Philosophy

The IPC is grounded in four principles from the broader Pipe-Works project:

  • Determinism over optimisation – the system controls what it can; the LLM is allowed to vary only within fingerprinted boundaries.

  • Inspectability over mystique – every variable is visible, hashed, and stored.

  • Programmatic truth over narrative authority – the LLM interprets but never decides.

  • Failure as data – even when the LLM produces unexpected output, the IPC ensures the conditions are recorded for analysis.

Source: _working/axis_lab/pipeworks_interpretive_provenance_chains.md, section 8.

What this document covers:

  1. The authoritative/ornamental boundary that makes IPC meaningful

  2. The four hash functions and their normalisation rules

  3. The normalisation philosophy in depth

  4. A complete generation walkthrough showing the chain in action

  5. Every integration point in the codebase

  6. Practical use cases for drift detection and reproducibility

  7. Design decisions and trade-offs

  8. How the hashing system is tested

  9. References to the original design documents

  10. A glossary of key terms


The Authoritative/Ornamental Boundary

The Axis Descriptor Lab operates on a strict two-layer architecture. Understanding this boundary is essential to understanding why the IPC exists and what it fingerprints.

The Authoritative Layer (Deterministic)

The authoritative layer consists of data that the system controls completely:

  • Axis scores and labels – numerical values (0.0–1.0) and their human-readable descriptors (e.g., “weary”, “threadbare”). These are the entity’s ground truth.

  • Policy rules – the mapping table that assigns labels to score ranges (e.g., age 0.75 maps to “old”). Identified by a policy_hash.

  • Seed – the deterministic RNG seed that produced the axis scores. Same seed, same scores.

  • World ID – the Pipe-Works world context identifier.

The Ornamental Layer (Stochastic)

The ornamental layer is the LLM’s output:

  • A paragraph of descriptive prose that interprets the authoritative data.

  • It modulates tone, chooses words, and compresses structured state into natural language.

  • It never makes decisions, overrides scores, or introduces facts not present in the payload.

This principle is enforced at every level of the system. The system prompt itself contains the line:

“The system is authoritative. You are ornamental.”

app/lab_only/prompts/character_description/system_prompt_v01.txt, line 29

Why the Boundary Matters for Hashing

The boundary between authoritative and ornamental is exactly the boundary the IPC fingerprints. The input side (payload, prompt, model, parameters) is the experimental condition. The output side (LLM text) is the experimental observation.

AUTHORITATIVE (deterministic)           ORNAMENTAL (stochastic)
=================================       ========================
 AxisPayload (axes, scores, seed)        LLM-generated paragraph
 Policy rules (policy_hash)              - interprets, never decides
 System prompt (constraint text)         - modulates tone
 Model + sampling parameters             - compresses state to prose
          |                                        |
          +---- IPC fingerprints this boundary ----+
                      completely

The IPC captures the complete experimental condition so that observations can be meaningfully compared. If two runs share the same IPC ID, any difference in output is pure stochastic variation – not a change in the experiment.


The Four Hashes

The IPC system uses four SHA-256 hash functions, each targeting a different component of the generation pipeline. All are implemented in the dedicated app/hashing.py module and return 64-character lowercase hexadecimal digest strings.

Payload Hash (compute_payload_hash)

What it fingerprints: The complete deterministic entity state – all axis labels, all axis scores, the world ID, the policy hash, and the seed. This is the authoritative data that drives the generation.

Normalisation: The payload dictionary is serialised to JSON with sort_keys=True to eliminate Python dict insertion-order variation. ensure_ascii=False preserves Unicode characters faithfully.

canonical = json.dumps(payload_dict, sort_keys=True, ensure_ascii=False)
return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

Key design choice: The function accepts a plain dict, not a Pydantic model, to keep the hashing module dependency-free. Callers convert their models first:

input_hash = compute_payload_hash(payload.model_dump())

Example:

Given this payload:

{
  "axes": {
    "health": {"label": "weary", "score": 0.5},
    "age": {"label": "old", "score": 0.7}
  },
  "policy_hash": "abc123",
  "seed": 42,
  "world_id": "test_world"
}

The canonical JSON (sorted keys) is:

{"axes": {"age": {"label": "old", "score": 0.7}, "health": {"label": "weary", "score": 0.5}}, "policy_hash": "abc123", "seed": 42, "world_id": "test_world"}

Note how age now appears before health due to key sorting. This ensures that the same payload always produces the same hash, regardless of the order keys were inserted in Python.

See also: app.hashing.compute_payload_hash()

System Prompt Hash (compute_system_prompt_hash)

What it fingerprints: The constraint layer – the system prompt that tells the LLM what it is and is not allowed to do.

Why it matters: System prompts evolve. Small changes such as adding “avoid metaphor”, changing sentence limits, or adjusting escalation rules can materially change LLM behaviour. Without hashing the prompt, behavioural shifts may be misattributed to axis changes or model differences. The prompt text is part of the experimental condition and must be fingerprinted.

Source: _working/axis_lab/pipeworks_lab_prompt_hash.md, section 3.

Normalisation rules (applied in order):

  1. Split the text into individual lines.

  2. Strip leading and trailing whitespace from each line. This removes editor-introduced indentation and trailing spaces without altering the words on the line.

  3. Rejoin the stripped lines with \n.

  4. Strip leading and trailing blank lines from the entire result. Internal blank lines (paragraph breaks) are preserved because they may carry structural meaning in multi-section prompts.

  5. Do not lowercase. Case is semantic – “NEVER” and “never” may carry different emphasis for the LLM.

Example:

These two prompt strings produce the same hash:

Raw A:  "  line one  \n  line two  "
Raw B:  "line one\nline two"

Both normalise to "line one\nline two" before hashing.

See also: app.hashing.compute_system_prompt_hash() and app.hashing._normalise_system_prompt()

Output Hash (compute_output_hash)

What it fingerprints: The observable result – the exact text the LLM produced.

Why it exists: The output hash enables detecting whether two runs produced identical output, even when the input conditions were the same. If two generations share the same IPC ID but have different output hashes, the LLM exhibited stochastic drift – it produced different prose from identical inputs.

Normalisation rules (applied in order):

  1. Strip leading and trailing whitespace from the entire string.

  2. Collapse runs of two or more consecutive ASCII space characters (U+0020) into a single space. Only ASCII space is targeted – newlines, tabs, and other whitespace are left intact so that paragraph structure is preserved.

  3. Preserve punctuation exactly as-is.

  4. Preserve letter casing exactly as-is.

  5. Preserve sentence and line order exactly as-is.

The core normalisation is a single regex:

collapsed = re.sub(r" {2,}", " ", stripped)

This targets only ASCII space runs (" {2,}"), not the broader \s pattern, which would also collapse newlines and tabs.

Important

The output hash normalisation is intentionally more conservative than the system prompt normalisation. LLM output may use tabs or specific spacing patterns as part of its structure. Only the most clearly non-semantic variation (runs of multiple spaces) is normalised.

See also: app.hashing.compute_output_hash() and app.hashing._normalise_output()

IPC Identifier (compute_ipc_id)

What it fingerprints: The complete provenance chain – every variable that could influence the generation, combined into a single identifier.

Formula:

IPC_ID = SHA-256(
    input_hash
    + ":" + system_prompt_hash
    + ":" + model
    + ":" + str(temperature)
    + ":" + str(max_tokens)
    + ":" + str(seed)
)

Why a composite hash: Combining component hashes into a single identifier means you can answer “did these two generations use identical conditions?” with a single string comparison rather than comparing six fields individually. This is especially valuable when grouping runs in log analysis or drift detection.

The colon delimiter: The colon (:) is a non-hex character that prevents collisions from field concatenation. Without it, the concatenation input_hash="ab" + system_prompt_hash="cd" would produce the same string as input_hash="abc" + system_prompt_hash="d". The delimiter makes these unambiguous: "ab:cd" vs "abc:d".

Source: Implementation recommendation from _working/axis_lab/pipeworks_axis_descriptor_lab_proposed_enhancements.md.

Example:

ipc_id = compute_ipc_id(
    input_hash="d8bd1395713454e4...",       # 64-char hex
    system_prompt_hash="7f3a9c6e4b8f1d2c...", # 64-char hex
    model="gemma2:2b",
    temperature=0.2,
    max_tokens=120,
    seed=2954173979,
)
# Returns: "4a2e7f91..." (64-char hex)

If any single field changes – even the seed by one digit – the IPC ID changes. This is verified by the test suite, which modifies each field individually and asserts a different hash.

See also: app.hashing.compute_ipc_id()


Normalisation Philosophy

Normalisation is the most subtle and error-prone part of the hashing system. Getting it wrong silently undermines the entire provenance chain. This section explains the conservative approach in depth.

The Normalisation Dilemma

There is a tension between two failure modes:

  • Too aggressive (e.g., lowercasing, removing all whitespace): Semantically different texts hash identically, creating false-positive matches. You think two runs used the same prompt, but they did not.

  • Too conservative (e.g., hashing raw bytes): Identical texts with trivial formatting differences hash differently, creating false-positive mismatches. You think two runs used different prompts, but the only difference was a trailing space added by an editor.

The system must find the precise boundary between semantic and non-semantic variation.

Normalisation Principles

Rule

Rationale

Applied To

Never lowercase

Case carries semantic meaning (“NEVER” vs “never”)

Prompts, Output

Preserve internal structure

Line order, sentence order, paragraph breaks may be meaningful

Prompts, Output

Strip edge whitespace

Editors often add trailing spaces/newlines; these are noise

Prompts, Output

Collapse 2+ spaces to 1

LLMs sometimes produce inconsistent spacing; this is noise

Output only

Strip per-line whitespace

Editor indentation is noise in prompt files

Prompts only

Preserve tabs in output

Tab characters in LLM output may represent structure

Output only

Why Prompts and Outputs Have Different Rules

System prompts are human-authored text files that accumulate editor artifacts over time: indentation from copy-paste, trailing spaces from line editing, blank lines added by auto-formatters. The per-line strip is appropriate because no human writes a prompt where leading indentation is semantically meaningful to the LLM.

LLM output is machine-generated and may use tabs or specific spacing patterns as part of its structure. Only the most clearly non-semantic variation – runs of multiple consecutive spaces – is normalised. The regex re.sub(r" {2,}", " ", text) targets only ASCII space (U+0020), never \s, so newlines, tabs, and other whitespace are preserved.

What Normalisation Does NOT Do

The following operations are intentionally avoided:

  • Does not remove duplicate sentences – repetition may be a legitimate LLM output pattern.

  • Does not reorder anything – line order, sentence order, and paragraph order are always preserved.

  • Does not normalise Unicode (e.g., NFC/NFKC) – this is a conscious choice. If Unicode normalisation becomes needed, it should be added explicitly as a documented change.

  • Does not strip internal blank lines from prompts – these are paragraph breaks that may carry structural meaning.

  • Does not touch punctuation – commas, periods, dashes, and all other punctuation are preserved exactly.

  • Does not trim internal whitespace from prompts – only leading/trailing whitespace on each line is stripped.

The Invariant

The normalisation rules are designed to uphold a single invariant:

Any edit that changes meaning will change the hash. Any edit that does not change meaning will not change the hash.

This ensures that the hashing system is both sensitive to real changes and robust against formatting noise.


The Provenance Chain in Action

This section walks through a complete generation cycle, showing exactly when and where each hash is computed, what data flows through the chain, and what the final response looks like.

End-to-End Data Flow

 +-------------------------------------------------------------------+
 |  Frontend: User edits axes, selects model, clicks Generate        |
 |                                                                   |
 |  Sends POST /api/generate:                                       |
 |    { payload: {...}, model: "gemma2:2b",                          |
 |      temperature: 0.2, max_tokens: 120 }                         |
 +-------------------------------+-----------------------------------+
                                 |
                                 v
 +-------------------------------------------------------------------+
 |  Backend: generate() route handler                                |
 |                                                                   |
 |  1. Resolve system prompt                                         |
 |     - Use custom override from request, OR                        |
 |     - Load default from the supported local prompt roots          |
 |                                                                   |
 |  2. Serialise payload as pretty-printed JSON                      |
 |     - This becomes the "user turn" sent to the LLM               |
 |                                                                   |
 |  3. Call Ollama                                                   |
 |     - Send system prompt + serialised payload                     |
 |     - Receive generated text + usage metrics                      |
 |                                                                   |
 |  4. Compute IPC hashes:                                           |
 |     input_hash        = compute_payload_hash(payload.model_dump())|
 |     system_prompt_hash = compute_system_prompt_hash(prompt)       |
 |     output_hash       = compute_output_hash(text)                 |
 |     ipc_id            = compute_ipc_id(                           |
 |         input_hash, system_prompt_hash,                           |
 |         model, temperature, max_tokens, seed)                     |
 +-------------------------------+-----------------------------------+
                                 |
                                 v
 +-------------------------------------------------------------------+
 |  GenerateResponse (JSON):                                         |
 |  {                                                                |
 |    "text": "A weathered figure stands near the threshold...",     |
 |    "model": "gemma2:2b",                                          |
 |    "temperature": 0.2,                                            |
 |    "input_hash": "d8bd139571345...",                              |
 |    "system_prompt_hash": "7f3a9c6e4b8f1...",                     |
 |    "output_hash": "9cbe31f3d1e7a...",                             |
 |    "ipc_id": "4a2e7f9183bc2..."                                   |
 |  }                                                                |
 +-------------------------------+-----------------------------------+
                                 |
                                 v
 +-------------------------------------------------------------------+
 |  Frontend: Display in UI                                          |
 |                                                                   |
 |  Output text appears in the output box.                           |
 |  Meta area shows three lines:                                     |
 |                                                                   |
 |  model: gemma2:2b  .  temp: 0.2  .  seed: 2954173979             |
 |  input: d8bd139571345...  .  prompt: 7f3a9c6e4b8f1...            |
 |    .  output: 9cbe31f3d1e7a...                                   |
 |  ipc: 4a2e7f9183bc2...                                            |
 +-------------------------------------------------------------------+

Step-by-Step Walkthrough

Step 1 – Prompt resolution. The backend checks whether the request includes a custom system_prompt override. If not, it loads the default prompt from the supported local prompt roots, currently app/lab_only/prompts/character_description/system_prompt_v01.txt. The resolved prompt is the text that will be hashed as system_prompt_hash.

Step 2 – Payload serialisation. The AxisPayload is serialised to pretty-printed JSON (with 2-space indentation) and sent as the user turn to Ollama. This is the text the LLM “sees” as the user message.

Step 3 – Ollama generation. The backend calls Ollama’s /api/generate endpoint with the system prompt and serialised payload. Ollama returns the generated text and optional usage metrics (prompt tokens, generation tokens).

Step 4 – Hash computation. After the Ollama call succeeds, the backend computes all four hashes. The order matters: input_hash and system_prompt_hash are computed from the request data, output_hash from the response text, and ipc_id from the combination of all provenance fields. All four are included in the GenerateResponse.

Step 5 – Frontend display. The frontend displays the hashes in a three-line meta area below the output text. All hashes are truncated to 16 characters (out of 64) for UI readability. The full 64-character hashes are available in the JSON response and in saved files.


Integration Points

This section catalogues every place in the codebase where IPC hashes are computed, transmitted, stored, or displayed.

/api/generate – Full IPC Chain

Every successful generation returns all four hashes. The hashes are computed after the Ollama call completes (the output_hash needs the generated text).

The GenerateResponse Pydantic model defines the four IPC fields:

  • input_hash – SHA-256 of the canonical AxisPayload

  • system_prompt_hash – SHA-256 of the normalised system prompt

  • output_hash – SHA-256 of the normalised output text

  • ipc_id – the composite Interpretive Provenance Chain identifier

All four fields are str | None with default=None on the schema, but the generate endpoint always populates them.

Implementation: app/main.py, generate route handler. Schema: app/schema/generate.py, GenerateResponse class.

/api/log – Backward-Compatible IPC

The log endpoint appends structured entries to the configured log root, typically AXIS_LAB_LOGS_DIR/run_log.jsonl on Luminal and repo-local logs/run_log.jsonl only as a local-development fallback. It supports IPC hashes with backward compatibility:

  • input_hash and output_hash are always computed (these only need the payload and output text, which are required parameters).

  • system_prompt_hash and ipc_id are only computed when the optional system_prompt parameter is provided.

  • When system_prompt is omitted, both fields are null in the log entry.

This design ensures that older frontend versions (or external callers) that do not send the prompt can still log successfully. The IPC fields on LogEntry are Optional[str] with default=None, so existing JSONL records written before IPC was implemented can still be deserialised without error.

Implementation: app/main.py, log_run route handler. Schema: app/schema/generate.py, LogEntry class.

/api/save – Persistent IPC Record

The save endpoint writes session state to a timestamped folder under the configured writable data root, typically AXIS_LAB_DATA_DIR on Luminal and repo-local data/ only as a local-development fallback. IPC hashes are persisted in two locations:

1. metadata.json – All four hashes appear as top-level fields:

{
  "folder_name": "20260218_143022_d8bd1395",
  "timestamp": "2026-02-18T14:30:22.504478+00:00",
  "input_hash": "d8bd1395713454e4...",
  "system_prompt_hash": "7f3a9c6e4b8f1d2c...",
  "output_hash": "9cbe31f3d1e7a2c6...",
  "ipc_id": "4a2e7f9183bc2d4e...",
  "model": "gemma2:2b",
  "temperature": 0.2,
  "max_tokens": 120,
  "seed": 2954173979,
  "world_id": "pipeworks_web",
  "policy_hash": "d845cdcf...",
  "axis_count": 11
}

2. output.md – The system_prompt_hash and ipc_id appear as HTML comments in the provenance header, making the saved file self-documenting:

<!-- system_prompt_hash: 7f3a9c6e4b8f1d2c... -->
<!-- ipc_id: 4a2e7f9183bc2d4e... -->

Conditional computation: The system_prompt_hash is always computed (the system prompt is a required field in SaveRequest). However, output_hash and ipc_id are only computed when the user has generated output before saving. Without output, the provenance chain is incomplete, and both fields are null.

Implementation: app/routes_save.py, save route handler. Schema: app/schema/save.py, SaveRequest and SaveResponse classes.

Frontend Display

The frontend (app/static/mod-generate.js) displays IPC hashes in a meta table below the generated output:

model: gemma2:2b  .  temp: 0.2  .  seed: 2954173979
input: d8bd139571345...  .  prompt: 7f3a9c6e4b8f1...  .  output: 9cbe31f3d1e7a...
ipc: 4a2e7f9183bc2...

All hashes are truncated to 16 characters via .slice(0, 16) followed by an ellipsis character. The ipc_id gets its own line to visually distinguish the composite identifier from the component hashes.

The CSS class .output-meta uses white-space: pre-wrap so that the newline characters in the meta string render as line breaks.

Hash Availability Summary

Endpoint

input_hash

system_prompt_hash

output_hash

ipc_id

/api/generate

Always

Always

Always

Always

/api/log

Always

When prompt provided

Always

When prompt provided

/api/save response

Always

Always

When output exists

When output exists

metadata.json (saved)

Always

Always

When output exists

When output exists

output.md (saved)

In header

In header

N/A (is the content)

In header

Frontend display

Always

Always

Always

Always


Use Cases and Experimental Scenarios

The IPC system enables four categories of analysis that were not possible before its implementation.

Detecting Prompt Drift

Scenario: You modify the system prompt from v01 to v02 (e.g., adding “avoid metaphor”). You generate text from the same payload with the same model and parameters.

What the hashes reveal:

  • input_hashunchanged (same payload)

  • system_prompt_hashdifferent (prompt text changed)

  • ipc_iddifferent (one component changed)

  • output_hashdifferent (the LLM responded differently)

Conclusion: The behavioural change is attributable to the prompt change, not to the model or the input. The IPC separates the variables.

Without the IPC, you would see different output and have no way to determine whether the cause was the prompt, the model, or random variation.

Detecting Model Drift

Scenario: You upgrade Ollama’s model (e.g., a new release of gemma2:2b). Same payload, same prompt, same parameters.

What the hashes reveal:

  • input_hashunchanged

  • system_prompt_hashunchanged

  • ipc_idunchanged (if the model string is identical)

  • output_hashmay differ (model internals changed)

Conclusion: If the IPC ID matches but the output hash differs, the model’s internal behaviour changed even though all user-controlled inputs were identical. This is Ollama-level or model-weight-level drift – invisible without the IPC.

Reproducibility Audit

Scenario: You need to verify that a saved session can be reproduced.

Procedure:

  1. Load metadata.json from a saved session.

  2. Extract the ipc_id.

  3. Re-run the generation with the same payload, prompt, model, and parameters.

  4. Compare the new ipc_id to the saved one – they should match.

  5. Compare the output_hash – if they differ, the model exhibited stochastic variation even under identical conditions.

This provides a rigorous, quantifiable measure of reproducibility.

Grouping Runs for Analysis

Scenario: You have 50 log entries in the configured run log and want to analyse output stability.

Procedure:

  1. Group entries by ipc_id.

  2. Within each group, examine output_hash variation to measure output stability under identical conditions.

  3. Across groups, compare how different conditions produce different outputs.

  4. Use system_prompt_hash to isolate the effect of prompt changes.

  5. Use input_hash to isolate the effect of payload changes.

This enables systematic, data-driven analysis of LLM behaviour – the kind of analysis that is impossible without provenance tracking.


Design Decisions and Trade-offs

This section documents the why behind specific technical choices, for future maintainers and contributors.

Why SHA-256?

SHA-256 is fast, widely available in Python’s standard library (hashlib), and produces a 64-character hex digest that is long enough to be collision-resistant for this use case. It is the same algorithm used elsewhere in the project (e.g., policy_hash). There is no need for a cryptographic commitment scheme – the hashes are fingerprints, not signatures.

Why Not Hash Raw Bytes?

Raw file contents include editor artifacts: trailing whitespace, different line endings (LF vs CRLF), BOM characters. Two developers editing the same prompt in different editors would produce different hashes for semantically identical prompts. Normalisation eliminates this source of false mismatch.

Source: _working/axis_lab/pipeworks_lab_prompt_hash.md, section 4.1: “We do not hash raw file contents. We hash a normalised version to avoid meaningless diffs.”

Why Is the Hashing Module Dependency-Free?

app/hashing.py imports only hashlib, json, and re from the standard library. It does not import Pydantic models from app/schema/. This is intentional: it keeps the module usable from any context (tests, scripts, future tools) without pulling in the web framework’s dependencies. Callers convert Pydantic models to plain dicts before calling compute_payload_hash.

Why sort_keys=True for Payload Canonicalisation?

Python dicts are insertion-ordered since 3.7, but different code paths may construct the same payload with keys in different order. Sorted-key serialisation guarantees the same JSON string regardless of construction order. This is verified by the test TestComputePayloadHash.test_order_independent in tests/test_hashing.py.

Why Colon Delimiters in the IPC ID?

Without a delimiter, concatenating field values could produce ambiguous strings. The colon is a non-hex character that creates unambiguous field boundaries. This prevents collisions like "ab" + "cd" vs "abc" + "d", which would produce the same concatenated string "abcd" but different colon-delimited strings "ab:cd" vs "abc:d".

This is verified by the test TestComputeIpcId.test_colon_delimiter_prevents_collision in tests/test_hashing.py.

Source: Implementation recommendation from _working/axis_lab/pipeworks_axis_descriptor_lab_proposed_enhancements.md: “concatenate the hex digests with a non-hex delimiter (like a colon) before hashing the final string.”

Why Are IPC Fields Optional on LogEntry?

Backward compatibility. The logging endpoint existed before IPC hashing was implemented. Existing JSONL records have no IPC fields; making them Optional with default=None means those records can still be deserialised by the updated LogEntry schema without breaking.

Future Considerations

The original design document proposes an “Optional advanced version” of the IPC ID that includes the Ollama build hash, model digest, and host platform for even higher reproducibility guarantees:

IPC_ID_advanced = SHA-256(
    input_hash
    + ":" + system_prompt_hash
    + ":" + model
    + ":" + temperature
    + ":" + max_tokens
    + ":" + seed
    + ":" + ollama_build_hash
    + ":" + model_digest
    + ":" + host_platform
)

This depends on Ollama API support for exposing build and model digest information. It remains a valuable long-term goal.

Source: _working/axis_lab/pipeworks_interpretive_provenance_chains.md, section 5.


Testing the Hashing System

The hashing system has comprehensive test coverage in tests/test_hashing.py. This section describes the test organisation and the properties verified.

Test Organisation

The test module contains six test classes, each targeting a single function:

Test Class

Function Under Test

Tests

TestNormaliseSystemPrompt

_normalise_system_prompt

8

TestNormaliseOutput

_normalise_output

7

TestComputeSystemPromptHash

compute_system_prompt_hash

5

TestComputeOutputHash

compute_output_hash

5

TestComputeIpcId

compute_ipc_id

5

TestComputePayloadHash

compute_payload_hash

4

Total

34

Properties Verified

Every test class verifies four properties (as stated in the test module docstring):

  1. Correctness of normalisation rules – edge cases, boundary conditions, empty inputs, whitespace-only inputs.

  2. Determinism – same input always produces same output.

  3. Sensitivity – different inputs produce different outputs.

  4. Format – 64-character lowercase hex digest (where applicable).

Running the Tests

# Run hashing tests only
pytest tests/test_hashing.py -v

# Run with coverage
pytest tests/test_hashing.py -v --cov=app.hashing --cov-report=term

Extending the Normalisation Rules

If a normalisation rule needs to change (e.g., adding Unicode NFC normalisation), follow this procedure:

  1. Add tests for the new behaviour first (test-driven development).

  2. Update the private normalisation function in app/hashing.py.

  3. Verify that all existing tests still pass – unless the change is intentionally breaking.

  4. Update this guide to document the new rule.

Warning

Changing normalisation rules invalidates all previously computed hashes for affected hash types. This is a breaking change for any stored data (saved sessions, JSONL logs) that references those hashes. Treat normalisation changes with the same care as a database schema migration.


Design Document References

The IPC system was designed based on three documents in the project’s _working/axis_lab/ directory. These documents are preserved for historical reference and contain the original reasoning and specifications.

Primary Design Document

_working/axis_lab/pipeworks_interpretive_provenance_chains.md“Axis Descriptor Lab – Reproducibility & Trace Methodology”

This document formalises the IPC concept, defines the required hashes, specifies the IPC ID formula, and articulates the experimental guarantees. It establishes the theoretical framework that the implementation realises.

System Prompt Hashing Specification

_working/axis_lab/pipeworks_lab_prompt_hash.md“System Prompt Hashing – Reproducibility & Drift Control”

This document identifies the gap that system prompt changes were untracked, defines the normalisation strategy for prompt text, and proposes a three-phase rollout from observational to scientific capability. Its concluding statement captures the motivation:

“Without system_prompt_hash, the lab is observational. With system_prompt_hash, the lab becomes scientific.”

Enhancement Assessment

_working/axis_lab/pipeworks_axis_descriptor_lab_proposed_enhancements.md“Assessment of Proposed Enhancements for Axis Descriptor Lab”

A technical assessment of both design documents, evaluating their feasibility and alignment with the project’s philosophy. This document recommended the colon-delimiter approach for the IPC ID and noted the value of consolidating hashing logic into a single utility module – both of which were adopted in the implementation.

Note

These documents are in the _working/ directory and are not part of the published package. They represent the design rationale and are preserved as historical reference within the repository.


Glossary

Authoritative layer

The deterministic data that drives generation: axis scores, labels, seeds, policy rules. Never overridden by the LLM. The system is authoritative.

Ornamental layer

The LLM-generated descriptive text. Interprets authoritative data but has no authority of its own. The LLM is ornamental.

Interpretive Provenance Chain (IPC)

A composite SHA-256 fingerprint of all variables that influence a generation. The chain links payload, prompt, model, sampling parameters, and seed into a single reproducibility signature.

IPC ID

The single 64-character hexadecimal digest computed from the provenance chain. Two generations with identical IPC IDs used identical inputs in every respect.

Normalisation

The process of reducing text to a canonical form before hashing. Removes noise (trivial whitespace, editor artifacts) while preserving signal (case, punctuation, structure). Ensures that semantically identical texts produce identical hashes.

Payload hash (input_hash)

SHA-256 of the canonically serialised AxisPayload (sorted-key JSON). Fingerprints the complete deterministic entity state.

System prompt hash (system_prompt_hash)

SHA-256 of the normalised system prompt text. Fingerprints the constraint layer that governs LLM behaviour.

Output hash (output_hash)

SHA-256 of the normalised LLM output text. Fingerprints the observable interpretive artifact produced by the generation.

Drift

A change in LLM output that is not attributable to a change in user-controlled inputs. May be caused by model updates, prompt changes, or stochastic sampling. The IPC enables detecting and attributing drift to specific variables.

Provenance

The complete record of origin and processing history for a generated output. In this system, provenance includes the payload, prompt, model, sampling parameters, and seed – everything needed to reproduce the generation.