Interpretive Provenance Chain (IPC) and Hashing System¶
Introduction¶
The Axis Descriptor Lab generates descriptive text from deterministic axis payloads using a constrained LLM layer (Ollama). The system is built on a simple premise: a set of numerical scores and labels (the axes) are authoritative facts about an entity, and the LLM produces prose that interprets those facts without overriding them.
This raises a question that is deceptively difficult to answer:
When the output changes, what caused the change?
Was it a different axis score? A modified system prompt? A model upgrade? A different sampling temperature? Or just the inherent randomness of language model inference?
Without a systematic way to fingerprint every variable that influences a generation, any experiment with the lab is observational – you can see what happened, but you cannot determine why. Behavioural shifts may be misattributed to axis changes when the real cause was a prompt edit. Model upgrades may alter output in ways that are invisible without a baseline. Seed variance may introduce noise that masks real signal.
The Interpretive Provenance Chain (IPC) is the project’s answer to this problem. It is a composite SHA-256 fingerprint of every variable that influences a generation, implemented as a chain of individual hashes combined into a single identifier. The guarantee is simple:
Two generations with the same IPC ID used identical inputs in every respect. If their outputs differ, the difference is attributable solely to LLM stochasticity.
The IPC upgrades the lab from an observational tool to a reproducible scientific instrument.
Pipe-Works Design Philosophy
The IPC is grounded in four principles from the broader Pipe-Works project:
Determinism over optimisation – the system controls what it can; the LLM is allowed to vary only within fingerprinted boundaries.
Inspectability over mystique – every variable is visible, hashed, and stored.
Programmatic truth over narrative authority – the LLM interprets but never decides.
Failure as data – even when the LLM produces unexpected output, the IPC ensures the conditions are recorded for analysis.
Source: _working/axis_lab/pipeworks_interpretive_provenance_chains.md,
section 8.
What this document covers:
The authoritative/ornamental boundary that makes IPC meaningful
The four hash functions and their normalisation rules
The normalisation philosophy in depth
A complete generation walkthrough showing the chain in action
Every integration point in the codebase
Practical use cases for drift detection and reproducibility
Design decisions and trade-offs
How the hashing system is tested
References to the original design documents
A glossary of key terms
The Four Hashes¶
The IPC system uses four SHA-256 hash functions, each targeting a
different component of the generation pipeline. All are implemented in
the dedicated app/hashing.py module and return 64-character lowercase
hexadecimal digest strings.
Payload Hash (compute_payload_hash)¶
What it fingerprints: The complete deterministic entity state – all axis labels, all axis scores, the world ID, the policy hash, and the seed. This is the authoritative data that drives the generation.
Normalisation: The payload dictionary is serialised to JSON with
sort_keys=True to eliminate Python dict insertion-order variation.
ensure_ascii=False preserves Unicode characters faithfully.
canonical = json.dumps(payload_dict, sort_keys=True, ensure_ascii=False)
return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
Key design choice: The function accepts a plain dict, not a
Pydantic model, to keep the hashing module dependency-free. Callers
convert their models first:
input_hash = compute_payload_hash(payload.model_dump())
Example:
Given this payload:
{
"axes": {
"health": {"label": "weary", "score": 0.5},
"age": {"label": "old", "score": 0.7}
},
"policy_hash": "abc123",
"seed": 42,
"world_id": "test_world"
}
The canonical JSON (sorted keys) is:
{"axes": {"age": {"label": "old", "score": 0.7}, "health": {"label": "weary", "score": 0.5}}, "policy_hash": "abc123", "seed": 42, "world_id": "test_world"}
Note how age now appears before health due to key sorting. This
ensures that the same payload always produces the same hash, regardless
of the order keys were inserted in Python.
See also: app.hashing.compute_payload_hash()
System Prompt Hash (compute_system_prompt_hash)¶
What it fingerprints: The constraint layer – the system prompt that tells the LLM what it is and is not allowed to do.
Why it matters: System prompts evolve. Small changes such as adding “avoid metaphor”, changing sentence limits, or adjusting escalation rules can materially change LLM behaviour. Without hashing the prompt, behavioural shifts may be misattributed to axis changes or model differences. The prompt text is part of the experimental condition and must be fingerprinted.
Source: _working/axis_lab/pipeworks_lab_prompt_hash.md, section 3.
Normalisation rules (applied in order):
Split the text into individual lines.
Strip leading and trailing whitespace from each line. This removes editor-introduced indentation and trailing spaces without altering the words on the line.
Rejoin the stripped lines with
\n.Strip leading and trailing blank lines from the entire result. Internal blank lines (paragraph breaks) are preserved because they may carry structural meaning in multi-section prompts.
Do not lowercase. Case is semantic – “NEVER” and “never” may carry different emphasis for the LLM.
Example:
These two prompt strings produce the same hash:
Raw A: " line one \n line two "
Raw B: "line one\nline two"
Both normalise to "line one\nline two" before hashing.
See also: app.hashing.compute_system_prompt_hash() and
app.hashing._normalise_system_prompt()
Output Hash (compute_output_hash)¶
What it fingerprints: The observable result – the exact text the LLM produced.
Why it exists: The output hash enables detecting whether two runs produced identical output, even when the input conditions were the same. If two generations share the same IPC ID but have different output hashes, the LLM exhibited stochastic drift – it produced different prose from identical inputs.
Normalisation rules (applied in order):
Strip leading and trailing whitespace from the entire string.
Collapse runs of two or more consecutive ASCII space characters (
U+0020) into a single space. Only ASCII space is targeted – newlines, tabs, and other whitespace are left intact so that paragraph structure is preserved.Preserve punctuation exactly as-is.
Preserve letter casing exactly as-is.
Preserve sentence and line order exactly as-is.
The core normalisation is a single regex:
collapsed = re.sub(r" {2,}", " ", stripped)
This targets only ASCII space runs (" {2,}"), not the broader \s
pattern, which would also collapse newlines and tabs.
Important
The output hash normalisation is intentionally more conservative than the system prompt normalisation. LLM output may use tabs or specific spacing patterns as part of its structure. Only the most clearly non-semantic variation (runs of multiple spaces) is normalised.
See also: app.hashing.compute_output_hash() and
app.hashing._normalise_output()
IPC Identifier (compute_ipc_id)¶
What it fingerprints: The complete provenance chain – every variable that could influence the generation, combined into a single identifier.
Formula:
IPC_ID = SHA-256(
input_hash
+ ":" + system_prompt_hash
+ ":" + model
+ ":" + str(temperature)
+ ":" + str(max_tokens)
+ ":" + str(seed)
)
Why a composite hash: Combining component hashes into a single identifier means you can answer “did these two generations use identical conditions?” with a single string comparison rather than comparing six fields individually. This is especially valuable when grouping runs in log analysis or drift detection.
The colon delimiter: The colon (:) is a non-hex character that
prevents collisions from field concatenation. Without it, the
concatenation input_hash="ab" + system_prompt_hash="cd" would
produce the same string as input_hash="abc" + system_prompt_hash="d".
The delimiter makes these unambiguous: "ab:cd" vs "abc:d".
Source: Implementation recommendation from
_working/axis_lab/pipeworks_axis_descriptor_lab_proposed_enhancements.md.
Example:
ipc_id = compute_ipc_id(
input_hash="d8bd1395713454e4...", # 64-char hex
system_prompt_hash="7f3a9c6e4b8f1d2c...", # 64-char hex
model="gemma2:2b",
temperature=0.2,
max_tokens=120,
seed=2954173979,
)
# Returns: "4a2e7f91..." (64-char hex)
If any single field changes – even the seed by one digit – the IPC ID changes. This is verified by the test suite, which modifies each field individually and asserts a different hash.
See also: app.hashing.compute_ipc_id()
Normalisation Philosophy¶
Normalisation is the most subtle and error-prone part of the hashing system. Getting it wrong silently undermines the entire provenance chain. This section explains the conservative approach in depth.
The Normalisation Dilemma¶
There is a tension between two failure modes:
Too aggressive (e.g., lowercasing, removing all whitespace): Semantically different texts hash identically, creating false-positive matches. You think two runs used the same prompt, but they did not.
Too conservative (e.g., hashing raw bytes): Identical texts with trivial formatting differences hash differently, creating false-positive mismatches. You think two runs used different prompts, but the only difference was a trailing space added by an editor.
The system must find the precise boundary between semantic and non-semantic variation.
Normalisation Principles¶
Rule |
Rationale |
Applied To |
|---|---|---|
Never lowercase |
Case carries semantic meaning (“NEVER” vs “never”) |
Prompts, Output |
Preserve internal structure |
Line order, sentence order, paragraph breaks may be meaningful |
Prompts, Output |
Strip edge whitespace |
Editors often add trailing spaces/newlines; these are noise |
Prompts, Output |
Collapse 2+ spaces to 1 |
LLMs sometimes produce inconsistent spacing; this is noise |
Output only |
Strip per-line whitespace |
Editor indentation is noise in prompt files |
Prompts only |
Preserve tabs in output |
Tab characters in LLM output may represent structure |
Output only |
Why Prompts and Outputs Have Different Rules¶
System prompts are human-authored text files that accumulate editor artifacts over time: indentation from copy-paste, trailing spaces from line editing, blank lines added by auto-formatters. The per-line strip is appropriate because no human writes a prompt where leading indentation is semantically meaningful to the LLM.
LLM output is machine-generated and may use tabs or specific spacing
patterns as part of its structure. Only the most clearly non-semantic
variation – runs of multiple consecutive spaces – is normalised. The
regex re.sub(r" {2,}", " ", text) targets only ASCII space (U+0020),
never \s, so newlines, tabs, and other whitespace are preserved.
What Normalisation Does NOT Do¶
The following operations are intentionally avoided:
Does not remove duplicate sentences – repetition may be a legitimate LLM output pattern.
Does not reorder anything – line order, sentence order, and paragraph order are always preserved.
Does not normalise Unicode (e.g., NFC/NFKC) – this is a conscious choice. If Unicode normalisation becomes needed, it should be added explicitly as a documented change.
Does not strip internal blank lines from prompts – these are paragraph breaks that may carry structural meaning.
Does not touch punctuation – commas, periods, dashes, and all other punctuation are preserved exactly.
Does not trim internal whitespace from prompts – only leading/trailing whitespace on each line is stripped.
The Invariant¶
The normalisation rules are designed to uphold a single invariant:
Any edit that changes meaning will change the hash. Any edit that does not change meaning will not change the hash.
This ensures that the hashing system is both sensitive to real changes and robust against formatting noise.
The Provenance Chain in Action¶
This section walks through a complete generation cycle, showing exactly when and where each hash is computed, what data flows through the chain, and what the final response looks like.
End-to-End Data Flow¶
+-------------------------------------------------------------------+
| Frontend: User edits axes, selects model, clicks Generate |
| |
| Sends POST /api/generate: |
| { payload: {...}, model: "gemma2:2b", |
| temperature: 0.2, max_tokens: 120 } |
+-------------------------------+-----------------------------------+
|
v
+-------------------------------------------------------------------+
| Backend: generate() route handler |
| |
| 1. Resolve system prompt |
| - Use custom override from request, OR |
| - Load default from the supported local prompt roots |
| |
| 2. Serialise payload as pretty-printed JSON |
| - This becomes the "user turn" sent to the LLM |
| |
| 3. Call Ollama |
| - Send system prompt + serialised payload |
| - Receive generated text + usage metrics |
| |
| 4. Compute IPC hashes: |
| input_hash = compute_payload_hash(payload.model_dump())|
| system_prompt_hash = compute_system_prompt_hash(prompt) |
| output_hash = compute_output_hash(text) |
| ipc_id = compute_ipc_id( |
| input_hash, system_prompt_hash, |
| model, temperature, max_tokens, seed) |
+-------------------------------+-----------------------------------+
|
v
+-------------------------------------------------------------------+
| GenerateResponse (JSON): |
| { |
| "text": "A weathered figure stands near the threshold...", |
| "model": "gemma2:2b", |
| "temperature": 0.2, |
| "input_hash": "d8bd139571345...", |
| "system_prompt_hash": "7f3a9c6e4b8f1...", |
| "output_hash": "9cbe31f3d1e7a...", |
| "ipc_id": "4a2e7f9183bc2..." |
| } |
+-------------------------------+-----------------------------------+
|
v
+-------------------------------------------------------------------+
| Frontend: Display in UI |
| |
| Output text appears in the output box. |
| Meta area shows three lines: |
| |
| model: gemma2:2b . temp: 0.2 . seed: 2954173979 |
| input: d8bd139571345... . prompt: 7f3a9c6e4b8f1... |
| . output: 9cbe31f3d1e7a... |
| ipc: 4a2e7f9183bc2... |
+-------------------------------------------------------------------+
Step-by-Step Walkthrough¶
Step 1 – Prompt resolution. The backend checks whether the request
includes a custom system_prompt override. If not, it loads the default
prompt from the supported local prompt roots, currently
app/lab_only/prompts/character_description/system_prompt_v01.txt.
The resolved prompt is the text that will be hashed as system_prompt_hash.
Step 2 – Payload serialisation. The AxisPayload is serialised
to pretty-printed JSON (with 2-space indentation) and sent as the user
turn to Ollama. This is the text the LLM “sees” as the user message.
Step 3 – Ollama generation. The backend calls Ollama’s
/api/generate endpoint with the system prompt and serialised payload.
Ollama returns the generated text and optional usage metrics (prompt
tokens, generation tokens).
Step 4 – Hash computation. After the Ollama call succeeds, the
backend computes all four hashes. The order matters: input_hash and
system_prompt_hash are computed from the request data, output_hash
from the response text, and ipc_id from the combination of all
provenance fields. All four are included in the GenerateResponse.
Step 5 – Frontend display. The frontend displays the hashes in a three-line meta area below the output text. All hashes are truncated to 16 characters (out of 64) for UI readability. The full 64-character hashes are available in the JSON response and in saved files.
Integration Points¶
This section catalogues every place in the codebase where IPC hashes are computed, transmitted, stored, or displayed.
/api/generate – Full IPC Chain¶
Every successful generation returns all four hashes. The hashes are
computed after the Ollama call completes (the output_hash needs the
generated text).
The GenerateResponse Pydantic model defines the four IPC fields:
input_hash– SHA-256 of the canonical AxisPayloadsystem_prompt_hash– SHA-256 of the normalised system promptoutput_hash– SHA-256 of the normalised output textipc_id– the composite Interpretive Provenance Chain identifier
All four fields are str | None with default=None on the schema,
but the generate endpoint always populates them.
Implementation: app/main.py, generate route handler.
Schema: app/schema/generate.py, GenerateResponse class.
/api/log – Backward-Compatible IPC¶
The log endpoint appends structured entries to the configured log root,
typically AXIS_LAB_LOGS_DIR/run_log.jsonl on Luminal and repo-local
logs/run_log.jsonl only as a local-development fallback.
It supports IPC hashes with backward compatibility:
input_hashandoutput_hashare always computed (these only need the payload and output text, which are required parameters).system_prompt_hashandipc_idare only computed when the optionalsystem_promptparameter is provided.When
system_promptis omitted, both fields arenullin the log entry.
This design ensures that older frontend versions (or external callers)
that do not send the prompt can still log successfully. The IPC fields
on LogEntry are Optional[str] with default=None, so existing
JSONL records written before IPC was implemented can still be
deserialised without error.
Implementation: app/main.py, log_run route handler.
Schema: app/schema/generate.py, LogEntry class.
/api/save – Persistent IPC Record¶
The save endpoint writes session state to a timestamped folder under the
configured writable data root, typically AXIS_LAB_DATA_DIR on Luminal and
repo-local data/ only as a local-development fallback. IPC hashes are
persisted in two locations:
1. metadata.json – All four hashes appear as top-level fields:
{
"folder_name": "20260218_143022_d8bd1395",
"timestamp": "2026-02-18T14:30:22.504478+00:00",
"input_hash": "d8bd1395713454e4...",
"system_prompt_hash": "7f3a9c6e4b8f1d2c...",
"output_hash": "9cbe31f3d1e7a2c6...",
"ipc_id": "4a2e7f9183bc2d4e...",
"model": "gemma2:2b",
"temperature": 0.2,
"max_tokens": 120,
"seed": 2954173979,
"world_id": "pipeworks_web",
"policy_hash": "d845cdcf...",
"axis_count": 11
}
2. output.md – The system_prompt_hash and ipc_id appear as
HTML comments in the provenance header, making the saved file
self-documenting:
<!-- system_prompt_hash: 7f3a9c6e4b8f1d2c... -->
<!-- ipc_id: 4a2e7f9183bc2d4e... -->
Conditional computation: The system_prompt_hash is always
computed (the system prompt is a required field in SaveRequest).
However, output_hash and ipc_id are only computed when the user
has generated output before saving. Without output, the provenance
chain is incomplete, and both fields are null.
Implementation: app/routes_save.py, save route handler.
Schema: app/schema/save.py, SaveRequest and SaveResponse classes.
Frontend Display¶
The frontend (app/static/mod-generate.js) displays IPC hashes in a
meta table below the generated output:
model: gemma2:2b . temp: 0.2 . seed: 2954173979
input: d8bd139571345... . prompt: 7f3a9c6e4b8f1... . output: 9cbe31f3d1e7a...
ipc: 4a2e7f9183bc2...
All hashes are truncated to 16 characters via .slice(0, 16) followed
by an ellipsis character. The ipc_id gets its own line to visually
distinguish the composite identifier from the component hashes.
The CSS class .output-meta uses white-space: pre-wrap so that the
newline characters in the meta string render as line breaks.
Hash Availability Summary¶
Endpoint |
|
|
|
|
|---|---|---|---|---|
|
Always |
Always |
Always |
Always |
|
Always |
When prompt provided |
Always |
When prompt provided |
|
Always |
Always |
When output exists |
When output exists |
|
Always |
Always |
When output exists |
When output exists |
|
In header |
In header |
N/A (is the content) |
In header |
Frontend display |
Always |
Always |
Always |
Always |
Use Cases and Experimental Scenarios¶
The IPC system enables four categories of analysis that were not possible before its implementation.
Detecting Prompt Drift¶
Scenario: You modify the system prompt from v01 to v02 (e.g., adding “avoid metaphor”). You generate text from the same payload with the same model and parameters.
What the hashes reveal:
input_hash– unchanged (same payload)system_prompt_hash– different (prompt text changed)ipc_id– different (one component changed)output_hash– different (the LLM responded differently)
Conclusion: The behavioural change is attributable to the prompt change, not to the model or the input. The IPC separates the variables.
Without the IPC, you would see different output and have no way to determine whether the cause was the prompt, the model, or random variation.
Detecting Model Drift¶
Scenario: You upgrade Ollama’s model (e.g., a new release of
gemma2:2b). Same payload, same prompt, same parameters.
What the hashes reveal:
input_hash– unchangedsystem_prompt_hash– unchangedipc_id– unchanged (if the model string is identical)output_hash– may differ (model internals changed)
Conclusion: If the IPC ID matches but the output hash differs, the model’s internal behaviour changed even though all user-controlled inputs were identical. This is Ollama-level or model-weight-level drift – invisible without the IPC.
Reproducibility Audit¶
Scenario: You need to verify that a saved session can be reproduced.
Procedure:
Load
metadata.jsonfrom a saved session.Extract the
ipc_id.Re-run the generation with the same payload, prompt, model, and parameters.
Compare the new
ipc_idto the saved one – they should match.Compare the
output_hash– if they differ, the model exhibited stochastic variation even under identical conditions.
This provides a rigorous, quantifiable measure of reproducibility.
Grouping Runs for Analysis¶
Scenario: You have 50 log entries in the configured run log and want to analyse output stability.
Procedure:
Group entries by
ipc_id.Within each group, examine
output_hashvariation to measure output stability under identical conditions.Across groups, compare how different conditions produce different outputs.
Use
system_prompt_hashto isolate the effect of prompt changes.Use
input_hashto isolate the effect of payload changes.
This enables systematic, data-driven analysis of LLM behaviour – the kind of analysis that is impossible without provenance tracking.
Design Decisions and Trade-offs¶
This section documents the why behind specific technical choices, for future maintainers and contributors.
Why SHA-256?¶
SHA-256 is fast, widely available in Python’s standard library
(hashlib), and produces a 64-character hex digest that is long enough
to be collision-resistant for this use case. It is the same algorithm
used elsewhere in the project (e.g., policy_hash). There is no need
for a cryptographic commitment scheme – the hashes are fingerprints,
not signatures.
Why Not Hash Raw Bytes?¶
Raw file contents include editor artifacts: trailing whitespace, different line endings (LF vs CRLF), BOM characters. Two developers editing the same prompt in different editors would produce different hashes for semantically identical prompts. Normalisation eliminates this source of false mismatch.
Source: _working/axis_lab/pipeworks_lab_prompt_hash.md, section 4.1:
“We do not hash raw file contents. We hash a normalised version to
avoid meaningless diffs.”
Why Is the Hashing Module Dependency-Free?¶
app/hashing.py imports only hashlib, json, and re from the
standard library. It does not import Pydantic models from
app/schema/. This is intentional: it keeps the module usable from
any context (tests, scripts, future tools) without pulling in the web
framework’s dependencies. Callers convert Pydantic models to plain
dicts before calling compute_payload_hash.
Why sort_keys=True for Payload Canonicalisation?¶
Python dicts are insertion-ordered since 3.7, but different code paths
may construct the same payload with keys in different order. Sorted-key
serialisation guarantees the same JSON string regardless of construction
order. This is verified by the test
TestComputePayloadHash.test_order_independent in
tests/test_hashing.py.
Why Colon Delimiters in the IPC ID?¶
Without a delimiter, concatenating field values could produce ambiguous
strings. The colon is a non-hex character that creates unambiguous
field boundaries. This prevents collisions like
"ab" + "cd" vs "abc" + "d", which would produce the same
concatenated string "abcd" but different colon-delimited strings
"ab:cd" vs "abc:d".
This is verified by the test
TestComputeIpcId.test_colon_delimiter_prevents_collision in
tests/test_hashing.py.
Source: Implementation recommendation from
_working/axis_lab/pipeworks_axis_descriptor_lab_proposed_enhancements.md:
“concatenate the hex digests with a non-hex delimiter (like a colon)
before hashing the final string.”
Why Are IPC Fields Optional on LogEntry?¶
Backward compatibility. The logging endpoint existed before IPC hashing
was implemented. Existing JSONL records have no IPC fields; making them
Optional with default=None means those records can still be
deserialised by the updated LogEntry schema without breaking.
Future Considerations¶
The original design document proposes an “Optional advanced version” of the IPC ID that includes the Ollama build hash, model digest, and host platform for even higher reproducibility guarantees:
IPC_ID_advanced = SHA-256(
input_hash
+ ":" + system_prompt_hash
+ ":" + model
+ ":" + temperature
+ ":" + max_tokens
+ ":" + seed
+ ":" + ollama_build_hash
+ ":" + model_digest
+ ":" + host_platform
)
This depends on Ollama API support for exposing build and model digest information. It remains a valuable long-term goal.
Source: _working/axis_lab/pipeworks_interpretive_provenance_chains.md,
section 5.
Testing the Hashing System¶
The hashing system has comprehensive test coverage in
tests/test_hashing.py. This section describes the test organisation
and the properties verified.
Test Organisation¶
The test module contains six test classes, each targeting a single function:
Test Class |
Function Under Test |
Tests |
|---|---|---|
|
|
8 |
|
|
7 |
|
|
5 |
|
|
5 |
|
|
5 |
|
|
4 |
Total |
34 |
Properties Verified¶
Every test class verifies four properties (as stated in the test module docstring):
Correctness of normalisation rules – edge cases, boundary conditions, empty inputs, whitespace-only inputs.
Determinism – same input always produces same output.
Sensitivity – different inputs produce different outputs.
Format – 64-character lowercase hex digest (where applicable).
Running the Tests¶
# Run hashing tests only
pytest tests/test_hashing.py -v
# Run with coverage
pytest tests/test_hashing.py -v --cov=app.hashing --cov-report=term
Extending the Normalisation Rules¶
If a normalisation rule needs to change (e.g., adding Unicode NFC normalisation), follow this procedure:
Add tests for the new behaviour first (test-driven development).
Update the private normalisation function in
app/hashing.py.Verify that all existing tests still pass – unless the change is intentionally breaking.
Update this guide to document the new rule.
Warning
Changing normalisation rules invalidates all previously computed hashes for affected hash types. This is a breaking change for any stored data (saved sessions, JSONL logs) that references those hashes. Treat normalisation changes with the same care as a database schema migration.
Design Document References¶
The IPC system was designed based on three documents in the project’s
_working/axis_lab/ directory. These documents are preserved for
historical reference and contain the original reasoning and
specifications.
Primary Design Document¶
_working/axis_lab/pipeworks_interpretive_provenance_chains.md
– “Axis Descriptor Lab – Reproducibility & Trace Methodology”
This document formalises the IPC concept, defines the required hashes, specifies the IPC ID formula, and articulates the experimental guarantees. It establishes the theoretical framework that the implementation realises.
System Prompt Hashing Specification¶
_working/axis_lab/pipeworks_lab_prompt_hash.md
– “System Prompt Hashing – Reproducibility & Drift Control”
This document identifies the gap that system prompt changes were untracked, defines the normalisation strategy for prompt text, and proposes a three-phase rollout from observational to scientific capability. Its concluding statement captures the motivation:
“Without system_prompt_hash, the lab is observational. With system_prompt_hash, the lab becomes scientific.”
Enhancement Assessment¶
_working/axis_lab/pipeworks_axis_descriptor_lab_proposed_enhancements.md
– “Assessment of Proposed Enhancements for Axis Descriptor Lab”
A technical assessment of both design documents, evaluating their feasibility and alignment with the project’s philosophy. This document recommended the colon-delimiter approach for the IPC ID and noted the value of consolidating hashing logic into a single utility module – both of which were adopted in the implementation.
Note
These documents are in the _working/ directory and are not part of
the published package. They represent the design rationale and are
preserved as historical reference within the repository.
Glossary¶
- Authoritative layer
The deterministic data that drives generation: axis scores, labels, seeds, policy rules. Never overridden by the LLM. The system is authoritative.
- Ornamental layer
The LLM-generated descriptive text. Interprets authoritative data but has no authority of its own. The LLM is ornamental.
- Interpretive Provenance Chain (IPC)
A composite SHA-256 fingerprint of all variables that influence a generation. The chain links payload, prompt, model, sampling parameters, and seed into a single reproducibility signature.
- IPC ID
The single 64-character hexadecimal digest computed from the provenance chain. Two generations with identical IPC IDs used identical inputs in every respect.
- Normalisation
The process of reducing text to a canonical form before hashing. Removes noise (trivial whitespace, editor artifacts) while preserving signal (case, punctuation, structure). Ensures that semantically identical texts produce identical hashes.
- Payload hash (
input_hash) SHA-256 of the canonically serialised AxisPayload (sorted-key JSON). Fingerprints the complete deterministic entity state.
- System prompt hash (
system_prompt_hash) SHA-256 of the normalised system prompt text. Fingerprints the constraint layer that governs LLM behaviour.
- Output hash (
output_hash) SHA-256 of the normalised LLM output text. Fingerprints the observable interpretive artifact produced by the generation.
- Drift
A change in LLM output that is not attributable to a change in user-controlled inputs. May be caused by model updates, prompt changes, or stochastic sampling. The IPC enables detecting and attributing drift to specific variables.
- Provenance
The complete record of origin and processing history for a generated output. In this system, provenance includes the payload, prompt, model, sampling parameters, and seed – everything needed to reproduce the generation.