(ipc-and-hashing)=

# Interpretive Provenance Chain (IPC) and Hashing System

## Introduction

The Axis Descriptor Lab generates descriptive text from deterministic axis
payloads using a constrained LLM layer (Ollama).  The system is built on a
simple premise: a set of numerical scores and labels (the *axes*) are
authoritative facts about an entity, and the LLM produces prose that
*interprets* those facts without overriding them.

This raises a question that is deceptively difficult to answer:

> When the output changes, what caused the change?

Was it a different axis score?  A modified system prompt?  A model upgrade?
A different sampling temperature?  Or just the inherent randomness of
language model inference?

Without a systematic way to fingerprint every variable that influences a
generation, any experiment with the lab is **observational** -- you can see
what happened, but you cannot determine *why*.  Behavioural shifts may be
misattributed to axis changes when the real cause was a prompt edit.  Model
upgrades may alter output in ways that are invisible without a baseline.
Seed variance may introduce noise that masks real signal.

The **Interpretive Provenance Chain (IPC)** is the project's answer to this
problem.  It is a composite SHA-256 fingerprint of *every* variable that
influences a generation, implemented as a chain of individual hashes
combined into a single identifier.  The guarantee is simple:

> Two generations with the same IPC ID used identical inputs in every
> respect.  If their outputs differ, the difference is attributable solely
> to LLM stochasticity.

The IPC upgrades the lab from an observational tool to a reproducible
scientific instrument.

:::{admonition} Pipe-Works Design Philosophy
:class: note

The IPC is grounded in four principles from the broader Pipe-Works project:

- **Determinism over optimisation** -- the system controls what it can; the
  LLM is allowed to vary only within fingerprinted boundaries.
- **Inspectability over mystique** -- every variable is visible, hashed,
  and stored.
- **Programmatic truth over narrative authority** -- the LLM interprets
  but never decides.
- **Failure as data** -- even when the LLM produces unexpected output,
  the IPC ensures the conditions are recorded for analysis.

*Source: `_working/axis_lab/pipeworks_interpretive_provenance_chains.md`,
section 8.*
:::

**What this document covers:**

1. The authoritative/ornamental boundary that makes IPC meaningful
2. The four hash functions and their normalisation rules
3. The normalisation philosophy in depth
4. A complete generation walkthrough showing the chain in action
5. Every integration point in the codebase
6. Practical use cases for drift detection and reproducibility
7. Design decisions and trade-offs
8. How the hashing system is tested
9. References to the original design documents
10. A glossary of key terms

---

## The Authoritative/Ornamental Boundary

The Axis Descriptor Lab operates on a strict two-layer architecture.
Understanding this boundary is essential to understanding why the IPC
exists and what it fingerprints.

### The Authoritative Layer (Deterministic)

The authoritative layer consists of data that the system controls
completely:

- **Axis scores and labels** -- numerical values (0.0--1.0) and their
  human-readable descriptors (e.g., "weary", "threadbare").  These are
  the entity's ground truth.
- **Policy rules** -- the mapping table that assigns labels to score
  ranges (e.g., age 0.75 maps to "old").  Identified by a `policy_hash`.
- **Seed** -- the deterministic RNG seed that produced the axis scores.
  Same seed, same scores.
- **World ID** -- the Pipe-Works world context identifier.

### The Ornamental Layer (Stochastic)

The ornamental layer is the LLM's output:

- A paragraph of descriptive prose that *interprets* the authoritative
  data.
- It modulates tone, chooses words, and compresses structured state into
  natural language.
- It **never** makes decisions, overrides scores, or introduces facts
  not present in the payload.

This principle is enforced at every level of the system.  The system
prompt itself contains the line:

> "The system is authoritative.  You are ornamental."
>
> -- `app/lab_only/prompts/character_description/system_prompt_v01.txt`, line 29

### Why the Boundary Matters for Hashing

The boundary between authoritative and ornamental is *exactly* the
boundary the IPC fingerprints.  The input side (payload, prompt, model,
parameters) is the **experimental condition**.  The output side (LLM text)
is the **experimental observation**.

```text
AUTHORITATIVE (deterministic)           ORNAMENTAL (stochastic)
=================================       ========================
 AxisPayload (axes, scores, seed)        LLM-generated paragraph
 Policy rules (policy_hash)              - interprets, never decides
 System prompt (constraint text)         - modulates tone
 Model + sampling parameters             - compresses state to prose
          |                                        |
          +---- IPC fingerprints this boundary ----+
                      completely
```

The IPC captures the complete experimental condition so that
observations can be meaningfully compared.  If two runs share the
same IPC ID, any difference in output is pure stochastic variation --
not a change in the experiment.

---

## The Four Hashes

The IPC system uses four SHA-256 hash functions, each targeting a
different component of the generation pipeline.  All are implemented in
the dedicated `app/hashing.py` module and return 64-character lowercase
hexadecimal digest strings.

### Payload Hash (`compute_payload_hash`)

**What it fingerprints:**  The complete deterministic entity state --
all axis labels, all axis scores, the world ID, the policy hash, and
the seed.  This is the authoritative data that drives the generation.

**Normalisation:**  The payload dictionary is serialised to JSON with
`sort_keys=True` to eliminate Python dict insertion-order variation.
`ensure_ascii=False` preserves Unicode characters faithfully.

```python
canonical = json.dumps(payload_dict, sort_keys=True, ensure_ascii=False)
return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

**Key design choice:**  The function accepts a plain `dict`, not a
Pydantic model, to keep the hashing module dependency-free.  Callers
convert their models first:

```python
input_hash = compute_payload_hash(payload.model_dump())
```

**Example:**

Given this payload:

```json
{
  "axes": {
    "health": {"label": "weary", "score": 0.5},
    "age": {"label": "old", "score": 0.7}
  },
  "policy_hash": "abc123",
  "seed": 42,
  "world_id": "test_world"
}
```

The canonical JSON (sorted keys) is:

```json
{"axes": {"age": {"label": "old", "score": 0.7}, "health": {"label": "weary", "score": 0.5}}, "policy_hash": "abc123", "seed": 42, "world_id": "test_world"}
```

Note how `age` now appears before `health` due to key sorting.  This
ensures that the same payload always produces the same hash, regardless
of the order keys were inserted in Python.

*See also: {func}`app.hashing.compute_payload_hash`*

### System Prompt Hash (`compute_system_prompt_hash`)

**What it fingerprints:**  The constraint layer -- the system prompt
that tells the LLM what it is and is not allowed to do.

**Why it matters:**  System prompts evolve.  Small changes such as
adding "avoid metaphor", changing sentence limits, or adjusting
escalation rules can materially change LLM behaviour.  Without hashing
the prompt, behavioural shifts may be misattributed to axis changes or
model differences.  The prompt text is part of the experimental
condition and must be fingerprinted.

*Source: `_working/axis_lab/pipeworks_lab_prompt_hash.md`, section 3.*

**Normalisation rules** (applied in order):

1. Split the text into individual lines.
2. Strip leading and trailing whitespace from **each** line.  This
   removes editor-introduced indentation and trailing spaces without
   altering the words on the line.
3. Rejoin the stripped lines with `\n`.
4. Strip leading and trailing blank lines from the **entire** result.
   Internal blank lines (paragraph breaks) are preserved because they
   may carry structural meaning in multi-section prompts.
5. Do **not** lowercase.  Case is semantic -- "NEVER" and "never" may
   carry different emphasis for the LLM.

**Example:**

These two prompt strings produce the **same** hash:

```text
Raw A:  "  line one  \n  line two  "
Raw B:  "line one\nline two"
```

Both normalise to `"line one\nline two"` before hashing.

*See also: {func}`app.hashing.compute_system_prompt_hash` and
{func}`app.hashing._normalise_system_prompt`*

### Output Hash (`compute_output_hash`)

**What it fingerprints:**  The observable result -- the exact text the
LLM produced.

**Why it exists:**  The output hash enables detecting whether two runs
produced identical output, even when the input conditions were the
same.  If two generations share the same IPC ID but have different
output hashes, the LLM exhibited stochastic drift -- it produced
different prose from identical inputs.

**Normalisation rules** (applied in order):

1. Strip leading and trailing whitespace from the entire string.
2. Collapse runs of two or more consecutive ASCII space characters
   (`U+0020`) into a single space.  Only ASCII space is targeted --
   newlines, tabs, and other whitespace are left intact so that
   paragraph structure is preserved.
3. Preserve punctuation exactly as-is.
4. Preserve letter casing exactly as-is.
5. Preserve sentence and line order exactly as-is.

The core normalisation is a single regex:

```python
collapsed = re.sub(r" {2,}", " ", stripped)
```

This targets only ASCII space runs (`" {2,}"`), not the broader `\s`
pattern, which would also collapse newlines and tabs.

:::{important}
The output hash normalisation is intentionally **more conservative** than
the system prompt normalisation.  LLM output may use tabs or specific
spacing patterns as part of its structure.  Only the most clearly
non-semantic variation (runs of multiple spaces) is normalised.
:::

*See also: {func}`app.hashing.compute_output_hash` and
{func}`app.hashing._normalise_output`*

### IPC Identifier (`compute_ipc_id`)

**What it fingerprints:**  The complete provenance chain -- every
variable that could influence the generation, combined into a single
identifier.

**Formula:**

```text
IPC_ID = SHA-256(
    input_hash
    + ":" + system_prompt_hash
    + ":" + model
    + ":" + str(temperature)
    + ":" + str(max_tokens)
    + ":" + str(seed)
)
```

**Why a composite hash:**  Combining component hashes into a single
identifier means you can answer "did these two generations use
identical conditions?" with a single string comparison rather than
comparing six fields individually.  This is especially valuable when
grouping runs in log analysis or drift detection.

**The colon delimiter:**  The colon (`:`) is a non-hex character that
prevents collisions from field concatenation.  Without it, the
concatenation `input_hash="ab"` + `system_prompt_hash="cd"` would
produce the same string as `input_hash="abc"` + `system_prompt_hash="d"`.
The delimiter makes these unambiguous: `"ab:cd"` vs `"abc:d"`.

*Source: Implementation recommendation from
`_working/axis_lab/pipeworks_axis_descriptor_lab_proposed_enhancements.md`.*

**Example:**

```python
ipc_id = compute_ipc_id(
    input_hash="d8bd1395713454e4...",       # 64-char hex
    system_prompt_hash="7f3a9c6e4b8f1d2c...", # 64-char hex
    model="gemma2:2b",
    temperature=0.2,
    max_tokens=120,
    seed=2954173979,
)
# Returns: "4a2e7f91..." (64-char hex)
```

If any single field changes -- even the seed by one digit -- the IPC
ID changes.  This is verified by the test suite, which modifies each
field individually and asserts a different hash.

*See also: {func}`app.hashing.compute_ipc_id`*

---

## Normalisation Philosophy

Normalisation is the most subtle and error-prone part of the hashing
system.  Getting it wrong silently undermines the entire provenance
chain.  This section explains the conservative approach in depth.

### The Normalisation Dilemma

There is a tension between two failure modes:

- **Too aggressive** (e.g., lowercasing, removing all whitespace):
  Semantically different texts hash identically, creating
  false-positive matches.  You think two runs used the same prompt,
  but they did not.
- **Too conservative** (e.g., hashing raw bytes):  Identical texts
  with trivial formatting differences hash differently, creating
  false-positive mismatches.  You think two runs used different
  prompts, but the only difference was a trailing space added by an
  editor.

The system must find the precise boundary between semantic and
non-semantic variation.

### Normalisation Principles

| Rule | Rationale | Applied To |
|------|-----------|------------|
| Never lowercase | Case carries semantic meaning ("NEVER" vs "never") | Prompts, Output |
| Preserve internal structure | Line order, sentence order, paragraph breaks may be meaningful | Prompts, Output |
| Strip edge whitespace | Editors often add trailing spaces/newlines; these are noise | Prompts, Output |
| Collapse 2+ spaces to 1 | LLMs sometimes produce inconsistent spacing; this is noise | Output only |
| Strip per-line whitespace | Editor indentation is noise in prompt files | Prompts only |
| Preserve tabs in output | Tab characters in LLM output may represent structure | Output only |

### Why Prompts and Outputs Have Different Rules

**System prompts** are human-authored text files that accumulate editor
artifacts over time: indentation from copy-paste, trailing spaces from
line editing, blank lines added by auto-formatters.  The per-line strip
is appropriate because no human writes a prompt where leading
indentation is semantically meaningful to the LLM.

**LLM output** is machine-generated and may use tabs or specific spacing
patterns as part of its structure.  Only the most clearly non-semantic
variation -- runs of multiple consecutive spaces -- is normalised.  The
regex `re.sub(r" {2,}", " ", text)` targets only ASCII space (`U+0020`),
never `\s`, so newlines, tabs, and other whitespace are preserved.

### What Normalisation Does NOT Do

The following operations are intentionally avoided:

- **Does not remove duplicate sentences** -- repetition may be a
  legitimate LLM output pattern.
- **Does not reorder anything** -- line order, sentence order, and
  paragraph order are always preserved.
- **Does not normalise Unicode** (e.g., NFC/NFKC) -- this is a
  conscious choice.  If Unicode normalisation becomes needed, it should
  be added explicitly as a documented change.
- **Does not strip internal blank lines from prompts** -- these are
  paragraph breaks that may carry structural meaning.
- **Does not touch punctuation** -- commas, periods, dashes, and all
  other punctuation are preserved exactly.
- **Does not trim internal whitespace from prompts** -- only
  leading/trailing whitespace on each line is stripped.

### The Invariant

The normalisation rules are designed to uphold a single invariant:

> Any edit that changes meaning **will** change the hash.
> Any edit that does not change meaning **will not** change the hash.

This ensures that the hashing system is both sensitive to real changes
and robust against formatting noise.

---

## The Provenance Chain in Action

This section walks through a complete generation cycle, showing exactly
when and where each hash is computed, what data flows through the chain,
and what the final response looks like.

### End-to-End Data Flow

```text
 +-------------------------------------------------------------------+
 |  Frontend: User edits axes, selects model, clicks Generate        |
 |                                                                   |
 |  Sends POST /api/generate:                                       |
 |    { payload: {...}, model: "gemma2:2b",                          |
 |      temperature: 0.2, max_tokens: 120 }                         |
 +-------------------------------+-----------------------------------+
                                 |
                                 v
 +-------------------------------------------------------------------+
 |  Backend: generate() route handler                                |
 |                                                                   |
 |  1. Resolve system prompt                                         |
 |     - Use custom override from request, OR                        |
 |     - Load default from the supported local prompt roots          |
 |                                                                   |
 |  2. Serialise payload as pretty-printed JSON                      |
 |     - This becomes the "user turn" sent to the LLM               |
 |                                                                   |
 |  3. Call Ollama                                                   |
 |     - Send system prompt + serialised payload                     |
 |     - Receive generated text + usage metrics                      |
 |                                                                   |
 |  4. Compute IPC hashes:                                           |
 |     input_hash        = compute_payload_hash(payload.model_dump())|
 |     system_prompt_hash = compute_system_prompt_hash(prompt)       |
 |     output_hash       = compute_output_hash(text)                 |
 |     ipc_id            = compute_ipc_id(                           |
 |         input_hash, system_prompt_hash,                           |
 |         model, temperature, max_tokens, seed)                     |
 +-------------------------------+-----------------------------------+
                                 |
                                 v
 +-------------------------------------------------------------------+
 |  GenerateResponse (JSON):                                         |
 |  {                                                                |
 |    "text": "A weathered figure stands near the threshold...",     |
 |    "model": "gemma2:2b",                                          |
 |    "temperature": 0.2,                                            |
 |    "input_hash": "d8bd139571345...",                              |
 |    "system_prompt_hash": "7f3a9c6e4b8f1...",                     |
 |    "output_hash": "9cbe31f3d1e7a...",                             |
 |    "ipc_id": "4a2e7f9183bc2..."                                   |
 |  }                                                                |
 +-------------------------------+-----------------------------------+
                                 |
                                 v
 +-------------------------------------------------------------------+
 |  Frontend: Display in UI                                          |
 |                                                                   |
 |  Output text appears in the output box.                           |
 |  Meta area shows three lines:                                     |
 |                                                                   |
 |  model: gemma2:2b  .  temp: 0.2  .  seed: 2954173979             |
 |  input: d8bd139571345...  .  prompt: 7f3a9c6e4b8f1...            |
 |    .  output: 9cbe31f3d1e7a...                                   |
 |  ipc: 4a2e7f9183bc2...                                            |
 +-------------------------------------------------------------------+
```

### Step-by-Step Walkthrough

**Step 1 -- Prompt resolution.**  The backend checks whether the request
includes a custom `system_prompt` override.  If not, it loads the default
prompt from the supported local prompt roots, currently
`app/lab_only/prompts/character_description/system_prompt_v01.txt`.
The resolved prompt is the text that will be hashed as `system_prompt_hash`.

**Step 2 -- Payload serialisation.**  The `AxisPayload` is serialised
to pretty-printed JSON (with 2-space indentation) and sent as the user
turn to Ollama.  This is the text the LLM "sees" as the user message.

**Step 3 -- Ollama generation.**  The backend calls Ollama's
`/api/generate` endpoint with the system prompt and serialised payload.
Ollama returns the generated text and optional usage metrics (prompt
tokens, generation tokens).

**Step 4 -- Hash computation.**  After the Ollama call succeeds, the
backend computes all four hashes.  The order matters: `input_hash` and
`system_prompt_hash` are computed from the request data, `output_hash`
from the response text, and `ipc_id` from the combination of all
provenance fields.  All four are included in the `GenerateResponse`.

**Step 5 -- Frontend display.**  The frontend displays the hashes in a
three-line meta area below the output text.  All hashes are truncated
to 16 characters (out of 64) for UI readability.  The full 64-character
hashes are available in the JSON response and in saved files.

---

## Integration Points

This section catalogues every place in the codebase where IPC hashes are
computed, transmitted, stored, or displayed.

### `/api/generate` -- Full IPC Chain

Every successful generation returns all four hashes.  The hashes are
computed after the Ollama call completes (the `output_hash` needs the
generated text).

The `GenerateResponse` Pydantic model defines the four IPC fields:

- `input_hash` -- SHA-256 of the canonical AxisPayload
- `system_prompt_hash` -- SHA-256 of the normalised system prompt
- `output_hash` -- SHA-256 of the normalised output text
- `ipc_id` -- the composite Interpretive Provenance Chain identifier

All four fields are `str | None` with `default=None` on the schema,
but the generate endpoint always populates them.

*Implementation: `app/main.py`, generate route handler.
Schema: `app/schema/generate.py`, `GenerateResponse` class.*

### `/api/log` -- Backward-Compatible IPC

The log endpoint appends structured entries to the configured log root,
typically `AXIS_LAB_LOGS_DIR/run_log.jsonl` on Luminal and repo-local
`logs/run_log.jsonl` only as a local-development fallback.
It supports IPC hashes with backward compatibility:

- `input_hash` and `output_hash` are **always** computed (these only
  need the payload and output text, which are required parameters).
- `system_prompt_hash` and `ipc_id` are **only** computed when the
  optional `system_prompt` parameter is provided.
- When `system_prompt` is omitted, both fields are `null` in the
  log entry.

This design ensures that older frontend versions (or external callers)
that do not send the prompt can still log successfully.  The IPC fields
on `LogEntry` are `Optional[str]` with `default=None`, so existing
JSONL records written before IPC was implemented can still be
deserialised without error.

*Implementation: `app/main.py`, log_run route handler.
Schema: `app/schema/generate.py`, `LogEntry` class.*

### `/api/save` -- Persistent IPC Record

The save endpoint writes session state to a timestamped folder under the
configured writable data root, typically `AXIS_LAB_DATA_DIR` on Luminal and
repo-local `data/` only as a local-development fallback. IPC hashes are
persisted in two locations:

**1. `metadata.json`** -- All four hashes appear as top-level fields:

```json
{
  "folder_name": "20260218_143022_d8bd1395",
  "timestamp": "2026-02-18T14:30:22.504478+00:00",
  "input_hash": "d8bd1395713454e4...",
  "system_prompt_hash": "7f3a9c6e4b8f1d2c...",
  "output_hash": "9cbe31f3d1e7a2c6...",
  "ipc_id": "4a2e7f9183bc2d4e...",
  "model": "gemma2:2b",
  "temperature": 0.2,
  "max_tokens": 120,
  "seed": 2954173979,
  "world_id": "pipeworks_web",
  "policy_hash": "d845cdcf...",
  "axis_count": 11
}
```

**2. `output.md`** -- The `system_prompt_hash` and `ipc_id` appear as
HTML comments in the provenance header, making the saved file
self-documenting:

```html
<!-- system_prompt_hash: 7f3a9c6e4b8f1d2c... -->
<!-- ipc_id: 4a2e7f9183bc2d4e... -->
```

**Conditional computation:**  The `system_prompt_hash` is always
computed (the system prompt is a required field in `SaveRequest`).
However, `output_hash` and `ipc_id` are only computed when the user
has generated output before saving.  Without output, the provenance
chain is incomplete, and both fields are `null`.

*Implementation: `app/routes_save.py`, save route handler.
Schema: `app/schema/save.py`, `SaveRequest` and `SaveResponse` classes.*

### Frontend Display

The frontend (`app/static/mod-generate.js`) displays IPC hashes in a
meta table below the generated output:

```text
model: gemma2:2b  .  temp: 0.2  .  seed: 2954173979
input: d8bd139571345...  .  prompt: 7f3a9c6e4b8f1...  .  output: 9cbe31f3d1e7a...
ipc: 4a2e7f9183bc2...
```

All hashes are truncated to 16 characters via `.slice(0, 16)` followed
by an ellipsis character.  The `ipc_id` gets its own line to visually
distinguish the composite identifier from the component hashes.

The CSS class `.output-meta` uses `white-space: pre-wrap` so that the
newline characters in the meta string render as line breaks.

### Hash Availability Summary

| Endpoint | `input_hash` | `system_prompt_hash` | `output_hash` | `ipc_id` |
|----------|:---:|:---:|:---:|:---:|
| `/api/generate` | Always | Always | Always | Always |
| `/api/log` | Always | When prompt provided | Always | When prompt provided |
| `/api/save` response | Always | Always | When output exists | When output exists |
| `metadata.json` (saved) | Always | Always | When output exists | When output exists |
| `output.md` (saved) | In header | In header | N/A (is the content) | In header |
| Frontend display | Always | Always | Always | Always |

---

## Use Cases and Experimental Scenarios

The IPC system enables four categories of analysis that were not
possible before its implementation.

### Detecting Prompt Drift

**Scenario:**  You modify the system prompt from v01 to v02 (e.g.,
adding "avoid metaphor").  You generate text from the same payload
with the same model and parameters.

**What the hashes reveal:**

- `input_hash` -- **unchanged** (same payload)
- `system_prompt_hash` -- **different** (prompt text changed)
- `ipc_id` -- **different** (one component changed)
- `output_hash` -- **different** (the LLM responded differently)

**Conclusion:**  The behavioural change is attributable to the prompt
change, not to the model or the input.  The IPC separates the
variables.

Without the IPC, you would see different output and have no way to
determine whether the cause was the prompt, the model, or random
variation.

### Detecting Model Drift

**Scenario:**  You upgrade Ollama's model (e.g., a new release of
`gemma2:2b`).  Same payload, same prompt, same parameters.

**What the hashes reveal:**

- `input_hash` -- **unchanged**
- `system_prompt_hash` -- **unchanged**
- `ipc_id` -- **unchanged** (if the model string is identical)
- `output_hash` -- **may differ** (model internals changed)

**Conclusion:**  If the IPC ID matches but the output hash differs,
the model's internal behaviour changed even though all user-controlled
inputs were identical.  This is Ollama-level or model-weight-level
drift -- invisible without the IPC.

### Reproducibility Audit

**Scenario:**  You need to verify that a saved session can be
reproduced.

**Procedure:**

1. Load `metadata.json` from a saved session.
2. Extract the `ipc_id`.
3. Re-run the generation with the same payload, prompt, model, and
   parameters.
4. Compare the new `ipc_id` to the saved one -- they should match.
5. Compare the `output_hash` -- if they differ, the model exhibited
   stochastic variation even under identical conditions.

This provides a rigorous, quantifiable measure of reproducibility.

### Grouping Runs for Analysis

**Scenario:**  You have 50 log entries in the configured run log and want
to analyse output stability.

**Procedure:**

1. Group entries by `ipc_id`.
2. Within each group, examine `output_hash` variation to measure
   output stability under identical conditions.
3. Across groups, compare how different conditions produce different
   outputs.
4. Use `system_prompt_hash` to isolate the effect of prompt changes.
5. Use `input_hash` to isolate the effect of payload changes.

This enables systematic, data-driven analysis of LLM behaviour --
the kind of analysis that is impossible without provenance tracking.

---

## Design Decisions and Trade-offs

This section documents the *why* behind specific technical choices,
for future maintainers and contributors.

### Why SHA-256?

SHA-256 is fast, widely available in Python's standard library
(`hashlib`), and produces a 64-character hex digest that is long enough
to be collision-resistant for this use case.  It is the same algorithm
used elsewhere in the project (e.g., `policy_hash`).  There is no need
for a cryptographic commitment scheme -- the hashes are fingerprints,
not signatures.

### Why Not Hash Raw Bytes?

Raw file contents include editor artifacts: trailing whitespace,
different line endings (LF vs CRLF), BOM characters.  Two developers
editing the same prompt in different editors would produce different
hashes for semantically identical prompts.  Normalisation eliminates
this source of false mismatch.

*Source: `_working/axis_lab/pipeworks_lab_prompt_hash.md`, section 4.1:
"We do not hash raw file contents.  We hash a normalised version to
avoid meaningless diffs."*

### Why Is the Hashing Module Dependency-Free?

`app/hashing.py` imports only `hashlib`, `json`, and `re` from the
standard library.  It does not import Pydantic models from
`app/schema/`.  This is intentional: it keeps the module usable from
any context (tests, scripts, future tools) without pulling in the web
framework's dependencies.  Callers convert Pydantic models to plain
dicts before calling `compute_payload_hash`.

### Why `sort_keys=True` for Payload Canonicalisation?

Python dicts are insertion-ordered since 3.7, but different code paths
may construct the same payload with keys in different order.  Sorted-key
serialisation guarantees the same JSON string regardless of construction
order.  This is verified by the test
`TestComputePayloadHash.test_order_independent` in
`tests/test_hashing.py`.

### Why Colon Delimiters in the IPC ID?

Without a delimiter, concatenating field values could produce ambiguous
strings.  The colon is a non-hex character that creates unambiguous
field boundaries.  This prevents collisions like
`"ab" + "cd"` vs `"abc" + "d"`, which would produce the same
concatenated string `"abcd"` but different colon-delimited strings
`"ab:cd"` vs `"abc:d"`.

This is verified by the test
`TestComputeIpcId.test_colon_delimiter_prevents_collision` in
`tests/test_hashing.py`.

*Source: Implementation recommendation from
`_working/axis_lab/pipeworks_axis_descriptor_lab_proposed_enhancements.md`:
"concatenate the hex digests with a non-hex delimiter (like a colon)
before hashing the final string."*

### Why Are IPC Fields Optional on LogEntry?

Backward compatibility.  The logging endpoint existed before IPC hashing
was implemented.  Existing JSONL records have no IPC fields; making them
`Optional` with `default=None` means those records can still be
deserialised by the updated `LogEntry` schema without breaking.

### Future Considerations

The original design document proposes an "Optional advanced version" of
the IPC ID that includes the Ollama build hash, model digest, and host
platform for even higher reproducibility guarantees:

```text
IPC_ID_advanced = SHA-256(
    input_hash
    + ":" + system_prompt_hash
    + ":" + model
    + ":" + temperature
    + ":" + max_tokens
    + ":" + seed
    + ":" + ollama_build_hash
    + ":" + model_digest
    + ":" + host_platform
)
```

This depends on Ollama API support for exposing build and model digest
information.  It remains a valuable long-term goal.

*Source: `_working/axis_lab/pipeworks_interpretive_provenance_chains.md`,
section 5.*

---

## Testing the Hashing System

The hashing system has comprehensive test coverage in
`tests/test_hashing.py`.  This section describes the test organisation
and the properties verified.

### Test Organisation

The test module contains six test classes, each targeting a single
function:

| Test Class | Function Under Test | Tests |
|------------|-------------------|:-----:|
| `TestNormaliseSystemPrompt` | `_normalise_system_prompt` | 8 |
| `TestNormaliseOutput` | `_normalise_output` | 7 |
| `TestComputeSystemPromptHash` | `compute_system_prompt_hash` | 5 |
| `TestComputeOutputHash` | `compute_output_hash` | 5 |
| `TestComputeIpcId` | `compute_ipc_id` | 5 |
| `TestComputePayloadHash` | `compute_payload_hash` | 4 |
| **Total** | | **34** |

### Properties Verified

Every test class verifies four properties (as stated in the test module
docstring):

1. **Correctness of normalisation rules** -- edge cases, boundary
   conditions, empty inputs, whitespace-only inputs.
2. **Determinism** -- same input always produces same output.
3. **Sensitivity** -- different inputs produce different outputs.
4. **Format** -- 64-character lowercase hex digest (where applicable).

### Running the Tests

```bash
# Run hashing tests only
pytest tests/test_hashing.py -v

# Run with coverage
pytest tests/test_hashing.py -v --cov=app.hashing --cov-report=term
```

### Extending the Normalisation Rules

If a normalisation rule needs to change (e.g., adding Unicode NFC
normalisation), follow this procedure:

1. **Add tests for the new behaviour first** (test-driven development).
2. **Update the private normalisation function** in `app/hashing.py`.
3. **Verify that all existing tests still pass** -- unless the change
   is intentionally breaking.
4. **Update this guide** to document the new rule.

:::{warning}
Changing normalisation rules **invalidates all previously computed
hashes** for affected hash types.  This is a breaking change for any
stored data (saved sessions, JSONL logs) that references those hashes.
Treat normalisation changes with the same care as a database schema
migration.
:::

---

## Design Document References

The IPC system was designed based on three documents in the project's
`_working/axis_lab/` directory.  These documents are preserved for
historical reference and contain the original reasoning and
specifications.

### Primary Design Document

**`_working/axis_lab/pipeworks_interpretive_provenance_chains.md`**
-- *"Axis Descriptor Lab -- Reproducibility & Trace Methodology"*

This document formalises the IPC concept, defines the required hashes,
specifies the IPC ID formula, and articulates the experimental
guarantees.  It establishes the theoretical framework that the
implementation realises.

### System Prompt Hashing Specification

**`_working/axis_lab/pipeworks_lab_prompt_hash.md`**
-- *"System Prompt Hashing -- Reproducibility & Drift Control"*

This document identifies the gap that system prompt changes were
untracked, defines the normalisation strategy for prompt text, and
proposes a three-phase rollout from observational to scientific
capability.  Its concluding statement captures the motivation:

> "Without system_prompt_hash, the lab is observational.
> With system_prompt_hash, the lab becomes scientific."

### Enhancement Assessment

**`_working/axis_lab/pipeworks_axis_descriptor_lab_proposed_enhancements.md`**
-- *"Assessment of Proposed Enhancements for Axis Descriptor Lab"*

A technical assessment of both design documents, evaluating their
feasibility and alignment with the project's philosophy.  This document
recommended the colon-delimiter approach for the IPC ID and noted the
value of consolidating hashing logic into a single utility module --
both of which were adopted in the implementation.

:::{note}
These documents are in the `_working/` directory and are not part of
the published package.  They represent the design rationale and are
preserved as historical reference within the repository.
:::

---

## Glossary

Authoritative layer
:   The deterministic data that drives generation: axis scores, labels,
    seeds, policy rules.  Never overridden by the LLM.  The system is
    authoritative.

Ornamental layer
:   The LLM-generated descriptive text.  Interprets authoritative data
    but has no authority of its own.  The LLM is ornamental.

Interpretive Provenance Chain (IPC)
:   A composite SHA-256 fingerprint of all variables that influence a
    generation.  The chain links payload, prompt, model, sampling
    parameters, and seed into a single reproducibility signature.

IPC ID
:   The single 64-character hexadecimal digest computed from the
    provenance chain.  Two generations with identical IPC IDs used
    identical inputs in every respect.

Normalisation
:   The process of reducing text to a canonical form before hashing.
    Removes noise (trivial whitespace, editor artifacts) while
    preserving signal (case, punctuation, structure).  Ensures that
    semantically identical texts produce identical hashes.

Payload hash (`input_hash`)
:   SHA-256 of the canonically serialised AxisPayload (sorted-key JSON).
    Fingerprints the complete deterministic entity state.

System prompt hash (`system_prompt_hash`)
:   SHA-256 of the normalised system prompt text.  Fingerprints the
    constraint layer that governs LLM behaviour.

Output hash (`output_hash`)
:   SHA-256 of the normalised LLM output text.  Fingerprints the
    observable interpretive artifact produced by the generation.

Drift
:   A change in LLM output that is not attributable to a change in
    user-controlled inputs.  May be caused by model updates, prompt
    changes, or stochastic sampling.  The IPC enables detecting and
    attributing drift to specific variables.

Provenance
:   The complete record of origin and processing history for a generated
    output.  In this system, provenance includes the payload, prompt,
    model, sampling parameters, and seed -- everything needed to
    reproduce the generation.