Transformation Map

Sentence-aware alignment and clause-level diffing between baseline and current LLM output texts.

app/transformation_map.py

Clause-Level Alignment Layer (Transformation Map) for the Axis Descriptor Lab.

Why a dedicated module?

The word-level diff (client-side LCS) is too granular: a clause rewrite appears as a long sequence of single-word insertions and deletions, which obscures the structural change. The signal isolation layer (signal_isolation.py) is useful lexically but structure-blind, because it computes a set difference rather than a positional alignment.

The Transformation Map fills the gap by extracting clause-scale replacement pairs, showing which chunk of text was replaced by which, without semantic interpretation.
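
To illustrate why a set difference is structure-blind, here is a minimal sketch using only the standard library. The two example sentences are invented for illustration, and plain `str.split()` stands in for a real tokenizer; it is not taken from signal_isolation.py.

```python
from difflib import SequenceMatcher

# Two texts with identical vocabulary but different word order.
baseline = "the dog chased the cat".split()
current = "the cat chased the dog".split()

# Set difference ignores position, so it reports no change at all.
only_in_baseline = set(baseline) - set(current)
only_in_current = set(current) - set(baseline)

# A positional alignment still sees that the sequences differ.
matcher = SequenceMatcher(None, baseline, current, autojunk=False)
changed = [op for op in matcher.get_opcodes() if op[0] != "equal"]

print(only_in_baseline, only_in_current)
print(bool(changed))
```

Both sets come out empty even though subject and object have been swapped, while the positional alignment reports at least one non-equal opcode.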

Pipeline (sentence-aware alignment)

1. Normalise: collapse whitespace, strip edges.
2. Sentence split: nltk.sent_tokenize() on both texts.
3. Sentence alignment: difflib.SequenceMatcher on the sentence lists to pair sentences (equal, replace, insert, delete).
4. Token-level alignment within matched sentence pairs: for each “replace” sentence pair, run difflib.SequenceMatcher on nltk.word_tokenize() tokens and extract the “replace” opcodes.
5. “Equal” sentence pairs: skipped (no changes).
6. Insert-only and delete-only sentences: optionally included via the include_all parameter. When False (default), only replace operations are shown. When True, inserts and deletes appear as rows with an empty removed or added side.
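
The steps above can be sketched roughly as follows. This is an illustrative approximation, not the module's source: `split_sentences`, `tokenize`, and `sketch_map` are hypothetical names, and the two regex helpers are simple stand-ins for nltk.sent_tokenize() and nltk.word_tokenize() so that the example runs without NLTK data. Noise reduction is left out here.

```python
import re
from difflib import SequenceMatcher


def split_sentences(text):
    """Stand-in for nltk.sent_tokenize(): normalise, then split after . ! ?"""
    text = " ".join(text.split())  # collapse whitespace, strip edges
    return [s for s in re.split(r"(?<=[.!?])\s+", text) if s]


def tokenize(sentence):
    """Stand-in for nltk.word_tokenize(): words and single punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", sentence)


def sketch_map(baseline_text, current_text, include_all=False):
    """Hypothetical sketch of the sentence-aware alignment pipeline."""
    a_sents = split_sentences(baseline_text)
    b_sents = split_sentences(current_text)
    rows = []
    sent_matcher = SequenceMatcher(None, a_sents, b_sents, autojunk=False)
    for tag, i1, i2, j1, j2 in sent_matcher.get_opcodes():
        if tag == "equal":
            continue  # unchanged sentences produce no rows
        if tag == "replace":
            # Token-level diff inside the changed sentence group.
            a_tok = tokenize(" ".join(a_sents[i1:i2]))
            b_tok = tokenize(" ".join(b_sents[j1:j2]))
            tok_matcher = SequenceMatcher(None, a_tok, b_tok, autojunk=False)
            for t, x1, x2, y1, y2 in tok_matcher.get_opcodes():
                if t == "replace":
                    rows.append({
                        "removed": " ".join(a_tok[x1:x2]),
                        "added": " ".join(b_tok[y1:y2]),
                    })
        elif include_all:
            # Insert-only or delete-only sentences: one side stays empty.
            rows.append({
                "removed": " ".join(a_sents[i1:i2]),
                "added": " ".join(b_sents[j1:j2]),
            })
    return rows


print(sketch_map("The cat sat on the mat. It was warm.",
                 "The cat slept on the mat. It was warm."))
# → [{'removed': 'sat', 'added': 'slept'}]
```

Passing `autojunk=False` disables difflib's popularity heuristic, which would otherwise treat very frequent tokens as junk in long inputs; whether the real module does the same is not stated in this page.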

Noise reduction

Ignore replacements where both sides are a single stopword.
Merge adjacent replace operations into a single row.
Normalise whitespace before alignment.
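
A minimal sketch of the first two rules, under stated assumptions: the helper names are hypothetical, the stopword set is a tiny hard-coded subset instead of the NLTK stopwords corpus, and "adjacent" is taken to mean that one replace opcode ends exactly where the next begins on both sides. The module may define adjacency differently.

```python
# Tiny illustrative subset; the real module uses the NLTK stopwords corpus.
STOPWORDS = {"a", "an", "the", "of", "in", "on"}


def is_stopword_swap(removed_tokens, added_tokens, stopwords=STOPWORDS):
    """True when both sides are exactly one token and both are stopwords."""
    return (
        len(removed_tokens) == 1
        and len(added_tokens) == 1
        and removed_tokens[0].lower() in stopwords
        and added_tokens[0].lower() in stopwords
    )


def merge_adjacent_replaces(opcodes):
    """Merge replace opcodes whose index ranges touch on both sides."""
    merged = []
    for tag, i1, i2, j1, j2 in opcodes:
        prev = merged[-1] if merged else None
        if (
            tag == "replace"
            and prev is not None
            and prev[0] == "replace"
            and prev[2] == i1
            and prev[4] == j1
        ):
            # Extend the previous replace to cover this one as well.
            merged[-1] = ("replace", prev[1], i2, prev[3], j2)
        else:
            merged.append((tag, i1, i2, j1, j2))
    return merged
```

Opcodes here use the same `(tag, i1, i2, j1, j2)` shape that difflib.SequenceMatcher.get_opcodes() returns, so the helpers can be applied to its output after any filtering step.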

NLTK data requirements

Reuses the same NLTK data packages as signal_isolation.py: punkt_tab, stopwords. These resources are validated explicitly at call time rather than being downloaded during module import.
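
Call-time validation of this kind might look like the sketch below. The function name `ensure_nltk_resources`, the `finder` parameter, and the choice of RuntimeError are assumptions made for illustration; the page does not say how the module reports a missing resource. The lookup itself uses nltk.data.find(), which raises LookupError when a resource is absent.

```python
# Resource paths as understood by nltk.data.find().
REQUIRED = {
    "punkt_tab": "tokenizers/punkt_tab",
    "stopwords": "corpora/stopwords",
}


def ensure_nltk_resources(required=REQUIRED, finder=None):
    """Raise if any required NLTK data package is missing (hypothetical helper)."""
    if finder is None:
        import nltk  # imported lazily, so importing this module has no side effects

        finder = nltk.data.find
    missing = []
    for name, path in required.items():
        try:
            finder(path)
        except LookupError:
            missing.append(name)
    if missing:
        # Assumption: the real module may raise a different exception type.
        raise RuntimeError(
            "Missing NLTK data packages: " + ", ".join(missing)
            + ". Install them with nltk.download()."
        )
```

Nothing is downloaded here; the check only reports what is missing, which matches the validate-rather-than-download behaviour described above. The `finder` argument exists purely so the check can be exercised without NLTK installed.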

app.transformation_map.compute_transformation_map(baseline_text, current_text, *, include_all=False)

Extract clause-level change pairs between two texts.

Returns a list of {"removed": "...", "added": "..."} dicts representing the structural changes found by sentence-aware alignment followed by token-level diffing within changed sentence groups.

Parameters:

baseline_text – The reference text (A).
current_text – The comparison text (B).
include_all – When True, include insert-only and delete-only operations as rows (with an empty removed or added side). When False (default), only replacement operations are returned.

Returns:

Each dict has removed (text from A) and added (text from B). Empty list if the texts are identical.

Return type:

list[dict[str, str]]
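
As a sketch of how a caller might consume this return shape, the helper below renders rows as plain text. `format_rows` and the sample rows are hypothetical; in real use the list would come from compute_transformation_map(baseline_text, current_text, include_all=True).

```python
def format_rows(rows):
    """Render transformation-map rows as one 'removed -> added' line each."""
    lines = []
    for row in rows:
        # An empty side means an insert-only or delete-only row.
        removed = row["removed"] or "(nothing)"
        added = row["added"] or "(nothing)"
        lines.append(f"- {removed}  ->  + {added}")
    return "\n".join(lines)


# Hand-written sample in the documented shape, not real module output.
sample_rows = [
    {"removed": "sat quietly", "added": "slept soundly"},
    {"removed": "", "added": "It was raining."},
]
print(format_rows(sample_rows))
```

An empty list, which the function returns for identical texts, renders as an empty string.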