Transformation Map¶
Clause-level sentence alignment and diffing between baseline and current LLM output texts.
app/transformation_map.py¶
Clause-Level Alignment Layer (Transformation Map) for the Axis Descriptor Lab.
Why a dedicated module?¶
The word-level diff (client-side LCS) is too granular — clause rewrites appear
as a long sequence of single-word insertions and deletions, obscuring the
structural change. The signal isolation layer (signal_isolation.py) is
lexically useful but structure-blind (set difference, not positional).
The Transformation Map fills the gap by extracting clause-scale replacement pairs — showing what chunk of text was replaced by what chunk — without semantic interpretation.
Pipeline (sentence-aware alignment)¶
Normalise: collapse whitespace, strip edges.
Sentence split: nltk.sent_tokenize() on both texts.
Sentence alignment: difflib.SequenceMatcher on sentence lists to pair sentences (equal, replace, insert, delete).
Token-level alignment within matched sentence pairs: for each “replace” sentence pair, run difflib.SequenceMatcher on nltk.word_tokenize() tokens and extract “replace” opcodes.
For “equal” sentence pairs: skip (no changes).
For insert/delete-only sentences: optionally included via the include_all parameter. When False (default), only replace operations are shown. When True, inserts and deletes appear as rows with an empty removed or added side.
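The pipeline steps can be sketched roughly as follows. This is a minimal illustration, not the module's actual implementation: to stay dependency-free it swaps in regex-based stand-ins for nltk.sent_tokenize() and nltk.word_tokenize(), and the helper names (normalise, split_sentences, tokenize, sketch_transformation_map) are hypothetical. It also assumes that include_all only affects sentence-level inserts and deletes.

```python
import re
from difflib import SequenceMatcher


def normalise(text):
    """Collapse runs of whitespace and strip the edges."""
    return " ".join(text.split())


def split_sentences(text):
    # Stand-in for nltk.sent_tokenize(): split after ., ! or ? plus whitespace.
    return [s for s in re.split(r"(?<=[.!?])\s+", text) if s]


def tokenize(sentence):
    # Stand-in for nltk.word_tokenize(): words and single punctuation marks.
    return re.findall(r"\w+|[^\w\s]", sentence)


def sketch_transformation_map(baseline, current, include_all=False):
    a = split_sentences(normalise(baseline))
    b = split_sentences(normalise(current))
    rows = []
    sentence_ops = SequenceMatcher(None, a, b, autojunk=False).get_opcodes()
    for tag, i1, i2, j1, j2 in sentence_ops:
        if tag == "equal":
            continue  # unchanged sentences contribute nothing
        if tag == "replace":
            # Diff the tokens of the changed sentence group and keep
            # only the token-level "replace" opcodes.
            ta = tokenize(" ".join(a[i1:i2]))
            tb = tokenize(" ".join(b[j1:j2]))
            token_ops = SequenceMatcher(None, ta, tb, autojunk=False).get_opcodes()
            for t, x1, x2, y1, y2 in token_ops:
                if t == "replace":
                    rows.append({
                        "removed": " ".join(ta[x1:x2]),
                        "added": " ".join(tb[y1:y2]),
                    })
        elif include_all:
            # Sentence-level insert or delete: one side is empty.
            rows.append({
                "removed": " ".join(a[i1:i2]),
                "added": " ".join(b[j1:j2]),
            })
    return rows
```

In this sketch, calling sketch_transformation_map("The cat sat. It was happy.", "The cat sat. It was sad.") yields a single row pairing "happy" with "sad", because the first sentence aligns as equal and only the second is diffed at token level.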
Noise reduction¶
Ignore replacements where both sides are a single stopword.
Merge adjacent replace operations into a single row.
Normalise whitespace before alignment.
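One possible shape for the first two rules is sketched below. Several things here are assumptions rather than documented behaviour: the tiny STOPWORDS set stands in for NLTK's stopwords corpus, the helper names are hypothetical, "adjacent" is interpreted as token spans that touch (or lie within a hypothetical max_gap of each other), and merging is applied before the stopword filter.

```python
# Small illustrative set; the module is described as using NLTK's stopwords.
STOPWORDS = {"a", "an", "the", "of", "to", "in", "on", "is", "was"}


def merge_adjacent(spans, max_gap=0):
    """Merge replace spans that touch or lie within max_gap tokens.

    Each span is (i1, i2, j1, j2): half-open token ranges in the
    baseline (i) and current (j) token lists, in document order.
    """
    merged = []
    for i1, i2, j1, j2 in spans:
        if merged and i1 - merged[-1][1] <= max_gap and j1 - merged[-1][3] <= max_gap:
            prev = merged[-1]
            # Extend the previous span; any gap tokens are absorbed.
            merged[-1] = (prev[0], i2, prev[2], j2)
        else:
            merged.append((i1, i2, j1, j2))
    return merged


def is_stopword_swap(removed_tokens, added_tokens):
    """True when both sides are exactly one stopword."""
    return (
        len(removed_tokens) == 1
        and len(added_tokens) == 1
        and removed_tokens[0].lower() in STOPWORDS
        and added_tokens[0].lower() in STOPWORDS
    )


def reduce_noise(tokens_a, tokens_b, spans, max_gap=0):
    """Turn replace spans into rows, merging first and then filtering."""
    rows = []
    for i1, i2, j1, j2 in merge_adjacent(spans, max_gap):
        removed, added = tokens_a[i1:i2], tokens_b[j1:j2]
        if is_stopword_swap(removed, added):
            continue
        rows.append({"removed": " ".join(removed), "added": " ".join(added)})
    return rows
```

Merging before filtering means a stopword swap that sits next to a larger rewrite is kept as part of that rewrite instead of being dropped on its own; the real module may order these steps differently.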
NLTK data requirements¶
Reuses the same NLTK data packages as signal_isolation.py:
punkt_tab, stopwords. These resources are validated explicitly at
call time rather than being downloaded during module import.
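Call-time validation of this kind might look like the sketch below. The function name ensure_nltk_resources, the REQUIRED_RESOURCES mapping, the injectable find parameter, and the error message are all hypothetical; the only NLTK call assumed is nltk.data.find, which raises LookupError when a resource is not installed.

```python
# Resource name -> path as looked up by nltk.data.find.
REQUIRED_RESOURCES = {
    "punkt_tab": "tokenizers/punkt_tab",
    "stopwords": "corpora/stopwords",
}


def ensure_nltk_resources(find=None):
    """Raise a clear error if a required NLTK resource is missing.

    `find` defaults to nltk.data.find and is imported lazily, so importing
    the enclosing module performs no lookup and no download.
    """
    if find is None:
        import nltk  # lazy import: nothing happens at module import time

        find = nltk.data.find
    missing = []
    for name, path in REQUIRED_RESOURCES.items():
        try:
            find(path)
        except LookupError:
            missing.append(name)
    if missing:
        raise RuntimeError(
            "Missing NLTK data: " + ", ".join(missing)
            + ". Install each one with nltk.download(<name>)."
        )
```

Accepting find as a parameter is purely a testing convenience in this sketch: it lets the check be exercised without NLTK or its data installed.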
- app.transformation_map.compute_transformation_map(baseline_text, current_text, *, include_all=False)¶
Extract clause-level change pairs between two texts.
Returns a list of {"removed": "...", "added": "..."} dicts representing the structural changes found by sentence-aware alignment followed by token-level diffing within changed sentence groups.
- Parameters:
baseline_text – The reference text (A).
current_text – The comparison text (B).
include_all – When True, include insert-only and delete-only operations as rows (with an empty removed or added side). When False (default), only replacement operations are returned.
- Returns:
Each dict has removed (text from A) and added (text from B). Empty list if the texts are identical.
- Return type: