Transformation Map

Clause-level sentence alignment and diffing between baseline and current LLM output texts.

app/transformation_map.py

Clause-Level Alignment Layer (Transformation Map) for the Axis Descriptor Lab.

Why a dedicated module?

The word-level diff (client-side LCS) is too granular — clause rewrites appear as a long sequence of single-word insertions and deletions, obscuring the structural change. The signal isolation layer (signal_isolation.py) is lexically useful but structure-blind (set difference, not positional).

The Transformation Map fills the gap by extracting clause-scale replacement pairs — showing what chunk of text was replaced by what chunk — without semantic interpretation.

Pipeline (sentence-aware alignment)

  1. Normalise — collapse whitespace, strip edges.

  2. Sentence split: nltk.sent_tokenize() on both texts.

  3. Sentence alignment: difflib.SequenceMatcher on sentence lists to pair sentences (equal, replace, insert, delete).

  4. Token-level alignment within matched sentence pairs — for each “replace” sentence pair, run difflib.SequenceMatcher on nltk.word_tokenize() tokens and extract “replace” opcodes.

  5. For “equal” sentence pairs — skip (no changes).

  6. For insert/delete-only sentences — optionally included via the include_all parameter. When False (default), only replace operations are shown. When True, inserts and deletes appear as rows with an empty removed or added side.
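The six steps can be sketched end to end with the standard library alone. This is a minimal illustration, not the module's actual implementation: sketch_transformation_map, naive_sentences and naive_tokens are hypothetical names, and the two regex helpers are crude stand-ins for nltk.sent_tokenize() and nltk.word_tokenize() so that the sketch runs without NLTK data. It also leaves out the noise-reduction rules.

```python
import difflib
import re


def normalise(text):
    """Step 1: collapse runs of whitespace and strip the edges."""
    return " ".join(text.split())


def naive_sentences(text):
    """Stand-in for nltk.sent_tokenize(): split after ., ! or ? plus whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text) if s]


def naive_tokens(sentence):
    """Stand-in for nltk.word_tokenize(): words and single punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", sentence)


def sketch_transformation_map(baseline_text, current_text, *, include_all=False):
    """Hypothetical sketch of the pipeline; returns removed/added rows."""
    a_sents = naive_sentences(normalise(baseline_text))  # step 2
    b_sents = naive_sentences(normalise(current_text))
    rows = []
    # Step 3: align the two sentence lists.
    sent_matcher = difflib.SequenceMatcher(None, a_sents, b_sents, autojunk=False)
    for tag, i1, i2, j1, j2 in sent_matcher.get_opcodes():
        if tag == "equal":
            continue  # Step 5: unchanged sentences produce no rows.
        if tag == "replace":
            # Step 4: token-level alignment inside the changed sentence group,
            # keeping only the "replace" opcodes.
            a_tok = naive_tokens(" ".join(a_sents[i1:i2]))
            b_tok = naive_tokens(" ".join(b_sents[j1:j2]))
            tok_matcher = difflib.SequenceMatcher(None, a_tok, b_tok, autojunk=False)
            for t, x1, x2, y1, y2 in tok_matcher.get_opcodes():
                if t == "replace":
                    rows.append({
                        "removed": " ".join(a_tok[x1:x2]),
                        "added": " ".join(b_tok[y1:y2]),
                    })
        elif include_all:
            # Step 6: insert-only or delete-only sentences, one side empty.
            rows.append({
                "removed": " ".join(a_sents[i1:i2]),
                "added": " ".join(b_sents[j1:j2]),
            })
    return rows
```

Passing autojunk=False is a choice made for this sketch: it switches off difflib's popular-element heuristic, which only matters for long sequences. Whether the real module does the same is not stated here.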

Noise reduction

  • Ignore replacements where both sides are a single stopword.

  • Merge adjacent replace operations into a single row.

  • Normalise whitespace before alignment.
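The first two noise-reduction rules might look roughly like the helpers below. Both are hypothetical: STOPWORDS is a tiny hand-written stand-in for the NLTK stopwords list, and "adjacent" is assumed to mean that one replace span ends exactly where the next begins on both sides, which the description above does not spell out.

```python
# Tiny stand-in for the NLTK English stopword list (assumption for this sketch).
STOPWORDS = {"a", "an", "the", "of", "in", "on", "to", "is"}


def is_stopword_swap(removed_tokens, added_tokens, stopwords=STOPWORDS):
    """True when both sides are exactly one token and both are stopwords."""
    return (
        len(removed_tokens) == 1
        and len(added_tokens) == 1
        and removed_tokens[0].lower() in stopwords
        and added_tokens[0].lower() in stopwords
    )


def merge_adjacent_replaces(opcodes):
    """Merge touching "replace" opcodes into one.

    Opcodes are (tag, i1, i2, j1, j2) tuples in difflib's format.
    """
    merged = []
    for tag, i1, i2, j1, j2 in opcodes:
        if merged and tag == "replace":
            ptag, pi1, pi2, pj1, pj2 = merged[-1]
            # Touching on both the baseline side and the current side.
            if ptag == "replace" and pi2 == i1 and pj2 == j1:
                merged[-1] = ("replace", pi1, i2, pj1, j2)
                continue
        merged.append((tag, i1, i2, j1, j2))
    return merged
```

A row for which is_stopword_swap() returns True would simply be dropped, so a change such as "the" to "a" does not appear in the map.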

NLTK data requirements

Reuses the same NLTK data packages as signal_isolation.py: punkt_tab, stopwords. These resources are validated explicitly at call time rather than being downloaded during module import.
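One way to validate at call time without downloading anything is to look each resource up and fail with a clear message if it is absent. The helper below is a sketch under assumptions: ensure_nltk_resources and its find parameter are hypothetical names, and the resource paths are the conventional locations of these packages inside an NLTK data directory. nltk.data.find() raises LookupError when a resource is not installed.

```python
# Package name -> conventional path inside the NLTK data directory (assumed).
REQUIRED_NLTK_RESOURCES = {
    "punkt_tab": "tokenizers/punkt_tab",
    "stopwords": "corpora/stopwords",
}


def ensure_nltk_resources(find=None):
    """Raise RuntimeError listing any required NLTK package that is missing.

    `find` defaults to nltk.data.find; it is injectable so the check can be
    exercised without NLTK installed.
    """
    if find is None:
        import nltk  # imported lazily: importing this module has no side effects

        find = nltk.data.find
    missing = []
    for package, path in REQUIRED_NLTK_RESOURCES.items():
        try:
            find(path)
        except LookupError:
            missing.append(package)
    if missing:
        raise RuntimeError(
            "Missing NLTK data packages: " + ", ".join(missing)
            + ". Install them with nltk.download() before calling."
        )
```

Calling such a check at the top of the public function keeps module import cheap and avoids network access at import time, which is the behaviour described above.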

app.transformation_map.compute_transformation_map(baseline_text, current_text, *, include_all=False)

Extract clause-level change pairs between two texts.

Returns a list of {"removed": "...", "added": "..."} dicts representing the structural changes found by sentence-aware alignment followed by token-level diffing within changed sentence groups.

Parameters:
  • baseline_text – The reference text (A).

  • current_text – The comparison text (B).

  • include_all – When True, include insert-only and delete-only operations as rows (with an empty removed or added side). When False (default), only replacement operations are returned.

Returns:

Each dict has removed (text from A) and added (text from B). Empty list if the texts are identical.

Return type:

list[dict[str, str]]
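A caller would typically iterate over the returned list. The snippet below shows one way to render rows of the documented shape as a small plus/minus listing. format_rows is a hypothetical helper, and sample_rows is hand-written to match the return type; it is not real output from compute_transformation_map().

```python
def format_rows(rows):
    """Render removed/added rows as a plain-text listing."""
    if not rows:
        return "(no changes)"
    lines = []
    for row in rows:
        # An empty side only occurs for insert-only or delete-only rows.
        removed = row["removed"] or "(nothing)"
        added = row["added"] or "(nothing)"
        lines.append(f"- {removed}\n+ {added}")
    return "\n".join(lines)


# Hand-written rows in the documented shape (illustrative only).
sample_rows = [
    {"removed": "sat on the mat", "added": "slept on the rug"},
    {"removed": "", "added": "It purred."},  # shape of an insert-only row
]
print(format_rows(sample_rows))
```

In real use, sample_rows would be replaced by the result of compute_transformation_map(baseline_text, current_text), with include_all=True if rows with one empty side are wanted.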