Micro-Indicators

Structural Learning Layer: deterministic heuristic classifiers that label transformation-map rows with structural shift indicators (e.g. compression, embodiment shift, intensity ↑).

Uses NLTK for POS tagging and sentence segmentation, and JSON lexicon files checked into the repo.

app/micro_indicators.py

Micro-Indicators — Structural Pattern Classification for Transformation Map Rows.

Why a dedicated module?

The Transformation Map (transformation_map.py) extracts clause-level replacement pairs (removed/added) between two texts. These pairs reveal what changed, but not the structural character of the change.

Micro-indicators fill that gap by labelling each row with zero or more deterministic heuristic tags, such as “compression”, “embodiment shift”, and “intensity ↑”, that surface structural writing patterns without performing semantic interpretation.

The 11 indicators

  1. compression — removed tokens ≥ ratio × added tokens

  2. expansion — added tokens ≥ ratio × removed tokens

  3. embodiment shift — abstract words removed, physical words added

  4. abstraction ↑ — concrete words removed, abstract words added

  5. intensity ↑ — word moves up on a known intensity scale

  6. intensity ↓ — word moves down on a known intensity scale

  7. consolidation — sentence count decreases

  8. fragmentation — sentence count increases

  9. tone reframing — lexical substitution with no other structural shift (fallback)

  10. modality shift — verb/adjective density change (POS tagging)

  11. lexical pivot — rare content word → rare content word (fallback)
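
To illustrate how a scale-based rule such as intensity ↑ / intensity ↓ can stay deterministic, here is a minimal sketch. The scale contents and the helper name intensity_direction are hypothetical and only cover a one-word swap; the module's actual comparison logic may differ.

```python
from typing import Optional, Sequence

# Hypothetical scale, ordered weakest to strongest (not the shipped lexicon).
EXAMPLE_SCALE: Sequence[str] = ("warm", "hot", "scorching")


def intensity_direction(removed: str, added: str,
                        scale: Sequence[str] = EXAMPLE_SCALE) -> Optional[str]:
    """Return "intensity ↑", "intensity ↓", or None for a one-word swap."""
    rank = {word: i for i, word in enumerate(scale)}
    old, new = removed.lower(), added.lower()
    if old not in rank or new not in rank or old == new:
        return None  # Both words must sit on the same known scale.
    return "intensity ↑" if rank[new] > rank[old] else "intensity ↓"
```

Because the result depends only on fixed positions in an ordered list, the same pair of words always yields the same label.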

Design principles

  • Deterministic: same input always produces the same indicators.

  • Rule-based: no AI inference, no embeddings, no probabilistic reasoning.

  • Conservative: defaults are tuned to avoid false positives.

  • Educational: labels introduce structural writing vocabulary.

  • Transparent: each heuristic is a simple, inspectable rule.

Lexicon data

Three JSON files in app/data/ provide the vocabulary for lexicon-based indicators:

  • embodiment_v0_1.json — abstract/physical word lists

  • abstraction_v0_1.json — concrete/abstract term lists

  • intensity_v0_1.json — ordered intensity scales

These are loaded once at module import time and converted to frozenset lookups for O(1) membership testing.
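
A minimal sketch of the load-once pattern described above. The JSON layout (an object mapping category names to word lists) and the helper names parse_word_sets and load_word_sets are assumptions for illustration; the schema of the shipped lexicon files is not documented here.

```python
import json
from pathlib import Path


def parse_word_sets(text: str) -> dict[str, frozenset[str]]:
    """Turn {"category": ["word", ...]} JSON text into lowercase frozensets."""
    raw = json.loads(text)
    return {name: frozenset(word.lower() for word in words)
            for name, words in raw.items()}


def load_word_sets(path: Path) -> dict[str, frozenset[str]]:
    """Read one lexicon file; intended to be called once, at import time."""
    return parse_word_sets(path.read_text(encoding="utf-8"))
```

A module could bind the result to a module-level constant so that every later lookup is a plain `word in word_set` membership test.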

NLTK data requirements

Reuses the same NLTK data packages as signal_isolation.py (punkt_tab, stopwords, wordnet). Those resources are validated explicitly at call time rather than being downloaded during module import.
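
One way to validate resources at call time without downloading anything is to probe for them with nltk.data.find, which raises LookupError when a resource is absent. The sketch below is an assumption about how such a check could look, not the module's code; the resource paths follow NLTK's usual directory layout, and missing_resources is a hypothetical helper name.

```python
from typing import Callable, Iterable, Optional

# Assumed NLTK resource paths for the packages named above.
REQUIRED_RESOURCES = (
    "tokenizers/punkt_tab",
    "corpora/stopwords",
    "corpora/wordnet",
)


def missing_resources(paths: Iterable[str] = REQUIRED_RESOURCES,
                      find: Optional[Callable[[str], object]] = None) -> list[str]:
    """Return the resource paths that cannot be located; never downloads."""
    if find is None:
        import nltk  # Imported lazily so module import has no side effects.
        find = nltk.data.find
    missing = []
    for path in paths:
        try:
            find(path)
        except LookupError:
            missing.append(path)
    return missing
```

A caller can raise a descriptive error when the returned list is non-empty, pointing the user at the bootstrap script instead of triggering a network download.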

Additionally requires averaged_perceptron_tagger_eng for the modality shift indicator (POS tagging). Environment preparation should bootstrap all required NLTK data up front via python tools/bootstrap_nltk.py.
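
The density comparison behind modality shift can be sketched on already-tagged tokens. This example assumes Penn Treebank tags (the tag set nltk.pos_tag produces for English), where verb tags start with VB and adjective tags with JJ. The function names are hypothetical, and the real module may count tokens differently.

```python
def verb_adj_density(tagged: list[tuple[str, str]]) -> float:
    """Share of tokens tagged as a verb (VB*) or adjective (JJ*)."""
    if not tagged:
        return 0.0
    hits = sum(1 for _, tag in tagged if tag.startswith(("VB", "JJ")))
    return hits / len(tagged)


def is_modality_shift(removed_tagged: list[tuple[str, str]],
                      added_tagged: list[tuple[str, str]],
                      threshold: float = 0.3) -> bool:
    """True when the density changes by at least `threshold` in either direction."""
    delta = abs(verb_adj_density(added_tagged) - verb_adj_density(removed_tagged))
    return delta >= threshold
```

In practice the (word, tag) pairs would come from a POS tagger; passing them in explicitly keeps the sketch runnable without any NLTK data installed.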

class app.micro_indicators.IndicatorConfig(compression_ratio=2.0, expansion_ratio=2.0, min_tokens=2, modality_density_threshold=0.3, enabled=None)

Bases: object

Tuning parameters for micro-indicator detection.

All fields have conservative defaults. The frontend can override these per-request via the indicator_config field on the TransformationMapRequest schema.

Parameters:
  • compression_ratio (float) – Minimum ratio of len(removed_tokens) / len(added_tokens) to flag “compression”. Default 2.0 means the removed side must contain at least twice as many tokens as the added side.

  • expansion_ratio (float) – Minimum ratio of len(added_tokens) / len(removed_tokens) to flag “expansion”. Default 2.0.

  • min_tokens (int) – Minimum token count on the larger side to consider size-based indicators (compression/expansion). Prevents flagging single-word swaps. Default 2.

  • modality_density_threshold (float) – Minimum absolute change in verb+adjective density (proportion of tokens that are verbs or adjectives) to flag “modality shift”. Default 0.3 (conservative — requires a 30 percentage-point shift).

  • enabled (tuple[str, ...] | None) – When not None, only compute indicators whose names appear in this tuple. None means all indicators are active.
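
The following sketch shows how the ratio, min_tokens, and enabled parameters could interact for the size-based indicators. SketchConfig is a stand-in that mirrors the documented field names, not the real IndicatorConfig, and size_indicator is a hypothetical helper. How the real module treats an empty side is not stated here, so that behaviour is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SketchConfig:
    """Stand-in mirroring the documented fields; not the real IndicatorConfig."""
    compression_ratio: float = 2.0
    expansion_ratio: float = 2.0
    min_tokens: int = 2
    enabled: Optional[tuple[str, ...]] = None


def size_indicator(removed_tokens: list[str], added_tokens: list[str],
                   config: SketchConfig = SketchConfig()) -> Optional[str]:
    """Return "compression", "expansion", or None for one row."""
    n_removed, n_added = len(removed_tokens), len(added_tokens)
    if max(n_removed, n_added) < config.min_tokens:
        return None  # Larger side too small, e.g. a single-word swap.

    def active(name: str) -> bool:
        return config.enabled is None or name in config.enabled

    # Multiplying instead of dividing avoids a zero-division on an empty side.
    if active("compression") and n_removed >= config.compression_ratio * n_added:
        return "compression"
    if active("expansion") and n_added >= config.expansion_ratio * n_removed:
        return "expansion"
    return None
```

With the defaults, four removed tokens against two added tokens is flagged as compression, while three against two is not.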

compression_ratio: float = 2.0
expansion_ratio: float = 2.0
min_tokens: int = 2
modality_density_threshold: float = 0.3
enabled: tuple[str, ...] | None = None
app.micro_indicators.classify_row(removed, added, *, config=None)

Compute micro-indicators for a single transformation map row.

Evaluates all applicable indicator heuristics against the removed/added text pair and returns a list of indicator labels. A row can have zero or more indicators (e.g., ["compression", "intensity ↑"]).

Structural indicators are evaluated first; fallback indicators (tone reframing, lexical pivot) only fire when no structural indicator matched.
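
The structural-first, fallback-second ordering can be sketched as a small dispatcher. The Detector type and run_detectors function are hypothetical names used for illustration, and each detector is assumed to return a label or None.

```python
from typing import Callable, Optional

# A detector inspects one removed/added pair and returns a label or None.
Detector = Callable[[str, str], Optional[str]]


def run_detectors(removed: str, added: str,
                  structural: list[Detector],
                  fallbacks: list[Detector]) -> list[str]:
    """Collect structural labels; consult fallbacks only if none matched."""
    if not removed and not added:
        return []  # Nothing to classify when both sides are empty.

    def collect(detectors: list[Detector]) -> list[str]:
        labels = []
        for detect in detectors:
            label = detect(removed, added)
            if label is not None:
                labels.append(label)
        return labels

    return collect(structural) or collect(fallbacks)
```

Running the detectors in a fixed list order keeps the output order stable across calls.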

Parameters:
  • removed (str) – The text chunk from the baseline (A) that was replaced.

  • added (str) – The text chunk from the current text (B) that replaced it.

  • config (IndicatorConfig | None) – Optional tuning parameters. None uses conservative defaults.

Returns:

Ordered list of indicator labels that apply to this row. Empty list if no indicators match or if both inputs are empty.

Return type:

list[str]

app.micro_indicators.classify_rows(rows, *, config=None)

Compute micro-indicators for every row in a transformation map.

Convenience wrapper that calls classify_row() on each row.

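Parameters:
  • rows – The transformation map rows to classify, each carrying a removed/added text pair.

  • config (IndicatorConfig | None) – Optional tuning parameters, applied to every row. None uses conservative defaults.

Returns: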

One list of indicator labels per row, in the same order as the input rows.

Return type:

list[list[str]]
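
The order-preserving wrapper behaviour can be sketched as a single comprehension. This example assumes each row is a (removed, added) tuple and takes the per-row classifier as an argument; both are illustrative choices, since the actual row type is not documented here, and classify_all is a hypothetical name.

```python
from typing import Callable, Iterable


def classify_all(rows: Iterable[tuple[str, str]],
                 classify: Callable[[str, str], list[str]]) -> list[list[str]]:
    """Apply `classify` to each row, keeping results in input order."""
    return [classify(removed, added) for removed, added in rows]
```

The result at index i always corresponds to input row i, so labels can be zipped back onto the rows.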