Micro-Indicators
Structural Learning Layer: deterministic heuristic classifiers that label transformation-map rows with structural shift indicators (e.g. compression, embodiment shift, intensity ↑).
Uses NLTK for POS tagging and sentence segmentation, and JSON lexicon data from the repo’s checked-in lexicon resources.
app/micro_indicators.py
Micro-Indicators — Structural Pattern Classification for Transformation Map Rows.
Why a dedicated module?
The Transformation Map (transformation_map.py) extracts clause-level
replacement pairs (removed/added) between two texts. These pairs reveal
what changed, but not the structural character of the change.
Micro-indicators fill that gap by labelling each row with one or more deterministic heuristic tags — “compression”, “embodiment shift”, “intensity ↑”, etc. — that surface structural writing patterns without performing semantic interpretation.
The 11 indicators
compression — removed tokens ≥ ratio × added tokens
expansion — added tokens ≥ ratio × removed tokens
embodiment shift — abstract words removed, physical words added
abstraction ↑ — concrete words removed, abstract words added
intensity ↑ — word moves up on a known intensity scale
intensity ↓ — word moves down on a known intensity scale
consolidation — sentence count decreases
fragmentation — sentence count increases
tone reframing — lexical substitution with no other structural shift (fallback)
modality shift — verb/adjective density change (POS tagging)
lexical pivot — rare content word → rare content word (fallback)
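The two intensity indicators compare a word's position on an ordered scale before and after the change. A minimal sketch, assuming each scale is a tuple of words ordered from weakest to strongest; the scales below are invented for illustration (the real ones come from intensity_v0_1.json, whose format is not shown here), and `intensity_direction` is a hypothetical helper name:

```python
from __future__ import annotations

import re

# Invented example scales, ordered weakest -> strongest. The module's real
# scales are loaded from intensity_v0_1.json; their format is assumed here.
INTENSITY_SCALES: tuple[tuple[str, ...], ...] = (
    ("warm", "hot", "scorching"),
    ("like", "love", "adore"),
    ("annoyed", "angry", "furious"),
)


def _words(text: str) -> set[str]:
    """Lowercased word set; punctuation is dropped."""
    return set(re.findall(r"[a-z']+", text.lower()))


def intensity_direction(removed: str, added: str) -> str | None:
    """Return "intensity ↑", "intensity ↓", or None (hypothetical helper)."""
    removed_words, added_words = _words(removed), _words(added)
    for scale in INTENSITY_SCALES:
        # Positions of scale words present on each side of the row.
        before = [i for i, word in enumerate(scale) if word in removed_words]
        after = [i for i, word in enumerate(scale) if word in added_words]
        if before and after:
            delta = max(after) - max(before)
            if delta > 0:
                return "intensity ↑"
            if delta < 0:
                return "intensity ↓"
    return None
```

Scanning the scales in a fixed order and returning on the first clear move keeps the result deterministic for a given input.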
Design principles
Deterministic: same input always produces the same indicators.
Rule-based: no AI inference, no embeddings, no probabilistic reasoning.
Conservative: defaults are tuned to avoid false positives.
Educational: labels introduce structural writing vocabulary.
Transparent: each heuristic is a simple, inspectable rule.
Lexicon data
Three JSON files in app/data/ provide the vocabulary for lexicon-based
indicators:
embodiment_v0_1.json — abstract/physical word lists
abstraction_v0_1.json — concrete/abstract term lists
intensity_v0_1.json — ordered intensity scales
These are loaded once at module import time and converted to frozenset
lookups for O(1) membership testing.
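The load-once, frozenset-lookup pattern can be sketched as follows. This assumes the embodiment file is a JSON object mapping list names to word arrays; the "abstract"/"physical" keys and the inline stand-in data are assumptions, and `load_lexicon` and `is_embodiment_shift` are hypothetical names. The real module reads the files from app/data/ instead of an inline string.

```python
from __future__ import annotations

import json
import re

# Stand-in for app/data/embodiment_v0_1.json. The "abstract"/"physical" keys
# are an assumption; the real file's schema is not documented on this page.
EMBODIMENT_JSON = """
{"abstract": ["fear", "hope", "idea"],
 "physical": ["hand", "throat", "stone"]}
"""


def load_lexicon(raw: str) -> dict[str, frozenset[str]]:
    """Parse a JSON lexicon and freeze each word list for O(1) membership tests."""
    data = json.loads(raw)
    return {key: frozenset(word.lower() for word in words) for key, words in data.items()}


# Built once when the module is imported, not on every call.
EMBODIMENT = load_lexicon(EMBODIMENT_JSON)


def is_embodiment_shift(removed: str, added: str) -> bool:
    """True when an abstract word is removed and a physical word is added."""
    removed_words = set(re.findall(r"[a-z']+", removed.lower()))
    added_words = set(re.findall(r"[a-z']+", added.lower()))
    return bool(removed_words & EMBODIMENT["abstract"]) and bool(
        added_words & EMBODIMENT["physical"]
    )
```

An abstraction ↑ check would follow the same shape with the concrete/abstract lists in place of the abstract/physical ones.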
NLTK data requirements
Reuses the same NLTK data packages as signal_isolation.py (punkt_tab,
stopwords, wordnet). Those resources are validated explicitly at call time
rather than being downloaded during module import.
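Validating at call time, rather than downloading at import time, might look like the sketch below. `nltk.data.find` is NLTK's lookup function and raises `LookupError` when a resource is absent; the package-to-path mapping and the injectable `find` parameter (added so the check can be exercised without NLTK installed) are assumptions, and `missing_nltk_resources` is a hypothetical name.

```python
from __future__ import annotations

from collections.abc import Callable, Mapping

# Package name -> resource path in NLTK's data layout (assumed mapping).
REQUIRED_RESOURCES: Mapping[str, str] = {
    "punkt_tab": "tokenizers/punkt_tab",
    "stopwords": "corpora/stopwords",
    "wordnet": "corpora/wordnet",
}


def missing_nltk_resources(
    resources: Mapping[str, str] = REQUIRED_RESOURCES,
    find: Callable[[str], object] | None = None,
) -> list[str]:
    """List packages whose data cannot be found. Never downloads anything."""
    if find is None:
        import nltk  # deferred: the check runs at call time, not at import

        find = nltk.data.find
    missing: list[str] = []
    for name, path in resources.items():
        try:
            find(path)
        except LookupError:  # nltk.data.find raises LookupError when absent
            missing.append(name)
    return missing
```

A caller could turn a non-empty result into one clear error that names the missing packages and points to python tools/bootstrap_nltk.py.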
Additionally requires averaged_perceptron_tagger_eng for the modality
shift indicator (POS tagging). Environment preparation should bootstrap all
required NLTK data up front via python tools/bootstrap_nltk.py.
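The modality shift indicator compares verb+adjective density on each side of a row. A minimal sketch, assuming the tagger output is a list of (token, Penn Treebank tag) pairs, the shape returned by `nltk.pos_tag`; taking pre-tagged pairs keeps the sketch runnable without NLTK data, and `verb_adj_density` and `is_modality_shift` are hypothetical names:

```python
from __future__ import annotations

TaggedTokens = list[tuple[str, str]]  # (token, Penn Treebank tag) pairs


def verb_adj_density(tagged: TaggedTokens) -> float:
    """Share of tokens tagged as a verb (VB*) or adjective (JJ*)."""
    if not tagged:
        return 0.0
    hits = sum(1 for _, tag in tagged if tag.startswith(("VB", "JJ")))
    return hits / len(tagged)


def is_modality_shift(
    removed: TaggedTokens, added: TaggedTokens, threshold: float = 0.3
) -> bool:
    """Flag when verb+adjective density moves by at least `threshold`."""
    return abs(verb_adj_density(added) - verb_adj_density(removed)) >= threshold
```

For example, "the door" (density 0) against "slam the heavy door" (density 0.5) clears the default 0.3 threshold, while swapping one adjective for another leaves the density unchanged.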
- class app.micro_indicators.IndicatorConfig(compression_ratio=2.0, expansion_ratio=2.0, min_tokens=2, modality_density_threshold=0.3, enabled=None)
Bases: object
Tuning parameters for micro-indicator detection.
All fields have conservative defaults. The frontend can override these per-request via the indicator_config field on the TransformationMapRequest schema.
- Parameters:
compression_ratio (float) – Minimum ratio of len(removed_tokens) / len(added_tokens) to flag “compression”. Default 2.0 means removed must be at least twice as long as added.
expansion_ratio (float) – Minimum ratio of len(added_tokens) / len(removed_tokens) to flag “expansion”. Default 2.0.
min_tokens (int) – Minimum token count on the larger side to consider size-based indicators (compression/expansion). Prevents flagging single-word swaps. Default 2.
modality_density_threshold (float) – Minimum absolute change in verb+adjective density (proportion of tokens that are verbs or adjectives) to flag “modality shift”. Default 0.3 (conservative — requires a 30 percentage-point shift).
enabled (tuple[str, ...] | None) – When not None, only compute indicators whose names appear in this tuple. None means all indicators are active.
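The size-based checks can be read directly off these parameters. A minimal sketch, assuming whitespace tokenization (the module's own tokenizer is not specified here) and the multiplication form removed ≥ ratio × added from the indicator list, which also avoids dividing by zero. The dataclass mirrors the documented fields and defaults; `frozen=True`, `is_enabled`, and `size_indicators` are hypothetical:

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class IndicatorConfig:
    """Mirror of the documented fields and defaults (frozen=True is an assumption)."""

    compression_ratio: float = 2.0
    expansion_ratio: float = 2.0
    min_tokens: int = 2
    modality_density_threshold: float = 0.3
    enabled: tuple[str, ...] | None = None

    def is_enabled(self, name: str) -> bool:  # hypothetical helper
        return self.enabled is None or name in self.enabled


def size_indicators(
    removed: str, added: str, config: IndicatorConfig | None = None
) -> list[str]:
    """Compression/expansion labels for one row (hypothetical helper)."""
    config = config or IndicatorConfig()
    r, a = len(removed.split()), len(added.split())
    if max(r, a) < config.min_tokens:
        return []  # too small on both sides, e.g. a single-word swap
    labels: list[str] = []
    if config.is_enabled("compression") and r >= config.compression_ratio * a:
        labels.append("compression")
    if config.is_enabled("expansion") and a >= config.expansion_ratio * r:
        labels.append("expansion")
    return labels
```

With the defaults, "the long and winding road" → "the road" (5 tokens against 2) is flagged as compression, while "road" → "path" stays below min_tokens and gets no size label.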
- __init__(compression_ratio=2.0, expansion_ratio=2.0, min_tokens=2, modality_density_threshold=0.3, enabled=None)
- app.micro_indicators.classify_row(removed, added, *, config=None)
Compute micro-indicators for a single transformation map row.
Evaluates all applicable indicator heuristics against the removed/added text pair and returns a list of indicator labels. A row can have zero or more indicators (e.g., ["compression", "intensity ↑"]).
Structural indicators are evaluated first; fallback indicators (tone reframing, lexical pivot) only fire when no structural indicator matched.
- Parameters:
removed (str) – The text chunk from the baseline (A) that was replaced.
added (str) – The text chunk from the current text (B) that replaced it.
config (IndicatorConfig | None) – Optional tuning parameters. None uses conservative defaults.
- Returns:
Ordered list of indicator labels that apply to this row. Empty list if no indicators match or if both inputs are empty.
- Return type:
list[str]
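The evaluation order described above (structural checks first, a fallback label only when none of them fired) can be sketched with a small subset of the indicators. This is an illustration under stated assumptions, not the module's code: sentences are split with a naive regex where the module uses NLTK segmentation, thresholds are the documented defaults hard-coded, lexical pivot is omitted because it depends on word-rarity data not described here, and `classify_row_sketch` is a hypothetical name.

```python
from __future__ import annotations

import re


def _sentence_count(text: str) -> int:
    # Naive regex splitter, an assumption; the module itself uses NLTK segmentation.
    return len([s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s])


def _structural_indicators(removed: str, added: str) -> list[str]:
    """A subset of the structural checks, using the documented default thresholds."""
    labels: list[str] = []
    r, a = len(removed.split()), len(added.split())
    if max(r, a) >= 2:  # min_tokens
        if r >= 2.0 * a:  # compression_ratio
            labels.append("compression")
        elif a >= 2.0 * r:  # expansion_ratio
            labels.append("expansion")
    if removed.strip() and added.strip():  # sentence counts need both sides
        before, after = _sentence_count(removed), _sentence_count(added)
        if after < before:
            labels.append("consolidation")
        elif after > before:
            labels.append("fragmentation")
    return labels


def classify_row_sketch(removed: str, added: str) -> list[str]:
    """Structural checks first; a fallback label only if none of them fired."""
    if not removed.strip() and not added.strip():
        return []  # both inputs empty: no indicators
    labels = _structural_indicators(removed, added)
    if not labels and removed.strip() != added.strip():
        labels.append("tone reframing")  # fallback for a plain lexical substitution
    return labels
```

Here "a quiet street" → "a silent street" triggers no structural check and falls through to tone reframing, while merging two sentences into one yields consolidation and suppresses the fallback.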
- app.micro_indicators.classify_rows(rows, *, config=None)
Compute micro-indicators for every row in a transformation map.
Convenience wrapper that calls classify_row() on each row.
- Parameters:
rows (list[dict[str, str]]) – Each dict must have "removed" and "added" keys (the output of compute_transformation_map()).
config (IndicatorConfig | None) – Optional tuning parameters. None uses conservative defaults.
- Returns:
One list of indicator labels per row, in the same order as the input rows.
- Return type:
list[list[str]]
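The wrapper amounts to an order-preserving map over the rows. A minimal sketch, where the injected `classify` callable is a hypothetical stand-in for classify_row() so the example runs on its own, and `classify_rows_sketch` is likewise a hypothetical name:

```python
from __future__ import annotations

from collections.abc import Callable

Row = dict[str, str]  # shape documented for compute_transformation_map() rows


def classify_rows_sketch(
    rows: list[Row], classify: Callable[[str, str], list[str]]
) -> list[list[str]]:
    """One label list per row, in input order; a missing key raises KeyError."""
    return [classify(row["removed"], row["added"]) for row in rows]
```

Because the output is positional, the i-th label list always belongs to the i-th transformation-map row, including rows that receive an empty list.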