TRMInterp
Use this workflow to prevent one-off experiment scripts and keep outputs comparable.
Contract
Enforce exactly three commands:
- trminterp trace
- trminterp sae fit
- trminterp intervene
Enforce exactly two report formats:
- trace.jsonl
- report.json
Allow optional caches/artifacts:
- states.npy
- sae.pt
- sae_metrics.json
- intervention diff JSON files under the same run folder
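The file contract above can be checked mechanically. The following is a minimal sketch, not part of the tool itself: `check_run_dir` is a hypothetical helper name, and because the source does not name the intervention diff files, anything unrecognized is only reported under `other` rather than rejected.

```python
from pathlib import Path

# File names taken from the contract above.
REQUIRED_REPORTS = ("trace.jsonl", "report.json")
OPTIONAL_ARTIFACTS = ("states.npy", "sae.pt", "sae_metrics.json")


def check_run_dir(run_dir):
    """Summarize which contract files exist in a run directory.

    Hypothetical helper: returns the missing required reports, the optional
    artifacts that are present, and every other file (for example
    intervention diff JSON files, whose names the contract does not fix).
    """
    names = {p.name for p in Path(run_dir).iterdir() if p.is_file()}
    known = set(REQUIRED_REPORTS) | set(OPTIONAL_ARTIFACTS)
    return {
        "missing": [n for n in REQUIRED_REPORTS if n not in names],
        "optional_present": [n for n in OPTIONAL_ARTIFACTS if n in names],
        "other": sorted(names - known),
    }
```

A run is complete when `missing` is empty; entries under `other` are worth a manual look but are not necessarily violations.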
Loop
Run in this order:
1. trminterp trace
2. trminterp sae fit
3. trminterp intervene
Use deterministic seeds for every stage. Write all generated outputs under a single run directory. Never mix code changes and generated outputs in the same commit unless explicitly requested.
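The loop can be driven by a small script that fixes the stage order, passes one seed to every stage, and keeps all outputs under one run directory. This is a sketch under stated assumptions: the `--run-dir` and `--seed` flags on `trminterp` are hypothetical (the source does not specify the CLI's flags), as are the names `plan_run` and `run_all`.

```python
import subprocess
from pathlib import Path

# Stage order from the loop above.
STAGES = (
    ("trminterp", "trace"),
    ("trminterp", "sae", "fit"),
    ("trminterp", "intervene"),
)


def plan_run(run_dir, seed):
    """Build one argv per stage, all sharing the same run directory and seed.

    ASSUMPTION: --run-dir and --seed are illustrative flag names only.
    """
    return [
        [*stage, "--run-dir", str(run_dir), "--seed", str(seed)]
        for stage in STAGES
    ]


def run_all(run_dir, seed, runner=subprocess.run):
    """Run the stages in order, stopping at the first failure."""
    Path(run_dir).mkdir(parents=True, exist_ok=True)
    for argv in plan_run(run_dir, seed):
        # check=True makes subprocess.run raise if a stage exits non-zero.
        runner(argv, check=True)
```

Passing `runner` as a parameter lets the plan be inspected or dry-run without executing anything.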
Command Mapping
If the repo has no trminterp binary yet, map to existing modules and keep output names stable:
- trminterp trace: python -m mechinterp_cli trace --out-dir <run_dir>/traces --seed <seed>
  Then normalize/rename the selected trace file to <run_dir>/trace.jsonl. Optionally export stacked states to <run_dir>/states.npy.
- trminterp sae fit: python -m analysis.sae --summary-json <run_dir>/summary.json --out-json <run_dir>/sae_metrics.json --save-model-json <run_dir>/sae.pt
- trminterp intervene: python -m analysis.causality --summary-json <run_dir>/summary.json --seed <seed> --out-json <run_dir>/report.json
Keep the report focused on baseline vs intervened outcomes and top causal features.
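The fallback mapping above can be written down as data so the output names stay stable. In this sketch the module names and flags are copied from the mapping; the function names `fallback_commands` and `promote_trace` are hypothetical, and how the trace file gets selected is left to the caller because the source does not say.

```python
import shutil
from pathlib import Path


def fallback_commands(run_dir, seed):
    """Return the argv for each trminterp command when no binary exists."""
    run = Path(run_dir)
    summary = (run / "summary.json").as_posix()
    return {
        "trminterp trace": [
            "python", "-m", "mechinterp_cli", "trace",
            "--out-dir", (run / "traces").as_posix(),
            "--seed", str(seed),
        ],
        "trminterp sae fit": [
            "python", "-m", "analysis.sae",
            "--summary-json", summary,
            "--out-json", (run / "sae_metrics.json").as_posix(),
            "--save-model-json", (run / "sae.pt").as_posix(),
        ],
        "trminterp intervene": [
            "python", "-m", "analysis.causality",
            "--summary-json", summary,
            "--seed", str(seed),
            "--out-json", (run / "report.json").as_posix(),
        ],
    }


def promote_trace(selected, run_dir):
    """Copy the selected trace file to the stable name <run_dir>/trace.jsonl.

    ASSUMPTION: copying (rather than moving) is this sketch's choice; it
    leaves the original under <run_dir>/traces untouched.
    """
    target = Path(run_dir) / "trace.jsonl"
    shutil.copyfile(selected, target)
    return target
```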
Intervention Policy
Implement only these policies in v0:
- ablate_features
- clamp_features
- patch_features
Avoid GUI work, feature browsers, seed-matching research tooling, and model-zoo abstractions in this skill version.
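As an illustration of what the three policies might do to a vector of SAE feature activations, here is a sketch in plain Python. The semantics are assumptions based on common usage (ablate sets features to zero, clamp pins them to a fixed value, patch copies them from a donor run); the exact behavior is whatever references/v0-spec.md defines, and the argument names are illustrative.

```python
def ablate_features(acts, feature_ids):
    """ASSUMED semantics: set the selected feature activations to zero."""
    out = list(acts)
    for i in feature_ids:
        out[i] = 0.0
    return out


def clamp_features(acts, feature_ids, value):
    """ASSUMED semantics: pin the selected features to a fixed value."""
    out = list(acts)
    for i in feature_ids:
        out[i] = value
    return out


def patch_features(acts, feature_ids, donor_acts):
    """ASSUMED semantics: copy the selected features from a donor run."""
    out = list(acts)
    for i in feature_ids:
        out[i] = donor_acts[i]
    return out


# Registry keyed by the policy names listed above.
POLICIES = {
    "ablate_features": ablate_features,
    "clamp_features": clamp_features,
    "patch_features": patch_features,
}
```

Each function returns a new list and leaves its input unchanged, which keeps the baseline activations available for comparison.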
Internal API
Keep internal abstractions minimal:
- TraceDataset.load(path) -> states[N,D], meta
- SAE.encode(h) -> a, SAE.decode(a) -> h_hat
- InterventionPolicy.apply(step, h, sae, context) -> h_prime
- Evaluator.compare(baseline_trace, new_trace) -> report
Read references/v0-spec.md for the exact payload expectations.
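To show how the SAE and InterventionPolicy signatures listed above fit together, the sketch below expresses them as `typing.Protocol` classes and wires a toy policy through a stand-in SAE. `IdentitySAE`, `ClampFeatures`, and the `Vector` alias are hypothetical names for illustration; plain lists stand in for the state arrays, and whether a real policy adds the SAE reconstruction error back is not addressed here.

```python
from typing import Any, Dict, List, Protocol

Vector = List[float]


class SAE(Protocol):
    def encode(self, h: Vector) -> Vector: ...
    def decode(self, a: Vector) -> Vector: ...


class InterventionPolicy(Protocol):
    def apply(self, step: int, h: Vector, sae: SAE,
              context: Dict[str, Any]) -> Vector: ...


class IdentitySAE:
    """Toy stand-in for a fitted SAE: features are the hidden state itself."""

    def encode(self, h):
        return list(h)

    def decode(self, a):
        return list(a)


class ClampFeatures:
    """Hypothetical policy: encode, pin chosen features, decode."""

    def __init__(self, feature_ids, value):
        self.feature_ids = tuple(feature_ids)
        self.value = value

    def apply(self, step, h, sae, context):
        a = sae.encode(h)
        for i in self.feature_ids:
            a[i] = self.value
        return sae.decode(a)  # h_prime


def intervene(states, policy, sae, context=None):
    """Apply a policy to every step of a [N, D] list of hidden states."""
    context = {} if context is None else context
    return [policy.apply(step, h, sae, context)
            for step, h in enumerate(states)]
```

For example, `intervene([[1.0, 2.0], [3.0, 4.0]], ClampFeatures([0], 0.5), IdentitySAE())` pins the first feature of each step to 0.5 and leaves the rest unchanged.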