TRMInterp

Use this workflow to prevent one-off experiment scripts and keep outputs comparable.

Contract

Enforce exactly three commands:

•trminterp trace
•trminterp sae fit
•trminterp intervene

Enforce exactly two report formats:

•trace.jsonl
•report.json

Allow optional caches/artifacts:

•states.npy
•sae.pt
•sae_metrics.json
•intervention diff JSON files under the same run directory

Loop

Run in this order:

•trminterp trace
•trminterp sae fit
•trminterp intervene

Use deterministic seeds for every stage. Write all generated outputs under a single run directory. Never mix code changes and generated outputs in the same commit unless explicitly requested.
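
The seeding and single-directory rules above can be sketched as follows. This is a minimal illustration, not part of the repo: the helper names stage_rng and run_paths are hypothetical, and only the file names come from the Contract section.

```python
import random
from pathlib import Path

# File names taken from the Contract section; everything else is illustrative.
RUN_FILES = ["trace.jsonl", "report.json", "states.npy", "sae.pt", "sae_metrics.json"]


def stage_rng(seed: int, stage: str) -> random.Random:
    """Return an RNG whose stream depends only on (seed, stage).

    A string seed is hashed deterministically by random.Random, so the
    stream is identical across processes, unlike the built-in hash().
    """
    return random.Random(f"{seed}:{stage}")


def run_paths(run_dir: str) -> dict[str, Path]:
    """Create the run directory and map each output name to its path."""
    root = Path(run_dir)
    root.mkdir(parents=True, exist_ok=True)
    return {name: root / name for name in RUN_FILES}
```

Giving each stage its own RNG keeps a rerun of one stage reproducible even when the other stages are skipped.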

Command Mapping

If the repo has no trminterp binary yet, map to existing modules and keep output names stable:

•trminterp trace: python -m mechinterp_cli trace --out-dir <run_dir>/traces --seed <seed>, then normalize/rename the selected trace file to <run_dir>/trace.jsonl. Optionally export stacked states to <run_dir>/states.npy.
•trminterp sae fit: python -m analysis.sae --summary-json <run_dir>/summary.json --out-json <run_dir>/sae_metrics.json --save-model-json <run_dir>/sae.pt
•trminterp intervene: python -m analysis.causality --summary-json <run_dir>/summary.json --seed <seed> --out-json <run_dir>/report.json
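
One way to keep this mapping in a single place is a small wrapper that builds the argument lists. The sketch below is hypothetical: stage_commands and normalize_trace are illustrative names, and the flags are copied from the mapping above without checking them against the modules. How the trace file is selected is left to the caller.

```python
import shutil
import sys
from pathlib import Path


def stage_commands(run_dir: str, seed: int) -> dict[str, list[str]]:
    """Build argv lists for the three stages, in the order they run."""
    run = Path(run_dir)
    py = sys.executable  # use the same interpreter that runs this wrapper
    return {
        "trace": [py, "-m", "mechinterp_cli", "trace",
                  "--out-dir", str(run / "traces"), "--seed", str(seed)],
        "sae fit": [py, "-m", "analysis.sae",
                    "--summary-json", str(run / "summary.json"),
                    "--out-json", str(run / "sae_metrics.json"),
                    "--save-model-json", str(run / "sae.pt")],
        "intervene": [py, "-m", "analysis.causality",
                      "--summary-json", str(run / "summary.json"),
                      "--seed", str(seed),
                      "--out-json", str(run / "report.json")],
    }


def normalize_trace(selected: str, run_dir: str) -> Path:
    """Copy the selected trace file to the stable name <run_dir>/trace.jsonl."""
    dest = Path(run_dir) / "trace.jsonl"
    shutil.copyfile(selected, dest)
    return dest
```

Each list can be passed to subprocess.run(cmd, check=True), so a failing stage stops the loop instead of leaving partial outputs for the next one.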

Keep the report focused on baseline vs intervened outcomes and top causal features.

Intervention Policy

Implement only these policies in v0:

•ablate_features
•clamp_features
•patch_features
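
As a rough illustration of the three policies, the sketch below operates on a plain list of SAE feature activations. The semantics are an assumed reading of the names (zero the features, pin them to a fixed value, copy them from a donor run); the exact behavior is whatever references/v0-spec.md specifies.

```python
from typing import Iterable, Sequence


def ablate_features(a: Sequence[float], idx: Iterable[int]) -> list[float]:
    """Assumed meaning: set the selected feature activations to zero."""
    out = list(a)  # copy so the baseline activations stay untouched
    for i in idx:
        out[i] = 0.0
    return out


def clamp_features(a: Sequence[float], idx: Iterable[int], value: float) -> list[float]:
    """Assumed meaning: pin the selected features to a fixed value."""
    out = list(a)
    for i in idx:
        out[i] = value
    return out


def patch_features(a: Sequence[float], idx: Iterable[int],
                   donor: Sequence[float]) -> list[float]:
    """Assumed meaning: copy the selected features from a donor activation vector."""
    out = list(a)
    for i in idx:
        out[i] = donor[i]
    return out
```

Each function returns a new list, so the baseline activations remain available for the baseline vs intervened comparison.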

Avoid GUI work, feature browsers, seed-matching research tooling, and model-zoo abstractions in this skill version.

Internal API

Keep internal abstractions minimal:

•TraceDataset.load(path) -> states[N,D], meta
•SAE.encode(h) -> a, SAE.decode(a) -> h_hat
•InterventionPolicy.apply(step, h, sae, context) -> h_prime
•Evaluator.compare(baseline_trace, new_trace) -> report
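
The SAE and InterventionPolicy signatures above could be written down as typing.Protocol classes. The sketch below is an assumption about shape only: IdentitySAE and AblateFeatures are toy stand-ins added for illustration, vectors are plain lists rather than arrays, and TraceDataset and Evaluator are omitted because their payloads are defined in the spec.

```python
from typing import Any, Protocol, Sequence

Vector = Sequence[float]


class SAE(Protocol):
    def encode(self, h: Vector) -> list[float]: ...
    def decode(self, a: Vector) -> list[float]: ...


class InterventionPolicy(Protocol):
    def apply(self, step: int, h: Vector, sae: SAE,
              context: dict[str, Any]) -> list[float]: ...


class IdentitySAE:
    """Toy stand-in: treats each hidden unit as its own feature."""

    def encode(self, h: Vector) -> list[float]:
        return list(h)

    def decode(self, a: Vector) -> list[float]:
        return list(a)


class AblateFeatures:
    """Toy policy: encode, zero the chosen features, decode back to h_prime."""

    def __init__(self, features: Sequence[int]) -> None:
        self.features = list(features)

    def apply(self, step: int, h: Vector, sae: SAE,
              context: dict[str, Any]) -> list[float]:
        a = sae.encode(h)
        for i in self.features:
            a[i] = 0.0
        return sae.decode(a)


policy: InterventionPolicy = AblateFeatures([1])
h_prime = policy.apply(0, [1.0, 2.0, 3.0], IdentitySAE(), {})
```

Because Protocol uses structural typing, AblateFeatures satisfies InterventionPolicy without inheriting from it, which keeps the abstraction layer thin.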

Read references/v0-spec.md for the exact payload expectations.