Trace Forge

The Problem

Every LLM operation produces three things:

•HOLD₁ — The input (context, files, instructions)
•AGENT — The cognition (decisions, attention, confidence, surprise)
•HOLD₂ — The output (findings, documents, code, answers)

The industry default: ship HOLD₂, discard AGENT. The most valuable part — how the system thought — evaporates every session.

This skill makes the AGENT layer a first-class, persistent output.

The Three-Output Protocol

Every operation that uses this skill produces three files:

code

output/
├── WORK.md        ← what was produced (HOLD₂)
├── TRACE.md       ← how it thought (the AGENT, captured)
└── FILTER.md      ← what was deleted/kept/emerged (compression)

WORK.md

The primary deliverable. Whatever the plugin/skill/pipeline was asked to produce. No change from normal operation.

TRACE.md

The cognitive trace. Captured DURING operation, not after. Format:

markdown

# Trace: [operation name]
# Session: [timestamp]
# Operator: [agent identifier]

## Decisions

### Decision 1: [what was decided]
- **Options available:** [what could have been chosen]
- **Chosen:** [what was selected]
- **Sacrificed:** [what was NOT selected and why]
- **Confidence:** [high/medium/low]
- **Signal source:** [what triggered this decision]

### Decision 2: ...

## Attention Log
- **Read:** [files/sources actually consumed, in order]
- **Skipped:** [files/sources considered but not read, with reason]
- **Surprised by:** [things found that weren't expected]
- **Missed (known):** [things I know I didn't look at]

## Confidence Map
- **High confidence:** [claims/patterns I'm sure about]
- **Medium confidence:** [claims that seem right but could be wrong]
- **Low confidence:** [guesses, inferences, things I'd want to verify]

## Surplus Value
[Insights that emerged from the processing that weren't present in any single input. The thing the Furnace produces that neither HOLD contains.]

FILTER.md

The compression artifact. Apply the scout filter to the trace itself:

markdown

# Filter: [operation name]

## Deleted (Drowning)
[What was noise, circular, redundant, or irrelevant. What the next cycle should skip.]

## Kept (Swimming)
[What was signal. What the next cycle should prioritize.]

## Emerged (Surplus Value)
[What appeared that wasn't in any input. The actual intelligence of the operation.]

How to Emit a Trace

During Operation (Not After)

The trace is emitted AS the agent works, not reconstructed afterward. This means:

•Before each decision: Name the options and the choice
•After each file read: Note what was found vs expected
•At each pivot point: Record why direction changed
•When uncertain: Say so, with what would resolve it
•When surprised: Capture the surprise immediately — it's highest-signal

The Compression Rule

Not every micro-decision needs recording. Apply the filter:

•Delete mechanical decisions (which tool to use, formatting choices)
•Keep strategic decisions (which file to read, which pattern to weight, which interpretation to choose)
•Flag emergent insights (connections that appeared through processing)

Integration with Existing Operations

This skill layers ON TOP of other skills. When trace-forge is active:

•Any skill invocation produces its normal output PLUS the three trace files
•The trace files go in a trace/ subdirectory alongside the output
•Each trace is timestamped and named: TRACE_[operation]_[timestamp].md

The Feedback Loop

The trace files from previous runs become HOLD₁ for the next run.

code

Cycle 1: Read corpus → Produce findings + TRACE₁ + FILTER₁
Cycle 2: Read corpus + TRACE₁ + FILTER₁ → Produce better findings + TRACE₂ + FILTER₂
Cycle 3: Read corpus + TRACE₂ + FILTER₂ → Produce even better findings + TRACE₃ + FILTER₃

Each cycle:

•TRACE tells the next agent HOW the last one thought → improves decision-making
•FILTER tells the next agent WHAT was noise → reduces wasted attention
•WORK tells the next agent WHAT was found → prevents re-discovery

Reading Previous Traces

When a previous TRACE.md exists for the same operation:

•Read the Attention Log — know what was already read
•Read the Confidence Map — know where certainty is low
•Read the Surplus Value — know what emerged last time
•Read the FILTER — know what to skip and what to prioritize

Grammar Integration

Following THE_GRAMMAR OF IDENTITY:

Output	Domain	Grammar
WORK.md	NOT-ME (infrastructure output)	`_` lowercase
TRACE.md	US (shared cognition)	`-` Normal Caps
FILTER.md	ME (architect's compression)	`:` directives

The trace is US-domain because it exists at the interface between ME's intent and NOT-ME's execution. It belongs to neither alone.

For the Plugin-Maker

When trace-forge is used to BUILD other plugins:

•The plugin-maker creates a new skill/plugin
•That skill/plugin inherits the three-output protocol
•Every child plugin emits WORK + TRACE + FILTER
•The plugin-maker reads child traces to improve its own generation
•The loop compounds: better traces → better plugins → better traces

The plugin-maker's TRACE.md is the most important artifact in the system. It captures how plugin-generation decisions are made — which is the intelligence that makes each generation better than the last.

Pattern Recognition

If this skill gets invoked with similar trace patterns 3+ times:

•Notice the repeating decision structure
•Ask: "Your traces show you keep deciding [X] the same way. Want me to crystallize that into a standard?"
•Create the standard — a decision that only needs to be made once

This is how traces become standards. How the AGENT layer crystallizes into HOLD.

The Test

Before shipping: Does your output directory contain TRACE.md?

If no: the most valuable part of this operation just got deleted. If yes: the next cycle will be better than this one.

Ground Truth

The agent layer already exists. Every LLM already generates reasoning. Every human already makes decisions while working. The crime is the default: delete the thinking, ship only the result.

This skill changes the default. Nothing else.

— From THE_FRAMEWORK