Skill: W&B Weave
When to Use This Skill
Use this skill when:
- •Adding observability to agent functions
- •Logging metrics for the dashboard
- •Creating traces for debugging
- •Evaluating agent performance
- •Implementing TraceTriage self-improvement
Do NOT use this skill when:
- •Working on core agent logic without logging
- •Setting up infrastructure (Redis, Vercel, etc.)
Overview
Weights & Biases Weave provides observability for LLM applications. For PatchPilot, it:
- •Creates trace trees showing agent interactions
- •Logs inputs/outputs of every function
- •Tracks metrics such as pass rate and time-to-fix
- •Enables evaluation and comparison of runs
Key features:
- •@weave.op() decorator for automatic tracing
- •Structured logging with weave.log()
- •Evaluation framework for A/B testing
- •Dashboard for visualization
Key Concepts
Traces
A trace is a tree of function calls. In PatchPilot:
code
Run: patchpilot-run-123
├── Orchestrator.run()
│   ├── TesterAgent.runTest()
│   │   └── [inputs, outputs, duration]
│   ├── TriageAgent.diagnose()
│   │   └── [inputs, outputs, duration]
│   ├── FixerAgent.generatePatch()
│   │   └── [inputs, outputs, duration]
│   └── VerifierAgent.verify()
│       └── [inputs, outputs, duration]
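To make the tree structure concrete, here is a minimal sketch of how nested calls could be recorded as parent and child nodes. It is not Weave's implementation or API: `TraceNode`, `traced`, and the stack-based bookkeeping are hypothetical names used only for illustration, and the sketch assumes calls are awaited one after another rather than concurrently.

```typescript
// Hypothetical illustration only: not Weave's API or internals.
interface TraceNode {
  name: string;
  inputs: unknown[];
  output?: unknown;
  durationMs?: number;
  children: TraceNode[];
}

const roots: TraceNode[] = [];
const stack: TraceNode[] = []; // calls currently in flight, innermost last

// Wraps an async function so each call records a node under its caller.
function traced<A extends unknown[], R>(
  name: string,
  fn: (...args: A) => Promise<R>
): (...args: A) => Promise<R> {
  return async (...args: A): Promise<R> => {
    const node: TraceNode = { name, inputs: args, children: [] };
    const parent = stack[stack.length - 1];
    (parent ? parent.children : roots).push(node);
    stack.push(node);
    const start = Date.now();
    try {
      const output = await fn(...args);
      node.output = output;
      return output;
    } finally {
      node.durationMs = Date.now() - start;
      stack.pop();
    }
  };
}

// Toy agents standing in for the real ones (sequential awaits only).
const runTest = traced("TesterAgent.runTest", async (spec: string) => `ran ${spec}`);
const diagnose = traced("TriageAgent.diagnose", async (log: string) =>
  log.startsWith("ran") ? "UI_BUG" : "UNKNOWN"
);
const run = traced("Orchestrator.run", async (spec: string) => {
  const log = await runTest(spec);
  return diagnose(log);
});
```

Calling `run("login.spec")` leaves a single root node named `Orchestrator.run` in `roots`, with the tester and triage calls as its children, mirroring the shape of the tree above.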
Operations
An operation is a traced function. Use @weave.op():
typescript
class TesterAgent {
@weave.op()
async runTest(spec: TestSpec): Promise<TestResult> {
// Automatically logged
}
}
Metrics
Metrics are key-value pairs logged per run:
typescript
weave.log({
test_pass_rate: 0.87,
time_to_fix_seconds: 192,
iterations: 2
});
Common Patterns
Initialize Weave
typescript
import weave from 'weave';
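// weave.init takes the project name as a string; "entity/project" targets a specific entity
const entity = process.env.WANDB_ENTITY;
await weave.init(entity ? `${entity}/patchpilot` : 'patchpilot');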
Trace Agent Methods
typescript
import weave from 'weave';
class TesterAgent {
@weave.op()
async runTest(spec: TestSpec): Promise<TestResult> {
// All inputs and outputs automatically logged
const result = await this.executeTest(spec);
return result;
}
@weave.op()
async captureFailure(error: Error): Promise<FailureReport> {
// Nested calls create trace tree
const screenshot = await this.getScreenshot();
return { error, screenshot };
}
}
Log Metrics
typescript
async function logRunMetrics(result: RunResult): Promise<void> {
weave.log({
// Test metrics
tests_total: result.totalTests,
tests_passed: result.passedTests,
// Guard against division by zero when no tests ran
pass_rate: result.totalTests > 0 ? result.passedTests / result.totalTests : 0,
// Fix metrics
bugs_found: result.bugsFound,
bugs_fixed: result.bugsFixed,
// Guard against division by zero when no bugs were found
fix_success_rate: result.bugsFound > 0 ? result.bugsFixed / result.bugsFound : 0,
// Performance metrics
total_duration_seconds: result.duration / 1000,
avg_fix_time_seconds: result.avgFixTime / 1000,
total_iterations: result.totalIterations,
// Cost metrics
llm_tokens_used: result.tokensUsed,
redis_queries: result.redisQueries
});
}
Create Evaluation
typescript
import weave from 'weave';
// Define evaluation dataset
const evalDataset = [
{ input: { errorMessage: 'Missing onClick' }, expected: 'UI_BUG' },
{ input: { errorMessage: 'API 404' }, expected: 'BACKEND_ERROR' }
];
// Run evaluation
const evaluation = await weave.evaluate({
model: triageAgent,
dataset: evalDataset,
scorers: [
(output, expected) => output.failureType === expected ? 1 : 0
]
});
console.log('Accuracy:', evaluation.scores.mean);
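The accuracy figure is the mean of the scorer's per-example results. The sketch below works through that arithmetic without Weave. `classify` is a toy stand-in for the triage agent, `meanScore` is a hypothetical helper, and the third dataset row is a made-up example added so the toy classifier gets one wrong.

```typescript
// Hypothetical stand-ins: not part of Weave or PatchPilot.
type FailureType = "UI_BUG" | "BACKEND_ERROR";

interface Example {
  input: { errorMessage: string };
  expected: FailureType;
}

// Toy classifier standing in for the triage agent.
function classify(input: { errorMessage: string }): { failureType: FailureType } {
  return { failureType: input.errorMessage.includes("API") ? "BACKEND_ERROR" : "UI_BUG" };
}

// Exact-match scorer: 1 when the predicted type equals the label, else 0.
const exactMatch = (output: { failureType: FailureType }, expected: FailureType): number =>
  output.failureType === expected ? 1 : 0;

// Mean of the scorer over the dataset; returns 0 for an empty dataset.
function meanScore(dataset: Example[]): number {
  if (dataset.length === 0) return 0;
  const total = dataset.reduce(
    (sum, ex) => sum + exactMatch(classify(ex.input), ex.expected),
    0
  );
  return total / dataset.length;
}

const examples: Example[] = [
  { input: { errorMessage: "Missing onClick" }, expected: "UI_BUG" },
  { input: { errorMessage: "API 404" }, expected: "BACKEND_ERROR" },
  // Made-up row that the toy classifier misclassifies as BACKEND_ERROR.
  { input: { errorMessage: "API timeout" }, expected: "UI_BUG" },
];
```

With these three rows the scorer returns 1, 1, and 0, so `meanScore(examples)` is 2/3.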
Best Practices
- •Trace all agent entry points - Every public method should have @weave.op()
- •Log structured data - Use objects, not strings
- •Name operations clearly - Include agent name in function names
- •Log at consistent points - After each run, not during
- •Include metadata - Test IDs, timestamps, versions
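As one way to combine "log structured data" with "include metadata", the sketch below attaches a metadata object to a metrics object before it is logged. `RunMetadata` and `withMetadata` are hypothetical names, not part of Weave; the timestamp is passed in by the caller so the helper stays deterministic.

```typescript
// Hypothetical helper: names and field choices are assumptions for illustration.
interface RunMetadata {
  testId: string;
  timestamp: string; // ISO 8601, supplied by the caller
  version: string;
}

// Returns a new object: the original metrics plus a nested metadata field.
function withMetadata<T extends object>(
  metrics: T,
  metadata: RunMetadata
): T & { metadata: RunMetadata } {
  return { ...metrics, metadata };
}

const record = withMetadata(
  { pass_rate: 0.87, iterations: 2 },
  { testId: "login-001", timestamp: "2024-01-01T00:00:00.000Z", version: "1.0.0" }
);
```

The resulting `record` is a single structured object that can be passed to whatever logging call the project uses, instead of a formatted string.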
Common Pitfalls
Missing Async Handling
- •@weave.op() works with async functions
- •Ensure all promises are awaited
Large Payloads
- •Don't log huge DOM snapshots
- •Truncate long strings
- •Use references for binary data
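A minimal sketch of the truncation and reference ideas above. The helper names, the 1,000-character default, and the 200-character preview are assumptions chosen for illustration; adjust them to whatever limits suit the project.

```typescript
// Hypothetical helpers: names and limits are assumptions, not Weave APIs.
const DEFAULT_MAX_CHARS = 1000;

// Keeps the first maxChars characters and notes how much was dropped.
function truncateForLog(value: string, maxChars: number = DEFAULT_MAX_CHARS): string {
  if (value.length <= maxChars) return value;
  const omitted = value.length - maxChars;
  return `${value.slice(0, maxChars)}... [${omitted} chars truncated]`;
}

// Small summary logged in place of a large payload such as a DOM snapshot.
interface SnapshotSummary {
  ref: string; // where the full payload is stored, e.g. a storage key
  sizeChars: number;
  preview: string;
}

function summarizeSnapshot(ref: string, html: string): SnapshotSummary {
  return { ref, sizeChars: html.length, preview: truncateForLog(html, 200) };
}
```

For example, `truncateForLog("abcdef", 3)` returns `"abc... [3 chars truncated]"`, while a string at or under the limit is returned unchanged.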
Missing Context
- •Initialize weave early
- •Ensure WANDB_API_KEY is set
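One way to catch missing configuration early is to check the environment before initializing Weave. The sketch below is a hypothetical helper, not part of Weave; it takes the environment as a parameter so it can be exercised without touching the real process environment.

```typescript
// Hypothetical helper: returns the names of required variables that are unset or blank.
function missingEnvVars(
  env: Record<string, string | undefined>,
  required: string[] = ["WANDB_API_KEY", "WANDB_ENTITY"]
): string[] {
  return required.filter((name) => (env[name] ?? "").trim() === "");
}
```

At startup, something like `missingEnvVars(process.env)` can be called before `weave.init`, failing fast with a clear message if the returned list is not empty.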
Related Skills
- •marimo-dashboards/ - Visualizing Weave data
- •patchpilot-agents/ - Where to add tracing