Skill: W&B Weave

When to Use This Skill

Use this skill when:

•Adding observability to agent functions
•Logging metrics for the dashboard
•Creating traces for debugging
•Evaluating agent performance
•Implementing TraceTriage self-improvement

Do NOT use this skill when:

•Working on core agent logic without logging
•Setting up infrastructure (Redis, Vercel, etc.)

Overview

Weights & Biases Weave provides observability for LLM applications. For PatchPilot, it:

•Creates trace trees showing agent interactions
•Logs inputs and outputs of every traced function
•Tracks metrics such as pass rate and time-to-fix
•Enables evaluation and comparison of runs

Key features:

•@weave.op() decorator for automatic tracing
•Structured logging with weave.log()
•Evaluation framework for A/B testing
•Dashboard for visualization

Key Concepts

Traces

A trace is a tree of function calls. In PatchPilot:

code

Run: patchpilot-run-123
├── Orchestrator.run()
│   ├── TesterAgent.runTest()
│   │   └── [inputs, outputs, duration]
│   ├── TriageAgent.diagnose()
│   │   └── [inputs, outputs, duration]
│   ├── FixerAgent.generatePatch()
│   │   └── [inputs, outputs, duration]
│   └── VerifierAgent.verify()
│       └── [inputs, outputs, duration]
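To make the tree structure concrete, here is a minimal sketch of how such a trace could be modeled and rendered in plain TypeScript. `TraceNode` and `renderTrace` are hypothetical names used only for illustration, and the durations are made-up sample values; Weave builds and stores the real tree for you.

```typescript
// Hypothetical model of a trace: each node is one traced call.
interface TraceNode {
  name: string;
  durationMs: number;
  children: TraceNode[];
}

// Render a trace as an indented tree, one line per call.
function renderTrace(node: TraceNode, prefix = '', isLast = true, isRoot = true): string[] {
  const connector = isRoot ? '' : isLast ? '└── ' : '├── ';
  const lines = [`${prefix}${connector}${node.name} (${node.durationMs} ms)`];
  // Children of a non-last node keep a vertical bar so the branch stays connected.
  const childPrefix = isRoot ? '' : prefix + (isLast ? '    ' : '│   ');
  node.children.forEach((child, i) => {
    const last = i === node.children.length - 1;
    lines.push(...renderTrace(child, childPrefix, last, false));
  });
  return lines;
}

// Sample data only: durations are invented for the example.
const run: TraceNode = {
  name: 'Orchestrator.run()',
  durationMs: 192000,
  children: [
    { name: 'TesterAgent.runTest()', durationMs: 41000, children: [] },
    { name: 'TriageAgent.diagnose()', durationMs: 9000, children: [] },
  ],
};

console.log(renderTrace(run).join('\n'));
```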

Operations

An operation is a traced function. Use @weave.op():

typescript

class TesterAgent {
  @weave.op()
  async runTest(spec: TestSpec): Promise<TestResult> {
    // Inputs, outputs, and duration are recorded for each call
    return this.executeTest(spec);
  }
}
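Conceptually, an op wraps a function so that each call's inputs, output, and duration are recorded. The sketch below illustrates that idea with a hypothetical `traced` helper and an in-memory `calls` array. It is not how Weave is implemented, and it handles only synchronous functions.

```typescript
// Hypothetical record of one traced call (not a Weave type).
interface CallRecord {
  name: string;
  inputs: unknown[];
  output?: unknown;
  error?: string;
  durationMs: number;
}

const calls: CallRecord[] = [];

// Wrap a synchronous function so every call is recorded, even when it throws.
function traced<A extends unknown[], R>(name: string, fn: (...args: A) => R): (...args: A) => R {
  return (...args: A): R => {
    const start = Date.now();
    const record: CallRecord = { name, inputs: args, durationMs: 0 };
    try {
      const output = fn(...args);
      record.output = output;
      return output;
    } catch (err) {
      record.error = err instanceof Error ? err.message : String(err);
      throw err;
    } finally {
      // Runs on both success and failure, so no call goes unrecorded.
      record.durationMs = Date.now() - start;
      calls.push(record);
    }
  };
}

const add = traced('add', (a: number, b: number) => a + b);
add(2, 3);
console.log(calls[0].name, calls[0].inputs, calls[0].output);
```

Recording in `finally` is the design choice worth noting: failed calls are often the ones you most want to see in a trace.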

Metrics

Metrics are key-value pairs logged per run:

typescript

weave.log({
  test_pass_rate: 0.87,
  time_to_fix_seconds: 192,
  iterations: 2
});

Common Patterns

Initialize Weave

typescript

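import * as weave from 'weave';

// init() takes the project name as a string; prefix it with the entity
// ("entity/project") to log to a specific team.
const entity = process.env.WANDB_ENTITY;
await weave.init(entity ? `${entity}/patchpilot` : 'patchpilot');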

Trace Agent Methods

typescript

import * as weave from 'weave';

class TesterAgent {
  @weave.op()
  async runTest(spec: TestSpec): Promise<TestResult> {
    // All inputs and outputs automatically logged
    const result = await this.executeTest(spec);
    return result;
  }

  @weave.op()
  async captureFailure(error: Error): Promise<FailureReport> {
    // Calls to other ops made from here appear as child nodes in the trace tree
    const screenshot = await this.getScreenshot();
    return { error, screenshot };
  }
}

Log Metrics

typescript

async function logRunMetrics(result: RunResult): Promise<void> {
  // Guard the ratios so an empty run logs null instead of NaN
  const ratio = (n: number, d: number) => (d > 0 ? n / d : null);

  // Note: confirm that weave.log() exists in your installed weave version
  weave.log({
    // Test metrics
    tests_total: result.totalTests,
    tests_passed: result.passedTests,
    pass_rate: ratio(result.passedTests, result.totalTests),

    // Fix metrics
    bugs_found: result.bugsFound,
    bugs_fixed: result.bugsFixed,
    fix_success_rate: ratio(result.bugsFixed, result.bugsFound),

    // Performance metrics (durations arrive in milliseconds)
    total_duration_seconds: result.duration / 1000,
    avg_fix_time_seconds: result.avgFixTime / 1000,
    total_iterations: result.totalIterations,

    // Cost metrics
    llm_tokens_used: result.tokensUsed,
    redis_queries: result.redisQueries
  });
}
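Keeping the metric calculation separate from the logging call makes it easy to unit-test, including the edge case where a run finds no tests or no bugs. The following is a minimal sketch: the `RunResult` shape is an assumption inferred from the fields used above, and `safeRatio` and `buildRunMetrics` are hypothetical helper names.

```typescript
// Assumed shape of RunResult, inferred from the fields used above.
interface RunResult {
  totalTests: number;
  passedTests: number;
  bugsFound: number;
  bugsFixed: number;
  duration: number; // milliseconds
}

// Return null instead of NaN/Infinity when there is nothing to divide by.
function safeRatio(numerator: number, denominator: number): number | null {
  return denominator > 0 ? numerator / denominator : null;
}

// Build the metrics object on its own so it can be tested before it is logged.
function buildRunMetrics(result: RunResult) {
  return {
    tests_total: result.totalTests,
    tests_passed: result.passedTests,
    pass_rate: safeRatio(result.passedTests, result.totalTests),
    bugs_found: result.bugsFound,
    bugs_fixed: result.bugsFixed,
    fix_success_rate: safeRatio(result.bugsFixed, result.bugsFound),
    total_duration_seconds: result.duration / 1000,
  };
}

// Sample values only: a run with tests but no bugs found.
const metrics = buildRunMetrics({
  totalTests: 8,
  passedTests: 6,
  bugsFound: 0,
  bugsFixed: 0,
  duration: 192000,
});
console.log(metrics);
```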

Create Evaluation

typescript

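import * as weave from 'weave';

// Define evaluation dataset ('triage-eval' is a placeholder id)
const evalDataset = new weave.Dataset({
  id: 'triage-eval',
  rows: [
    { input: { errorMessage: 'Missing onClick' }, expected: 'UI_BUG' },
    { input: { errorMessage: 'API 404' }, expected: 'BACKEND_ERROR' }
  ]
});

// Wrap the agent as an op so each prediction is traced
// (assumes diagnose() accepts the row's input object)
const triageModel = weave.op(async function triageModel({ datasetRow }) {
  return triageAgent.diagnose(datasetRow.input);
});

// Score 1 when the predicted failure type matches the expected label
const exactMatch = weave.op(
  ({ modelOutput, datasetRow }) => (modelOutput.failureType === datasetRow.expected ? 1 : 0),
  { name: 'exactMatch' }
);

// Run evaluation (Dataset/Evaluation usage follows the TypeScript SDK docs;
// verify against your installed version)
const evaluation = new weave.Evaluation({ dataset: evalDataset, scorers: [exactMatch] });
const results = await evaluation.evaluate({ model: triageModel });

console.log('Results:', JSON.stringify(results, null, 2));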
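It can help to see what an evaluation like the one above computes: run the model on each dataset row, score the output against the expected label, and average the scores. The sketch below does this locally without Weave. `evaluateLocally` is a hypothetical helper, and `classify` is a keyword-based stand-in for the triage agent, not its real logic.

```typescript
type FailureType = 'UI_BUG' | 'BACKEND_ERROR';

interface EvalRow {
  input: { errorMessage: string };
  expected: FailureType;
}

// Hypothetical stand-in for the triage agent: a keyword-based classifier.
function classify(input: { errorMessage: string }): FailureType {
  return /api|404|500/i.test(input.errorMessage) ? 'BACKEND_ERROR' : 'UI_BUG';
}

// Run the model on every row, score each prediction, and average the scores.
function evaluateLocally(
  model: (input: EvalRow['input']) => FailureType,
  rows: EvalRow[],
  scorer: (output: FailureType, expected: FailureType) => number,
): { scores: number[]; mean: number } {
  const scores = rows.map((row) => scorer(model(row.input), row.expected));
  const mean = scores.length > 0 ? scores.reduce((a, b) => a + b, 0) / scores.length : 0;
  return { scores, mean };
}

const rows: EvalRow[] = [
  { input: { errorMessage: 'Missing onClick' }, expected: 'UI_BUG' },
  { input: { errorMessage: 'API 404' }, expected: 'BACKEND_ERROR' },
];

// Exact-match scorer: 1 for a correct label, 0 otherwise.
const exactMatch = (output: FailureType, expected: FailureType) => (output === expected ? 1 : 0);

const report = evaluateLocally(classify, rows, exactMatch);
console.log('Accuracy:', report.mean);
```

A local harness like this is useful for checking scorers and datasets quickly; the hosted evaluation adds tracing and run-to-run comparison on top.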

Best Practices

•Trace all agent entry points - Every public method should have @weave.op()
•Log structured data - Use objects, not strings
•Name operations clearly - Include agent name in function names
•Log at consistent points - After each run, not during
•Include metadata - Test IDs, timestamps, versions
