Judgeval Tracing for Claude Code
Automatically trace Claude Code sessions to Judgeval for observability and debugging.
Features
- •Session Tracing: Capture complete conversation sessions with start/end times
- •Turn Tracking: Track each user prompt as a separate turn
- •LLM Spans: Log every model call with input/output and token usage
- •Tool Spans: Track tool invocations (file operations, terminal, MCP tools)
- •Cache Metrics: Track cache creation and read tokens for prompt caching
Trace Structure
code
Session (task span)
├── Turn 1 (task span)
│ ├── claude-opus-4-5 (llm span)
│ ├── Read: file.py (tool span)
│ ├── Subagent: code-reviewer (task span) ← Subagent with nested spans
│ │ ├── claude-3-5-haiku (llm span)
│ │ ├── Read (tool span)
│ │ └── claude-3-5-haiku (llm span)
│ └── claude-opus-4-5 (llm span)
└── Turn 2 (task span)
└── ...
Setup
After installing the plugin, run setup in your project directory:
bash
cd /path/to/your/project bash ~/.claude/plugins/marketplaces/judgeval-claude-plugin/skills/trace-claude-code/setup.sh
This will prompt you for:
- •
JUDGMENT_API_KEY- Your Judgeval API key - •
JUDGMENT_ORG_ID- Your organization ID - •
JUDGMENT_API_URL- API URL (default: https://api.judgmentlabs.ai) - •
JUDGEVAL_CC_PROJECT- Project name (default: claude-code)
Environment Variables
| Variable | Required | Description |
|---|---|---|
TRACE_TO_JUDGEVAL | Yes | Set to true to enable tracing |
JUDGMENT_API_KEY | Yes | Your Judgeval API key |
JUDGMENT_ORG_ID | Yes | Your organization ID |
JUDGMENT_API_URL | No | API URL (default: https://api.judgmentlabs.ai) |
JUDGEVAL_CC_PROJECT | No | Project name (default: claude-code) |
JUDGEVAL_CC_DEBUG | No | Set to true for debug logging |
Hooks
| Hook | Trigger | Action |
|---|---|---|
session_start.sh | Session begins | Creates root trace span |
user_prompt_submit.sh | User sends prompt | Creates Turn span |
post_tool_use.sh | Tool completes | Tracks tool count |
stop_hook.sh | Response complete | Marks turn for finalization |
subagent_stop.sh | Subagent completes | Parses subagent transcript, creates nested spans |
session_end.sh | Session ends | Creates LLM/Tool spans, finalizes session |
Span Attributes
Session Span
- •
judgment.span_kind: "task" - •
judgment.input: Session description - •
judgment.output: Completion summary - •
turn_count: Number of turns
LLM Span
- •
judgment.span_kind: "llm" - •
judgment.input: Conversation history - •
judgment.output: Model response - •
judgment.llm.model: Model name - •
judgment.llm.provider: "anthropic" - •
judgment.usage.non_cached_input_tokens: Input tokens - •
judgment.usage.output_tokens: Output tokens - •
judgment.usage.cache_creation_input_tokens: Cache write tokens - •
judgment.usage.cache_read_input_tokens: Cache read tokens
Tool Span
- •
judgment.span_kind: "tool" - •
judgment.input: Tool input - •
judgment.output: Tool output - •
tool_name: Tool identifier
Logs
Hook logs are written to: ~/.claude/state/judgeval_hook.log
Enable debug logging:
bash
export JUDGEVAL_CC_DEBUG=true
Troubleshooting
Traces not appearing:
- •Check
TRACE_TO_JUDGEVAL=trueis set - •Verify API key and org ID are correct
- •Check logs for errors
Missing spans:
- •Ensure all hooks are executable
- •Check for jq/curl availability
- •Review debug logs