Phoenix CLI
Debug and analyze LLM applications using the Phoenix CLI (px).
Quick Start
Installation
npm install -g @arizeai/phoenix-cli
# Or run directly with npx
npx @arizeai/phoenix-cli
Configuration
Set environment variables before running commands:
export PHOENIX_HOST=http://localhost:6006
export PHOENIX_PROJECT=my-project
export PHOENIX_API_KEY=your-api-key # if authentication is enabled
CLI flags override environment variables when specified.
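A small pre-flight check can catch a missing setting before any px command runs. The sketch below assumes PHOENIX_HOST and PHOENIX_PROJECT are the two values needed when no flags are passed; check_phoenix_env is a hypothetical helper name, not part of the CLI.

```shell
# Hypothetical helper (not part of px): report unset Phoenix settings.
# Assumption: PHOENIX_HOST and PHOENIX_PROJECT are required when no CLI flags are given.
check_phoenix_env() {
  missing=0
  for name in PHOENIX_HOST PHOENIX_PROJECT; do
    # Indirectly read the variable called $name; empty or unset counts as missing.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

One possible use is `check_phoenix_env && px traces --limit 10`, so the fetch only runs once both variables are set.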
Debugging Workflows
Debug a failing LLM application
- Fetch recent traces to see what's happening:
px traces --limit 10
- Find failed traces:
px traces --limit 50 --format raw --no-progress | jq '.[] | select(.status == "ERROR")'
- Get details on a specific trace:
px trace <trace-id>
- Look for errors in spans:
px trace <trace-id> --format raw | jq '.spans[] | select(.status_code != "OK")'
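The last step can be wrapped in a reusable filter that prints one line per failing span instead of the full span objects. This is a sketch that assumes the raw trace has the `spans[].name` and `spans[].status_code` fields shown under Trace Structure; failed_spans is a hypothetical function name.

```shell
# Hypothetical helper: read one raw trace (JSON) on stdin and print
# "name<TAB>status_code" for every span whose status_code is not "OK".
failed_spans() {
  jq -r '.spans[]
         | select(.status_code != "OK")
         | [.name, .status_code]
         | @tsv'
}
```

It would typically be fed from the CLI, for example `px trace <trace-id> --format raw | failed_spans`.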
Find performance issues
- Get the slowest traces:
px traces --limit 20 --format raw --no-progress | jq 'sort_by(-.duration) | .[0:5]'
- Analyze span durations within a trace:
px trace <trace-id> --format raw | jq '.spans | sort_by(-.duration_ms) | .[0:5] | .[] | {name, duration_ms, span_kind}'
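Beyond the top five, a quick distribution summary helps show whether slow traces are outliers or the norm. The sketch below assumes each trace in the raw list carries a numeric `duration` field, as in the queries above; duration_summary is a hypothetical name, and the median shown is the upper median for even-length lists.

```shell
# Hypothetical helper: read a raw trace list (JSON array) on stdin and print
# count, min, median and max of the per-trace "duration" values.
duration_summary() {
  jq -c '[.[].duration]
         | sort
         | {count: length,
            min: .[0],
            median: .[(length / 2 | floor)],
            max: .[-1]}'
}
```

A possible invocation is `px traces --limit 50 --format raw --no-progress | duration_summary`. The values are in whatever unit the CLI reports for `duration`.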
Analyze LLM usage
Extract models and token counts:
px traces --limit 50 --format raw --no-progress | \
jq -r '.[].spans[] | select(.span_kind == "LLM") | {model: .attributes["llm.model_name"], prompt_tokens: .attributes["llm.token_count.prompt"], completion_tokens: .attributes["llm.token_count.completion"]}'
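To see totals rather than one record per span, the same attributes can be grouped by model. This is a sketch under the assumption that every LLM span exposes `llm.model_name` and the two token-count attributes used above; missing counts are treated as 0, and tokens_by_model is a hypothetical name.

```shell
# Hypothetical helper: read a raw trace list on stdin and print one JSON
# object per model with summed prompt and completion token counts.
tokens_by_model() {
  jq -c '[.[].spans[] | select(.span_kind == "LLM")]
         | group_by(.attributes["llm.model_name"])
         | .[]
         | {model: .[0].attributes["llm.model_name"],
            prompt_tokens: (map(.attributes["llm.token_count.prompt"] // 0) | add),
            completion_tokens: (map(.attributes["llm.token_count.completion"] // 0) | add)}'
}
```

For example: `px traces --limit 50 --format raw --no-progress | tokens_by_model`.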
Review experiment results
- List datasets:
px datasets
- List experiments for a dataset:
px experiments --dataset my-dataset
- Analyze experiment failures:
px experiment <experiment-id> --format raw --no-progress | \
jq '.[] | select(.error != null) | {input: .input, error}'
- Calculate average latency:
px experiment <experiment-id> --format raw --no-progress | \
jq '[.[].latency_ms] | add / length'
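The failure and latency queries can be combined into one summary. The sketch below assumes the raw experiment output is a JSON array of runs with `error` and `latency_ms` fields, as the queries above imply; experiment_summary is a hypothetical name. It also guards against an empty run list, where `add / length` would otherwise fail.

```shell
# Hypothetical helper: read raw experiment runs (JSON array) on stdin and
# print run count, failed-run count and average latency in one object.
experiment_summary() {
  jq -c '{runs: length,
          failed: (map(select(.error != null)) | length),
          avg_latency_ms: (if length > 0
                           then ([.[].latency_ms] | add / length)
                           else null end)}'
}
```

For example: `px experiment <experiment-id> --format raw --no-progress | experiment_summary`.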
Command Reference
px traces
Fetch recent traces from a project.
px traces [directory] [options]
| Option | Description |
|---|---|
| [directory] | Save traces as JSON files to directory |
| -n, --limit <number> | Number of traces (default: 10) |
| --last-n-minutes <number> | Filter by time window |
| --since <timestamp> | Fetch since ISO timestamp |
| --format <format> | pretty, json, or raw |
| --include-annotations | Include span annotations |
px trace
Fetch a specific trace by ID.
px trace <trace-id> [options]
| Option | Description |
|---|---|
| --file <path> | Save to file |
| --format <format> | pretty, json, or raw |
| --include-annotations | Include span annotations |
px datasets
List all datasets.
px datasets [options]
px dataset
Fetch examples from a dataset.
px dataset <dataset-name> [options]
| Option | Description |
|---|---|
| --split <name> | Filter by split (repeatable) |
| --version <id> | Specific dataset version |
| --file <path> | Save to file |
px experiments
List experiments for a dataset.
px experiments --dataset <name> [directory]
| Option | Description |
|---|---|
| --dataset <name> | Dataset name or ID (required) |
| [directory] | Export experiment JSON to directory |
px experiment
Fetch a single experiment with run data.
px experiment <experiment-id> [options]
px prompts
List all prompts.
px prompts [options]
px prompt
Fetch a specific prompt.
px prompt <prompt-name> [options]
Output Formats
- pretty (default): Human-readable tree view
- json: Formatted JSON with indentation
- raw: Compact JSON for piping to jq or other tools
Use --format raw --no-progress when piping output to other commands.
Trace Structure
Traces contain spans with OpenInference semantic attributes:
{
  "traceId": "abc123",
  "spans": [{
    "name": "chat_completion",
    "span_kind": "LLM",
    "status_code": "OK",
    "attributes": {
      "llm.model_name": "gpt-4",
      "llm.token_count.prompt": 512,
      "llm.token_count.completion": 256,
      "input.value": "What is the weather?",
      "output.value": "The weather is sunny..."
    }
  }],
  "duration": 1250,
  "status": "OK"
}
Key span kinds: LLM, CHAIN, TOOL, RETRIEVER, EMBEDDING, AGENT.
Key attributes for LLM spans:
- llm.model_name: Model used
- llm.provider: Provider name (e.g., "openai")
- llm.token_count.prompt / llm.token_count.completion: Token counts
- llm.input_messages.*: Input messages (indexed, with role and content)
- llm.output_messages.*: Output messages (indexed, with role and content)
- input.value / output.value: Raw input/output as text
- exception.message: Error message if failed
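These attributes can be read directly with jq's bracket syntax, which is needed because the keys contain dots. As a sketch, the filter below lists the error message of every span that has one. It assumes `exception.message` sits in the span's `attributes` object alongside the other keys; span_errors is a hypothetical name.

```shell
# Hypothetical helper: read one raw trace (JSON) on stdin and print
# "span name: exception message" for spans that recorded an exception.
span_errors() {
  jq -r '.spans[]
         | select(.attributes["exception.message"] != null)
         | [.name, .attributes["exception.message"]]
         | join(": ")'
}
```

For example: `px trace <trace-id> --format raw | span_errors`.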