Oracle Monitor Debug Skill
This skill enables you to debug multi-agent systems using the Oracle Monitor telemetry system. The system provides a unified view of agents, workloads, queues, and LLM usage.
Quick Access
Get System State
```bash
# Get current system state in log format (human-readable)
curl -s 'http://localhost:8080/api/v1/oracle/state?format=log' | jq -r '.log'

# Get as JSON for programmatic access
curl -s http://localhost:8080/api/v1/oracle/state

# Get summary only
curl -s 'http://localhost:8080/api/v1/oracle/state?format=summary' | jq
```
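For scripts that need the same data without shelling out to curl, the calls above can be sketched in Python using only the standard library. The helper names `oracle_url` and `fetch_json` are illustrative and not part of Oracle Monitor; the base URL is the one used throughout this document.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Host, port and path prefix as used in this document's curl examples.
BASE_URL = "http://localhost:8080/api/v1/oracle"


def oracle_url(component, **params):
    """Build an Oracle Monitor URL, e.g. oracle_url("state", format="log")."""
    url = f"{BASE_URL}/{component}"
    if params:
        url += "?" + urlencode(params)
    return url


def fetch_json(url, timeout=5.0):
    """GET a URL and decode its JSON body (hypothetical helper)."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)
```

With the server running, `fetch_json(oracle_url("state", format="log"))["log"]` mirrors the first curl command, including the `.log` extraction done by jq.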
Check for Issues
```bash
# Quick issue check
curl -s http://localhost:8080/api/v1/oracle/issues | jq
```
Get Specific Components
```bash
# List agents and their activity
curl -s http://localhost:8080/api/v1/oracle/agents | jq

# Get Kubernetes workload metrics
curl -s http://localhost:8080/api/v1/oracle/workload | jq

# Get LLM model usage
curl -s http://localhost:8080/api/v1/oracle/llm | jq
```
CLI Tool
The Oracle Monitor CLI can be run directly:
```bash
# From the project directory
python -m src.oracle.cli state     # Get system state
python -m src.oracle.cli agents    # List agents
python -m src.oracle.cli workload  # Show Kubernetes workloads
python -m src.oracle.cli llm       # Show LLM usage
python -m src.oracle.cli watch     # Real-time monitoring
python -m src.oracle.cli debug     # Interactive debug session
```
Understanding the State
Agent State
- name: Agent identifier
- deployment_name: Kubernetes deployment
- models: LLM models the agent can use
- activity.active_task_ids: Currently running tasks
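As a rough illustration of reading these fields, the sketch below lists agents that have no running tasks. It assumes `agents` is a list of dicts shaped like the fields above; how that list is wrapped inside the API response is not specified here, and `idle_agents` is a hypothetical helper name.

```python
def idle_agents(agents):
    """Return the names of agents with an empty or missing activity.active_task_ids."""
    idle = []
    for agent in agents:
        # Treat a missing "activity" object or a null task list as "no active tasks".
        task_ids = agent.get("activity", {}).get("active_task_ids") or []
        if not task_ids:
            idle.append(agent["name"])
    return idle
```

An agent that is idle while its queue has pending tasks is a reasonable first place to look when work is not progressing.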
Workload State
- deployment_name: Kubernetes deployment
- live.active_pods: Number of running pods
- max_pods: Maximum allowed pods
- pods: Individual pod metrics (CPU, memory, status)
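One way to use these fields is to compare running pods against the allowed maximum. The sketch below assumes a workload entry is a dict with `live.active_pods` and `max_pods` as described above; `pod_utilization` is a hypothetical helper, not an Oracle Monitor function.

```python
def pod_utilization(workload):
    """Return active_pods / max_pods, or None when max_pods is missing or zero."""
    max_pods = workload.get("max_pods")
    if not max_pods:
        return None
    active = workload.get("live", {}).get("active_pods", 0)
    return active / max_pods
```

A value at or near 1.0 suggests the deployment cannot scale further, which is worth checking before investigating latency elsewhere.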
Queue State
- name: Queue/topic name
- tasks: Pending tasks with priority levels
- tasks[].priority.level: low, normal, high, critical
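To see how a queue's backlog is distributed, the pending tasks can be grouped by `priority.level`. This is a minimal sketch assuming a queue is a dict with a `tasks` list as described above; `tasks_by_priority` is a hypothetical helper name.

```python
from collections import Counter

# The four levels listed in this document, lowest to highest.
PRIORITY_LEVELS = ("low", "normal", "high", "critical")


def tasks_by_priority(queue):
    """Count a queue's pending tasks per priority level."""
    counts = Counter(
        task.get("priority", {}).get("level") for task in queue.get("tasks", [])
    )
    # Report every known level, including those with no pending tasks.
    return {level: counts.get(level, 0) for level in PRIORITY_LEVELS}
```

Tasks without a recognised level are left out of the result rather than guessed at.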
LLM State
- model: Model identifier
- provider: Provider name (ollama, openai, anthropic)
- tpm/tpm_max: Tokens per minute usage/limit
- rpm/rpm_max: Requests per minute usage/limit
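The usage/limit pairs make it straightforward to spot models that are close to being rate limited. The sketch below assumes each LLM entry is a dict with the fields above; the 0.8 threshold and the function names are illustrative choices, not values taken from Oracle Monitor.

```python
def usage_fraction(used, limit):
    """Return used / limit, or None when no limit is reported."""
    if not limit:
        return None
    return used / limit


def models_near_limit(llms, threshold=0.8):
    """Return model identifiers whose tpm or rpm usage is at or above the threshold."""
    flagged = []
    for llm in llms:
        fractions = [
            usage_fraction(llm.get("tpm", 0), llm.get("tpm_max")),
            usage_fraction(llm.get("rpm", 0), llm.get("rpm_max")),
        ]
        known = [f for f in fractions if f is not None]
        if known and max(known) >= threshold:
            flagged.append(llm["model"])
    return flagged
```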
Debugging Workflow
- Start by checking issues:

```bash
curl -s http://localhost:8080/api/v1/oracle/issues | jq
```

- If issues are found, get the detailed state:

```bash
curl -s 'http://localhost:8080/api/v1/oracle/state?format=log' | jq -r '.log'
```

- For trace-level debugging:

```bash
# List recent traces
curl -s 'http://localhost:8080/api/v1/traces?limit=10' | jq

# Get specific trace details (replace {trace_id} with an ID from the list)
curl -s http://localhost:8080/api/v1/traces/{trace_id} | jq
```

- For real-time events, connect to the WebSocket at ws://localhost:8080/ws/debug/{client_id}
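The first two steps of this workflow can be sketched as a small function. It takes a `fetch` callable (any function that maps a path to decoded JSON) so it can be exercised without a running server. It assumes that an empty response from the issues endpoint means "no issues", which this document does not confirm; `triage` is a hypothetical name.

```python
def triage(fetch):
    """Check for issues and, if any exist, pull the human-readable state log.

    `fetch` takes a path such as "/api/v1/oracle/issues" and returns the
    decoded JSON response. Returns a tuple (issues, log_or_None).
    """
    issues = fetch("/api/v1/oracle/issues")
    if not issues:
        # Assumption: an empty list or object means nothing to investigate.
        return issues, None
    state = fetch("/api/v1/oracle/state?format=log")
    return issues, state["log"]
```

Keeping the HTTP call behind `fetch` also makes it easy to swap in authentication or a different host later.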
Common Issues
Agent Not Responding
- Check if the agent deployment is running: /api/v1/oracle/workload
- Check if the LLM is available: /api/v1/oracle/llm
- Look at recent traces for errors: /api/v1/traces?status=failed
High Latency
- Check LLM rate limits: /api/v1/oracle/llm
- Check queue depth: /api/v1/oracle/state
- Check pod resource usage: /api/v1/oracle/workload
Tasks Stuck in Queue
- Check queue state: /api/v1/oracle/state?format=summary
- Look for blocked tasks with a high waiting_since_mins
- Check for failed agents
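The second check can be sketched as a filter over queue data. This assumes `waiting_since_mins` is a numeric field on each task inside a queue's `tasks` list; the document names the field but not where it lives, so treat that placement, the 30-minute threshold, and the name `stuck_tasks` as assumptions.

```python
def stuck_tasks(queues, threshold_mins=30):
    """Return (queue name, minutes waited, task) for long-waiting tasks, longest first."""
    stuck = []
    for queue in queues:
        for task in queue.get("tasks", []):
            waited = task.get("waiting_since_mins")
            # Skip tasks that do not report a waiting time at all.
            if waited is not None and waited >= threshold_mins:
                stuck.append((queue["name"], waited, task))
    stuck.sort(key=lambda item: item[1], reverse=True)
    return stuck
```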
API Reference
| Endpoint | Description |
|---|---|
| GET /api/v1/oracle/state | Full system state |
| GET /api/v1/oracle/state?format=log | Human-readable log format |
| GET /api/v1/oracle/state?format=summary | Just summary metrics |
| GET /api/v1/oracle/agents | Agent list and activity |
| GET /api/v1/oracle/workload | Kubernetes workload metrics |
| GET /api/v1/oracle/llm | LLM model usage |
| GET /api/v1/oracle/issues | Current system issues |