Agent Communication Debugger
Debug and diagnose issues with the A2A (Agent-to-Agent) communication system, including the orchestrator, coder-agent, tester-agent, and message transport layers.
Prerequisites
- •A2A agent system located in a2a_communicating_agents/
- •Python 3.10+ environment
- •Access to agent logs in the logs/ directory
- •Agent configurations in their respective agent.json files
Instructions
1. Check Agent Status
First, determine which agents are running:
# Check all agent processes (including the WebSocket server)
ps aux | grep -E "(orchestrator|coder|tester)_agent/main\.py|websocket_server\.py" | grep -v grep
Look for:
- •orchestrator_agent/main.py
- •coder_agent/main.py
- •tester_agent/main.py
- •websocket_server.py
Common issues:
- •Agent process not found → Agent isn't running, needs to be started
- •Multiple instances → Duplicate processes causing conflicts
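The two common issues above (an agent that is not running, and duplicate instances) can be detected mechanically from `ps` output. The following is a minimal sketch, not part of the skill itself; the function names `count_agent_processes` and `diagnose_processes` are hypothetical, and the script paths are simply the ones listed under "Look for".

```python
# Script paths taken from the "Look for" list above.
AGENT_SCRIPTS = (
    "orchestrator_agent/main.py",
    "coder_agent/main.py",
    "tester_agent/main.py",
    "websocket_server.py",
)


def count_agent_processes(ps_output: str) -> dict[str, int]:
    """Count how many lines of `ps aux` output mention each agent script."""
    counts = {script: 0 for script in AGENT_SCRIPTS}
    for line in ps_output.splitlines():
        for script in AGENT_SCRIPTS:
            if script in line:
                counts[script] += 1
    return counts


def diagnose_processes(counts: dict[str, int]) -> list[str]:
    """Turn process counts into findings: missing agents and duplicates."""
    findings = []
    for script, n in counts.items():
        if n == 0:
            findings.append(f"{script}: not running")
        elif n > 1:
            findings.append(f"{script}: {n} instances (possible duplicate)")
    return findings
```

To run it against the live system, pass in the text from `subprocess.run(["ps", "aux"], capture_output=True, text=True).stdout`. An agent that is intentionally not in use (for example the WebSocket server when using the RAG board) will still be reported as "not running", so read the findings with your transport choice in mind.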
2. Inspect Agent Configurations
Read the agent configuration files to verify capabilities and topics:
# View orchestrator config
cat a2a_communicating_agents/orchestrator_agent/agent.json
# View coder agent config
cat a2a_communicating_agents/coder_agent/agent.json
# View tester agent config (if exists)
cat a2a_communicating_agents/tester_agent/agent.json
Verify:
- •Agent names match expected values
- •Topics are correctly defined
- •Capabilities describe what the agent does
- •No JSON syntax errors
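The verification points above can also be checked in code rather than by eye. This is a sketch under stated assumptions: the field names `name`, `topics`, and `capabilities` are guesses based on the list above, not a confirmed agent.json schema, and `check_config_text` and `check_config_file` are hypothetical helpers.

```python
import json
from pathlib import Path

# Assumed field names; adjust to whatever your agent.json files actually use.
REQUIRED_FIELDS = ("name", "topics", "capabilities")


def check_config_text(text: str, label: str = "agent.json") -> list[str]:
    """Return a list of problems in one config document (empty means OK)."""
    try:
        config = json.loads(text)
    except json.JSONDecodeError as exc:
        return [f"{label}: invalid JSON at line {exc.lineno}: {exc.msg}"]
    if not isinstance(config, dict):
        return [f"{label}: top-level value is not a JSON object"]
    return [
        f"{label}: missing or empty field '{field}'"
        for field in REQUIRED_FIELDS
        if not config.get(field)
    ]


def check_config_file(path: str) -> list[str]:
    """Read a config file from disk and validate it."""
    file_path = Path(path)
    if not file_path.is_file():
        return [f"{path}: file not found"]
    return check_config_text(file_path.read_text(encoding="utf-8"), label=path)
```

For example, `check_config_file("a2a_communicating_agents/coder_agent/agent.json")` returns an empty list when the file parses and all assumed fields are present.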
3. Check Agent Logs
Examine logs for errors and message flow:
# View orchestrator logs (last 50 lines)
tail -50 logs/orchestrator.log
# Follow all logs live (press Ctrl+C to stop)
tail -f logs/*.log
# Search for specific errors
grep -i "error\|exception\|failed" logs/*.log
# Check for routing decisions
grep -i "routing to\|routed to" logs/orchestrator.log
Look for:
- •Connection errors
- •Routing decisions showing wrong agent selection
- •JSON parsing errors
- •Message processing failures
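When the logs are long, it can help to bucket lines into the four categories listed above before reading them in detail. The sketch below is illustrative only: the regular expressions are guesses at typical wording, not the agents' actual log format, and `categorize_log_lines` is a hypothetical helper.

```python
import re
from collections import Counter

# Categories mirror the "Look for" list. Patterns are illustrative guesses.
# Order matters: the first matching category wins.
PATTERNS = {
    "connection": re.compile(r"connection (refused|reset|closed|error)|timed out", re.I),
    "routing": re.compile(r"rout(ing|ed) to", re.I),
    "json": re.compile(r"JSONDecodeError|json.*(parse|decode)", re.I),
    "failure": re.compile(r"error|exception|failed", re.I),
}


def categorize_log_lines(lines) -> Counter:
    """Count log lines per category; lines matching nothing are ignored."""
    counts = Counter()
    for line in lines:
        for category, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[category] += 1
                break
    return counts
```

A typical call is `categorize_log_lines(open("logs/orchestrator.log", encoding="utf-8"))`. A high `routing` count with a zero `failure` count suggests the orchestrator is working and the problem lies in which agent it picks.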
4. Verify Message Transport
Check if the message transport (WebSocket or RAG board) is working:
# Check if WebSocket server is running
ps aux | grep websocket_server | grep -v grep
netstat -tlnp 2>/dev/null | grep 8765 || ss -tlnp 2>/dev/null | grep 8765
# Check RAG board storage
ls -lh a2a_communicating_agents/storage/
ls -lh storage/
# Check recent messages in message board
tail -20 storage/message_board.jsonl 2>/dev/null || echo "Message board not found"
Expected:
- •WebSocket server on port 8765 (if using WebSocket transport)
- •Recent messages in storage/message_board.jsonl (if using RAG transport)
- •No permission errors accessing storage
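Both expectations above can be probed from Python: whether something accepts TCP connections on port 8765, and whether the tail of the message board parses as JSON Lines. This is a rough sketch; `is_port_open` and `parse_jsonl_tail` are hypothetical helpers, and nothing is assumed about the fields inside each message.

```python
import json
import socket


def is_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success and an errno value otherwise.
        return sock.connect_ex((host, port)) == 0


def parse_jsonl_tail(lines: list[str], limit: int = 20):
    """Parse the last `limit` lines as JSON.

    Returns (messages, bad_line_count); blank lines are skipped.
    """
    messages, bad = [], 0
    for line in lines[-limit:]:
        line = line.strip()
        if not line:
            continue
        try:
            messages.append(json.loads(line))
        except json.JSONDecodeError:
            bad += 1
    return messages, bad
```

For instance, `is_port_open("127.0.0.1", 8765)` checks the WebSocket server, and `parse_jsonl_tail(open("storage/message_board.jsonl", encoding="utf-8").readlines())` reports how many recent entries are unreadable. A nonzero bad-line count points to a writer that crashed mid-message or to concurrent writes.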
5. Test Message Sending
Use the provided test script to send a message and verify delivery:
# Send a test message to orchestrator
python .claude/skills/agent-debug/scripts/test_message.py
This script will:
- •Send a test message to the orchestrator topic
- •Wait for response
- •Show message delivery status
- •Display any responses received
6. Diagnose Routing Issues
If messages reach orchestrator but route to wrong agent:
Check orchestrator's routing logic:
# View the decide_route method
grep -A 50 "def decide_route" a2a_communicating_agents/orchestrator_agent/main.py
Check priority keyword mappings:
# View fallback routing keywords
grep -A 20 "priority_mappings = {" a2a_communicating_agents/orchestrator_agent/main.py
Verify agent discovery:
# Check discovered agents in logs
grep "Discovered.*agents" logs/orchestrator.log | tail -5
Common routing issues:
- •Agent not discovered → Check agent.json exists and is valid
- •Wrong agent selected → Keywords don't match, update priority_mappings
- •Null target → No suitable agent found, check agent topics/capabilities
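To reason about the three failure modes above, it helps to picture how a keyword-based fallback router behaves. The sketch below is an assumption about the general technique, not the orchestrator's actual decide_route code: the keyword lists, the table layout, and the `fallback_route` function are all hypothetical.

```python
import re

# Hypothetical keyword table; the real priority_mappings in
# orchestrator_agent/main.py may use different agents and keywords.
PRIORITY_MAPPINGS = {
    "coder-agent": ["code", "implement", "function", "script"],
    "tester-agent": ["test", "verify", "coverage"],
}


def fallback_route(message: str, discovered_agents, mappings=PRIORITY_MAPPINGS):
    """Return the first discovered agent whose keyword appears as a whole
    word in the message, or None when nothing matches."""
    words = set(re.findall(r"[a-z0-9_]+", message.lower()))
    for agent, keywords in mappings.items():
        if agent not in discovered_agents:
            continue  # undiscovered agents can never be selected
        if words & set(keywords):
            return agent
    return None
```

Under this model, each issue has a distinct signature: an undiscovered agent is skipped even when its keywords match, a wrong selection means an earlier entry in the table matched first, and a `None` result means no keyword matched any discovered agent.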
7. Check Environment Variables
Verify API keys and configuration:
# Check if OPENAI_API_KEY is set (don't display value)
env | grep -E "(OPENAI|API_KEY)" | sed 's/=.*/=***HIDDEN***/'
# Check model configuration (model names are not secrets, so show them)
if [ -f .env ]; then grep -E "(model|MODEL)" .env; else echo "No .env file"; fi
Required environment variables:
- •OPENAI_API_KEY - For LLM-based routing and code generation
- •ORCHESTRATOR_MODEL or OPENAI_MODEL - Model to use (default: gpt-5-mini)
- •CODER_MODEL - Model for coder agent (optional, defaults to OPENAI_MODEL)
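The defaults described above can be made explicit with a small resolver. This is a sketch under two assumptions that the list does not confirm: that ORCHESTRATOR_MODEL takes precedence over OPENAI_MODEL when both are set, and that the coder agent falls back to gpt-5-mini when OPENAI_MODEL is also unset. `resolve_models` and `missing_required` are hypothetical helpers.

```python
DEFAULT_MODEL = "gpt-5-mini"  # default named in the list above


def resolve_models(env: dict) -> dict:
    """Work out which model each agent would use for a given environment.

    Assumed precedence: agent-specific variable, then OPENAI_MODEL,
    then the default.
    """
    base = env.get("OPENAI_MODEL") or DEFAULT_MODEL
    return {
        "orchestrator": env.get("ORCHESTRATOR_MODEL") or base,
        "coder": env.get("CODER_MODEL") or base,
    }


def missing_required(env: dict) -> list[str]:
    """List required variables that are unset or empty."""
    return [name for name in ("OPENAI_API_KEY",) if not env.get(name)]
```

Passing `dict(os.environ)` (after `import os`) shows what the current shell would give the agents. The key's value is never printed, only whether it is present.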
8. Restart Agents (if needed)
If agents are stuck or not responding:
# Stop all agents
pkill -f "orchestrator_agent/main.py"
pkill -f "coder_agent/main.py"
pkill -f "tester_agent/main.py"
pkill -f "websocket_server.py"
# Wait a moment
sleep 2
# Start WebSocket server (if using)
cd a2a_communicating_agents
nohup python agent_messaging/websocket_server.py > ../logs/websocket.log 2>&1 &
# Start orchestrator
nohup python orchestrator_agent/main.py > ../logs/orchestrator.log 2>&1 &
# Start coder agent
nohup python coder_agent/main.py > ../logs/coder.log 2>&1 &
# Note: the tester agent is stopped above but not restarted here;
# start tester_agent/main.py the same way if you use it
# Verify they started
sleep 3
ps aux | grep -E "(orchestrator|coder|websocket)" | grep -v grep
9. Common Issues and Solutions
See common_issues.md for a detailed troubleshooting guide covering:
- •Messages not being delivered
- •Routing to wrong agent
- •Agent not generating responses
- •Duplicate message processing
- •Transport connectivity problems
Quick Diagnostic Checklist
Run through this checklist systematically:
- • All required agents are running (orchestrator, coder, tester)
- • WebSocket server is running (if using WebSocket transport)
- • Agent configuration files are valid JSON
- • Orchestrator discovered all agents (check logs)
- • OPENAI_API_KEY is set in environment
- • Recent log entries show activity
- • No Python exceptions in logs
- • Test message sends and receives successfully
- • Routing decisions select correct agent
Examples
Example 1: Agent Not Responding to Messages
User problem:
I'm sending messages to the orchestrator but getting no response
Debug workflow:
- •Check if orchestrator is running:
ps aux | grep orchestrator_agent | grep -v grep
Result: No process found → Orchestrator isn't running
- •Check logs for crash:
tail -50 logs/orchestrator.log
Result: ImportError for OpenAI package
- •Solution: Install missing dependency
pip install openai
- •Restart orchestrator (in a subshell, so your working directory does not change):
(cd a2a_communicating_agents && nohup python orchestrator_agent/main.py > ../logs/orchestrator.log 2>&1 &)
- •Verify it's running:
ps aux | grep orchestrator_agent | grep -v grep
tail -10 logs/orchestrator.log
Example 2: Messages Routing to Wrong Agent
User problem:
I asked for code but it routed to dashboard-agent instead of coder-agent
Debug workflow:
- •Check orchestrator discovered coder-agent:
grep "Discovered.*agents" logs/orchestrator.log | tail -1
Result: Shows coder-agent in list ✓
- •Check routing decision in logs:
grep -A 5 "please write.*code" logs/orchestrator.log
Result: Shows routing to dashboard-agent
- •Check routing logic:
grep -A 30 "priority_mappings = {" a2a_communicating_agents/orchestrator_agent/main.py
Result: Keywords look correct
- •Check LLM routing decision:
grep "Error in decision making" logs/orchestrator.log
Result: LLM routing failed, falling back to heuristic
- •Check API key:
env | grep OPENAI_API_KEY | sed 's/=.*/=***HIDDEN***/'
Result: Variable not set
- •Solution: Set API key and restart orchestrator:
export OPENAI_API_KEY="your-key-here"
# Or add to .env file
echo "OPENAI_API_KEY=your-key-here" >> .env
- •Restart orchestrator to pick up new environment
Example 3: Coder Agent Acknowledges But Doesn't Generate Code
User problem:
Coder agent receives the message but only acknowledges, doesn't generate code
Debug workflow:
- •Check coder agent logs:
grep -i "generate\|code" logs/coder.log | tail -20
Result: "OpenAI package not available. Code generation will be limited."
- •Check if OpenAI is installed:
python -c "import openai; print(openai.__version__)" 2>&1
Result: ModuleNotFoundError
- •Install OpenAI package:
pip install openai
- •Restart coder agent (in a subshell, so your working directory does not change):
pkill -f "coder_agent/main.py"
(cd a2a_communicating_agents && nohup python coder_agent/main.py > ../logs/coder.log 2>&1 &)
- •Verify initialization:
grep "Initialized with model" logs/coder.log | tail -1
Result: Should show model name (e.g., gpt-5-mini)
- •Send test message and verify code generation
Example 4: Complete System Health Check
User request:
Run a complete diagnostic on the agent system
Complete diagnostic workflow:
- •Check all agents running:
echo "=== Agent Processes ==="
ps aux | grep -E "(orchestrator|coder|tester|websocket)" | grep -v grep
- •Check agent configs:
echo "=== Agent Configurations ==="
for agent in orchestrator_agent coder_agent tester_agent; do
  if [ -f "a2a_communicating_agents/$agent/agent.json" ]; then
    echo "--- $agent ---"
    python -m json.tool "a2a_communicating_agents/$agent/agent.json"
  fi
done
- •Check environment:
echo "=== Environment Variables ==="
env | grep -E "(OPENAI|MODEL)" | sed 's/=.*/=***HIDDEN***/'
- •Check recent logs:
echo "=== Recent Log Activity ==="
tail -5 logs/*.log 2>/dev/null
- •Check for errors:
echo "=== Recent Errors ==="
grep -i "error\|exception" logs/*.log | tail -10
- •Test message sending:
echo "=== Message Transport Test ==="
python .claude/skills/agent-debug/scripts/test_message.py
- •Provide summary report with:
- •Agent status (running/stopped)
- •Configuration validity
- •Environment completeness
- •Recent error count
- •Transport test result
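The summary report described above could be assembled from individual check results along these lines. This is only a sketch of one possible layout; the `CheckResult` type and `format_report` function are hypothetical and not part of the skill's scripts.

```python
from dataclasses import dataclass


@dataclass
class CheckResult:
    """Outcome of one diagnostic step."""
    name: str
    ok: bool
    detail: str = ""


def format_report(results: list[CheckResult]) -> str:
    """Render check results as a short pass/fail summary."""
    lines = ["=== Diagnostic Summary ==="]
    for result in results:
        status = "PASS" if result.ok else "FAIL"
        suffix = f" ({result.detail})" if result.detail else ""
        lines.append(f"[{status}] {result.name}{suffix}")
    failed = sum(1 for result in results if not result.ok)
    lines.append(f"{len(results) - failed} passed, {failed} failed")
    return "\n".join(lines)
```

Each of the five report items maps to one `CheckResult`, for example `CheckResult("Agent status", False, "tester stopped")`, so the final line gives an at-a-glance count of what needs attention.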
Related Tools
- •orchestrator_chat.py - Interactive chat interface for testing
- •send_agent_message.py - Send messages programmatically
- •Agent start/stop scripts in a2a_communicating_agents/
Summary
This skill provides systematic debugging for the A2A agent communication system. Use it whenever:
- •Agents aren't communicating
- •Messages aren't being delivered
- •Routing is incorrect
- •System behavior is unexpected
Follow the diagnostic steps in order, checking status → configuration → logs → transport → routing. Most issues are:
- •Agent not running
- •Missing dependencies
- •Missing API keys
- •Invalid configurations
- •Routing logic issues
Start with the Quick Diagnostic Checklist and drill down based on what fails.