Agent Communication Debugger

Debug and diagnose issues with the A2A (Agent-to-Agent) communication system, including the orchestrator, coder-agent, tester-agent, and message transport layers.

Prerequisites

•A2A agent system located in a2a_communicating_agents/
•Python 3.10+ environment
•Access to agent logs in logs/ directory
•Agent configurations in respective agent.json files

Instructions

1. Check Agent Status

First, determine which agents are running:

bash

# Check all agent processes
ps aux | grep -E "(orchestrator|coder|tester|websocket)_agent|main.py" | grep -v grep

Look for:

•orchestrator_agent/main.py
•coder_agent/main.py
•tester_agent/main.py
•websocket_server.py

Common issues:

•Agent process not found → Agent isn't running, needs to be started
•Multiple instances → Duplicate processes causing conflicts

2. Inspect Agent Configurations

Read the agent configuration files to verify capabilities and topics:

bash

# View orchestrator config
cat a2a_communicating_agents/orchestrator_agent/agent.json

# View coder agent config
cat a2a_communicating_agents/coder_agent/agent.json

# View tester agent config (if exists)
cat a2a_communicating_agents/tester_agent/agent.json

Verify:

•Agent names match expected values
•Topics are correctly defined
•Capabilities describe what the agent does
•No JSON syntax errors

3. Check Agent Logs

Examine logs for errors and message flow:

bash

# View orchestrator logs (last 50 lines)
tail -50 logs/orchestrator.log

# View all logs with timestamps
tail -f logs/*.log

# Search for specific errors
grep -i "error\|exception\|failed" logs/*.log

# Check for routing decisions
grep -i "routing to\|routed to" logs/orchestrator.log

Look for:

•Connection errors
•Routing decisions showing wrong agent selection
•JSON parsing errors
•Message processing failures

4. Verify Message Transport

Check if the message transport (WebSocket or RAG board) is working:

bash

# Check if WebSocket server is running
ps aux | grep websocket_server | grep -v grep
netstat -tlnp 2>/dev/null | grep 8765 || ss -tlnp 2>/dev/null | grep 8765

# Check RAG board storage
ls -lh a2a_communicating_agents/storage/
ls -lh storage/

# Check recent messages in message board
tail -20 storage/message_board.jsonl 2>/dev/null || echo "Message board not found"

Expected:

•WebSocket server on port 8765 (if using WebSocket transport)
•Recent messages in storage/message_board.jsonl (if using RAG transport)
•No permission errors accessing storage

5. Test Message Sending

Use the provided test script to send a message and verify delivery:

bash

# Send a test message to orchestrator
python .claude/skills/agent-debug/scripts/test_message.py

This script will:

•Send a test message to the orchestrator topic
•Wait for response
•Show message delivery status
•Display any responses received

6. Diagnose Routing Issues

If messages reach orchestrator but route to wrong agent:

Check orchestrator's routing logic:

bash

# View the decide_route method
grep -A 50 "def decide_route" a2a_communicating_agents/orchestrator_agent/main.py

Check priority keyword mappings:

bash

# View fallback routing keywords
grep -A 20 "priority_mappings = {" a2a_communicating_agents/orchestrator_agent/main.py

Verify agent discovery:

bash

# Check discovered agents in logs
grep "Discovered.*agents" logs/orchestrator.log | tail -5

Common routing issues:

•Agent not discovered → Check agent.json exists and is valid
•Wrong agent selected → Keywords don't match, update priority_mappings
•Null target → No suitable agent found, check agent topics/capabilities

7. Check Environment Variables

Verify API keys and configuration:

bash

# Check if OPENAI_API_KEY is set (don't display value)
env | grep -E "(OPENAI|API_KEY)" | sed 's/=.*/=***HIDDEN***/'

# Check model configuration
grep -E "(model|MODEL)" .env 2>/dev/null | sed 's/=.*/=***HIDDEN***/' || echo "No .env file"

Required environment variables:

•OPENAI_API_KEY - For LLM-based routing and code generation
•ORCHESTRATOR_MODEL or OPENAI_MODEL - Model to use (default: gpt-5-mini)
•CODER_MODEL - Model for coder agent (optional, defaults to OPENAI_MODEL)

8. Restart Agents (if needed)

If agents are stuck or not responding:

bash

# Stop all agents
pkill -f "orchestrator_agent/main.py"
pkill -f "coder_agent/main.py"
pkill -f "tester_agent/main.py"
pkill -f "websocket_server.py"

# Wait a moment
sleep 2

# Start WebSocket server (if using)
cd a2a_communicating_agents
nohup python agent_messaging/websocket_server.py > ../logs/websocket.log 2>&1 &

# Start orchestrator
nohup python orchestrator_agent/main.py > ../logs/orchestrator.log 2>&1 &

# Start coder agent
nohup python coder_agent/main.py > ../logs/coder.log 2>&1 &

# Verify they started
sleep 3
ps aux | grep -E "(orchestrator|coder|websocket)" | grep -v grep

9. Common Issues and Solutions

See common_issues.md for a detailed troubleshooting guide covering:

•Messages not being delivered
•Routing to wrong agent
•Agent not generating responses
•Duplicate message processing
•Transport connectivity problems

Quick Diagnostic Checklist

Run through this checklist systematically:

• All required agents are running (orchestrator, coder, tester)
• WebSocket server is running (if using WebSocket transport)
• Agent configuration files are valid JSON
• Orchestrator discovered all agents (check logs)
• OPENAI_API_KEY is set in environment
• Recent log entries show activity
• No Python exceptions in logs
• Test message sends and receives successfully
• Routing decisions select correct agent

Examples

Example 1: Agent Not Responding to Messages

User problem:

code

I'm sending messages to the orchestrator but getting no response

Debug workflow:

•
Check if orchestrator is running:
bash
```
ps aux | grep orchestrator_agent | grep -v grep
```
Result: No process found → Orchestrator isn't running
•
Check logs for crash:
bash
```
tail -50 logs/orchestrator.log
```
Result: ImportError for OpenAI package
•
Solution: Install missing dependency
bash
```
pip install openai
```

•

Restart orchestrator:

bash

cd a2a_communicating_agents
nohup python orchestrator_agent/main.py > ../logs/orchestrator.log 2>&1 &

•

Verify it's running:

bash

ps aux | grep orchestrator_agent | grep -v grep
tail -10 logs/orchestrator.log

Example 2: Messages Routing to Wrong Agent

User problem:

code

I asked for code but it routed to dashboard-agent instead of coder-agent

Debug workflow:

•
Check orchestrator discovered coder-agent:
bash
```
grep "Discovered.*agents" logs/orchestrator.log | tail -1
```
Result: Shows coder-agent in list ✓
•
Check routing decision in logs:
bash
```
grep -A 5 "please write.*code" logs/orchestrator.log
```
Result: Shows routing to dashboard-agent

•

Check routing logic:

bash

grep -A 30 "priority_mappings = {" a2a_communicating_agents/orchestrator_agent/main.py

Result: Keywords look correct

•
Check LLM routing decision:
bash
```
grep "Error in decision making" logs/orchestrator.log
```
Result: LLM routing failed, falling back to heuristic

•

Check API key:

bash

env | grep OPENAI_API_KEY | sed 's/=.*/=***HIDDEN***/'

Result: Variable not set

•

Solution: Set API key and restart orchestrator:

bash

export OPENAI_API_KEY="your-key-here"
# Or add to .env file
echo "OPENAI_API_KEY=your-key-here" >> .env

•
Restart orchestrator to pick up new environment

Example 3: Coder Agent Acknowledges But Doesn't Generate Code

User problem:

code

Coder agent receives the message but only acknowledges, doesn't generate code

Debug workflow:

•
Check coder agent logs:
bash
```
grep -i "generate\|code" logs/coder.log | tail -20
```
Result: "OpenAI package not available. Code generation will be limited."
•
Check if OpenAI is installed:
bash
```
python -c "import openai; print(openai.__version__)" 2>&1
```
Result: ModuleNotFoundError
•
Install OpenAI package:
bash
```
pip install openai
```

•

Restart coder agent:

bash

pkill -f "coder_agent/main.py"
cd a2a_communicating_agents
nohup python coder_agent/main.py > ../logs/coder.log 2>&1 &

•
Verify initialization:
bash
```
grep "Initialized with model" logs/coder.log | tail -1
```
Result: Should show model name (e.g., gpt-5-mini)
•
Send test message and verify code generation

Example 4: Complete System Health Check

User request:

code

Run a complete diagnostic on the agent system

Complete diagnostic workflow:

•

Check all agents running:

bash

echo "=== Agent Processes ==="
ps aux | grep -E "(orchestrator|coder|tester|websocket)" | grep -v grep

•

Check agent configs:

bash

echo "=== Agent Configurations ==="
for agent in orchestrator_agent coder_agent tester_agent; do
  if [ -f "a2a_communicating_agents/$agent/agent.json" ]; then
    echo "--- $agent ---"
    cat "a2a_communicating_agents/$agent/agent.json" | python -m json.tool
  fi
done

•

Check environment:

bash

echo "=== Environment Variables ==="
env | grep -E "(OPENAI|MODEL)" | sed 's/=.*/=***HIDDEN***/'

•

Check recent logs:

bash

echo "=== Recent Log Activity ==="
tail -5 logs/*.log 2>/dev/null

•

Check for errors:

bash

echo "=== Recent Errors ==="
grep -i "error\|exception" logs/*.log | tail -10

•

Test message sending:

bash

echo "=== Message Transport Test ==="
python .claude/skills/agent-debug/scripts/test_message.py

•
Provide summary report with:
- •Agent status (running/stopped)
- •Configuration validity
- •Environment completeness
- •Recent error count
- •Transport test result

Related Tools

•orchestrator_chat.py - Interactive chat interface for testing
•send_agent_message.py - Send messages programmatically
•Agent start/stop scripts in a2a_communicating_agents/

Summary

This skill provides systematic debugging for the A2A agent communication system. Use it whenever:

•Agents aren't communicating
•Messages aren't being delivered
•Routing is incorrect
•System behavior is unexpected

Follow the diagnostic steps in order, checking status → configuration → logs → transport → routing. Most issues are:

•Agent not running
•Missing dependencies
•Missing API keys
•Invalid configurations
•Routing logic issues

Start with the Quick Diagnostic Checklist and drill down based on what fails.