Parallel SSH
This skill enables parallel SSH operations across multiple hosts for distributed workloads like SGLang/vLLM prefill-decode disaggregation.
When to Use This Skill
- •User provides a list of host IPs (internal or external)
- •User wants to run commands in parallel across multiple machines
- •User needs to start distributed services (prefill nodes, decode nodes)
- •User wants to collect logs from multiple machines simultaneously
- •User needs to monitor GPU status across a cluster
SSH Configuration
Default Key File
~/.ssh/google_compute_engine
Step 0: Ensure SSH Key Exists (IMPORTANT)
Before running any SSH commands, ALWAYS check if the key file exists. If not, generate it using gcloud:
# Check if key exists
if [ ! -f ~/.ssh/google_compute_engine ]; then
echo "SSH key not found, generating via gcloud..."
gcloud compute config-ssh --quiet
fi
This command:
- •Generates
~/.ssh/google_compute_engine(private key) and~/.ssh/google_compute_engine.pub(public key) - •Uploads the public key to the GCP project metadata
- •Configures SSH aliases for all instances in the project
SSH Command Format
ssh -i ~/.ssh/google_compute_engine -o StrictHostKeyChecking=accept-new <IP> "<command>"
The -o StrictHostKeyChecking=accept-new option automatically accepts new host keys (safe for first connection).
Parallel Execution Pattern
Step 1: Parse Host List
When user provides hosts like:
- •"10.8.0.81 10.8.0.82 10.8.0.83"
- •"10.8.0.81, 10.8.0.82, 10.8.0.83"
- •Listed in a file
Parse into individual IPs.
Step 2: Launch Background Tasks
For each host, launch SSH command with run_in_background: true:
ssh -i ~/.ssh/google_compute_engine -o StrictHostKeyChecking=accept-new 10.8.0.81 "command"
IMPORTANT: Launch ALL tasks in a SINGLE message with multiple Bash tool calls to achieve true parallelism.
Step 3: Collect Output
Use TaskOutput to wait for completion, or Read to check progress:
- •Output files:
/tmp/claude-*/tasks/<task_id>.output
Common Commands
GPU Status Check
nvidia-smi --query-gpu=index,name,memory.used,memory.total,utilization.gpu --format=csv
System Status
vmstat 2 15
Service Status
systemctl status <service> || pgrep -a <process>
Start SGLang Prefill Node
source /opt/deepep/unified-env.sh && \
python3 -m sglang.launch_server \
--model-path deepseek-ai/DeepSeek-V3 \
--disaggregation-mode prefill \
--tp-size 8 \
--port 30000 \
--host 0.0.0.0 \
...
Start SGLang Decode Node
source /opt/deepep/unified-env.sh && \
python3 -m sglang.launch_server \
--model-path deepseek-ai/DeepSeek-V3 \
--disaggregation-mode decode \
--tp-size 8 \
--port 30001 \
--host 0.0.0.0 \
...
Example Workflows
Example 1: Check GPU Status Across Cluster
User: "Check GPU status on 10.8.0.81 10.8.0.82 10.8.0.83"
Action: Launch 3 parallel SSH tasks with nvidia-smi command, collect and summarize results.
Example 2: Start Distributed Inference
User: "Start prefill on 10.8.0.81, decode on 10.8.0.82 and 10.8.0.83"
Action:
- •SSH to 10.8.0.81: Start prefill node
- •SSH to 10.8.0.82: Start decode node 1
- •SSH to 10.8.0.83: Start decode node 2 (All launched in parallel)
Example 3: Kill Services Across Cluster
User: "Kill all python processes on all machines"
Action: Launch parallel pkill -f python commands on all hosts.
Example 4: Collect Logs
User: "Get last 100 lines of sglang logs from all machines"
Action: Launch parallel tail -100 /path/to/logs commands.
Host Key Management
If SSH fails with "Host key verification failed":
# Add host key to known_hosts ssh-keyscan -H <IP> >> ~/.ssh/known_hosts # Or use StrictHostKeyChecking=accept-new (recommended) ssh -o StrictHostKeyChecking=accept-new ...
Best Practices
- •Always use background mode for long-running commands
- •Launch all tasks in single message for true parallelism
- •Collect output progressively for long tasks
- •Summarize results in table format for easy comparison
- •Handle failures gracefully - report which hosts failed
Output Format
When reporting results, use table format:
| Host | Status | Key Metrics |
|---|---|---|
| 10.8.0.81 | OK | GPU 0-7: 95% util |
| 10.8.0.82 | OK | GPU 0-7: 80% util |
| 10.8.0.83 | FAILED | SSH timeout |
Environment Variables for Distributed Training
Common environment variables to set on each node:
export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 export NCCL_SOCKET_IFNAME=enp0s19 export GLOO_SOCKET_IFNAME=enp0s19 export MASTER_ADDR=<prefill_node_ip> export MASTER_PORT=29500
Troubleshooting
SSH Key Not Found
If you see "Identity file not accessible" or "No such file or directory":
# Generate SSH key using gcloud (recommended for GCE) gcloud compute config-ssh --quiet # This creates ~/.ssh/google_compute_engine and uploads public key to project metadata
SSH Connection Refused
- •Check if SSH service is running on target
- •Verify network connectivity:
ping <IP>
Permission Denied (publickey)
- •First, check if key exists:
bash
ls -la ~/.ssh/google_compute_engine
- •If not exists, generate it:
bash
gcloud compute config-ssh --quiet
- •If exists but still fails, verify key is uploaded to project:
bash
gcloud compute project-info describe --format="value(commonInstanceMetadata.items.filter(key:ssh-keys))"
Command Timeout
- •Use
nohupfor long-running processes - •Use
screenortmuxfor persistent sessions - •Increase Bash timeout parameter