
agentic-ecosystem-incremental-update

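Safe incremental deployment and updates for the agentic ecosystem once the initial deployment is complete. The focus is on preserving configuration, restarting services safely, verifying the result, and optional testing. Use it after the agentic ecosystem remote deployment has completed at least once.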

SKILL.md
--- frontmatter
name: agentic-ecosystem-incremental-update
description: Safe incremental deployment and updates for the agentic ecosystem after initial setup. Focuses on config preservation, safe service restarts, verification, and optional testing. Use after agentic-ecosystem-remote-deployment has completed at least once.
<!-- [Created by Claude: e8fa7e09-6d5f-40f1-89df-3afe03f29ca1] -->

Agentic Ecosystem Incremental Update

Safe deployment strategy for updating components in the agentic ecosystem without breaking existing configurations or services.

When to Use This Skill

  • After initial deployment using agentic-ecosystem-remote-deployment
  • Updating code without changing environment-specific configs
  • Restarting services after crashes or changes
  • Verifying deployments with optional browser testing

Universal Deployment Patterns

1. Pre-Deployment Safety Checklist

Before making any changes:

bash
# Check current state (<SSH_PORT> is the SSH port, distinct from the service ports below)
ssh -p <SSH_PORT> <HOST> "cd ~/swe/<component> && git status --short && echo '---' && git log --oneline -3"

# Check running services
ssh -p <SSH_PORT> <HOST> "lsof -i :<PORT1> -i :<PORT2> | grep LISTEN"

# Backup .env
ssh -p <SSH_PORT> <HOST> "cd ~/swe/<component> && cp .env .env.backup-\$(date +%Y%m%d)"

# List .env backup history
ssh -p <SSH_PORT> <HOST> "ls -lt ~/swe/<component>/.env* | head -5"

Critical: Identify remote-specific configs that MUST NOT be overwritten
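
The backup step above stamps the file with the day only, so a second run on the same day overwrites the earlier backup. A minimal sketch of one way around that, assuming a hypothetical helper named backup_env and an added HHMMSS suffix (neither is part of the original skill):

```bash
# backup_env DIR: copy DIR/.env to a timestamped backup and print the backup path.
# Hypothetical helper; the function name and the HHMMSS suffix are illustrative.
backup_env() {
  local dir="$1" stamp dest
  [ -f "$dir/.env" ] || { echo "no .env in $dir" >&2; return 1; }
  stamp="$(date +%Y%m%d-%H%M%S)"
  dest="$dir/.env.backup-$stamp"
  # -p preserves mode and timestamps, so the backup keeps restrictive permissions
  cp -p "$dir/.env" "$dest" && printf '%s\n' "$dest"
}
```

The function has to run where the .env lives, so on a remote it would be placed in a small script there rather than called from the local shell.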


2. Critical .env Variables Protection Map

Universal pattern: These variables differ per remote and MUST be preserved:

| Variable Category | Examples | Why Critical |
| --- | --- | --- |
| Network binding | SHIM_HOST, CODEX_SHIM_HOST | cm3u=192.168.1.9, cm2=127.0.0.1 |
| Port assignments | SHIM_PORT, CODEX_SHIM_PORT, AGENT_HQ_UI_PORT | May differ if multiple instances |
| Working directories | SHIM_DEFAULT_CWD | May point to different paths |
| Remote paths | CLAUDE_CODE_CLI, CODEX_CLI | Symlink targets may differ |

Protection strategy:

  1. Before deployment: grep -E '^(SHIM_HOST|CODEX_SHIM_HOST|.*_PORT|.*_CWD|CLAUDE_CODE_CLI|CODEX_CLI)=' .env > /tmp/critical-vars.txt
  2. After code update: compare the new .env against /tmp/critical-vars.txt and restore any critical variable whose value changed
  3. Never deploy .env from local to remote without inspection
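
The "restore critical vars" step can be sketched as follows. This is an illustration under assumptions: the helper name restore_critical_vars and the CRITICAL_RE variable are hypothetical, and the pattern simply covers the variable categories listed in the protection map.

```bash
# Pattern for per-remote variables (assumption: mirrors the protection map above).
CRITICAL_RE='^(SHIM_HOST|CODEX_SHIM_HOST|[A-Z0-9_]*_PORT|[A-Z0-9_]*_CWD|CLAUDE_CODE_CLI|CODEX_CLI)='

# restore_critical_vars BACKUP TARGET: take the critical variables from BACKUP,
# keep every other line of TARGET as it is.
restore_critical_vars() {
  local backup="$1" target="$2" tmp
  tmp="$(mktemp)" || return 1
  # Lines of the new .env that are NOT critical variables stay untouched
  grep -Ev "$CRITICAL_RE" "$target" > "$tmp" || true
  # Critical variables are appended exactly as they were in the backup
  grep -E "$CRITICAL_RE" "$backup" >> "$tmp" || true
  # Writing through cat keeps the target file's own permissions
  cat "$tmp" > "$target" && rm -f "$tmp"
}
```

Note that the critical variables end up at the bottom of the file, which is harmless for KEY=VALUE files but changes the line order.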

3. Safe Service Restart Checklist

Universal restart pattern:

bash
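# 1. Kill old process cleanly. Match on <service>, which appears in the process
#    command line; [p] keeps the pattern from matching the ssh shell itself.
ssh <HOST> "pkill -f '[p]ython.*<service>.*server.py'"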

# 2. Wait for clean shutdown
sleep 2

# 3. Export PATH (CRITICAL for node access)
# 4. Use Anaconda Python (ALWAYS)
# 5. Start with nohup for persistence
ssh <HOST> "export PATH=\"/opt/homebrew/bin:\$PATH\" && \
  cd ~/swe/<component> && \
  nohup ~/anaconda3/bin/python src/<service>/server.py > /tmp/<service>.log 2>&1 &"

# 6. Wait for startup
sleep 3

# 7. Verify port is listening
ssh <HOST> "lsof -i :<PORT> | grep LISTEN"

Critical requirements:

  • ✅ Always export PATH="/opt/homebrew/bin:$PATH" (for node)
  • ✅ Always use ~/anaconda3/bin/python (not system python)
  • ✅ Pattern: pkill → sleep → export PATH → start with nohup → verify
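
The fixed sleep before the port check can be too short on a slow start and needlessly long on a fast one. A small polling helper is one alternative; wait_until is a hypothetical name, not part of the original skill:

```bash
# wait_until ATTEMPTS CMD...: run CMD once per second until it succeeds,
# giving up after ATTEMPTS tries. Returns 0 on success, 1 on timeout.
wait_until() {
  local attempts="$1" i=0
  shift
  while [ "$i" -lt "$attempts" ]; do
    "$@" && return 0
    i=$((i + 1))
    sleep 1
  done
  return 1
}

# Usage with the placeholders from the restart pattern:
#   wait_until 15 ssh <HOST> "lsof -i :<PORT> | grep -q LISTEN" || echo "service did not come up"
```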

4. Incremental Deployment Strategy

Pull changes without overwriting configs:

bash
# 1. Fetch remote changes (don't merge yet)
ssh <HOST> "cd ~/swe/<component> && git fetch origin"

# 2. See what will change
ssh <HOST> "cd ~/swe/<component> && git diff HEAD origin/main --stat"

# 3. Stash local .env changes (only has an effect if .env is tracked by git)
ssh <HOST> "cd ~/swe/<component> && git stash push -- .env"

# 4. Pull code changes
ssh <HOST> "cd ~/swe/<component> && git pull origin main"

# 5. Restore .env from backup (don't use stash - may have merge conflicts)
ssh <HOST> "cd ~/swe/<component> && cp .env.backup-YYYYMMDD .env"

# 6. Restart services (see section 3)

Alternative: Selective file updates without git:

  • Use rsync with --exclude='.env' to update code only
  • Preserves all local configs

5. Post-Deployment Verification Protocol

Verify deployment succeeded:

bash
# 1. Check all expected ports
ssh <HOST> "lsof -i :8787 -i :9288 -i :8037 | grep LISTEN"

# 2. Verify processes are using Anaconda Python
ssh <HOST> "ps aux | grep 'anaconda3.*python.*server.py' | grep -v grep"

# 3. Check logs for startup errors
ssh <HOST> "head -20 /tmp/claude-shim.log /tmp/codex-shim.log"

# 4. Test each service with curl
ssh <HOST> "curl -sS http://127.0.0.1:8787/ | head -3"
ssh <HOST> "curl -sS http://127.0.0.1:9288/ | head -3"

# 5. Document results
echo "Deployment verified at $(date)" >> deployment_logs/$(date +%Y_%m_%d).md
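
The port check in step 1 prints whatever is listening, which leaves it to the reader to notice a missing port. A sketch of a stricter check, assuming a hypothetical helper missing_ports that reads lsof output (run with -nP so ports stay numeric):

```bash
# missing_ports PORT...: read `lsof -nP` output on stdin and print each
# expected port that has no LISTEN entry. Prints nothing when all are up.
missing_ports() {
  local listing port
  listing="$(cat)"
  for port in "$@"; do
    printf '%s\n' "$listing" | grep -Eq ":${port} \(LISTEN\)" || printf '%s\n' "$port"
  done
}

# Usage:
#   ssh <HOST> "lsof -nP -iTCP -sTCP:LISTEN" | missing_ports 8787 9288 8037
```

An empty result means every expected port is listening; any printed port number is a service that did not come up.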

6. Interactive Tunnel Setup (Optional)

Ask user first:

"Do you want to create SSH tunnels to test the deployed services?"

If yes:

bash
# Use +20000 port offset to avoid local port conflicts
cd ~/swe/vscode-shims && \
  python launchers/launch_ssh_tunnel_to_m2_tmux.py --verbose \
  --map 28787:8787,29288:9288,28037:8037

Port mapping convention:

  • Remote service ports: 8787, 9288, 8037
  • Local tunnel ports: 28787, 29288, 28037
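
The +20000 offset convention can be computed rather than typed by hand. A minimal sketch; tunnel_map is a hypothetical helper that builds the value passed to --map:

```bash
# tunnel_map PORT...: print a comma-separated LOCAL:REMOTE list where each
# local port is the remote service port plus 20000.
tunnel_map() {
  local offset=20000 port out=""
  for port in "$@"; do
    out="${out}${out:+,}$((port + offset)):${port}"
  done
  printf '%s\n' "$out"
}

tunnel_map 8787 9288 8037   # prints 28787:8787,29288:9288,28037:8037
```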

Verify tunnel:

bash
lsof -nP -iTCP:28787 -sTCP:LISTEN
curl -sS http://127.0.0.1:28787/ | head -3

7. Playwright Browser Testing (Optional)

Ask user first:

"Do you want me to test the services using Playwright?"

If yes:

bash
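# Visit each tunneled service and capture a screenshot
# (Playwright CLI; needs npx and installed Playwright browsers)
npx playwright screenshot http://localhost:28787 /tmp/claude-shim.png  # Claude shim
npx playwright screenshot http://localhost:29288 /tmp/codex-shim.png   # Codex shim
npx playwright screenshot http://localhost:28037 /tmp/agent-hq.png     # Agent HQ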

Check for errors:

  • Red error banners
  • Error text in UI
  • Console errors
  • Take screenshot if errors found

Report format:

  • ✅ "Claude shim accessible, no errors"
  • ❌ "Codex shim error: [Errno 2] No such file or directory: 'node'"
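
The report lines above can be produced mechanically from text captured off a page. The following is only a sketch: report_service is a hypothetical helper, and the error signatures it greps for are assumptions drawn from the failure patterns in this document, not an exhaustive list.

```bash
# report_service NAME: read captured page text on stdin and print one report
# line in the format above. Returns 1 when an error signature is found.
report_service() {
  local name="$1" hit
  # First line that looks like an error (assumed signatures)
  hit="$(grep -E -m1 'Errno|Failed to spawn|Traceback' || true)"
  if [ -n "$hit" ]; then
    printf '❌ "%s error: %s"\n' "$name" "$hit"
    return 1
  fi
  printf '✅ "%s accessible, no errors"\n' "$name"
}
```

Text rendered by JavaScript will not appear in a plain HTTP fetch, so the input should come from the browser session itself.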

8. Rollback Strategy

Before deployment, note:

bash
ssh <HOST> "ls -ld ~/swe/<component>-old-*"
# /Users/m2/swe/vscode-shims-old-20260126

Emergency rollback:

bash
# 1. Kill current services (match on <service>, which appears in the process command line)
ssh <HOST> "pkill -f '[p]ython.*<service>.*server.py'"

# 2. Restore old version
ssh <HOST> "cd ~/swe && mv <component> <component>-broken-\$(date +%Y%m%d) && \
  mv <component>-old-YYYYMMDD <component>"

# 3. Restart services
ssh <HOST> "export PATH=\"/opt/homebrew/bin:\$PATH\" && \
  cd ~/swe/<component> && \
  nohup ~/anaconda3/bin/python src/<service>/server.py > /tmp/<service>.log 2>&1 &"

Keep at least 2 previous versions on remote for safety.
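
To keep the number of old versions bounded without deleting the ones needed for rollback, the candidates for removal can be listed first. A minimal sketch, assuming the <component>-old-YYYYMMDD naming shown above; list_prunable_versions is a hypothetical helper that only prints and never deletes:

```bash
# list_prunable_versions BASE [KEEP]: print every BASE-old-* directory except
# the KEEP newest (default 2). YYYYMMDD suffixes sort correctly as plain text.
list_prunable_versions() {
  local base="$1" keep="${2:-2}" d
  for d in "$base"-old-*; do
    [ -d "$d" ] && printf '%s\n' "$d"
  done | sort -r | tail -n +"$((keep + 1))"
}

# Usage on the remote: list_prunable_versions ~/swe/<component> 2
```

Review the printed list before removing anything.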


9. Common Failure Patterns & Quick Fixes

"Failed to spawn CLI: [Errno 2] No such file or directory: 'node'"

Symptom: Error appears in webview UI, not SSH output
Root Cause: /opt/homebrew/bin not in PATH when Python process starts
Fix: Restart with PATH export (see section 3)
Prevention: Always include export PATH="/opt/homebrew/bin:$PATH"
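
Because this failure only surfaces later in the UI, a check before starting the service can catch it earlier. A sketch, with missing_cmds as a hypothetical helper to run in the same environment (same PATH) that will launch the service:

```bash
# missing_cmds CMD...: print each command that cannot be found on PATH.
# Returns 0 when all are present, 1 otherwise.
missing_cmds() {
  local cmd rc=0
  for cmd in "$@"; do
    command -v "$cmd" >/dev/null 2>&1 || { printf '%s\n' "$cmd"; rc=1; }
  done
  return "$rc"
}

# Usage, after exporting PATH as in the restart pattern:
#   missing_cmds node || echo "node not on PATH: service would fail to spawn the CLI"
```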

"Address already in use"

Symptom: Port binding fails during restart
Root Cause: Old process not killed, or another service using port
Fix:

bash
lsof -i :<PORT> | grep LISTEN
kill <PID>   # SIGTERM first; use kill -9 <PID> only if the process does not exit

Prevention: Use pkill before starting new service
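
A gentler variant of the fix sends SIGTERM first and escalates only if the process is still alive after a grace period. A minimal sketch; stop_pid is a hypothetical helper:

```bash
# stop_pid PID [GRACE]: send SIGTERM, wait up to GRACE seconds (default 5),
# then send SIGKILL if the process is still there.
stop_pid() {
  local pid="$1" grace="${2:-5}" i=0
  # Nothing to do if the signal cannot be delivered (no such process, or not ours)
  kill "$pid" 2>/dev/null || return 0
  while [ "$i" -lt "$grace" ]; do
    kill -0 "$pid" 2>/dev/null || return 0   # process has exited
    sleep 1
    i=$((i + 1))
  done
  kill -9 "$pid" 2>/dev/null
  return 0
}
```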

"Module not found" or "Import error"

Symptom: Python import failures in logs
Root Cause: Using system Python instead of Anaconda
Fix: Restart with ~/anaconda3/bin/python
Prevention: Always use full Anaconda Python path

"Permission denied"

Symptom: Cannot execute files after transfer
Root Cause: File permissions not preserved
Fix:

bash
ssh <HOST> "chmod +x ~/swe/<component>/launchers/*.sh"

Prevention: Use rsync -a to preserve permissions


Component-Specific Configurations

vscode-shims

Critical .env variables:

  • SHIM_HOST / CODEX_SHIM_HOST (network binding)
  • SHIM_PORT=8787 / CODEX_SHIM_PORT=9288
  • SHIM_DEFAULT_CWD (working directory)
  • CLAUDE_CODE_CLI / CODEX_CLI (CLI paths)

Restart both services:

bash
# Claude shim. Stop and start are separate ssh calls: in one command line,
# pkill -f would match the shell running it, and && would skip the start
# whenever no old process exists (for example after a crash).
ssh <HOST> "pkill -f '[p]ython.*claude.*server.py'"; sleep 2
ssh <HOST> "export PATH=\"/opt/homebrew/bin:\$PATH\" && \
  cd ~/swe/vscode-shims && \
  nohup ~/anaconda3/bin/python src/claude/server.py > /tmp/claude-shim.log 2>&1 &"

# Codex shim
ssh <HOST> "pkill -f '[p]ython.*codex.*server.py'"; sleep 2
ssh <HOST> "export PATH=\"/opt/homebrew/bin:\$PATH\" && \
  cd ~/swe/vscode-shims && \
  nohup ~/anaconda3/bin/python src/codex/server.py > /tmp/codex-shim.log 2>&1 &"

Verify:

bash
ssh <HOST> "lsof -i :8787 -i :9288 | grep LISTEN"

Unique requirements:

  • Node.js in PATH (for spawning Claude CLI)
  • Anaconda Python 3.10+ (for | union syntax in codex/server.py)
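
The 3.10+ requirement is easy to get wrong with a plain string comparison, since "3.9" sorts after "3.10" as text. A sketch of a numeric check on the output of python --version; python_at_least_310 is a hypothetical helper:

```bash
# python_at_least_310 "Python X.Y.Z": succeed when the version is 3.10 or newer.
python_at_least_310() {
  local ver major minor
  ver="${1#Python }"        # drop the leading "Python "
  major="${ver%%.*}"        # text before the first dot
  minor="${ver#*.}"         # text after the first dot...
  minor="${minor%%.*}"      # ...up to the next dot
  [ "$major" -gt 3 ] || { [ "$major" -eq 3 ] && [ "$minor" -ge 10 ]; }
}

# Usage:
#   python_at_least_310 "$(ssh <HOST> '~/anaconda3/bin/python --version')" \
#     || echo "Python too old for | union syntax"
```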

Agent HQ (Future)

Critical .env variables:

  • AMS_TMUX_PORT or AGENT_MGMT_PORT
  • AGENT_HQ_UI_PORT or VITE_PORT
  • Network binding settings

Restart pattern:

bash
# Must source .env before starting
ssh <HOST> "export PATH=\"/opt/homebrew/bin:\$PATH\" && \
  cd ~/AgenticProjects/agent-box-v1 && \
  set -a && source .env && set +a && \
  cd apps/agent-hq-ui && \
  npm run dev:web -- --port 8037 --host 127.0.0.1"

Unique requirements:

  • npm dependencies installed
  • NEVER run Electron remotely (only web version)
  • Must source .env before starting vite

Other Components

As ecosystem grows, add component-specific sections here:

  • Telemetry projects (SQLite ingestors)
  • Custom launchers
  • Background services

Quick Reference: Remote Registry

| Remote | SHIM_HOST | Access Method | Notes |
| --- | --- | --- | --- |
| cm3u | 192.168.1.9 | Direct LAN | Mac Studio, direct access |
| cm2 | 127.0.0.1 | SSH tunnel | MacBook Pro, tunnel required |

Tunnel port mapping:

  • cm2:8787 → localhost:28787
  • cm2:9288 → localhost:29288
  • cm2:8037 → localhost:28037
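
The registry can also be kept in a script so that the expected SHIM_HOST for a remote is checked rather than remembered. A sketch using the two remotes listed above; shim_host_for is a hypothetical helper:

```bash
# shim_host_for REMOTE: print the SHIM_HOST value expected on that remote.
shim_host_for() {
  case "$1" in
    cm3u) echo "192.168.1.9" ;;   # direct LAN access
    cm2)  echo "127.0.0.1" ;;     # reachable through an SSH tunnel only
    *)    echo "unknown remote: $1" >&2; return 1 ;;
  esac
}

# Usage: compare against the deployed value, e.g.
#   [ "$(ssh cm2 'grep "^SHIM_HOST=" ~/swe/vscode-shims/.env | cut -d= -f2')" = "$(shim_host_for cm2)" ]
```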

Workflow Summary

  1. Safety check → backup .env, check running services
  2. Incremental update → fetch, diff, pull (or rsync code only)
  3. Preserve configs → restore .env from backup
  4. Safe restart → pkill, export PATH, Anaconda Python, nohup
  5. Verify → ports listening, processes correct, logs clean, curl test
  6. Optional: Tunnel → ask user, launch with +20000 offset
  7. Optional: Test → ask user, Playwright visit mapped ports
  8. Document → update deployment_logs
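
When the workflow is scripted, each step should stop the run if it fails, so that a failed backup never leads into a pull and restart. A minimal sketch of that chaining; run_step is a hypothetical helper and the commands in the usage comment are placeholders:

```bash
# run_step NAME CMD...: announce the step, run CMD, and report a failure
# with a non-zero status so that an && chain stops at the first broken step.
run_step() {
  local name="$1"
  shift
  printf '==> %s\n' "$name"
  "$@" || { printf 'FAILED: %s\n' "$name" >&2; return 1; }
}

# Usage:
#   run_step "backup .env" ssh <HOST> "cd ~/swe/<component> && cp .env .env.backup-\$(date +%Y%m%d)" &&
#   run_step "fetch"       ssh <HOST> "cd ~/swe/<component> && git fetch origin"
```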

Created by: Claude [e8fa7e09-6d5f-40f1-89df-3afe03f29ca1]
Date: 2026-01-26
Pairs with: agentic-ecosystem-remote-deployment