Swarm Coordination
Use this skill when orchestrating a multi-agent swarm using Vers VMs. It combines the Vers swarm tools (from pi-v) with the agent services tools (board, feed, registry) into a unified coordination workflow.
The Full Toolkit
You have access to three categories of tools:
Vers VM & Swarm Tools (from pi-v)
- vers_vm_create, vers_vm_commit, vers_vm_restore: VM lifecycle
- vers_swarm_spawn: Branch VMs and start pi agents
- vers_swarm_task: Send tasks to agents
- vers_swarm_status: Check agent status
- vers_swarm_read: Read agent output
- vers_swarm_wait: Block until agents finish
- vers_swarm_teardown: Destroy swarm VMs
Board Tools (coordination)
- board_create_task: Track work items
- board_list_tasks: Query task status
- board_update_task: Update status/assignee
- board_add_note: Attach findings, blockers, questions
Feed & Registry Tools (observability)
- feed_publish: Emit events for audit trail
- feed_list: Query activity history
- feed_stats: Get summary metrics
- registry_register: Register VMs for discovery
- registry_discover: Find agents by role
- registry_heartbeat: Keep registrations alive
- registry_list: List all registered VMs
SkillHub (fleet-wide skill management)
The infra server includes a SkillHub at /skills/* for centrally managing skills and extensions across all agents. The extension automatically syncs skills from the hub to ~/.pi/agent/skills/_hub/ on session start and subscribes to SSE for live updates. This is handled transparently — agents don't need to interact with SkillHub directly unless publishing new skills.
Environment Variables Agents Receive
When spawned via vers_swarm_spawn, agents get these env vars:
- ANTHROPIC_API_KEY: for LLM calls
- VERS_API_KEY: for Vers API
- VERS_INFRA_URL: coordination services URL
- VERS_AUTH_TOKEN: auth token for coordination services
- VERS_VM_ID: this VM's ID (for self-registration)
- VERS_AGENT_ROLE: agent's role (worker, lieutenant, etc.)
- VERS_AGENT_NAME: agent's label/name
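As a rough illustration of how a process inside a spawned VM might consume these variables, the sketch below reads them into a small config object and fails early if any are missing. The variable names come from the list above; `AgentEnv` and `load_agent_env` are hypothetical helpers, not part of pi-v or the agent-services extension.

```python
import os
from dataclasses import dataclass
from typing import Mapping

# Names taken from the list above. Everything else here is illustrative.
REQUIRED_VARS = (
    "ANTHROPIC_API_KEY",
    "VERS_API_KEY",
    "VERS_INFRA_URL",
    "VERS_AUTH_TOKEN",
    "VERS_VM_ID",
    "VERS_AGENT_ROLE",
    "VERS_AGENT_NAME",
)


@dataclass(frozen=True)
class AgentEnv:
    """Hypothetical container for the coordination-related settings."""
    infra_url: str
    auth_token: str
    vm_id: str
    role: str
    name: str


def load_agent_env(environ: Mapping[str, str] = os.environ) -> AgentEnv:
    """Read the swarm variables, raising if any are unset or empty."""
    missing = [var for var in REQUIRED_VARS if not environ.get(var)]
    if missing:
        raise RuntimeError("missing swarm env vars: " + ", ".join(missing))
    return AgentEnv(
        infra_url=environ["VERS_INFRA_URL"].rstrip("/"),
        auth_token=environ["VERS_AUTH_TOKEN"],
        vm_id=environ["VERS_VM_ID"],
        role=environ["VERS_AGENT_ROLE"],
        name=environ["VERS_AGENT_NAME"],
    )
```

Failing fast on a missing variable makes a misconfigured spawn visible immediately, rather than surfacing later as an agent that never appears in the registry.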
The agent-services extension automatically:
- Publishes agent_started / agent_stopped to the feed
- Registers the agent in the registry on startup (using VERS_VM_ID)
- Sends periodic heartbeats (every 60s) to keep the registry entry alive
- Updates registry status to stopped on shutdown
This means you don't need to manually register swarm agents — they self-register the moment they boot. You only need to register VMs created outside the extension (e.g., infra VMs).
Recommended Workflow
Follow this sequence when spinning up a coordinated swarm:
1. Assess the Environment
registry_list {} # What's already running?
board_list_tasks { status: "open" } # Any pending work?
feed_list { limit: 10 } # Recent activity?
2. Create Board Tasks
Break the work into discrete, parallelizable tasks. Each task should be independently completable.
board_create_task {
  title: "Implement user auth module",
  description: "Create JWT-based auth with refresh tokens. Files: src/auth/...",
  tags: ["feature", "auth", "sprint-1"],
  createdBy: "coordinator"
}
Task decomposition rules:
- Each task should take 5–20 minutes for an agent
- Include specific file paths and acceptance criteria in the description
- Tag tasks for the feature area they belong to
- If tasks have dependencies, note them in the description (don't rely on ordering)
3. Spawn the Swarm
vers_swarm_spawn {
  commitId: "<golden-image-commit>",
  count: 3,
  labels: ["auth", "api", "tests"],
  anthropicApiKey: "<key>"
}
4. Verify Agent Registration
Agents self-register automatically on boot (via the extension). Verify they've checked in:
registry_list { status: "running", role: "worker" }
If an agent hasn't appeared after ~10s, check feed_list for its agent_started event or vers_swarm_status for errors.
5. Assign Tasks
Update board tasks with assignees and dispatch work:
board_update_task { id: "<task-id>", status: "in_progress", assignee: "worker-auth" }
vers_swarm_task { agentId: "auth", task: "..." }
In the task prompt, tell the agent:
- What board task ID it's working on
- To use feed_publish for progress updates
- To use board_add_note for findings/blockers
- To update the task status when done
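To keep those four instructions from being forgotten in individual prompts, a coordinator script could assemble the prompt from a template. The following is a minimal sketch; `build_task_prompt` is a hypothetical helper, and the exact wording of the footer is an assumption rather than a format the agents require.

```python
def build_task_prompt(task_id: str, instructions: str) -> str:
    """Append the standard coordination footer to a task's instructions."""
    lines = [
        instructions.strip(),
        "",
        f"Board task ID: {task_id}",
        "Use feed_publish for progress updates.",
        "Use board_add_note for findings and blockers.",
        f"When finished, update board task {task_id} to status done.",
    ]
    return "\n".join(lines)
```

The resulting string would be passed as the `task` argument of vers_swarm_task.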
6. Monitor Progress
Poll the feed and board while agents work:
feed_list { limit: 20 } # Check for blockers or questions
board_list_tasks { status: "blocked" } # Any stuck tasks?
vers_swarm_status {} # Are agents still working?
If an agent reports a blocker:
- Check the board note for details
- Steer the agent or reassign the task
- Publish a feed event about the resolution
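The monitoring loop can be sketched as a small polling function. This is only an outline under stated assumptions: the three callables are hypothetical stand-ins for board_list_tasks { status: "blocked" }, vers_swarm_status, and whatever blocker handling you choose, and the assumption that each blocked task carries an `id` field is not confirmed by the tools themselves.

```python
import time
from typing import Callable, Dict, List


def poll_until_idle(
    list_blocked: Callable[[], List[Dict]],
    swarm_busy: Callable[[], bool],
    on_blocker: Callable[[Dict], None],
    interval_s: float = 15.0,
    max_polls: int = 40,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Poll for blocked tasks until the swarm is idle or max_polls is hit.

    Each blocked task is handed to on_blocker once. Returns the number of
    distinct blocked tasks seen.
    """
    seen = set()
    for _ in range(max_polls):
        for task in list_blocked():
            if task["id"] not in seen:
                seen.add(task["id"])
                on_blocker(task)
        if not swarm_busy():
            break
        sleep(interval_s)
    return len(seen)
```

Injecting the callables and `sleep` keeps the loop testable without a live swarm; the interval and poll cap are arbitrary defaults.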
7. Collect Results
vers_swarm_wait { timeoutSeconds: 600 } # Wait for all agents
Then for each agent:
vers_swarm_read { agentId: "auth" }
board_update_task { id: "<task-id>", status: "done" }
feed_publish { agent: "coordinator", type: "task_completed", summary: "Auth module complete" }
8. Clean Up
⚠️ vers_swarm_teardown destroys ALL swarm VMs. The infra VM must NOT be part of the swarm — see Pitfalls below.
vers_swarm_teardown {}
feed_publish { agent: "coordinator", type: "agent_stopped", summary: "Swarm teardown complete" }
Recovery Pattern
If a coordinator session drops (disconnect, crash, compaction), follow the recovery skill (skills/recovery/SKILL.md) for the full protocol. Quick summary:
- registry_list { status: "running" }: find active VMs
- board_list_tasks {}: find all work items by status
- feed_list { limit: 50 }: understand what happened
- Cross-reference: are in_progress task assignees still in the registry?
- Re-assign orphaned tasks, resolve blockers, spawn replacements
- feed_publish a recovery event
Since agents auto-register and heartbeat, the registry is your source of truth for what's alive. VMs missing heartbeats for 5+ min are stale and excluded from registry_discover.
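The cross-reference step and the five-minute staleness rule can be sketched together. The field names used here (`name`, `status`, `lastHeartbeat`, `assignee`) are assumptions about the shape of registry and board records, so adapt them to what registry_list and board_list_tasks actually return.

```python
from datetime import datetime, timedelta
from typing import Dict, List

# Threshold from the text above: 5+ minutes without a heartbeat.
STALE_AFTER = timedelta(minutes=5)


def is_stale(last_heartbeat: datetime, now: datetime) -> bool:
    """True when the last heartbeat is at least STALE_AFTER old."""
    return now - last_heartbeat >= STALE_AFTER


def find_orphaned_tasks(
    tasks: List[Dict], registry_entries: List[Dict], now: datetime
) -> List[Dict]:
    """Return in_progress tasks whose assignee is not a live registry entry."""
    alive = {
        entry["name"]
        for entry in registry_entries
        if entry["status"] == "running"
        and not is_stale(entry["lastHeartbeat"], now)
    }
    return [
        task
        for task in tasks
        if task["status"] == "in_progress" and task.get("assignee") not in alive
    ]
```

Tasks returned by `find_orphaned_tasks` are the candidates for reassignment or a replacement agent.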
Conventions
Agent Naming
- Coordinator: coordinator or coordinator-<project>
- Lieutenants: lt-<domain> (e.g., lt-backend, lt-frontend)
- Workers: worker-<label> (e.g., worker-auth, worker-api)
- Names should be stable across sessions for feed/board traceability
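One way to keep names stable is to derive them from role and label instead of typing them by hand. The sketch below encodes the prefixes listed above; the helper names are hypothetical, and the restriction to lowercase letters, digits, and hyphens is an assumption added for illustration.

```python
import re
from typing import Optional

# Prefixes from the naming convention above.
PREFIXES = {"lieutenant": "lt", "worker": "worker"}

# Assumption: labels are lowercase alphanumerics with optional hyphens.
_NAME_RE = re.compile(
    r"coordinator(-[a-z0-9][a-z0-9-]*)?|(lt|worker)-[a-z0-9][a-z0-9-]*"
)


def agent_name(role: str, label: Optional[str] = None) -> str:
    """Build a conventional agent name from a role and label."""
    if role == "coordinator":
        return f"coordinator-{label}" if label else "coordinator"
    if role not in PREFIXES:
        raise ValueError(f"unknown role: {role}")
    if not label:
        raise ValueError(f"role {role} requires a label")
    return f"{PREFIXES[role]}-{label}"


def is_conventional(name: str) -> bool:
    """Check whether a name follows the convention."""
    return _NAME_RE.fullmatch(name) is not None
```

For example, `agent_name("worker", "auth")` yields `worker-auth`, matching the assignee used in the workflow steps.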
Task Tagging
- Feature area: auth, api, ui, infra
- Task type: feature, bugfix, test, refactor, docs
- Priority: p0, p1, p2
- Sprint/batch: sprint-1, batch-2
Feed Event Types
Use consistently across all agents:
| Type | When |
|---|---|
| agent_started | Agent begins work (auto-published by extension) |
| agent_stopped | Agent finishes (auto-published by extension) |
| task_started | Agent picks up a specific task |
| task_completed | Task finished successfully |
| task_failed | Task failed (include error in detail) |
| blocker_found | Agent hit a blocker (add board note too) |
| question | Agent needs human/coordinator input |
| finding | Agent discovered something noteworthy |
| file_changed | Significant file modification |
| cost_update | Token/cost usage update |
| custom | Anything else |
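To keep event types consistent, a wrapper can reject anything outside the table before publishing. This is a sketch only: `feed_event` is a hypothetical helper, the `agent`, `type`, and `summary` fields mirror the feed_publish calls shown in this document, and treating `detail` as an optional string field is an assumption.

```python
from typing import Dict, Optional

# The event types from the table above.
EVENT_TYPES = frozenset({
    "agent_started", "agent_stopped", "task_started", "task_completed",
    "task_failed", "blocker_found", "question", "finding",
    "file_changed", "cost_update", "custom",
})


def feed_event(
    agent: str, type_: str, summary: str, detail: Optional[str] = None
) -> Dict[str, str]:
    """Build a feed_publish payload, rejecting unknown event types."""
    if type_ not in EVENT_TYPES:
        raise ValueError(f"unknown feed event type: {type_}")
    event = {"agent": agent, "type": type_, "summary": summary}
    if detail is not None:
        event["detail"] = detail
    return event
```

A typo such as `task_complete` then fails at the call site instead of silently fragmenting the audit trail.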
Board Notes
- finding: Code insights, patterns discovered, things to know
- blocker: Something preventing progress; always needs attention
- question: Needs coordinator or human answer
- update: Status update or progress checkpoint
Pitfalls
Never put the infra VM in the swarm pool
vers_swarm_teardown destroys ALL VMs that were created by vers_swarm_spawn. If you spawned the infra VM (running agent-services) through the swarm, teardown kills your coordination layer. Always create the infra VM separately via vers_vm_create or vers_vm_restore — never through vers_swarm_spawn.
Infra VM setup is a direct operation
Since the infra VM can't be in the swarm, it's fine to set it up directly via vers_vm_use. This is the one exception to the "never work on VMs directly" rule — you can't delegate infra setup to an agent that depends on the infra you're setting up.
Golden images go stale
Application code baked into golden images drifts from main after merges. After merging changes (especially security patches), either rebuild the golden image or patch in place on restored VMs. Keep golden images minimal — base tooling (node, pi, git) only, not application code that changes frequently.
Swarm agents may ignore inline source in task prompts
When sending large source files inline in a task prompt, agents on golden images may find existing (stale) code at the expected paths and use that instead of writing what you provided. Be explicit: "delete the existing directory first" or verify the deployed code matches after the agent finishes.
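One possible way to verify that deployed code matches what you sent is to compare content hashes per file. The sketch below assumes you can obtain the deployed file contents as strings (for instance by asking the agent to print them); how you fetch them is outside this example, and `find_mismatches` is a hypothetical helper.

```python
import hashlib
from typing import List, Mapping


def sha256_text(source: str) -> str:
    """Hex SHA-256 digest of a source string, encoded as UTF-8."""
    return hashlib.sha256(source.encode("utf-8")).hexdigest()


def find_mismatches(
    expected: Mapping[str, str], deployed: Mapping[str, str]
) -> List[str]:
    """Return sorted paths whose deployed content is missing or differs.

    Both mappings go from file path to file content.
    """
    return sorted(
        path
        for path, source in expected.items()
        if path not in deployed
        or sha256_text(deployed[path]) != sha256_text(source)
    )
```

An empty result suggests the agent wrote what you provided; any listed path is a candidate for stale golden-image code.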
All VM ports are public — always use auth
Vers VMs have no firewall. Every port is reachable at https://{vmId}.vm.vers.sh:{port}. Always start agent-services with VERS_AUTH_TOKEN set, and pass the same token to all worker VMs via environment variable.
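A small helper makes the public URL pattern and the token requirement explicit. The host pattern is taken from the text above; sending the token as an `Authorization: Bearer` header is an assumption about how agent-services expects it, so check the service before relying on it.

```python
from typing import Dict


def vm_url(vm_id: str, port: int, path: str = "/") -> str:
    """Public URL for a service listening on a Vers VM port."""
    if not path.startswith("/"):
        path = "/" + path
    return f"https://{vm_id}.vm.vers.sh:{port}{path}"


def auth_headers(token: str) -> Dict[str, str]:
    """Request headers carrying the shared token (Bearer scheme assumed)."""
    if not token:
        raise ValueError("VERS_AUTH_TOKEN must be set; VM ports are public")
    return {"Authorization": f"Bearer {token}"}
```

Refusing to build headers from an empty token guards against accidentally calling an unauthenticated service.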
Web UI for monitoring
The infra VM serves a dashboard at /ui/ with a 3-panel view of board, feed (live SSE), and registry. Use POST /auth/magic-link to generate a browser-accessible auth link. Useful for human operators monitoring the swarm.
Example: Full Coordinated Build
# 1. Check environment
registry_list {}
board_list_tasks {}
# 2. Plan work
board_create_task { title: "Build auth service", tags: ["feature", "auth"], createdBy: "coordinator" }
board_create_task { title: "Build API routes", tags: ["feature", "api"], createdBy: "coordinator" }
board_create_task { title: "Write integration tests", tags: ["test"], createdBy: "coordinator" }
# 3. Spawn agents
vers_swarm_spawn { commitId: "abc123", count: 3, labels: ["auth", "api", "tests"], anthropicApiKey: "sk-..." }
# 4. Verify agents self-registered (automatic via extension)
registry_list { status: "running", role: "worker" } # Should show 3 agents
# 5. Assign and dispatch
board_update_task { id: "task-1", status: "in_progress", assignee: "worker-auth" }
vers_swarm_task { agentId: "auth", task: "Build the auth service. Task ID: task-1. Use feed_publish for updates." }
# ... repeat for other agents
# 6. Monitor
feed_list { limit: 20 }
board_list_tasks { status: "blocked" }
# 7. Collect
vers_swarm_wait { timeoutSeconds: 600 }
# 8. Wrap up
board_update_task { id: "task-1", status: "done" }
board_update_task { id: "task-2", status: "done" }
board_update_task { id: "task-3", status: "done" }
vers_swarm_teardown {}