AgentSkillsCN

codex-subagent-orchestrator

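Launch, monitor, and coordinate asynchronous Codex CLI subagents from the current Codex macOS app session. Use it when users need parallel Codex workers, long-running background execution, live fleet monitoring, batch orchestration from JSON or JSONL task files, cancellation, log tailing, adaptive or event-driven waiting, and standardized end-of-run artifacts (model, context limit, token usage, duration, and status check-ins).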

SKILL.md
--- frontmatter
name: codex-subagent-orchestrator
description: Launch, monitor, and coordinate asynchronous Codex CLI subagents from the current Codex macOS app session. Use when users need parallel Codex workers, long-running background execution, fleet monitoring, batch orchestration from JSON or JSONL task files, cancellation, log tailing, adaptive or event-driven waiting, and standardized end-of-run artifacts (model, context limit, token usage, duration, and status check-ins).
compatibility: Requires Python 3 and Codex CLI on PATH. Event-mode waiting uses macOS kqueue when available and falls back to timed polling otherwise.

Codex Subagent Orchestrator

Overview

Use this skill to run codex exec workers asynchronously, keep persistent state per job, and orchestrate large batches without blocking the current Codex app conversation.

The orchestrator emits structured end-of-run artifacts for each job, so model and runtime metadata can be reported and reproduced.

Defaults for Model and Reasoning

  • Default model: inferred from the current session environment when available (CODEX_SESSION_MODEL, CODEX_MODEL, OPENAI_MODEL); otherwise falls back to the model key in ~/.codex/config.toml.
  • Default reasoning effort: inferred from the session environment when available; otherwise falls back to the model_reasoning_effort key in ~/.codex/config.toml.
  • Override at spawn time:
    • --model <name>
    • --reasoning-effort <low|medium|high|...>
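
The fallback order above can be sketched as a small lookup. This is only an illustration, not the orchestrator's actual code: the function name, the source labels such as "env:..." and the assumption that the environment variables are tried in the order listed are all hypothetical, and the config is passed in as an already-parsed dict.

```python
# Hypothetical sketch of the default-model lookup described above.
# Assumption: env vars are checked in the order they are listed.
MODEL_ENV_VARS = ("CODEX_SESSION_MODEL", "CODEX_MODEL", "OPENAI_MODEL")


def resolve_default_model(env, config):
    """Return (model, source) using session env first, then config.

    env:    mapping such as os.environ
    config: dict parsed from ~/.codex/config.toml (parsing not shown)
    """
    for name in MODEL_ENV_VARS:
        value = env.get(name, "").strip()
        if value:
            return value, f"env:{name}"
    value = config.get("model")
    if value:
        return value, "config:model"
    # Nothing found: report that the source is unknown.
    return None, "unknown"
```

An explicit --model at spawn time would bypass this lookup entirely.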

Runtime Projection Input (Required for Smart Auto-Wait)

wait --interval-mode auto no longer guesses from prompt size. The orchestrating model must provide an explicit runtime projection when spawning jobs:

  • --projected-runtime-seconds <seconds> for spawn
  • projected_runtime_seconds per task (or batch default) for batch

Recommended TPS anchors for projection planning:

  • Codex family: ~70 tokens/second
  • Codex Spark family: ~1000 tokens/second
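
As a rough illustration of how the TPS anchors might feed --projected-runtime-seconds, the sketch below divides an estimated output-token count by the anchor rate. The family keys and the function name are hypothetical, and the token estimate is whatever the orchestrating model expects the worker to generate; startup and tool-call latency are not modeled.

```python
import math

# TPS anchors from the list above; the dictionary keys are illustrative.
TPS_ANCHORS = {"codex": 70.0, "codex-spark": 1000.0}


def project_runtime_seconds(expected_output_tokens, family):
    """Estimate runtime as tokens / tokens-per-second, rounded up."""
    tps = TPS_ANCHORS[family]
    return math.ceil(expected_output_tokens / tps)
```

For example, a task expected to produce about 7000 tokens would project to roughly 100 seconds on the Codex family and 7 seconds on Codex Spark.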

End-of-Run Artifact Contract

Each completed job can generate artifacts.json containing:

  • Model details: requested, effective, source
  • Reasoning effort: requested, effective, source
  • Context limit: context.limit_tokens and source
  • Token usage breakdown: input, cached input, output, reasoning output, total
  • Duration: run_duration_seconds
  • Paths: logs, final message, check-ins
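
A consumer of artifacts.json might reduce it to a one-line summary along these lines. Only context.limit_tokens and run_duration_seconds are named in this document; the other keys used here ("model", "token_usage", "effective", "total") are assumptions about the layout and should be checked against a real artifacts.json.

```python
def summarize_artifacts(artifacts):
    """Pick a few headline fields out of a parsed artifacts.json dict.

    Key names other than context.limit_tokens and run_duration_seconds
    are assumed for illustration.
    """
    model = artifacts.get("model", {})
    context = artifacts.get("context", {})
    usage = artifacts.get("token_usage", {})
    limit = context.get("limit_tokens")
    return {
        "model": model.get("effective"),
        # A null limit means it was neither logged nor passed explicitly.
        "context_limit": limit if limit is not None else "unknown",
        "total_tokens": usage.get("total"),
        "duration_s": artifacts.get("run_duration_seconds"),
    }
```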

How to generate/report:

  1. Launch with JSON event logging enabled (default):
    • spawn and batch default to --json-events
  2. Wait for completion with artifact emission (default):
    • wait ... emits artifact summary lines and writes artifacts.json
  3. Read full report:
    • python3 "<path-to-skill>/scripts/subagent_fleet.py" artifacts <job-id> --json

Wait Strategy (Adaptive or Event-Driven)

Avoid tight fixed polling loops by default:

  • wait --interval-mode auto (default)
    • Uses only model-provided runtime projections from spawn/batch metadata.
    • If projections are missing, falls back to base interval behavior (no prompt-size heuristics).
  • wait --interval-mode event
    • Uses filesystem notifications (kqueue on macOS) to wake on job-dir updates.
    • Falls back gracefully if event backend is unavailable.
  • wait --interval-mode fixed --interval N
    • Use only when deterministic polling cadence is required.
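
The event mode described above can be approximated with Python's select.kqueue, falling back to a timed sleep where kqueue does not exist. This is a minimal sketch under that assumption, not the orchestrator's implementation; the function name and return labels are hypothetical.

```python
import os
import select
import time


def wait_for_job_dir_update(job_dir, timeout):
    """Block until job_dir changes or timeout (seconds) elapses.

    Returns "event" or "timeout" when kqueue is available,
    and "polled" when falling back to a plain sleep.
    """
    if not hasattr(select, "kqueue"):
        # Fallback: no event backend, so just wait one polling interval.
        time.sleep(timeout)
        return "polled"

    fd = os.open(job_dir, os.O_RDONLY)
    kq = select.kqueue()
    try:
        change = select.kevent(
            fd,
            filter=select.KQ_FILTER_VNODE,
            flags=select.KQ_EV_ADD | select.KQ_EV_CLEAR,
            # Directory entries added/removed, or attributes changed.
            fflags=select.KQ_NOTE_WRITE
            | select.KQ_NOTE_EXTEND
            | select.KQ_NOTE_ATTRIB,
        )
        events = kq.control([change], 1, timeout)
        return "event" if events else "timeout"
    finally:
        kq.close()
        os.close(fd)
```

A caller would typically loop on this, re-reading job state after each wake-up, so a missed or spurious event only costs one extra check.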

Status check-ins are recorded by default to <job-dir>/status_checkins.tsv.

Quick Start

  1. Verify environment:
    • python3 "<path-to-skill>/scripts/subagent_fleet.py" doctor
  2. Launch a worker with explicit model controls:
    • python3 "<path-to-skill>/scripts/subagent_fleet.py" spawn --label "schema-audit" --cwd "/path/to/repo" --model "gpt-5.3-codex" --reasoning-effort "medium" --full-auto --sandbox workspace-write --prompt "Audit schema drift and report findings only."
  3. Wait (event-driven) and collect artifacts:
    • python3 "<path-to-skill>/scripts/subagent_fleet.py" wait --interval-mode event <job-id>
    • python3 "<path-to-skill>/scripts/subagent_fleet.py" artifacts <job-id> --json

Workflow

  1. Define independent prompts that can run in parallel.
  2. Launch each unit with spawn or submit many at once with batch.
  3. Track health with list and inspect one job with status.
  4. Use wait (auto or event mode) to block until jobs finish with minimal polling overhead.
  5. Read artifacts.json for model/context/tokens/duration/check-ins.
  6. Aggregate each final_message.txt into a parent synthesis.
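
The final aggregation step could look roughly like the following. It assumes each job directory sits directly under the state root and is named after its job id, which this document does not confirm; both function names are illustrative.

```python
from pathlib import Path


def collect_final_messages(state_root):
    """Map job-dir name -> contents of its final_message.txt.

    Assumes a layout of <state_root>/<job-id>/final_message.txt.
    """
    messages = {}
    for path in sorted(Path(state_root).glob("*/final_message.txt")):
        messages[path.parent.name] = path.read_text(encoding="utf-8").strip()
    return messages


def synthesize(messages):
    """Join per-job messages into one text block for the parent session."""
    return "\n\n".join(
        f"[{job_id}]\n{text}" for job_id, text in messages.items()
    )
```

Jobs that have not finished simply have no final_message.txt yet and are skipped.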

Command Guide

spawn

Launch one async subagent and return immediately.

  • Prompt input: --prompt, --prompt-file, or stdin pipe
  • Model/runtime controls:
    • --model
    • --reasoning-effort
    • --context-limit-tokens (optional explicit report value)
    • --projected-runtime-seconds (model-provided runtime estimate for auto wait cadence)
  • Other controls:
    • --cwd, --profile, --sandbox, --full-auto
    • --dangerously-bypass-approvals-and-sandbox requires explicit --allow-dangerous
    • --add-dir (repeatable)
    • --env KEY=VALUE (repeatable)

batch

Launch many jobs from JSON or JSONL.

  • Required: --file <tasks.json|tasks.jsonl>
  • Supports per-task overrides for model/reasoning/context-limit/projection
  • Tasks that set dangerously_bypass_approvals_and_sandbox are rejected unless batch --allow-dangerous is set
  • Throttling:
    • --max-running N
    • --interval: interval between throttling checks
    • --launch-interval: pause between consecutive launches
  • See task schema: references/task-file-format.md
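
To make the batch rules concrete, here is a hedged sketch of loading a JSONL task file, applying a batch-level projection default, and rejecting dangerous tasks. Only projected_runtime_seconds and dangerously_bypass_approvals_and_sandbox are field names taken from this document; any other task keys, the function name, and the error wording are assumptions, and the authoritative schema is in references/task-file-format.md.

```python
import json


def load_tasks(jsonl_text, default_projection=None, allow_dangerous=False):
    """Parse JSONL tasks, one JSON object per non-empty line."""
    tasks = []
    for line_no, line in enumerate(jsonl_text.splitlines(), start=1):
        if not line.strip():
            continue
        task = json.loads(line)
        # Mirror the rule above: dangerous tasks need an explicit opt-in.
        if task.get("dangerously_bypass_approvals_and_sandbox") and not allow_dangerous:
            raise ValueError(
                f"line {line_no}: task requires --allow-dangerous"
            )
        # A per-task projection wins; otherwise use the batch default.
        task.setdefault("projected_runtime_seconds", default_projection)
        tasks.append(task)
    return tasks
```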

list / status

  • list shows fleet overview with duration and effective model
  • status <job-id> shows metadata and can include artifact summary

logs

  • logs <job-id> --stream stdout|stderr --lines 80
  • Add --follow for live tail while running

wait

  • wait <job-id ...> for targeted jobs
  • wait with no ids tracks all known jobs
  • Defaults:
    • Adaptive/event-aware wait strategy
    • Status check-ins enabled
    • Artifact generation enabled

artifacts

  • artifacts <job-id> prints final execution report
  • Add --json for machine-readable output

cancel

  • cancel <job-id ...> to stop specific workers
  • cancel --all to stop all running workers
  • --force-after <seconds> escalates from SIGTERM to SIGKILL
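
The SIGTERM-to-SIGKILL escalation behind --force-after follows a common pattern, sketched below with subprocess.Popen. The real orchestrator presumably works from recorded process ids rather than a Popen handle, so treat the function name and return labels as illustrative only.

```python
import subprocess


def stop_process(proc, force_after):
    """Ask proc to exit, then force-kill if it outlives force_after seconds."""
    if proc.poll() is not None:
        return "already-exited"
    proc.terminate()  # sends SIGTERM on POSIX
    try:
        proc.wait(timeout=force_after)
        return "terminated"
    except subprocess.TimeoutExpired:
        proc.kill()  # sends SIGKILL on POSIX
        proc.wait()
        return "killed"
```

Giving the worker a grace period lets it flush logs and completion markers before it is killed outright.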

doctor

Check codex binary availability, state-dir health, and default model/reasoning sources.

Operational Notes

  • The default state root is .codex-subagents under the current working directory.
  • Each job writes meta.json, logs, prompt, final message, completion markers, and check-ins.
  • artifacts.json is generated from run metadata + JSON event logs.
  • If the context limit cannot be discovered from the logs and is not provided explicitly, the artifact value is null and its source is unknown.
  • Prefer wait --interval-mode event or auto for long tasks to avoid aggressive polling.