AgentSkillsCN

label

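Automatically labels frames with bounding boxes in one of four modes: CUA+SAM (best accuracy; combines OpenAI CUA clicks with SAM segmentation), Gemini (native bounding box detection), GPT Vision (API fallback), or Codex Vision subagents (no API key required). Supports parallel dispatch via Git worktrees. Recommended for use after frame collection is complete.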

SKILL.md
--- frontmatter
name: label
description: Auto-label frames with bounding boxes. Supports four modes — CUA+SAM (best accuracy, OpenAI CUA clicks + SAM segmentation), Gemini (native bbox detection), GPT vision (API fallback), or Codex vision subagents (no API keys). Parallel dispatch via git worktrees. Use after collecting frames.

Labeling Modes

Set label_mode in config.json:

| Mode | How it works | Best for |
| --- | --- | --- |
| cua+sam | CUA clicks on objects → SAM segments precise boundaries | Best accuracy, hackathon demo |
| gemini | Gemini native bounding box detection (0-1000 scale) | Fast, good native bbox support |
| gpt | GPT vision model returns JSON bounding boxes | Simple fallback |
| codex | Codex subagents view images and write YOLO labels directly | No API keys |
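
The "0-1000 scale" in the gemini row means box coordinates have to be rescaled before they can be written as YOLO labels. The sketch below shows one way to do that conversion. It assumes the `[ymin, xmin, ymax, xmax]` ordering that Gemini documents for its `box_2d` output; the function name `gemini_box_to_yolo` is hypothetical and this is not necessarily how `label_gemini.py` does it.

```python
def gemini_box_to_yolo(box_2d, class_id):
    """Convert a [ymin, xmin, ymax, xmax] box on a 0-1000 scale to a YOLO label line.

    YOLO expects: class_id x_center y_center width height, all normalized to 0-1.
    """
    # Clamp to the valid range, then rescale from 0-1000 to 0-1.
    ymin, xmin, ymax, xmax = (min(max(v, 0), 1000) / 1000.0 for v in box_2d)

    # Tolerate swapped corners so width and height never go negative.
    x0, x1 = sorted((xmin, xmax))
    y0, y1 = sorted((ymin, ymax))

    x_center = (x0 + x1) / 2
    y_center = (y0 + y1) / 2
    width = x1 - x0
    height = y1 - y0
    return f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"


print(gemini_box_to_yolo([100, 200, 500, 600], 0))
# prints: 0 0.400000 0.300000 0.400000 0.400000
```

Because both formats are resolution-independent, the conversion needs no image width or height.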

Instructions

  1. Read config.json for label_mode, classes, model, and num_agents. If the user asks to call a subagent, route to parallel dispatch in step 5.

  2. CUA+SAM mode (recommended): Run `uv run .agents/skills/label/scripts/label_cua_sam.py`. Requires OPENAI_API_KEY; classes must be set in config.json.

  3. Gemini mode: Run `uv run .agents/skills/label/scripts/label_gemini.py`. Requires GEMINI_API_KEY or GOOGLE_API_KEY.

  4. GPT mode (fallback): Run `uv run .agents/skills/label/scripts/run.py`. Requires OPENAI_API_KEY.

  5. Parallel dispatch (GPT or Codex mode): Run `bash .agents/skills/label/scripts/dispatch.sh [num_agents]`. This creates N git worktrees, dispatches N Codex subagents, and merges the results. If Codex subagents are unavailable in-session, this shell command is the fallback path. Supports:

    • label_mode=gpt with OPENAI_API_KEY (runs run_batch.py)
    • label_mode=codex without API keys (Codex image-viewing subagents)

  6. Outputs: output/frames/*.txt (YOLO labels), output/classes.txt
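
The routing in steps 1 to 5 can be sketched as a small lookup from label_mode to a command. This is an illustration only: the helper `plan_command`, the `parallel` flag, and the assumed shapes of the config.json values (a list for classes, an integer for num_agents) are hypothetical, while the script paths and environment variable names are the ones listed in the steps above.

```python
SCRIPTS = ".agents/skills/label/scripts"

# label_mode -> (single-process script, API key variables of which one must be set)
RUNNERS = {
    "cua+sam": ("label_cua_sam.py", ("OPENAI_API_KEY",)),
    "gemini": ("label_gemini.py", ("GEMINI_API_KEY", "GOOGLE_API_KEY")),
    "gpt": ("run.py", ("OPENAI_API_KEY",)),
    "codex": (None, ()),  # no single-process script; always goes through dispatch.sh
}


def plan_command(config, env, parallel=False):
    """Return the command (as an argv list) for the configured label_mode."""
    mode = config.get("label_mode")
    if mode not in RUNNERS:
        raise ValueError(f"unknown label_mode: {mode!r}")

    script, key_names = RUNNERS[mode]
    if key_names and not any(env.get(name) for name in key_names):
        raise RuntimeError(f"{mode} needs one of: {', '.join(key_names)}")
    if mode == "cua+sam" and not config.get("classes"):
        raise ValueError("cua+sam requires classes to be set in config.json")

    # Step 5: codex always dispatches; gpt dispatches only when asked to.
    if mode == "codex" or (parallel and mode == "gpt"):
        command = ["bash", f"{SCRIPTS}/dispatch.sh"]
        if config.get("num_agents"):
            command.append(str(config["num_agents"]))
        return command

    return ["uv", "run", f"{SCRIPTS}/{script}"]


print(plan_command({"label_mode": "codex", "num_agents": 4}, {}))
# prints: ['bash', '.agents/skills/label/scripts/dispatch.sh', '4']
```

In practice `config` would be loaded from config.json with `json.load` and `env` would be `os.environ`; the returned list can be passed directly to `subprocess.run`.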

Scripts

| Script | Mode | Description |
| --- | --- | --- |
| label_cua_sam.py | cua+sam | CUA for clicks + SAM for segmentation |
| label_gemini.py | gemini | Gemini native bounding boxes |
| run.py | gpt | GPT vision structured output |
| run_batch.py | gpt | GPT vision (subagent batch mode) |
| dispatch.sh | gpt/codex | Parallel subagent orchestrator |
| merge_classes.py | all | Unify class maps from subagents |
| auto_label_and_show.py | all | Auto-run configured labeler and print/render label previews |
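
To illustrate what "unify class maps" can involve: when subagents label in separate worktrees, each may assign class IDs in a different order, so the IDs in their YOLO label lines have to be remapped onto one shared list. The sketch below is a guess at the general technique, not the contents of `merge_classes.py`; both function names and the assumption that each subagent produces an ordered list of class names are hypothetical.

```python
def merge_class_lists(class_lists):
    """Build one unified class list plus a local-id -> unified-id map per subagent."""
    unified = []
    remaps = []
    for names in class_lists:
        remap = {}
        for local_id, name in enumerate(names):
            if name not in unified:
                unified.append(name)  # first occurrence fixes the unified id
            remap[local_id] = unified.index(name)
        remaps.append(remap)
    return unified, remaps


def remap_label_line(line, remap):
    """Rewrite the leading class id of one YOLO label line; coordinates are untouched."""
    parts = line.split()
    parts[0] = str(remap[int(parts[0])])
    return " ".join(parts)


unified, remaps = merge_class_lists([["button", "icon"], ["icon", "menu"]])
print(unified)
# prints: ['button', 'icon', 'menu']
print(remap_label_line("0 0.5 0.5 0.2 0.1", remaps[1]))
# prints: 1 0.5 0.5 0.2 0.1
```

Here the second subagent's class 0 ("icon") becomes class 1 in the unified list, so its label lines are rewritten accordingly before the results are merged.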