Labeling Modes
Set label_mode in config.json:
| Mode | How it works | Best for |
|---|---|---|
| cua+sam | CUA clicks on objects → SAM segments precise boundaries | Best accuracy, hackathon demo |
| gemini | Gemini native bounding box detection (0-1000 scale) | Fast, good native bbox support |
| gpt | GPT vision model returns JSON bounding boxes | Simple fallback |
| codex | Codex subagents view images and write YOLO labels directly | No API keys |
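
The 0-1000 scale in the Gemini row has to be mapped to YOLO's normalized center/size format before a label file can be written. The following is a minimal sketch of that conversion, not the code in `label_gemini.py`. It assumes each box arrives as `[ymin, xmin, ymax, xmax]` on the 0-1000 scale; the function name `gemini_box_to_yolo` is hypothetical.

```python
def gemini_box_to_yolo(box_2d, class_id):
    """Convert one box on a 0-1000 scale to a YOLO label line.

    Assumption: box_2d is ordered [ymin, xmin, ymax, xmax].
    """
    # Rescale every coordinate from 0-1000 to 0-1.
    ymin, xmin, ymax, xmax = (value / 1000.0 for value in box_2d)

    # YOLO stores the box center plus width and height, all normalized.
    x_center = (xmin + xmax) / 2
    y_center = (ymin + ymax) / 2
    width = xmax - xmin
    height = ymax - ymin

    return f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"


print(gemini_box_to_yolo([100, 200, 500, 600], 0))
# prints "0 0.400000 0.300000 0.400000 0.400000"
```

Because both formats are resolution-independent, the conversion does not need the image width or height.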
Instructions
1. Read config.json for `label_mode`, `classes`, `model`, and `num_agents`. If the user asks to `call subagent`, route to parallel dispatch in step 5.
2. CUA+SAM mode (recommended): Run `uv run .agents/skills/label/scripts/label_cua_sam.py`. Requires `OPENAI_API_KEY`; classes must be set in config.json.
3. Gemini mode: Run `uv run .agents/skills/label/scripts/label_gemini.py`. Requires `GEMINI_API_KEY` or `GOOGLE_API_KEY`.
4. GPT mode (fallback): Run `uv run .agents/skills/label/scripts/run.py`. Requires `OPENAI_API_KEY`.
5. Parallel dispatch (GPT or Codex mode): Run `bash .agents/skills/label/scripts/dispatch.sh [num_agents]`. This creates N git worktrees, dispatches N Codex subagents, and merges the results. If Codex subagents are unavailable in-session, this shell command is the fallback path. Supports:
   - `label_mode=gpt` with `OPENAI_API_KEY` (runs `run_batch.py`)
   - `label_mode=codex` without API keys (Codex image-viewing subagents)
6. Outputs: `output/frames/*.txt` (YOLO labels) and `output/classes.txt`.
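
The routing in the steps above can be sketched as a small lookup from `label_mode` to a command and the API keys it needs. This is an illustrative sketch only, not the logic of any script in this skill. The names `MODES` and `plan_labeling` are hypothetical, and it covers single-mode routing; parallel dispatch for `gpt` is not modeled.

```python
import os

SCRIPTS_DIR = ".agents/skills/label/scripts"

# mode -> (command, environment variables of which at least one must be set)
MODES = {
    "cua+sam": (["uv", "run", f"{SCRIPTS_DIR}/label_cua_sam.py"], ["OPENAI_API_KEY"]),
    "gemini": (["uv", "run", f"{SCRIPTS_DIR}/label_gemini.py"],
               ["GEMINI_API_KEY", "GOOGLE_API_KEY"]),
    "gpt": (["uv", "run", f"{SCRIPTS_DIR}/run.py"], ["OPENAI_API_KEY"]),
    "codex": (["bash", f"{SCRIPTS_DIR}/dispatch.sh"], []),
}


def plan_labeling(config, env):
    """Return the command for config["label_mode"], or raise if a requirement is unmet."""
    mode = config.get("label_mode")
    if mode not in MODES:
        raise ValueError(f"unknown label_mode: {mode!r}")

    command, accepted_keys = MODES[mode]
    if accepted_keys and not any(env.get(key) for key in accepted_keys):
        raise RuntimeError(f"{mode} requires one of: {', '.join(accepted_keys)}")
    if mode == "cua+sam" and not config.get("classes"):
        raise ValueError("cua+sam requires classes to be set in config.json")

    command = list(command)  # copy so the table above is never mutated
    if mode == "codex" and "num_agents" in config:
        command.append(str(config["num_agents"]))
    return command


# Codex mode needs no API keys, so this works with any environment.
print(plan_labeling({"label_mode": "codex", "num_agents": 4}, os.environ))
```

In practice the `config` dictionary would come from `json.load` on config.json, and the returned list could be passed to `subprocess.run`.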
Scripts
| Script | Mode | Description |
|---|---|---|
| label_cua_sam.py | cua+sam | CUA for clicks + SAM for segmentation |
| label_gemini.py | gemini | Gemini native bounding boxes |
| run.py | gpt | GPT vision structured output |
| run_batch.py | gpt | GPT vision (subagent batch mode) |
| dispatch.sh | gpt/codex | Parallel subagent orchestrator |
| merge_classes.py | all | Unify class maps from subagents |
| auto_label_and_show.py | all | Auto-run configured labeler and print/render label previews |
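
When several subagents label in parallel, each may number its classes differently, so the class maps have to be unified before the YOLO labels are combined. The sketch below shows one way to do that. It is not the code in `merge_classes.py`, and it assumes each subagent produces an ordered list of class names whose positions are its local class IDs. The names `unify_class_maps` and `remap_label_line` are hypothetical.

```python
def unify_class_maps(worker_classes):
    """Build one global class list plus a local-to-global ID map per worker."""
    unified = []    # global class names, in first-seen order
    index_of = {}   # class name -> global ID
    remaps = []     # one {local_id: global_id} dict per worker

    for names in worker_classes:
        remap = {}
        for local_id, name in enumerate(names):
            if name not in index_of:
                index_of[name] = len(unified)
                unified.append(name)
            remap[local_id] = index_of[name]
        remaps.append(remap)
    return unified, remaps


def remap_label_line(line, remap):
    """Rewrite the class ID of one YOLO label line, leaving the box untouched."""
    class_id, *coords = line.split()
    return " ".join([str(remap[int(class_id)]), *coords])


unified, remaps = unify_class_maps([["car", "person"], ["person", "dog"]])
print(unified)
# prints ['car', 'person', 'dog']
print(remap_label_line("1 0.5 0.5 0.2 0.2", remaps[1]))
# prints "2 0.5 0.5 0.2 0.2"
```

Keeping first-seen order makes the result deterministic as long as the workers are processed in a fixed order.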