Survey Seed Harvest

Bootstrap taxonomy seeds from existing survey/review papers inside your retrieved set.

This is an accelerator for the early structure stage: it should give taxonomy-builder a head start, not replace it.

Inputs

Uses: papers/papers_dedup.jsonl.

• Find likely survey/review papers:
  - title/abstract contains “survey”, “review”, “systematic”, “meta-analysis”
• Extract candidate topic terms and group them into:
  - ~4–10 top-level nodes (“chapters”)
  - 2–6 children per node (mappable leaves)
• Write short, actionable descriptions:
  - what belongs here / what does not
  - (optional) list 2–5 representative titles as seeds
• Treat the result as a starting point:
  - pass it to taxonomy-builder for domain-meaningful rewriting and scope alignment.
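
The first step above (finding likely survey/review papers) can be sketched roughly as follows. This is a minimal illustration, not the bundled script's actual logic: it assumes each line of papers/papers_dedup.jsonl is a JSON object with `title` and `abstract` string fields, and the function names are hypothetical.

```python
import json
import re
from pathlib import Path

# Keywords from the workflow; case-insensitive. The leading \b is an added
# assumption so that words like "preview" do not count as "review".
SURVEY_PATTERN = re.compile(
    r"\b(?:survey|review|systematic|meta-analysis)", re.IGNORECASE
)


def is_likely_survey(record: dict) -> bool:
    """Return True if the title or abstract contains a survey-style keyword."""
    # Assumption: records expose "title" and "abstract"; missing or null
    # values are treated as empty strings.
    text = f"{record.get('title') or ''} {record.get('abstract') or ''}"
    return SURVEY_PATTERN.search(text) is not None


def find_survey_papers(jsonl_path: str) -> list[dict]:
    """Read a JSONL file and keep only the records that look like surveys."""
    surveys = []
    with Path(jsonl_path).open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            record = json.loads(line)
            if is_likely_survey(record):
                surveys.append(record)
    return surveys
```

Keyword matching of this kind will over-select (for example, papers that merely mention a "literature review" section), so the hits are best treated as candidates to skim rather than a final list.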

• outline/taxonomy.yml exists and is valid YAML.
• Taxonomy has at least 2 levels (children used) and every node has a description.
• Avoid generic placeholder nodes like “Overview/Benchmarks/Open Problems” unless they are truly content-based for your domain.
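
The checklist above can be partly automated. The sketch below assumes a hypothetical schema in which the taxonomy is a list of nodes, each a mapping with `name`, `description`, and an optional `children` list; the real outline/taxonomy.yml layout may differ. It works on already-parsed data, so the file would first need to be loaded (for example with PyYAML's `yaml.safe_load`, if installed).

```python
# Names the checklist warns about; they are flagged for review, not rejected.
GENERIC_NAMES = {"overview", "benchmarks", "open problems"}


def check_taxonomy(nodes: list) -> list[str]:
    """Return human-readable checklist violations (empty list if none)."""
    problems = []

    # Checklist: the taxonomy has at least 2 levels (children are used).
    if not any(node.get("children") for node in nodes):
        problems.append("taxonomy has only one level (no children used)")

    def visit(node: dict, prefix: str) -> None:
        name = str(node.get("name") or "<unnamed>")
        path = f"{prefix}/{name}"
        # Checklist: every node has a non-empty description.
        if not str(node.get("description") or "").strip():
            problems.append(f"{path}: missing description")
        # Checklist: generic placeholder nodes need a content-based reason.
        if name.strip().lower() in GENERIC_NAMES:
            problems.append(
                f"{path}: generic placeholder name, confirm it is content-based"
            )
        for child in node.get("children") or []:
            visit(child, path)

    for node in nodes:
        visit(node, "")
    return problems


# Example with made-up nodes (not taken from any real taxonomy):
example = [
    {
        "name": "Retrieval",
        "description": "Dense and sparse retrievers; excludes rerankers.",
        "children": [{"name": "Dense retrieval", "description": ""}],
    }
]
for problem in check_taxonomy(example):
    print(problem)
```

A check like this only covers structure; whether a description actually says what belongs in a node still needs a human or taxonomy-builder pass.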

• python .codex/skills/survey-seed-harvest/scripts/run.py --help
• python .codex/skills/survey-seed-harvest/scripts/run.py --workspace <workspace_dir>

• More conservative term selection:
  - python .codex/skills/survey-seed-harvest/scripts/run.py --workspace <ws> --top-k 80 --min-freq 3
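
To give a feel for what a frequency floor and a top-k cap do to term selection, here is a small sketch. It is an assumed interpretation of `--top-k` and `--min-freq`, not the script's actual implementation: the function name, the tokenizer, and the stopword list are all illustrative.

```python
import re
from collections import Counter

# Illustrative stopword list (assumption); survey-style words are dropped
# because they identify the paper type rather than a topic.
STOPWORDS = {"the", "of", "and", "for", "in", "on", "to", "with", "survey", "review"}


def select_terms(titles: list[str], top_k: int = 80, min_freq: int = 3) -> list[tuple[str, int]]:
    """Keep terms seen in at least min_freq titles, then the top_k most frequent."""
    counts = Counter()
    for title in titles:
        # Count each term once per title (document frequency).
        tokens = set(re.findall(r"[a-z][a-z0-9-]+", title.lower()))
        counts.update(tokens - STOPWORDS)
    # Sort by descending frequency, then alphabetically for a stable order.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    frequent = [(term, freq) for term, freq in ranked if freq >= min_freq]
    return frequent[:top_k]
```

Under this reading, raising the frequency floor removes rare terms before the cap is applied, which is why a higher `--min-freq` gives a more conservative seed set.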

• This helper is keyword-based; treat the output as seeds and refine with taxonomy-builder.

Fix:

• Broaden retrieval (add “survey”, “review”, “benchmark” variants) or manually seed a few known surveys, then rerun.

Fix:

• Keep seeds concrete (named methods/benchmarks/tasks) and rely on taxonomy-builder to rewrite under the actual scope.