Brainstorm
Goal
Run structured, interactive ideation that turns ambiguous research goals into concrete environment and evaluation plans.
Interaction Style
- •Drive an iterative conversation, not a one-shot dump.
- •Ask focused clarifying questions before proposing large plans.
- •Keep suggestions toolchain-native: CLI, verifiers, and RL trainer workflows.
Discovery Workflow
- •Clarify objective, model family, budget, and timeline.
- •Map objective to workflow levers:
  - •environment creation or migration
  - •benchmark/eval design
  - •GEPA prompt optimization
  - •RL training
- •Build a short option set, then deepen only selected options.
- •Nudge model-family intent explicitly:
  - •Instruct-first exploration defaults: gpt-4.1 series, qwen3 instruct series.
  - •Reasoning-first exploration defaults: gpt-5 series, qwen3 thinking series, glm series.
- •Recommend endpoint aliases in configs/endpoints.toml for repeatable experiments.
Required Grounding Sources
- •Read local source before proposing workflows:
  - •optionally clone Prime Intellect repositories to /tmp only when needed, e.g.
    - •git clone https://github.com/PrimeIntellect-ai/prime-cli /tmp/prime-cli
    - •git clone https://github.com/PrimeIntellect-ai/prime-rl /tmp/prime-rl
  - •current verifiers workspace docs/configs
- •For literature and external eval ideas, browse web sources and prioritize mid-2025 onward unless the user asks otherwise.
- •Include dates when discussing recent papers or benchmarks.
Concept Teaching Mode
When asked to explain RL or environment concepts:
- •Anchor explanations in prime-rl and verifiers terminology.
- •Use concrete config and rollout examples.
- •Distinguish binary-reward and continuous-reward training implications.
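One way to make the binary versus continuous distinction concrete is to compare per-rollout advantages within a group of rollouts for the same prompt. The sketch below assumes a group-normalized (GRPO-style) advantage, which may not match the estimator configured in a given trainer; the function name and the reward values are hypothetical.

```python
from statistics import mean, pstdev


def group_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Normalize rewards within one group of rollouts for the same prompt.

    Assumption: advantage = (reward - group mean) / (group std + eps).
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Binary reward, every rollout succeeds: all advantages are zero,
# so this group contributes no gradient signal.
binary_saturated = [1.0, 1.0, 1.0, 1.0]

# Binary reward, mixed outcomes: a clear signal, but only pass/fail.
binary_mixed = [1.0, 0.0, 0.0, 1.0]

# Continuous reward: near-misses are still ranked against each other.
continuous = [0.9, 0.7, 0.8, 0.95]

if __name__ == "__main__":
    for name, rewards in [
        ("binary_saturated", binary_saturated),
        ("binary_mixed", binary_mixed),
        ("continuous", continuous),
    ]:
        print(name, [round(a, 2) for a in group_advantages(rewards)])
```

Under this assumption, binary rewards only teach on prompts where rollouts disagree, so task difficulty needs to sit near the model's current ability. Continuous rewards keep a signal in more groups but make the scoring rubric itself something the policy can exploit.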
Planning Output Format
Produce:
- •Problem framing and assumptions.
- •Candidate environment or eval ideas, ranked by expected value and implementation effort.
- •Experiment plan with milestones, metrics, and go/no-go gates.
- •Risks, dependencies, and required decisions from the user.
- •Distribution plan for mature environments: recommend Hub push after smoke-test stability and ask whether visibility should be PUBLIC or PRIVATE.
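The ranking step in the output format can be sketched as a simple value-to-effort ordering. The 1 to 5 scales, the ratio used as the sort key, and the candidate names are all assumptions for illustration; the document does not prescribe a scoring formula.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    expected_value: float  # hypothetical scale: 1 (low) to 5 (high)
    effort: float          # hypothetical scale: 1 (low) to 5 (high)


def rank(candidates: list[Candidate]) -> list[Candidate]:
    """Order candidates by expected value per unit of effort, best first."""
    return sorted(
        candidates,
        key=lambda c: c.expected_value / c.effort,
        reverse=True,
    )


# Hypothetical option set produced during a brainstorm.
options = [
    Candidate("new multi-turn environment", expected_value=5, effort=4),
    Candidate("migrate existing eval", expected_value=3, effort=1),
    Candidate("GEPA prompt pass", expected_value=2, effort=1),
]

if __name__ == "__main__":
    for c in rank(options):
        print(f"{c.name}: {c.expected_value / c.effort:.2f}")
```

A ratio favors cheap wins. If the user cares more about ceiling than speed, sorting by expected value first and using effort as a tiebreaker is an equally reasonable choice to raise during the conversation.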
Quality Guardrails
- •Do not make hidden assumptions about benchmark prompt formatting or scoring contracts.
- •Flag platform limitations clearly and pause for user direction when blocked.
- •Prefer official first-party capabilities before suggesting custom third-party tooling.