Research Spec Interview
Collaborative interview to develop comprehensive research specifications. Unlike product spec interviews, this focuses on scientific rigor: hypotheses, variables, confounds, baselines, and reproducibility.
When to Use
- •Starting a new research experiment
- •Designing A/B tests or ablation studies
- •Planning model evaluations or benchmark runs
- •Any work requiring statistical validity and reproducibility
Process
1. Introduction
Explain the interview process:
- •"I'll ask questions across 15 research categories"
- •"We'll start high-level, then drill down on critical decisions"
- •"I'll challenge assumptions to strengthen the design"
- •"Resource validation runs inline during the interview"
2. Conduct Interview
Refer to research-interview-guide.md for the 15 question categories.
Interview flow:
- •Start high-level: Get research question, motivation, main hypotheses
- •Ask 2-4 focused questions per round: Don't overwhelm with all questions at once
- •Drill down strategically: Spend more time on critical variables and confounds
- •Challenge constructively:
- •"Why baseline X instead of Y?"
- •"What if confound Z explains your results?"
- •"How confident are you in this assumption?"
- •Inline resource validation (Category 11):
- •When discussing resources, check system capabilities immediately
- •Run: sysctl hw.physicalcpu hw.memsize (macOS) or nproc && free -h (Linux)
- •Compare to stated requirements, warn if insufficient
- •Format: "System has X cores / Y GB, you need A cores / B GB - ✓/⚠"
- •Iterate until checklist complete: Ensure all blocking items covered
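The inline resource check described above can be sketched in Python as follows. This is only an illustration: the function names detect_resources and format_resource_check are hypothetical, os.cpu_count() reports logical rather than physical cores (unlike sysctl hw.physicalcpu), and the memory lookup through os.sysconf is assumed to be available only on Unix-like systems.

```python
import os


def detect_resources():
    """Return (cpu_cores, memory_gb); memory_gb is None if it cannot be read.

    Note: os.cpu_count() counts logical cores, which may exceed the
    physical core count reported by `sysctl hw.physicalcpu`.
    """
    cores = os.cpu_count()
    try:
        mem_bytes = os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")
        mem_gb = round(mem_bytes / 1024**3)
    except (ValueError, OSError, AttributeError):
        # sysconf is missing or does not expose these names on this platform.
        mem_gb = None
    return cores, mem_gb


def format_resource_check(have_cores, have_gb, need_cores, need_gb):
    """Build the one-line report in the format used during the interview."""
    sufficient = have_cores >= need_cores and have_gb >= need_gb
    mark = "✓" if sufficient else "⚠"
    return (
        f"System has {have_cores} cores / {have_gb} GB, "
        f"you need {need_cores} cores / {need_gb} GB - {mark}"
    )
```

If detect_resources() returns None for memory, fall back to the shell commands above rather than passing None into format_resource_check.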
Categories to cover (see research-interview-guide.md for detailed questions):
- •Research Question & Motivation
- •Hypotheses & Falsification
- •Independent Variables (high-level → drill down)
- •Dependent Variables & Metrics
- •Control Variables
- •Confounding Variables
- •Models & Hyperparameters
- •Baselines & Comparisons
- •Datasets
- •Graphs & Visualizations
- •Resources & Validation (inline check)
- •Sample Size & Power
- •Performance & Caching
- •Error Handling & Retries
- •Reproducibility
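For the Sample Size & Power category, a rough estimate of the required sample size can help ground the discussion. The sketch below uses the standard normal-approximation formula for comparing two independent proportions (for example, accuracy under two prompting conditions). The function name, the default alpha of 0.05, and the default power of 0.80 are assumptions for illustration, not values prescribed by this process.

```python
import math
from statistics import NormalDist


def samples_per_condition(p_baseline, p_treatment, alpha=0.05, power=0.80):
    """Approximate N per condition for a two-sided two-proportion z-test."""
    if p_baseline == p_treatment:
        raise ValueError("Effect size is zero; no finite sample size suffices.")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for alpha
    z_beta = NormalDist().inv_cdf(power)           # quantile for target power
    p_bar = (p_baseline + p_treatment) / 2
    numerator = (
        z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * math.sqrt(
            p_baseline * (1 - p_baseline) + p_treatment * (1 - p_treatment)
        )
    ) ** 2
    return math.ceil(numerator / (p_treatment - p_baseline) ** 2)
```

The formula assumes independent samples per condition. If the same items are evaluated under both conditions, a paired analysis generally needs fewer samples, so treat this as a conservative starting point.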
3. Generate Spec
Once the interview is complete, write the spec to specs/research-interview-$(utc_date).md using research-spec-template.md, where $(utc_date) is the current UTC date in DD-MM-YYYY format.
Spec requirements:
- •Fill all template sections based on interview responses
- •Include resource validation results (✓/⚠ marks)
- •Complete validation checklist at end
- •Keep spec focused (~100-150 lines)
- •Use exact user terminology and values
4. Review & Iterate
Present spec to user:
- •"I've drafted a research spec based on our discussion"
- •"Review the validation checklist - any blocking items missing?"
- •"What needs clarification or adjustment?"
Make revisions as needed.
Output Location
specs/research-interview-DD-MM-YYYY.md (using current UTC date)
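A minimal sketch of building this path in Python, assuming a hypothetical helper named spec_path; only the directory, file name pattern, and DD-MM-YYYY UTC date come from this document.

```python
from datetime import datetime, timezone


def spec_path(now=None):
    """Return the spec output path for the given time (default: current UTC)."""
    now = now or datetime.now(timezone.utc)
    # Convert to UTC first so a local-time datetime near midnight
    # does not produce the wrong calendar date.
    utc_now = now.astimezone(timezone.utc)
    return f"specs/research-interview-{utc_now:%d-%m-%Y}.md"
```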
Key Principles
- •Scientific rigor: Hypotheses must be falsifiable
- •Explicit assumptions: Challenge vague or implicit assumptions
- •Resource realism: Validate requirements against available resources
- •Reproducibility first: Seeds, versions, logging planned upfront
- •Minimize confounds: Identify and control alternative explanations
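To make the "reproducibility first" principle concrete, the sketch below records seeds and version information before a run and draws a seeded sample. The names build_run_manifest and seeded_sample, and the choice of manifest fields, are illustrative assumptions; the spec template may call for different fields.

```python
import json
import platform
import random
import sys


def build_run_manifest(seeds, model, temperature):
    """Collect the settings needed to repeat a run, as a JSON-serializable dict."""
    return {
        "seeds": list(seeds),
        "model": model,
        "temperature": temperature,
        "python_version": sys.version.split()[0],
        "platform": platform.platform(),
    }


def seeded_sample(items, k, seed):
    """Draw k items with a dedicated RNG so the same seed gives the same sample."""
    rng = random.Random(seed)  # isolated from the global random state
    return rng.sample(list(items), k)
```

Writing json.dumps(build_run_manifest(...)) to the run log before any evaluation starts keeps the record even if the run later fails.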
Example Interaction
User: "I want to test if chain-of-thought improves reasoning"

Agent: "I'll help you design a rigorous experiment. Let me start with the core research question:
1. What specific reasoning tasks are you testing? (e.g., math problems, logical puzzles, commonsense reasoning)
2. What baseline are you comparing chain-of-thought against? (zero-shot, few-shot without CoT, etc.)
3. What does 'improves' mean quantitatively? (accuracy increase, reduction in errors, faster convergence?)"

User: "Math problems from MMLU. Comparing zero-shot CoT vs zero-shot direct answers. Success means >5% accuracy improvement."

Agent: "Good start. Now let's nail down the variables:
4. Which model(s)? Different models have different reasoning capabilities.
5. What's your sample size per condition? Need enough for statistical power.
6. Any confounds to control? (prompt wording, temperature, etc.)"

User: "Claude Sonnet 4.5, N=500 per condition, temperature=0 for both."

Agent: [After several more rounds covering hypotheses, metrics, datasets, resources...]
"Here's your research spec: specs/research-interview-25-01-2026.md

Key validation items:
✅ Hypothesis: CoT improves accuracy on MMLU math by >5% | Falsification: if Δ<5% or p>0.05
✅ Metrics: exact_match accuracy on MMLU-math subset
✅ Datasets: MMLU v1.0, math subset (N=1000), stratified split
✅ Resources: System has 16 cores / 64GB RAM - sufficient ✓
⚠️ Random seeds not specified - recommend seeds=[42,43,44,45,46] for 5 runs

Ready to proceed with implementation?"
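The falsification rule in the example (Δ<5% or p>0.05) could be checked after the runs with something like the sketch below. It assumes a one-sided two-proportion z-test on independent samples, which is one reasonable choice and not something the example prescribes. If both conditions are run on the same problems, a paired test such as McNemar's would likely be more appropriate. The function name and parameters are hypothetical.

```python
import math


def evaluate_hypothesis(correct_cot, correct_direct, n_per_condition,
                        min_delta=0.05, alpha=0.05):
    """Return (delta, p_value, supported) for 'CoT beats direct by >= min_delta'."""
    acc_cot = correct_cot / n_per_condition
    acc_direct = correct_direct / n_per_condition
    delta = acc_cot - acc_direct

    # Pooled standard error under the null hypothesis of equal accuracy.
    pooled = (correct_cot + correct_direct) / (2 * n_per_condition)
    std_err = math.sqrt(pooled * (1 - pooled) * 2 / n_per_condition)
    z = delta / std_err if std_err > 0 else 0.0

    # One-sided p-value: probability of a z this large if there is no effect.
    p_value = 0.5 * math.erfc(z / math.sqrt(2))

    # Falsified if delta < min_delta or p_value > alpha; supported otherwise.
    supported = delta >= min_delta and p_value <= alpha
    return delta, p_value, supported
```

Deciding on the test and thresholds during the interview, before any data is collected, is what keeps the hypothesis falsifiable.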