Experiment Analyzer Skill
Analyze completed growth experiments, extract insights, and drive continuous learning.
When to Activate
This skill should activate when:
- User marks an experiment as "completed"
- User asks "what did we learn?"
- User mentions "results", "outcomes", or "analysis"
- User asks "what should we do next?"
- User wants to compare multiple experiments
- User asks about experiment success rates
Analysis Framework
1. Result Classification
Win (Positive + Significant)
- Result is better than baseline
- Statistical significance ≥ 95%
- Change is meaningful (above +2%, per the Step 2 rule; a stricter bar such as ≥5% may be used)
Loss (Negative + Significant)
- Result is worse than baseline
- Statistical significance ≥ 95%
- Change is meaningful (below -2%, per the Step 2 rule)
Inconclusive
- Statistical significance < 95%
- Not enough data to make a decision
- Sample size may be insufficient
Neutral
- Statistical significance ≥ 95%, but the change is minimal (within ±2%)
- No meaningful impact either way
- May indicate the hypothesis was off
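The classification assumes a significance figure is already available. For a conversion-rate metric, one common way to obtain it is a two-sided two-proportion z-test; the sketch below assumes that test (the skill does not prescribe one) and reports significance as 100 × (1 − p), so the 95% threshold corresponds to p = 0.05. The function name is hypothetical.

```python
import math


def two_proportion_significance(conv_a: int, n_a: int, conv_b: int, n_b: int) -> float:
    """Significance (%) that variant B's conversion rate differs from control A's.

    Uses a pooled two-proportion z-test and returns 100 * (1 - two-sided p-value).
    """
    if n_a <= 0 or n_b <= 0:
        raise ValueError("sample sizes must be positive")
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        # No variance (e.g. zero conversions everywhere): no evidence of a difference
        return 0.0
    z = (p_b - p_a) / se
    # Two-sided p-value: 2 * (1 - Phi(|z|)) == erfc(|z| / sqrt(2))
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return 100 * (1 - p_value)
```

With 10,000 users per arm, a lift from 5% to 6% clears the 95% bar comfortably, while the same rates on 100 users per arm do not, which is the "sample size may be insufficient" case.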
2. Hypothesis Validation
Compare the original hypothesis to the results:
Hypothesis Components:
- Proposed change → Was it implemented as planned?
- Target audience → Did we reach the right users?
- Expected outcome → Did we hit the target?
- Rationale → Was our reasoning correct?
Validation Questions:
- Did we achieve the expected outcome? (Yes/No/Partially)
- Was the underlying assumption correct?
- What surprised us?
- What would we do differently?
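The Yes/No/Partially answer to "Did we achieve the expected outcome?" can be made repeatable with a simple rule. The sketch below is one hypothetical rule, not part of the skill: Yes if the result met or beat the expected change, Partially if it moved in the predicted direction but fell short, No otherwise. Apply it only to significant results; an inconclusive result cannot validate a hypothesis either way.

```python
def hypothesis_validated(expected_change_pct: float, actual_change_pct: float) -> str:
    """Hypothetical rule for the Yes/No/Partially validation answer.

    Works for both directions: a hypothesis expecting -5% (e.g. lower churn)
    is met by -6% just as one expecting +10% is met by +12%.
    """
    if expected_change_pct == 0:
        raise ValueError("expected change must be non-zero")
    # ratio > 0 means the metric moved in the predicted direction
    ratio = actual_change_pct / expected_change_pct
    if ratio >= 1:
        return "Yes"
    if ratio > 0:
        return "Partially"
    return "No"
```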
3. ICE Score Retrospective
Compare predicted vs. actual scores:
Impact Score Validation:
- Predicted Impact: [original score]
- Actual Impact: [calculate based on results]
- Delta: [difference]
- Learning: Was our impact prediction accurate?
Confidence Score Validation:
- Predicted Confidence: [original score]
- Outcome: [win/loss/inconclusive]
- Learning: Was our confidence justified?
Ease Score Validation:
- Predicted Ease: [original score]
- Actual Time: [if tracked]
- Learning: Was implementation as easy as expected?
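"Actual Impact: [calculate based on results]" needs some mapping from an observed change to a score on the same scale as the prediction. The bands below are a hypothetical calibration, not part of the skill, and they assume a higher-is-better primary metric (flip the sign of the change for metrics like churn). The accuracy labels match the retrospective table in the output template.

```python
from typing import Tuple

# Hypothetical calibration: minimum relative lift (%) needed for each impact score.
IMPACT_BANDS = [(20.0, 10), (15.0, 9), (10.0, 8), (7.0, 7), (5.0, 6),
                (3.0, 5), (2.0, 4), (1.0, 3), (0.5, 2)]


def actual_impact_score(change_pct: float) -> int:
    """Translate an observed change into a 1-10 score; flat or negative results score 1."""
    for threshold, score in IMPACT_BANDS:
        if change_pct >= threshold:
            return score
    return 1


def assess_impact(predicted: int, change_pct: float, tolerance: int = 1) -> Tuple[int, int, str]:
    """Return (actual score, delta, accuracy label) for the ICE retrospective."""
    actual = actual_impact_score(change_pct)
    delta = actual - predicted
    if delta > tolerance:
        label = "underestimated"
    elif delta < -tolerance:
        label = "overestimated"
    else:
        label = "good"
    return actual, delta, label
```

For example, predicting an impact of 8 and observing a 3% lift yields an actual score of 5 and a delta of -3, which is flagged as overestimated.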
4. Insight Generation
Key Questions:
- What worked? Specific elements that drove success
- What didn't work? Elements that failed or harmed metrics
- What was surprising? Unexpected findings
- What patterns emerge? Connections to other experiments
- What new questions arise? Areas to investigate further
Secondary Metrics:
- Review all secondary metrics tracked
- Look for unintended positive effects
- Watch for negative side effects
- Consider holistic impact
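Reviewing secondary metrics comes down to labeling each change Good/Bad/Neutral (as in the Secondary Metrics table of the output template), which depends on whether a higher value is desirable. A minimal sketch, assuming the same ±2% neutral band as the result classification and ignoring significance for simplicity; the metric names in the usage note are hypothetical.

```python
from typing import Dict, List, Tuple


def interpret_secondary(change_pct: float, higher_is_better: bool,
                        neutral_band: float = 2.0) -> str:
    """Label a secondary-metric change as Good, Bad, or Neutral."""
    if abs(change_pct) <= neutral_band:
        return "Neutral"
    improved = change_pct > 0 if higher_is_better else change_pct < 0
    return "Good" if improved else "Bad"


def side_effects(metrics: Dict[str, Tuple[float, bool]]) -> Tuple[List[str], List[str]]:
    """Split {name: (change_pct, higher_is_better)} into positive and negative side effects."""
    positive, negative = [], []
    for name, (change_pct, higher_is_better) in metrics.items():
        label = interpret_secondary(change_pct, higher_is_better)
        if label == "Good":
            positive.append(name)
        elif label == "Bad":
            negative.append(name)
    return positive, negative
```

A 6.5% rise in support tickets is a negative side effect even though the number went up, which is why the direction flag is needed.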
5. Follow-up Experiment Suggestions
Based on the outcome, suggest 2-3 follow-up experiments:
For Wins:
- Scale: Roll out to 100% of users
- Amplify: Make the winning element more prominent
- Extend: Apply pattern to related areas
- Optimize: Test variations to improve further
For Losses:
- Pivot: Try alternative approach to same problem
- Investigate: Run research to understand why
- Revert: Document and move on
- Learn: Apply learnings to future experiments
For Inconclusive:
- Re-run: Increase sample size or duration
- Simplify: Test smaller version to isolate variable
- Segment: Test with specific user segments
- Refine: Adjust hypothesis based on early signals
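The follow-up options above amount to a lookup from result type to strategy. A minimal sketch, assuming the listed order doubles as a default ranking (the skill does not rank them). Neutral results have no strategy list here, so the sketch rejects them rather than guess.

```python
from typing import List

FOLLOW_UP_STRATEGIES = {
    "Win": ["Scale", "Amplify", "Extend", "Optimize"],
    "Loss": ["Pivot", "Investigate", "Revert", "Learn"],
    "Inconclusive": ["Re-run", "Simplify", "Segment", "Refine"],
}


def suggest_strategies(classification: str, count: int = 3) -> List[str]:
    """Return the first 2-3 follow-up strategies for a result type."""
    if not 2 <= count <= 3:
        raise ValueError("suggest 2-3 follow-ups")
    strategies = FOLLOW_UP_STRATEGIES.get(classification)
    if strategies is None:
        raise ValueError(f"no follow-up strategies defined for {classification!r}")
    return strategies[:count]
```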
Analysis Process
Step 1: Load and Validate
```
1. Read the experiment JSON from the completed/archived folder
2. Verify results data exists:
   - Primary metric
   - Baseline value
   - Result value
   - Statistical significance
   - Sample size
   - Duration
3. Check that the hypothesis is documented
4. Review ICE scores
```
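Step 1 can be sketched as a validator that lists what is missing before any analysis runs. The JSON field names (`results`, `hypothesis`, `ice`, `duration_days`, and so on) are hypothetical; adjust them to the real experiment schema. A zero baseline is also flagged because it makes the Step 2 change percentage undefined.

```python
import json
from pathlib import Path
from typing import List, Tuple

# Hypothetical field names; adjust to the real experiment JSON schema.
REQUIRED_RESULT_FIELDS = ("primary_metric", "baseline", "result",
                          "significance", "sample_size", "duration_days")
ICE_FIELDS = ("impact", "confidence", "ease")


def validate_experiment(experiment: dict) -> List[str]:
    """Return a list of problems; an empty list means the record is ready to analyze."""
    problems = []
    results = experiment.get("results") or {}
    for field in REQUIRED_RESULT_FIELDS:
        if results.get(field) is None:
            problems.append(f"missing results.{field}")
    # A zero baseline makes the relative change in Step 2 undefined
    if results.get("baseline") == 0:
        problems.append("results.baseline is zero")
    if not experiment.get("hypothesis"):
        problems.append("hypothesis is not documented")
    ice = experiment.get("ice") or {}
    for field in ICE_FIELDS:
        if ice.get(field) is None:
            problems.append(f"missing ice.{field}")
    return problems


def load_experiment(path: Path) -> Tuple[dict, List[str]]:
    """Read one experiment JSON file and validate it."""
    experiment = json.loads(path.read_text(encoding="utf-8"))
    return experiment, validate_experiment(experiment)
```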
Step 2: Calculate Key Metrics
```
Change Percentage = ((Result - Baseline) / Baseline) × 100   (Baseline must be non-zero)

Result Classification (evaluate in order; the first match wins):
- IF significance < 95%       → Inconclusive
- ELSE IF change% > 2%        → Win
- ELSE IF change% < -2%       → Loss
- ELSE (-2% ≤ change% ≤ 2%)   → Neutral
```
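The Step 2 rules translate directly into code. This sketch evaluates them in order so every case lands in exactly one class, and it assumes a higher-is-better primary metric (for a metric like churn, negate the change before classifying). Function names are illustrative.

```python
def change_percentage(baseline: float, result: float) -> float:
    """Relative change of the result versus the baseline, in percent."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero to compute a relative change")
    return (result - baseline) / baseline * 100


def classify_result(baseline: float, result: float, significance_pct: float,
                    neutral_band_pct: float = 2.0,
                    min_significance_pct: float = 95.0) -> str:
    """Apply the Step 2 rules in order; assumes a higher-is-better metric."""
    if significance_pct < min_significance_pct:
        return "Inconclusive"
    change = change_percentage(baseline, result)
    if change > neutral_band_pct:
        return "Win"
    if change < -neutral_band_pct:
        return "Loss"
    return "Neutral"
```

Checking significance first means a small, noisy change is reported as Inconclusive rather than Neutral; Neutral is reserved for changes that are reliably small.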
Step 3: Generate Insights
```
1. Classify the result (Win/Loss/Inconclusive/Neutral)
2. Validate the hypothesis against the results
3. Review ICE score predictions
4. Extract key learnings
5. Identify surprising findings
6. Check secondary metrics
7. Look for patterns across related experiments
```
Step 4: Create Follow-up Ideas
```
1. Based on the result type, brainstorm 2-3 follow-ups
2. For each follow-up:
   - Draft a hypothesis
   - Explain the rationale (reference current learnings)
   - Suggest a category
   - Provide a preliminary ICE estimate
3. Prioritize follow-ups by potential impact
```
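Prioritizing follow-ups can be sketched as a sort keyed on the impact estimate, with the ICE total as a tie-breaker. This assumes 1-10 scores and the multiplicative ICE variant (maximum 1000), which is consistent with the > 500 / 300-500 / < 300 buckets in the portfolio view; the `FollowUp` class is hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FollowUp:
    title: str
    impact: int      # 1-10
    confidence: int  # 1-10
    ease: int        # 1-10

    def __post_init__(self) -> None:
        for name in ("impact", "confidence", "ease"):
            value = getattr(self, name)
            if not 1 <= value <= 10:
                raise ValueError(f"{name} must be between 1 and 10, got {value}")

    @property
    def ice_total(self) -> int:
        # Assumes multiplicative ICE (impact x confidence x ease)
        return self.impact * self.confidence * self.ease


def prioritize(follow_ups: List[FollowUp]) -> List[FollowUp]:
    """Order by potential impact first, then by ICE total as a tie-breaker."""
    return sorted(follow_ups, key=lambda f: (f.impact, f.ice_total), reverse=True)
```

Ranking by impact alone follows the step as written; sorting purely by `ice_total` instead would favor easy, high-confidence ideas over bigger bets.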
Step 5: Generate Report
```
1. Create a markdown analysis report
2. Include:
   - Summary (result classification, key numbers)
   - Hypothesis validation
   - ICE score retrospective
   - Key insights (bulleted list)
   - Secondary metrics review
   - Recommendations
   - Follow-up experiment ideas
3. Save to experiments/archive/[id]_analysis.md
4. Update the experiment JSON with learnings
```
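Steps 3 and 4 of the report process can be sketched as below. The `learnings` and `analysis_report` JSON keys are hypothetical field names, and the record is copied rather than mutated so a failed write does not leave a half-updated dictionary in memory.

```python
import json
from pathlib import Path
from typing import List


def analysis_report_path(experiment_id: str,
                         archive_dir: Path = Path("experiments/archive")) -> Path:
    """Build experiments/archive/[id]_analysis.md for an experiment ID."""
    return archive_dir / f"{experiment_id}_analysis.md"


def attach_learnings(experiment: dict, learnings: List[str], report_path: Path) -> dict:
    """Return a copy of the record with learnings and a report link (hypothetical keys)."""
    updated = dict(experiment)
    updated["learnings"] = list(learnings)
    updated["analysis_report"] = report_path.as_posix()
    return updated


def save_analysis(experiment: dict, report_markdown: str, learnings: List[str],
                  experiment_json: Path,
                  archive_dir: Path = Path("experiments/archive")) -> Path:
    """Write the markdown report, then rewrite the experiment JSON with its learnings."""
    report_path = analysis_report_path(experiment["id"], archive_dir)
    report_path.parent.mkdir(parents=True, exist_ok=True)
    report_path.write_text(report_markdown, encoding="utf-8")
    updated = attach_learnings(experiment, learnings, report_path)
    experiment_json.write_text(json.dumps(updated, indent=2) + "\n", encoding="utf-8")
    return report_path
```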
Analysis Output Template
```markdown
# Experiment Analysis: [Title]

**Date:** [Analysis date]
**Experiment ID:** [id]
**Status:** [Win/Loss/Inconclusive/Neutral] ✓/✗/?/○

## Summary

- **Primary Metric:** [metric name]
- **Baseline:** [baseline value]
- **Result:** [result value]
- **Change:** [+/-X%]
- **Statistical Significance:** [XX%]
- **Sample Size:** [count]
- **Duration:** [days]

## Hypothesis Validation

### Original Hypothesis

[Full hypothesis statement]

### Validation

- **Expected Outcome:** [what we expected]
- **Actual Outcome:** [what happened]
- **Hypothesis Validated:** [Yes/No/Partially]

**Analysis:** [Explanation of whether and why the hypothesis was validated]

## ICE Score Retrospective

| Component | Predicted | Actual/Assessment | Accuracy |
|-----------|-----------|-------------------|----------|
| Impact | [score] | [calculate from results] | [good/overestimated/underestimated] |
| Confidence | [score] | [based on outcome] | [justified/overconfident/underconfident] |
| Ease | [score] | [based on actual effort] | [accurate/harder/easier] |

**Learnings for Future Scoring:**

- [What we learned about predicting impact]
- [What we learned about confidence]
- [What we learned about ease]

## Key Insights

1. **[Primary insight]** - [Explanation with data]
2. **[Secondary insight]** - [Explanation]
3. **[Surprising finding]** - [What we didn't expect]

## Secondary Metrics

| Metric | Change | Interpretation |
|--------|--------|----------------|
| [metric 1] | [+/-X%] | [Good/Bad/Neutral] |
| [metric 2] | [+/-X%] | [Good/Bad/Neutral] |

**Side Effects:**

- Positive: [Any unexpected positive impacts]
- Negative: [Any unexpected negative impacts]

## Recommendations

### Immediate Actions

- [ ] [Action item 1]
- [ ] [Action item 2]

### Strategic Implications

[Broader implications for product/growth strategy]

## Follow-up Experiment Ideas

### 1. [Experiment Title]

**Category:** [category]
**Hypothesis:** [Full hypothesis following template]
**Rationale:** [Why this follow-up, based on current learnings]
**Preliminary ICE:**

- Impact: [score] - [reasoning]
- Confidence: [score] - [reasoning]
- Ease: [score] - [reasoning]
- **Total: [score]**

---

### 2. [Experiment Title]

[Repeat format]

---

### 3. [Experiment Title]

[Repeat format]

## Related Experiments

[List any related experiments and their outcomes for pattern recognition]

## Notes

[Any additional context, edge cases, or considerations]
```
Cross-Experiment Analysis
When the user asks to analyze multiple experiments:
Metrics to Calculate:
- Success Rate: % of wins out of completed experiments
- Category Performance: Which funnel stages have the best win rate?
- ICE Score Accuracy: How well do high-ICE experiments perform?
- Average Impact: What's the typical metric improvement?
- Cycle Time: Average days from backlog → completed
Pattern Recognition:
- Which types of experiments succeed most?
- Which audience segments respond best?
- Which testing methods are most reliable?
- What confidence levels actually predict success?
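The portfolio metrics can be computed from a list of completed experiment records. A minimal sketch, assuming hypothetical record keys (`category`, `classification`, `change_pct`, `backlog_on`, `completed_on`) with dates stored as `datetime.date` values; "average impact" is taken as the plain mean change across all completed experiments.

```python
from collections import defaultdict
from datetime import date
from statistics import mean
from typing import Dict, List


def cycle_days(backlog_on: date, completed_on: date) -> int:
    """Days from entering the backlog to completion."""
    return (completed_on - backlog_on).days


def portfolio_metrics(completed: List[dict]) -> dict:
    """Success rate, per-category win rate, average change, and average cycle time."""
    if not completed:
        raise ValueError("need at least one completed experiment")
    wins = sum(1 for e in completed if e["classification"] == "Win")
    by_category: Dict[str, List[int]] = defaultdict(lambda: [0, 0])  # [wins, total]
    for e in completed:
        stats = by_category[e["category"]]
        stats[1] += 1
        if e["classification"] == "Win":
            stats[0] += 1
    return {
        "success_rate": 100 * wins / len(completed),
        "category_win_rate": {cat: 100 * w / t for cat, (w, t) in by_category.items()},
        "average_change_pct": mean(e["change_pct"] for e in completed),
        "average_cycle_days": mean(cycle_days(e["backlog_on"], e["completed_on"])
                                   for e in completed),
    }
```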
Portfolio View:
```markdown
# Experiment Portfolio Analysis

## Overview

- Total Experiments: [count]
- Completed: [count]
- Win Rate: [X%]
- Average Change: [+X%]

## By Category

| Category | Experiments | Win Rate | Avg Impact |
|----------|-------------|----------|------------|
| Acquisition | [count] | [X%] | [+X%] |
| Activation | [count] | [X%] | [+X%] |
| Retention | [count] | [X%] | [+X%] |
| Revenue | [count] | [X%] | [+X%] |
| Referral | [count] | [X%] | [+X%] |

## ICE Score Performance

- Experiments with ICE > 500: [X% win rate]
- Experiments with ICE 300-500: [X% win rate]
- Experiments with ICE < 300: [X% win rate]

**Learning:** [Are high ICE scores actually better predictors?]

## Top Performers

1. [Experiment] - [+X%] change
2. [Experiment] - [+X%] change
3. [Experiment] - [+X%] change

## Key Patterns

- [Pattern 1 discovered across experiments]
- [Pattern 2]
- [Pattern 3]

## Recommendations

[Strategic recommendations based on portfolio analysis]
```
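The ICE Score Performance section of the portfolio template needs a win rate per ICE bucket. A minimal sketch, assuming each completed record stores its ICE total under a hypothetical `ice_total` key; totals of exactly 300 or 500 fall in the middle bucket, and an empty bucket reports `None` instead of a misleading 0%.

```python
from typing import Dict, List, Optional

BUCKETS = ("ICE > 500", "ICE 300-500", "ICE < 300")


def ice_bucket(ice_total: float) -> str:
    """Place an ICE total into the portfolio view's three buckets."""
    if ice_total > 500:
        return "ICE > 500"
    if ice_total >= 300:
        return "ICE 300-500"
    return "ICE < 300"


def win_rate_by_ice_bucket(completed: List[dict]) -> Dict[str, Optional[float]]:
    """Win rate (%) per ICE bucket; None when a bucket has no experiments."""
    counts = {label: [0, 0] for label in BUCKETS}  # [wins, total]
    for e in completed:
        stats = counts[ice_bucket(e["ice_total"])]
        stats[1] += 1
        if e["classification"] == "Win":
            stats[0] += 1
    return {label: (100 * w / t if t else None) for label, (w, t) in counts.items()}
```

If the top bucket's win rate is not clearly higher than the others, that answers the template's "Are high ICE scores actually better predictors?" question in the negative.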
Integration Points
- Automatically trigger when `/experiment-update` sets status to "completed"
- Work with the ICE scorer skill to validate predictions
- Inform the hypothesis generator with learnings
- Feed into the metrics calculator for portfolio analysis
Continuous Improvement
After each analysis:
- Store learnings in a knowledge base
- Update ICE scoring calibration
- Refine hypothesis templates
- Build pattern library
- Improve follow-up suggestions