Experiment Analyzer Skill

Analyze completed growth experiments, extract insights, and drive continuous learning.

When to Activate

This skill should activate when:

•User marks an experiment as "completed"
•User asks "what did we learn?"
•User mentions "results", "outcomes", or "analysis"
•User asks "what should we do next?"
•User wants to compare multiple experiments
•User asks about experiment success rates

Analysis Framework

1. Result Classification

Win (Positive + Significant)

•Result is better than baseline
•Statistical significance ≥ 95%
•Change is meaningful (outside the ±2% neutral band; usually ≥5%)

Loss (Negative + Significant)

•Result is worse than baseline
•Statistical significance ≥ 95%
•Change is meaningful (same threshold as for a win, in the negative direction)

Inconclusive

•Statistical significance < 95%
•Not enough data to make a decision
•Sample size may be insufficient

Neutral

•Minimal change (within ±2%)
•No meaningful impact either way
•May indicate the hypothesis was off

2. Hypothesis Validation

Compare original hypothesis to results:

Hypothesis Components:

•Proposed change → Was it implemented as planned?
•Target audience → Did we reach the right users?
•Expected outcome → Did we hit the target?
•Rationale → Was our reasoning correct?

Validation Questions:

•Did we achieve the expected outcome? (Yes/No/Partially)
•Was the underlying assumption correct?
•What surprised us?
•What would we do differently?

3. ICE Score Retrospective

Compare predicted vs actual:

Impact Score Validation:

•Predicted Impact: [original score]
•Actual Impact: [calculate based on results]
•Delta: [difference]
•Learning: Was our impact prediction accurate?

Confidence Score Validation:

•Predicted Confidence: [original score]
•Outcome: [win/loss/inconclusive]
•Learning: Was our confidence justified?

Ease Score Validation:

•Predicted Ease: [original score]
•Actual Time: [if tracked]
•Learning: Was implementation as easy as expected?
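
The impact and confidence checks above can be sketched in Python. This is a minimal sketch, not a prescribed method: it assumes ICE components are scored on a 1-10 scale, that the actual impact has already been translated onto that scale, and that the tolerance and the high/low confidence cutoffs (hypothetical values) fit your team. The function names are illustrative.

```python
def impact_accuracy(predicted: int, actual: int, tolerance: int = 1) -> tuple[int, str]:
    """Return (delta, label) comparing the actual impact score to the prediction."""
    delta = actual - predicted
    if abs(delta) <= tolerance:
        label = "good"
    elif delta < 0:
        # Actual impact came in lower than predicted.
        label = "overestimated"
    else:
        label = "underestimated"
    return delta, label


def confidence_accuracy(predicted: int, outcome: str, high: int = 7, low: int = 4) -> str:
    """Judge whether the predicted confidence score matched the outcome.

    Assumption: `high` and `low` are hypothetical cutoffs on a 1-10 scale.
    """
    if outcome == "Win":
        return "underconfident" if predicted <= low else "justified"
    # Loss, Inconclusive, or Neutral: a high confidence score was not borne out.
    return "overconfident" if predicted >= high else "justified"
```

For example, `impact_accuracy(8, 5)` flags a prediction of 8 against an actual score of 5 as overestimated by 3 points.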

4. Insight Generation

Key Questions:

•What worked? Specific elements that drove success
•What didn't work? Elements that failed or harmed metrics
•What was surprising? Unexpected findings
•What patterns emerge? Connections to other experiments
•What new questions arise? Areas to investigate further

Secondary Metrics:

•Review all secondary metrics tracked
•Look for unintended positive effects
•Watch for negative side effects
•Consider holistic impact

5. Follow-up Experiment Suggestions

Based on the outcome, suggest 2-3 follow-up experiments:

For Wins:

•Scale: Roll out to 100% of users
•Amplify: Make the winning element more prominent
•Extend: Apply pattern to related areas
•Optimize: Test variations to improve further

For Losses:

•Pivot: Try alternative approach to same problem
•Investigate: Run research to understand why
•Revert: Document and move on
•Learn: Apply learnings to future experiments

For Inconclusive:

•Re-run: Increase sample size or duration
•Simplify: Test smaller version to isolate variable
•Segment: Test with specific user segments
•Refine: Adjust hypothesis based on early signals

Analysis Process

Step 1: Load and Validate

1. Read experiment JSON from completed/archived folder
2. Verify results data exists:
   - Primary metric
   - Baseline value
   - Result value
   - Statistical significance
   - Sample size
   - Duration
3. Check if hypothesis is documented
4. Review ICE scores
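
The checks in this step could be sketched as follows. The JSON key names (`results`, `primary_metric`, `duration_days`, `hypothesis`, `ice`, and so on) are assumptions for illustration; the experiment schema is not specified here, so adjust them to the real file layout.

```python
import json
from pathlib import Path

# Hypothetical key names for the required results data listed above.
REQUIRED_RESULT_FIELDS = (
    "primary_metric",
    "baseline",
    "result",
    "significance",
    "sample_size",
    "duration_days",
)


def find_missing_fields(experiment: dict) -> list[str]:
    """Return the names of required items that are absent from the experiment."""
    results = experiment.get("results") or {}
    missing = [
        f"results.{name}"
        for name in REQUIRED_RESULT_FIELDS
        if results.get(name) is None
    ]
    if not experiment.get("hypothesis"):
        missing.append("hypothesis")
    if not experiment.get("ice"):
        missing.append("ice")
    return missing


def load_experiment(path: str) -> dict:
    """Read an experiment JSON file and fail loudly if data is incomplete."""
    experiment = json.loads(Path(path).read_text(encoding="utf-8"))
    missing = find_missing_fields(experiment)
    if missing:
        raise ValueError(f"Cannot analyze {path}: missing {', '.join(missing)}")
    return experiment
```

Failing before any analysis runs keeps an incomplete experiment from being classified on partial data.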

Step 2: Calculate Key Metrics

Change Percentage = ((Result - Baseline) / Baseline) × 100

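Result Classification (check in order; the first match applies):
- IF significance < 95% → Inconclusive
- IF abs(change%) < 2% → Neutral
- IF change% ≥ 2% → Win
- IF change% ≤ -2% → Loss

For metrics where lower is better (for example churn), flip the sign of change% before applying the Win and Loss rules.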
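
One way to read the formula and rules above is to check significance first, then the neutral band, then the direction of the change. The sketch below follows that reading. The function and parameter names are illustrative, and `higher_is_better` is an added assumption to cover metrics such as churn.

```python
def change_percent(baseline: float, result: float) -> float:
    """Relative change of result versus baseline, in percent."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero to compute a percentage change")
    return (result - baseline) / baseline * 100


def classify_result(
    baseline: float,
    result: float,
    significance: float,
    higher_is_better: bool = True,
    neutral_band: float = 2.0,
    min_significance: float = 95.0,
) -> str:
    """Classify an experiment as Win, Loss, Inconclusive, or Neutral."""
    change = change_percent(baseline, result)
    if significance < min_significance:
        return "Inconclusive"
    if abs(change) < neutral_band:
        return "Neutral"
    # Outside the neutral band and significant: direction decides the label.
    improved = change > 0 if higher_is_better else change < 0
    return "Win" if improved else "Loss"
```

With these defaults, a significant move from 200 to 230 is a Win, while the same move at 80% significance is Inconclusive.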

Step 3: Generate Insights

1. Classify result (Win/Loss/Inconclusive/Neutral)
2. Validate hypothesis against results
3. Review ICE score predictions
4. Extract key learnings
5. Identify surprising findings
6. Check secondary metrics
7. Look for patterns across related experiments

Step 4: Create Follow-up Ideas

1. Based on result type, brainstorm 2-3 follow-ups
2. For each follow-up:
   - Draft hypothesis
   - Explain rationale (reference current learnings)
   - Suggest category
   - Provide preliminary ICE estimate
3. Prioritize follow-ups by potential impact
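
A follow-up idea and the prioritization step could be modeled as below. This is a sketch under assumptions: the `FollowUp` fields mirror the items listed in this step, the ICE total is assumed to be the product of the three scores, and the tie-break on ICE total is an illustrative choice. The strategy names come from the follow-up lists in section 5.

```python
from dataclasses import dataclass

# Strategy names per result type, as listed in section 5.
FOLLOW_UP_STRATEGIES = {
    "Win": ["Scale", "Amplify", "Extend", "Optimize"],
    "Loss": ["Pivot", "Investigate", "Revert", "Learn"],
    "Inconclusive": ["Re-run", "Simplify", "Segment", "Refine"],
}


@dataclass
class FollowUp:
    title: str
    strategy: str
    hypothesis: str
    rationale: str
    category: str
    impact: int
    confidence: int
    ease: int

    @property
    def ice_total(self) -> int:
        # Assumption: the total is the product of the three scores.
        return self.impact * self.confidence * self.ease


def prioritize(follow_ups: list[FollowUp], limit: int = 3) -> list[FollowUp]:
    """Order ideas by estimated impact, breaking ties by ICE total."""
    ranked = sorted(follow_ups, key=lambda f: (f.impact, f.ice_total), reverse=True)
    return ranked[:limit]
```

Sorting on impact first follows the instruction to prioritize by potential impact; the ICE total only decides between ideas with equal impact.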

Step 5: Generate Report

1. Create markdown analysis report
2. Include:
   - Summary (result classification, key numbers)
   - Hypothesis validation
   - ICE score retrospective
   - Key insights (bulleted list)
   - Secondary metrics review
   - Recommendations
   - Follow-up experiment ideas
3. Save to experiments/archive/[id]_analysis.md
4. Update experiment JSON with learnings
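
As a partial illustration of this step, the sketch below renders only the header and Summary section of the report and saves it under the `experiments/archive/[id]_analysis.md` convention. The `results` key names are assumptions, and the remaining report sections are left out for brevity.

```python
from pathlib import Path


def analysis_path(experiment_id: str, root: str = "experiments/archive") -> Path:
    """Build the report path following the [id]_analysis.md convention."""
    return Path(root) / f"{experiment_id}_analysis.md"


def render_summary(title: str, classification: str, results: dict) -> str:
    """Render the header and Summary section of the analysis report."""
    change = (results["result"] - results["baseline"]) / results["baseline"] * 100
    lines = [
        f"# Experiment Analysis: {title}",
        "",
        f"**Status:** {classification}",
        "",
        "## Summary",
        "",
        f"- **Primary Metric:** {results['primary_metric']}",
        f"- **Baseline:** {results['baseline']}",
        f"- **Result:** {results['result']}",
        f"- **Change:** {change:+.1f}%",
        f"- **Statistical Significance:** {results['significance']}%",
        f"- **Sample Size:** {results['sample_size']}",
        f"- **Duration:** {results['duration_days']} days",
    ]
    return "\n".join(lines) + "\n"


def save_report(experiment_id: str, markdown: str, root: str = "experiments/archive") -> Path:
    """Write the report to disk, creating the archive folder if needed."""
    path = analysis_path(experiment_id, root)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(markdown, encoding="utf-8")
    return path
```

Keeping rendering separate from saving makes the markdown easy to inspect or test before anything is written to the archive.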

Analysis Output Template

# Experiment Analysis: [Title]

**Date:** [Analysis date]
**Experiment ID:** [id]
**Status:** [Win/Loss/Inconclusive/Neutral] ✓/✗/?/○

## Summary

- **Primary Metric:** [metric name]
- **Baseline:** [baseline value]
- **Result:** [result value]
- **Change:** [+/-X%]
- **Statistical Significance:** [XX%]
- **Sample Size:** [count]
- **Duration:** [days]

## Hypothesis Validation

### Original Hypothesis
[Full hypothesis statement]

### Validation
- **Expected Outcome:** [what we expected]
- **Actual Outcome:** [what happened]
- **Hypothesis Validated:** [Yes/No/Partially]

**Analysis:**
[Explanation of whether and why hypothesis was validated]

## ICE Score Retrospective

| Component | Predicted | Actual/Assessment | Accuracy |
|-----------|-----------|------------------|----------|
| Impact | [score] | [calculate from results] | [good/overestimated/underestimated] |
| Confidence | [score] | [based on outcome] | [justified/overconfident/underconfident] |
| Ease | [score] | [based on actual effort] | [accurate/harder/easier] |

**Learnings for Future Scoring:**
- [What we learned about predicting impact]
- [What we learned about confidence]
- [What we learned about ease]

## Key Insights

1. **[Primary insight]** - [Explanation with data]
2. **[Secondary insight]** - [Explanation]
3. **[Surprising finding]** - [What we didn't expect]

## Secondary Metrics

| Metric | Change | Interpretation |
|--------|--------|----------------|
| [metric 1] | [+/-X%] | [Good/Bad/Neutral] |
| [metric 2] | [+/-X%] | [Good/Bad/Neutral] |

**Side Effects:**
- Positive: [Any unexpected positive impacts]
- Negative: [Any unexpected negative impacts]

## Recommendations

### Immediate Actions
- [ ] [Action item 1]
- [ ] [Action item 2]

### Strategic Implications
[Broader implications for product/growth strategy]

## Follow-up Experiment Ideas

### 1. [Experiment Title]
**Category:** [category]

**Hypothesis:**
[Full hypothesis following template]

**Rationale:**
[Why this follow-up based on current learnings]

**Preliminary ICE:**
- Impact: [score] - [reasoning]
- Confidence: [score] - [reasoning]
- Ease: [score] - [reasoning]
- **Total: [score]**

---

### 2. [Experiment Title]
[Repeat format]

---

### 3. [Experiment Title]
[Repeat format]

## Related Experiments

[List any related experiments and their outcomes for pattern recognition]

## Notes

[Any additional context, edge cases, or considerations]

Cross-Experiment Analysis

When the user asks to analyze multiple experiments:

Metrics to Calculate:

•Success Rate: % of wins out of completed experiments
•Category Performance: Which funnel stages have best win rate?
•ICE Score Accuracy: How well do high-ICE experiments perform?
•Average Impact: What's the typical metric improvement?
•Cycle Time: Average days from backlog → completed
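
Success rate, average impact, and category performance could be computed along these lines. This is a sketch that assumes each experiment is a dict with hypothetical `status`, `classification`, `category`, and `change_percent` keys. ICE accuracy and cycle time are omitted because they depend on fields not described here.

```python
from collections import defaultdict
from statistics import mean


def win_rate(items: list[dict]) -> float:
    """Percentage of experiments classified as a Win."""
    return 100 * sum(e["classification"] == "Win" for e in items) / len(items)


def portfolio_metrics(experiments: list[dict]) -> dict:
    """Summarize completed experiments overall and per category."""
    completed = [e for e in experiments if e.get("status") == "completed"]
    if not completed:
        return {"completed": 0, "win_rate": None, "average_change": None, "by_category": {}}

    by_category = defaultdict(list)
    for e in completed:
        by_category[e["category"]].append(e)

    return {
        "completed": len(completed),
        "win_rate": win_rate(completed),
        "average_change": mean(e["change_percent"] for e in completed),
        "by_category": {
            category: {
                "experiments": len(items),
                "win_rate": win_rate(items),
                "avg_impact": mean(e["change_percent"] for e in items),
            }
            for category, items in by_category.items()
        },
    }
```

Only completed experiments enter the denominator, matching the definition of success rate above.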

Pattern Recognition:

•Which types of experiments succeed most?
•Which audience segments respond best?
•Which testing methods are most reliable?
•What confidence levels actually predict success?
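
The last question can be approached by grouping completed experiments by their predicted confidence score and comparing win rates. The sketch below assumes each experiment dict carries a hypothetical `ice.confidence` score and a `classification` label.

```python
from collections import Counter


def win_rate_by_confidence(experiments: list[dict]) -> dict[int, float]:
    """Map each predicted confidence score to its observed win rate (percent)."""
    totals = Counter()
    wins = Counter()
    for e in experiments:
        score = e["ice"]["confidence"]
        totals[score] += 1
        if e["classification"] == "Win":
            wins[score] += 1
    # Scores are returned in ascending order for easy reading.
    return {score: 100 * wins[score] / totals[score] for score in sorted(totals)}
```

If win rates do not rise with the confidence score, that suggests the confidence estimates need recalibrating. With only a handful of experiments per score, treat the rates as rough signals rather than firm evidence.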

Portfolio View:

# Experiment Portfolio Analysis

## Overview
- Total Experiments: [count]
- Completed: [count]
- Win Rate: [X%]
- Average Change: [+X%]

## By Category
| Category | Experiments | Win Rate | Avg Impact |
|----------|-------------|----------|------------|
| Acquisition | [count] | [X%] | [+X%] |
| Activation | [count] | [X%] | [+X%] |
| Retention | [count] | [X%] | [+X%] |
| Revenue | [count] | [X%] | [+X%] |
| Referral | [count] | [X%] | [+X%] |

## ICE Score Performance
- Experiments with ICE > 500: [X% win rate]
- Experiments with ICE 300-500: [X% win rate]
- Experiments with ICE < 300: [X% win rate]

**Learning:** [Are high ICE scores actually better predictors?]

## Top Performers
1. [Experiment] - [+X%] change
2. [Experiment] - [+X%] change
3. [Experiment] - [+X%] change

## Key Patterns
- [Pattern 1 discovered across experiments]
- [Pattern 2]
- [Pattern 3]

## Recommendations
[Strategic recommendations based on portfolio analysis]

Integration Points

•Automatically trigger when /experiment-update sets status to "completed"
•Work with ICE scorer skill to validate predictions
•Inform hypothesis generator with learnings
•Feed into metrics calculator for portfolio analysis

Continuous Improvement

After each analysis:

•Store learnings in a knowledge base
•Update ICE scoring calibration
•Refine hypothesis templates
•Build pattern library
•Improve follow-up suggestions