A/B Test Statistics Calculator
Calculate statistical significance for A/B tests - know when your results are real, not random chance.
When to Use This Skill
- •Test analysis - Determine if results are statistically significant
- •Sample planning - Calculate required sample size before testing
- •Duration estimation - Know how long to run experiments
- •Power analysis - Ensure tests can detect meaningful differences
What Claude Does vs What You Decide
| Claude Does | You Decide |
|---|---|
| Structures analysis frameworks | Metric definitions |
| Identifies patterns in data | Business interpretation |
| Creates visualization templates | Dashboard design |
| Suggests optimization areas | Action priorities |
| Calculates statistical measures | Decision thresholds |
Dependencies
bash
pip install scipy numpy click
Commands
Check Significance
bash
python scripts/main.py significance --control 1000,50 --variant 1000,65 python scripts/main.py significance --control 5000,250 --variant 5000,300 --confidence 0.99
Calculate Sample Size
bash
python scripts/main.py sample-size --baseline 0.05 --mde 0.02 python scripts/main.py sample-size --baseline 0.10 --mde 0.01 --power 0.90
Estimate Duration
bash
python scripts/main.py duration --traffic 1000 --baseline 0.05 --mde 0.02
Examples
Example 1: Analyze Test Results
bash
# Control: 1000 visitors, 50 conversions (5%) # Variant: 1000 visitors, 65 conversions (6.5%) python scripts/main.py significance --control 1000,50 --variant 1000,65 # Output: # A/B Test Results # ───────────────────────── # Control: 5.00% (50/1000) # Variant: 6.50% (65/1000) # Lift: +30.0% # # Statistical Analysis # ───────────────────────── # p-value: 0.089 # Confidence: 91.1% # Result: NOT SIGNIFICANT (need 95%) # # Recommendation: Continue test for more data
Example 2: Plan Sample Size
bash
# Baseline 5% conversion, want to detect 20% relative lift (1% absolute) python scripts/main.py sample-size --baseline 0.05 --mde 0.01 # Output: # Sample Size Calculator # ────────────────────────────── # Baseline conversion: 5.0% # Minimum detectable effect: 1.0% (20% relative) # Target conversion: 6.0% # # Required per variant: 3,842 visitors # Total required: 7,684 visitors # # At 1000 daily visitors: ~8 days
Key Concepts
| Term | Definition |
|---|---|
| p-value | Probability result is due to chance |
| Confidence | 1 - p-value (usually want 95%+) |
| Power | Probability of detecting real effect (usually 80%) |
| MDE | Minimum Detectable Effect - smallest lift worth detecting |
| Lift | Relative improvement (variant - control) / control |
When Results Are Significant
| p-value | Confidence | Verdict |
|---|---|---|
| < 0.01 | > 99% | Highly Significant ✓ |
| < 0.05 | > 95% | Significant ✓ |
| < 0.10 | > 90% | Marginally Significant |
| ≥ 0.10 | < 90% | Not Significant ✗ |
Skill Boundaries
What This Skill Does Well
- •Structuring data analysis
- •Identifying patterns and trends
- •Creating visualization frameworks
- •Calculating statistical measures
What This Skill Cannot Do
- •Access your actual data
- •Replace statistical expertise
- •Make business decisions
- •Guarantee prediction accuracy
Related Skills
- •cohort-analysis - Analyze user cohorts
- •funnel-analyzer - Analyze conversion funnels
Skill Metadata
- •Mode: centaur
yaml
category: analytics subcategory: statistics dependencies: [scipy, numpy] difficulty: intermediate time_saved: 3+ hours/week