Pareto Analysis (80/20 Rule)
Systematically identify and prioritize the "vital few" causes that contribute to the majority of problems. Based on the Pareto Principle: roughly 80% of effects come from 20% of causes.
Integration with Other RCCA Tools
Pareto Analysis provides prioritization - identifying which problems or causes deserve attention first. Typical integration:
- •Pareto → Fishbone → 5 Whys: Prioritize with Pareto, brainstorm causes with Fishbone, drill into root causes with 5 Whys
- •Problem Definition → Pareto → Root Cause Tools: Define scope, prioritize focus areas, investigate top contributors
- •DMAIC Measure Phase: Pareto charts establish baseline and identify improvement targets
Workflow Overview
5 Phases (Q&A-driven):
- •Problem Scoping → Define what you're measuring and why
- •Data Collection → Gather frequency/cost/impact data by category
- •Chart Construction → Build Pareto chart with cumulative line
- •Analysis & Interpretation → Identify vital few, validate 80/20 pattern
- •Documentation → Generate chart and report
Phase 1: Problem Scoping
Goal: Establish clear measurement objective and categories.
Ask the user:
What problem or outcome are you trying to prioritize or analyze?
Examples:
- •"Customer complaints by type"
- •"Defects by category"
- •"Downtime by cause"
- •"Errors by department"
Then clarify:
What will you measure for each category?
Common measurements:
- •Frequency: Count of occurrences
- •Cost: Dollar impact per category
- •Time: Duration or delay per category
- •Severity: Weighted score (frequency × impact)
Quality Gate: Problem scope must:
- • Define a specific, measurable outcome
- • Identify the measurement type (frequency, cost, time, or weighted)
- • Have clear business relevance
Phase 2: Data Collection
Goal: Gather accurate, representative data by category.
Ask the user to provide data or guide collection:
Please provide your data in one of these formats:
Option A - Direct entry:
| Category | Count/Value |
|---|---|
| Category A | 45 |
| Category B | 30 |
| ... | ... |

Option B - Raw incident list: Provide a list of incidents with their categories, and I'll tabulate them.
Option C - Describe the data source: Tell me where the data comes from, and I'll help you structure it.
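For Option B, tabulation amounts to counting incidents per category. The following is a minimal sketch using only the Python standard library; the incident IDs, category names, and the `tabulate` helper are hypothetical and only illustrate the shape of the data.

```python
from collections import Counter

# Hypothetical raw incident list: (incident_id, category) pairs.
incidents = [
    ("INC-001", "Login failure"),
    ("INC-002", "Slow response"),
    ("INC-003", "Login failure"),
    ("INC-004", "Billing error"),
    ("INC-005", "Login failure"),
    ("INC-006", "Slow response"),
]


def tabulate(incidents):
    """Count incidents per category, returned largest first."""
    counts = Counter(category for _, category in incidents)
    return counts.most_common()


table = tabulate(incidents)
for category, count in table:
    print(f"{category:<15} {count}")
```

The result has the same Category / Count shape as Option A, so it can feed directly into Phase 3.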
Data Quality Checks:
- • Representative time period (long enough to capture recurring patterns)
- • Consistent category definitions (no overlaps)
- • Sufficient sample size (at least 30 data points; 50 or more preferred)
- • Categories follow MECE principle (Mutually Exclusive, Collectively Exhaustive)
Category Guidelines (see references/category-guidelines.md):
- •Keep categories to 7-10 maximum
- •Use an "Other" category sparingly (should not exceed 10% of total)
- •Categories should be actionable (low enough in causal chain to address)
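Some of the checks above are mechanical and can be sketched in code. The example below assumes the data is a plain `{category: count}` mapping and uses the thresholds stated above (30 data points, 10 categories, 10% for "Other"); the function name and parameters are hypothetical. Whether categories are truly MECE and actionable still needs human judgment.

```python
def check_data_quality(counts, other_label="Other",
                       min_points=30, max_categories=10, max_other_share=0.10):
    """Return a list of warnings for a {category: count} mapping."""
    warnings = []
    total = sum(counts.values())
    if total < min_points:
        warnings.append(
            f"Only {total} data points; at least {min_points} recommended.")
    if len(counts) > max_categories:
        warnings.append(
            f"{len(counts)} categories; keep to {max_categories} or fewer.")
    other = counts.get(other_label, 0)
    if total > 0 and other / total > max_other_share:
        warnings.append(
            f"'{other_label}' is {other / total:.0%} of total; "
            f"should not exceed {max_other_share:.0%}.")
    return warnings


good = {"Defect A": 45, "Defect B": 30, "Defect C": 20,
        "Defect D": 15, "Defect E": 10, "Other": 5}
bad = {"Defect A": 5, "Other": 5}

print(check_data_quality(good))  # no warnings expected
for warning in check_data_quality(bad):
    print(warning)
```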
Phase 3: Chart Construction
Goal: Build the Pareto chart with calculations.
Once data is collected, calculate:
- •Sort categories by count/value in descending order
- •Calculate percentage for each: (Category Value / Total) × 100
- •Calculate cumulative percentage: Running sum of percentages
- •Identify cutoff: The first category at which the cumulative percentage reaches or exceeds 80%
Run the calculation script:
python3 scripts/calculate_pareto.py --input data.json
Or provide data directly and I'll calculate:
- •Sort descending
- •Compute percentages
- •Compute cumulative percentages
- •Mark the 80% threshold
Output Structure:
| Category | Count | % | Cumulative % | Note |
|---|---|---|---|---|
| Defect A | 45 | 36% | 36% | |
| Defect B | 30 | 24% | 60% | ← Vital few boundary |
| Defect C | 20 | 16% | 76% | |
| Defect D | 15 | 12% | 88% | ← 80% threshold crossed |
| Defect E | 10 | 8% | 96% | |
| Other | 5 | 4% | 100% | |
| TOTAL | 125 | 100% | | |
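The calculation steps in this phase can be sketched in a few lines of Python. This is an illustrative sketch, not the contents of `scripts/calculate_pareto.py`; the `pareto_table` function and its row keys are assumptions, and the sample data is the Defect A-E example from this section.

```python
def pareto_table(counts, threshold=80.0):
    """Build Pareto rows from a {category: value} mapping."""
    total = sum(counts.values())
    if total <= 0:
        raise ValueError("total must be positive")
    # Step 1: sort descending (sorted() is stable, so ties keep input order).
    ordered = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
    rows, running, crossed = [], 0, False
    for category, value in ordered:
        running += value
        # Steps 2-3: percentage and cumulative percentage.
        cum_pct = running / total * 100
        # Step 4: flag only the first row that reaches the threshold.
        first_cross = not crossed and cum_pct >= threshold
        crossed = crossed or first_cross
        rows.append({
            "category": category,
            "value": value,
            "pct": round(value / total * 100, 1),
            "cum_pct": round(cum_pct, 1),
            "threshold_crossed": first_cross,
        })
    return rows


sample = {"Defect A": 45, "Defect B": 30, "Defect C": 20,
          "Defect D": 15, "Defect E": 10, "Other": 5}

for row in pareto_table(sample):
    marker = "  <- 80% threshold crossed" if row["threshold_crossed"] else ""
    print(f'{row["category"]:<9} {row["value"]:>4} '
          f'{row["pct"]:>6.1f}% {row["cum_pct"]:>6.1f}%{marker}')
```

The running total is kept as a sum of raw values rather than a sum of rounded percentages, so rounding errors do not accumulate in the cumulative column. The sketch sorts "Other" by value like any other category; pinning it to the last row would need an extra step.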
Phase 4: Analysis & Interpretation
Goal: Extract actionable insights from the Pareto chart.
Evaluate the analysis against these criteria:
Pattern Recognition
Strong Pareto Effect (steep cumulative curve):
- •Few categories (2-3) account for ≥80% of impact
- •Clear prioritization opportunity
- •Focus improvement efforts on vital few
Weak/No Pareto Effect (gradual cumulative curve):
- •Many categories contribute similar amounts
- •May indicate:
  - •Wrong categorization level (too granular or too broad)
  - •Truly distributed problem (no dominant causes)
  - •Need to weight by severity, not just frequency
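One rough way to tell the two patterns apart is to count how many top-ranked categories are needed to reach 80%. The sketch below treats "3 or fewer" as strong, following the "2-3" guideline above; that cutoff, the function names, and the sample data are assumptions for illustration, and the result should prompt a closer look at the curve rather than replace it.

```python
def categories_to_reach(counts, threshold=80):
    """Number of top-ranked categories needed to reach the threshold %."""
    total = sum(counts.values())
    running = 0
    for n, value in enumerate(sorted(counts.values(), reverse=True), start=1):
        running += value
        # Integer comparison avoids floating-point edge cases at exactly 80%.
        if running * 100 >= threshold * total:
            return n
    return len(counts)


def pareto_strength(counts, threshold=80, max_vital_few=3):
    """Label the pattern 'strong' or 'weak' (heuristic, not a formal test)."""
    n = categories_to_reach(counts, threshold)
    # Require n < len(counts) so a dataset with very few categories
    # is not labelled strong by default.
    return "strong" if n <= max_vital_few and n < len(counts) else "weak"


# Hypothetical datasets for illustration.
steep = {"A": 60, "B": 25, "C": 5, "D": 5, "E": 5}
flat = {"A": 20, "B": 20, "C": 20, "D": 20, "E": 20}

print(categories_to_reach(steep), pareto_strength(steep))
print(categories_to_reach(flat), pareto_strength(flat))
```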
Validation Questions
Ask the user:
Looking at this Pareto analysis:
- •Do the top categories (vital few) align with your intuition about the biggest problems?
- •Are there any categories that should be split or combined?
- •Should we apply weighting (e.g., severity × frequency) for more meaningful prioritization?
- •What's the cost/effort to address each of the vital few?
Weighted Pareto (Optional)
If categories have unequal severity, apply weights:
Weighted Score = Frequency × Severity Weight
Then recalculate Pareto on weighted scores.
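A small sketch shows how weighting can change the ranking. The defect names, frequencies, and severity weights below are hypothetical, chosen so that the least frequent category becomes the top priority once severity is applied.

```python
# Hypothetical frequencies and severity weights (1 = cosmetic, 10 = critical).
frequency = {"Scratch": 50, "Misalignment": 20, "Crack": 10}
severity_weight = {"Scratch": 1, "Misalignment": 3, "Crack": 10}

# Weighted Score = Frequency x Severity Weight
weighted = {c: frequency[c] * severity_weight[c] for c in frequency}

by_frequency = sorted(frequency, key=frequency.get, reverse=True)
by_weighted = sorted(weighted, key=weighted.get, reverse=True)

print("By frequency:     ", by_frequency)
print("By weighted score:", by_weighted)
```

The weighted scores then replace the raw counts as input to the Phase 3 calculation.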
Phase 5: Documentation
Goal: Generate professional outputs.
Generate the Pareto chart:
python3 scripts/generate_chart.py --input data.json --output pareto_chart.svg
Generate the HTML report:
python3 scripts/generate_report.py --input data.json --output pareto_report.html
Report Contents
- •Problem statement and scope
- •Data collection period and sources
- •Pareto chart (SVG embedded)
- •Data table with calculations
- •Vital few identification
- •Recommendations for next steps
- •Quality score
Quality Scoring
See references/quality-rubric.md for detailed scoring criteria.
6 Dimensions (100 points total):
| Dimension | Weight | Focus |
|---|---|---|
| Problem Clarity | 15% | Clear scope, measurement type, business relevance |
| Data Quality | 25% | Representative, sufficient, consistent categories |
| Category Design | 20% | MECE, actionable, appropriate granularity |
| Calculation Accuracy | 15% | Correct sorting, percentages, cumulative line |
| Pattern Interpretation | 15% | Valid conclusions from cumulative curve |
| Actionability | 10% | Clear next steps, linked to improvement actions |
Passing threshold: 70 points
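The total is a weighted sum of the six dimension ratings. The sketch below assumes each dimension is rated from 0 to 100 percent of its available points; the actual rating scale is defined in references/quality-rubric.md, so treat that scale, the function name, and the sample ratings as hypothetical.

```python
# Weights from the table above (points out of 100).
WEIGHTS = {
    "Problem Clarity": 15,
    "Data Quality": 25,
    "Category Design": 20,
    "Calculation Accuracy": 15,
    "Pattern Interpretation": 15,
    "Actionability": 10,
}
PASSING_THRESHOLD = 70


def quality_score(ratings):
    """ratings: dimension -> percent of that dimension's points earned (0-100)."""
    return sum(WEIGHTS[d] * ratings[d] for d in WEIGHTS) / 100


# Hypothetical ratings for one analysis.
ratings = {
    "Problem Clarity": 100,
    "Data Quality": 80,
    "Category Design": 50,
    "Calculation Accuracy": 100,
    "Pattern Interpretation": 60,
    "Actionability": 50,
}

score = quality_score(ratings)
print(score, "PASS" if score >= PASSING_THRESHOLD else "FAIL")
```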
Common Pitfalls
See references/common-pitfalls.md for detailed descriptions.
- •Flat histogram - No dominant categories; may need recategorization
- •Large "Other" category - Obscures potentially important causes
- •Frequency-only focus - Ignoring cost, severity, or effort to fix
- •Insufficient data - Too short a period or too few observations
- •Overlapping categories - Violates MECE principle
- •Assuming 80/20 is exact - The ratio varies; focus on the pattern
- •Stopping at Pareto - Chart identifies priorities but not root causes
Examples
See references/examples.md for worked examples:
- •Manufacturing defects prioritization
- •Customer complaint analysis
- •IT incident categorization
- •Cost reduction opportunity identification
Session Conduct Guidelines
- •Validate categories early - Poor categories doom the analysis
- •Check for Pareto effect - Steep cumulative curve indicates prioritization opportunity
- •Consider weighting - Frequency alone may mislead
- •Link to root cause tools - Pareto prioritizes; Fishbone/5 Whys investigate
- •Iterate if needed - Drill down (nested Pareto) or re-categorize
- •Communicate visually - Pareto charts are excellent stakeholder tools