Metric Diagnosis Workflow
Purpose
Systematically investigate why a metric changed unexpectedly (dropped or spiked) by segmenting data, distinguishing internal vs. external factors, and testing hypotheses to identify root cause. Prevents rushing to wrong conclusions and ensures evidence-based responses.
When to Use This Workflow
Use this workflow when:
- •Key metrics drop or spike unexpectedly
- •Leadership asks "what happened to [metric]?"
- •Post-mortem analysis needed after incident
- •A/B test shows unexpected patterns
- •Need to validate suspected root cause
- •User behavior changed suddenly without obvious reason
Skills Sequence
This workflow applies the root-cause-diagnosis skill three times, then assesses North Star impact:
1. Root Cause Diagnosis (Phase 1) ↓ (Segment across 4 dimensions: People, Geography, Technology, Time) 2. Root Cause Diagnosis (Phase 2) ↓ (Distinguish intrinsic vs extrinsic factors) 3. Root Cause Diagnosis (Phase 3) ↓ (Build hypothesis table and test against data) 4. North Star Alignment ↓ (Assess impact on top-line metrics) OUTPUT: Narrowed scope, tested hypotheses, identified root cause, recommended actions
Required Inputs
Gather this information before starting:
Metric Change Information
- •The metric that changed
- •Name and definition
- •Example: "Weekly Active Users (WAU)"
- •Magnitude of change
- •Absolute and percentage
- •Example: "10,000 → 9,000 (10% drop)"
- •Timeframe
- •When was it noticed?
- •Over what period did change occur?
- •Example: "Noticed today, change happened over past week"
Data Access
- •Segmented data availability
- •Can you segment by user type, geography, platform?
- •Are raw logs accessible?
- •Comparison data
- •Historical baselines
- •Year-over-year seasonality
- •Competitor benchmarks (if available)
Recent Activity Log
- •Recent releases (past 1-4 weeks)
- •Feature launches
- •Bug fixes
- •Infrastructure changes
- •Running experiments
- •A/B tests active
- •Configuration changes
- •External context
- •Marketing campaigns
- •News events
- •Competitor activity
Workflow Steps
Phase 0: Data Quality Verification (5 minutes)
Before investigating, confirm the data is real:
- •
Check reporting systems
- •Is tracking still functional?
- •Any pipeline issues?
- •Data freshness correct?
- •
Cross-reference sources
- •Do multiple data sources show same pattern?
- •Internal dashboards vs. external tools aligned?
- •
Validate sample sizes
- •Sufficient data for statistical significance?
- •Time periods comparable?
Red flags that indicate data quality issues:
- •Metric went to exactly zero
- •Change happened at exact midnight
- •Only visible in one data source
- •Coincides with known tracking deployment
If data quality issue found: Fix tracking first, then re-investigate if problem persists.
If data is valid: Proceed to Phase 1.
Phase 1: Narrow the Scope (15-20 minutes)
Use the root-cause-diagnosis skill - 4 Dimension Segmentation
Ask clarifying questions across all four dimensions:
Dimension 1: People (User Segments)
Questions to ask:
- •Does it affect all users equally?
- •Specific demographics (age, gender, location)?
- •New vs. returning users?
- •Free vs. paid users?
- •Power users vs. casual users?
- •Specific cohorts or user types?
Request data segmentation by user attributes.
Dimension 2: Geography
Questions to ask:
- •Specific countries or regions affected?
- •Certain cities?
- •Urban vs. rural?
- •Climate zones?
- •Time zones?
Request data segmentation by location.
Dimension 3: Technology (Platform)
Questions to ask:
- •iOS vs. Android?
- •Web vs. mobile app?
- •Desktop vs. mobile web?
- •Specific browser or OS versions?
- •Device types?
- •Network types (WiFi vs. cellular)?
Request data segmentation by platform/device.
Dimension 4: Time
Questions to ask:
- •When exactly did it start?
- •Gradual decline or sudden drop?
- •Day of week patterns?
- •Time of day patterns?
- •Recurring or one-time?
- •Seasonal effects?
Request time-series data with granular breakdown.
Output of Phase 1:
Narrow problem statement from:
- •❌ "Usage is down"
To:
- •✓ "iOS usage among teens in US dropped 10% starting Sept 1st"
Document:
## Narrowed Scope **Original statement:** [Broad problem] **Narrowed statement:** [Specific problem] **Affected segments:** - People: [Which user groups] - Geography: [Which locations] - Technology: [Which platforms] - Time: [When and pattern] **Unaffected segments:** - [What stayed stable - equally important]
Phase 2: Generate Hypotheses (15-20 minutes)
Use the root-cause-diagnosis skill - Intrinsic vs. Extrinsic Factor Analysis
Brainstorm potential causes in both categories:
Intrinsic Factors (Internal Changes)
Engineering & Product:
- •Recent code releases (check deploy logs)
- •Bug introductions
- •Feature launches by any team
- •A/B experiments started/stopped
- •Algorithm or ranking changes
- •Performance degradations
Check with:
- •Engineering team (deploys, errors, performance)
- •Other product teams (their launches)
- •Data team (tracking changes)
Operations & Business:
- •Pricing changes
- •Policy updates
- •Support process changes
- •Marketing campaign changes
Check with:
- •Operations team
- •Marketing team
- •Customer success team
Extrinsic Factors (External Events)
Competition:
- •Competitor product launches
- •Competitor pricing changes
- •Competitor marketing campaigns
Market:
- •Economic shifts
- •Industry trends
- •Regulatory changes
- •Platform policy changes (iOS, Android, etc.)
Calendar/Seasonal:
- •Holidays
- •School schedules
- •Weather patterns
- •Cultural events
News/PR:
- •News coverage (positive or negative)
- •Social media trends
- •External events affecting user behavior
Check with:
- •Sales team (competitive intelligence)
- •Customer success (user feedback)
- •Marketing (market trends)
- •Industry news sources
Output of Phase 2:
List of 4-6 hypotheses (mix of intrinsic and extrinsic):
## Hypothesis List **Intrinsic (Internal):** 1. [Hypothesis name]: [Description] 2. [Hypothesis name]: [Description] 3. [Hypothesis name]: [Description] **Extrinsic (External):** 4. [Hypothesis name]: [Description] 5. [Hypothesis name]: [Description] 6. [Hypothesis name]: [Description]
Phase 3: Test Hypotheses (20-30 minutes)
Use the root-cause-diagnosis skill - Hypothesis Table Method
Step 3.1: Create Hypothesis Table
Columns: Data points observed (from Phase 1) Rows: Potential causes (from Phase 2)
Fill each cell with predicted impact:
- •↓ = Would decrease this segment
- •↑ = Would increase this segment
- •→ = No effect on this segment
- •? = Uncertain
Step 3.2: Match Patterns
For each hypothesis:
- •Do predictions match observations?
- •If ALL predictions match → Possible cause
- •If ANY predictions contradict → Rule out
Step 3.3: Request Differentiating Data
If multiple hypotheses fit:
- •Identify data points that would differ between them
- •Request additional segmentation
- •Narrow to single most likely cause
Example Table:
| Cause | iOS ↓ | Android → | Teens ↓ | Adults → | US ↓ | EU → |
|---|---|---|---|---|---|---|
| iOS bug in latest update | ✓ | ✓ | ✗ | ✗ | ✗ | ✗ |
| School started (US only) | ✗ | ✗ | ✓ | ✓ | ✓ | ✓ |
| Competitor targeted teens | ✗ | ✗ | ✓ | ✓ | ✗ | ✗ |
Pattern match: "School started (US)" fits all observations → Most likely cause
Output of Phase 3:
## Hypothesis Testing Results **Ruled out:** - [Hypothesis]: [Why ruled out] - [Hypothesis]: [Why ruled out] **Possible causes (remaining):** 1. [Most likely hypothesis] - Evidence supporting: [List] - Evidence against: [List if any] - Confidence: [High/Medium/Low] 2. [Second most likely] (if applicable) - Evidence supporting: [List] - Confidence: [High/Medium/Low] **Additional data needed to confirm:** - [Data point that would differentiate remaining hypotheses]
Phase 4: Assess North Star Impact (10 minutes)
Use the north-star-alignment skill
Evaluate whether identified root cause impacts company-level metrics:
Questions:
- •Does this affect North Star metrics?
- •Is impact temporary or permanent?
- •Does it require immediate action?
- •What's the business impact (revenue, growth, retention)?
Severity classification:
Critical (Act immediately):
- •Impacts revenue directly
- •Affects North Star metrics substantially
- •Permanent degradation if not fixed
- •Competitive disadvantage
Important (Act soon):
- •Impacts user experience
- •Affects intermediate metrics
- •May worsen over time
- •Needs monitoring
Low Priority (Monitor):
- •Expected variation (seasonality)
- •Affects non-critical segments
- •Self-resolving
- •No action needed
Output of Phase 4:
## North Star Impact Assessment **Affected North Star metrics:** - [Metric]: [Impact description] - [Metric]: [Impact description] **Business impact:** - Revenue: [Direct/Indirect/None] - Growth: [Slowed/Accelerated/Neutral] - Retention: [Harmed/Improved/Neutral] **Severity:** [Critical/Important/Low Priority] **Urgency:** [Immediate/This week/Monitor]
Phase 5: Recommend Actions (10 minutes)
Based on root cause and severity, recommend next steps:
Action categories:
- •
Fix (for bugs/issues)
- •What needs to be fixed?
- •Who owns the fix?
- •Timeline for fix?
- •
Mitigate (for external factors)
- •What can reduce harm?
- •Counter-measures available?
- •Timeline for mitigation?
- •
Monitor (for expected variations)
- •What tracking is needed?
- •Alert thresholds?
- •Review cadence?
- •
No Action (for acceptable changes)
- •Why no action needed?
- •What validates this decision?
Output document:
# Metric Diagnosis Report: [Metric Name] ## Executive Summary [One paragraph: What happened, why, what to do] ## Problem Statement - **Initial observation:** [Broad statement] - **Narrowed scope:** [Specific statement with segments] - **Magnitude:** [Absolute and % change] - **Timeframe:** [When occurred] ## Root Cause Analysis ### Narrowed Scope (Phase 1) - **Affected:** [User segments, geography, platforms, time pattern] - **Unaffected:** [What stayed stable] ### Hypotheses Tested (Phase 2-3) | Hypothesis | Category | Evidence | Conclusion | |------------|----------|----------|------------| | [Name] | Intrinsic/Extrinsic | [Supporting/Contradicting] | Ruled Out/Possible/Likely | ### Identified Root Cause **Most likely cause:** [Hypothesis name] **Evidence supporting:** 1. [Data point matching prediction] 2. [Data point matching prediction] 3. [Stakeholder confirmation] **Confidence level:** [High/Medium/Low] ### North Star Impact (Phase 4) - **Affected metrics:** [List with impact] - **Severity:** [Critical/Important/Low] - **Urgency:** [Immediate/Soon/Monitor] ## Recommended Actions ### Primary action: **[Fix/Mitigate/Monitor/No Action]** - What: [Specific action] - Who: [Owner] - When: [Timeline] - Success criteria: [How to know it worked] ### Secondary actions: - [Action 1] - [Action 2] ### Monitoring plan: - Metrics to track: [List] - Review frequency: [Daily/Weekly] - Alert thresholds: [When to escalate] ## Appendix - Hypothesis table: [Detailed] - Data sources: [Links] - Stakeholders consulted: [Names/teams]
Common Diagnostic Patterns
Pattern 1: Seasonal Variation
- •Symptoms: Recurring pattern (weekly, yearly)
- •Investigation: Compare to historical same period
- •Action: Usually monitor, adjust baselines
- •Example: School year start, holiday season
Pattern 2: Competitor Impact
- •Symptoms: Sudden shift, specific demographic
- •Investigation: Check competitor news, market research
- •Action: Counter-strategy, feature parity
- •Example: Competing app launch with targeted marketing
Pattern 3: Internal Bug
- •Symptoms: Sudden drop, platform-specific
- •Investigation: Check recent releases, error logs
- •Action: Hotfix, rollback if needed
- •Example: iOS update broke key feature
Pattern 4: Product Change Side Effect
- •Symptoms: Gradual shift after release
- •Investigation: Analyze feature adoption patterns
- •Action: Iterate on feature, A/B test variations
- •Example: Redesign changed user behavior
Pattern 5: Tracking Issue (False Alarm)
- •Symptoms: Exact numbers, midnight timing, single source
- •Investigation: Verify data pipeline
- •Action: Fix tracking, dismiss false alarm
- •Example: Analytics tag removed in deploy
Common Mistakes
| Mistake | Fix |
|---|---|
| Jumping to conclusions without segmentation | Complete Phase 1 (4D segmentation) first |
| Only considering one type of factor | Brainstorm both intrinsic AND extrinsic |
| Accepting first matching hypothesis | Test all hypotheses systematically |
| Skipping data quality check | Always verify data is valid first |
| Not consulting stakeholders | Talk to sales, CS, engineering teams |
| Immediate rollback without diagnosis | Understand cause before acting |
Success Criteria
Metric diagnosis succeeds when:
- •Data quality verified
- •Problem narrowed to specific segments (4 dimensions)
- •4-6 hypotheses brainstormed (intrinsic + extrinsic)
- •Hypotheses tested against data systematically
- •Root cause identified (or top 2-3 candidates with confidence levels)
- •North Star impact assessed
- •Recommended actions clear with owners and timelines
- •Diagnosis document created and shared
- •Stakeholders aligned on next steps
Real-World Example: Microsoft Teams Downloads Drop
Initial Report
"New app downloads down 50% from last week"
Phase 1: Narrow Scope (15 min)
Questions asked:
- •Geography? → "Predominantly US and Europe"
- •User type? → "Enterprise customers, not consumers"
- •Platform? → "Web and desktop primarily, some mobile"
- •Other metrics? → "Usage unchanged (existing users still active)"
Narrowed statement: "Enterprise customer downloads via web/desktop down 50% in US/Europe, starting last week. Existing user engagement unchanged."
Phase 2: Generate Hypotheses (15 min)
Intrinsic:
- •Bug in download flow
- •Recent release broke something
- •A/B test reducing visibility
Extrinsic: 4. Marketing promotion ended 5. Competitor launch 6. Economic downturn affecting enterprise spending
Stakeholder consults:
- •Engineering: No recent releases, no bugs reported
- •Marketing: "We were running 50% off for enterprises switching from Slack"
- •Sales: No competitor news, pipeline healthy
Phase 3: Test Hypotheses (20 min)
| Hypothesis | Enterprise | Consumer | US/EU | Usage Same | Downloads Only |
|---|---|---|---|---|---|
| Bug in flow | ✗ Both | ✗ Both | ✗ All | ? | ✓ |
| Promo ended | ✓ | ✗ | ✓ | ✓ | ✓ |
| Competitor | ✗ Both | ✗ Both | ✗ All | ? | ✓ |
Pattern match: "Promotion ended" fits all observations
Confirmation: Marketing confirms promotion ended last weekend
Phase 4: North Star Impact (5 min)
Root cause: End of promotional campaign
- •Downloads during promo = artificially inflated baseline
- •Current downloads = return to normal baseline
North Star impact:
- •MAU: No impact (existing users unaffected)
- •Revenue: No impact (existing customers retained)
- •Severity: Low (false alarm, not actual problem)
Phase 5: Recommended Actions (5 min)
Primary action: No action needed
- •Not an actual decline, just return to baseline
- •Adjust baselines for future comparisons
Secondary actions:
- •Update dashboards to flag promotional periods
- •Establish "normal" baseline for future monitoring
- •Document for future reference (prevent false alarms)
Time to diagnosis: 60 minutes
Related Skills
This workflow orchestrates these skills:
- •root-cause-diagnosis (Phases 1-3) - Core investigation method
- •north-star-alignment (Phase 4) - Assess strategic impact
Related Workflows
- •tradeoff-decision: May use diagnosis to understand why metrics changed before deciding
- •metrics-definition: Uses similar systematic approach for defining vs. diagnosing
Time Estimate
Total: 60-90 minutes
- •Phase 0 (Data quality): 5 min
- •Phase 1 (Narrow scope): 15-20 min
- •Phase 2 (Hypotheses): 15-20 min
- •Phase 3 (Test): 20-30 min
- •Phase 4 (Impact): 10 min
- •Phase 5 (Actions): 10 min
For complex issues, may require multiple iterations or longer investigation time.