Strong Inference Skill

Apply the "Strong Inference" methodology to investigate problems systematically through competing hypotheses and decisive experiments.

Overview

Strong Inference is a scientific method that accelerates problem-solving by:

•Generating multiple competing hypotheses
•Designing experiments that eliminate hypotheses
•Iterating until a single well-supported explanation remains

This skill helps developers investigate bugs, performance issues, and unexpected behaviors using a structured, hypothesis-driven approach.

Key Feature: When running inside a tmux session with the Codex CLI available, this skill can optionally delegate hypothesis generation and review to Codex while Claude designs and executes the verifications.

Prerequisites

•The user has a problem, bug, or unexpected behavior to investigate
•Relevant code context is available
•For Codex collaboration mode: tmux session with Codex CLI available

Workflow Phases

Phase 1: Problem Definition

When the user presents a problem:

•Collect information:
  •Error messages, logs, stack traces
  •Steps to reproduce
  •Expected vs. actual behavior
  •Recent changes that might be related
•Clarify scope:
  •Which components are involved?
  •When did it start happening?
  •Is it reproducible consistently?

Phase 2: Hypothesis Generation

Generate 2-4 competing hypotheses that are:

•Mutually exclusive: If H1 is true, H2 cannot be true
•Testable: Can be verified or eliminated with evidence
•Specific: Clear enough to design a decisive test

Example hypotheses for "API returns 500 intermittently":

code

H1: Database connection pool exhausted under load
H2: Race condition in cache update causing stale data
H3: External service timeout not handled properly
H4: Memory leak causing OOM conditions
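
The criteria above can be partly checked mechanically. The following is a minimal sketch, not the skill's actual implementation: the `Hypothesis` class, its field names, and `validate_set` are hypothetical names introduced for illustration. It checks the 2-4 count and that each hypothesis carries a decisive test; mutual exclusivity still needs human judgment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hypothesis:
    """One candidate explanation (field names are illustrative assumptions)."""
    hid: str            # e.g. "H1"
    claim: str          # the specific, falsifiable statement
    decisive_test: str  # observation that would eliminate the claim


def validate_set(hypotheses):
    """Return a list of problems; an empty list means the set looks usable."""
    problems = []
    if not 2 <= len(hypotheses) <= 4:
        problems.append(f"expected 2-4 hypotheses, got {len(hypotheses)}")
    ids = [h.hid for h in hypotheses]
    if len(set(ids)) != len(ids):
        problems.append("hypothesis ids must be unique")
    for h in hypotheses:
        if not h.decisive_test.strip():
            problems.append(f"{h.hid} has no decisive test, so it is not testable")
    return problems


candidates = [
    Hypothesis("H1", "Database connection pool exhausted under load",
               "Pool usage stays well below its limit during an error window"),
    Hypothesis("H3", "External service timeout not handled properly",
               "Errors occur while external latency is normal"),
]
print(validate_set(candidates))  # prints []
```

A hypothesis that fails the check is usually too vague: rewriting it until a decisive test can be named tends to make it specific as well.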

Phase 3: Verification Design

For each hypothesis, design a "killer experiment" that:

•Can eliminate the hypothesis if the result is negative
•Requires minimal effort for maximum information gain
•Is safe to execute (no production impact)

Prioritize experiments by:

•Ease of execution (quick wins first)
•Discriminating power (eliminates multiple hypotheses)
•Risk level (non-destructive first)
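
One way to apply the three prioritization criteria is a simple lexicographic sort. The sketch below assumes hypothetical numeric scales for effort and risk and a hypothetical `Experiment` record; the skill itself does not prescribe a scoring scheme, and a weighted score would be an equally reasonable choice.

```python
from dataclasses import dataclass


@dataclass
class Experiment:
    name: str
    effort: int      # assumed scale: 1 (minutes) to 5 (days)
    eliminates: int  # hypotheses ruled out by a negative result
    risk: int        # assumed scale: 0 = read-only, 1 = local changes


def prioritize(experiments):
    """Order by ease first, then discriminating power, then risk."""
    return sorted(experiments, key=lambda e: (e.effort, -e.eliminates, e.risk))


experiments = [
    Experiment("Add debug logging around cache update", effort=3, eliminates=1, risk=1),
    Experiment("Read pool configuration", effort=1, eliminates=1, risk=0),
    Experiment("Search logs for timeout errors", effort=1, eliminates=2, risk=0),
]
for exp in prioritize(experiments):
    print(exp.name)
```

With these sample values the log search comes first: it is as cheap as reading the configuration but can rule out two hypotheses at once.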

Phase 4: Verification Execution

Execute verifications in priority order:

•Code inspection (reading files, checking logic)
•Log analysis (searching for patterns)
•Test execution (running specific tests)
•Instrumentation (adding debug output)

Safety guards:

•Confirm before any file modifications
•Set timeouts for long-running operations
•Log all executed commands

Phase 5: Analysis and Iteration

After each verification:

•Record evidence: What was observed?
•Update hypothesis status:
  •[X] Eliminated (evidence contradicts)
  •[?] Pending (not yet tested)
  •[!] Supported (evidence aligns)
•Refine remaining hypotheses based on new information
•Generate new hypotheses if all were eliminated
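
The status bookkeeping above is small enough to sketch directly. This is an illustrative model only: the `Status` enum and both helper functions are hypothetical names, with the enum values taken from the three markers listed above.

```python
from enum import Enum


class Status(Enum):
    ELIMINATED = "[X]"
    PENDING = "[?]"
    SUPPORTED = "[!]"


def update_status(statuses, hid, contradicts):
    """Return a new status map after one verification result for `hid`."""
    updated = dict(statuses)  # copy so earlier snapshots stay intact
    updated[hid] = Status.ELIMINATED if contradicts else Status.SUPPORTED
    return updated


def needs_new_hypotheses(statuses):
    """True when every hypothesis has been eliminated."""
    return all(s is Status.ELIMINATED for s in statuses.values())


statuses = {"H1": Status.PENDING, "H2": Status.PENDING}
statuses = update_status(statuses, "H1", contradicts=True)
print(statuses["H1"].value, needs_new_hypotheses(statuses))  # prints: [X] False
```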

Phase 6: Conclusion

When one hypothesis has strong supporting evidence:

•Summarize findings: Evidence trail and reasoning
•Propose solution: Based on confirmed hypothesis
•Suggest prevention: How to avoid similar issues

Role Distribution

Mode	Hypothesis Generation	Verification Design	Execution	Review
`codex`	Codex	Claude	Claude	Codex
`claude-only`	Claude	Claude	Claude	Claude

•Default mode: codex (when in tmux with Codex available)
•Fallback: claude-only (automatic when Codex unavailable)
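
The default and fallback rules could be expressed roughly as follows. This is a sketch under two assumptions: that "in tmux" is detected through the `TMUX` environment variable (which tmux sets inside its sessions), and that the Codex CLI is an executable named `codex` on `PATH`. The function name `select_mode` is hypothetical.

```python
import os
import shutil


def select_mode(requested=None, env=None, which=shutil.which):
    """Pick "codex" or "claude-only" following the default/fallback rules."""
    env = os.environ if env is None else env
    # Assumption: Codex collaboration needs both a tmux session and the CLI.
    codex_ready = bool(env.get("TMUX")) and which("codex") is not None
    if requested == "claude-only":
        return "claude-only"
    # Default to codex when available; otherwise fall back automatically.
    return "codex" if codex_ready else "claude-only"


print(select_mode(env={}, which=lambda name: None))  # prints: claude-only
```

The `env` and `which` parameters exist only so the decision can be exercised without a real tmux session.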

Hypothesis Tree File

Investigation state is persisted to tmp/strong-inference/<task-id>.md:

markdown

---
schema: strong-inference/v1
task_id: abc123
created: 2026-02-02T12:00:00Z
problem: "API returns 500 intermittently"
mode: codex
---

# Investigation: API returns 500 intermittently

## Hypotheses

### H1: Database connection pool exhausted
- Status: [X] Eliminated
- Evidence: Connection count stable at 5/20 during error window
- Verified: 2026-02-02T12:15:00Z

### H2: Race condition in cache update
- Status: [?] Pending
- Test: Add mutex logging to CacheManager.update()
- Priority: High (matches timing pattern)

### H3: External service timeout
- Status: [!] Supported
- Evidence: Errors correlate with ExternalAPI latency spikes
- Next: Verify timeout handling in ApiClient.fetch()

## Verification Log

| Time | Action | Result |
|------|--------|--------|
| 12:05 | Read db/pool.go | Found pool size config |
| 12:10 | Check connection metrics | Stable at 5/20 |
| 12:15 | Eliminated H1 | Evidence contradicts |
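
Because the file keeps one heading and one status line per hypothesis, its state can be read back with a few lines of code. The following sketch assumes the exact heading and `- Status:` shapes shown in the example above; `read_statuses` is a hypothetical helper, not part of the skill.

```python
import re

HEADING = re.compile(r"^### (H\d+): (.+)$")
STATUS = re.compile(r"^- Status: \[([X?!])\]")


def read_statuses(text):
    """Map hypothesis id -> status marker ('X', '?', or '!')."""
    statuses = {}
    current = None
    for line in text.splitlines():
        heading = HEADING.match(line)
        if heading:
            current = heading.group(1)
            continue
        status = STATUS.match(line)
        if status and current:
            statuses[current] = status.group(1)
    return statuses


sample = """\
### H1: Database connection pool exhausted
- Status: [X] Eliminated

### H2: Race condition in cache update
- Status: [?] Pending
"""
print(read_statuses(sample))  # prints {'H1': 'X', 'H2': '?'}
```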

Safety Guards

Before executing verification commands:

•Confirm operations with side effects: file changes, test execution
•Set timeout: Default 60 seconds per operation
•Log all commands: Record in verification log
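
A minimal sketch of the three guards, assuming verification commands are run as subprocesses. The names `run_guarded`, `command_log`, and the `confirm` callback are hypothetical; only the 60-second default comes from the list above.

```python
import subprocess
import sys
from datetime import datetime, timezone

command_log = []  # entries: (UTC timestamp, command line, outcome)


def run_guarded(cmd, timeout_s=60, destructive=False, confirm=lambda cmd: False):
    """Run cmd (a list of arguments) under the guards; return (outcome, stdout)."""
    stamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
    if destructive and not confirm(cmd):
        # Guard 1: never run a side-effecting command without confirmation.
        outcome, output = "skipped (not confirmed)", ""
    else:
        try:
            # Guard 2: subprocess.run kills the child when the timeout expires.
            proc = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout_s)
            outcome, output = f"exit {proc.returncode}", proc.stdout
        except subprocess.TimeoutExpired:
            outcome, output = f"timed out after {timeout_s}s", ""
    # Guard 3: every attempt is logged, including skipped and timed-out ones.
    command_log.append((stamp, " ".join(cmd), outcome))
    return outcome, output


outcome, output = run_guarded([sys.executable, "-c", "print('ok')"])
print(outcome, output.strip())  # prints: exit 0 ok
```

Defaulting `confirm` to a function that always refuses means a caller has to opt in explicitly before anything marked destructive can run.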

Stop conditions:

•All hypotheses eliminated (request new hypotheses)
•max_iterations reached (default: 10)
•User requests stop
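
The stop conditions can be read as exits from a single loop. The sketch below is illustrative only: `investigate`, its string statuses, and the `verify` and `stop_requested` callbacks are hypothetical, and it tests pending hypotheses in dictionary order rather than by priority.

```python
def investigate(statuses, verify, max_iterations=10, stop_requested=lambda: False):
    """Run verifications until a stop condition is met.

    statuses: dict of hypothesis id -> "pending" | "supported" | "eliminated"
    verify:   callable(hid) -> True if the evidence contradicts the hypothesis
    Returns (reason, final_statuses).
    """
    statuses = dict(statuses)  # work on a copy
    for _ in range(max_iterations):
        if stop_requested():
            return "user_stop", statuses
        pending = [h for h, s in statuses.items() if s == "pending"]
        if not pending:
            if all(s == "eliminated" for s in statuses.values()):
                return "all_eliminated", statuses  # caller requests new hypotheses
            return "done", statuses
        hid = pending[0]
        statuses[hid] = "eliminated" if verify(hid) else "supported"
    return "max_iterations", statuses


reason, final = investigate({"H1": "pending", "H2": "pending"}, verify=lambda h: h == "H1")
print(reason, final)  # prints: done {'H1': 'eliminated', 'H2': 'supported'}
```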

Output Format

Progress Display

code

Strong Inference Investigation
==============================
Problem: API returns 500 intermittently

Hypotheses:
  [X] H1: Database connection pool exhausted
      Evidence: Connection count normal (eliminated)

  [!] H2: Race condition in cache update
      Evidence: Timing matches error pattern (supported)

  [?] H3: External service timeout
      Evidence: Pending verification

Current: Designing test for H2
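
A display in this layout could be produced by a small formatter such as the one below. It is a sketch: `render_progress`, the tuple layout, and the status strings are assumptions made for illustration, with the markers and indentation copied from the example above.

```python
MARKERS = {"eliminated": "[X]", "supported": "[!]", "pending": "[?]"}


def render_progress(problem, hypotheses, current):
    """hypotheses: list of (id, title, status, evidence) tuples."""
    title = "Strong Inference Investigation"
    lines = [title, "=" * len(title), f"Problem: {problem}", "", "Hypotheses:"]
    for hid, text, status, evidence in hypotheses:
        lines.append(f"  {MARKERS[status]} {hid}: {text}")
        lines.append(f"      Evidence: {evidence}")
        lines.append("")  # blank line between hypotheses
    lines.append(f"Current: {current}")
    return "\n".join(lines)


print(render_progress(
    "API returns 500 intermittently",
    [("H1", "Database connection pool exhausted", "eliminated",
      "Connection count normal (eliminated)"),
     ("H3", "External service timeout", "pending", "Pending verification")],
    "Designing test for H3",
))
```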

Completion Report

code

Investigation Complete
======================
Problem: API returns 500 intermittently

Root Cause: Race condition in CacheManager.update()
Confidence: High (3 supporting pieces of evidence)

Evidence Trail:
1. Errors occur only during cache refresh window
2. Adding a mutex eliminated the error
3. Race condition visible in thread dump

Recommended Fix:
- Add a mutex lock in CacheManager.update() (line 45)
- Consider using sync.RWMutex for better concurrency

Prevention:
- Add race detector to CI pipeline
- Review other cache operations for similar patterns

Invoking the Skill

Use the /strong-inference command:

bash

# Basic usage - investigate a problem
/strong-inference API sometimes returns 500 errors

# With mode selection
/strong-inference --mode claude-only Why is the test flaky?

# Japanese input ("Investigate the cause of this bug")
/strong-inference このバグの原因を調査して

References

Detailed templates are in references/:

•hypothesis-template.md - Template for Codex hypothesis generation
•verification-patterns.md - Common verification strategies