Fault Tree Analysis (FTA)
Conduct systematic Fault Tree Analysis using a structured, Q&A-based approach with Boolean logic gates, minimal cut set identification, and optional probability calculations.
Overview
Fault Tree Analysis is a top-down, deductive failure analysis method that maps how combinations of lower-level events (basic events) lead to an undesired system-level event (top event). It uses Boolean logic gates (AND, OR) to represent the relationships between events.
Key Principle: One fault tree analyzes one specific undesired event. Start at the top (what failed?) and work down (what caused it?).
Analysis Types:
- Qualitative: Identify failure pathways, minimal cut sets, and single points of failure
- Quantitative: Calculate failure probabilities using component failure data
Workflow
Phase 1: System Definition & Scope
Collect from user:
- What system or process is being analyzed?
- What are the system boundaries (what's in scope vs. out of scope)?
- What are the operating conditions and assumptions?
- What documentation exists (schematics, P&IDs, operating procedures)?
- What is the purpose of this analysis (design review, incident investigation, safety case)?
Outputs:
- System description with boundaries
- Operating mode(s) under analysis
- List of assumptions and exclusions
Phase 2: Top Event Definition
Collect from user:
- What is the single undesired outcome to analyze?
- How is this event defined (what state constitutes "failure")?
- What is the severity/criticality of this event?
- What is the mission time or exposure period?
Quality Gate - Top Event Must Be:
- Single, specific, unambiguous event
- Clearly defined failure state (not vague)
- At an appropriate system level (not too high or too low)
- Observable or detectable
Good example: "Pump fails to deliver required flow rate (>100 GPM) during normal operation"
Poor example: "System doesn't work" (too vague)
Phase 3: Fault Tree Construction
Build the tree iteratively from top to bottom:
For each event (starting with top event):
- Identify immediate causes: "What events could directly cause this?"
- Determine gate type:
  - OR gate: ANY one input alone is sufficient to cause the output (alternative causes)
  - AND gate: ALL inputs must occur together (redundancy/barriers)
- Classify event type:
  - Intermediate event (rectangle): Requires further development
  - Basic event (circle): Component failure; terminal point
  - Undeveloped event (diamond): Insufficient data or out of scope
  - House (external) event (house symbol): Condition normally expected to occur, such as an environmental or operating condition; can be switched on/off
- Continue developing until all branches terminate in basic, undeveloped, or house events
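To make the gate semantics concrete, the sketch below represents a tree as nested gates and evaluates whether the top event occurs for a given set of occurred leaf events. It is a minimal illustration, not the data model used by the scripts in this skill; the class names `Event` and `Gate`, the function `occurs`, and the pump example are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Event:
    """Leaf of the tree: a basic, undeveloped, or house event."""
    name: str


@dataclass(frozen=True)
class Gate:
    """Intermediate event whose occurrence is decided by an OR or AND gate."""
    name: str
    kind: str  # "OR" or "AND"
    inputs: tuple[Node, ...]


Node = Union[Event, Gate]


def occurs(node: Node, occurred: set[str]) -> bool:
    """Return True if `node` occurs, given the names of leaf events that occurred."""
    if isinstance(node, Event):
        return node.name in occurred
    results = [occurs(child, occurred) for child in node.inputs]
    if node.kind == "OR":
        return any(results)  # any single input is sufficient
    if node.kind == "AND":
        return all(results)  # every input must occur
    raise ValueError(f"unknown gate type: {node.kind!r}")


# Hypothetical example: the pump stops if its motor fails OR both redundant
# power supplies fail.
TOP = Gate("Pump stops", "OR", (
    Event("MOTOR"),
    Gate("Loss of power", "AND", (Event("PSU_A"), Event("PSU_B"))),
))

print(occurs(TOP, {"PSU_A"}))           # False: redundancy still holds
print(occurs(TOP, {"PSU_A", "PSU_B"}))  # True
```

Keeping leaves and gates as separate immutable types mirrors the rectangle/circle distinction in the construction steps: only gates have inputs, and every branch must end in a leaf.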
Stopping Criteria for Branch Development:
- Component-level failure reached (basic event)
- Out of scope (undeveloped event)
- Normal expected condition (house event)
- Insufficient information available
Critical Rules:
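- Each event must have a clear, unambiguous description
- Give each distinct failure a single event ID: if the same failure feeds several branches (e.g., a shared power supply), reuse that ID rather than creating differently named duplicates, so Boolean reduction treats it as one event
- No "miracles": do not assume that some other component fails or behaves abnormally in a way that conveniently stops the fault from propagating
- Use consistent naming conventions throughout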
Phase 4: Qualitative Analysis
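Identify Minimal Cut Sets (MCS): A cut set is a combination of basic events whose joint occurrence causes the top event. A cut set is minimal if removing any one of its events means the remaining events no longer cause the top event.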
- Order 1 MCS (single events): Most critical; these are single points of failure
- Order 2 MCS (pairs): Critical for redundant systems
- Higher-order MCS: Generally less critical, since they require multiple concurrent failures
Analysis Tasks:
- List all minimal cut sets by order
- Identify single points of failure (Order 1)
- Assess common cause failure potential
- Evaluate effectiveness of redundancy
Run python scripts/calculate_fta.py --qualitative for automated MCS extraction.
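For readers who want to see what MCS extraction involves, the sketch below expands a small tree top-down (OR gates add alternatives, AND gates combine one cut set from each input) and then removes duplicates and supersets. It is a simple illustration in the spirit of classic top-down expansion, not the algorithm inside `calculate_fta.py`; the dictionary tree format, function names, and example events are hypothetical. The expansion grows exponentially, so larger trees usually call for more efficient methods such as binary decision diagrams.

```python
from __future__ import annotations

from itertools import product

# Hypothetical tree: gate name -> (gate type, input names).
# Any name that is not a key is treated as a basic event.
# Both pump trains depend on the same power bus, so BUS appears twice.
TREE = {
    "TOP": ("AND", ["TRAIN_A", "TRAIN_B"]),
    "TRAIN_A": ("OR", ["PUMP_A", "BUS"]),
    "TRAIN_B": ("OR", ["PUMP_B", "BUS"]),
}


def cut_sets(node: str, tree: dict) -> list[frozenset[str]]:
    """Expand `node` top-down into cut sets (not yet minimized)."""
    if node not in tree:
        return [frozenset([node])]
    kind, inputs = tree[node]
    expanded = [cut_sets(child, tree) for child in inputs]
    if kind == "OR":
        # Every cut set of every input is a cut set of the output.
        return [cs for sets in expanded for cs in sets]
    if kind == "AND":
        # Combine one cut set from each input.
        return [frozenset().union(*combo) for combo in product(*expanded)]
    raise ValueError(f"unknown gate type: {kind!r}")


def minimal_cut_sets(top: str, tree: dict) -> list[frozenset[str]]:
    """Drop duplicates and supersets (absorption: A + A*B = A), smallest first."""
    candidates = sorted(set(cut_sets(top, tree)), key=len)
    minimal: list[frozenset[str]] = []
    for cs in candidates:
        if not any(kept <= cs for kept in minimal):
            minimal.append(cs)
    return minimal


mcs = minimal_cut_sets("TOP", TREE)
spofs = [cs for cs in mcs if len(cs) == 1]
print(sorted(sorted(cs) for cs in mcs))  # [['BUS'], ['PUMP_A', 'PUMP_B']]
print(spofs)                             # [frozenset({'BUS'})]
```

The example shows why repeated events matter: the pump pair looks redundant, but the shared bus reduces to an Order 1 cut set, i.e. a single point of failure.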
Phase 5: Quantitative Analysis (Optional)
If failure probability data is available:
Collect failure data for each basic event:
- Failure rate (λ) or probability (P)
- Mission time or exposure period
- Data source (field data, handbook, estimate)
- Confidence level
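When the data source gives a failure rate λ rather than a probability, it has to be converted over the mission time. A common choice, assumed here, is a constant failure rate for a non-repairable component, giving P = 1 - exp(-λt), which is close to λt when λt is small. The function name, the per-hour units, and the example numbers below are hypothetical; repairable or periodically tested components need different models.

```python
import math


def failure_probability(rate_per_hour: float, mission_hours: float) -> float:
    """P(failure within mission time) for a non-repairable component with a
    constant failure rate: P = 1 - exp(-lambda * t)."""
    if rate_per_hour < 0 or mission_hours < 0:
        raise ValueError("failure rate and mission time must be non-negative")
    # -expm1(-x) equals 1 - exp(-x) but stays accurate when x is tiny.
    return -math.expm1(-rate_per_hour * mission_hours)


# Hypothetical data: lambda = 1e-5 failures/hour over a 1,000-hour mission.
p = failure_probability(1e-5, 1_000)
print(f"{p:.6g}")  # 0.00995017, close to lambda*t = 0.01
```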
Calculations:
- OR gate: P(output) = P(A) + P(B) - P(A)×P(B) for independent events; ≈ P(A) + P(B) when probabilities are small (rare-event approximation)
- AND gate: P(output) = P(A) × P(B) (for independent events)
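The two-input formulas above generalize to any number of independent inputs: an AND gate multiplies the input probabilities, and an OR gate is 1 minus the product of the complements. The sketch below, with hypothetical function names, places the exact OR formula next to the rare-event sum so the size of the approximation is visible.

```python
from __future__ import annotations

from math import prod


def and_gate(probs: list[float]) -> float:
    """AND gate, independent inputs: product of input probabilities."""
    return prod(probs)


def or_gate(probs: list[float]) -> float:
    """OR gate, independent inputs: 1 - product of (1 - p)."""
    return 1.0 - prod(1.0 - p for p in probs)


def or_gate_rare_event(probs: list[float]) -> float:
    """Rare-event approximation of an OR gate: sum of input probabilities.
    Never below the exact value; close to it only when every p is small."""
    return sum(probs)


inputs = [0.01, 0.02]
print(round(or_gate(inputs), 6))             # 0.0298 = 0.01 + 0.02 - 0.01*0.02
print(round(or_gate_rare_event(inputs), 6))  # 0.03
print(round(and_gate(inputs), 6))            # 0.0002
```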
Calculate:
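- Probability of each minimal cut set (product of its basic event probabilities, assuming independence)
- Top event probability: sum of MCS probabilities (rare-event approximation), or an exact or bounded value that accounts for basic events shared between cut sets
- Importance measures (Fussell-Vesely, Birnbaum)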
Run python scripts/calculate_fta.py --quantitative with probability data.
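The sketch below shows the textbook forms of these calculations, starting from minimal cut sets and independent basic event probabilities: the rare-event sum, the minimal cut set upper bound, an exact value by enumerating basic event states (practical only for small trees), Birnbaum importance, and the rare-event form of Fussell-Vesely importance. It is not the implementation in `calculate_fta.py`; all names and numbers are hypothetical.

```python
from itertools import product
from math import prod

# Hypothetical inputs: minimal cut sets (from the qualitative step) and
# basic event probabilities over the mission time. Independence assumed.
MCS = [frozenset({"BUS"}), frozenset({"PUMP_A", "PUMP_B"})]
P = {"BUS": 1e-4, "PUMP_A": 1e-2, "PUMP_B": 1e-2}


def cut_set_prob(cs, p):
    return prod(p[e] for e in cs)


def top_rare_event(mcs, p):
    """Rare-event approximation: sum of cut set probabilities (conservative)."""
    return sum(cut_set_prob(cs, p) for cs in mcs)


def top_mcub(mcs, p):
    """Minimal cut set upper bound: 1 - product of (1 - P(cut set))."""
    return 1.0 - prod(1.0 - cut_set_prob(cs, p) for cs in mcs)


def top_exact(mcs, p):
    """Exact top event probability by enumerating every combination of basic
    event states. Cost grows as 2**n, so this is for small trees only."""
    events = sorted(p)
    total = 0.0
    for states in product((False, True), repeat=len(events)):
        occurred = {e for e, s in zip(events, states) if s}
        if any(cs <= occurred for cs in mcs):
            total += prod(p[e] if s else 1.0 - p[e] for e, s in zip(events, states))
    return total


def birnbaum(mcs, p, event):
    """Birnbaum importance: P(top | event occurs) - P(top | event does not occur)."""
    return top_exact(mcs, {**p, event: 1.0}) - top_exact(mcs, {**p, event: 0.0})


def fussell_vesely(mcs, p, event):
    """Fussell-Vesely importance (rare-event form): fraction of the top event
    probability contributed by cut sets containing `event`."""
    return top_rare_event([cs for cs in mcs if event in cs], p) / top_rare_event(mcs, p)


print(f"{top_rare_event(MCS, P):.5g}")            # 0.0002
print(f"{top_exact(MCS, P):.5g}")                 # 0.00019999
print(f"{birnbaum(MCS, P, 'BUS'):.5g}")           # 0.9999
print(f"{fussell_vesely(MCS, P, 'PUMP_A'):.3g}")  # 0.5
```

Birnbaum importance measures how sensitive the top event is to a component, while Fussell-Vesely measures how much of the current risk flows through it; here the bus and the pump pair contribute equally to risk, but the bus dominates sensitivity.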
Phase 6: Common Cause Failure Analysis
Identify potential common causes across basic events:
- Environmental (temperature, humidity, EMI)
- Manufacturing (batch defects, supplier issues)
- Maintenance (common procedures, same personnel)
- Design (same components, shared software)
- Human error (operator mistakes, procedure gaps)
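One way to make this screening systematic is to record the attributes above for each basic event and group events that share a value, then flag groups that cover every event of a higher-order minimal cut set, since a single shared cause there could defeat redundancy. The sketch below is a hypothetical aid to that review; the attribute names, events, and functions are illustrative and do not replace engineering judgment.

```python
from collections import defaultdict

# Hypothetical attributes recorded for each basic event during the CCF review.
BASIC_EVENTS = {
    "PUMP_A": {"supplier": "X", "location": "Room 1", "maintenance": "Crew 1"},
    "PUMP_B": {"supplier": "X", "location": "Room 2", "maintenance": "Crew 1"},
    "VALVE_C": {"supplier": "Y", "location": "Room 1", "maintenance": "Crew 2"},
}
MCS = [frozenset({"PUMP_A", "PUMP_B"}), frozenset({"VALVE_C"})]


def ccf_candidates(events):
    """Group basic events by shared attribute values; groups of two or more
    events are candidate common cause groups."""
    groups = defaultdict(set)
    for name, attrs in events.items():
        for attr, value in attrs.items():
            groups[(attr, value)].add(name)
    return {key: members for key, members in groups.items() if len(members) > 1}


def redundancy_threats(candidates, mcs):
    """Flag shared attributes that cover every event of a higher-order cut set:
    a single common cause there could defeat the redundancy."""
    return [(key, cs) for key, members in candidates.items()
            for cs in mcs if len(cs) > 1 and cs <= members]


for (attr, value), cs in redundancy_threats(ccf_candidates(BASIC_EVENTS), MCS):
    print(f"{attr}={value} defeats {sorted(cs)}")
# supplier=X defeats ['PUMP_A', 'PUMP_B']
# maintenance=Crew 1 defeats ['PUMP_A', 'PUMP_B']
```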
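For AND gates (redundant systems), common cause failures (CCF) can defeat redundancy. If quantifying, apply the beta-factor model, which splits each component's total failure probability Q_T into a common cause part and an independent part:
- P(CCF) = β × Q_T; P(independent failure) = (1 - β) × Q_T
- Typical β values: 1-10%, depending on diversity and other defensive measures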
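The effect of the beta-factor model on a redundant pair can be seen with a short calculation. Assuming two identical components under an AND gate, the pair fails if both fail independently or if one common cause event fails both, so with the rare-event approximation Q_pair ≈ ((1 - β)Q_T)² + βQ_T. The function name and the numbers below are hypothetical.

```python
def redundant_pair_with_ccf(q_total: float, beta: float) -> float:
    """Failure probability of a 1-out-of-2 redundant pair (AND gate) under the
    beta-factor model, using the rare-event approximation.

    q_total: total failure probability of ONE component over the mission time
    beta:    fraction of that probability attributed to common cause
    """
    if not 0.0 <= beta <= 1.0:
        raise ValueError("beta must be between 0 and 1")
    q_independent = (1.0 - beta) * q_total
    q_ccf = beta * q_total  # one CCF event fails both components at once
    # The AND gate becomes: (A_indep AND B_indep) OR CCF
    return q_independent ** 2 + q_ccf


# Hypothetical numbers: q_total = 1e-3 per component, beta = 5%.
print(f"{redundant_pair_with_ccf(1e-3, 0.0):.3g}")   # 1e-06 (no CCF)
print(f"{redundant_pair_with_ccf(1e-3, 0.05):.3g}")  # 5.09e-05
```

Even a modest β makes the common cause term dominate: in this example the pair is roughly fifty times less reliable than the independent-failure calculation alone would suggest.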
Phase 7: Documentation & Reporting
Generate professional outputs:
- python scripts/generate_diagram.py: SVG fault tree diagram
- python scripts/generate_report.py: Comprehensive HTML report
Symbols Reference
| Symbol | Name | Description |
|---|---|---|
| Rectangle | Intermediate Event | Fault resulting from combination of inputs; requires gate |
| Circle | Basic Event | Component failure; terminal event with probability data |
| Diamond | Undeveloped Event | Not further developed (out of scope or insufficient data) |
| House | House Event | Expected occurrence; can be set TRUE/FALSE |
| Curved-base (shield-shaped) gate | OR Gate | Output occurs if ANY input occurs |
| Flat-base (D-shaped) gate | AND Gate | Output occurs if ALL inputs occur |
| Triangle | Transfer | Connects to another tree section |
Quality Scoring
Each analysis is scored on six dimensions (see references/quality-rubric.md):
| Dimension | Weight | Description |
|---|---|---|
| System Definition | 15% | Clear boundaries, assumptions, operating conditions |
| Top Event Clarity | 15% | Specific, unambiguous, appropriate level |
| Tree Completeness | 25% | All pathways developed, no gaps, consistent logic |
| Minimal Cut Sets | 20% | Correctly identified, analyzed for SPOFs |
| Quantification | 15% | Accurate calculations, appropriate data sources |
| Actionability | 10% | Identifies design improvements, risk mitigations |
Scoring Scale: Each dimension is rated 1-5 (Inadequate to Excellent).
Overall Score: Weighted average × 20, giving 20-100 points.
Passing Threshold: 70 points minimum.
Run python scripts/score_analysis.py to calculate scores.
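The scoring arithmetic can be reproduced by hand: multiply each 1-5 rating by its weight from the Quality Scoring table, sum, and scale by 20. The sketch below assumes exactly that rule and nothing more; it is not `score_analysis.py`, and the rubric in references/quality-rubric.md may add detail it does not capture. The function name and the example ratings are hypothetical.

```python
from __future__ import annotations

# Weights from the Quality Scoring table (they sum to 1.0).
WEIGHTS = {
    "System Definition": 0.15,
    "Top Event Clarity": 0.15,
    "Tree Completeness": 0.25,
    "Minimal Cut Sets": 0.20,
    "Quantification": 0.15,
    "Actionability": 0.10,
}
PASSING_THRESHOLD = 70


def overall_score(ratings: dict[str, int]) -> float:
    """Weighted average of 1-5 ratings, scaled by 20 (range 20-100)."""
    if set(ratings) != set(WEIGHTS):
        raise ValueError("exactly one rating per dimension is required")
    for dimension, rating in ratings.items():
        if not 1 <= rating <= 5:
            raise ValueError(f"{dimension}: rating must be 1-5")
    return round(20 * sum(WEIGHTS[d] * r for d, r in ratings.items()), 1)


# Hypothetical ratings: solid analysis with weak quantification.
ratings = {dimension: 4 for dimension in WEIGHTS}
ratings["Quantification"] = 2
score = overall_score(ratings)
print(score, "PASS" if score >= PASSING_THRESHOLD else "FAIL")  # 74.0 PASS
```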
Common Pitfalls
See references/common-pitfalls.md for:
- Incorrect gate selection (AND vs OR confusion)
- Top event too vague or at wrong level
- Missing common cause failures
- Incomplete branch development
- Ignoring human factors
- Double-counting events
Examples
See references/examples.md for worked examples:
- Pump system failure
- Control system loss of function
- Safety interlock bypass
- Manufacturing equipment hazard
Integration with Other Tools
- FMEA/FMECA: Bottom-up analysis that complements top-down FTA; use FMEA to identify basic events
- 5 Whys: Use for detailed investigation of specific failure pathways
- Fishbone Diagram: Brainstorm potential causes before structuring them in FTA
- Reliability Block Diagram: Alternative, success-oriented view of system reliability
- Event Tree Analysis: Use FTA to quantify initiating event frequencies and the failure probabilities of safety functions at event tree branch points
When to Use FTA
Good candidates:
- Safety-critical system design review
- Accident/incident investigation
- Regulatory compliance demonstration
- Redundancy effectiveness evaluation
- System failure probability estimation
Consider alternatives when:
- You need to catalog ALL failure modes (use FMEA)
- You are analyzing success paths (use Success Tree/RBD)
- Time-sequential dependencies are critical (use Event Tree)