Fault Tree Analysis (FTA)

Conduct systematic Fault Tree Analysis using a structured, Q&A-based approach with Boolean logic gates, minimal cut set identification, and optional probability calculations.

Overview

Fault Tree Analysis is a top-down, deductive failure analysis method that maps how combinations of lower-level events (basic events) lead to an undesired system-level event (top event). Uses Boolean logic gates (AND, OR) to represent relationships between events.

Key Principle: One fault tree analyzes one specific undesired event. Start at the top (what failed?) and work down (what caused it?).

Analysis Types:

•Qualitative: Identify failure pathways, minimal cut sets, single points of failure
•Quantitative: Calculate failure probabilities using component failure data

Workflow

Phase 1: System Definition & Scope

Collect from user:

•What system or process is being analyzed?
•What are the system boundaries (what's in scope vs. out of scope)?
•What are the operating conditions and assumptions?
•What documentation exists (schematics, P&IDs, operating procedures)?
•What is the purpose of this analysis (design review, incident investigation, safety case)?

Outputs:

•System description with boundaries
•Operating mode(s) under analysis
•List of assumptions and exclusions

Phase 2: Top Event Definition

Collect from user:

•What is the single undesired outcome to analyze?
•How is this event defined (what state constitutes "failure")?
•What is the severity/criticality of this event?
•What is the mission time or exposure period?

Quality Gate - Top Event Must Be:

•Single, specific, unambiguous event
•Clearly defined failure state (not vague)
•At appropriate system level (not too high or too low)
•Observable or detectable

Good Example: "Pump fails to deliver required flow rate (>100 GPM) during normal operation" Poor Example: "System doesn't work" (too vague)

Phase 3: Fault Tree Construction

Build the tree iteratively from top to bottom:

For each event (starting with top event):

•Identify immediate causes: "What events could directly cause this?"
•
Determine gate type:
- •OR gate: ANY one cause is sufficient (independent causes)
- •AND gate: ALL causes required simultaneously (redundancy/barriers)
•
Classify event type:
- •Intermediate event (rectangle): Requires further development
- •Basic event (circle): Component failure, terminal point
- •Undeveloped event (diamond): Insufficient data or out of scope
- •House event (house symbol): Normal occurrence, switch on/off
- •External event (house): Environmental or expected condition
•Continue developing until all branches terminate in basic/undeveloped events

Stopping Criteria for Branch Development:

•Component-level failure reached (basic event)
•Out of scope (undeveloped event)
•Normal expected condition (house event)
•Insufficient information available

Critical Rules:

•Each event must have clear, unambiguous description
•No redundant events (same failure in multiple places)
•No "miracles" (events that cannot physically occur)
•Consistent naming conventions throughout

Phase 4: Qualitative Analysis

Identify Minimal Cut Sets (MCS): Minimal cut sets are the smallest combinations of basic events that cause the top event.

•Order 1 MCS (single events): Most critical - single points of failure
•Order 2 MCS (pairs): Critical for redundant systems
•Higher order MCS: Less critical, require multiple failures

Analysis Tasks:

•List all minimal cut sets by order
•Identify single points of failure (Order 1)
•Assess common cause failure potential
•Evaluate effectiveness of redundancy

Run python scripts/calculate_fta.py --qualitative for automated MCS extraction.

Phase 5: Quantitative Analysis (Optional)

If failure probability data is available:

Collect failure data for each basic event:

•Failure rate (λ) or probability (P)
•Mission time or exposure period
•Data source (field data, handbook, estimate)
•Confidence level

Calculations:

•OR gate: P(output) ≈ P(A) + P(B) - P(A)×P(B) ≈ P(A) + P(B) for small probabilities
•AND gate: P(output) = P(A) × P(B) (for independent events)

Calculate:

•Probability of each minimal cut set
•Top event probability (sum of MCS probabilities with adjustments for overlapping events)
•Importance measures (Fussell-Vesely, Birnbaum)

Run python scripts/calculate_fta.py --quantitative with probability data.

Phase 6: Common Cause Failure Analysis

Identify potential common causes across basic events:

•Environmental (temperature, humidity, EMI)
•Manufacturing (batch defects, supplier issues)
•Maintenance (common procedures, same personnel)
•Design (same components, shared software)
•Human error (operator mistakes, procedure gaps)

For AND gates (redundant systems): Common cause failures can defeat redundancy. Apply beta-factor model if quantifying:

•P(CCF) = β × P(independent failure)
•Typical β values: 1-10% depending on diversity measures

Phase 7: Documentation & Reporting

Generate professional outputs:

•python scripts/generate_diagram.py - SVG fault tree diagram
•python scripts/generate_report.py - Comprehensive HTML report

Symbols Reference

Symbol	Name	Description
Rectangle	Intermediate Event	Fault resulting from combination of inputs; requires gate
Circle	Basic Event	Component failure; terminal event with probability data
Diamond	Undeveloped Event	Not further developed (out of scope or insufficient data)
House	House Event	Expected occurrence; can be set TRUE/FALSE
Flat OR gate	OR Gate	Output if ANY input occurs
Flat AND gate	AND Gate	Output if ALL inputs occur
Triangle	Transfer	Connects to another tree section

Quality Scoring

Each analysis scored on six dimensions (see references/quality-rubric.md):

Dimension	Weight	Description
System Definition	15%	Clear boundaries, assumptions, operating conditions
Top Event Clarity	15%	Specific, unambiguous, appropriate level
Tree Completeness	25%	All pathways developed, no gaps, consistent logic
Minimal Cut Sets	20%	Correctly identified, analyzed for SPOFs
Quantification	15%	Accurate calculations, appropriate data sources
Actionability	10%	Identifies design improvements, risk mitigations

Scoring Scale: Each dimension rated 1-5 (Inadequate to Excellent) Overall Score: Weighted average × 20 = 0-100 points Passing Threshold: 70 points minimum

Run python scripts/score_analysis.py to calculate scores.

Common Pitfalls

See references/common-pitfalls.md for:

•Incorrect gate selection (AND vs OR confusion)
•Top event too vague or at wrong level
•Missing common cause failures
•Incomplete branch development
•Ignoring human factors
•Double-counting events

Examples

See references/examples.md for worked examples:

•Pump system failure
•Control system loss of function
•Safety interlock bypass
•Manufacturing equipment hazard

Integration with Other Tools

•FMEA/FMECA: Bottom-up complements top-down FTA; use FMEA to identify basic events
•5 Whys: Use for detailed investigation of specific failure pathways
•Fishbone Diagram: Brainstorm potential causes before structuring in FTA
•Reliability Block Diagram: Alternative view of system reliability
•Event Tree Analysis: Use FTA for initiating event probabilities

When to Use FTA

Good candidates:

•Safety-critical system design review
•Accident/incident investigation
•Regulatory compliance demonstration
•Redundancy effectiveness evaluation
•System failure probability estimation

Consider alternatives when:

•Need to catalog ALL failure modes (use FMEA)
•Analyzing success paths (use Success Tree/RBD)
•Time-sequential dependencies critical (use Event Tree)