CARF Causal Analysis Skill

Purpose

Perform causal inference using the DoWhy/EconML ecosystem. The skill discovers causal structure, estimates treatment effects, and validates the estimates with refutation tests.

When to Use

•Queries routed to the "Complicated" domain
•Estimating causal effects from observational data
•Testing causal hypotheses with refutation
•Building causal DAGs for analysis

Causal Inference Workflow

mermaid

graph LR
    A[Query] --> B[Discover Structure]
    B --> C[Build DAG]
    C --> D[Estimate Effect]
    D --> E[Refutation Tests]
    E --> F[Interpretation]

Execution Steps

1. Prepare Data

Data can be provided as:

•List of dicts: [{"treatment": 1, "outcome": 10, ...}]
•Column format: {"treatment": [1,0,1], "outcome": [10,8,12]}
•Dataset reference: {"dataset_id": "registered_dataset_id"}
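The first two formats carry the same information, so one can be converted to the other before analysis. The sketch below shows one way to do that conversion. `to_records` is a hypothetical helper written for illustration, not part of CARF, and it deliberately refuses dataset references because resolving them needs the service's own registry.

```python
from __future__ import annotations

from typing import Any


def to_records(data: Any) -> list[dict[str, Any]]:
    """Normalize inline data to a list of row dicts (hypothetical helper)."""
    if isinstance(data, list):
        # Already a list of dicts: copy each row so the input is not mutated.
        return [dict(row) for row in data]
    if isinstance(data, dict) and "dataset_id" in data:
        # A dataset reference must be resolved by the service, not here.
        raise ValueError("dataset references must be resolved by the dataset registry")
    if isinstance(data, dict):
        # Column format: every column must have the same number of values.
        lengths = {len(values) for values in data.values()}
        if len(lengths) > 1:
            raise ValueError("all columns must have the same length")
        n_rows = lengths.pop() if lengths else 0
        return [{name: values[i] for name, values in data.items()} for i in range(n_rows)]
    raise TypeError(f"unsupported data format: {type(data).__name__}")


columns = {"treatment": [1, 0, 1], "outcome": [10, 8, 12]}
rows = to_records(columns)
print(rows)
```

Normalizing to rows up front means later checks (column presence, variance) only need to handle one shape.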

2. Define Causal Hypothesis

python

from src.services.causal import CausalHypothesis

hypothesis = CausalHypothesis(
    treatment="discount_applied",
    outcome="customer_churned",
    mechanism="Discount reduces churn risk",
    confounders=["tenure", "monthly_charges", "contract_type"]
)
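A hypothesis is only useful if every variable it names exists in the data. The sketch below checks this before any estimation is attempted. `HypothesisSpec` is a stand-in dataclass that mirrors the four fields used above (the real `CausalHypothesis` may differ), `missing_columns` is a hypothetical helper, and the row values are illustrative.

```python
from dataclasses import dataclass, field


@dataclass
class HypothesisSpec:
    """Stand-in for CausalHypothesis with the same four fields (assumption)."""
    treatment: str
    outcome: str
    mechanism: str = ""
    confounders: list = field(default_factory=list)


def missing_columns(spec, rows):
    """Return hypothesis variables that are absent from at least one row."""
    required = [spec.treatment, spec.outcome, *spec.confounders]
    return [name for name in required if any(name not in row for row in rows)]


spec = HypothesisSpec(
    treatment="discount_applied",
    outcome="customer_churned",
    mechanism="Discount reduces churn risk",
    confounders=["tenure", "monthly_charges", "contract_type"],
)

# Illustrative rows that lack the "contract_type" confounder.
rows = [
    {"discount_applied": 1, "customer_churned": 0, "tenure": 12, "monthly_charges": 70.0},
    {"discount_applied": 0, "customer_churned": 1, "tenure": 6, "monthly_charges": 55.5},
]

print(missing_columns(spec, rows))
```

A non-empty result means the named confounder cannot be controlled for with this dataset.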

3. Run Analysis via API

bash

curl -X POST http://localhost:8000/query \
  -H "Content-Type: application/json" \
  -d '{
    "query": "What is the causal effect of discount on churn?",
    "causal_estimation": {
      "data": [
        {"discount": 1, "churned": 0, "tenure": 12},
        {"discount": 0, "churned": 1, "tenure": 6}
      ],
      "treatment": "discount",
      "outcome": "churned",
      "covariates": ["tenure"]
    }
  }'
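The same request can be built from Python with the standard library alone. The sketch below mirrors the curl call field by field. `build_query_request` is a hypothetical helper, and the actual send is left commented out because it needs a running server at the URL shown.

```python
import json
import urllib.request


def build_query_request(base_url: str = "http://localhost:8000") -> urllib.request.Request:
    """Build the POST /query request shown in the curl example."""
    payload = {
        "query": "What is the causal effect of discount on churn?",
        "causal_estimation": {
            "data": [
                {"discount": 1, "churned": 0, "tenure": 12},
                {"discount": 0, "churned": 1, "tenure": 6},
            ],
            "treatment": "discount",
            "outcome": "churned",
            "covariates": ["tenure"],
        },
    }
    return urllib.request.Request(
        url=f"{base_url}/query",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_query_request()

# Sending requires a running server, so it is not executed here:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```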

4. Run Analysis via Python

python

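import asyncio

import pandas as pd

from src.services.causal import CausalInferenceEngine


async def main(df: pd.DataFrame) -> None:
    engine = CausalInferenceEngine()

    # Discover causal structure (returns a hypothesis and a causal graph)
    hypothesis, graph = await engine.discover_causal_structure(
        query="What causes customer churn?",
        context={"domain": "telecom"}
    )

    # Estimate the effect on the supplied data
    result = await engine.estimate_effect(
        hypothesis=hypothesis,
        graph=graph,
        context={
            "causal_estimation": {
                "data": df.to_dict("records"),
                "treatment": "discount",
                "outcome": "churned"
            }
        }
    )

    print(f"Effect: {result.effect_estimate}")
    print(f"CI: {result.confidence_interval}")
    print(f"Passed Refutation: {result.passed_refutation}")


# df needs one row per customer with "discount" and "churned" columns
df = pd.DataFrame([
    {"discount": 1, "churned": 0, "tenure": 12},
    {"discount": 0, "churned": 1, "tenure": 6},
])
asyncio.run(main(df))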

Causal Result Schema

json

{
  "causal_result": {
    "effect": 0.15,
    "unit": "percentage points",
    "p_value": 0.023,
    "ci_low": 0.08,
    "ci_high": 0.22,
    "description": "Discount reduces churn by ~15 percentage points",
    "refutations_passed": 3,
    "refutations_total": 3,
    "confounders_controlled": 2,
    "confounders_total": 3,
    "treatment": "discount",
    "outcome": "churned"
  }
}
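A caller will usually want to turn this payload into a typed object and derive a couple of quick checks from it. The sketch below does that for a subset of the fields. The `CausalResult` dataclass and its two properties are illustrative names, not part of CARF, and the checks are a reading aid rather than a substitute for inspecting the result.

```python
import json
from dataclasses import dataclass


@dataclass
class CausalResult:
    """Subset of the causal_result fields (illustrative, not the CARF model)."""
    effect: float
    ci_low: float
    ci_high: float
    p_value: float
    refutations_passed: int
    refutations_total: int

    @property
    def ci_excludes_zero(self) -> bool:
        # True when the whole interval lies on one side of zero.
        return self.ci_low > 0 or self.ci_high < 0

    @property
    def all_refutations_passed(self) -> bool:
        return self.refutations_total > 0 and self.refutations_passed == self.refutations_total


def parse_result(response_text: str) -> CausalResult:
    """Pick the dataclass fields out of the "causal_result" object."""
    body = json.loads(response_text)["causal_result"]
    return CausalResult(**{name: body[name] for name in CausalResult.__dataclass_fields__})


sample = """
{"causal_result": {"effect": 0.15, "unit": "percentage points", "p_value": 0.023,
 "ci_low": 0.08, "ci_high": 0.22, "refutations_passed": 3, "refutations_total": 3,
 "confounders_controlled": 2, "confounders_total": 3,
 "treatment": "discount", "outcome": "churned"}}
"""
result = parse_result(sample)
print(result.ci_excludes_zero, result.all_refutations_passed)
```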

Refutation Tests

The engine runs these refutation tests:

Test	Purpose	Pass Criteria
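Random Common Cause	Add a random variable as a common cause	Estimate stays close to the original
Placebo Treatment	Replace the treatment with a randomized placebo	Estimate moves to approximately 0
Data Subset	Re-estimate on a random 80% subset	Estimate stays close to the original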
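The logic behind the placebo and subset tests can be illustrated without any causal library. The sketch below uses synthetic data with a known effect of 2.0 and a plain difference in means as the estimator. It is not how DoWhy or the engine implements its refuters, and it omits the random common cause test, which needs a regression-based estimator.

```python
import random
from statistics import mean


def diff_in_means(treatment, outcome):
    """Mean outcome of treated units minus mean outcome of control units."""
    treated = [y for t, y in zip(treatment, outcome) if t == 1]
    control = [y for t, y in zip(treatment, outcome) if t == 0]
    return mean(treated) - mean(control)


rng = random.Random(0)  # fixed seed so the run is repeatable
n = 2000
treatment = [i % 2 for i in range(n)]
outcome = [2.0 * t + rng.gauss(0.0, 1.0) for t in treatment]  # true effect is 2.0

original = diff_in_means(treatment, outcome)

# Placebo treatment: shuffle the labels so they carry no information.
placebo = treatment[:]
rng.shuffle(placebo)
placebo_effect = diff_in_means(placebo, outcome)

# Data subset: re-estimate on a random 80% of the rows.
keep = rng.sample(range(n), int(0.8 * n))
subset_effect = diff_in_means([treatment[i] for i in keep], [outcome[i] for i in keep])

print(f"original={original:.3f} placebo={placebo_effect:.3f} subset={subset_effect:.3f}")
```

A sound estimate behaves as in the table: the placebo estimate collapses toward zero while the subset estimate stays near the original.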

DoWhy Estimation Methods

Default: backdoor.linear_regression

Available methods:

•backdoor.linear_regression
•backdoor.propensity_score_matching
•backdoor.propensity_score_weighting
•iv.instrumental_variable

Select a method by setting method_name in the causal_estimation block:

json

{
  "causal_estimation": {
    "method_name": "backdoor.propensity_score_matching",
    ...
  }
}
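To give a feel for what backdoor adjustment with linear regression does, the sketch below compares a naive estimate with one that adjusts for a single confounder. It uses the residual-on-residual (Frisch-Waugh-Lovell) form of least squares on synthetic, noise-free data with a true effect of 1.5. This shows the idea only, not DoWhy's implementation, and the variable names and data are invented for the example.

```python
from statistics import mean


def slope(x, y):
    """Least-squares slope of y on x (model includes an intercept)."""
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / sxx


def residuals(x, y):
    """Residuals of y after a least-squares fit on x (with intercept)."""
    b = slope(x, y)
    a = mean(y) - b * mean(x)
    return [yi - (a + b * xi) for xi, yi in zip(x, y)]


# Synthetic data: tenure raises both the chance of a discount and the outcome.
tenure = [i % 10 for i in range(200)]
discount = [1 if z + (i // 10) % 3 >= 6 else 0 for i, z in enumerate(tenure)]
outcome = [1.5 * t + 2.0 * z for t, z in zip(discount, tenure)]  # true effect is 1.5

# Naive estimate ignores the confounder and absorbs the effect of tenure.
naive = slope(discount, outcome)

# Adjusted estimate: remove tenure from both variables, then regress.
adjusted = slope(residuals(tenure, discount), residuals(tenure, outcome))

print(f"naive={naive:.2f} adjusted={adjusted:.2f}")
```

The gap between the two numbers is the omitted variable bias that specifying confounders is meant to remove.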

Neo4j Persistence (Optional)

If Neo4j is configured, causal graphs are persisted:

python

import asyncio

from src.services.neo4j_service import get_neo4j_service
from src.services.causal import CausalInferenceEngine


async def main() -> None:
    neo4j = get_neo4j_service()
    engine = CausalInferenceEngine(neo4j_service=neo4j)

    # Analysis results are automatically persisted.
    # Query historical analyses:
    history = await engine.find_historical_analyses(
        treatment="discount",
        outcome="churned",
        limit=5
    )
    print(history)


asyncio.run(main())

Troubleshooting

"No causal effect found"

•Check that the data has variance in both treatment and outcome
•Verify that the selected confounders are correct
•A null result may also mean that no causal relationship exists
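The variance check in the first bullet is easy to automate. The sketch below flags columns that never change, since no effect can be estimated from a constant treatment or outcome. `constant_columns` is a hypothetical helper and the rows are illustrative.

```python
def constant_columns(rows, columns):
    """Return the columns that take a single value across all rows."""
    return [name for name in columns if len({row[name] for row in rows}) <= 1]


# Illustrative rows in which every customer received the discount.
rows = [
    {"discount": 1, "churned": 0, "tenure": 12},
    {"discount": 1, "churned": 1, "tenure": 6},
]

print(constant_columns(rows, ["discount", "churned"]))
```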

Refutation Test Failed

•The causal claim may be spurious
•Check for missing confounders
•Consider a different causal model

DoWhy Import Error

bash

pip install "carf[causal]"
# or
pip install dowhy econml causal-learn

Best Practices

•Always specify confounders - Omitted variable bias is common
•Check refutation tests - Don't trust estimates that have not passed refutation
•Use domain knowledge - LLM-assisted discovery is a starting point
•Validate with stakeholders - Causal claims require human review