Terraform Drift Detection

Overview

Detect and categorize drift between Terraform-managed state and actual infrastructure. Drift indicates out-of-band changes that can cause problems during the next apply.

Announce at start: "I'm using the terraform-drift-detection skill to check for infrastructure drift."

The Process

Step 1: Verify Environment

bash

# Verify AWS credentials and account
aws sts get-caller-identity

# Confirm we're in the right directory/environment
pwd
ls -la *.tf 2>/dev/null | head -5

Step 2: Refresh State

bash

# Initialize if needed
terraform init

# Refresh state to detect drift
terraform plan -refresh-only -out=drift.out

# Convert to JSON for analysis
terraform show -json drift.out > drift.json

Step 3: Analyze Drift

Parse drift.json and categorize changes:

Drift Categories

Category	Severity	Examples
Security Drift	CRITICAL	Security groups, IAM, encryption
Configuration Drift	HIGH	Instance settings, networking
Tag Drift	LOW	Tags modified outside Terraform
Metadata Drift	INFO	AWS-managed fields that change

Step 4: Dispatch Analysis Agent

code

Task(drift-detector) → Categorize and assess drift impact

Agent should:

•Categorize each drifted resource
•Assess impact of accepting vs rejecting drift
•Identify potential causes (manual changes, AWS updates, etc.)

Step 5: Present Findings

markdown

## Drift Detection Report

### Summary
- Total drifted resources: X
- Critical drift: Y
- High drift: Z
- Low/Info drift: W

### Critical Drift (Requires Immediate Attention)
| Resource | Attribute | State Value | Actual Value |
|----------|-----------|-------------|--------------|
| ... | ... | ... | ... |

### Potential Causes
- Manual console changes: [list if detected]
- AWS service updates: [list if detected]
- Unknown origin: [list if detected]

### Recommended Actions
1. [Action for each drifted resource]

Step 6: Resolution Options

Present user with options:

•Accept Drift: Run terraform apply -refresh-only to update state to match actual
•Reject Drift: Run terraform apply to revert actual infrastructure to match code
•Investigate: Manual review needed before deciding
•Hybrid: Accept some drift, reject other drift

Never auto-resolve drift. Always get user approval.

Common Drift Sources

Intentional (Usually Accept)

•AWS auto-scaling adjustments
•Managed service updates
•Emergency manual fixes (document these!)

Unintentional (Usually Reject)

•Console click-ops mistakes
•Unapproved manual changes
•Conflicting automation

Systemic (Fix Root Cause)

•Multiple tools managing same resources
•Missing Terraform coverage
•CI/CD race conditions

Integration with Memory

Store detected drift patterns:

•Which resources commonly drift
•Common causes in this environment
•Resolution preferences

Query memory before analysis:

•Has this resource drifted before?
•What was the cause last time?
•What resolution was chosen?

Verification Checklist

Before presenting:

• Refresh completed successfully
• All drift categorized by severity
• Root causes identified where possible
• Resolution options are clear
• No sensitive values exposed in output