Intent
Use when responding to production incidents or outages.
Steps
- Identify scope, impact, and current environment.
- Gather logs/metrics safely (Prometheus/Grafana, app logs).
- Propose minimal fix or mitigation; avoid risky refactors.
- Provide rollback plan and post-incident follow-ups.
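As one way to gather metrics safely, the sketch below only builds a URL for the Prometheus HTTP API `query_range` endpoint and sends nothing, so it can be reviewed before anyone runs it. The host `prometheus.example`, the PromQL expression, and the function name `build_range_query_url` are assumptions for illustration, not part of this runbook.

```python
from urllib.parse import urlencode


def build_range_query_url(base_url, promql, start_ts, end_ts, step="30s"):
    """Build a read-only Prometheus range-query URL. No request is sent."""
    params = {
        "query": promql,    # PromQL expression to evaluate
        "start": start_ts,  # start of the incident window (Unix seconds)
        "end": end_ts,      # end of the incident window (Unix seconds)
        "step": step,       # query resolution
    }
    return f"{base_url.rstrip('/')}/api/v1/query_range?{urlencode(params)}"


# Hypothetical host and query, covering a ten-minute window.
url = build_range_query_url(
    "http://prometheus.example:9090/",
    'up{job="api"}',
    1700000000,
    1700000600,
)
print(url)
```

Keeping the window narrow and the step coarse limits load on a monitoring stack that may already be under stress during the incident.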
Safety
- No destructive commands without explicit approval.
- Preserve evidence for postmortem.
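The two safety rules above could be enforced with a small helper like the following. This is a minimal sketch, assuming a simple prefix deny-list: `DESTRUCTIVE_PREFIXES`, `needs_approval`, `guard`, and `preserve_evidence` are hypothetical names, and the list is not exhaustive (for example, it does not catch commands wrapped in `sudo`).

```python
import hashlib
import shlex
import shutil
from pathlib import Path

# Hypothetical deny-list; a real one would be tailored to the environment.
DESTRUCTIVE_PREFIXES = [
    ("rm",),
    ("kubectl", "delete"),
    ("systemctl", "stop"),
]


def needs_approval(command: str) -> bool:
    """Return True if the command starts with a deny-listed prefix."""
    tokens = shlex.split(command)
    return any(tuple(tokens[: len(p)]) == p for p in DESTRUCTIVE_PREFIXES)


def guard(command: str, approved: bool = False) -> str:
    """Return the command unchanged, or raise if approval is missing."""
    if needs_approval(command) and not approved:
        raise PermissionError(f"explicit approval required: {command}")
    return command


def preserve_evidence(src: Path, evidence_dir: Path) -> str:
    """Copy a file into the evidence directory and return its SHA-256."""
    evidence_dir.mkdir(parents=True, exist_ok=True)
    dest = evidence_dir / src.name
    shutil.copy2(src, dest)  # copy2 also preserves timestamps where possible
    return hashlib.sha256(dest.read_bytes()).hexdigest()
```

Recording the checksum alongside the copy lets the postmortem confirm that the evidence was not altered after collection.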