SRE Runbook + Observability
Goal
Make the system operable: predictable diagnostics, safe rollbacks, and clear runtime failure behavior.
When to use
- PRODUCTION+.
- Any app with auth + persistence.
- Integrations/jobs exist.
Minimal inputs (ask only if missing)
- Hosting target.
- Error-reporting preference (or propose one).
Procedure (MUST)
- Define health checks.
- Implement frontend error boundaries + API error conventions.
- Define a logging schema (no PII) and propagate correlation IDs where possible.
- Document triage + rollback steps.
- Add a release operations checklist.
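The health-check step can be sketched as a small aggregator that a health endpoint calls. This is a minimal TypeScript sketch under stated assumptions, not a prescribed implementation: `HealthCheck`, `runHealthChecks`, the `critical` flag, the 2000 ms default timeout, and the 200/503 status mapping are all hypothetical choices made for illustration.

```typescript
// Minimal health-check aggregator (sketch). HealthCheck, runHealthChecks,
// the `critical` flag and the 200/503 mapping are hypothetical choices.

type CheckStatus = "ok" | "fail";

interface HealthCheck {
  name: string;
  critical: boolean; // a failing critical check marks the whole service unhealthy
  run: () => Promise<void>; // resolves when healthy; rejects or throws otherwise
}

interface CheckResult {
  status: CheckStatus;
  durationMs: number;
  error?: string;
}

interface HealthReport {
  status: CheckStatus;
  httpStatus: 200 | 503; // what a health endpoint would return
  checks: Record<string, CheckResult>;
}

// Reject if `p` has not settled within `ms`, so one hung dependency
// cannot stall the whole health endpoint.
function withTimeout(p: Promise<void>, ms: number): Promise<void> {
  return new Promise<void>((resolve, reject) => {
    const timer = setTimeout(() => reject(new Error(`timed out after ${ms}ms`)), ms);
    p.then(
      () => {
        clearTimeout(timer);
        resolve();
      },
      (err: unknown) => {
        clearTimeout(timer);
        reject(err);
      },
    );
  });
}

// Run one check, converting success, failure, or timeout into a result record.
async function runCheck(check: HealthCheck, timeoutMs: number): Promise<CheckResult> {
  const start = Date.now();
  try {
    await withTimeout(check.run(), timeoutMs);
    return { status: "ok", durationMs: Date.now() - start };
  } catch (err) {
    const message = err instanceof Error ? err.message : String(err);
    return { status: "fail", durationMs: Date.now() - start, error: message };
  }
}

// Run all checks in parallel; only critical failures flip the overall status.
async function runHealthChecks(checks: HealthCheck[], timeoutMs = 2000): Promise<HealthReport> {
  const results = await Promise.all(
    checks.map(async (check) => ({ check, result: await runCheck(check, timeoutMs) })),
  );
  const report: Record<string, CheckResult> = {};
  let healthy = true;
  for (const { check, result } of results) {
    report[check.name] = result;
    if (check.critical && result.status === "fail") healthy = false;
  }
  return { status: healthy ? "ok" : "fail", httpStatus: healthy ? 200 : 503, checks: report };
}
```

Running checks in parallel with a per-check timeout keeps the endpoint responsive when a dependency hangs, and separating critical from non-critical checks means an optional integration (say, an email provider) shows up as a failed entry in the report without failing the service as a whole.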
Outputs (MUST produce)
- docs/runbook.md.
- Error boundaries + error conventions.
- Logging schema notes.
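The API error conventions and the logging schema meet at the correlation ID: the client sees it in the error body, and operators search for it in the logs. The following is a minimal TypeScript sketch, assuming an HTTP API that writes JSON-lines logs; `LogEntry`, `REDACTED_KEYS`, `formatLog`, `AppError`, `toErrorResponse`, the field names, and the `INTERNAL` code are hypothetical. It does not cover frontend error boundaries, which depend on the UI framework in use.

```typescript
// Sketch of a structured, PII-free log line plus an API error envelope that
// share one correlation ID. All names and field choices are hypothetical.

type LogLevel = "debug" | "info" | "warn" | "error";

interface LogEntry {
  ts: string; // ISO-8601 timestamp
  level: LogLevel;
  event: string; // stable, searchable event name, e.g. "request.failed"
  correlationId: string; // also returned to the client in error bodies
  fields: Record<string, unknown>;
}

// Keys treated as PII or secrets (compared case-insensitively); extend per data model.
const REDACTED_KEYS = new Set(["email", "phone", "password", "token", "authorization", "address"]);

function isPlainObject(v: unknown): v is Record<string, unknown> {
  return typeof v === "object" && v !== null && Object.getPrototypeOf(v) === Object.prototype;
}

// Recursively replace values stored under sensitive keys with a placeholder.
function redact(value: unknown): unknown {
  if (Array.isArray(value)) return value.map(redact);
  if (isPlainObject(value)) {
    const out: Record<string, unknown> = {};
    for (const key of Object.keys(value)) {
      out[key] = REDACTED_KEYS.has(key.toLowerCase()) ? "[REDACTED]" : redact(value[key]);
    }
    return out;
  }
  return value;
}

// Serialize one log entry as a single JSON line.
function formatLog(
  level: LogLevel,
  event: string,
  correlationId: string,
  fields: Record<string, unknown> = {},
): string {
  const entry: LogEntry = {
    ts: new Date().toISOString(),
    level,
    event,
    correlationId,
    fields: redact(fields) as Record<string, unknown>,
  };
  return JSON.stringify(entry);
}

// Client-facing error envelope: stable code, safe message, correlation ID.
interface ErrorBody {
  error: { code: string; message: string; correlationId: string };
}

// Errors thrown deliberately by application code.
class AppError extends Error {
  readonly code: string;
  readonly httpStatus: number;
  constructor(code: string, httpStatus: number, message: string) {
    super(message);
    this.name = "AppError";
    this.code = code;
    this.httpStatus = httpStatus;
  }
}

// Map any thrown value to an HTTP status, a client-safe body, and one log line.
function toErrorResponse(err: unknown, correlationId: string): { status: number; body: ErrorBody; log: string } {
  if (err instanceof AppError) {
    return {
      status: err.httpStatus,
      body: { error: { code: err.code, message: err.message, correlationId } },
      log: formatLog("warn", "request.failed", correlationId, { code: err.code }),
    };
  }
  // Unexpected errors: keep details in the server log, return a generic message.
  const detail = err instanceof Error ? err.message : String(err);
  return {
    status: 500,
    body: { error: { code: "INTERNAL", message: "Internal server error", correlationId } },
    log: formatLog("error", "request.unhandled", correlationId, { detail }),
  };
}
```

Redaction here is key-based, so PII embedded in free-text values (for example an exception message that quotes user input) is not caught. Treat the key list as a starting point to record in the logging schema notes rather than a guarantee.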