AgentSkillsCN

sre-runbook-observability

Design or review layered backend architecture (routes, controllers, services, repositories) for HTTP APIs and microservices.

SKILL.md
--- frontmatter
name: sre-runbook-observability
description: Create a practical runbook and observability baseline: error boundaries, API error conventions, logging schema, triage and rollback steps.

SRE Runbook + Observability

Goal

Make the system operable: predictable diagnostics, safe rollbacks, and clear runtime failure behavior.

When to use

  • Projects at the PRODUCTION level or above (PRODUCTION+).
  • Any app with auth + persistence.
  • Any app with external integrations or background jobs.

Minimal inputs (ask only if missing)

  • Hosting target.
  • Error reporting preference (propose one if none is given).

Procedure (MUST)

  1. Define health checks.
  2. Implement frontend error boundaries + API error conventions.
  3. Define logging schema (no PII) + correlation IDs where possible.
  4. Document triage + rollback steps.
  5. Add release operations checklist.
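
Steps 1 and 2 can be sketched roughly as follows. This is a minimal illustration, not a prescribed implementation: the `ApiError` envelope, its field names, and the `health` and `error_response` helpers are all hypothetical, and the skill itself does not mandate a language or framework. The idea is that every failed response shares one shape and the health endpoint reports per-dependency status.

```python
import json
import uuid
from dataclasses import asdict, dataclass
from typing import Callable, Dict, Optional, Tuple


@dataclass
class ApiError:
    """Hypothetical error envelope: one shape for every failed response."""
    code: str            # stable, machine-readable identifier, e.g. "not_found"
    message: str         # safe for end users: no stack traces, no PII
    correlation_id: str  # lets support match a user report to log entries


def error_response(status: int, code: str, message: str,
                   correlation_id: Optional[str] = None) -> Tuple[int, str]:
    """Build (HTTP status, JSON body) following the envelope above."""
    err = ApiError(code, message, correlation_id or uuid.uuid4().hex)
    return status, json.dumps({"error": asdict(err)})


def health(checks: Dict[str, Callable[[], bool]]) -> Tuple[int, str]:
    """Run each dependency check; return 200 only if all of them pass."""
    results = {}
    for name, check in checks.items():
        try:
            results[name] = "ok" if check() else "fail"
        except Exception:
            # A crashing check counts as a failed check, not a server error.
            results[name] = "fail"
    healthy = all(value == "ok" for value in results.values())
    body = {"status": "ok" if healthy else "degraded", "checks": results}
    return (200 if healthy else 503), json.dumps(body)
```

A route handler would return `health({"db": ping_db, "queue": ping_queue})` from its health endpoint, where `ping_db` and `ping_queue` are placeholder callables for whatever dependencies the app actually has. Returning 503 on any failed check lets a load balancer or orchestrator stop routing traffic to the instance.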

Outputs (MUST produce)

  • docs/runbook.md.
  • Error boundaries + error conventions.
  • Logging schema notes.
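
As one possible starting point for the logging schema notes, the sketch below emits each log record as a single JSON object with a fixed set of keys, attaches a correlation ID, and redacts fields on a deny-list. The key names, the `PII_FIELDS` set, and the `context` attribute are assumptions made for illustration; the real schema should come out of the procedure above. It uses only the standard-library `logging` and `contextvars` modules.

```python
import json
import logging
from contextvars import ContextVar

# Correlation ID for the current request; assumed to be set by middleware.
correlation_id: ContextVar[str] = ContextVar("correlation_id", default="-")

# Hypothetical deny-list of context fields that must never reach the logs.
PII_FIELDS = {"email", "password", "token", "phone"}


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object with a fixed set of keys."""

    def format(self, record: logging.LogRecord) -> str:
        # Structured fields are passed via logger.info(..., extra={"context": {...}}).
        context = getattr(record, "context", {})
        entry = {
            "ts": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "event": record.getMessage(),
            "correlation_id": correlation_id.get(),
            "context": {
                key: ("[redacted]" if key in PII_FIELDS else value)
                for key, value in context.items()
            },
        }
        return json.dumps(entry, default=str)
```

To use it, attach the formatter to a handler (`handler.setFormatter(JsonFormatter())`) and log with `logger.info("login_failed", extra={"context": {"email": user_email, "reason": "bad_password"}})`, where `user_email` is a placeholder. A deny-list only catches fields it knows about, so an allow-list of loggable fields may be the safer choice for stricter no-PII requirements.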