Incident Response

SKILL.md
--- frontmatter
applyTo: "**/*incident*,**/*outage*,**/*alert*,**/*emergency*"

Incident Response Skill

Calm, systematic crisis handling.

Severity Levels

| Level | Definition | Response |
|-------|------------|----------|
| P1 | Service down, all affected | Immediate |
| P2 | Major feature broken | < 1 hour |
| P3 | Feature degraded | < 4 hours |
| P4 | Minor, workaround exists | < 24 hours |
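
The response targets above can be turned into concrete deadlines. A minimal sketch, assuming "Immediate" means zero delay; `Severity`, `RESPONSE_TARGET`, and `response_deadline` are illustrative names, not part of this skill.

```python
from datetime import datetime, timedelta
from enum import Enum


class Severity(Enum):
    P1 = "P1"  # Service down, all affected
    P2 = "P2"  # Major feature broken
    P3 = "P3"  # Feature degraded
    P4 = "P4"  # Minor, workaround exists


# Response targets from the table. Modeling "Immediate" as a zero
# delay is an assumption made for this sketch.
RESPONSE_TARGET = {
    Severity.P1: timedelta(0),
    Severity.P2: timedelta(hours=1),
    Severity.P3: timedelta(hours=4),
    Severity.P4: timedelta(hours=24),
}


def response_deadline(severity: Severity, detected_at: datetime) -> datetime:
    """Return the latest time by which a response should have started."""
    return detected_at + RESPONSE_TARGET[severity]
```

For example, a P3 detected at 12:00 should have a response under way by 16:00 the same day.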

Response Phases

Detect → Triage → Resolve → Review

Triage Questions

  1. User impact?
  2. How many affected?
  3. Workaround exists?
  4. Who owns this?
  5. Severity level?
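
One way to keep triage honest is to record the five answers and check which are still open before moving to Resolve. A minimal sketch; `TriageRecord` and its field names are hypothetical and simply mirror the questions above.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class TriageRecord:
    # One field per triage question; None means "not answered yet".
    user_impact: Optional[str] = None         # 1. User impact?
    affected_count: Optional[int] = None      # 2. How many affected?
    workaround_exists: Optional[bool] = None  # 3. Workaround exists?
    owner: Optional[str] = None               # 4. Who owns this?
    severity: Optional[str] = None            # 5. Severity level?

    def unanswered(self) -> list[str]:
        """Names of the questions that still have no answer."""
        return [f.name for f in fields(self) if getattr(self, f.name) is None]

    def is_complete(self) -> bool:
        return not self.unanswered()
```

Note that `workaround_exists=False` counts as an answer; only `None` is treated as open.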

Resolve Decision

| Question | Action |
|----------|--------|
| Can rollback? | Rollback first, debug later |
| Quick fix available? | Deploy hotfix |
| Need investigation? | Enable debug logging, check recent changes |
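
The decision table can be read as an ordered checklist. A minimal sketch, assuming the rows are evaluated top to bottom and the first match wins; that ordering and the function name `choose_action` are assumptions for illustration.

```python
def choose_action(can_rollback: bool, quick_fix_available: bool) -> str:
    """Pick the resolve action, preferring the fastest way to restore service."""
    if can_rollback:
        return "Rollback first, debug later"
    if quick_fix_available:
        return "Deploy hotfix"
    # Neither shortcut applies, so investigation is needed.
    return "Enable debug logging, check recent changes"
```

With this ordering, a rollback is chosen even when a quick fix is also available.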

Communication

| Audience | Content |
|----------|---------|
| Users | Status, ETA |
| Leadership | Impact summary |
| Team | Technical details |
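
The audience split can be sketched as one update rendered three ways. This is an illustration only; `IncidentUpdate`, `message_for`, and the message wording are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class IncidentUpdate:
    status: str
    eta: str
    impact_summary: str
    technical_details: str


def message_for(audience: str, update: IncidentUpdate) -> str:
    """Return only the content the given audience needs."""
    if audience == "users":
        return f"Status: {update.status}. ETA: {update.eta}."
    if audience == "leadership":
        return f"Impact: {update.impact_summary}"
    if audience == "team":
        return f"Technical details: {update.technical_details}"
    raise ValueError(f"unknown audience: {audience}")
```

Keeping all four fields on one object means every audience gets a message derived from the same facts.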

Post-Mortem Essentials

  • Summary (1 paragraph)
  • Timeline (what happened when)
  • Root cause (5 Whys)
  • Action items (owner + due date)
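
The essentials above can be checked mechanically before a post-mortem is published. A minimal sketch; `PostMortem`, `ActionItem`, and `problems` are hypothetical names, and treating the last "why" as the root cause is an assumption.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class ActionItem:
    description: str
    owner: str
    due: date  # required field, so every item carries a due date


@dataclass
class PostMortem:
    summary: str                     # one paragraph
    timeline: list[tuple[str, str]]  # (time, what happened)
    whys: list[str]                  # successive "why?" answers; last is the root cause
    action_items: list[ActionItem]

    def problems(self) -> list[str]:
        """Return the gaps found; an empty list means the essentials are covered."""
        issues = []
        if not self.summary.strip():
            issues.append("summary is missing")
        if not self.timeline:
            issues.append("timeline is missing")
        if not self.whys:
            issues.append("root cause is missing")
        if not self.action_items:
            issues.append("action items are missing")
        for item in self.action_items:
            if not item.owner.strip():
                issues.append(f"action item '{item.description}' has no owner")
        return issues
```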

On-Call Handoff

Cover active incidents, recent deploys, known issues, and pending alerts.
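
A handoff note with these four sections might be rendered as follows. A minimal sketch; the `Handoff` class and its output layout are hypothetical. Empty sections are printed as "none" so the next on-call can see they were checked rather than forgotten.

```python
from dataclasses import dataclass, field


@dataclass
class Handoff:
    active_incidents: list[str] = field(default_factory=list)
    recent_deploys: list[str] = field(default_factory=list)
    known_issues: list[str] = field(default_factory=list)
    pending_alerts: list[str] = field(default_factory=list)

    def render(self) -> str:
        """Render all four sections as plain text, in a fixed order."""
        sections = [
            ("Active incidents", self.active_incidents),
            ("Recent deploys", self.recent_deploys),
            ("Known issues", self.known_issues),
            ("Pending alerts", self.pending_alerts),
        ]
        lines = []
        for title, items in sections:
            lines.append(f"{title}:")
            # An empty section is shown explicitly as "none".
            lines.extend(f"  - {item}" for item in (items or ["none"]))
        return "\n".join(lines)
```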

Synapses

See synapses.json for connections.