change-risk-rollback

每当有变更拟用于发布或部署时，运用此技能评估运营风险，识别潜在故障模式，并确保回滚路径清晰且切实可行。

SKILL.md

--- frontmatter

name: change-risk-rollback
description: >
  Use this skill whenever a change is intended for release or deployment to
  assess operational risk, identify failure modes, and ensure rollback paths are
  explicit and realistic.

Change Risk & Rollback Assessment

Intent

Assess the operational risk of changes and ensure rollback options are explicit, tested where possible, and aligned to traceability practices.

When to Use

•Before releasing or deploying significant changes
•When touching critical paths, data handling, auth, or infrastructure
•When introducing migrations or behavioural changes with user impact

Precondition Failure Signal

•No clear failure modes or rollback strategy exists
•Rollback depends on guesswork or mutable deployments
•Risk is described qualitatively without concrete triggers/mitigations
•Deployment plan lacks version/tag traceability

Postcondition Success Signal

•Key risks and failure modes are identified and bounded
•Rollback options are explicit and traceable to versions/tags
•Mitigations are aligned to quality gates and environment traceability
•Decision to proceed is reviewable and justified

Process

•Source Review: Inspect the proposed changes, deployment environment, and current telemetry/monitoring coverage.
•Implementation: Identify potential failure modes and define the corresponding rollback or remediation steps.
•Verification: Conduct a walkthrough or simulation of the rollback plan to ensure it is executable and effective.
•Documentation: Record the risk assessment and rollback plan in the change request or as an ADR for high-risk architectural changes.
•Review: Release Manager/SRE and Tech Lead must validate the feasibility and completeness of the rollback strategy.

Example Test / Validation

•For a change, identify at least one failure mode and define a rollback path
•Verify rollback candidate versions/tags can be identified from traceability records
•Confirm mitigations are in place for highest-likelihood/highest-impact failures

Common Red Flags / Guardrail Violations

•“We can just hotfix it”
•Relying on manual steps without documentation
•Deploying without tags, making rollback ambiguous
•Treating rollback as “restore from backup” without feasibility validation

Recommended Review Personas

•Tech Lead – validates risk framing and scope discipline
•Platform/DevOps Engineer – validates deployment/rollback feasibility
•Domain Expert – validates business/functional failure impacts
•Security Reviewer – validates integrity risks where applicable

Skill Priority

P0 – Safety & Integrity

Conflict Resolution Rules

•Overrides delivery pressure when rollback/risk is unclear
•Requires traceability (versions/tags); no rollback plan without provenance
•If risk cannot be bounded, escalate and defer release

Conceptual Dependencies

•incremental-change-impact
•release-tagging-plan
•environment-traceability

Classification

Core
Governance

Notes

Rollback is not a plan if it cannot be executed from recorded state.