Stakeholder Communication During Incidents
Overview
Effective communication during incidents is critical for maintaining trust, managing expectations, and coordinating response efforts. Poor communication can turn a technical incident into a PR crisis.
Core Principle: "Communicate early, often, and clearly. Silence creates anxiety."
1. Who Are Stakeholders During Incidents
Stakeholder Categories
code
External Stakeholders:
✓ End users (customers)
✓ Enterprise clients
✓ Business partners / integrations
✓ Media / public (for major incidents)
✓ Regulators (for compliance incidents)

Internal Stakeholders:
✓ Engineering teams
✓ Customer support
✓ Sales team
✓ Product management
✓ Executive leadership (C-suite)
✓ Legal / compliance
✓ PR / communications team
Stakeholder Needs by Type
typescript
interface StakeholderNeeds {
  group: string;
  primaryConcern: string;
  updateFrequency: string;
  detailLevel: string;
  channel: string[];
}

const stakeholderNeeds: StakeholderNeeds[] = [
  {
    group: 'End Users',
    primaryConcern: 'When will service be restored?',
    updateFrequency: 'Every 30-60 minutes',
    detailLevel: 'High-level, non-technical',
    channel: ['Status page', 'Email', 'In-app banner']
  },
  {
    group: 'Enterprise Clients',
    primaryConcern: 'Business impact, SLA credits',
    updateFrequency: 'Every 15-30 minutes',
    detailLevel: 'Detailed, with business impact',
    channel: ['Direct email', 'Phone call', 'Dedicated Slack']
  },
  {
    group: 'Engineering Teams',
    primaryConcern: 'Technical details, how to help',
    updateFrequency: 'Real-time',
    detailLevel: 'Highly technical',
    channel: ['Slack incident channel', 'War room']
  },
  {
    group: 'Executives',
    primaryConcern: 'Business impact, PR risk, resolution ETA',
    updateFrequency: 'Every 30 minutes',
    detailLevel: 'Executive summary + key details',
    channel: ['Slack DM', 'Email', 'Phone (SEV0)']
  },
  {
    group: 'Customer Support',
    primaryConcern: 'What to tell customers, workarounds',
    updateFrequency: 'Every 15 minutes',
    detailLevel: 'Customer-facing talking points',
    channel: ['Slack support channel', 'Email']
  }
];
2. Communication Channels
2.1 Status Page
code
Purpose: Public-facing incident updates

Tools:
- Statuspage.io (Atlassian)
- Status.io
- Instatus
- Custom status page

Best Practices:
✓ Update within 15 minutes of SEV0/1
✓ Use clear, non-technical language
✓ Provide ETA when known (or "investigating")
✓ Update every 30-60 minutes
✓ Mark as resolved only when fully stable
Status Page Example:
markdown
## Major Outage - API Service

**Status**: Investigating
**Started**: Jan 15, 2024 10:13 UTC
**Last Update**: Jan 15, 2024 10:45 UTC

We are currently experiencing a major outage affecting our API service. Users are unable to access the application. Our team is actively investigating the issue and working on a resolution.

**Impact**: All users
**Affected Services**: Web App, Mobile App, Public API
**Next Update**: 11:15 UTC (30 minutes)
2.2 Email Notifications
code
Purpose: Direct communication to affected users

When to Use:
- SEV0/1 incidents affecting all users
- Extended outages (> 1 hour)
- Post-incident summary
- Enterprise customer notifications

Segments:
- All users
- Affected users only
- Enterprise customers
- Free tier users
Email Template:
html
Subject: [Resolved] Service Disruption - January 15, 2024

Dear Customer,

We experienced a service disruption today from 10:13 UTC to 11:00 UTC (47 minutes) that prevented access to our application.

What Happened:
A database connection issue caused our API service to become unavailable.

Impact:
- Duration: 47 minutes
- Affected: All users
- Services: Web app, mobile app, API

Resolution:
Our team identified and resolved the issue by rolling back a recent deployment. Service has been fully restored and is operating normally.

What We're Doing:
We're conducting a thorough investigation to prevent this from happening again. We'll publish a detailed postmortem within 48 hours.

We sincerely apologize for the disruption and appreciate your patience. If you have any questions, please contact support@example.com.

Best regards,
The Example Team
2.3 In-App Messages
typescript
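// In-app banner for active incidents
interface IncidentBanner {
  severity: string;
  message: string;
  link: string;
  dismissible: boolean;
}

const banner: IncidentBanner = {
  severity: 'error',
  message: 'We are experiencing technical difficulties. Some features may be unavailable.',
  link: 'https://status.example.com',
  dismissible: false
};

// Display banner. Elements are built with DOM APIs and textContent instead of
// innerHTML, so the message text is never interpreted as markup.
function showIncidentBanner(banner: IncidentBanner): void {
  const bannerEl = document.createElement('div');
  bannerEl.className = `banner banner-${banner.severity}`;

  const text = document.createElement('span');
  text.textContent = banner.message;

  const statusLink = document.createElement('a');
  statusLink.href = banner.link;
  statusLink.target = '_blank';
  statusLink.rel = 'noopener noreferrer'; // do not expose window.opener to the new tab
  statusLink.textContent = 'View Status';

  bannerEl.append(text, statusLink);

  if (banner.dismissible) {
    const closeButton = document.createElement('button');
    closeButton.className = 'close';
    closeButton.textContent = '×';
    closeButton.addEventListener('click', () => bannerEl.remove());
    bannerEl.append(closeButton);
  }

  document.body.prepend(bannerEl);
}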
2.4 Social Media
code
Platforms:
- Twitter/X
- LinkedIn
- Facebook
- Reddit (if community exists)

When to Use:
- SEV0 incidents
- High user visibility
- Proactive communication
- Respond to user complaints

Best Practices:
✓ Acknowledge issue quickly
✓ Link to status page
✓ Update as situation evolves
✓ Thank users for patience
Twitter Example:
code
🔴 We're aware of an issue preventing access to our service. Our team is investigating and working on a fix.

We'll provide updates here and on our status page: https://status.example.com

Updates:
10:45 UTC - Issue identified, implementing fix
11:00 UTC - Service restored, monitoring for stability
11:30 UTC - All systems operational

Thank you for your patience. We're sorry for the disruption.
2.5 Internal Slack/Teams
code
Channels:
- #incidents (all incidents)
- #inc-YYYY-NNN (specific incident channel)
- #customer-support (support team updates)
- #executive-alerts (SEV0 only)

Purpose:
- Real-time coordination
- Technical discussion
- Status updates
- Action item tracking
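A per-incident channel name in the `#inc-YYYY-NNN` pattern can be generated rather than typed by hand. The sketch below is one possible helper; `incidentChannelName` is a hypothetical name, and treating `NNN` as a zero-padded yearly sequence number is an assumption about the pattern.

```typescript
// Hypothetical helper: builds a channel name in the #inc-YYYY-NNN pattern.
// Assumption: NNN is a sequence number, zero-padded to at least three digits.
function incidentChannelName(year: number, sequence: number): string {
  const isWholeNumber = (n: number) => Math.floor(n) === n;
  if (!isWholeNumber(year) || !isWholeNumber(sequence) || sequence < 1) {
    throw new Error('year and sequence must be whole numbers, with sequence >= 1');
  }
  let padded = String(sequence);
  while (padded.length < 3) {
    padded = '0' + padded;
  }
  return `#inc-${year}-${padded}`;
}

const channelName = incidentChannelName(2024, 7);
```

Generating the name from the incident record keeps channel names consistent, which makes them easier to search for later.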
2.6 Direct Outreach (Enterprise Customers)
code
When to Use:
- Enterprise customers affected
- SLA breach
- Revenue-critical customers
- Contractual obligations

Method:
- Dedicated Slack channel
- Direct phone call
- Account manager email
- Executive-to-executive (for major incidents)

Frequency:
- SEV0: Every 15 minutes
- SEV1: Every 30 minutes
- Proactive (don't wait for them to ask)
3. Communication Timing
3.1 Initial Notification
code
Timing by Severity:

SEV0:
- Status page: Within 5 minutes
- Internal Slack: Immediate
- Executive notification: Within 5 minutes
- Enterprise customers: Within 15 minutes

SEV1:
- Status page: Within 15 minutes
- Internal Slack: Within 5 minutes
- Executive notification: Within 30 minutes
- Enterprise customers: Within 30 minutes

SEV2:
- Status page: Optional (if customer-facing)
- Internal Slack: Within 15 minutes
- Executive notification: If prolonged
- Enterprise customers: If affected

SEV3/4:
- No external communication
- Internal ticket only
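The SEV0 and SEV1 timings above can be turned into concrete deadlines once an incident's start time is known. The following is a minimal sketch, not a prescribed implementation: the names `initialDeadlines` and `notificationDeadlines` and the channel keys are hypothetical, and SEV2 is left out because its rules are conditional rather than time-boxed.

```typescript
type Severity = 'SEV0' | 'SEV1';

// Minutes allowed before the first notification on each channel,
// copied from the SEV0 and SEV1 lists above. 0 means "immediate".
const initialDeadlines: Record<Severity, Record<string, number>> = {
  SEV0: { statusPage: 5, internalSlack: 0, executives: 5, enterpriseCustomers: 15 },
  SEV1: { statusPage: 15, internalSlack: 5, executives: 30, enterpriseCustomers: 30 }
};

// Returns an ISO 8601 UTC deadline per channel for the given start time.
function notificationDeadlines(severity: Severity, startedAt: Date): Record<string, string> {
  const limits = initialDeadlines[severity];
  const result: Record<string, string> = {};
  for (const channel of Object.keys(limits)) {
    const due = new Date(startedAt.getTime() + limits[channel] * 60000);
    result[channel] = due.toISOString();
  }
  return result;
}

const deadlines = notificationDeadlines('SEV0', new Date('2024-01-15T10:13:00Z'));
```

For a SEV0 that started at 10:13 UTC, the status page deadline works out to 10:18 UTC and the enterprise customer deadline to 10:28 UTC.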
3.2 Regular Updates
code
Update Frequency:

SEV0:
- Every 15-30 minutes
- Even if no progress: "Still investigating, next update in 15 min"

SEV1:
- Every 30-60 minutes
- Include progress updates

SEV2:
- Every 1-2 hours
- Or when significant progress

Update Content:
- Current status
- What we've learned
- What we're doing
- Next update time
- ETA (if known)
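A small helper can compute when the next update is due and flag a missed one. This sketch assumes the upper bound of each range above is the longest acceptable gap; `maxUpdateGapMinutes`, `nextUpdateDue`, `isUpdateOverdue`, and `formatUtcTime` are hypothetical names.

```typescript
type Severity = 'SEV0' | 'SEV1' | 'SEV2';

// Longest allowed gap between updates, in minutes. Assumption: the upper
// bound of each range above (SEV0 15-30, SEV1 30-60, SEV2 1-2 hours).
const maxUpdateGapMinutes: Record<Severity, number> = { SEV0: 30, SEV1: 60, SEV2: 120 };

function nextUpdateDue(severity: Severity, lastUpdate: Date): Date {
  return new Date(lastUpdate.getTime() + maxUpdateGapMinutes[severity] * 60000);
}

function isUpdateOverdue(severity: Severity, lastUpdate: Date, now: Date): boolean {
  return now.getTime() > nextUpdateDue(severity, lastUpdate).getTime();
}

// Formats a Date as "HH:MM UTC" for the "Next update" line of a message.
function formatUtcTime(d: Date): string {
  const pad = (n: number) => (n < 10 ? '0' + n : String(n));
  return `${pad(d.getUTCHours())}:${pad(d.getUTCMinutes())} UTC`;
}

const lastUpdate = new Date('2024-01-15T10:45:00Z');
const nextUpdateLine = `Next update: ${formatUtcTime(nextUpdateDue('SEV0', lastUpdate))}`;
```

Stating the next update time in every message, and alerting the incident channel when `isUpdateOverdue` turns true, is one way to avoid unplanned silence.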
3.3 Resolution Notification
code
When to Declare Resolved:
✓ Issue completely fixed
✓ Monitoring shows stable state (15-30 minutes)
✓ Error rates back to normal
✓ No user reports

Don't declare resolved if:
✗ Still monitoring
✗ Intermittent issues
✗ Partial mitigation only
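The checklist above can be encoded as a single gate so that "resolved" is never posted on a hunch. This is a sketch under stated assumptions: the `HealthSnapshot` fields are hypothetical stand-ins for whatever your monitoring exposes, and the default 30-minute stability window is the upper end of the 15-30 minute range.

```typescript
// Hypothetical snapshot of post-fix health; field names are illustrative.
interface HealthSnapshot {
  minutesSinceFix: number;      // time since the fix was applied
  errorRate: number;            // current error rate, 0..1
  baselineErrorRate: number;    // normal error rate before the incident
  openUserReports: number;      // unresolved user reports for this incident
  mitigationIsPartial: boolean; // true if only a workaround is in place
}

// True only when every item on the "declare resolved" checklist holds.
function canDeclareResolved(s: HealthSnapshot, stableWindowMinutes: number = 30): boolean {
  return (
    !s.mitigationIsPartial &&
    s.minutesSinceFix >= stableWindowMinutes &&
    s.errorRate <= s.baselineErrorRate &&
    s.openUserReports === 0
  );
}

const tenMinutesAfterFix: HealthSnapshot = {
  minutesSinceFix: 10,
  errorRate: 0.001,
  baselineErrorRate: 0.002,
  openUserReports: 0,
  mitigationIsPartial: false
};
const readyToResolve = canDeclareResolved(tenMinutesAfterFix);
```

Here `readyToResolve` is false because only 10 minutes have passed since the fix, so the status should stay at "monitoring".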
Resolution Message:
markdown
## Resolved - API Service Outage

**Status**: Resolved
**Duration**: 47 minutes (10:13 UTC - 11:00 UTC)

The issue has been fully resolved. All services are operating normally.

**Root Cause**: Database connection pool exhaustion due to a bug in recent deployment.

**Resolution**: Rolled back to previous version. Service fully restored at 11:00 UTC.

**Next Steps**: We're conducting a full investigation and will publish a detailed postmortem within 48 hours.

Thank you for your patience.
3.4 Postmortem Sharing
code
Timeline:
- Internal postmortem: Within 24-48 hours
- Public postmortem: Within 1 week (for major incidents)

Share With:
- Internal: All engineering, product, support
- External: Public blog post (optional)
- Enterprise customers: Direct email with detailed postmortem

Content:
- What happened
- Root cause
- Impact
- Timeline
- What we're doing to prevent recurrence
4. Message Structure
The 5 W's Framework
code
What: What is broken?
- Specific, clear description
- Avoid jargon

Who: Who is affected?
- All users, specific region, enterprise customers
- Percentage or number

When: When did it start?
- Timestamp in UTC
- Duration so far

Where: Where is the impact?
- Services affected
- Geographic regions

Why: What do we know about the cause, and what are we doing?
- Cause, if confirmed
- Current actions
- Next steps
- ETA (if known)
Message Template
markdown
## Incident Update

**What**: API service returning errors
**Who**: All users (~50,000 active users)
**When**: Started 10:13 UTC (32 minutes ago)
**Where**: All regions, all services (web, mobile, API)
**Why**: Database connection issue, team is investigating

**Current Status**: Investigating root cause

**Next Steps**:
1. Checking database health
2. Reviewing recent deployments
3. Preparing rollback if needed

**ETA**: Unknown at this time
**Next Update**: 11:15 UTC (30 minutes)
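Rendering the template from structured fields keeps every update in the same shape and makes it hard to forget the ETA or next update time. The sketch below is one way to do that; `IncidentUpdate` and `renderUpdate` are hypothetical names, and it covers only the 5 W's plus ETA and next update lines.

```typescript
// Hypothetical structured form of the message template above.
interface IncidentUpdate {
  what: string;
  who: string;
  when: string;
  where: string;
  why: string;
  nextUpdate: string;
  eta?: string; // leave unset while the ETA is unknown
}

// Renders the fields as markdown in the same order as the template.
function renderUpdate(u: IncidentUpdate): string {
  return [
    '## Incident Update',
    '',
    `**What**: ${u.what}`,
    `**Who**: ${u.who}`,
    `**When**: ${u.when}`,
    `**Where**: ${u.where}`,
    `**Why**: ${u.why}`,
    '',
    `**ETA**: ${u.eta || 'Unknown at this time'}`,
    `**Next Update**: ${u.nextUpdate}`
  ].join('\n');
}

const rendered = renderUpdate({
  what: 'API service returning errors',
  who: 'All users (~50,000 active users)',
  when: 'Started 10:13 UTC (32 minutes ago)',
  where: 'All regions, all services (web, mobile, API)',
  why: 'Database connection issue, team is investigating',
  nextUpdate: '11:15 UTC (30 minutes)'
});
```

Because `eta` is optional, an update without one falls back to "Unknown at this time" instead of inviting a guess.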
5. Communication Templates by Severity
SEV0 Template: Complete Outage
markdown
## INCIDENT: Complete Service Outage

**Status**: Investigating
**Severity**: SEV0
**Started**: 2024-01-15 10:13 UTC
**Impact**: All users unable to access service

We are experiencing a complete service outage. Our entire team is engaged and working on a resolution.

**Affected Services**:
- Web application
- Mobile apps
- Public API

**What We're Doing**:
- Investigating root cause
- All hands on deck
- War room established

**Next Update**: 10:30 UTC (15 minutes)

We sincerely apologize for this disruption and will provide frequent updates.
SEV1 Template: Major Functionality Broken
markdown
## INCIDENT: Login System Unavailable

**Status**: Investigating
**Severity**: SEV1
**Started**: 2024-01-15 14:00 UTC
**Impact**: Users unable to log in (existing sessions unaffected)

We are experiencing issues with our login system. Users who are already logged in can continue using the service, but new logins are currently unavailable.

**Affected**: ~30% of users (those not currently logged in)

**What We're Doing**:
- Investigating authentication service
- Checking recent deployments
- Preparing rollback if needed

**Workaround**: If you're already logged in, you can continue using the service.

**Next Update**: 14:30 UTC (30 minutes)
SEV2 Template: Degraded Performance
markdown
## INCIDENT: Slow Search Performance

**Status**: Investigating
**Severity**: SEV2
**Started**: 2024-01-15 09:00 UTC
**Impact**: Search feature responding slowly

We are experiencing degraded performance with our search feature. Search results may take 5-10 seconds to load instead of the usual 1 second.

**Affected**: All users (degraded, not broken)

**What We're Doing**:
- Investigating search infrastructure
- Scaling up resources
- Optimizing queries

**Workaround**: Search is still functional, just slower than normal.

**Next Update**: 11:00 UTC (2 hours)
6. Tone and Language
Clear and Honest
code
❌ Vague: "We're experiencing some technical difficulties."
✓ Clear: "Our API service is returning errors, preventing users from logging in."

❌ Dishonest: "Everything is fine, just a minor hiccup."
✓ Honest: "We're experiencing a major outage affecting all users. We're working urgently on a fix."
Avoid Jargon
code
❌ Technical Jargon: "The PostgreSQL primary instance experienced connection pool exhaustion due to a resource leak in the ORM layer."
✓ Plain Language: "Our database ran out of available connections, preventing the application from accessing user data."

❌ Acronyms: "The K8s pod in the us-east-1 AZ is experiencing OOM errors."
✓ Clear: "Our application servers in the US East region are running out of memory."
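A simple pre-publish check can catch jargon before a draft reaches customers. The sketch below is only an illustration: `jargonTerms` is a tiny example list built from the samples above, not a complete glossary, and `findJargon` is a hypothetical helper.

```typescript
// Illustrative term list only; a real one would come from your own glossary.
const jargonTerms = ['k8s', 'kubernetes', 'pod', 'oom', 'orm', 'postgresql', 'connection pool', 'az'];

// Returns the listed terms that appear as whole words in a draft (case-insensitive).
function findJargon(draft: string): string[] {
  const text = draft.toLowerCase();
  return jargonTerms.filter((term) => new RegExp(`\\b${term}\\b`).test(text));
}

const flagged = findJargon('The K8s pod in the us-east-1 AZ is experiencing OOM errors.');
const clean = findJargon('Our application servers in the US East region are running out of memory.');
```

A non-empty result is a prompt for a human rewrite, not an automatic block; some audiences, such as enterprise engineering contacts, may want the technical terms.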
Empathy for Affected Users
code
❌ Dismissive: "We had a small issue. It's fixed now."
✓ Empathetic: "We know how disruptive this outage was, and we sincerely apologize for the inconvenience."

❌ Blame Users: "If you had used the workaround we posted, you wouldn't have had issues."
✓ Take Responsibility: "We should have communicated the workaround more clearly. We're sorry for the confusion."
No Premature Root Cause Claims
code
❌ Premature: "The issue was caused by AWS." (Later: Actually it was our code)
✓ Cautious: "We're investigating the root cause and will share details once confirmed."

❌ Speculative: "We think it might be a database issue, or maybe network, or possibly..."
✓ Factual: "We've identified the issue is related to our database layer. Investigation ongoing."
7. Internal vs External Communication
Internal Communication (Engineering)
code
Audience: Engineers, technical teams

Style:
✓ Highly technical
✓ Real-time updates
✓ Detailed logs, metrics, traces
✓ Hypotheses and debugging steps
✓ Raw, unfiltered

Channel: Slack incident channel

Example:
"Error rate spiked to 95% at 10:13 UTC. Logs show:

[ERROR] Connection pool exhausted (50/50 connections in use)
[ERROR] Timeout waiting for connection

Hypothesis: Connection leak in v2.5.0 (deployed at 10:00 UTC)
Action: Rolling back to v2.4.9
ETA: 5 minutes"
External Communication (Customers)
code
Audience: End users, non-technical

Style:
✓ Non-technical language
✓ Clear and concise
✓ Empathetic tone
✓ Focus on impact and resolution
✓ Polished and professional

Channel: Status page, email

Example:
"We're experiencing an issue preventing access to our service. Our team is working on a fix and we expect to have service restored within 15 minutes. We apologize for the disruption."
Translation Example
code
Internal (Technical):
"PostgreSQL connection pool exhausted. Max connections: 50. Current: 50. Long-running queries holding connections. Killing idle connections and restarting app servers."

External (Customer-Facing):
"We're experiencing database connection issues that are preventing access to the service. Our team is actively working on a fix."
8. Executive Communication (C-Suite Updates)
Executive Summary Format
markdown
## Executive Incident Summary

**Incident**: API Service Outage
**Severity**: SEV0
**Status**: Resolved
**Duration**: 47 minutes

### Business Impact
- Users affected: 50,000 (100%)
- Revenue loss: ~$50,000
- SLA breach: Yes (99.9% uptime)
- Customer complaints: 237 support tickets
- Enterprise customers affected: 12

### Root Cause
Database connection pool exhausted due to bug in v2.5.0 deployment.

### Resolution
Rolled back to v2.4.9. Service fully restored at 11:00 UTC.

### Customer Communication
- Status page: Updated every 15 minutes
- Email: Sent to all users
- Enterprise customers: Notified directly
- Social media: Posted updates on Twitter

### Next Steps
1. Postmortem scheduled for tomorrow 10:00 AM
2. Investigating connection leak in v2.5.0
3. Reviewing deployment process
4. Considering SLA credits for enterprise customers

### Recommendations
- Approve SLA credits for affected enterprise customers (~$10k)
- Public postmortem blog post (builds trust)
- Additional investment in testing infrastructure
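The "SLA breach: Yes (99.9% uptime)" line can be checked with simple arithmetic before it goes to executives. The sketch below assumes a 30-day measurement window and no other downtime in that window; neither is stated in the summary, and real SLA terms may define the window differently. `downtimeBudgetMinutes` is a hypothetical helper.

```typescript
// Minutes of downtime permitted by an uptime target over a measurement window.
function downtimeBudgetMinutes(uptimeTarget: number, windowDays: number): number {
  return (1 - uptimeTarget) * windowDays * 24 * 60;
}

// Assumption: 99.9% measured over 30 days, which allows about 43.2 minutes.
const budgetMinutes = downtimeBudgetMinutes(0.999, 30);
const outageMinutes = 47;
const slaBreached = outageMinutes > budgetMinutes;
```

Under these assumptions the 47-minute outage exceeds the roughly 43.2-minute budget, which is consistent with the breach reported in the summary.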
Executive Update Frequency
code
SEV0:
- Initial: Immediate (within 5 minutes)
- Updates: Every 30 minutes
- Resolution: Immediate
- Postmortem: Within 24 hours

SEV1:
- Initial: Within 30 minutes
- Updates: Every 60 minutes (if prolonged)
- Resolution: When resolved
- Postmortem: Within 48 hours

SEV2:
- Initial: If prolonged (> 4 hours)
- Updates: Daily
- Resolution: When resolved
- Postmortem: Optional
9. Communication Ownership (IC vs Comms Team)
Incident Commander (IC) Responsibilities
code
IC Owns:
✓ Technical incident response
✓ Internal technical communication
✓ Initial status page update
✓ Engineering team coordination

IC Does NOT Own:
✗ Customer emails (unless small company)
✗ Social media posts
✗ PR statements
✗ Executive communication (beyond updates)
Communications Team Responsibilities
code
Comms Team Owns:
✓ Customer-facing messaging
✓ Social media posts
✓ Email campaigns
✓ PR statements
✓ Media inquiries
✓ Executive communication drafts

Comms Team Does NOT Own:
✗ Technical details
✗ Root cause analysis
✗ Resolution timeline
Collaboration Model
code
1. IC provides technical updates
   ↓
2. Comms team translates to customer-facing language
   ↓
3. IC reviews for accuracy
   ↓
4. Comms team publishes

Example:
IC: "Connection pool exhausted, rolling back deployment"
Comms: "We're experiencing database issues and are implementing a fix"
IC: ✓ Approved
Comms: Publishes to status page
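The four-step handoff above can be modeled as a small state machine so that nothing is published without IC review. This is a sketch, not a description of any particular tool; `Draft`, `draftMessage`, `approve`, and `publish` are hypothetical names.

```typescript
type DraftState = 'drafted' | 'approved' | 'published';

interface Draft {
  technicalNote: string; // step 1: what the IC reported
  customerText: string;  // step 2: the comms team's translation
  state: DraftState;
}

// Steps 1-2: comms drafts customer-facing text from the IC's technical update.
function draftMessage(technicalNote: string, customerText: string): Draft {
  return { technicalNote, customerText, state: 'drafted' };
}

// Step 3: the IC confirms the customer text is technically accurate.
function approve(d: Draft): Draft {
  if (d.state !== 'drafted') throw new Error(`cannot approve a draft in state "${d.state}"`);
  return { ...d, state: 'approved' };
}

// Step 4: comms publishes, which is only allowed after IC approval.
function publish(d: Draft): Draft {
  if (d.state !== 'approved') throw new Error('IC review is required before publishing');
  return { ...d, state: 'published' };
}

const draft = draftMessage(
  'Connection pool exhausted, rolling back deployment',
  "We're experiencing database issues and are implementing a fix"
);
const published = publish(approve(draft));
```

Calling `publish(draft)` directly throws, which is the point: the review step cannot be skipped under time pressure.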
10. Status Page Best Practices
Status Page Structure
code
Components:
- API
- Web Application
- Mobile App
- Database
- Authentication
- Payments

Status Levels:
- Operational (green)
- Degraded Performance (yellow)
- Partial Outage (orange)
- Major Outage (red)
- Under Maintenance (blue)
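When a page tracks several components, the headline status is usually the worst status among them. The sketch below shows one way to compute it; the snake_case status identifiers, `statusRank`, and `overallStatus` are hypothetical, and ranking maintenance just above operational is an assumption rather than a rule from any status page product.

```typescript
type Status =
  | 'operational'
  | 'under_maintenance'
  | 'degraded_performance'
  | 'partial_outage'
  | 'major_outage';

// Higher rank means worse. Placing maintenance just above operational is an assumption.
const statusRank: Record<Status, number> = {
  operational: 0,
  under_maintenance: 1,
  degraded_performance: 2,
  partial_outage: 3,
  major_outage: 4
};

// Returns the worst status across all components ('operational' if there are none).
function overallStatus(components: Record<string, Status>): Status {
  let worst: Status = 'operational';
  for (const name of Object.keys(components)) {
    const status = components[name];
    if (statusRank[status] > statusRank[worst]) {
      worst = status;
    }
  }
  return worst;
}

const pageStatus = overallStatus({
  'API': 'major_outage',
  'Web Application': 'degraded_performance',
  'Payments': 'operational'
});
```

In this example `pageStatus` is `'major_outage'`, so the page headline reflects the API outage even though Payments is healthy.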
Update Cadence
code
Initial Update:
- Within 5-15 minutes of incident
- Acknowledge the issue
- State you're investigating

Progress Updates:
- Every 15-30 minutes (SEV0/1)
- Every 1-2 hours (SEV2)
- Include what you've learned
- Provide ETA if known

Resolution Update:
- Mark as resolved
- Explain what happened
- Apologize
- Link to postmortem (when available)
Status Page Examples
Good Example:
markdown
## Investigating - API Service

**Jan 15, 10:15 UTC**
We are investigating reports of errors when accessing our API service. Users may experience failed requests or timeouts. Our team is actively investigating.

**Jan 15, 10:45 UTC**
We have identified the issue as a database connection problem and are implementing a fix. We expect to have service restored within 15 minutes.

**Jan 15, 11:00 UTC**
The issue has been resolved. All services are operating normally. We apologize for the disruption and will publish a detailed postmortem within 48 hours.
Bad Example:
markdown
## Issue

**Jan 15, 10:15 UTC**
We're having some problems.

**Jan 15, 11:30 UTC**
It's fixed.
11. Post-Incident Communication
Resolution Announcement
markdown
## Service Restored - API Outage Resolved

**Duration**: 47 minutes (10:13 UTC - 11:00 UTC)
**Impact**: All users

The issue affecting our API service has been fully resolved. All services are now operating normally.

**What Happened**: A bug in our recent deployment caused our database connection pool to become exhausted, preventing the application from accessing data.

**How We Fixed It**: We rolled back to the previous version of our application, which immediately resolved the issue.

**What We're Doing Next**:
- Conducting a thorough investigation
- Implementing additional testing to prevent similar issues
- Publishing a detailed postmortem within 48 hours

We sincerely apologize for this disruption and appreciate your patience.
Postmortem Summary (Customer-Facing)
markdown
## Postmortem: API Service Outage - January 15, 2024

**Summary**: On January 15, 2024, our API service experienced a complete outage for 47 minutes due to a database connection leak in a recent deployment.

**Impact**:
- Duration: 47 minutes
- Users affected: 50,000 (100%)
- Services affected: Web app, mobile app, API

**What Happened**: We deployed version 2.5.0 at 10:00 UTC, which included a new feature for order recommendations. This feature had a bug that caused database connections to not be properly released when errors occurred. Over 13 minutes, all 50 available connections were exhausted, causing the service to become unavailable.

**How We Responded**:
- 10:13 UTC: Issue detected via monitoring
- 10:15 UTC: Team began investigation
- 10:30 UTC: Root cause identified
- 10:35 UTC: Decision made to rollback
- 10:40 UTC: Rollback initiated
- 11:00 UTC: Service fully restored

**Root Cause**: The new code failed to release database connections in error scenarios, leading to connection pool exhaustion.

**What We're Doing to Prevent This**:
1. Added connection pool monitoring and alerting
2. Implemented circuit breakers for downstream services
3. Enhanced load testing to include error scenarios
4. Updated code review checklist to catch resource leaks
5. Implementing canary deployments (5% → 50% → 100%)

**Lessons Learned**:
- Always test error paths, not just happy paths
- Monitor resource usage (connection pools, memory, file handles)
- Gradual rollouts need proper limits to prevent widespread impact

We're committed to learning from this incident and improving our systems. Thank you for your patience and continued trust.

Full technical postmortem: [link]
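For the internal, technical version of a postmortem like this one, a short code sample can make the root cause concrete. The sketch below illustrates the general leak pattern described (connections not released on error paths) and the usual fix of releasing in a `finally` block. It is not the code from the sample incident: `FakePool` is a stand-in so the example runs without a database driver, and `withConnection` is a hypothetical helper.

```typescript
// Stand-in pool so the sketch runs without a real database driver.
class FakePool {
  private inUse = 0;
  constructor(private readonly max: number) {}

  acquire(): { id: number } {
    if (this.inUse >= this.max) throw new Error('Connection pool exhausted');
    this.inUse += 1;
    return { id: this.inUse };
  }

  release(): void {
    this.inUse -= 1;
  }

  activeCount(): number {
    return this.inUse;
  }
}

// Runs `work` with a connection and always returns it to the pool,
// including when `work` throws.
function withConnection<T>(pool: FakePool, work: (conn: { id: number }) => T): T {
  const conn = pool.acquire();
  try {
    return work(conn);
  } finally {
    pool.release();
  }
}

// 200 failing queries against a 50-connection pool: nothing leaks.
const pool = new FakePool(50);
let failures = 0;
for (let i = 0; i < 200; i++) {
  try {
    withConnection(pool, () => {
      throw new Error('query failed');
    });
  } catch (err) {
    failures += 1; // the query error still propagates; only the connection is cleaned up
  }
}
```

Without the `finally` block, each failed call would hold its connection and the pool would be exhausted after 50 errors, which matches the failure mode in the postmortem.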
12. Communication Antipatterns
Antipattern 1: Radio Silence
code
❌ Bad:
- Incident starts at 10:00
- No communication until 12:00
- Users panicking on social media

✓ Good:
- Incident starts at 10:00
- Status page updated at 10:05
- Updates every 30 minutes
- Users informed and patient
Antipattern 2: Over-Promising Resolution Time
code
❌ Bad: "We'll have this fixed in 10 minutes."
(2 hours later, still broken)

✓ Good: "We're working on a fix. We'll provide an update in 30 minutes."
(Under-promise, over-deliver)
Antipattern 3: Blaming Users
code
❌ Bad: "The issue only affects users who didn't follow our documentation."

✓ Good: "We should have made this clearer in our documentation. We're updating it now."
Antipattern 4: Technical Jargon Overload
code
❌ Bad: "The Kubernetes pod in the us-east-1 AZ experienced OOM errors due to a memory leak in the JVM heap, causing the pod to enter CrashLoopBackOff state."

✓ Good: "Our application servers ran out of memory and restarted. We're investigating the cause and scaling up resources."
Antipattern 5: Declaring Victory Too Early
code
❌ Bad:
10:30 UTC: "Issue resolved!"
10:45 UTC: "Issue has returned..."

✓ Good:
10:30 UTC: "Fix implemented, monitoring for stability"
11:00 UTC: "Confirmed stable for 30 minutes, marking as resolved"
13. Real Communication Examples (Good and Bad)
Good Example: GitLab (2017)
code
GitLab's response to database deletion incident:
✓ Immediate acknowledgement on Twitter
✓ Live-streamed recovery process on YouTube
✓ Transparent about what went wrong
✓ Detailed postmortem published
✓ Honest about backup failures
✓ Community appreciated transparency

Result: Increased trust despite major incident
Bad Example: Equifax (2017)
code
Equifax's response to data breach:
❌ Delayed disclosure (6 weeks)
❌ Vague initial statement
❌ Confusing communication
❌ Executives sold stock before disclosure
❌ Poor customer support

Result: Congressional hearings, massive fines, CEO resigned
Good Example: AWS S3 (2017)
code
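AWS's response to S3 outage:
✓ Acknowledged the outage publicly, using Twitter while its status dashboard could not be updated (the dashboard itself depended on S3)
✓ Posted updates as recovery progressed
✓ Detailed postmortem published
✓ Explained root cause clearly
✓ Outlined prevention measures

Result: Widely cited postmortem, trust maintained

Lesson: Host your status page independently of the systems it reports on.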
14. Tools: Statuspage, Incident.io, Slack Workflows
Statuspage.io
typescript
// Statuspage API integration.
// NOTE: 'statuspage-api' and the client methods below are an illustrative wrapper,
// not an official SDK. Confirm endpoint and field names in the Statuspage REST API docs.
import { StatuspageAPI } from 'statuspage-api';

const statuspage = new StatuspageAPI(process.env.STATUSPAGE_API_KEY);

// Minimal incident shape used by the helpers below
interface Incident {
  title: string;
  description: string;
  affectedComponentIds: string[]; // e.g. ['api', 'web-app']
}

// Create incident
async function createIncident(incident: Incident) {
  await statuspage.incidents.create({
    name: incident.title,
    status: 'investigating', // investigating, identified, monitoring, resolved
    impact: 'major',         // none, minor, major, critical
    body: incident.description,
    component_ids: incident.affectedComponentIds
  });
}

// Update incident
async function updateIncident(incidentId: string, update: string) {
  await statuspage.incidents.update(incidentId, {
    status: 'identified',
    body: update
  });
}

// Resolve incident (only after the fix is confirmed stable)
async function resolveIncident(incidentId: string, resolution: string) {
  await statuspage.incidents.update(incidentId, {
    status: 'resolved',
    body: resolution
  });
}
Incident.io
typescript
// Incident.io API integration.
// NOTE: 'incident-io' and the client methods below are an illustrative wrapper,
// not an official SDK. Confirm endpoints and fields in the incident.io API docs.
import { IncidentIO } from 'incident-io';

const incidentio = new IncidentIO(process.env.INCIDENT_IO_API_KEY);

// Create incident
const incident = await incidentio.incidents.create({
  title: 'API Service Outage',
  severity: 'sev1',
  status: 'investigating',
  summary: 'API returning 503 errors for all requests'
});

// Post update
await incidentio.incidents.postUpdate(incident.id, {
  message: 'Identified root cause: database connection pool exhausted. Rolling back deployment.',
  status: 'identified'
});

// Resolve (only once monitoring confirms the service is stable)
await incidentio.incidents.resolve(incident.id, {
  message: 'Service restored and confirmed stable for 30 minutes.',
  resolution: 'Rolled back to v2.4.9'
});
Slack Workflows
typescript
// Automated Slack notifications
import { WebClient } from '@slack/web-api';

// SLACK_BOT_TOKEN is an assumed environment variable name
const slack = new WebClient(process.env.SLACK_BOT_TOKEN);

// Fields this workflow reads from an incident
interface Incident {
  severity: string; // e.g. 'SEV0'
  title: string;
  impact: string;
  status: string;
  warRoomUrl: string;
  dashboardUrl: string;
  customerMessage: string;
}

async function notifyStakeholders(incident: Incident) {
  // Engineering team
  await slack.chat.postMessage({
    channel: '#incidents',
    text: `🚨 ${incident.severity}: ${incident.title}`,
    blocks: [
      {
        type: 'header',
        text: { type: 'plain_text', text: `${incident.severity}: ${incident.title}` }
      },
      {
        type: 'section',
        fields: [
          { type: 'mrkdwn', text: `*Impact:* ${incident.impact}` },
          { type: 'mrkdwn', text: `*Status:* ${incident.status}` }
        ]
      },
      {
        type: 'actions',
        elements: [
          { type: 'button', text: { type: 'plain_text', text: 'Join War Room' }, url: incident.warRoomUrl },
          { type: 'button', text: { type: 'plain_text', text: 'View Dashboard' }, url: incident.dashboardUrl }
        ]
      }
    ]
  });

  // Executive team (SEV0 only)
  if (incident.severity === 'SEV0') {
    await slack.chat.postMessage({
      channel: '#executive-alerts',
      text: `🚨 SEV0 Incident: ${incident.title}\nImpact: ${incident.impact}\nWar Room: ${incident.warRoomUrl}`
    });
  }

  // Customer support
  await slack.chat.postMessage({
    channel: '#customer-support',
    text: `📢 Customer Impact Alert\n\n*Issue:* ${incident.title}\n*Impact:* ${incident.impact}\n*Status:* ${incident.status}\n\n*What to tell customers:* ${incident.customerMessage}`
  });
}
Summary
Key takeaways for Stakeholder Communication:
- Communicate early - Within 5-15 minutes for SEV0/1
- Update frequently - Every 15-30 minutes, even if no progress
- Use appropriate channels - Status page, email, Slack, social media
- Tailor messaging - Technical for engineers, plain language for customers
- Be honest and transparent - Don't hide or minimize issues
- Avoid jargon - Use clear, simple language
- Show empathy - Acknowledge impact on users
- Don't over-promise - Under-promise, over-deliver on ETAs
- Declare resolved carefully - Only when truly stable
- Follow up with postmortem - Share learnings and prevention measures
Related Skills
- 41-incident-management/incident-triage - Initial assessment before communication
- 41-incident-management/severity-levels - Severity determines communication urgency
- 41-incident-management/escalation-paths - Who to notify and when
- 40-system-resilience/postmortem-analysis - Post-incident communication and learning