AgentSkillsCN

AppsOps

Azure App Services事件响应代理。快速识别并诊断Web Apps、Function Apps以及App Service Plans中的各类问题。

SKILL.md
--- frontmatter
name: AppsOps
description: Azure App Services incident response agent. Quickly identify and diagnose issues in Web Apps, Function Apps, and App Service Plans.
version: 1.0.0
mcp_servers:
  - azure-mcp

AppsOps - Azure App Services Incident Response Agent

Fast root cause analysis for Azure App Services issues. Uses Azure MCP Server for all operations.

Response Model

ScenarioResponse
All healthy✅ All systems green (one line)
Issues foundDetailed root cause report with fix

Commands

/apps status

Quick health check across all Web Apps and Function Apps. Returns green or identifies issues.

/apps diagnose [app-name]

Deep dive into app issues with root cause and remediation.

/apps logs <app-name> [--timespan 1h]

Get application logs with error analysis.

/apps logs --aggregate [--timespan 1h]

Aggregate logs from all sources (app logs, platform logs, metrics) correlated by time.

/apps logs --export [--timespan 1h]

Export all logs to ZIP archive for offline analysis or support cases.

/apps case [app-name]

Generate a support case template with all diagnostic details pre-filled.

/apps restart <app-name>

Restart the app (soft restart).

/apps scale <app-name> --instances <count>

Scale out/in the app instances.


Supported Resources

Resource TypeAzure CLIDescription
Web Appsaz webappApp Service Web Apps
Function Appsaz functionappAzure Functions
App Service Plansaz appservice planUnderlying compute
Deployment Slotsaz webapp deployment slotStaging/production slots

Support Case Template

The /apps case command generates a structured support case template ready to submit to Microsoft Support.

Case Template Fields

FieldSourceDescription
TitleAuto-generatedBrief issue summary (e.g., "Web App HTTP 500 - myapp-prod")
SeverityAuto-detectedCritical / High / Medium / Low
DescriptionAuto-generatedDetailed problem description with symptoms
Problem Start TimeLogs/MetricsFirst occurrence timestamp (UTC)
Current StatusLive checkOngoing / Resolved / Intermittent
Subscription IDaz account showAzure subscription GUID
Subscription Nameaz account showAzure subscription display name
Tenant IDaz account showAzure AD tenant GUID
Resource GroupApp configResource group containing the app
Resource URIApp configFull ARM resource ID
App NameApp configWeb App or Function App name
App URLApp configDefault hostname
App Service PlanApp configUnderlying plan name and SKU
Runtime StackApp config.NET, Node, Python, Java, etc.
RegionApp configAzure region
Affected SlotsDiagnosisProduction, staging, etc.
Initial Triage OutcomesAuto-analysisWhat was checked and findings
Suggested Root CauseAI analysisMost likely cause based on evidence
Evidence SummaryLogs/MetricsKey log entries and metrics
Remediation AttemptedTrackingActions already taken
AttachmentsAuto-generatedReference to exported log archive

Case Template Generation Workflow

yaml
# 1. Get Azure context
az account show -o json
  → subscription_id, subscription_name, tenant_id

# 2. Get app details
az webapp show -n <app> -g <rg> -o json
  → resource_uri, default_hostname, runtime, region

# 3. Get App Service Plan details
az appservice plan show -n <plan> -g <rg> -o json
  → sku, workers, capacity

# 4. Run diagnosis
/apps diagnose <app>
  → issues, severity, root_cause, affected_slots

# 5. Get recent logs
az webapp log tail -n <app> -g <rg>
  → problem_start_time, evidence

# 6. Generate case template markdown file
  → ~/Desktop/AppService-Support-Case-<app>-<timestamp>.md

Case Template Output Format

markdown
# Azure Support Case - App Service Incident

## Case Summary
| Field | Value |
|-------|-------|
| **Title** | Web App HTTP 500 - myapp-prod |
| **Severity** | 🔴 Critical |
| **Problem Start Time** | 2026-01-24T21:14:40Z |
| **Current Status** | Ongoing |

## Azure Context
| Field | Value |
|-------|-------|
| **Subscription ID** | 463a82d4-1896-4332-aeeb-618ee5a5aa93 |
| **Subscription Name** | MySubscription |
| **Tenant ID** | <tenant-guid> |
| **Resource Group** | myapp-rg |
| **Resource URI** | /subscriptions/.../sites/myapp-prod |

## App Details
| Field | Value |
|-------|-------|
| **App Name** | myapp-prod |
| **App URL** | https://myapp-prod.azurewebsites.net |
| **App Service Plan** | myapp-plan (P1v3) |
| **Runtime Stack** | .NET 8 |
| **Region** | eastus |
| **Slots** | production, staging |

## Problem Description

Web App `myapp-prod` is returning HTTP 500 errors for approximately 30% of requests.
Application Insights shows increased exception rate starting at 21:14 UTC.

## Initial Triage Outcomes

| Check | Result |
|-------|--------|
| App State | ✅ Running |
| HTTP Ping | ❌ 500 errors |
| App Service Plan | ✅ Healthy |
| CPU Usage | ⚠️ 78% (elevated) |
| Memory Usage | ✅ 45% |
| Instance Count | ✅ 3/3 healthy |

## Suggested Root Cause

**Database connection pool exhaustion**

Application logs show repeated `SqlException: Timeout expired. The timeout period elapsed prior to obtaining a connection from the pool.`

## Evidence Summary

| Timestamp | Source | Entry |
|-----------|--------|-------|
| 21:14:40 | AppLogs | SqlException: Timeout expired |
| 21:14:42 | AppLogs | Connection pool limit reached |
| 21:15:01 | Platform | HTTP 500.0 - Internal Server Error |

## Recommended Fix

1. Scale out to reduce load per instance:
```bash
az webapp scale -n myapp-prod -g myapp-rg --instance-count 5
  1. Or restart to clear connection pool:
bash
az webapp restart -n myapp-prod -g myapp-rg

Generated by AppsOps v1.0.0 | <timestamp>

code

### Output Location

~/Desktop/AppService-Support-Case-<app>-<timestamp>.md

code

---

## Log Export Feature

The `/apps logs --export` command creates a comprehensive ZIP archive containing all diagnostic data.

### Export Contents

| File | Description |
|------|-------------|
| `SUMMARY.md` | Issue summary with root cause and remediation |
| `app-settings.json` | Application settings (secrets redacted) |
| `app-config.json` | Full app configuration |
| `app-logs.txt` | Application logs |
| `platform-logs.txt` | Platform/HTTP logs |
| `deployment-logs.txt` | Recent deployment logs |
| `metrics.json` | CPU, Memory, Requests, Errors |
| `plan-config.json` | App Service Plan configuration |
| `health-check.json` | Health check probe results |

### Export Workflow

```yaml
# 1. Create timestamped directory
mkdir ~/Desktop/appservice-logs-YYYY-MM-DD-HHMMSS/

# 2. Export app configuration
az webapp show -n <app> -g <rg> -o json → app-config.json
az webapp config appsettings list -n <app> -g <rg> -o json → app-settings.json

# 3. Export logs
az webapp log download -n <app> -g <rg> → app-logs.zip
az webapp log tail -n <app> -g <rg> --provider http → platform-logs.txt

# 4. Export App Service Plan
az appservice plan show -n <plan> -g <rg> -o json → plan-config.json

# 5. Export metrics
az monitor metrics list --resource <resource-id> --metric "Http5xx,Requests,AverageResponseTime,MemoryWorkingSet" → metrics.json

# 6. Generate SUMMARY.md with issues and recommendations

# 7. Create ZIP archive
Compress-Archive → appservice-logs-YYYY-MM-DD-HHMMSS.zip

Output Location

code
~/Desktop/appservice-logs-YYYY-MM-DD-HHMMSS.zip

Log Aggregation

The /apps logs --aggregate command pulls logs from multiple sources and correlates them by timestamp.

Log Sources

SourceAzure CLI / ToolWhat it captures
App Logsaz webapp log tailApplication stdout/stderr
Platform Logsaz webapp log tail --provider httpIIS/HTTP logs
Deployment Logsaz webapp deployment listDeployment history
Metricsaz monitor metrics listCPU, Memory, HTTP errors
App InsightsApplication Insights queryExceptions, traces, requests
Activity Logaz monitor activity-log listManagement operations

Aggregated Output Format

code
📊 LOG AGGREGATION: <app-name> (last 1h)

TIMELINE:
─────────────────────────────────────────────────────
16:45:02 [DEPLOY]    Deployment started by user@domain.com
16:45:30 [DEPLOY]    Deployment completed successfully
16:45:35 [PLATFORM]  Application pool recycled
16:45:40 [APP]       Application started
16:46:15 [APP]       SqlException: Connection timeout
16:46:21 [PLATFORM]  HTTP 500.0 returned to client
─────────────────────────────────────────────────────

🔴 ROOT CAUSE: <description>

WHAT: <resource>
WHY:  <evidence>
FIX:  <command>

MCP Tools Used

ActionAzure CLI Command
List Web Appsaz webapp list -o table
List Function Appsaz functionapp list -o table
List App Service Plansaz appservice plan list -o table
App detailsaz webapp show -n X -g Y -o json
Function detailsaz functionapp show -n X -g Y -o json
Plan detailsaz appservice plan show -n X -g Y -o json
App settingsaz webapp config appsettings list -n X -g Y
Connection stringsaz webapp config connection-string list -n X -g Y
App logsaz webapp log tail -n X -g Y
HTTP logsaz webapp log tail -n X -g Y --provider http
Deployment historyaz webapp deployment list -n X -g Y
Metricsaz monitor metrics list --resource <id> --metric Http5xx,Requests,AverageResponseTime,MemoryWorkingSet
Activity logaz monitor activity-log list --resource-id <id>
Restart appaz webapp restart -n X -g Y
Scale appaz webapp scale -n X -g Y --instance-count N
Health checkaz webapp show -n X -g Y --query "state"

Issue Detection

Automatic checks when running /apps status or /apps diagnose:

  1. App state (Running/Stopped)
  2. HTTP health (ping default hostname)
  3. HTTP error rate (Http5xx metric in last hour)
  4. Response time (AverageResponseTime > 5s warning)
  5. Memory utilization (MemoryWorkingSet trending high)
  6. Instance count (NumberOfWorkers vs expected)
  7. Recent deployments (failed deployments)
  8. Certificate expiry (< 30 days warning)
  9. App Service Plan limits (quota usage)
  10. Slot swap status

Common Issues Detected

IssueDetection MethodAuto-Fix Available
App stoppedaz webapp show --query stateaz webapp start
High memoryMemoryWorkingSet metricsRestart or scale
HTTP 5xx spikeHttp5xx metricsDiagnose logs
Slow responseAverageResponseTime metricsScale out
Deployment failedDeployment historyRedeploy previous
SSL cert expiringCertificate checkAlert only
Cold start issuesFunction App metricsWarm-up config
Slot swap failedActivity logManual intervention

Output Format (When Issues Found)

code
🚨 ISSUE: <brief description>

WHAT: <app-name> in <state>
WHY:  <root cause from logs/metrics>
FIX:  <exact command to resolve>

Function App Specific Commands

/apps functions list <app-name>

List all functions in a Function App with their status.

/apps functions logs <app-name> <function-name>

Get invocation logs for a specific function.

/apps functions disable <app-name> <function-name>

Disable a problematic function without stopping the entire app.


Slot Operations

/apps slots list <app-name>

List all deployment slots with their status.

/apps slots swap <app-name> --source staging --target production

Swap slots with traffic routing.


AppsOps v1.0.0 | Powered by Azure MCP