AgentSkillsCN

azure-diagnostics

在 Azure 上调试并排查生产环境中的问题。涵盖容器应用诊断、使用 KQL 进行日志分析、健康检查,以及解决镜像拉取、冷启动与健康探针等常见问题。 适用场景:调试生产问题、排查容器应用、使用 KQL 分析日志、修复镜像拉取失败、解决冷启动问题、调查健康探针故障、检查资源健康状况、查看应用日志、定位错误根源。 切勿用于:部署应用(使用 Azure Deploy)、创建新资源(使用 Azure Prepare)、设置监控(使用 Azure Observability)、成本优化(使用 Azure Cost Optimization)。

SKILL.md
--- frontmatter
name: azure-diagnostics
description: |
  Debug and troubleshoot production issues on Azure. Covers Container Apps diagnostics, log analysis with KQL, health checks, and common issue resolution for image pulls, cold starts, and health probes.
  USE FOR: debug production issues, troubleshoot container apps, analyze logs with KQL, fix image pull failures, resolve cold start issues, investigate health probe failures, check resource health, view application logs, find root cause of errors
  DO NOT USE FOR: deploying applications (use azure-deploy), creating new resources (use azure-prepare), setting up monitoring (use azure-observability), cost optimization (use azure-cost-optimization)

Azure Diagnostics

AUTHORITATIVE GUIDANCE — MANDATORY COMPLIANCE

This document is the official source for debugging and troubleshooting Azure production issues. Follow these instructions to diagnose and resolve common Azure service problems systematically.

Triggers

Activate this skill when user wants to:

  • Debug or troubleshoot production issues
  • Diagnose errors in Azure services
  • Analyze application logs or metrics
  • Fix image pull, cold start, or health probe issues
  • Investigate why Azure resources are failing
  • Find root cause of application errors

Rules

  1. Start with systematic diagnosis flow
  2. Use AppLens (MCP) for AI-powered diagnostics when available
  3. Check resource health before deep-diving into logs
  4. Select appropriate troubleshooting guide based on service type
  5. Document findings and attempted remediation steps

Quick Diagnosis Flow

  1. Identify symptoms - What's failing?
  2. Check resource health - Is Azure healthy?
  3. Review logs - What do logs show?
  4. Analyze metrics - Performance patterns?
  5. Investigate recent changes - What changed?

Troubleshooting Guides by Service

ServiceCommon IssuesReference
Container AppsImage pull failures, cold starts, health probes, port mismatchescontainer-apps/

Quick Reference

Common Diagnostic Commands

bash
# Check resource health
az resource show --ids RESOURCE_ID

# View activity log
az monitor activity-log list -g RG --max-events 20

# Container Apps logs
az containerapp logs show --name APP -g RG --follow

AppLens (MCP Tools)

For AI-powered diagnostics, use:

code
mcp_azure_mcp_applens
  intent: "diagnose issues with <resource-name>"
  command: "diagnose"
  parameters:
    resourceId: "<resource-id>"

Provides:
- Automated issue detection
- Root cause analysis
- Remediation recommendations

Azure Monitor (MCP Tools)

For querying logs and metrics:

code
mcp_azure_mcp_monitor
  intent: "query logs for <resource-name>"
  command: "logs_query"
  parameters:
    workspaceId: "<workspace-id>"
    query: "<KQL-query>"

See kql-queries.md for common diagnostic queries.


Check Azure Resource Health

Using MCP

code
mcp_azure_mcp_resourcehealth
  intent: "check health status of <resource-name>"
  command: "get"
  parameters:
    resourceId: "<resource-id>"

Using CLI

bash
# Check specific resource health
az resource show --ids RESOURCE_ID

# Check recent activity
az monitor activity-log list -g RG --max-events 20

References