AgentSkillsCN

azure-deployment-operations

针对 Azure 静态 Web 应用、容器应用、应用服务以及基础架构,制定全面的生产部署方案。

SKILL.md
--- frontmatter
name: "azure-deployment-operations"
description: "Production deployment patterns for Azure Static Web Apps, Container Apps, App Service, and infrastructure"
metadata:
  inheritance: inheritable

Azure Deployment Operations

Battle-tested patterns for deploying and operating Azure services in production.

Scope: Inheritable skill. Covers SWA, Container Apps, App Service, security posture, rate limiting, production checklists, and multi-subscription management.

Azure Static Web Apps (SWA)

Environment Deployment

SWA defaults to preview environments. Always specify production explicitly:

bash
# Anti-pattern: deploys to preview environment
swa deploy

# Correct: explicitly target production
swa deploy --env production

Rule: Every SWA deployment command in CI/CD and scripts MUST include --env production unless creating a preview intentionally.

Custom Domain Registration

Custom domains require a two-step process:

StepActionValidates
1. CNAME recordPoint domain to SWA default hostnameDNS ownership
2. Hostname registrationaz staticwebapp hostname setAzure binding

Common error: Adding CNAME but forgetting hostname registration. Both steps are required.

SWA Configuration

json
{
  "navigationFallback": {
    "rewrite": "/index.html",
    "exclude": ["/api/*", "/assets/*"]
  },
  "routes": [
    { "route": "/api/*", "allowedRoles": ["authenticated"] }
  ],
  "globalHeaders": {
    "X-Content-Type-Options": "nosniff",
    "X-Frame-Options": "DENY"
  }
}

Azure Container Apps

Health Probe Configuration

Container Apps health probes have specific threshold ranges:

ParameterMinMaxRecommended
failureThreshold1483-5 for liveness, 10-30 for startup
periodSeconds124010 for liveness, 5 for startup
initialDelaySeconds0600 for liveness (use startup probe instead)
timeoutSeconds12405

Pattern: Use a startup probe with high failure threshold (30) for slow-starting apps instead of a high initialDelaySeconds on the liveness probe.

Container Apps Deployment

bash
# Update with new image
az containerapp update \
  --name myapp \
  --resource-group myrg \
  --image myregistry.azurecr.io/myapp:v1.2.3

# Scale configuration
az containerapp update \
  --name myapp \
  --resource-group myrg \
  --min-replicas 1 \
  --max-replicas 10

Azure App Service

11-Step Deployment Pipeline

Typical App Service deployment takes ~7 minutes:

StepDurationAction
15sAuthenticate to Azure
210sValidate resource group exists
330sBuild application
415sRun tests
520sPackage artifacts
610sUpload to staging slot
760sWarm up staging slot
85sRun smoke tests on staging
930sSwap staging → production
1010sValidate production health
115sTag release in source control

Rule: Always use staging slots for zero-downtime deployment. Direct-to-production deployments cause cold-start downtime.

Production Readiness Checklist

CategoryRequirementWhy
ComputeP1v3 or higherBurstable tiers have CPU throttling
NetworkingVNet integrationIsolate from public internet
DataPrivate endpoints for storage/DBNo public connection strings
IdentityManaged identity (no connection strings)Eliminates secret rotation
MonitoringApplication Insights enabledObservability
ScalingAuto-scale rules configuredHandle load spikes
BackupAutomated backup policyDisaster recovery
SSLCustom domain + managed certificateTrust and security

Security Posture Assessment

Pass/Fail Matrix

Use a matrix to track security controls across all resources:

ControlApp ServiceSQL DBStorageKey Vault
Managed Identity
Private Endpoint
Diagnostic Logs
RBAC (no keys)
Encryption at Rest

Rule: Any ❌ in the matrix is a tracked remediation item with a priority (P0-P3) and SLA.

Rate Limiting

Spread vs. Burst

For API calls and deployment operations:

StrategyPatternUse When
Spread10 calls/sec evenly spacedSustained throughput
Burst100 calls then waitQuick batch operations

Rule: Prefer spread over burst for production workloads. Azure APIs throttle based on request rate, and burst patterns hit throttle limits earlier than spread patterns with the same total throughput.

Retry Pattern

typescript
async function withRetry<T>(
  fn: () => Promise<T>,
  maxRetries = 3,
  baseDelay = 1000
): Promise<T> {
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    try {
      return await fn();
    } catch (err: any) {
      if (attempt === maxRetries) throw err;
      if (err.statusCode === 429) {
        // Use Retry-After header if available
        const delay = err.headers?.['retry-after']
          ? parseInt(err.headers['retry-after']) * 1000
          : baseDelay * Math.pow(2, attempt);
        await new Promise(r => setTimeout(r, delay));
      } else {
        throw err; // Don't retry non-throttle errors
      }
    }
  }
  throw new Error('Unreachable');
}

Multi-Subscription Management

Folder Organization

For organizations with multiple Azure subscriptions:

code
azure/
├── production/
│   ├── main.bicep
│   └── parameters.prod.json
├── staging/
│   ├── main.bicep
│   └── parameters.staging.json
├── development/
│   └── parameters.dev.json
└── shared/
    ├── modules/          # Shared Bicep modules
    └── policies/         # Azure Policy definitions

Resource Inventory

Before any infrastructure changes, document what exists:

bash
# List all resources in subscription
az resource list --subscription "My Subscription" \
  --output table \
  --query "[].{Name:name, Type:type, RG:resourceGroup, Location:location}"

# Export to JSON for diff tracking
az resource list --subscription "My Subscription" -o json > inventory.json

Subscription Documentation Template

Each subscription should have a living document with:

SectionContent
PurposeWhat this subscription is for
OwnerTeam/person responsible
BudgetMonthly spend limit and alerts
ResourcesLink to inventory command output
AccessRBAC assignments and justification
NetworkingVNet topology, peering, DNS zones

Known Gotchas

Mail.Send on Corporate Tenants

Microsoft 365 corporate tenants often block Mail.Send permission for third-party apps. If your app needs to send email:

  1. Check tenant admin consent policies first
  2. Consider SendMail via Graph with delegated (not application) permissions
  3. Have a fallback (SMTP, SendGrid) for blocked tenants
  4. Document the limitation clearly for users

Azure CLI Context

bash
# Always verify which subscription is active
az account show --query "{Name:name, Id:id}" -o table

# Set explicitly before operations
az account set --subscription "Target Subscription"

Rule: Never assume the correct subscription is active. Always verify or set explicitly in scripts.

Infrastructure as Code

Bicep Best Practices for Deployment

PracticeWhy
Use modules for reusable componentsDRY principle
Parameters file per environmentEnvironment isolation
@secure() decorator for secretsPrevents logging
existing keyword for referencesNo accidental recreation
What-if before deployCatch unintended changes
bash
# Always preview changes before deploying
az deployment group what-if \
  --resource-group myrg \
  --template-file main.bicep \
  --parameters @parameters.prod.json