app-platform-troubleshooting

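Debug running App Platform applications by accessing containers, analyzing logs, running diagnostic tools, and applying fixes. Use it when an app fails to deploy, crashes at runtime, has connectivity problems, or needs performance diagnosis.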

SKILL.md
--- frontmatter
name: app-platform-troubleshooting
version: 1.0.0
min_doctl_version: "1.82.0"
description: Debug running App Platform applications by accessing containers, analyzing logs, running diagnostics, and applying fixes. Use when apps fail to deploy, crash at runtime, have connectivity issues, or need performance diagnosis.
related_skills: [deployment, networking, postgres]
deprecated: false

App Platform Troubleshooting Skill

Transform debugging from guessing to rapid diagnosis and fix.

Philosophy

code
Traditional: See error → Guess → Change → Push → Wait 5-7 min → Repeat
With skill:  See error → Diagnose → Fix → Verify → Commit proper fix

Quick Decision

code
Is the app deployed with running containers?
├── YES → Can we access the shell?
│         ├── YES → LIVE MODE (SDK shell access)
│         └── NO  → LOGS-ONLY MODE (fetch logs)
└── NO (build/deploy failed) → LOGS-ONLY MODE
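
The decision tree above can be sketched as a small function. This is only an illustration: `Mode` and `choose_mode` are hypothetical names, not part of any SDK, and the two booleans are assumed to come from your own checks.

```python
from enum import Enum


class Mode(Enum):
    """Troubleshooting modes named in the decision tree (hypothetical type)."""
    LIVE = "live"            # shell access into a running container
    LOGS_ONLY = "logs-only"  # fall back to fetching logs


def choose_mode(has_running_containers: bool, shell_accessible: bool) -> Mode:
    """Live mode needs running containers and a reachable shell; otherwise use logs."""
    if has_running_containers and shell_accessible:
        return Mode.LIVE
    return Mode.LOGS_ONLY
```

A failed build or deploy corresponds to `choose_mode(False, False)`, which selects logs-only mode.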

Mode 1: Live Troubleshooting (Quick Start)

python
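from do_app_sandbox import Sandbox

# Attach to the running "web" component of an existing app
app = Sandbox.get_from_id(app_id="<app-id>", component="web")

# Diagnostics
app.exec("env | grep DATABASE")            # are the database variables set?
app.exec("curl -v localhost:8080/health")  # does the app answer locally?
app.exec("ps aux | head -10")              # is the expected process running?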

Full guide: See live-troubleshooting.md


Mode 2: Logs-Only (Quick Start)

bash
# Runtime logs
doctl apps logs <app_id> <component> --type run

# Build logs
doctl apps logs <app_id> <component> --type build

# Crash logs (output of the previous, restarted container)
doctl apps logs <app_id> <component> --type run_restarted

Full guide: See logs-analysis.md
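
For scripted use, the same commands could be wrapped in Python. This is a minimal sketch: `build_logs_command` and `fetch_logs` are hypothetical helpers, it assumes `doctl` is installed and authenticated, and it covers only the log types shown above.

```python
import subprocess

# Log types taken from the commands above; doctl may accept others.
LOG_TYPES = ("run", "build", "run_restarted")


def build_logs_command(app_id: str, component: str, log_type: str = "run") -> list[str]:
    """Return the argv for `doctl apps logs` without running it."""
    if log_type not in LOG_TYPES:
        raise ValueError(f"unsupported log type: {log_type!r}")
    return ["doctl", "apps", "logs", app_id, component, "--type", log_type]


def fetch_logs(app_id: str, component: str, log_type: str = "run") -> str:
    """Run doctl and return its stdout; raises CalledProcessError on failure."""
    result = subprocess.run(
        build_logs_command(app_id, component, log_type),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

Keeping command construction separate from execution makes the helper easy to check without touching a real app.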


Debug Container (Infrastructure Issues)

The debug container deploys in about 30-45 seconds and lets you separate infrastructure problems from application problems:

yaml
services:
  - name: debug
    image:
      registry_type: GHCR
      registry: ghcr.io
      repository: bikramkgupta/debug-python
      tag: latest
    http_port: 8080
    envs:
      - key: DATABASE_URL
        value: ${db.DATABASE_URL}

bash
# Run validation suite
validate-infra all
validate-infra database
validate-infra kafka

Full guide: See debug-container.md


Quick Reference: Exit Codes

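| Code | Signal  | Meaning                                 |
|------|---------|-----------------------------------------|
| 0    | -       | Clean exit (a service should not exit)  |
| 1    | -       | General error                           |
| 127  | -       | Command not found                       |
| 137  | SIGKILL | OOM killed                              |
| 143  | SIGTERM | Graceful shutdown                       |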
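
Codes above 128 follow the usual shell convention of 128 plus the signal number, so 137 is 128 + 9 (SIGKILL) and 143 is 128 + 15 (SIGTERM). A minimal decoder, assuming a POSIX system, might look like the sketch below; `describe_exit_code` is a hypothetical helper name.

```python
import signal


def describe_exit_code(code: int) -> str:
    """Translate a container exit code into a short description."""
    if code == 0:
        return "clean exit"
    if code == 127:
        return "command not found"
    if code > 128:
        signum = code - 128  # shell convention: 128 + signal number
        try:
            name = signal.Signals(signum).name
        except ValueError:
            return f"terminated by unknown signal {signum}"
        return f"terminated by {name}"
    return f"application error (exit code {code})"
```

Note that SIGKILL only indicates a forced kill; an out-of-memory kill is the most common cause, but confirm it against memory metrics or logs.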

Quick Reference: Common Fixes

| Problem                   | Quick Fix                                    |
|---------------------------|----------------------------------------------|
| App exits immediately     | Check that it listens on $PORT               |
| 502 errors                | Check health endpoint, verify app is running |
| Database connection fails | Use Debug Container, verify trusted sources  |
| Build fails               | Check dependencies, review build logs        |
| OOM kills                 | Upgrade instance size                        |
| Health checks fail        | Bind to 0.0.0.0, not localhost               |
| Slow startup              | Increase initial_delay_seconds               |
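
Two of these fixes, listening on `$PORT` and binding to `0.0.0.0`, can be illustrated with a minimal standard-library server. This is a sketch, not a production setup: `HealthHandler` and `make_server` are hypothetical names, and the `/health` path and 8080 fallback are assumptions borrowed from the examples earlier in this document.

```python
import os
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class HealthHandler(BaseHTTPRequestHandler):
    """Answers 200 on /health and 404 elsewhere."""

    def do_GET(self):
        if self.path == "/health":
            status, body = 200, b"ok"
        else:
            status, body = 404, b"not found"
        self.send_response(status)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # suppress per-request logging in this sketch


def make_server(port=None):
    """Bind to all interfaces on $PORT (assumed fallback: 8080)."""
    if port is None:
        port = int(os.environ.get("PORT", "8080"))
    # 0.0.0.0 accepts traffic from outside the container; localhost would not.
    return ThreadingHTTPServer(("0.0.0.0", port), HealthHandler)
```

Calling `make_server().serve_forever()` starts the server. Binding to `localhost` instead would make the process reachable only from inside the container, which is why health checks fail in that case.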

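Reference Files

  • live-troubleshooting.md: full guide for Mode 1 (live troubleshooting)
  • logs-analysis.md: full guide for Mode 2 (logs-only)
  • debug-container.md: full guide for the debug container
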
When to Escalate

Contact DigitalOcean Support when:

  • An internal error persists after a redeploy
  • You need a resource limit increase
  • Multiple apps are affected (likely a platform issue)
  • VPC or networking issues cannot be diagnosed

Before escalating, gather:

bash
doctl apps get <app_id> -o json > app_info.json
doctl apps logs <app_id> <component> --type run > runtime.log
doctl apps spec get <app_id> > app_spec.yaml
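
The three commands above could also be collected in one step. This is a sketch under the assumption that `doctl` is installed and authenticated; `escalation_commands` and `collect` are hypothetical helper names, and the output filenames match the ones used above.

```python
import subprocess
from pathlib import Path


def escalation_commands(app_id: str, component: str) -> dict[str, list[str]]:
    """Map each output filename to the doctl argv that produces it."""
    return {
        "app_info.json": ["doctl", "apps", "get", app_id, "-o", "json"],
        "runtime.log": ["doctl", "apps", "logs", app_id, component, "--type", "run"],
        "app_spec.yaml": ["doctl", "apps", "spec", "get", app_id],
    }


def collect(app_id: str, component: str, out_dir: str = ".") -> list[Path]:
    """Run each command and write its stdout to the matching file."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for filename, argv in escalation_commands(app_id, component).items():
        result = subprocess.run(argv, capture_output=True, text=True)
        target = out / filename
        target.write_text(result.stdout)
        written.append(target)
        if result.returncode != 0:
            # Keep the error text next to the output so the failure is documented.
            err_file = out / (filename + ".err")
            err_file.write_text(result.stderr)
            written.append(err_file)
    return written
```

Writing errors to separate `.err` files keeps the JSON and YAML outputs parseable even when one command fails.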

Integration with Other Skills

  • → deployment: After fixing, deploy proper changes
  • → devcontainers: Reproduce issues locally
  • → postgres: Database-specific configuration
  • → networking: Comprehensive networking docs