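Backend Debug: Standard Backend Debugging Techniques
Principle: the GitHub Issue comment thread is the single source of truth. Every part of the debugging process (reproduction, root cause, fix, verification) must be fully recorded in the Issue comments.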
Usage
code
/backend-debug {issue_number}
RULES
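Rule 0: Record the whole process in the GitHub Issue
No matter how simple the API bug is, post the complete process to the Issue comments (same as quick-verify Rule 0).
Rule 1: Reproduce first, then read the code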
code
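❌ Start by reading backend/app/routers/rooms.py and analyzing it
✅ Start by running curl against the staging API and seeing the 500 error
Rule 2: Evidence = the real response, not code analysis
code
❌ "Based on code analysis, this should return 500"
✅ curl -v https://staging-url/api/rooms → actually returned {"error": "..."} (HTTP 500)
Rule 3: Verify after fixing, and post the verification to the Issue
code
❌ "The code is changed, it should be fine"
✅ Run curl once to confirm 200 OK with a correct response, then post it to the Issue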
Step 1: Preparation + read the Issue
bash
ISSUE_NUM={N}
mkdir -p /tmp/bugfix/issue-${ISSUE_NUM}
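# Read the issue
gh issue view ${ISSUE_NUM}
# Get the staging URL from the Cloud Run service
# (gh run list --json url returns the workflow run page, not the service URL)
STAGING_URL=$(gcloud run services describe career-creator-backend-staging \
--project=career-creator-card \
--region=asia-east1 \
--format='value(status.url)' 2>/dev/null)
# Or take the known URL from CLAUDE.md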
Step 2: Reproduce (BEFORE evidence)
2a. Reproduce the API error
bash
# Basic GET
curl -s -w "\nHTTP_STATUS: %{http_code}\n" \
https://staging-url/api/endpoint \
| tee /tmp/bugfix/issue-${ISSUE_NUM}/01-before-api.txt
# POST with auth (writes the same evidence file as the GET above, so run whichever request matches the bug)
curl -s -w "\nHTTP_STATUS: %{http_code}\n" \
-X POST https://staging-url/api/endpoint \
-H "Content-Type: application/json" \
-H "Authorization: Bearer ${TOKEN}" \
-d '{"key": "value"}' \
| tee /tmp/bugfix/issue-${ISSUE_NUM}/01-before-api.txt
# Confirm the evidence file exists
ls -lh /tmp/bugfix/issue-${ISSUE_NUM}/01-before-*.txt
2b. Check GCP Cloud Run logs
bash
# View the most recent error logs
gcloud logging read \
'resource.type="cloud_run_revision" AND resource.labels.service_name="career-creator-backend-staging" AND severity>=ERROR' \
--project=career-creator-card \
--limit=20 \
--format="table(timestamp, textPayload, jsonPayload.message)" \
| tee /tmp/bugfix/issue-${ISSUE_NUM}/01-gcloud-logs.txt
# Restrict to a time range (here: the last hour)
gcloud logging read \
'resource.type="cloud_run_revision" AND resource.labels.service_name="career-creator-backend-staging"' \
--project=career-creator-card \
--freshness=1h \
--limit=50
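The table output above is convenient to read but awkward to search. As a rough sketch, assuming the same query is run with `--format=json` and saved to a file, the entries can be flattened into one line each, oldest first. The function name `error_messages` is hypothetical; the fields it reads (`timestamp`, `severity`, `textPayload`, `jsonPayload.message`) are the ones the table format above already relies on.

```python
import json


def error_messages(raw_json: str) -> list[str]:
    """Flatten log entries (a JSON array) into 'timestamp severity message' lines."""
    entries = json.loads(raw_json)
    lines = []
    # gcloud returns newest entries first; sort so a traceback reads top to bottom.
    for entry in sorted(entries, key=lambda e: e.get("timestamp", "")):
        # An entry carries either a plain text payload or a structured one.
        message = entry.get("textPayload") or entry.get("jsonPayload", {}).get("message", "")
        if message:
            lines.append(f'{entry.get("timestamp", "?")} {entry.get("severity", "?")} {message}')
    return lines
```

Entries with neither payload field are skipped rather than printed as blank lines, so the output can be pasted into the Issue as-is.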
2c. Check for DB problems
bash
# Connect to the staging DB (take DATABASE_URL from .env)
# Check the data state (table and condition are placeholders)
psql "$DATABASE_URL" -c "SELECT * FROM table WHERE condition LIMIT 5;"
# Check migration status
cd backend && alembic current
alembic history --verbose | head -20
2d. Health check
bash
# Backend health
curl -s https://staging-url/health | python3 -m json.tool
# Confirm the Cloud Run revision
gcloud run revisions list \
--service=career-creator-backend-staging \
--project=career-creator-card \
--region=asia-east1 \
--limit=3
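HARD STOP: if the bug cannot be reproduced, label the Issue cannot-reproduce, comment on the Issue, and do not guess at a fix.
Step 3: Root cause analysis (5-Why)
Read the code only after the bug has been reproduced: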
bash
# Now you can read the source code
# Trace from the error message / stack trace
# Checklist of common FastAPI problems:
# [ ] Is the router registered correctly? (app/main.py include_router)
# [ ] Pydantic model validation error? (response_model vs. the actual return value)
# [ ] Is the DB session committed / rolled back correctly?
# [ ] Is the environment variable set in Cloud Run?
# [ ] Does the CORS origin list include the frontend URL?
# [ ] Does the auth middleware decode the token correctly?
Common backend bug types + triage paths
| Bug type | Symptom | Triage order |
|---|---|---|
| ResponseValidationError | 500 + Pydantic error | response_model → return value → nullable fields |
| CORS | Frontend 403/Network Error | backend CORS origins → Cloud Run ingress → LB config |
| Auth | 401/403 | token format → middleware → Supabase JWT verify |
| DB | 500 + sqlalchemy error | migration → schema match → connection pool |
| Deploy | 503/cold start | Cloud Run logs → health check → startup timeout |
| Env var | KeyError / None | Cloud Run env → .env → Secret Manager |
| Rate limit | 429 | Supabase plan → API quotas → retry logic |
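The table can also be read as a lookup: given the status code and body captured in Step 2, which rows are worth checking first? The sketch below is one possible encoding. The triage lists are copied from the table, but the keyword matching in `candidates` is a heuristic invented for illustration, not a rule from this project, and both names are hypothetical.

```python
# Triage order per bug type, copied from the table above.
TRIAGE = {
    "ResponseValidationError": ["response_model", "return value", "nullable fields"],
    "CORS": ["backend CORS origins", "Cloud Run ingress", "LB config"],
    "Auth": ["token format", "middleware", "Supabase JWT verify"],
    "DB": ["migration", "schema match", "connection pool"],
    "Deploy": ["Cloud Run logs", "health check", "startup timeout"],
    "Env var": ["Cloud Run env", ".env", "Secret Manager"],
    "Rate limit": ["Supabase plan", "API quotas", "retry logic"],
}


def candidates(status: int, body: str) -> list[str]:
    """Return the bug types whose symptom matches, most likely first."""
    text = body.lower()
    if status == 429:
        return ["Rate limit"]
    if status == 503:
        return ["Deploy"]
    if status == 401:
        return ["Auth"]
    if status == 403:
        # The table lists 403 under both Auth and CORS, so return both rows.
        return ["Auth", "CORS"]
    if status == 500:
        if "sqlalchemy" in text:
            return ["DB"]
        if "validation" in text or "pydantic" in text:
            return ["ResponseValidationError"]
        if "keyerror" in text:
            return ["Env var"]
    # No match: fall back to reading the full traceback in the logs.
    return []
```

For example, `TRIAGE[candidates(503, "")[0]]` yields the Deploy row's order, starting with the Cloud Run logs. An empty result means the response alone is not enough and the logs from Step 2b should be read next.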
FastAPI-specific pitfalls
python
# 1. ResponseValidationError (root cause of Issue #14)
# The fields defined on response_model do not match what is actually returned
# → The frontend sees a 500, not a meaningful error message
# → Fix: make sure every response_model field has a value
# 2. Optional field without a default
# field: str  # if the DB column is NULL → ValidationError
# field: str | None = None  # correct
# 3. Depends() order
# async def endpoint(db = Depends(get_db), user = Depends(get_current_user)):
# → If get_current_user also needs db, make sure the dependency chain is correct
# 4. DB session inside a background task
# background_tasks.add_task(some_func, db)
# → The db session may already be closed → create a new session inside the task
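Pitfall 4 is easier to see with something runnable. The sketch below is an analogy only: it uses the standard-library `sqlite3` module in place of the project's actual DB session, and plain callables in place of FastAPI background tasks. All function names are hypothetical. The point it illustrates is the one stated above: a task that captures the request's connection fails once that connection is closed, while a task that opens its own connection does not.

```python
import os
import sqlite3
import tempfile


def insert_note(conn: sqlite3.Connection, note: str) -> None:
    conn.execute("INSERT INTO notes (body) VALUES (?)", (note,))
    conn.commit()


def task_with_own_connection(db_path: str, note: str) -> None:
    # Safe pattern: the task opens and closes its own connection.
    conn = sqlite3.connect(db_path)
    try:
        insert_note(conn, note)
    finally:
        conn.close()


def demo() -> tuple[str, int]:
    db_path = os.path.join(tempfile.mkdtemp(), "demo.db")
    request_conn = sqlite3.connect(db_path)
    request_conn.execute("CREATE TABLE notes (body TEXT)")
    request_conn.commit()

    # Queue two "background tasks" while the request is still open.
    unsafe = lambda: insert_note(request_conn, "unsafe")  # captures the request's connection
    safe = lambda: task_with_own_connection(db_path, "safe")

    request_conn.close()  # the request ends and its connection is torn down

    try:
        unsafe()
        outcome = "ok"
    except sqlite3.ProgrammingError:
        # sqlite3 refuses to operate on a closed connection.
        outcome = "closed"
    safe()

    check = sqlite3.connect(db_path)
    try:
        (count,) = check.execute("SELECT COUNT(*) FROM notes").fetchone()
    finally:
        check.close()
    return outcome, count
```

Calling `demo()` reports that the unsafe task hit a closed connection and that only the safe task's row was written. Whether the real session is already closed when a background task runs depends on the framework version and how `get_db` is written, which is why the comment above says "may".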
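Step 4: Fix + verify (AFTER evidence)
bash
# Switch to a fix branch
git checkout -b fix/issue-${ISSUE_NUM}-description
# Edit the code...
# Local verification (uvicorn stays in the foreground, so run it in a separate terminal)
cd backend
uvicorn app.main:app --reload --port 8000
# Call the API to confirm the fix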
curl -s -w "\nHTTP_STATUS: %{http_code}\n" \
http://localhost:8000/api/endpoint \
| tee /tmp/bugfix/issue-${ISSUE_NUM}/02-after-api.txt
# Run the tests (in a subshell, so the working directory stays at the repo root for Step 5)
(cd backend && pytest tests/ -v)
# Compare BEFORE vs AFTER
echo "=== BEFORE ==="
cat /tmp/bugfix/issue-${ISSUE_NUM}/01-before-api.txt
echo ""
echo "=== AFTER ==="
cat /tmp/bugfix/issue-${ISSUE_NUM}/02-after-api.txt
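Printing both files back to back leaves the comparison to the reader. A small sketch of a stricter check follows, assuming both evidence files end with the `HTTP_STATUS: <code>` line that the `curl -w` calls above append. The names `status_of` and `compare` are hypothetical, and "fixed" is defined here narrowly as "was an error status, is now 2xx".

```python
import difflib
import re

# Matches the trailer written by: curl -w "\nHTTP_STATUS: %{http_code}\n"
STATUS_RE = re.compile(r"^HTTP_STATUS: (\d{3})$", re.MULTILINE)


def status_of(evidence: str) -> int:
    match = STATUS_RE.search(evidence)
    if match is None:
        raise ValueError("evidence has no HTTP_STATUS line; was curl run with -w?")
    return int(match.group(1))


def compare(before: str, after: str) -> tuple[bool, str]:
    """Return (looks_fixed, unified diff of the two evidence files)."""
    looks_fixed = status_of(before) >= 400 and 200 <= status_of(after) < 300
    diff = "\n".join(
        difflib.unified_diff(
            before.splitlines(),
            after.splitlines(),
            fromfile="01-before-api.txt",
            tofile="02-after-api.txt",
            lineterm="",
        )
    )
    return looks_fixed, diff
```

A 2xx status alone does not prove the response body is correct, so the diff still needs a human read before it goes into the Issue.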
Step 5: Save the evidence + post to the GitHub Issue
bash
REPO="Youngger9765/career-creator"
BRANCH=$(git branch --show-current)
# Save the evidence into the repo
EVIDENCE_DIR=".claude/evidence/issue-${ISSUE_NUM}"
mkdir -p "$EVIDENCE_DIR"
cp /tmp/bugfix/issue-${ISSUE_NUM}/01-before-api.txt "$EVIDENCE_DIR/"
cp /tmp/bugfix/issue-${ISSUE_NUM}/02-after-api.txt "$EVIDENCE_DIR/"
cp /tmp/bugfix/issue-${ISSUE_NUM}/01-gcloud-logs.txt "$EVIDENCE_DIR/" 2>/dev/null
git add "$EVIDENCE_DIR/"
git commit -m "evidence(#${ISSUE_NUM}): add backend debug evidence"
git push origin "$BRANCH"
# Post the full process to the Issue
gh issue comment ${ISSUE_NUM} --body "$(cat <<COMMENT_EOF
## Backend Debug Report — Issue #${ISSUE_NUM}
### 1. Reproduction
- **Environment**: staging API ($(echo $STAGING_URL | head -c 60)...)
- **Request**:
\`\`\`bash
curl -X POST https://staging-url/api/xxx -H "Content-Type: application/json" -d '{...}'
\`\`\`
- **Response (BEFORE)**:
\`\`\`
$(head -20 /tmp/bugfix/issue-${ISSUE_NUM}/01-before-api.txt)
\`\`\`
### 2. Root cause analysis
- **Why 1**: [surface cause: what error the frontend shows]
- **Why 2**: [what the API returns: HTTP status + response body]
- **Why 3**: [what the server log says: gcloud logs]
- **Root Cause**: [final root cause]
- **Related file**: \`backend/app/xxx.py:line\`
### 3. Fix
- **PR**: #XX
- **What changed**: [one-sentence description]
- **Branch**: \`${BRANCH}\`
### 4. Verification (AFTER)
- **Response (AFTER)**:
\`\`\`
$(head -20 /tmp/bugfix/issue-${ISSUE_NUM}/02-after-api.txt)
\`\`\`
- **Tests**: \`pytest tests/ -v\` → PASSED
### 5. Conclusion
- [ ] Bug confirmed fixed (pending user confirmation)
- The PR will not be merged and the issue will not be closed until you confirm
COMMENT_EOF
)"
Deploy verification (after deploy)
After the fix is pushed to staging, you must confirm that the Cloud Run deployment succeeded:
bash
# 1. Wait for CI/CD to finish
gh run list --branch staging --limit 1
gh run watch <run-id> --exit-status
# 2. Get the new staging URL
gh run view <run-id> --log | grep "Service URL:"
# 3. Hit the health check
curl -s https://new-staging-url/health
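# 4. Re-run the API call that originally failed (-w prints the status so 200 can actually be confirmed)
curl -s -w "\nHTTP_STATUS: %{http_code}\n" https://new-staging-url/api/original-endpoint
# → Confirm 200 OK
# 5. Add the deploy verification result as an Issue comment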
gh issue comment ${ISSUE_NUM} --body "### Deploy Verification
- Cloud Run revision: $(gcloud run revisions list --service=career-creator-backend-staging --project=career-creator-card --region=asia-east1 --limit=1 --format='value(name)')
- Health: 200 OK
- API endpoint: 200 OK
- Bug confirmed fixed on staging"
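The comment above states "200 OK" as fixed text, so it is worth gating it on the actual responses. Here is a minimal sketch using only the standard library. The function names are hypothetical, and the `/health` path is the one used in the curl commands above.

```python
import urllib.error
import urllib.request


def http_status(url: str, timeout: float = 10.0) -> int:
    """Return the HTTP status for a GET, including 4xx/5xx responses."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # urlopen raises on error statuses; the code is still what we want.
        return err.code


def verify_deploy(base_url: str, endpoint: str) -> dict[str, int]:
    """Check the health route and the endpoint that originally failed."""
    return {
        "health": http_status(f"{base_url}/health"),
        "endpoint": http_status(f"{base_url}{endpoint}"),
    }


def deploy_ok(results: dict[str, int]) -> bool:
    return all(code == 200 for code in results.values())
```

One way to use it is to call `verify_deploy` with the new staging URL and post the Deploy Verification comment only when `deploy_ok` returns true; otherwise paste the failing status codes into the Issue instead.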
GCP project info (specific to this project)
code
Project: career-creator-card (849078733818)
Region: asia-east1
Frontend Service: career-creator-frontend-staging
Backend Service: career-creator-backend-staging
Note: this is not groovy-iris-473015-h3 (that is a different project).
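Because the two project IDs are easy to confuse, a guard that runs before any log query can catch the mix-up early. The sketch below assumes the gcloud CLI is installed and reads the active project with `gcloud config get-value project`. The helper names are hypothetical.

```python
import subprocess

EXPECTED_PROJECT = "career-creator-card"


def active_project() -> str:
    """Read the project currently configured in the gcloud CLI."""
    result = subprocess.run(
        ["gcloud", "config", "get-value", "project"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


def ensure_project(active: str, expected: str = EXPECTED_PROJECT) -> None:
    """Raise if the active project is not the one this runbook targets."""
    if active != expected:
        raise RuntimeError(
            f"gcloud is pointed at {active!r}, expected {expected!r}; "
            "pass --project explicitly or switch the active project"
        )
```

Calling `ensure_project(active_project())` at the start of a session fails fast instead of silently returning another service's logs. The commands in this document already pass `--project` explicitly, so the guard mainly protects ad-hoc queries.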
Failure Mode Reference
| Symptom | Wrong approach | Right approach |
|---|---|---|
| Frontend Network Error | Add a client-side CORS header | Check the backend CORS config |
| 500 with no error detail | Guess where the problem is | Read the full traceback in gcloud logs |
| DB connection error | Change the code to retry | Check the Cloud SQL proxy / connection string |
| 503 after deploy | Wait for it to recover on its own | Check Cloud Run startup logs + health check |
| Frontend 403 Forbidden | Change the frontend fetch config | Check the backend auth middleware |
| Env var missing | Add the value to .env | Confirm both Cloud Run env and Secret Manager have it |
Lessons Learned
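- Issue #14: The frontend showed "network connection failed"; the root cause was a FastAPI ResponseValidationError (response_model did not match the actual return value). A single curl call showed the 500 + Pydantic error.
- Issue #16: Frontend screenshots were broken and CORS was suspected, so useCORS:true was added, which did not help. The root cause was that CORS was not enabled on the GCS bucket.
- GCP project mix-up: Using the wrong project ID returns logs from a different service and wastes 30 minutes.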