AgentSkillsCN

database-migration-deployment

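Use when deploying database-backed services to Kubernetes that require schema migrations before app startup, especially with multiple replicas, Prisma/Alembic/Liquibase migrations, or when migration timing and coordination are critical to preventing schema mismatches.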

SKILL.md
--- frontmatter
name: database-migration-deployment
description: Use when deploying database-backed services to Kubernetes that require schema migrations before app startup, especially with multiple replicas, Prisma/Alembic/Liquibase migrations, or when migration timing and coordination are critical to prevent schema mismatches

Database Migration Deployment Pattern

Overview

Deploy database-backed services to Kubernetes using the Migration Job + Init Container pattern. This ensures database migrations complete successfully before application pods start, preventing schema mismatches and data corruption.

When to Use

Use this pattern when:

  • Service requires database schema migrations before startup
  • Deploying to Kubernetes with FluxCD
  • Using Prisma, Alembic, Liquibase, or similar migration tools
  • Running multiple replicas (horizontal scaling)
  • Need atomic, fail-fast deployments

Don't use when:

  • Database is external/managed and migrations run separately
  • Using schema-less databases (Redis, MongoDB without schemas)
  • Application handles migrations internally with locking

Core Pattern

code
Migration Job (runs once) → Init Container (waits) → Service Pods (start)

Key principle: Migrations are a separate Job, not part of the Deployment lifecycle.

Why This Pattern

| Approach | Problem | This Pattern Solves |
|---|---|---|
| Migration in Deployment command | Multiple replicas race, conflicts | Single Job runs once |
| No coordination | Pods start before migration completes | Init container blocks startup |
| CI/CD runs migration | Outside K8s, harder to debug | Job logs in cluster |
| Init container only | Hard to debug, no job status | Separate Job with clear status |

Implementation

1. Migration Job

yaml
# migration-job.yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: myapp-migration-${IMAGE_TAG}  # ← Include version
  annotations:
    kustomize.toolkit.fluxcd.io/force: Enabled  # ← Force recreation
spec:
  backoffLimit: 0           # ← Fail fast, no retries
  ttlSecondsAfterFinished: 3600  # ← Auto-cleanup after 1 hour
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: migrate
          image: registry.example.com/myapp-migration:${IMAGE_TAG}
          command: ["npm", "run", "db:migrate:deploy"]  # Or: npx prisma migrate deploy
          envFrom:
            - configMapRef:
                name: myapp-config
            - secretRef:
                name: myapp-secrets  # DATABASE_URL here

Critical details:

  • Job name includes version tag - ensures new Job per deployment
  • Force annotation - lets FluxCD delete and recreate the Job when an apply fails on immutable fields (a Job's pod template cannot be patched in place)
  • backoffLimit: 0 - fail immediately on error, don't retry
  • ttlSecondsAfterFinished - cleanup old jobs automatically
  • Same env as service - DATABASE_URL, credentials identical
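
The Job above runs a Prisma command, but the same settings carry over to the other tools named in this document. The following is a minimal sketch of an Alembic variant, assuming the migration image ships Alembic together with the project's `alembic.ini` and `env.py`, and that `env.py` reads `DATABASE_URL` from the environment. The file name is hypothetical; the image, ConfigMap, and Secret names are reused from the example above.

```yaml
# migration-job-alembic.yaml (hypothetical file name)
apiVersion: batch/v1
kind: Job
metadata:
  name: myapp-migration-${IMAGE_TAG}  # Same versioned naming as the Prisma Job
  annotations:
    kustomize.toolkit.fluxcd.io/force: Enabled
spec:
  backoffLimit: 0
  ttlSecondsAfterFinished: 3600
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: migrate
          # Assumption: this image contains the Python code plus Alembic
          image: registry.example.com/myapp-migration:${IMAGE_TAG}
          # Applies every pending revision up to the latest one
          command: ["alembic", "upgrade", "head"]
          # Liquibase equivalent, assuming a configured liquibase.properties:
          # command: ["liquibase", "update"]
          envFrom:
            - configMapRef:
                name: myapp-config
            - secretRef:
                name: myapp-secrets  # Assumption: env.py reads DATABASE_URL
```

Only the `command` changes between tools; the naming, retry, and cleanup settings stay the same.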

2. Deployment with Init Container

yaml
# deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp
spec:
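  replicas: 2  # ← Safe with multiple replicas
  selector:
    matchLabels:
      app: myapp  # ← Illustrative label; apps/v1 requires a selector
  template:
    metadata:
      labels:
        app: myapp  # ← Must match the selector above
    spec: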
      initContainers:
        - name: wait-for-migration
          image: groundnuty/k8s-wait-for:v2.0
          args:
            - "job-wr"  # ← Wait for job completion (with readiness)
            - "myapp-migration-${IMAGE_TAG}"  # ← Must match Job name
      containers:
        - name: service
          image: registry.example.com/myapp:${IMAGE_TAG}
          # ... rest of config

Critical details:

  • job-wr argument - waits for Job to complete successfully
  • Same version tag - init waits for SAME version migration
  • All replicas wait - every pod blocked until migration succeeds

3. Variable Substitution

yaml
# deployment-variables.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: deployment-variables
data:
  IMAGE_TAG: "v1.2286938253.0"  # ← Updated by CI/CD
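
yaml
# kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - deployment-variables.yaml
  - migration-job.yaml  # ← Before deployment
  - deployment.yaml
replacements:
  - source:
      kind: ConfigMap
      name: deployment-variables
      fieldPath: data.IMAGE_TAG
    targets:
      - select:
          kind: Job  # ← Selected by kind; replacement selectors match names exactly, not by regex
        fieldPaths:
          - metadata.name
        options:
          delimiter: '-'
          index: 2  # ← Third segment of myapp-migration-${IMAGE_TAG}
      - select:
          kind: Deployment
        fieldPaths:
          - spec.template.spec.initContainers.[name=wait-for-migration].args.1
        options:
          delimiter: '-'  # ← Replace only the tag, keep the "myapp-migration-" prefix
          index: 2
      - select:
          kind: Job
        fieldPaths:
          - spec.template.spec.containers.[name=migrate].image
        options:
          delimiter: ':'  # ← Replace only the image tag
          index: 1
      - select:
          kind: Deployment
        fieldPaths:
          - spec.template.spec.containers.[name=service].image
        options:
          delimiter: ':'
          index: 1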

Critical details:

  • Kustomize replacements - substitute ${IMAGE_TAG} everywhere
  • Resource order - list migration-job BEFORE deployment; at runtime it is the init container that enforces the ordering
  • Same tag everywhere - migration, init container, service all match
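
To make the goal of the substitution concrete, the fragments below sketch the intended rendered result for `IMAGE_TAG: "v1.2286938253.0"`, which can be compared against the output of `kustomize build`. Only the fields that carry the tag are shown. This assumes the substitution reaches the image fields as well as the Job name and init container argument.

```yaml
# Rendered Job (fragment)
apiVersion: batch/v1
kind: Job
metadata:
  name: myapp-migration-v1.2286938253.0
spec:
  template:
    spec:
      containers:
        - name: migrate
          image: registry.example.com/myapp-migration:v1.2286938253.0
---
# Rendered Deployment (fragment)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp
spec:
  template:
    spec:
      initContainers:
        - name: wait-for-migration
          image: groundnuty/k8s-wait-for:v2.0
          args:
            - "job-wr"
            - "myapp-migration-v1.2286938253.0"  # Identical to the Job name
      containers:
        - name: service
          image: registry.example.com/myapp:v1.2286938253.0
```

If the init container argument and the Job name differ in the rendered output, the pods will wait on a Job that does not exist.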

4. GitLab CI Integration

yaml
# Build two images
build:docker:service:
  script:
    - /kaniko/executor
      --destination ${CI_REGISTRY_IMAGE}/service:${IMAGE_TAG}

build:docker:migration:
  script:
    # Same code as the service image, different command
    # (a comment inside a multi-line command would break the YAML)
    - /kaniko/executor
      --context ./
      --dockerfile ./Dockerfile.migration
      --destination ${CI_REGISTRY_IMAGE}/migration:${IMAGE_TAG}

# Update deployment variables
commit:image-tag:
  script:
    # Single-quoted so YAML does not parse ": " inside the command as a mapping
    - 'sed -i "s/IMAGE_TAG: .*/IMAGE_TAG: \"${IMAGE_TAG}\"/" k8s/deployment-variables.yaml'
    - git add k8s/deployment-variables.yaml
    - 'git commit -m "ci: update service image tag to ${IMAGE_TAG} [skip ci]"'
    - git push origin ${CI_COMMIT_BRANCH}

# Package manifests with Flux
deploy:service-manifests:
  script:
    # flux push artifact also expects --source and --revision metadata
    - flux push artifact oci://${CI_REGISTRY_IMAGE}/manifests:${IMAGE_TAG}
        --path="./k8s"
        --source="${CI_PROJECT_URL}"
        --revision="${CI_COMMIT_BRANCH}@sha1:${CI_COMMIT_SHA}"
    - flux reconcile source oci -n myapp myapp-service

Critical details:

  • Two images, same version - service and migration always in sync
  • Commit tag back - deployment-variables.yaml updated in git
  • [skip ci] flag - prevent infinite CI loop
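
The `flux reconcile source oci -n myapp myapp-service` step implies that an OCIRepository named `myapp-service` exists in the `myapp` namespace, with a Flux Kustomization applying its contents. Neither object is shown in this document, so the following is only a sketch of what they might look like. The registry URL, intervals, tag policy, and the Kustomization name are assumptions, and a private registry would additionally need a `secretRef`.

```yaml
apiVersion: source.toolkit.fluxcd.io/v1beta2  # Check which version your Flux release serves
kind: OCIRepository
metadata:
  name: myapp-service  # Name used by the reconcile command
  namespace: myapp
spec:
  interval: 5m
  # Assumption: the repository that `flux push artifact` publishes to
  url: oci://registry.example.com/myapp/manifests
  ref:
    semver: ">=1.0.0"  # Assumption: pick the highest pushed version tag
---
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: myapp-service  # Hypothetical name
  namespace: myapp
spec:
  interval: 10m
  sourceRef:
    kind: OCIRepository
    name: myapp-service
  path: ./  # The artifact root holds the contents of ./k8s
  prune: true
  targetNamespace: myapp
```

With this layout, each pushed artifact becomes a new revision of the source, and the Kustomization applies the Job and Deployment from that revision together.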

5. Dockerfile.migration

dockerfile
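# Dockerfile.migration - same build steps as the service Dockerfile, minus CMD
FROM node:20-alpine AS base
WORKDIR /app
COPY package*.json ./
# --omit=dev replaces the deprecated --only=production; the prisma CLI must be
# listed under "dependencies" to survive this install
RUN npm ci --omit=dev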

COPY prisma ./prisma
RUN npx prisma generate

COPY . .

# ← No CMD/ENTRYPOINT - Job specifies command

Critical details:

  • Same as service Dockerfile - ensures identical dependencies
  • No entrypoint - Job manifest specifies command
  • Include prisma client - npx prisma generate step

Deployment Flow

code
1. Push to main branch
   ↓
2. GitLab CI: Build service + migration images (same tag)
   ↓
3. GitLab CI: Update deployment-variables.yaml, commit, push
   ↓
4. FluxCD: Detect change, reconcile
   ↓
5. Create Migration Job (myapp-migration-v1.123.0)
   ├─ Runs: npm run db:migrate:deploy
   ├─ Success → Job status: Complete
   └─ Failure → Job status: Failed
   ↓
6. Create Deployment pods
   ├─ Init container: wait-for-migration
   ├─ Waits for Job myapp-migration-v1.123.0
   ├─ Job Complete → Init exits → Service starts
   └─ Job Failed → Init blocks forever → Pods stay in Init:0/1

Troubleshooting

Use the Flux MCP tools to:

  • Check Job status
  • Check logs

Pods stuck in "Init:0/1"

Common causes (the migration Job has failed or has not finished):

  • DATABASE_URL incorrect or unreachable
  • Migration syntax error
  • Database locked or busy

Job completes but pods still waiting

Common causes:

  • Job name mismatch (typo in args)
  • Namespace mismatch
  • RBAC permissions missing

Multiple Jobs with same name

This shouldn't happen due to the force annotation. If it does, manually delete the old Jobs.

Migration succeeds but service fails

Migration and service are separate concerns:

  • Migration: database schema
  • Service: application code

Check whether the app code matches the schema version.

RBAC Requirements

The ServiceAccount used by the Deployment's pods needs permission to read Job status, because the init container queries the Kubernetes API:

yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: myapp-role
rules:
  - apiGroups: ["batch"]
    resources: ["jobs"]
    verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: myapp-rolebinding
subjects:
  - kind: ServiceAccount
    name: myapp-service
roleRef:
  kind: Role
  name: myapp-role
  apiGroup: rbac.authorization.k8s.io
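
The Role and RoleBinding only take effect if the ServiceAccount exists and the pods actually run under it. A minimal sketch of the two missing pieces, assuming the same `myapp-service` name as the RoleBinding subject and that all objects live in the same namespace:

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: myapp-service  # Must match the RoleBinding subject
---
# deployment.yaml (fragment)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp
spec:
  template:
    spec:
      # The wait-for-migration init container uses this identity to read Jobs
      serviceAccountName: myapp-service
```

Without `serviceAccountName`, the pods run as the namespace's `default` ServiceAccount, which typically cannot read Jobs, and the init container keeps waiting.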

Common Mistakes

| Mistake | Symptom | Fix |
|---|---|---|
| Job name without version | Jobs collide, FluxCD confused | Include ${IMAGE_TAG} in name |
| Missing force annotation | Job doesn't recreate | Add kustomize.toolkit.fluxcd.io/force |
| backoffLimit > 0 | Failed migration retries, wastes time | Set backoffLimit: 0 |
| Different DATABASE_URL | Migration succeeds, service fails | Use same ConfigMap/Secret |
| Init waits for wrong job | Pods stuck forever | Match job name exactly in args |
| Migration in deployment order AFTER service | Race condition | List migration-job.yaml BEFORE deployment.yaml |

Real-World Example

See ralph-wiggum-service for a complete working implementation:

  • ralph-wiggum-service/k8s/migration-job.yaml - Job definition
  • ralph-wiggum-service/k8s/deployment.yaml - Init container (lines 29-40)
  • ralph-wiggum-service/k8s/deployment-variables.yaml - Version tracking
  • .gitlab/ci/service-build.yaml - Two-image build (lines 42-135)
  • .gitlab/ci/service-deploy.yaml - FluxCD packaging

Quick Reference

Essential Job settings:

  • backoffLimit: 0 - fail fast
  • ttlSecondsAfterFinished: 3600 - auto-cleanup
  • restartPolicy: Never - no pod restarts
  • kustomize.toolkit.fluxcd.io/force: Enabled - force recreate

Essential Init container:

  • Image: groundnuty/k8s-wait-for:v2.0
  • Args: ["job-wr", "migration-job-name"]
  • Name must match Job name exactly

Version synchronization:

  • Same ${IMAGE_TAG} in Job name, init args, images
  • CI/CD updates deployment-variables.yaml
  • Kustomize substitutes everywhere