AgentSkillsCN

pipeline-design

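Use when designing CI/CD pipelines, selecting deployment strategies, or architecting continuous delivery workflows. Covers platform selection (GitHub Actions vs GitLab CI), build pipeline structure, deployment strategies (rolling/canary/blue-green), approval gates, and DORA metrics.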

SKILL.md
--- frontmatter
name: pipeline-design
description: Use when designing CI/CD pipelines, selecting deployment strategies, or architecting continuous delivery workflows. Covers platform decisions (GitHub Actions vs GitLab CI), build pipeline structure, deployment strategies (rolling/canary/blue-green), approval gates, and DORA metrics.

Pipeline Design

Platform Selection

| Factor | GitHub Actions | GitLab CI |
| --- | --- | --- |
| Best for | OSS, GitHub-native repos | Self-hosted, mono-repos, complex workflows |
| Reuse model | Reusable workflows + composite actions | YAML anchors + include templates, dynamic child pipelines |
| Container builds | docker/build-push-action with GHA cache | DinD service (slower) or Kaniko |
| Security scanning | Third-party (Trivy, Snyk, CodeQL) | Built-in templates (SAST, container, dependency) |
| Secrets | Environment-scoped, OIDC for cloud | CI/CD variables, group-level inheritance |
| Approval gates | Environment protection rules | when: manual + protected environments |

Decision: GitHub for smaller teams + cloud-native, GitLab for self-hosted + complex orchestration.

Build Pipeline Structure

Minimum stages (never skip lint):

  1. Lint → unit tests (fail fast)
  2. Build → containerize, tag with $COMMIT_SHA
  3. Security scan → Trivy/CodeQL
  4. Staging deploy → auto-deploy from main
  5. Production deploy → manual approval + tag-triggered
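
The five stages above could be laid out as a GitHub Actions workflow roughly like this. It is a minimal sketch, not a definitive layout: the file name, the `make` targets, the `my-app` image name, and `./scripts/deploy.sh` are all assumptions, and pushing the image to a registry is omitted.

```yaml
# Hypothetical .github/workflows/pipeline.yml
name: pipeline
on:
  pull_request:
  push:
    branches: [main]
    tags: ['v*']

permissions:
  contents: read

jobs:
  lint:                                   # stage 1 (fail fast)
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4         # pin to a full commit SHA in real pipelines
      - run: make lint                    # assumed Makefile target

  unit-tests:                             # stage 1, independent of lint
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: make test                    # assumed Makefile target

  build-and-scan:                         # stages 2 and 3 share a job so the image stays local
    needs: [lint, unit-tests]
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: docker build -t my-app:${{ github.sha }} .
      - run: make scan IMAGE=my-app:${{ github.sha }}   # assumed wrapper around Trivy/CodeQL

  deploy-staging:                         # stage 4: automatic from main
    needs: build-and-scan
    if: github.event_name == 'push' && github.ref == 'refs/heads/main'
    runs-on: ubuntu-latest
    environment: staging
    steps:
      - uses: actions/checkout@v4
      - run: ./scripts/deploy.sh staging  # hypothetical deploy script

  deploy-production:                      # stage 5: tag-triggered, gated by environment reviewers
    needs: build-and-scan
    if: startsWith(github.ref, 'refs/tags/v')
    runs-on: ubuntu-latest
    environment: production
    steps:
      - uses: actions/checkout@v4
      - run: ./scripts/deploy.sh production
```

The manual approval itself is not expressed in YAML here; it comes from required reviewers configured on the `production` environment in the repository settings.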

Parallelism

  • Run lint + unit tests in parallel (independent)
  • Matrix builds for multi-version/multi-OS testing
  • Docker layer caching: cache-from: type=gha + cache-to: type=gha,mode=max
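
As an illustration of the matrix and layer-caching points above, a GitHub Actions fragment might look like the following. It assumes a Node.js project and an image called `my-app`; the OS and version lists are placeholders, not recommendations.

```yaml
jobs:
  test:
    runs-on: ${{ matrix.os }}
    strategy:
      fail-fast: true                     # cancel remaining matrix jobs on first failure
      matrix:
        os: [ubuntu-latest, macos-latest] # assumed target platforms
        node: [18, 20]                    # assumed runtime versions
    steps:
      - uses: actions/checkout@v4         # pin to a full commit SHA in real pipelines
      - uses: actions/setup-node@v4
        with:
          node-version: ${{ matrix.node }}
      - run: npm ci
      - run: npm test

  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: docker/setup-buildx-action@v3   # Buildx is needed for the gha cache backend
      - uses: docker/build-push-action@v6
        with:
          context: .
          push: false
          tags: my-app:${{ github.sha }}
          cache-from: type=gha
          cache-to: type=gha,mode=max     # mode=max also caches intermediate layers
```

The `test` and `build` jobs have no `needs:` relationship here, so they run concurrently; add one if the build should wait for tests.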

Artifacts & Caching

  • GitHub: Use actions/setup-* built-in caching (cache: parameter), hashFiles('**/package-lock.json')
  • GitLab: Cache key ${CI_COMMIT_REF_SLUG}, use policy: pull-push, prefer artifacts for stage handoff
  • Expire artifacts: 1h intermediates, 1d deploy artifacts (unbounded artifacts = storage waste)
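
On the GitLab side, the cache key, policy, and artifact expiry above could be combined as in this sketch. It assumes a Node.js project whose build output lands in `dist/`; the job names and the deploy script are hypothetical.

```yaml
# Fragment of a hypothetical .gitlab-ci.yml
stages: [build, deploy]

build:
  stage: build
  image: node:20
  cache:
    key: ${CI_COMMIT_REF_SLUG}            # one cache per branch
    paths:
      - .npm/                             # npm download cache, not node_modules/
    policy: pull-push
  script:
    - npm ci --cache .npm --prefer-offline
    - npm run build
  artifacts:
    paths:
      - dist/                             # handed to later stages as an artifact
    expire_in: 1 hour                     # intermediate artifact

deploy_staging:
  stage: deploy
  script:
    - ./scripts/deploy.sh staging dist/   # hypothetical script; dist/ arrives via artifacts
```

The cache is treated as a best-effort speedup, while `dist/` travels as an artifact because later stages depend on it being present.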

Deployment Strategy Selection

| Strategy | Use When | Rollback Time | Risk |
| --- | --- | --- | --- |
| Rolling update | Low-traffic services, no state migration | 5-10m | Medium (partial outage if fails) |
| Blue-green | Stateless APIs, need instant rollback | <1m | Low (switch load balancer) |
| Canary | High-traffic services, catch bugs early | 15-30m | Low (gradual rollout) |
| Feature flags | Complex business logic, team gating | Instant | Lowest (conditional logic) |

Guardrails: Always use one strategy consistently per environment. Never mix.
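
For the rolling update row, a Kubernetes Deployment can declare the strategy explicitly. The manifest below is only a sketch: the replica count, registry host, port, and surge settings are assumptions chosen for illustration.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  replicas: 4                             # assumed size
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 0                   # keep full capacity during the rollout
      maxSurge: 1                         # replace one pod at a time
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
        - name: my-app
          image: registry.example.com/my-app:COMMIT_SHA   # placeholder tag
          readinessProbe:                 # new pods receive traffic only once healthy
            httpGet:
              path: /health
              port: 8080                  # assumed container port
```

Setting `maxUnavailable: 0` trades rollout speed for availability, which is one way to soften the partial-outage risk noted in the table.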

Approval Gates

| Pattern | Best For | Key Setting |
| --- | --- | --- |
| Manual approval | Production deploys, compliance | Environment protection rules (GitHub) / when: manual (GitLab) |
| Tag-triggered | Release trains, semantic versioning | Only deploy on refs: tags/v* |
| Time-based | Scheduled maintenance windows | GitLab when: delayed + start_in: 30m |
| Multi-approver | Enterprise compliance | Azure Pipelines ManualValidation@0 + email notifiers |

Rule: Never auto-deploy to production. Always require manual gate OR tag-triggered release.
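
In GitLab CI, the manual, tag-triggered, and time-based patterns might be expressed as below. This is a sketch under assumptions: the job names and `./scripts/deploy.sh` are hypothetical, and the tag pattern simply matches tags starting with `v`.

```yaml
stages: [deploy]

deploy_production:
  stage: deploy
  environment:
    name: production                      # pair with a protected environment for approvals
  rules:
    - if: $CI_COMMIT_TAG =~ /^v/          # tag-triggered: only v* tags
      when: manual                        # manual gate: someone must start the job
  script:
    - ./scripts/deploy.sh production      # hypothetical deploy script

deploy_maintenance:
  stage: deploy
  when: delayed                           # time-based: job starts after the delay
  start_in: 30 minutes
  script:
    - ./scripts/deploy.sh maintenance     # hypothetical deploy script
```

With these rules, a branch push never creates the production job at all, and a tag push creates it in a waiting state until someone triggers it.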

Security Scanning

Non-negotiable: Every pipeline must include dependency scanning + SAST.

GitHub Actions

  • Pin action versions to SHA (not tags): uses: actions/checkout@abc123... not @v4
  • Trivy for filesystem + container scanning → output SARIF → GitHub Security tab
  • CodeQL for deep SAST analysis
  • Set permissions: block explicitly per job (the default GITHUB_TOKEN scope depends on repository settings and can be broader than the job needs)
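
The Trivy-to-SARIF flow and the per-job permissions block can be sketched as follows. The `aquasecurity/trivy-action` version tag shown is an assumption for readability; in line with the pinning advice above, replace every `uses:` reference with a full commit SHA.

```yaml
jobs:
  scan:
    runs-on: ubuntu-latest
    permissions:
      contents: read                      # needed to check out the code
      security-events: write              # needed to upload SARIF results
    steps:
      - uses: actions/checkout@v4
      - uses: aquasecurity/trivy-action@0.28.0   # assumed tag; pin to a SHA
        with:
          scan-type: fs                   # filesystem scan of the repository
          scan-ref: .
          format: sarif
          output: trivy-results.sarif
      - uses: github/codeql-action/upload-sarif@v3
        with:
          sarif_file: trivy-results.sarif # findings appear in the Security tab
```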

GitLab CI

  • Use built-in include: Security/SAST.gitlab-ci.yml, Security/Dependency-Scanning.gitlab-ci.yml, Security/Container-Scanning.gitlab-ci.yml
  • Start with allow_failure: true, tighten to false once baseline clean
  • Use rules: syntax instead of only/except (only/except are no longer actively developed; rules is the preferred keyword)
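
A minimal `.gitlab-ci.yml` fragment pulling in the built-in templates might look like this. The `lint` job and its `make lint` command are hypothetical and are only there to show `rules:`; the sketch also assumes a container image was built and pushed earlier for Container Scanning to inspect.

```yaml
stages: [test]                            # the security templates add their jobs to the test stage

include:
  - template: Security/SAST.gitlab-ci.yml
  - template: Security/Dependency-Scanning.gitlab-ci.yml
  - template: Security/Container-Scanning.gitlab-ci.yml

lint:
  stage: test
  script:
    - make lint                           # assumed Makefile target
  allow_failure: true                     # start permissive, switch to false once clean
  rules:
    - if: $CI_PIPELINE_SOURCE == "merge_request_event"
    - if: $CI_COMMIT_BRANCH == $CI_DEFAULT_BRANCH
```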

Container Registry Security

  • Tag immutably with $COMMIT_SHA, convenience tag with latest + semver on releases
  • Never embed secrets in Docker builds → use BuildKit --secret mounts, not build args (build-arg values persist in image metadata and show up in docker history)
  • Use docker/metadata-action for consistent tagging across images
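
The tagging and secret-handling points above could be combined in a GitHub Actions job like the one below. It is a sketch that assumes publishing to `ghcr.io` and a repository secret named `NPM_TOKEN` (hypothetical), consumed in the Dockerfile through `RUN --mount=type=secret,id=npm_token`.

```yaml
jobs:
  publish:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      packages: write                     # allows pushing to ghcr.io
    steps:
      - uses: actions/checkout@v4         # pin to a full commit SHA in real pipelines
      - uses: docker/setup-buildx-action@v3
      - uses: docker/login-action@v3
        with:
          registry: ghcr.io
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}
      - id: meta
        uses: docker/metadata-action@v5
        with:
          images: ghcr.io/${{ github.repository }}
          tags: |
            type=sha,format=long
            type=semver,pattern={{version}}
            type=raw,value=latest,enable={{is_default_branch}}
      - uses: docker/build-push-action@v6
        with:
          context: .
          push: true
          tags: ${{ steps.meta.outputs.tags }}
          labels: ${{ steps.meta.outputs.labels }}
          secrets: |
            npm_token=${{ secrets.NPM_TOKEN }}
```

The `type=sha` rule produces the immutable commit tag (prefixed with `sha-` by default), while the semver and `latest` rules only emit tags on version tags and the default branch respectively.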

Post-Deployment Verification

  • Readiness: kubectl rollout status deployment/my-app --timeout=5m
  • Health checks: Hit /health endpoint 10x with 10s backoff
  • Error rate: Query Prometheus, rollback if error_rate > 1%
  • Metrics window: Wait 60s post-deploy before sampling metrics (startup noise)
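
The readiness and health-check bullets can be sketched as a single GitHub Actions step. This reads "10x with 10s backoff" as up to ten attempts spaced ten seconds apart, which is an interpretation rather than something the list spells out; the URL and deployment name follow the examples in this document.

```yaml
steps:
  - name: Verify deployment
    run: |
      # Wait for the rollout to finish before probing the service
      kubectl rollout status deployment/my-app --timeout=5m
      # Let startup noise settle before sampling
      sleep 60
      # Retry the health endpoint up to 10 times, 10 seconds apart
      for attempt in $(seq 1 10); do
        if curl -fsS https://app.example.com/health > /dev/null; then
          echo "healthy on attempt ${attempt}"
          exit 0
        fi
        sleep 10
      done
      echo "health check failed after 10 attempts" >&2
      exit 1
```

A non-zero exit fails the step, which lets a later `if: failure()` step react.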

Automated Rollback

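```yaml
# GitHub Actions job sketch: undo the rollout if any earlier step fails
jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4  # provides k8s/; pin to a SHA in real pipelines
      - run: kubectl apply -f k8s/
      - run: kubectl rollout status deployment/my-app --timeout=5m
      - run: curl -f https://app.example.com/health

      # Runs only when a previous step in this job has failed
      - if: failure()
        run: kubectl rollout undo deployment/my-app
```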

DORA Metrics

| Metric | Target | How to Track |
| --- | --- | --- |
| Deployment Frequency | ≥daily | Count deployments/day |
| Lead Time | <1h (elite) | Merge commit → production |
| Change Failure Rate | <15% | Failed deployments / total |
| MTTR | <1h | Rollback completion time |

Dashboard: Export from CI/CD platform (GitHub Actions: jobs API, GitLab: CI metrics API).

Gotchas

  • GitHub: GITHUB_TOKEN default permissions depend on repository/organization settings → set an explicit permissions: block granting only the scopes each job needs
  • GitHub: actions/checkout@v4 fetches 1 commit by default → add fetch-depth: 0 for full history (changelogs, version bumps)
  • GitLab: DinD (docker:24-dind service) requires DOCKER_TLS_CERTDIR: "/certs" (TLS errors otherwise)
  • Both: Never store secrets in YAML → use platform secret management
  • Both: Pin all third-party actions/images to specific versions (not latest or master)
  • Both: Coverage reports need explicit format → cobertura for GitLab MR display, lcov for Codecov
  • Both: Unbounded artifacts/cache destroy pipeline speed → set expiration policies
  • Kubernetes: Prefer GitOps (ArgoCD/Flux) over direct kubectl from CI for production
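
For the GitLab DinD gotcha, a job along these lines is one way to wire it up. It is a sketch that assumes a runner using the Docker executor in privileged mode with the TLS certificate volume shared between the job and the service; the job name is hypothetical.

```yaml
build_image:
  image: docker:24
  services:
    - docker:24-dind
  variables:
    DOCKER_TLS_CERTDIR: "/certs"          # dind writes TLS certs here; the client reads them
  script:
    - echo "$CI_REGISTRY_PASSWORD" | docker login -u "$CI_REGISTRY_USER" --password-stdin "$CI_REGISTRY"
    - docker build -t "$CI_REGISTRY_IMAGE:$CI_COMMIT_SHA" .
    - docker push "$CI_REGISTRY_IMAGE:$CI_COMMIT_SHA"
```

The registry variables used here are GitLab's predefined CI/CD variables, so no credentials appear in the YAML itself.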

Cross-References

  • devops:github-actions-patterns -- deep GitHub Actions patterns: reusable workflows, OIDC, matrix strategies
  • devops:docker-patterns -- container build optimization, multi-stage builds for CI
  • devops:gitops-workflow -- ArgoCD/Flux deployment patterns, GitOps pipeline integration