infrastructure-management

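Covers Docker, Kubernetes, GitOps, and CI/CD pipelines. Includes multi-stage builds, Helm charts, ArgoCD/Flux deployment, GitHub Actions workflows, and container security. Use when containerizing apps, setting up deployments, or when the user asks "how do I deploy to Kubernetes?" or "what's the best CI/CD setup?"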

SKILL.md
---
name: infrastructure-management
description: >-
  Covers Docker, Kubernetes, GitOps, and CI/CD pipelines. Includes multi-stage
  builds, Helm charts, ArgoCD/Flux deployment, GitHub Actions workflows, and
  container security. Use when containerizing apps, setting up deployments, or
  when the user asks "how do I deploy to Kubernetes?" or "what's the best CI/CD
  setup?"
user-invocable: false
agent: devops
allowed-tools: 'Read, Write, Edit, Bash, Glob, Grep'
---

Infrastructure

Infrastructure patterns for containerization, orchestration, CI/CD pipelines, and deployment automation.

Stack Overview

| Layer | Technologies |
|---|---|
| Containers | Docker, BuildKit, multi-stage builds |
| Orchestration | Kubernetes, Helm, Kustomize |
| GitOps | ArgoCD, Flux, Argo Rollouts |
| CI/CD | GitHub Actions, GitLab CI |
| Registries | GHCR, ECR, GCR, DockerHub |

Philosophy

  1. Infrastructure as Code - All configuration in version control
  2. GitOps - Git as the single source of truth for deployments
  3. Security by Default - Non-root, minimal images, no secrets in code
  4. Observability - Health checks, probes, structured logging
  5. Reproducibility - Pinned versions, lockfiles, deterministic builds

Quick Reference

Docker Essentials

dockerfile
# Multi-stage build with non-root user
FROM python:3.12-slim AS builder
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir --prefix=/install -r requirements.txt

FROM python:3.12-slim
RUN useradd --uid 1000 --create-home appuser
# Set a working directory so the application is not copied into /
WORKDIR /app
COPY --from=builder /install /usr/local
COPY --chown=appuser:appuser . .
USER appuser
# python:3.12-slim does not ship curl, so probe with the standard library instead
HEALTHCHECK --interval=30s --timeout=3s \
  CMD python -c "import urllib.request; urllib.request.urlopen('http://localhost:8000/health', timeout=2)" || exit 1
ENTRYPOINT ["python", "-m", "app"]
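
The HEALTHCHECK line assumes the application answers GET /health on port 8000. A minimal sketch of such an endpoint, using only the Python standard library, might look like the following. The names `HealthHandler`, `start_server`, and `probe` are hypothetical, and a real service would more likely expose these routes through its web framework.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class HealthHandler(BaseHTTPRequestHandler):
    """Answers /health and /ready with 200 and everything else with 404."""

    def do_GET(self):
        if self.path in ("/health", "/ready"):
            body = json.dumps({"status": "ok"}).encode()
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_error(404)

    def log_message(self, format, *args):
        pass  # keep probe traffic out of the request log


def start_server(host: str = "127.0.0.1", port: int = 8000) -> ThreadingHTTPServer:
    """Start the server on a background thread and return it.

    Inside a container, bind to 0.0.0.0 so that probes arriving from
    outside the container (for example from the kubelet) can reach it.
    """
    server = ThreadingHTTPServer((host, port), HealthHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def probe(url: str, timeout: float = 2.0) -> int:
    """Return the HTTP status of a GET request; raises on 4xx/5xx or timeout."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.status
```

Because `urlopen` raises on error statuses and timeouts, a probe built this way exits non-zero when the service is unhealthy, which is the signal Docker's health check relies on.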

Kubernetes Pod Spec

yaml
spec:
  securityContext:
    runAsNonRoot: true
    runAsUser: 1000
  containers:
    - name: app
      image: app:v1.0.0  # Never :latest
      resources:
        requests: {memory: "256Mi", cpu: "100m"}
        limits: {memory: "512Mi", cpu: "500m"}
      securityContext:
        allowPrivilegeEscalation: false
        readOnlyRootFilesystem: true
      livenessProbe:
        httpGet: {path: /health, port: 8000}
        initialDelaySeconds: 10
      readinessProbe:
        httpGet: {path: /ready, port: 8000}
        initialDelaySeconds: 5
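
The fields in this pod spec can be checked mechanically before a manifest is applied. The sketch below is one possible way to do that on an already-parsed spec (a plain `dict`, so no YAML library is assumed). `check_pod_spec` and `image_is_pinned` are hypothetical helpers written for illustration; they are not the contents of scripts/validate-k8s-manifest.py.

```python
from __future__ import annotations


def image_is_pinned(image: str) -> bool:
    """True if the image has a digest or an explicit tag other than 'latest'."""
    if "@sha256:" in image:
        return True
    last_segment = image.rsplit("/", 1)[-1]  # ignore a registry host:port prefix
    if ":" not in last_segment:
        return False  # an untagged image resolves to :latest
    return last_segment.split(":", 1)[1] not in ("", "latest")


def check_pod_spec(spec: dict) -> list[str]:
    """Return a list of problems; an empty list means every check passed."""
    problems = []
    if spec.get("securityContext", {}).get("runAsNonRoot") is not True:
        problems.append("pod: securityContext.runAsNonRoot must be true")

    for container in spec.get("containers", []):
        name = container.get("name", "<unnamed>")
        if not image_is_pinned(container.get("image", "")):
            problems.append(f"{name}: image must be pinned to a version")

        # Both requests and limits need memory and cpu entries.
        resources = container.get("resources", {})
        for section in ("requests", "limits"):
            for resource in ("memory", "cpu"):
                if resource not in resources.get(section, {}):
                    problems.append(f"{name}: resources.{section}.{resource} is missing")

        context = container.get("securityContext", {})
        if context.get("allowPrivilegeEscalation") is not False:
            problems.append(f"{name}: allowPrivilegeEscalation must be false")
        if context.get("readOnlyRootFilesystem") is not True:
            problems.append(f"{name}: readOnlyRootFilesystem must be true")

        for probe_name in ("livenessProbe", "readinessProbe"):
            if probe_name not in container:
                problems.append(f"{name}: {probe_name} is missing")
    return problems
```

Returning a list of messages instead of raising on the first failure lets a CI job report every violation in one run.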

GitHub Actions Cache

yaml
- uses: actions/cache@v4
  with:
    path: ~/.cache/pip
    key: ${{ runner.os }}-pip-${{ hashFiles('**/requirements.txt') }}
    restore-keys: |
      ${{ runner.os }}-pip-
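
The key and restore-keys fields work together: the key changes whenever a requirements file changes, and the restore-keys prefix allows falling back to an older cache. The sketch below models that behavior in Python. It is an approximation for illustration only: `hash_files` follows the documented idea of hashing each file with SHA-256 and then hashing the results, but it is not guaranteed to produce the same string as `hashFiles`, and `resolve_cache` assumes `stored_keys` is ordered newest first. All function names are hypothetical.

```python
from __future__ import annotations

import hashlib
from pathlib import Path


def hash_files(paths: list[Path]) -> str:
    """Hash each file with SHA-256, then hash the concatenated digests."""
    outer = hashlib.sha256()
    for path in sorted(paths):
        outer.update(hashlib.sha256(path.read_bytes()).digest())
    return outer.hexdigest()


def cache_key(runner_os: str, paths: list[Path]) -> str:
    """Build a key shaped like '<os>-pip-<hash of requirements files>'."""
    return f"{runner_os}-pip-{hash_files(paths)}"


def resolve_cache(
    key: str, restore_keys: list[str], stored_keys: list[str]
) -> tuple[str | None, bool]:
    """Return (matched_key, exact_hit).

    An exact match on `key` wins. Otherwise each restore key is tried in
    order as a prefix, and the first stored key that matches is returned.
    """
    if key in stored_keys:
        return key, True
    for prefix in restore_keys:
        for stored in stored_keys:
            if stored.startswith(prefix):
                return stored, False
    return None, False
```

A partial hit restores an older pip cache, so only the packages that changed need to be downloaded again.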

Topics

| Topic | Reference File | Use When |
|---|---|---|
| Docker | references/docker.md | Writing Dockerfiles, optimizing builds, adding health checks |
| Kubernetes | references/kubernetes.md | Creating deployments, services, probes, resource limits |
| GitOps | references/gitops.md | Setting up ArgoCD, Kustomize, sync policies |
| CI/CD | references/ci-cd.md | Building GitHub Actions workflows, caching, secrets |
| Troubleshooting | references/troubleshooting.md | Debugging CI failures, version conflicts, cache issues |

Available Scripts

| Script | Usage | Description |
|---|---|---|
| scripts/check-dockerfile.sh | check-dockerfile.sh <file> | Validate Dockerfile best practices |
| scripts/validate-k8s-manifest.py | validate-k8s-manifest.py <file> | Check K8s manifest for required fields |

Critical Rules

Always

  • Use multi-stage builds to minimize image size
  • Run containers as non-root user (UID 1000)
  • Include health checks in all services
  • Pin specific image versions (no :latest)
  • Set resource requests AND limits
  • Use npm ci / pip-sync in CI (not npm install / pip install) so installs match the lockfile exactly
  • Commit lockfiles to version control
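
Several of these rules can be checked automatically against a Dockerfile. The following is a rough sketch of such a check in Python, covering multi-stage builds, pinned base images, a non-root user, and a health check. `lint_dockerfile` is a hypothetical function written for illustration and is not the logic of scripts/check-dockerfile.sh. It works line by line and assumes each FROM, USER, and HEALTHCHECK instruction starts on its own line.

```python
from __future__ import annotations


def _is_pinned(image: str) -> bool:
    """True if the image has a digest or an explicit tag other than 'latest'."""
    if "@sha256:" in image:
        return True
    last_segment = image.rsplit("/", 1)[-1]
    if ":" not in last_segment:
        return False
    return last_segment.split(":", 1)[1] not in ("", "latest")


def lint_dockerfile(text: str) -> list[str]:
    """Return a list of rule violations; an empty list means all checks passed."""
    problems = []
    instructions = [
        line.strip()
        for line in text.splitlines()
        if line.strip() and not line.strip().startswith("#")
    ]
    from_indexes = [
        i for i, line in enumerate(instructions) if line.upper().startswith("FROM ")
    ]
    if len(from_indexes) < 2:
        problems.append("use a multi-stage build (found fewer than two FROM lines)")

    # Check every base image, skipping references to earlier build stages.
    stage_names = set()
    for i in from_indexes:
        tokens = [t for t in instructions[i].split()[1:] if not t.startswith("--")]
        if not tokens:
            continue
        image = tokens[0]
        if image.lower() != "scratch" and image.lower() not in stage_names:
            if not _is_pinned(image):
                problems.append(f"pin a version for base image {image!r}")
        if len(tokens) >= 3 and tokens[1].upper() == "AS":
            stage_names.add(tokens[2].lower())

    # Only the final stage determines how the container runs.
    final_stage = instructions[from_indexes[-1] + 1:] if from_indexes else instructions
    users = [line.split(None, 1)[1] for line in final_stage if line.upper().startswith("USER ")]
    user = users[-1].split(":")[0] if users else "root"
    if user in ("root", "0"):
        problems.append("run the final stage as a non-root user")

    has_healthcheck = any(
        line.upper().startswith("HEALTHCHECK ") and line.split()[1].upper() != "NONE"
        for line in final_stage
    )
    if not has_healthcheck:
        problems.append("add a HEALTHCHECK instruction")
    return problems
```

A dedicated linter will catch far more than this; the point is only that the rules above are concrete enough to enforce in CI.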

Never

  • Commit secrets to version control
  • Use :latest tags in production
  • Skip security scanning in CI
  • Deploy without rollback capability
  • Store state in containers
  • Run as root in production

CI Failure Triage

code
CI Failed
+-- Same code passes locally?
|   +-- YES --> Check environment differences
|   |   +-- Python/Node version
|   |   +-- Environment variables
|   |   +-- File permissions
|   |   +-- Installed dependencies
|   +-- NO --> Fix the actual bug
+-- Flaky (sometimes passes)?
|   +-- Check for race conditions, shared state, timeouts
+-- Always fails in CI?
    +-- Check runner resources (memory, timeout)
    +-- Check external service access
    +-- Check CI-specific config
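
The tree above can also be written as a small function, which is handy when the triage steps need to be printed by a bot or a script. This is one possible reading of the tree, not a canonical one: it treats "fails locally too" as a plain bug, and otherwise combines the environment checks with either the flaky or the always-failing branch. The function name `triage` and its parameters are hypothetical.

```python
from __future__ import annotations

ENVIRONMENT_CHECKS = [
    "Python/Node version",
    "Environment variables",
    "File permissions",
    "Installed dependencies",
]
FLAKY_CHECKS = ["Race conditions", "Shared state", "Timeouts"]
ALWAYS_FAILS_CHECKS = [
    "Runner resources (memory, timeout)",
    "External service access",
    "CI-specific config",
]


def triage(passes_locally: bool, ci_runs: list[bool]) -> list[str]:
    """Suggest what to check next.

    passes_locally: whether the same code passes on a developer machine.
    ci_runs: outcomes of repeated CI runs of the same commit (True = passed).
    """
    if not passes_locally:
        return ["Fix the actual bug"]

    checks = list(ENVIRONMENT_CHECKS)
    if any(ci_runs) and not all(ci_runs):
        checks += FLAKY_CHECKS  # mixed results: sometimes passes
    elif ci_runs and not any(ci_runs):
        checks += ALWAYS_FAILS_CHECKS  # every run failed
    return checks
```

For example, `triage(True, [True, False, True])` lists the environment checks followed by the race-condition, shared-state, and timeout checks.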

Quick Diagnostics

bash
# Check local vs CI Python version
python --version

# Check installed package versions
pip freeze | grep -E "(pytest|mypy|black|ruff)"

# Check Node/npm versions
node --version && npm --version

# Compare lockfile changes
git diff origin/main -- package-lock.json requirements*.txt
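
When the versions printed locally and in CI need to be compared side by side, the two `pip freeze` outputs can be diffed programmatically. The sketch below assumes both outputs have been captured as text (for example, the CI output copied from the job log) and only considers `name==version` lines. `parse_freeze` and `diff_versions` are hypothetical helper names.

```python
from __future__ import annotations


def parse_freeze(text: str) -> dict[str, str]:
    """Parse `pip freeze` output into {package: version}.

    Lines without '==' (editable installs, direct URLs, comments) are skipped.
    """
    versions = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "==" not in line:
            continue
        name, version = line.split("==", 1)
        # Treat 'My_Package' and 'my-package' as the same distribution.
        versions[name.lower().replace("_", "-")] = version
    return versions


def diff_versions(local: str, ci: str) -> dict[str, tuple[str | None, str | None]]:
    """Return {package: (local_version, ci_version)} for every mismatch.

    None means the package is missing on that side.
    """
    local_versions = parse_freeze(local)
    ci_versions = parse_freeze(ci)
    names = sorted(local_versions.keys() | ci_versions.keys())
    return {
        name: (local_versions.get(name), ci_versions.get(name))
        for name in names
        if local_versions.get(name) != ci_versions.get(name)
    }
```

An empty result means the pinned packages agree, so the remaining suspects are the interpreter version, environment variables, or the runner itself.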