platform-engineering

具备平台工程的专业知识。在构建内部开发者平台、自助式基础设施、黄金路径、Backstage门户、Crossplane组合、ArgoCD配置、Tekton流水线，或实现多租户抽象时，可使用此功能。

SKILL.md

--- frontmatter

name: platform-engineering
description: >
  Platform engineering expertise. Use when building internal developer
  platforms, self-service infrastructure, golden paths, Backstage portals,
  Crossplane compositions, ArgoCD configurations, Tekton pipelines,
  or multi-tenancy abstractions.

Platform Engineering Principles

Golden Paths

•Provide opinionated defaults that cover 80% of use cases
•Make the right thing the easy thing — secure, observable, compliant by default
•Allow escape hatches for the 20% that need customisation
•Version golden paths and communicate breaking changes

Self-Service Abstractions

•Platform teams own Compositions/XRDs (Crossplane) or Helm library charts
•Application teams consume Claims/values files

•Hide infrastructure complexity behind simple interfaces:

yaml

# What the app team writes:
apiVersion: platform.example.com/v1alpha1
kind: Application
spec:
  name: my-service
  team: payments
  tier: production
  replicas: 3

•Validate inputs with CEL or webhook admission

GitOps

•ArgoCD or Flux for continuous delivery
•App-of-apps pattern for managing multiple services
•ApplicationSets for templated multi-cluster/multi-env deployment
•Separate repos: application code vs deployment manifests
•Promotion: dev → staging → production via PR-based promotion

CI/CD

•Tekton Pipelines for Kubernetes-native CI
•GitHub Actions for source-level CI (lint, test, build)
•Image builds: ko for Go, buildah for general containers
•Image signing with Sigstore/cosign
•SBOM generation with syft
•Vulnerability scanning with trivy or grype

Multi-Tenancy

•Namespace-per-team with ResourceQuotas and LimitRanges
•NetworkPolicies for namespace isolation
•RBAC: team-scoped Roles, platform-scoped ClusterRoles
•Hierarchical Namespaces (HNC) for delegation
•Capsule or vCluster for stronger isolation

Observability Stack

•Metrics: Prometheus + Thanos/Cortex for long-term storage
•Logs: OpenTelemetry Collector → Loki or CloudWatch/Azure Monitor
•Traces: OpenTelemetry SDK → Tempo or Jaeger
•Dashboards: Grafana with provisioned dashboards-as-code
•SLO monitoring: Pyrra or Sloth for SLI/SLO management

Developer Experience

•Backstage for service catalog and developer portal
•Scaffolder templates for new service creation
•TechDocs for documentation-as-code
•Score or similar for environment-agnostic workload specs