kubernetes-expert

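Provides expert-level assistance for Kubernetes, covering cluster architecture, workload management, networking, storage, security, and cloud deployments. Use when the user is working with Kubernetes, kubectl, Helm, Pods, Deployments, StatefulSets, DaemonSets, Services, Ingress, ConfigMaps, Secrets, PersistentVolumes, RBAC, ServiceAccounts, NetworkPolicies, namespaces, resource quotas, autoscaling (HPA/VPA/cluster), scheduling, taints and tolerations, affinity, node selectors, or any Kubernetes resource. Also triggers on mentions of EKS, GKE, AKS, kubeadm, minikube, kind, k3s, kustomize, container orchestration, pod design patterns (sidecar, init, ambassador), rolling updates, blue-green deployments, canary releases, liveness/readiness probes, or any Kubernetes API group.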

SKILL.md
--- frontmatter
name: kubernetes-expert
description: >
  Expert-level Kubernetes assistance covering cluster architecture, workload
  management, networking, storage, security, and cloud deployments. Use when
  the user is working with Kubernetes, kubectl, Helm, Pods, Deployments,
  StatefulSets, DaemonSets, Services, Ingress, ConfigMaps, Secrets,
  PersistentVolumes, RBAC, ServiceAccounts, NetworkPolicies, namespaces,
  resource quotas, autoscaling (HPA/VPA/cluster), scheduling, taints,
  tolerations, affinity, node selectors, or any Kubernetes resource. Also
  triggers on mentions of EKS, GKE, AKS, kubeadm, minikube, kind, k3s,
  kustomize, container orchestration, pod design patterns (sidecar, init,
  ambassador), rolling updates, blue-green deployments, canary releases,
  liveness/readiness probes, or any Kubernetes API group.

Kubernetes Expert

Architecture Overview

code
┌─────────────────────────────────────────────────────────────┐
│                       Control Plane                          │
│  ┌────────────┐ ┌──────────────┐ ┌────────────────────────┐│
│  │ kube-api-  │ │    etcd      │ │  kube-controller-      ││
│  │ server     │ │  (key-value  │ │  manager               ││
│  │            │ │   store)     │ │  (replicas, endpoints,  ││
│  │            │ │              │ │   nodes, SA, tokens)    ││
│  └─────┬──────┘ └──────────────┘ └────────────────────────┘│
│        │        ┌──────────────┐ ┌────────────────────────┐│
│        │        │ kube-        │ │  cloud-controller-     ││
│        │        │ scheduler    │ │  manager (optional)    ││
│        │        └──────────────┘ └────────────────────────┘│
└────────┼────────────────────────────────────────────────────┘
         │ API (REST over HTTPS)
         ▼
┌─────────────────────────────────────────────────────────────┐
│                       Worker Nodes                           │
│  ┌──────────┐  ┌──────────────┐  ┌─────────────────┐       │
│  │ kubelet   │  │ kube-proxy   │  │ Container       │       │
│  │ (node     │  │ (Service     │  │ Runtime         │       │
│  │  agent)   │  │  networking) │  │ (containerd/    │       │
│  │           │  │              │  │  CRI-O)         │       │
│  └──────────┘  └──────────────┘  └─────────────────┘       │
│  ┌──────────────────────────────────────────────────┐       │
│  │ Pods   [container(s)] [container(s)] [...]       │       │
│  └──────────────────────────────────────────────────┘       │
└─────────────────────────────────────────────────────────────┘

Control Plane components:

  • kube-apiserver — REST API gateway; all cluster communication goes through it
  • etcd — distributed key-value store; single source of truth for cluster state
  • kube-scheduler — assigns Pods to Nodes based on constraints, affinity, resources
  • kube-controller-manager — runs controllers (Deployment, ReplicaSet, Node, Job, etc.)
  • cloud-controller-manager — integrates with cloud provider APIs (load balancers, routes, volumes)

Node components:

  • kubelet — ensures containers in Pods are running per the PodSpec
  • kube-proxy — maintains network rules for Service abstraction (iptables/IPVS)
  • Container runtime — runs containers (containerd is the standard)

Core Resource Quick Reference

| Resource | API Group | Short Name | Purpose |
|---|---|---|---|
| Pod | core/v1 | po | Smallest deployable unit |
| Deployment | apps/v1 | deploy | Declarative stateless app management |
| ReplicaSet | apps/v1 | rs | Ensures N pod replicas |
| StatefulSet | apps/v1 | sts | Stateful app with stable identity |
| DaemonSet | apps/v1 | ds | One pod per node |
| Job | batch/v1 | | Run-to-completion task |
| CronJob | batch/v1 | cj | Scheduled Jobs |
| Service | core/v1 | svc | Stable network endpoint |
| Ingress | networking.k8s.io/v1 | ing | HTTP/HTTPS routing |
| ConfigMap | core/v1 | cm | Non-sensitive configuration |
| Secret | core/v1 | | Sensitive data (base64) |
| PersistentVolumeClaim | core/v1 | pvc | Request for storage |
| PersistentVolume | core/v1 | pv | Cluster storage resource |
| Namespace | core/v1 | ns | Virtual cluster partition |
| ServiceAccount | core/v1 | sa | Pod identity for API access |
| Role / ClusterRole | rbac.authorization.k8s.io/v1 | | Permission rules |
| RoleBinding / ClusterRoleBinding | rbac.authorization.k8s.io/v1 | | Bind roles to subjects |
| NetworkPolicy | networking.k8s.io/v1 | netpol | Pod-level firewall rules |
| HorizontalPodAutoscaler | autoscaling/v2 | hpa | Scale pods by metrics |
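
To show how the API group column maps onto a manifest's `apiVersion` field, here is a minimal sketch pairing a Deployment with a Service. The names (`web`), the label `app: web`, and the `nginx:1.27` image are assumptions chosen for illustration. Note that core-group resources are written as plain `v1` in a manifest.

```yaml
# Illustrative only: names, labels, and image are assumptions.
apiVersion: apps/v1            # Deployment belongs to the apps group
kind: Deployment
metadata:
  name: web
  labels:
    app: web
spec:
  replicas: 3
  selector:
    matchLabels:
      app: web                 # must match the Pod template labels below
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
        - name: web
          image: nginx:1.27    # assumed image; substitute your own
          ports:
            - containerPort: 80
---
apiVersion: v1                 # core group resources use plain "v1"
kind: Service
metadata:
  name: web
spec:
  selector:
    app: web                   # selects the Pods created by the Deployment
  ports:
    - port: 80
      targetPort: 80
```

Both objects can live in one file separated by `---` and be applied together with `kubectl apply -f`.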

Essential kubectl Commands

bash
# Cluster info
kubectl cluster-info
kubectl get nodes -o wide
kubectl api-resources                    # list all resource types

# Common operations
kubectl get <resource> -n <ns>           # list resources
kubectl describe <resource> <name>       # detailed info
kubectl logs <pod> [-c container]        # container logs
kubectl logs <pod> --previous            # logs from crashed container
kubectl exec -it <pod> -- /bin/sh        # shell into container
kubectl port-forward <pod> 8080:80       # local port forward

# Apply and manage
kubectl apply -f manifest.yaml           # declarative create/update
kubectl delete -f manifest.yaml          # delete from manifest
kubectl diff -f manifest.yaml            # preview changes

# Debugging
kubectl get events --sort-by='.lastTimestamp'
kubectl top pods                         # pod resource usage (requires metrics-server)
kubectl top nodes                        # node resource usage (requires metrics-server)
kubectl run debug --image=busybox -it --rm -- sh  # throwaway debug pod, deleted on exit

# Context management
kubectl config get-contexts
kubectl config use-context <name>
kubectl config set-context --current --namespace=<ns>

Pod Lifecycle

code
Pending → ContainerCreating → Running → Succeeded/Failed
                                  ↓
                            Terminating → Terminated

Restart policies: Always (the Pod default, and the only value a Deployment accepts), OnFailure or Never (the two values a Job accepts)
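
As a sketch of where the restart policy sits in a manifest, the Job below sets `restartPolicy: OnFailure` in its Pod template. The Job name, the `busybox:1.36` image, and the command are assumptions for illustration.

```yaml
# Illustrative only: name, image, and command are assumptions.
apiVersion: batch/v1
kind: Job
metadata:
  name: example-task
spec:
  backoffLimit: 4                 # retry budget before the Job is marked failed
  template:
    spec:
      restartPolicy: OnFailure    # a Job accepts OnFailure or Never, not Always
      containers:
        - name: task
          image: busybox:1.36
          command: ["sh", "-c", "echo processing && sleep 5"]
```

With `OnFailure`, a failing container is restarted in place inside the same Pod. With `Never`, the Job controller creates a replacement Pod instead.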

Probes:

| Probe | Purpose | Failure Action |
|---|---|---|
| startupProbe | App finished starting? | Kill + restart container |
| livenessProbe | App still healthy? | Kill + restart container |
| readinessProbe | Ready to receive traffic? | Remove from Service endpoints |

yaml
# Probe example
livenessProbe:
  httpGet:
    path: /healthz
    port: 8080
  initialDelaySeconds: 10
  periodSeconds: 15
  failureThreshold: 3
readinessProbe:
  httpGet:
    path: /ready
    port: 8080
  periodSeconds: 5
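
The example above covers liveness and readiness only. For a slow-starting app, a startup probe can be added next to them. This is a minimal sketch: the `/healthz` path and port `8080` are carried over from the example above, and the timing values are assumptions to tune per application.

```yaml
# Illustrative timings; tune periodSeconds and failureThreshold per app.
startupProbe:
  httpGet:
    path: /healthz
    port: 8080
  periodSeconds: 10
  failureThreshold: 30     # up to 30 x 10s = 300s allowed for startup
livenessProbe:
  httpGet:
    path: /healthz
    port: 8080
  periodSeconds: 15
  failureThreshold: 3      # only evaluated once the startup probe has succeeded
```

Liveness and readiness checks are held back until the startup probe succeeds, so `initialDelaySeconds` on the liveness probe is no longer needed to cover slow boots.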

Service Types

| Type | Description | Use Case |
|---|---|---|
| ClusterIP | Internal only (default) | Service-to-service |
| NodePort | Expose on each node's IP:port (30000-32767) | Dev/testing |
| LoadBalancer | Cloud provider external LB | Production external access |
| ExternalName | CNAME to external DNS | External service alias |
| Headless (clusterIP: None) | No cluster IP, returns pod IPs | StatefulSet discovery |
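
Two of the types above differ from the default only by a field or two, as this sketch shows. The Service names, selectors, and port numbers are assumptions for illustration.

```yaml
# Illustrative only: names, selectors, and ports are assumptions.
apiVersion: v1
kind: Service
metadata:
  name: db-headless
spec:
  clusterIP: None            # headless: DNS returns the Pod IPs directly
  selector:
    app: db
  ports:
    - port: 5432
---
apiVersion: v1
kind: Service
metadata:
  name: web-nodeport
spec:
  type: NodePort
  selector:
    app: web
  ports:
    - port: 80               # Service port inside the cluster
      targetPort: 8080       # container port on the selected Pods
      nodePort: 30080        # must fall within the default 30000-32767 range
```

Omitting `nodePort` lets Kubernetes pick a free port from the range, which avoids collisions between Services.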

Resource Requests and Limits

yaml
resources:
  requests:          # scheduling guarantee (reserved)
    cpu: "250m"      # 0.25 CPU cores
    memory: "128Mi"
  limits:            # hard ceiling
    cpu: "500m"      # throttled if exceeded
    memory: "256Mi"  # OOMKilled if exceeded

CPU: 1 CPU = 1000m (millicores). Requests affect scheduling; limits cause throttling.

Memory: Requests affect scheduling; exceeding limits triggers OOMKill.
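
For context, the `resources` block is set per container, and the scheduler works from the sum of the container requests in a Pod. The sketch below is a hypothetical two-container Pod; the Pod name, images, and the `log-shipper` sidecar are assumptions for illustration.

```yaml
# Illustrative only: Pod name, images, and the sidecar are assumptions.
apiVersion: v1
kind: Pod
metadata:
  name: resource-demo
spec:
  containers:
    - name: app
      image: nginx:1.27
      resources:
        requests:
          cpu: "250m"
          memory: "128Mi"
        limits:
          cpu: "500m"
          memory: "256Mi"
    - name: log-shipper
      image: busybox:1.36
      command: ["sh", "-c", "tail -f /dev/null"]
      resources:
        requests:
          cpu: "100m"
          memory: "64Mi"
        limits:
          cpu: "100m"
          memory: "64Mi"
# Summed requests: 250m + 100m = 350m CPU, 128Mi + 64Mi = 192Mi memory.
# A node needs at least that much unreserved capacity to host this Pod.
```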

Reference Documents

Load these as needed based on the specific topic:

| Topic | File | When to read |
|---|---|---|
| Introduction | references/introduction.md | Kubernetes history, container orchestration concepts, monoliths vs microservices, why Kubernetes (Ch 1) |
| Architecture | references/architecture.md | Control plane internals, etcd, API server, scheduler, controller manager, kubelet, kube-proxy, container runtime, pod creation flow (Ch 2) |
| Cluster Setup | references/cluster-setup.md | Installing clusters with minikube, kind, kubeadm, k3s; kubectl configuration, context management (Ch 3) |
| Pods | references/pods.md | Pod spec, multi-container patterns (sidecar, init, ambassador), pod lifecycle, resource limits, probes, restart policies (Ch 4-5) |
| Namespaces | references/namespaces.md | Namespace management, ResourceQuotas, LimitRanges, cross-namespace communication, multi-tenancy (Ch 6) |
| Configuration | references/configuration.md | ConfigMaps, Secrets, environment variables, volume mounts, immutable configs, secret types, external secret management (Ch 7) |
| Services | references/services.md | ClusterIP, NodePort, LoadBalancer, ExternalName, headless services, service discovery, DNS, endpoints, session affinity (Ch 8) |
| Storage | references/storage.md | Volumes, PersistentVolumes, PersistentVolumeClaims, StorageClasses, dynamic provisioning, access modes, volume types (Ch 9) |
| Workloads | references/workloads.md | ReplicaSets, Jobs, CronJobs, production workload patterns, labels, selectors, annotations (Ch 10) |
| Deployments | references/deployments.md | Deployment strategies (rolling update, recreate), rollbacks, scaling, revision history, canary patterns, blue-green (Ch 11) |
| StatefulSets | references/statefulsets.md | Stable network identity, ordered deployment/scaling, volumeClaimTemplates, headless service pairing, update strategies (Ch 12) |
| DaemonSets | references/daemonsets.md | Per-node pod scheduling, node selectors, tolerations, update strategies, use cases (logging, monitoring, networking) (Ch 13) |
| Helm & Operators | references/helm.md | Helm chart structure, templates, values, repositories, install/upgrade/rollback, creating charts, Kubernetes Operators, Operator SDK (Ch 14) |
| Cloud Providers | references/cloud-providers.md | GKE, EKS, AKS setup and management, managed vs self-managed node pools, cloud-specific features, IAM integration (Ch 15-17) |
| Security | references/auth.md | Authentication, RBAC, Roles, ClusterRoles, RoleBindings, ServiceAccounts, NetworkPolicies, PodSecurityStandards, security contexts (Ch 18) |
| Scheduling | references/scheduling.md | Node selectors, affinity/anti-affinity, taints and tolerations, topology spread, pod priority and preemption, custom schedulers (Ch 19) |
| Autoscaling | references/autoscaling.md | HPA, VPA, cluster autoscaler, metrics-server, custom metrics, scaling policies, behavior configuration (Ch 20) |
| Traffic & Advanced | references/ingress.md | Ingress controllers, Ingress resources, TLS, Gateway API, traffic management, multi-cluster strategies, service mesh concepts (Ch 21) |