AgentSkillsCN

flux-troubleshooting

适用于当 Flux 资源显示“Ready False”,日志中出现协调错误,部署无法从 Git 同步,HelmRelease 安装失败,源代码制品未能获取,或镜像自动化未能更新标签时使用。

SKILL.md
--- frontmatter
name: flux-troubleshooting
description: Use when Flux resources show Ready False, reconciliation errors appear in logs, deployments fail to sync from Git, HelmRelease installations fail, source artifacts are not being fetched, or image automation is not updating tags

Flux CD Troubleshooting

Diagnose and resolve Flux CD reconciliation failures and errors in GitOps environments.

Keywords

flux, fluxcd, troubleshooting, debug, error, failed, failure, reconciliation, kustomization, helmrelease, source, gitrepository, ocirepository, artifact, health check, diagnose, resolve, fix

When to Use This Skill

  • Flux resources show Ready: False status
  • Reconciliation errors appear in logs
  • Deployments are not syncing from Git
  • HelmRelease installations are failing
  • Source artifacts are not being fetched
  • Image automation is not updating tags

Related Skills

Quick Reference

TaskCommand
Check Flux healthflux check
View all errorsflux logs -A --level=error
Get all statusflux get all -A
Warning eventskubectl get events -n flux-system --field-selector type=Warning

Diagnostic Workflow

code
Is the resource Ready?
├─ Yes → Check if correct version/revision deployed
└─ No → Is the source Ready?
    ├─ Yes → Check Kustomization/HelmRelease logs
    │   ├─ dry-run failed → Fix YAML syntax in Git
    │   ├─ health check timeout → Check pod logs/resources
    │   └─ dependency not ready → Fix parent first
    └─ No → Check source credentials/connectivity
        ├─ authentication failed → Update deploy key/secret
        ├─ checkout failed → Verify branch/tag exists
        └─ artifact not found → Check repository URL

Diagnostic Commands

1. Check Controller Health

bash
flux check

2. View Error Logs

bash
flux logs --all-namespaces --level=error

3. Get Warning Events

bash
kubectl get events -n flux-system --field-selector type=Warning

4. Inspect Resource Status

bash
flux get kustomizations -A
flux get helmreleases -A
flux get sources all -A

5. Controller-Specific Logs

bash
kubectl logs -n flux-system deploy/source-controller
kubectl logs -n flux-system deploy/kustomize-controller
kubectl logs -n flux-system deploy/helm-controller
kubectl logs -n flux-system deploy/notification-controller
kubectl logs -n flux-system deploy/image-reflector-controller
kubectl logs -n flux-system deploy/image-automation-controller

Error Pattern Reference

Error PatternCauseGitOps Solution
failed to checkoutGit authentication or networkVerify deploy keys/credentials
dry-run failed: InvalidInvalid manifest YAMLFix syntax in Git repository
health check timeoutPods not becoming readyCheck resource limits, images
dependency not readyParent Kustomization failedFix upstream dependency first
artifact not foundSource not syncedCheck source status, reconcile
Unsupported valueInvalid field valueCorrect the value in Git
UNAUTHORIZEDRegistry auth failedCheck imagePullSecrets
MANIFEST_UNKNOWNOCI tag doesn't existVerify tag in registry

OCI Repository Troubleshooting

Common OCI Issues

bash
flux get sources oci -A
kubectl logs -n flux-system deploy/source-controller | grep -i oci
kubectl get secret -n flux-system flux-system -o jsonpath='{.data.\.dockerconfigjson}' | base64 -d

OCI Error Patterns

ErrorCauseFix
UNAUTHORIZEDInvalid credentialsUpdate docker-registry secret
MANIFEST_UNKNOWNTag not foundCheck tag exists in registry
DENIEDPermission deniedCheck registry permissions
NAME_UNKNOWNRepository not foundVerify repository path

Image Automation Troubleshooting

Check Image Policies

bash
flux get images policy -A
flux get images repository -A
flux get images update -A

Image Automation Errors

ErrorCauseFix
no new images foundFilter too restrictiveAdjust semver/regex pattern
access deniedRegistry authCheck image pull secrets
failed to pushGit write accessCheck deploy key has write permission

Webhook/Notification Debugging

bash
kubectl logs -n flux-system deploy/notification-controller
flux get alerts -A
flux get alert-providers -A
kubectl logs -n flux-system deploy/notification-controller | grep -i receiver

Structured Log Analysis

json
{
  "level": "error",
  "ts": "2024-01-15T09:36:41.286Z",
  "controllerGroup": "kustomize.toolkit.fluxcd.io",
  "controllerKind": "Kustomization",
  "name": "resource-name",
  "namespace": "namespace",
  "msg": "Reconciliation failed after 2s, next try in 5m0s",
  "revision": "main@sha1:abc123",
  "error": "specific error message"
}

GitOps Principles for Troubleshooting

  1. Never fix directly in cluster - Identify root cause, fix in Git
  2. Suspend before debugging - Prevent Flux from reverting test changes
  3. Check the full dependency chain - Infrastructure before apps
  4. Verify source availability - Git repos and registries must be accessible
  5. Review recent commits - Issues often correlate with recent changes

MCP Tools Available

When the Flux MCP server is connected:

  • mcp__flux-operator-mcp__get_flux_instance - Get Flux installation status
  • mcp__flux-operator-mcp__get_kubernetes_logs - Get pod logs
  • mcp__flux-operator-mcp__get_kubernetes_resources - Query resources
  • mcp__flux-operator-mcp__search_flux_docs - Search Flux documentation