
ci-pipeline-operations

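Use when debugging CI failures, understanding the build pipeline, modifying GitHub Actions workflows, handling artifact caching, or troubleshooting why a build succeeds locally but fails in CI.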

SKILL.md
---
name: ci-pipeline-operations
description: Use when debugging CI failures, understanding the build pipeline, modifying the GitHub Actions workflow, working with artifact caching, or troubleshooting why a build succeeded locally but fails in CI
---

CI Pipeline Operations

Overview

The CI pipeline (.github/workflows/build-egg.yml) builds the Bluefin OCI image inside the bst2 container on GitHub Actions, validates it with bootc container lint, and, on pushes to main, pushes it to GHCR. Remote caching has two tiers on top of the local CAS: the GNOME upstream CAS (read-only) and the project's R2 cache (read-write via a bazel-remote proxy).

Quick Reference

| What | Value |
|---|---|
| Workflow file | `.github/workflows/build-egg.yml` |
| Runner | `ubuntu-24.04` |
| Build target | `oci/bluefin.bst` |
| Build timeout | 120 minutes |
| bst2 container | `registry.gitlab.com/.../bst2:<sha>` (pinned in workflow `env.BST2_IMAGE`) |
| GNOME CAS endpoint | `gbm.gnome.org:11003` (gRPC, read-only) |
| R2 cache proxy (gRPC) | `localhost:9092` |
| R2 cache proxy (HTTP status) | `localhost:8080` |
| R2 bucket | `bst-cache` |
| Published image | `ghcr.io/projectbluefin/egg:latest` and `:$SHA` |
| Build logs artifact | `buildstream-logs` (7-day retention) |
| Cache proxy logs artifact | `bazel-remote-logs` (7-day retention) |

Workflow Steps

| # | Step | What it does | Notes |
|---|---|---|---|
| 1 | Free disk space | Removes pre-installed SDKs | Critical: builds need >50 GB; the runner starts with ~30 GB free |
| 2 | Checkout | Clones the repo | Standard |
| 3 | Pull bst2 image | `podman pull` of the pinned bst2 container | Same image as GNOME upstream CI |
| 4 | Cache BST sources | `actions/cache` for `~/.cache/buildstream/sources` | Key: hash of `elements/**/*.bst` + `project.conf` |
| 5 | Disk space before | `df -h /` | Diagnostic |
| 6 | Prepare cache dir | `mkdir -p ~/.cache/buildstream/sources` | Ensures cache restore has a target |
| 7 | Start cache proxy | Downloads and verifies bazel-remote v2.6.1, starts it as a background daemon | Skipped if R2 secrets are missing |
| 8 | Generate BST config | Writes `buildstream-ci.conf` with CI-tuned settings | Adds the R2 remote only if the proxy is running |
| 9 | Seed R2 from upstream | Pulls artifacts from GNOME CAS, pushes them to R2 | Non-fatal; accumulates upstream artifacts in R2 over time |
| 10 | Build | `bst build oci/bluefin.bst` inside the bst2 container | `--privileged --device /dev/fuse`; `--network=host` only if the proxy is running |
| 11 | Push artifacts to R2 | `bst artifact push --deps all` | Non-fatal safety net; ensures all artifacts reach R2 |
| 12 | Cache proxy stats | Logs proxy status + last 50 lines of the proxy log | Diagnostic |
| 13 | Disk space after | `df -h /` | Diagnostic |
| 14 | Export OCI image | `bst artifact checkout --tar - \| podman load` | Streams directly; no intermediate tar file on disk |
| 15 | Verify image loaded | `podman images` | Diagnostic |
| 16 | bootc lint | `bootc container lint` on the exported image | Validates ostree structure, no `/usr/etc`, valid bootc metadata |
| 17 | Upload build logs | `actions/upload-artifact` | Always runs, even on failure |
| 18 | Upload proxy logs | `actions/upload-artifact` | Always runs |
| 19 | Stop cache proxy | Kills the bazel-remote process | Always runs |
| 20 | Login to GHCR | `podman login` with `GITHUB_TOKEN` | Main only |
| 21 | Tag for GHCR | Tags as `:latest` and `:$SHA` | Main only |
| 22 | Push to GHCR | `podman push --retry 3` for both tags | Main only |
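Step 14 avoids writing the image tarball to disk by piping the checkout straight into `podman load`. The same producer/consumer pattern can be sketched in Python with `subprocess`; `stream_pipe` is a hypothetical helper, and it assumes both commands run on the host (in the workflow, bst itself runs inside the bst2 container, so the real producer command is wrapped in `podman run`).

```python
import subprocess


def stream_pipe(producer: list[str], consumer: list[str]) -> int:
    """Pipe producer's stdout into consumer's stdin with no temp file.

    Returns 0 only if both processes exit successfully; otherwise the
    producer's non-zero code wins, then the consumer's.
    """
    prod = subprocess.Popen(producer, stdout=subprocess.PIPE)
    try:
        cons = subprocess.Popen(consumer, stdin=prod.stdout)
    finally:
        # Close our copy of the read end so the producer receives
        # SIGPIPE if the consumer exits early, instead of blocking.
        prod.stdout.close()
    cons_rc = cons.wait()
    prod_rc = prod.wait()
    return prod_rc or cons_rc
```

For step 14 the call would be something like `stream_pipe(["bst", "artifact", "checkout", "--tar", "-", "oci/bluefin.bst"], ["podman", "load"])`. The argument order is an assumption to check against the workflow. Peak disk usage stays at one copy of the image (in podman's storage) rather than two.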

CI BuildStream Config

Generated as buildstream-ci.conf at step 8. Values and rationale:

| Setting | Value | Why |
|---|---|---|
| `on-error` | `continue` | Find ALL failures in one run, not just the first |
| `fetchers` | `12` | Parallel downloads from artifact caches |
| `builders` | `1` | GHA has 4 vCPUs; conservative to avoid OOM |
| `network-retries` | `3` | Retry transient network failures |
| `retry-failed` | `True` | Auto-retry flaky builds |
| `error-lines` | `80` | Generous error context in logs |
| `cache-buildtrees` | `never` | Save disk; only final artifacts matter |
| `max-jobs` | `0` | Let BuildStream auto-detect (uses `nproc`) |
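Step 8 can be sketched in Python that renders the settings above as a BuildStream user configuration and appends the R2 remote only when the proxy is up. The section each key sits under (`scheduler`, `build`, `logging`, `cache`) follows BuildStream 2's default user configuration as we understand it. `render_ci_config` is a hypothetical helper, not the workflow's actual step, so check both against the real script.

```python
def render_ci_config(proxy_running: bool, proxy_port: int = 9092) -> str:
    """Return the text of a buildstream-ci.conf-style user config."""
    lines = [
        "scheduler:",
        "  on-error: continue       # collect every failure in one run",
        "  fetchers: 12",
        "  builders: 1              # 4 vCPUs on the runner; avoid OOM",
        "  network-retries: 3",
        "",
        "build:",
        "  max-jobs: 0              # let BuildStream pick the job count",
        "  retry-failed: True",
        "",
        "logging:",
        "  error-lines: 80",
        "",
        "cache:",
        "  cache-buildtrees: never  # keep only final artifacts",
    ]
    if proxy_running:
        # Reference the R2 proxy only when it is actually listening.
        # type: storage is required because bazel-remote is CAS-only.
        lines += [
            "",
            "artifacts:",
            "  servers:",
            f'    - url: "grpc://localhost:{proxy_port}"',
            "      type: storage",
            "      push: true",
        ]
    return "\n".join(lines) + "\n"
```

Writing `render_ci_config(proxy_running=True)` to `buildstream-ci.conf` and passing it with `bst --config buildstream-ci.conf ...` would give the with-R2 configuration. With `proxy_running=False`, the file contains only the tuning settings, and BuildStream uses just the remotes from `project.conf`.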

Caching Architecture

Three cache layers are checked in order; if all three miss, the element is built from source:

```
1. Local CAS (~/.cache/buildstream/)
   |-- miss -->
2. R2 cache (grpc://localhost:9092 -> Cloudflare R2)
   |-- miss -->
3. GNOME upstream CAS (https://gbm.gnome.org:11003)
   |-- miss -->
4. Build from source
```
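The fall-through order can be modeled as a first-hit search over an ordered list of layers. This is a toy Python model, not BuildStream's implementation. Each layer is a hypothetical mapping from cache key to artifact, and missing in every layer triggers a build.

```python
from typing import Callable, Mapping, Sequence


def resolve_artifact(
    key: str,
    layers: Sequence[tuple[str, Mapping[str, bytes]]],
    build: Callable[[str], bytes],
) -> tuple[str, bytes]:
    """Return (layer_name, artifact) from the first layer holding key."""
    for name, store in layers:
        if key in store:
            return name, store[key]
    # Every cache missed: fall back to building from source.
    return "build", build(key)
```

Dropping the R2 entry from `layers` models a run without R2 secrets: lookups go straight from the local CAS to the GNOME upstream CAS, so anything Bluefin-specific is rebuilt.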

Layer Details

| Layer | Configured in | Read | Write | Contains |
|---|---|---|---|---|
| Local CAS | Automatic | Always | Always | Everything built/fetched this run |
| R2 cache | `buildstream-ci.conf` (added dynamically) | When proxy running | When proxy running | Bluefin-specific + seeded upstream artifacts |
| GNOME upstream | `project.conf` `artifacts:` section | Always | Never | freedesktop-sdk + gnome-build-meta artifacts |
| Source cache | `project.conf` `source-caches:` + `actions/cache` | Always | Always (local) | Upstream tarballs, git repos |

bazel-remote Bridge

BuildStream speaks gRPC CAS. Cloudflare R2 speaks S3. bazel-remote v2.6.1 bridges them.

| Setting | Value |
|---|---|
| Binary | Downloaded from GitHub releases, SHA256-verified |
| gRPC port | `9092` (env: `CACHE_GRPC_PORT`) |
| HTTP port | `8080` (env: `CACHE_HTTP_PORT`) |
| Local disk cache | `/tmp/bazel-remote-cache` (5 GB max) |
| S3 prefix | `cas` |
| Health check | `curl http://localhost:8080/status` (30s timeout) |
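The 30-second health check can be sketched with the Python standard library alone. As the table indicates, the sketch treats an HTTP 200 from bazel-remote's `/status` endpoint as "ready". `wait_for_proxy` and its one-second polling interval are hypothetical and not taken from the workflow.

```python
import time
import urllib.request


def wait_for_proxy(
    url: str = "http://localhost:8080/status",
    timeout: float = 30.0,
    interval: float = 1.0,
) -> bool:
    """Poll url until it answers HTTP 200 or timeout seconds elapse."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=interval) as resp:
                if resp.status == 200:
                    return True
        except OSError:
            # Connection refused, HTTP error status, or socket timeout:
            # the proxy is not ready yet, so wait and try again.
            pass
        time.sleep(interval)
    return False
```

Step 7 would treat a `False` result as "proxy not running". That keeps the R2 remote out of `buildstream-ci.conf` and drops `--network=host` from the build container.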

The type: storage Trap

The R2 remote in buildstream-ci.conf MUST include type: storage:

```yaml
artifacts:
  servers:
    - url: "grpc://localhost:9092"
      type: storage
      push: true
```

bazel-remote implements only CAS (Content Addressable Storage), not the Remote Asset API. Without type: storage, BuildStream silently ignores the remote entirely. The type: storage flag tells BuildStream to talk to it as a pure CAS remote.
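A cheap guard against this trap is to check the generated config before building. The sketch below assumes the config has already been parsed into a Python dict (for example by a YAML loader, which Python's standard library does not include). It flags every `artifacts.servers` entry that lacks `type: storage`. In `buildstream-ci.conf` the R2 proxy is the only such entry, so any hit is the remote that would be silently ignored. `find_untyped_remotes` is a hypothetical helper.

```python
from typing import Any


def find_untyped_remotes(config: dict[str, Any]) -> list[str]:
    """Return URLs of artifact servers that lack `type: storage`."""
    servers = config.get("artifacts", {}).get("servers", [])
    return [
        server.get("url", "<no url>")
        for server in servers
        if server.get("type") != "storage"
    ]
```

A non-empty result should fail the config step outright. That is cheaper than a full build that quietly never touches R2.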

PR vs Main Differences

| Behavior | PR | Main push |
|---|---|---|
| Build runs? | Yes | Yes |
| bootc lint? | Yes | Yes |
| R2 cache read | Yes (if secrets available) | Yes |
| R2 cache write | Yes (if secrets available) | Yes |
| Fork PR gets R2 secrets? | No; GitHub doesn't expose secrets to forks | N/A |
| Push to GHCR? | No | Yes |
| Concurrency | Grouped by branch; new pushes cancel stale runs | Grouped by SHA; every push runs |
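The PR/main split can be summarized as a small decision function. This Python sketch is a hypothetical model of the table, not the workflow's actual `if:` and `concurrency:` expressions, and the group-name format is an assumption. Build and bootc lint run in both cases, so the sketch omits them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RunPlan:
    concurrency_group: str
    cancel_in_progress: bool
    push_to_ghcr: bool


def plan_run(event_name: str, branch: str, sha: str) -> RunPlan:
    """Model how a PR run differs from a push to main."""
    if event_name == "pull_request":
        # One group per branch: a new push supersedes the stale run.
        return RunPlan(f"build-egg-{branch}", True, False)
    is_main = event_name == "push" and branch == "main"
    # One group per commit, so every push to main runs to completion.
    return RunPlan(f"build-egg-{sha}", False, is_main)
```

Keying main runs by SHA, not by branch, means no published commit is ever skipped. The cost is that bursts of pushes to main each consume a full build.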

Secrets and Permissions

| Secret | Required? | Purpose |
|---|---|---|
| `R2_ACCESS_KEY` | Optional | Cloudflare R2 access key ID |
| `R2_SECRET_KEY` | Optional | Cloudflare R2 secret access key |
| `R2_ENDPOINT` | Optional | R2 S3-compatible endpoint (`https://<ACCOUNT_ID>.r2.cloudflarestorage.com`) |
| `GITHUB_TOKEN` | Auto-provided | GHCR login (main branch push only) |

All R2 secrets are optional. If missing, the cache proxy is skipped and the build proceeds using only GNOME upstream CAS + local CAS. The build works without R2 -- it just takes longer.

Job permissions: contents: read, packages: write.
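GitHub Actions evaluates a secret that is not available (including on fork PRs) to an empty string. The proxy gate therefore reduces to "all three R2 values are non-empty". The Python sketch below models that gate and the cache layers that result. The helper names are hypothetical, and it assumes the secrets are exposed as environment variables with the same names.

```python
from typing import Mapping

R2_VARS = ("R2_ACCESS_KEY", "R2_SECRET_KEY", "R2_ENDPOINT")


def r2_configured(env: Mapping[str, str]) -> bool:
    """True only if every R2 secret is present and non-empty."""
    return all(env.get(name, "").strip() for name in R2_VARS)


def active_cache_layers(env: Mapping[str, str]) -> list[str]:
    """Cache layers a run can use, in lookup order."""
    layers = ["local CAS"]
    if r2_configured(env):
        layers.append("R2 via bazel-remote")
    layers.append("GNOME upstream CAS")
    return layers
```

Requiring all three values, not just one, avoids starting a proxy that would fail S3 authentication and only surface the problem in `bazel-remote-logs`.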

bst2 Container Configuration

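The bst2 container runs via podman run (NOT as a GitHub Actions job container:), because the disk-space reclamation step needs host filesystem access. With a job-level container:, every step, including that cleanup, would run inside the container.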

| Flag | Why |
|---|---|
| `--privileged` | Required for bubblewrap sandboxing inside BuildStream |
| `--device /dev/fuse` | Required for buildbox-fuse (ext4 on GHA lacks reflinks) |
| `--network=host` | Only when the cache proxy is running; lets the container reach `localhost:9092` |
| `-v workspace:/src:rw` | Mounts the repo into the container |
| `-v ~/.cache/buildstream:...:rw` | Persists CAS across steps |
| `ulimit -n 1048576` | buildbox-casd needs many file descriptors |
| `--no-interactive` | Prevents blocking on prompts in CI |
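The flags above can be assembled programmatically. This Python sketch is illustrative, not the workflow's command. The container-side cache path is elided in the table, so it is a required parameter here. The `-w /src` working directory, and raising the file-descriptor limit with `ulimit` inside a `bash -c` wrapper (which assumes bash exists in the image), are also assumptions. `bst2_run_argv` is hypothetical.

```python
import shlex


def bst2_run_argv(
    image: str,
    workspace: str,
    host_cache: str,
    container_cache: str,
    proxy_running: bool,
    bst_args: list[str],
) -> list[str]:
    """Build a `podman run` argv that invokes bst inside the bst2 image."""
    argv = [
        "podman", "run",
        "--privileged",             # bubblewrap sandboxing
        "--device", "/dev/fuse",    # buildbox-fuse
        "-v", f"{workspace}:/src:rw",
        "-v", f"{host_cache}:{container_cache}:rw",
        "-w", "/src",
    ]
    if proxy_running:
        argv.append("--network=host")  # reach the proxy on localhost:9092
    # Raise the fd limit for buildbox-casd, then run bst non-interactively.
    inner = "ulimit -n 1048576 && " + shlex.join(["bst", "--no-interactive", *bst_args])
    argv += [image, "bash", "-c", inner]
    return argv
```

`shlex.join(argv)` prints the equivalent shell command, which is handy for echoing into the workflow log when comparing a local run with CI.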

Debugging CI Failures

Where to Find Logs

| Log | Location | Contents |
|---|---|---|
| Build log | `buildstream-logs` artifact -> `logs/build.log` | Full BuildStream build output |
| Cache proxy log | `bazel-remote-logs` artifact -> `bazel-remote.log` | R2 cache hits/misses, S3 errors |
| Workflow log | GitHub Actions UI -> step output | Each step's stdout/stderr |
| Disk usage | "Disk space before/after build" steps | `df -h /` snapshots |

Common Failures

| Symptom | Likely cause | Fix |
|---|---|---|
| Build OOM or hangs | Too many parallel builders | `builders` is already 1; check whether the element's own build is too memory-heavy |
| "No space left on device" | BuildStream CAS fills the disk | Verify the disk reclamation step ran; check that `cache-buildtrees: never` is set |
| Cache proxy failed to start | R2 secrets misconfigured or endpoint unreachable | Check `bazel-remote-logs`; verify secrets in repo settings |
| `bootc container lint` fails | Image has `/usr/etc`, missing ostree refs, or invalid metadata | Check the `oci/bluefin.bst` assembly script; ensure the `/usr/etc` merge runs |
| Build succeeds locally, fails in CI | Different element versions cached, or network-dependent sources | Compare `bst show` output locally vs CI; check whether GNOME CAS has stale artifacts |
| Remote silently ignored | Missing `type: storage` on the R2 remote | Ensure `buildstream-ci.conf` includes `type: storage` |
| GHCR push fails | Token permissions or rate limiting | Check `packages: write` permission; `--retry 3` handles transient failures |
| Source fetch timeout | GNOME CAS or upstream source unreachable | `network-retries: 3` handles transient issues; check GNOME infra status |
| Seed step fails | Normal: non-fatal by design | `continue-on-error: true`; check proxy logs if persistent |
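Disk exhaustion is one of the most common failures. A preflight check right after the reclamation step lets the job fail fast instead of dying mid-build. This is a minimal Python sketch using `shutil.disk_usage`. The 50 GB threshold comes from the workflow-steps table and is treated here as GiB (an assumption). `check_free_space` is a hypothetical helper, not part of the workflow.

```python
import shutil

GIB = 1024**3  # df -h also reports sizes in powers of 1024


def check_free_space(path: str = "/", required_gib: float = 50.0) -> tuple[bool, float]:
    """Return (enough_space, free_gib) for the filesystem containing path."""
    free_gib = shutil.disk_usage(path).free / GIB
    return free_gib >= required_gib, free_gib
```

Failing the job when `enough_space` is false turns a late "No space left on device" into an immediate, explicit error that names the shortfall.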

Debugging Workflow

  1. Download artifacts: Get buildstream-logs and bazel-remote-logs from the failed run
  2. Check disk space: Compare the before/after disk space steps to confirm or rule out a full disk, which (with OOM) is one of the two most common failures
  3. Search build log: Look for [FAILURE] lines in logs/build.log; on-error: continue means all failures are collected
  4. Check cache hits: In bazel-remote.log, look for cache hit ratio; low hits mean long builds
  5. Reproduce locally: just bst build oci/bluefin.bst uses the same bst2 container
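Step 3 can be partly automated with a small scanner. This Python sketch follows the guidance above and assumes that failure lines contain the token `FAILURE` and name the element as a path ending in `.bst`. The exact BuildStream log layout is not specified here, so adjust the patterns against a real `logs/build.log`. `summarize_failures` is hypothetical.

```python
import re
from collections import Counter
from typing import Iterable

FAILURE_RE = re.compile(r"\bFAILURE\b")
ELEMENT_RE = re.compile(r"[\w./+-]+\.bst")


def summarize_failures(lines: Iterable[str]) -> Counter:
    """Count FAILURE lines per element name found on the same line."""
    counts: Counter = Counter()
    for line in lines:
        if not FAILURE_RE.search(line):
            continue
        match = ELEMENT_RE.search(line)
        counts[match.group(0) if match else "<unknown>"] += 1
    return counts
```

Passing an open `logs/build.log` file and printing `.most_common()` lists every failing element in one view. Because `on-error: continue` collects all failures, that list usually separates the root cause from its dependents.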

Cross-References

| Skill | When |
|---|---|
| local-e2e-testing | Reproducing CI issues locally |
| oci-layer-composition | Understanding what the build produces |
| debugging-bst-build-failures | Diagnosing individual element build failures |
| buildstream-element-reference | Writing or modifying `.bst` elements |