CI/CD in Apache Beam
Overview
Apache Beam uses GitHub Actions for CI/CD. Workflows are located in .github/workflows/.
Workflow Types
PreCommit Workflows
- •Run on PRs and merges
- •Validate code changes before merge
- •Naming:
beam_PreCommit_*.yml
PostCommit Workflows
- •Run after merge and on schedule
- •More comprehensive testing
- •Naming:
beam_PostCommit_*.yml
Scheduled Workflows
- •Run nightly on master
- •Check for external dependency impacts
- •Tag master with
nightly-master
Key Workflows
PreCommit
| Workflow | Description |
|---|---|
beam_PreCommit_Java.yml | Java build and tests |
beam_PreCommit_Python.yml | Python tests |
beam_PreCommit_Go.yml | Go tests |
beam_PreCommit_RAT.yml | License header checks |
beam_PreCommit_Spotless.yml | Code formatting |
PostCommit - Java
| Workflow | Description |
|---|---|
beam_PostCommit_Java.yml | Full Java test suite |
beam_PostCommit_Java_ValidatesRunner_*.yml | Runner validation tests |
beam_PostCommit_Java_Examples_*.yml | Example pipeline tests |
PostCommit - Python
| Workflow | Description |
|---|---|
beam_PostCommit_Python.yml | Full Python test suite |
beam_PostCommit_Python_ValidatesRunner_*.yml | Runner validation |
beam_PostCommit_Python_Examples_*.yml | Examples |
Load & Performance Tests
| Workflow | Description |
|---|---|
beam_LoadTests_*.yml | Load testing |
beam_PerformanceTests_*.yml | I/O performance |
Triggering Tests
Automatic
- •PRs trigger PreCommit tests
- •Merges trigger PostCommit tests
Triggering Specific Workflows
Use trigger files to run specific workflows.
Workflow Dispatch
Most workflows support manual triggering via GitHub UI.
Understanding Test Results
Finding Logs
- •Go to PR → Checks tab
- •Click on failed workflow
- •Expand failed job
- •View step logs
Common Failure Patterns
Flaky Tests
- •Random failures unrelated to change
- •Solution: Use trigger files to re-run the specific workflow.
Timeout
- •Increase timeout in workflow if justified
- •Or optimize test
Resource Exhaustion
- •GCP quota issues
- •Check project settings
GCP Credentials
Workflows requiring GCP access use these secrets:
- •
GCP_PROJECT_ID- Project ID (e.g.,apache-beam-testing) - •
GCP_REGION- Region (e.g.,us-central1) - •
GCP_TESTING_BUCKET- Temp storage bucket - •
GCP_PYTHON_WHEELS_BUCKET- Python wheels bucket - •
GCP_SA_EMAIL- Service account email - •
GCP_SA_KEY- Base64-encoded service account key
Required IAM roles:
- •Storage Admin
- •Dataflow Admin
- •Artifact Registry Writer
- •BigQuery Data Editor
- •Service Account User
Self-hosted vs GitHub-hosted Runners
Self-hosted (majority of workflows)
- •Pre-configured with dependencies
- •GCP credentials pre-configured
- •Naming:
beam_*.yml
GitHub-hosted
- •Used for cross-platform testing (Linux, macOS, Windows)
- •May need explicit credential setup
Workflow Structure
yaml
name: Workflow Name
on:
push:
branches: [master]
pull_request:
branches: [master]
schedule:
- cron: '0 0 * * *'
workflow_dispatch:
jobs:
build:
runs-on: [self-hosted, ...]
steps:
- uses: actions/checkout@v4
- name: Run Gradle
run: ./gradlew :task:name
Local Debugging
Run Same Commands as CI
Check workflow file's run commands:
bash
./gradlew :sdks:java:core:test ./gradlew :sdks:python:test
Common Issues
- •Clean gradle cache:
rm -rf ~/.gradle .gradle - •Remove build directory:
rm -rf build - •Check Java version matches CI
Snapshot Builds
Locations
- •Java SDK: https://repository.apache.org/content/groups/snapshots/org/apache/beam/
- •SDK Containers: https://gcr.io/apache-beam-testing/beam-sdk
- •Portable Runners: https://gcr.io/apache-beam-testing/beam_portability
- •Python SDK: gs://beam-python-nightly-snapshots
Release Workflows
| Workflow | Purpose |
|---|---|
cut_release_branch.yml | Create release branch |
build_release_candidate.yml | Build RC |
finalize_release.yml | Finalize release |
publish_github_release_notes.yml | Publish notes |