AgentSkillsCN

java-development

指导您在 Apache Beam 中开发 Java SDK,包括构建、测试、运行示例,以及理解项目结构。当您需要在 sdks/java/、runners/ 或 examples/java/ 中处理 Java 代码时,可使用此功能。

SKILL.md
--- frontmatter
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements.  See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership.  The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License.  You may obtain a copy of the License at
#
#   http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied.  See the License for the
# specific language governing permissions and limitations
# under the License.

name: java-development
description: Guides Java SDK development in Apache Beam, including building, testing, running examples, and understanding the project structure. Use when working with Java code in sdks/java/, runners/, or examples/java/.

Java Development in Apache Beam

Project Structure

Key Directories

  • sdks/java/core - Core Java SDK (PCollection, PTransform, Pipeline)
  • sdks/java/harness - SDK harness (container entrypoint)
  • sdks/java/io/ - I/O connectors (51+ connectors including BigQuery, Kafka, JDBC, etc.)
  • sdks/java/extensions/ - Extensions (SQL, ML, protobuf, etc.)
  • runners/ - Runner implementations:
    • runners/direct-java - Direct Runner (local execution)
    • runners/flink/ - Flink Runner
    • runners/spark/ - Spark Runner
    • runners/google-cloud-dataflow-java/ - Dataflow Runner
  • examples/java/ - Java examples including WordCount

Build System

Apache Beam uses Gradle with a custom BeamModulePlugin. Every Java project's build.gradle starts with:

groovy
apply plugin: 'org.apache.beam.module'
applyJavaNature( ... )

Common Commands

Build Commands

bash
# Compile a specific project
./gradlew -p sdks/java/core compileJava

# Build a project (compile + tests)
./gradlew :sdks:java:harness:build

# Run WordCount example
./gradlew :examples:java:wordCount

Running Unit Tests

bash
# Run all tests in a project
./gradlew :sdks:java:harness:test

# Run a specific test class
./gradlew :sdks:java:harness:test --tests org.apache.beam.fn.harness.CachesTest

# Run tests matching a pattern
./gradlew :sdks:java:harness:test --tests *CachesTest

# Run a specific test method
./gradlew :sdks:java:harness:test --tests *CachesTest.testClearableCache

Running Integration Tests

Integration tests have filenames ending in IT.java and use TestPipeline.

bash
# Run I/O integration tests on Direct Runner
./gradlew :sdks:java:io:google-cloud-platform:integrationTest

# Run with custom GCP project
./gradlew :sdks:java:io:google-cloud-platform:integrationTest \
  -PgcpProject=<project> -PgcpTempRoot=gs://<bucket>/path

# Run on Dataflow Runner
./gradlew :runners:google-cloud-dataflow-java:examplesJavaRunnerV2IntegrationTest \
  -PdisableSpotlessCheck=true -PdisableCheckStyle=true -PskipCheckerFramework \
  -PgcpProject=<project> -PgcpRegion=us-central1 -PgcsTempRoot=gs://<bucket>/tmp

Code Formatting

bash
# Format Java code
./gradlew spotlessApply

Writing Integration Tests

java
@Rule public TestPipeline pipeline = TestPipeline.create();

@Test
public void testSomething() {
  pipeline.apply(...);
  pipeline.run().waitUntilFinish();
}

Set pipeline options via -DbeamTestPipelineOptions='[...]':

bash
-DbeamTestPipelineOptions='["--runner=TestDataflowRunner","--project=myproject","--region=us-central1","--stagingLocation=gs://bucket/path"]'

Using Modified Beam Code

Publish to Maven Local

bash
# Publish a specific module
./gradlew -Ppublishing -p sdks/java/io/kafka publishToMavenLocal

# Publish all modules
./gradlew -Ppublishing publishToMavenLocal

Building SDK Container

bash
# Build Java SDK container (for Runner v2)
./gradlew :sdks:java:container:java11:docker

# Tag and push
docker tag apache/beam_java11_sdk:2.XX.0.dev \
  "us-docker.pkg.dev/your-project/beam/beam_java11_sdk:custom"
docker push "us-docker.pkg.dev/your-project/beam/beam_java11_sdk:custom"

Building Dataflow Worker Jar

bash
./gradlew :runners:google-cloud-dataflow-java:worker:shadowJar

Test Naming Conventions

  • Unit tests: *Test.java
  • Integration tests: *IT.java

JUnit Report Location

After running tests, find HTML reports at: <project>/build/reports/tests/test/index.html

IDE Setup (IntelliJ)

  1. Open /beam (the repository root, NOT sdks/java)
  2. Wait for indexing to complete
  3. Find examples/java/build.gradle and click Run next to wordCount task to verify setup