blob: 809c1bef9dd1a16e6fb0d9ce5cc267dae4cfa9f0 [file] [view]
<!--
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
-->
# Airflow Java SDK
A **JVM** SDK for Apache Airflow. You can use any JVM-compatible language to
write workflow bundles, and have Airflow consume the result.
The SDK and execution-time logic is implemented in Kotlin.
An example is bundled showing how the SDK can be used in Java.
## Building the SDK
```bash
./gradlew build
```
## Building documentation
```bash
./gradlew dokkaGenerate
```
This uses [Dokka](https://kotl.in/dokka) to build documentation of the Java SDK.
This generates both an HTML representation and Javadoc.
## Running the example
The SDK projects must first built and published:
```bash
./gradlew publishToMavenLocal -PskipSigning=true
```
After the build is successful, you should be able to see directories in `~/.m2/repository/org/apache/airflow/`.
Now `cd example` into the example project, and
* Put the [DAG with stub tasks](./example/src/resources/dags) to somewhere Airflow can find.
* Ensure the `java` command is available in the same environment the Airflow
task worker is in.
* Package the example to `./example/build/bundle`
```bash
# We're now in the 'example' directory, so gradlew is in parent.
../gradlew bundle
```
* Configure Airflow to route tasks in the *java* queue to be run with Java:
```bash
export AIRFLOW__SDK__COORDINATORS='{
"java": {
"classpath": "airflow.sdk.coordinators.java.JavaCoordinator",
"kwargs": {"jars_root": ["/opt/airflow/java-sdk/example/build/bundle"]}
}
}'
export AIRFLOW__SDK__QUEUE_TO_COORDINATOR='{"java": "java"}'
```
* Ensure the Connection and Variable needed by the example DAG are available:
```bash
export AIRFLOW_CONN_TEST_HTTP='{
"conn_type": "http",
"login": "user",
"password": "pass",
"host": "example.com",
"port": 1234,
"extra": {"param1": "val1", "param2": "val2"}
}'
export AIRFLOW_VAR_MY_VARIABLE=123
```
## Publishing
The SDK is published to Maven Central via the
[ASF Nexus staging repository](https://repository.apache.org).
The full release process follows the
[ASF Maven publishing guide](https://infra.apache.org/publishing-maven-artifacts.html).
### Prerequisites
* An ASF committer account with access to
[repository.apache.org](https://repository.apache.org).
* A GPG key that has been
[added to the project KEYS file](https://infra.apache.org/release-signing.html)
and uploaded to a public key server.
### Bump the version
Edit `gradle.properties` and set the version for this release:
```properties
projectVersion=1.0.0
```
Commit the change and push it to the release branch.
### Verify the POM locally
Before touching any remote repository, publish to your local Maven cache and
inspect the generated POM:
```bash
rm -rf ~/.m2/repository/org/apache/airflow/ # Start clean.
./gradlew publishToMavenLocal -PskipSigning=true
# The airflow-sdk runtime.
less ~/.m2/repository/org/apache/airflow/airflow-sdk/*/airflow-sdk-*.pom
# The bill of materials of airflow-sdk
less ~/.m2/repository/org/apache/airflow/airflow-sdk-bom/*/*.pom
# The annotation processor for the builder pattern.
less ~/.m2/repository/org/apache/airflow/airflow-sdk-processor/*/airflow-sdk-*.pom
# The Gradle plugin for bundling.
less ~/.m2/repository/org/apache/airflow/airflow-sdk-gradle-plugin/*/airflow-sdk-*.pom
# The Gradle plugin's registration.
less ~/.m2/repository/org/apache/airflow/sdk/org.apache.airflow.sdk.gradle.plugin/*/*.pom
```
Check that the coordinates, description, license, SCM, and organization fields
look correct.
### Dry-run against a local repository
To test the full publish flow without touching ASF infrastructure, override the
repository URL to a local directory
```bash
rm -rf /tmp/local-maven-repo # Start clean.
./gradlew publish -PmavenUrl=file:///tmp/local-maven-repo -PskipSigning=true
ls /tmp/local-maven-repo/org/apache/airflow/
# This should contain the same components in ~/.m2 as inspected in the previous step.
```
*NOTE:* Signing is not required since nothing goes to Maven Central. If you want
to test signing, set the GPG private key and passphrase as described in the next
section, and remove `-PskipSigning=true` from the above command.
### Publish to ASF Nexus staging
Store the credentials in `~/.gradle/gradle.properties` so they are not exposed
in your shell history:
```properties
mavenUsername=your-asf-nexus-token-username
mavenPassword=your-asf-nexus-token-password
signing.password=your-gpg-key-passphrase
```
Then run the publish task.
```bash
./gradlew publish -P"signing.key=$(gpg --armor --export-secret-keys your-gpg-key-fingerprint)"
```
*NOTE:* The signing key is supplied through the command line since it contains
newlines, which does not work well in a Gradle properties file.
*NOTE:* You can also use the following environment variables to provide the
credentials instead: `ASF_NEXUS_USERNAME`, `ASF_NEXUS_PASSWORD`, `SIGNING_KEY`,
and `SIGNING_PASSWORD`. This is especially useful on e.g. CI.
### Verify the upload
Verify all artifacts have been released correctly to the
[ASF Nexus server](https://repository.apache.org/#nexus-search;quick~org.apache.airflow).
Check *Updated by* (should be your ID), *Uploaded Date*, and *Last Modified*.
## Technical Details
The user uses the SDK to implement a Java application that implements task
methods, and metadata on which DAG and task each method should be used
for.
When the Airflow Supervisor identifies a task should be run with Java, it
launches the Java application as a subprocess. The Java application accepts
flags `--comm` and `--logs` from the command line to identify TCP sockets it
should connect to, and communicates with the Supervisor through these channels
during execution.
1. On connection, the Supervisor immediately sends a StartupDetails message
through the comm socket.
2. The Java application finds and executes the relevant method.
3. During execution, the Java application uses the comm socket to retrieve
information (e.g. Variable) from, and send data (e.g. XCom) to Airflow.
4. The Java application informs the comm socket to tell the Supervisor the
task's terminal state.
5. The Java application exits.
During the Java application's lifetime, it also sends log messages generated by
the SDK (not user code) through the logs socket, so the Supervisor can append
them to Airflow logs.
Communication uses the same formats as the Python-based processes.
See [Architectural Design Records](./adr) in the `adr` directory to learn more.