Google Cloud Dataflow SDK for Java supports the Eclipse integrated development environment (IDE) for developing both user pipelines and the SDK itself, alongside other supported development tools, such as Apache Maven.
In addition to Eclipse itself, you need to install the M2Eclipse plugin before importing projects.
We provide an Eclipse starter project to help you get started with Cloud Dataflow in Eclipse, both for developing user pipelines and for general use of the Cloud Dataflow SDK for Java.
Start by cloning this repository or downloading its contents to your local machine. Then, in the Eclipse IDE, open the `File` menu and select `Import`. In the `Import` wizard, choose `Existing Projects into Workspace` under the `General` group.
In the next window, set `Select root directory` to the location of this repository's contents. The `Projects` list should automatically populate with the `google-cloud-dataflow-starter` project. Make sure that project is selected and click `Finish` to complete the import wizard.
You can now run the starter pipeline on your local machine. From the `Run` menu, select `Run`. Choose the `LOCAL` run configuration. When the execution finishes, among other output, the console should contain the text `HELLO WORLD`.
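For reference, the starter pipeline is roughly of the following shape. This is a hedged sketch based on the Dataflow SDK 1.x API; the actual contents of `StarterPipeline.java` in this repository may differ:

```java
import com.google.cloud.dataflow.sdk.Pipeline;
import com.google.cloud.dataflow.sdk.options.PipelineOptionsFactory;
import com.google.cloud.dataflow.sdk.transforms.Create;
import com.google.cloud.dataflow.sdk.transforms.DoFn;
import com.google.cloud.dataflow.sdk.transforms.ParDo;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class StarterPipeline {
  private static final Logger LOG = LoggerFactory.getLogger(StarterPipeline.class);

  public static void main(String[] args) {
    // Pipeline options are parsed from the command line
    // (e.g. --project and --stagingLocation for the service run).
    Pipeline p = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());

    p.apply(Create.of("Hello", "World"))
     .apply(ParDo.of(new DoFn<String, String>() {
       @Override
       public void processElement(ProcessContext c) {
         // Upper-case each element, e.g. "Hello" becomes "HELLO".
         c.output(c.element().toUpperCase());
       }
     }))
     .apply(ParDo.of(new DoFn<String, Void>() {
       @Override
       public void processElement(ProcessContext c) {
         // Log each element; this is where the HELLO WORLD output comes from.
         LOG.info(c.element());
       }
     }));

    p.run();
  }
}
```

The `LOCAL` run configuration executes this pipeline with the in-process runner, so the logged elements appear directly in the Eclipse console.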
You can also run the starter pipeline on the Google Cloud Dataflow service, using managed resources in the Google Cloud Platform. Start by following the general Cloud Dataflow Getting Started instructions. You will need a Google Cloud Platform project with the Cloud Dataflow API enabled, a Google Cloud Storage bucket to serve as a staging location, and an installed and authenticated Google Cloud SDK. Then, from the `Run` menu, select `Run Configurations`. Choose the `SERVICE` run configuration under the `Java Application` group. In the `Arguments` tab, provide values for the `--project` and `--stagingLocation` arguments. Click `Run` to start the program. When the execution finishes, among other output, the console should contain the statements `Submitted job: <job_id>` and `Job finished with status DONE`.
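For example, the program arguments for the `SERVICE` configuration might look like the following. The project ID and bucket name are placeholders; substitute your own values:

```
--project=your-project-id
--stagingLocation=gs://your-bucket/staging
```

The staging location must be a Google Cloud Storage path (a `gs://` URL) that your authenticated account can write to; the SDK uploads the pipeline's jar files there before the service runs the job.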
At this point, you should be ready to start making changes to `StarterPipeline.java` and developing your own pipeline.
You can work on the development of the Cloud Dataflow SDK itself from Eclipse.
Start by cloning this repository or downloading its contents to your local machine. Then, in the Eclipse IDE, open the `File` menu and select `Import`. In the `Import` wizard, choose `Existing Maven Projects` under the `Maven` group. If this import source is not available, the M2Eclipse plugin may not be installed properly.
In the next window, set `Root Directory` to the location of this repository's contents. The `Projects` list should automatically populate with several projects, including `/pom.xml`, `sdk/pom.xml`, and `examples/pom.xml`. Make sure all projects are selected and click `Finish` to complete the import wizard.
In the `Package Explorer`, you can now select the `src/test/java` package group in one of the projects. From the `Run` menu, select `Run`. Choose the `JUnit Test` run configuration. This will execute all unit tests of that project locally.
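If you prefer the command line, the same tests can be run with Maven. This assumes Maven is installed and you are in the repository root; `sdk` names the module to test, matching the `sdk/pom.xml` project above:

```shell
mvn test -pl sdk
```

Running the tests through Maven uses the same Surefire configuration as a full build, which can be useful for verifying changes before opening a pull request.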
At this point, you should be ready to start making changes to the Cloud Dataflow SDK for Java. Please consider sharing your improvements with the rest of the Dataflow community by posting them as pull requests in our GitHub repository.
Good luck!