---
title: "Sample Project using the Java API"
nav-title: Sample Project in Java
nav-parent_id: start
nav-pos: 0
---
<!--
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
-->
* This will be replaced by the TOC
{:toc}
Start working on your Flink Java program in a few simple steps.
## Requirements
The only requirements are working __Maven 3.0.4__ (or higher) and __Java 8.x__ (or higher) installations.
## Create Project
Use one of the following commands to __create a project__:
<ul class="nav nav-tabs" style="border-bottom: none;">
<li class="active"><a href="#maven-archetype" data-toggle="tab">Use <strong>Maven archetypes</strong></a></li>
<li><a href="#quickstart-script" data-toggle="tab">Run the <strong>quickstart script</strong></a></li>
</ul>
<div class="tab-content">
<div class="tab-pane active" id="maven-archetype">
{% highlight bash %}
$ mvn archetype:generate \
-DarchetypeGroupId=org.apache.flink \
-DarchetypeArtifactId=flink-quickstart-java \{% unless site.is_stable %}
-DarchetypeCatalog=https://repository.apache.org/content/repositories/snapshots/ \{% endunless %}
-DarchetypeVersion={{site.version}}
{% endhighlight %}
This allows you to <strong>name your newly created project</strong>. It will interactively ask you for the groupId, artifactId, and package name.
</div>
<div class="tab-pane" id="quickstart-script">
{% highlight bash %}
{% if site.is_stable %}
$ curl https://flink.apache.org/q/quickstart.sh | bash
{% else %}
$ curl https://flink.apache.org/q/quickstart-SNAPSHOT.sh | bash
{% endif %}
{% endhighlight %}
</div>
{% unless site.is_stable %}
<p style="border-radius: 5px; padding: 5px" class="bg-danger">
<b>Note</b>: Since version 3.0 of the Maven archetype plugin, it is no longer possible to specify the repository (-DarchetypeCatalog) via the command line. If you wish to use the snapshot repository, you need to add a repository entry to your settings.xml. For details about this change, please refer to the <a href="http://maven.apache.org/archetype/maven-archetype-plugin/archetype-repository.html">official Maven documentation</a>.
</p>
{% endunless %}
</div>
## Inspect Project
There will be a new directory in your working directory. If you've used
the _curl_ approach, the directory is called `quickstart`. Otherwise,
it has the name of your `artifactId`:
{% highlight bash %}
$ tree quickstart/
quickstart/
├── pom.xml
└── src
    └── main
        ├── java
        │   └── org
        │       └── myorg
        │           └── quickstart
        │               ├── BatchJob.java
        │               ├── SocketTextStreamWordCount.java
        │               ├── StreamingJob.java
        │               └── WordCount.java
        └── resources
            └── log4j.properties
{% endhighlight %}
The sample project is a __Maven project__ that contains four classes. _StreamingJob_ and _BatchJob_ are basic skeleton programs, _SocketTextStreamWordCount_ is a working streaming example, and _WordCount_ is a working batch example. Note that the _main_ method of each class lets you start Flink in a development/testing mode.
We recommend that you __import this project into your IDE__ to develop and
test it. If you use Eclipse, the [m2e plugin](http://www.eclipse.org/m2e/)
allows you to [import Maven projects](http://books.sonatype.com/m2eclipse-book/reference/creating-sect-importing-projects.html#fig-creating-import).
Some Eclipse bundles include that plugin by default, while others require
you to install it manually. IntelliJ IDEA supports Maven projects out of
the box.
*A note to Mac OS X users*: The default JVM heap size can be too small
for Flink, so you have to increase it manually. In Eclipse, choose
`Run Configurations -> Arguments` and enter `-Xmx800m` in the
`VM Arguments` box.
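To check whether the `-Xmx` setting actually reached the JVM, you can run a small class like the following from the same run configuration. This is a plain-Java sketch, not part of the quickstart; the class name `HeapCheck` is hypothetical. It relies only on `Runtime.maxMemory()`, which may report a value somewhat below the configured `-Xmx` value.

```java
// HeapCheck is a hypothetical helper, not part of the Flink quickstart.
// Run it with the same VM arguments as your job (e.g. -Xmx800m) to see
// which maximum heap size the JVM actually uses.
public class HeapCheck {

    // Maximum heap size in mebibytes, as reported by the running JVM.
    static long maxHeapMegabytes() {
        return Runtime.getRuntime().maxMemory() / (1024 * 1024);
    }

    public static void main(String[] args) {
        System.out.println("Max heap: " + maxHeapMegabytes() + " MB");
    }
}
```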
## Build Project
To __build your project__, go to your project directory and run the
`mvn clean install -Pbuild-jar` command. You will __find a jar__ that
runs on any Flink cluster with a compatible version at
__target/original-your-artifact-id-your-version.jar__. There is also a
fat jar at __target/your-artifact-id-your-version.jar__, which
additionally contains all dependencies that were added to the Maven
project.
## Next Steps
Write your application!
The quickstart project contains a `WordCount` implementation, the
"Hello World" of Big Data processing systems. The goal of `WordCount`
is to determine the frequencies of words in a text, for example, how
often the terms "the" or "house" occur in all Wikipedia texts.
__Sample Input__:
~~~bash
big data is big
~~~
__Sample Output__:
~~~bash
big 2
data 1
is 1
~~~
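To make the expected result concrete, the following plain-Java sketch (no Flink involved; the class name `WordFrequencies` is hypothetical) computes the same counts for the sample input with a `TreeMap`. The map is sorted only so that the output order is deterministic and matches the sample output; Flink itself does not guarantee any particular output order.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical plain-Java illustration of what WordCount computes.
public class WordFrequencies {

    // Counts how often each whitespace-separated word occurs in the text.
    static Map<String, Integer> count(String text) {
        Map<String, Integer> counts = new TreeMap<>(); // sorted by word
        for (String word : text.split("\\s+")) {
            if (!word.isEmpty()) {
                counts.merge(word, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // prints "big 2", "data 1", "is 1" on separate lines
        count("big data is big").forEach((word, n) -> System.out.println(word + " " + n));
    }
}
```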
The following code shows the `WordCount` implementation from the
Quickstart, which processes some text lines with two operators (a FlatMap
and a Reduce implemented as a sum aggregation) and prints the resulting
words and counts to standard output.
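~~~java
import org.apache.flink.api.common.functions.FlatMapFunction;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.util.Collector;

public class WordCount {

    public static void main(String[] args) throws Exception {
        // set up the execution environment
        final ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

        // get input data
        DataSet<String> text = env.fromElements(
            "To be, or not to be,--that is the question:--",
            "Whether 'tis nobler in the mind to suffer",
            "The slings and arrows of outrageous fortune",
            "Or to take arms against a sea of troubles,"
        );

        DataSet<Tuple2<String, Integer>> counts =
            // split up the lines in pairs (2-tuples) containing: (word,1)
            text.flatMap(new LineSplitter())
            // group by the tuple field "0" and sum up tuple field "1"
            .groupBy(0)
            .sum(1);

        // print() triggers the execution of the program and prints the result
        counts.print();
    }

    // LineSplitter (listed below) is declared here as a static nested class;
    // it uses the FlatMapFunction and Collector imports.
}
~~~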
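The user-defined logic of an operator is implemented in a function class. Here, the `LineSplitter` class implements the FlatMap step, while the grouping and summation use built-in operators.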
~~~java
// Static nested class of WordCount: turns each line into (word, 1) pairs.
public static final class LineSplitter implements FlatMapFunction<String, Tuple2<String, Integer>> {

    @Override
    public void flatMap(String value, Collector<Tuple2<String, Integer>> out) {
        // normalize and split the line
        String[] tokens = value.toLowerCase().split("\\W+");

        // emit the pairs, skipping empty tokens
        for (String token : tokens) {
            if (token.length() > 0) {
                out.collect(new Tuple2<String, Integer>(token, 1));
            }
        }
    }
}
~~~
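The `token.length() > 0` check in `LineSplitter` matters because `split("\\W+")` returns a leading empty string when a line starts with a non-word character, and a single empty string for an empty line. The following plain-Java sketch replays the same tokenization outside Flink to show this; the class name `TokenizeDemo` is hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical plain-Java replay of LineSplitter's tokenization (no Flink).
public class TokenizeDemo {

    // Same normalization as LineSplitter: lower-case the line, split on runs
    // of non-word characters, and drop empty tokens.
    static List<String> tokenize(String line) {
        List<String> words = new ArrayList<>();
        for (String token : line.toLowerCase().split("\\W+")) {
            if (token.length() > 0) {
                words.add(token);
            }
        }
        return words;
    }

    public static void main(String[] args) {
        // Raw split keeps a leading empty string: prints [, to, be]
        System.out.println(Arrays.toString("--To be".toLowerCase().split("\\W+")));
        // The length check removes it: prints [to, be]
        System.out.println(tokenize("--To be"));
        // Apostrophes are non-word characters: prints
        // [whether, tis, nobler, in, the, mind, to, suffer]
        System.out.println(tokenize("Whether 'tis nobler in the mind to suffer"));
    }
}
```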
{% gh_link /flink-examples/flink-examples-batch/src/main/java/org/apache/flink/examples/java/wordcount/WordCount.java "Check GitHub" %} for the full example code.
For a complete overview of our APIs, have a look at the
[DataStream API]({{ site.baseurl }}/dev/datastream_api.html) and
[DataSet API]({{ site.baseurl }}/dev/batch/index.html) sections.
If you have any trouble, ask on our
[Mailing List](http://mail-archives.apache.org/mod_mbox/flink-user/).
We are happy to provide help.
{% top %}