|  | --- | 
|  | title: "Sample Project using the Java API" | 
|  | nav-title: Sample Project in Java | 
|  | nav-parent_id: start | 
|  | nav-pos: 0 | 
|  | --- | 
|  | <!-- | 
|  | Licensed to the Apache Software Foundation (ASF) under one | 
|  | or more contributor license agreements.  See the NOTICE file | 
|  | distributed with this work for additional information | 
|  | regarding copyright ownership.  The ASF licenses this file | 
|  | to you under the Apache License, Version 2.0 (the | 
|  | "License"); you may not use this file except in compliance | 
|  | with the License.  You may obtain a copy of the License at | 
|  |  | 
|  | http://www.apache.org/licenses/LICENSE-2.0 | 
|  |  | 
|  | Unless required by applicable law or agreed to in writing, | 
|  | software distributed under the License is distributed on an | 
|  | "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY | 
|  | KIND, either express or implied.  See the License for the | 
|  | specific language governing permissions and limitations | 
|  | under the License. | 
|  | --> | 
|  |  | 
|  | * This will be replaced by the TOC | 
|  | {:toc} | 
|  |  | 
|  | Start working on your Flink Java program in a few simple steps. | 
|  |  | 
|  |  | 
|  | ## Requirements | 
|  |  | 
|  | The only requirements are working __Maven 3.0.4__ (or higher) and __Java 8.x__ (or higher) installations. | 
|  |  | 
|  | ## Create Project | 
|  |  | 
|  | Use one of the following commands to __create a project__: | 
|  |  | 
|  | <ul class="nav nav-tabs" style="border-bottom: none;"> | 
|  | <li class="active"><a href="#maven-archetype" data-toggle="tab">Use <strong>Maven archetypes</strong></a></li> | 
|  | <li><a href="#quickstart-script" data-toggle="tab">Run the <strong>quickstart script</strong></a></li> | 
|  | </ul> | 
|  | <div class="tab-content"> | 
|  | <div class="tab-pane active" id="maven-archetype"> | 
|  | {% highlight bash %} | 
|  | $ mvn archetype:generate                               \ | 
|  | -DarchetypeGroupId=org.apache.flink              \ | 
|  | -DarchetypeArtifactId=flink-quickstart-java      \{% unless site.is_stable %} | 
|  | -DarchetypeCatalog=https://repository.apache.org/content/repositories/snapshots/ \{% endunless %} | 
|  | -DarchetypeVersion={{site.version}} | 
|  | {% endhighlight %} | 
|  | This allows you to <strong>name your newly created project</strong>. It will interactively ask you for the groupId, artifactId, and package name. | 
|  | </div> | 
|  | <div class="tab-pane" id="quickstart-script"> | 
|  | {% highlight bash %} | 
|  | {% if site.is_stable %} | 
|  | $ curl https://flink.apache.org/q/quickstart.sh | bash | 
|  | {% else %} | 
|  | $ curl https://flink.apache.org/q/quickstart-SNAPSHOT.sh | bash | 
|  | {% endif %} | 
|  | {% endhighlight %} | 
|  |  | 
|  | </div> | 
|  | {% unless site.is_stable %} | 
|  | <p style="border-radius: 5px; padding: 5px" class="bg-danger"> | 
|  | <b>Note</b>: For Maven 3.0 or higher, it is no longer possible to specify the repository (-DarchetypeCatalog) via the commandline. If you wish to use the snapshot repository, you need to add a repository entry to your settings.xml. For details about this change, please refer to <a href="http://maven.apache.org/archetype/maven-archetype-plugin/archetype-repository.html">Maven official document</a> | 
|  | </p> | 
|  | {% endunless %} | 
|  | </div> | 
|  |  | 
|  | ## Inspect Project | 
|  |  | 
|  | There will be a new directory in your working directory. If you've used | 
|  | the _curl_ approach, the directory is called `quickstart`. Otherwise, | 
|  | it has the name of your `artifactId`: | 
|  |  | 
|  | {% highlight bash %} | 
|  | $ tree quickstart/ | 
|  | quickstart/ | 
|  | ├── pom.xml | 
|  | └── src | 
|  | └── main | 
|  | ├── java | 
|  | │   └── org | 
|  | │       └── myorg | 
|  | │           └── quickstart | 
|  | │               ├── BatchJob.java | 
|  | │               ├── SocketTextStreamWordCount.java | 
|  | │               ├── StreamingJob.java | 
|  | │               └── WordCount.java | 
|  | └── resources | 
|  | └── log4j.properties | 
|  | {% endhighlight %} | 
|  |  | 
|  | The sample project is a __Maven project__, which contains four classes. _StreamingJob_ and _BatchJob_ are basic skeleton programs, _SocketTextStreamWordCount_ is a working streaming example and _WordCountJob_ is a working batch example. Please note that the _main_ method of all classes allow you to start Flink in a development/testing mode. | 
|  |  | 
|  | We recommend you __import this project into your IDE__ to develop and | 
|  | test it. If you use Eclipse, the [m2e plugin](http://www.eclipse.org/m2e/) | 
|  | allows to [import Maven projects](http://books.sonatype.com/m2eclipse-book/reference/creating-sect-importing-projects.html#fig-creating-import). | 
|  | Some Eclipse bundles include that plugin by default, others require you | 
|  | to install it manually. The IntelliJ IDE supports Maven projects out of | 
|  | the box. | 
|  |  | 
|  |  | 
|  | *A note to Mac OS X users*: The default JVM heapsize for Java is too | 
|  | small for Flink. You have to manually increase it. In Eclipse, choose | 
|  | `Run Configurations -> Arguments` and write into the `VM Arguments` | 
|  | box: `-Xmx800m`. | 
|  |  | 
|  | ## Build Project | 
|  |  | 
|  | If you want to __build your project__, go to your project directory and | 
|  | issue the `mvn clean install -Pbuild-jar` command. You will | 
|  | __find a jar__ that runs on every Flink cluster with a compatible | 
|  | version, __target/original-your-artifact-id-your-version.jar__. There | 
|  | is also a fat-jar in __target/your-artifact-id-your-version.jar__ which, | 
|  | additionally, contains all dependencies that were added to the Maven | 
|  | project. | 
|  |  | 
|  | ## Next Steps | 
|  |  | 
|  | Write your application! | 
|  |  | 
|  | The quickstart project contains a `WordCount` implementation, the | 
|  | "Hello World" of Big Data processing systems. The goal of `WordCount` | 
|  | is to determine the frequencies of words in a text, e.g., how often do | 
|  | the terms "the" or "house" occur in all Wikipedia texts. | 
|  |  | 
|  | __Sample Input__: | 
|  |  | 
|  | ~~~bash | 
|  | big data is big | 
|  | ~~~ | 
|  |  | 
|  | __Sample Output__: | 
|  |  | 
|  | ~~~bash | 
|  | big 2 | 
|  | data 1 | 
|  | is 1 | 
|  | ~~~ | 
|  |  | 
|  | The following code shows the `WordCount` implementation from the | 
|  | Quickstart which processes some text lines with two operators (a FlatMap | 
|  | and a Reduce operation via aggregating a sum), and prints the resulting | 
|  | words and counts to std-out. | 
|  |  | 
|  | ~~~java | 
|  | public class WordCount { | 
|  |  | 
|  | public static void main(String[] args) throws Exception { | 
|  |  | 
|  | // set up the execution environment | 
|  | final ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment(); | 
|  |  | 
|  | // get input data | 
|  | DataSet<String> text = env.fromElements( | 
|  | "To be, or not to be,--that is the question:--", | 
|  | "Whether 'tis nobler in the mind to suffer", | 
|  | "The slings and arrows of outrageous fortune", | 
|  | "Or to take arms against a sea of troubles," | 
|  | ); | 
|  |  | 
|  | DataSet<Tuple2<String, Integer>> counts = | 
|  | // split up the lines in pairs (2-tuples) containing: (word,1) | 
|  | text.flatMap(new LineSplitter()) | 
|  | // group by the tuple field "0" and sum up tuple field "1" | 
|  | .groupBy(0) | 
|  | .sum(1); | 
|  |  | 
|  | // execute and print result | 
|  | counts.print(); | 
|  | } | 
|  | } | 
|  | ~~~ | 
|  |  | 
|  | The operations are defined by specialized classes, here the LineSplitter class. | 
|  |  | 
|  | ~~~java | 
|  | public static final class LineSplitter implements FlatMapFunction<String, Tuple2<String, Integer>> { | 
|  |  | 
|  | @Override | 
|  | public void flatMap(String value, Collector<Tuple2<String, Integer>> out) { | 
|  | // normalize and split the line | 
|  | String[] tokens = value.toLowerCase().split("\\W+"); | 
|  |  | 
|  | // emit the pairs | 
|  | for (String token : tokens) { | 
|  | if (token.length() > 0) { | 
|  | out.collect(new Tuple2<String, Integer>(token, 1)); | 
|  | } | 
|  | } | 
|  | } | 
|  | } | 
|  | ~~~ | 
|  |  | 
|  | {% gh_link /flink-examples/flink-examples-batch/src/main/java/org/apache/flink/examples/java/wordcount/WordCount.java "Check GitHub" %} for the full example code. | 
|  |  | 
|  | For a complete overview over our API, have a look at the | 
|  | [DataStream API]({{ site.baseurl }}/dev/datastream_api.html) and | 
|  | [DataSet API]({{ site.baseurl }}/dev/batch/index.html) sections. | 
|  | If you have any trouble, ask on our | 
|  | [Mailing List](http://mail-archives.apache.org/mod_mbox/flink-user/). | 
|  | We are happy to provide help. | 
|  |  | 
|  | {% top %} |