docs/quickstart/scala_api

title: “Quickstart: Scala API”

This will be replaced by the TOC {:toc}

Start working on your Flink Scala program in a few simple steps.

Requirements

The only requirements are working Maven 3.0.4 (or higher) and Java 6.x (or higher) installations.

Create Project

Use one of the following commands to create a project:

Inspect Project

There will be a new directory in your working directory. If you've used the curl approach, the directory is called quickstart. Otherwise, it has the name of your artifactId.

The sample project is a Maven project, which contains two classes. Job is a basic skeleton program and WordCountJob a working example. Please note that the main method of both classes allow you to start Flink in a development/testing mode.

We recommend to import this project into your IDE. For Eclipse, you need the following plugins, which you can install from the provided Eclipse Update Sites:

Eclipse 4.x
Eclipse 3.7

The IntelliJ IDE also supports Maven and offers a plugin for Scala development.

Build Project

If you want to build your project, go to your project directory and issue the mvn clean package -Pbuild-jar command. You will find a jar that runs on every Flink cluster in target/your-artifact-id-1.0-SNAPSHOT.jar. There is also a fat-jar, target/your-artifact-id-1.0-SNAPSHOT-flink-fat-jar.jar. This also contains all dependencies that get added to the maven project.

Next Steps

Write your application!

The quickstart project contains a WordCount implementation, the “Hello World” of Big Data processing systems. The goal of WordCount is to determine the frequencies of words in a text, e.g., how often do the terms “the” or “house” occurs in all Wikipedia texts.

Sample Input:

big data is big

Sample Output:

big 2
data 1
is 1

The following code shows the WordCount implementation from the Quickstart which processes some text lines with two operators (FlatMap and Reduce), and writes the prints the resulting words and counts to std-out.

object WordCountJob {
  def main(args: Array[String]) {

    // set up the execution environment
    val env = ExecutionEnvironment.getExecutionEnvironment

    // get input data
    val text = env.fromElements("To be, or not to be,--that is the question:--",
      "Whether 'tis nobler in the mind to suffer", "The slings and arrows of outrageous fortune",
      "Or to take arms against a sea of troubles,")

    val counts = text.flatMap { _.toLowerCase.split("\\W+") }
      .map { (_, 1) }
      .groupBy(0)
      .sum(1)

    // emit result
    counts.print()

    // execute program
    env.execute("WordCount Example")
  }
}

{% gh_link /flink-examples/flink-scala-examples/src/main/scala/org/apache/flink/examples/scala/wordcount/WordCount.scala “Check GitHub” %} for the full example code.

For a complete overview over our API, have a look at the [Programming Guide]({{ site.baseurl }}/apis/programming_guide.html) and [further example programs]({{ site.baseurl }}/apis/examples.html). If you have any trouble, ask on our Mailing List. We are happy to provide help.

Alternative Build Tools: SBT

To build and run applications with SBT instead of Maven is pretty straight forward. After creating the standard sbt directory layout it's enough to add the Flink dependencies to the build.sbt file:

libraryDependencies ++= Seq("org.apache.flink" % "flink-scala" % "{{site.version}}", "org.apache.flink" % "flink-clients" % "{{site.version}}")

Now the application can be executed by sbt run. By default SBT runs an application in the same JVM itself is running in. This can lead to lass loading issues with Flink. To avoid these, append the following line to build.sbt:

fork in run := true