{% include JB/setup %}
Apache Flink is an open source platform for distributed stream and batch data processing. Flinkās core is a streaming dataflow engine that provides data distribution, communication, and fault tolerance for distributed computations over data streams. Flink also builds batch processing on top of the streaming engine, overlaying native iteration support, managed memory, and program optimization.
Zeppelin comes with pre-configured flink-local interpreter, which starts Flink in a local mode on your machine, so you do not need to install anything.
At the “Interpreters” menu, you have to create a new Flink interpreter and provide next properties:
For more information about Flink configuration, you can find it here.
You can find an example of Flink usage in the Zeppelin Tutorial folder or try the following word count example, by using the Zeppelin notebook from Till Rohrmann's presentation Interactive data analysis with Apache Flink for Apache Flink Meetup.
%sh rm 10.txt.utf-8 wget http://www.gutenberg.org/ebooks/10.txt.utf-8
{% highlight scala %} %flink case class WordCount(word: String, frequency: Int) val bible:DataSet[String] = benv.readTextFile(“10.txt.utf-8”) val partialCounts: DataSet[WordCount] = bible.flatMap{ line => “““\b\w+\b”””.r.findAllIn(line).map(word => WordCount(word, 1)) // line.split(" ").map(word => WordCount(word, 1)) } val wordCounts = partialCounts.groupBy(“word”).reduce{ (left, right) => WordCount(left.word, left.frequency + right.frequency) } val result10 = wordCounts.first(10).collect() {% endhighlight %}