---
author: Robert Metzger
author-twitter: rmetzger_
date: "2015-12-18T10:00:00Z"
excerpt: <p>With 2015 ending, we thought that this would be a good time to reflect on
the amazing work done by the Flink community over this past year, and how much this
community has grown.</p>
title: 'Flink 2015: A year in review, and a lookout to 2016'
aliases:
- /news/2015/12/18/a-year-in-review.html
---

With 2015 ending, we thought that this would be a good time to reflect
on the amazing work done by the Flink community over this past year,
and how much this community has grown.

Overall, we have seen Flink grow in functionality from an engine into
one of the most complete open-source stream processing frameworks
available. The community grew from a relatively small and
geographically focused team into a truly global community, one of the
largest big data communities in the Apache Software Foundation.

We will also look at some interesting stats, including that the
busiest day for Flink is Monday (who would have thought :-).

# Community growth

Let us start with some simple statistics from [Flink's
GitHub repository](https://github.com/apache/flink). During 2015, the
Flink community **doubled** in size, from about 75 contributors to
over 150. Forks of the repository more than **tripled** from 160 in
February 2015 to 544 in December 2015, and the number of stars of the
repository almost tripled from 289 to 813.

<center>
<img src="/img/blog/community-growth.png" style="height:400px;margin:15px">
</center>

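
If you are curious how these numbers have developed since, GitHub's
public repository API reports the current counts. A small sketch (the
field names shown were correct at the time of writing, but the API
output format may change):

```
curl -s https://api.github.com/repos/apache/flink | grep -E '"(stargazers_count|forks_count)"'
```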
Although Flink started out geographically in Berlin, Germany, the
community is by now spread all over the world, with many contributors
from North America, Europe, and Asia. A simple search on meetup.com
for groups that mention Flink as a focus area reveals [16 meetups
around the globe](http://apache-flink.meetup.com/):

<center>
<img src="/img/blog/meetup-map.png" style="height:400px;margin:15px">
</center>

# Flink Forward 2015

One of the highlights of the year for Flink was undoubtedly the [Flink
Forward](http://2015.flink-forward.org/) conference, the first
conference on Apache Flink, held in October in Berlin. More than 250
participants (roughly half of them based outside Germany) attended
more than 33 technical talks from organizations including Google,
MongoDB, Bouygues Telecom, NFLabs, Euranova, RedHat, IBM, Huawei,
Intel, Ericsson, Capital One, Zalando, Amadeus, the Otto Group, and
ResearchGate. If you have not watched the talks yet, check out the
[slides](http://2015.flink-forward.org/?post_type=day) and
[videos](https://www.youtube.com/playlist?list=PLDX4T_cnKjD31JeWR1aMOi9LXPRQ6nyHO)
from Flink Forward.

<center>
<img src="/img/blog/ff-speakers.png" style="height:400px;margin:15px">
</center>

# Media coverage

And of course, the tech media picked up on Flink as well. During
2015, articles about Flink appeared in
[InfoQ](http://www.infoq.com/Apache-Flink/news/),
[ZDNet](http://www.zdnet.com/article/five-open-source-big-data-projects-to-watch/),
[Datanami](http://www.datanami.com/tag/apache-flink/),
[Infoworld](http://www.infoworld.com/article/2919602/hadoop/flink-hadoops-new-contender-for-mapreduce-spark.html)
(including being named one of the [best open source big data tools of
2015](http://www.infoworld.com/article/2982429/open-source-tools/bossie-awards-2015-the-best-open-source-big-data-tools.html)),
the [Gartner
blog](http://blogs.gartner.com/nick-heudecker/apache-flink-offers-a-challenge-to-spark/),
[Dataconomy](http://dataconomy.com/tag/apache-flink/),
[SDTimes](http://sdtimes.com/tag/apache-flink/), the [MapR
blog](https://www.mapr.com/blog/apache-flink-new-way-handle-streaming-data),
[KDnuggets](http://www.kdnuggets.com/2015/08/apache-flink-stream-processing.html),
and
[HadoopSphere](http://www.hadoopsphere.com/2015/02/distributed-data-processing-with-apache.html).

<center>
<img src="/img/blog/appeared-in.png" style="height:400px;margin:15px">
</center>

It is also interesting to see that Hadoop Summit EMEA 2016 received a
whopping 17 (!) talk submissions that mention Flink in their title or
abstract:

<center>
<img src="/img/blog/hadoop-summit.png" style="height:400px;margin:15px">
</center>

# Fun with stats: when do committers commit?

To get some deeper insight into what is happening in the Flink
community, let us run some analytics on the project's git log :-)
The easiest thing we can do is count the number of commits to the
repository in 2015. Running

```
git log --pretty=oneline --after=1/1/2015 | wc -l
```

on the Flink repository yields a total of **2203 commits** in 2015.

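
As a side note, `git shortlog` can break those commits down by
author. The sketch below lists the ten most frequent commit authors
of 2015 (the exact numbers depend on how merges and author metadata
were recorded):

```
git shortlog -sn --after=1/1/2015 | head -10
```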
To dig deeper, we will use an open source tool called gitstats, which
produces some interesting statistics on committer behavior. You can
generate these statistics yourself (and see many more) by following
four easy steps:

1. Download gitstats from the [project homepage](http://gitstats.sourceforge.net/). For example, on OS X with Homebrew, type

    ```
    brew install --HEAD homebrew/head-only/gitstats
    ```

2. Clone the Apache Flink git repository:

    ```
    git clone git@github.com:apache/flink.git
    ```

3. Generate the statistics:

    ```
    gitstats flink/ flink-stats/
    ```

4. View all the statistics as an HTML page using your favorite browser (e.g., Chrome):

    ```
    chrome flink-stats/index.html
    ```

First, we can see a steady growth of lines of code in Flink since the
project entered the Apache Incubator. During 2015, the codebase almost
**doubled** in size, from 500,000 LOC to 900,000 LOC.

<center>
<img src="/img/blog/code-growth.png" style="height:400px;margin:15px">
</center>

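
The chart above comes from the gitstats report. If you only want a
rough, current figure without generating the full report, counting
the lines of all tracked files gives a ballpark number (note that
this includes documentation and tests, not just source code):

```
cd flink/ && git ls-files -z | xargs -0 cat | wc -l
```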
It is interesting to see when committers commit. For Flink, Monday
afternoons are by far the most popular times to commit to the
repository:

<center>
<img src="/img/blog/commit-stats.png" style="height:400px;margin:15px">
</center>

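
If you would rather reproduce the weekday breakdown without gitstats,
a reasonably recent git (2.6 or later, which supports
`--date=format:`) can produce it directly. A minimal sketch:

```
git log --after=1/1/2015 --date=format:'%A' --pretty=format:'%ad' | sort | uniq -c | sort -rn
```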
# Feature timeline

So, what were the major features added to Flink and the Flink
ecosystem during 2015? Here is a (non-exhaustive) chronological list:

<center>
<img src="/img/blog/feature-timeline.png" style="height:400px;margin:15px">
</center>

# Roadmap for 2016

With 2015 coming to a close, the Flink community has already started
discussing Flink's roadmap for the future. Some highlights are:

* **Runtime scaling of streaming jobs:** streaming jobs run forever
and need to react to a changing environment. Runtime scaling means
dynamically increasing and decreasing the parallelism of a job to
sustain certain SLAs or to react to changing input throughput.

* **SQL queries for static data sets and streams:** building on top of
Flink's Table API, users should be able to write SQL
queries for static data sets, as well as SQL queries on data
streams that continuously produce new results.

* **Streaming operators backed by managed memory:** currently, state
in streaming operators, such as user-defined state and windows, is
backed by JVM heap objects. Moving this state to Flink's managed
memory will add the ability to spill to disk, improve GC efficiency,
and give better control over memory utilization.

* **Library for detecting temporal event patterns:** a common use case
for stream processing is detecting patterns in an event stream
with timestamps. Flink makes this possible with its support for
event time, so many of these operators can be surfaced in the form
of a library.

* **Support for Apache Mesos, and resource-dynamic YARN support:**
support for both Mesos and YARN, including dynamic allocation and
release of resources for more elasticity (for both batch
and stream processing).

* **Security:** encrypt both the messages exchanged between the
TaskManagers and the JobManager, and the connections used for data
exchange between workers.

* **More streaming connectors, more runtime metrics, and continuous
DataStream API enhancements:** add support for more sources and
sinks (e.g., Amazon Kinesis, Cassandra, and Flume), expose more
metrics to the user, and provide continuous improvements to the
DataStream API.

If you are interested in these features, we highly encourage you to
take a look at the [current
draft](https://docs.google.com/document/d/1ExmtVpeVVT3TIhO1JoBpC5JKXm-778DAD7eqw5GANwE/edit),
and [join the
discussion](https://mail-archives.apache.org/mod_mbox/flink-dev/201512.mbox/browser)
on the Flink mailing lists.