flink-connector-wikiedits

A non-parallel source that parses a live stream of Wikipedia edits.

Metadata about the edits is mirrored to the IRC channel #en.wikipedia. The source connects to this IRC channel and parses each message into a WikipediaEditEvent instance.
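The parsing step can be pictured with the hypothetical, stdlib-only Java sketch below. It is not the connector's own parser. It assumes a simplified, color-code-free line format `[[Title]] FLAGS DIFF_URL * USER * (+/-BYTES) SUMMARY` and assumes the flag letters M (minor) and B (bot). The names `IrcEditLineParser` and `ParsedEdit` are illustrative only; the real WikipediaEditEvent carries more fields than this stand-in.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical sketch: turns one (color-stripped) line from the IRC feed into
 * a small value object. The line format and flag letters are assumptions.
 */
final class IrcEditLineParser {

    /** Minimal stand-in for WikipediaEditEvent, holding only a few fields. */
    public static final class ParsedEdit {
        public final String title;
        public final String flags;
        public final String diffUrl;
        public final String user;
        public final int byteDiff;
        public final String summary;

        ParsedEdit(String title, String flags, String diffUrl,
                   String user, int byteDiff, String summary) {
            this.title = title;
            this.flags = flags;
            this.diffUrl = diffUrl;
            this.user = user;
            this.byteDiff = byteDiff;
            this.summary = summary;
        }

        // Assumed flag letters: "M" marks a minor edit, "B" a bot edit.
        public boolean isMinor() { return flags.contains("M"); }
        public boolean isBotEdit() { return flags.contains("B"); }
    }

    // Assumed layout: [[Title]] FLAGS URL * USER * (+/-N) SUMMARY
    private static final Pattern LINE = Pattern.compile(
        "^\\[\\[(.*?)\\]\\]\\s(.*?)\\s(\\S*)\\s\\*\\s(.*?)\\s\\*\\s\\(([+-]\\d+)\\)\\s?(.*)$");

    /** Returns the parsed edit, or empty if the line is not an edit notice. */
    public static Optional<ParsedEdit> parse(String line) {
        Matcher m = LINE.matcher(line);
        if (!m.matches()) {
            return Optional.empty(); // e.g. server chatter: skip it
        }
        return Optional.of(new ParsedEdit(
            m.group(1), m.group(2), m.group(3), m.group(4),
            Integer.parseInt(m.group(5)), // accepts a leading '+' or '-'
            m.group(6)));
    }

    public static void main(String[] args) {
        String line = "[[Apache Flink]] MB https://en.wikipedia.org/w/index.php?diff=2&oldid=1"
            + " * ExampleUser * (+42) fix typo";
        parse(line).ifPresent(e ->
            System.out.println(e.user + " changed " + e.title + " by " + e.byteDiff + " bytes"));
    }
}
```

Returning an `Optional` keeps non-edit lines (pings, notices) out of the stream without throwing, which suits a long-running source that should simply skip messages it does not understand.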

The purpose of this source is to ease the setup of demos of the DataStream API with live data.

The original idea is from the Hello Samza project of Apache Samza. The Samza code for this is located in the samza-hello-samza repository.

Example

Add the following dependency to your project:

<dependency>
  <groupId>com.alibaba.blink</groupId>
  <artifactId>flink-connector-wikiedits</artifactId>
  <version>1.0-SNAPSHOT</version>
</dependency>

You can use the source like any other source:

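import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.wikiedits.WikipediaEditEvent;
import org.apache.flink.streaming.connectors.wikiedits.WikipediaEditsSource;

StreamExecutionEnvironment env = StreamExecutionEnvironment
    .getExecutionEnvironment();

DataStream<WikipediaEditEvent> edits = env
    .addSource(new WikipediaEditsSource());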

Remember that it is a non-parallel source and, as such, it always runs with parallelism 1.