Fix ClickEventCount for parallelism higher than 1

Following the [Operations Tutorial](https://nightlies.apache.org/flink/flink-docs-release-1.17/docs/try-flink/flink-operations-playground/#upgrading--rescaling-a-job), you're asked to start the job with parallelism 3, but I wasn't getting any output when I did that.

The reason, I realized, is [idle partitions](https://nightlies.apache.org/flink/flink-docs-master/docs/dev/datastream/event-time/generating_watermarks/#dealing-with-idle-sources) on the Kafka input: all input goes into partition 0, so the other partitions never emit watermarks and the downstream window operator stalls.

By adding `withIdleness` to the `WatermarkStrategy`, we ensure the job can make progress with higher parallelism even when some partitions are idle.
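For reference, the change amounts to one extra call when building the job's `WatermarkStrategy`. A sketch of the relevant fragment — the event type's timestamp accessor and the concrete durations here are illustrative assumptions, not necessarily the playground's actual values:

```java
import java.time.Duration;
import org.apache.flink.api.common.eventtime.WatermarkStrategy;

WatermarkStrategy<ClickEvent> watermarkStrategy = WatermarkStrategy
        // tolerate slightly out-of-order events (the bound is an assumption here)
        .<ClickEvent>forBoundedOutOfOrderness(Duration.ofMillis(200))
        // extract the event-time timestamp from each record
        .withTimestampAssigner((event, recordTs) -> event.getTimestamp().getTime())
        // declare a partition idle after it produces no data for a while,
        // so it no longer holds back the overall watermark
        .withIdleness(Duration.ofSeconds(5));
```

With this in place, the watermark advances based on the active partition alone, and the windowed counts are emitted as expected.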

Bonus: this also fixes recovery in the [normal case](https://nightlies.apache.org/flink/flink-docs-release-1.17/docs/try-flink/flink-operations-playground/#step-2-introducing-a-fault). Without it I was seeing fewer than 1k clicks per window when resuming the job; now it's the expected 1k every time.

Apache Flink Playgrounds

This repository provides playgrounds to quickly and easily explore Apache Flink's features.

The playgrounds are based on docker-compose environments. Each subfolder of this repository contains the docker-compose setup of a playground, except for the `./docker` folder, which contains code and configuration to build custom Docker images for the playgrounds.
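A typical session looks roughly like this — a sketch assuming the operations playground and a standard docker-compose installation:

```shell
# enter a playground's folder and start its environment in the background
cd operations-playground
docker-compose up -d

# check that all containers are running
docker-compose ps

# tear the environment down when done
docker-compose down -v
```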

Available Playgrounds

Currently, the following playgrounds are available:

  • The Flink Operations Playground (in the operations-playground folder) lets you explore and play with Flink's features to manage and operate stream processing jobs. You can witness how Flink recovers a job from a failure, upgrade and rescale a job, and query job metrics. The playground consists of a Flink cluster, a Kafka cluster and an example Flink job. The playground is presented in detail in “Flink Operations Playground”, which is part of the Try Flink section of the Flink documentation.

  • The Table Walkthrough (in the table-walkthrough folder) shows how to use the Table API to build an analytics pipeline that reads streaming data from Kafka and writes results to MySQL, along with a real-time dashboard in Grafana. The walkthrough is presented in detail in “Real Time Reporting with the Table API”, which is part of the Try Flink section of the Flink documentation.

  • The PyFlink Walkthrough (in the pyflink-walkthrough folder) provides a complete example that uses the Python API, and guides you through the steps needed to run and manage PyFlink jobs. The pipeline used in this walkthrough reads data from Kafka, performs aggregations, and writes results to Elasticsearch, where they are visualized with Kibana. This walkthrough is presented in detail in the pyflink-walkthrough README.

About

Apache Flink is an open source project of The Apache Software Foundation (ASF).

Flink is a distributed data processing framework with powerful stream and batch processing capabilities. Learn more about Flink at https://flink.apache.org/.