Elasticsearch is used to store the results and to provide an efficient query service.
Kibana is an open source data visualization dashboard for Elasticsearch. In this walkthrough, you will use a Kibana dashboard to visualize the total transaction payment amount and its proportion for each province, as computed by the PyFlink pipeline.
As mentioned, the environment for this walkthrough is based on Docker Compose; it uses a custom image to spin up Flink (JobManager and TaskManager), Kafka and Zookeeper, the data generator, and Elasticsearch and Kibana containers.
You can find the docker-compose.yaml file in the pyflink-walkthrough root directory.
First, build the Docker image by running:
$ cd pyflink-walkthrough
$ docker-compose build
Once the Docker image build is complete, run the following command to start the playground:
$ docker-compose up -d
One way of checking if the playground was successfully started is to access some of the services that are exposed:
Note: you may need to wait around 1 minute before all the services come up.
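If you prefer to script this check, the sketch below polls each exposed service over HTTP until it responds. The localhost ports (8081 for the Flink Web UI, 9200 for Elasticsearch, 5601 for Kibana) are assumptions about the port mappings in docker-compose.yaml; adjust the URLs if your setup differs.

# A minimal readiness check. The ports below are assumptions about the
# host mappings in docker-compose.yaml.
import time
import urllib.request

SERVICES = {
    "Flink Web UI": "http://localhost:8081",
    "Elasticsearch": "http://localhost:9200",
    "Kibana": "http://localhost:5601",
}

for name, url in SERVICES.items():
    for attempt in range(12):  # retry for up to about a minute
        try:
            with urllib.request.urlopen(url, timeout=5) as resp:
                print(f"{name} is up (HTTP {resp.status})")
                break
        except OSError:
            time.sleep(5)
    else:
        print(f"{name} did not respond at {url}")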
You can use the following command to read data from the Kafka topic and check whether it's generated correctly:
$ docker-compose exec kafka kafka-console-consumer.sh --bootstrap-server kafka:9092 --topic payment_msg
{"createTime":"2020-07-27 09:25:32.77","orderId":1595841867217,"payAmount":7732.44,"payPlatform":0,"provinceId":3}
{"createTime":"2020-07-27 09:25:33.231","orderId":1595841867218,"payAmount":75774.05,"payPlatform":0,"provinceId":3}
{"createTime":"2020-07-27 09:25:33.72","orderId":1595841867219,"payAmount":65908.55,"payPlatform":0,"provinceId":0}
{"createTime":"2020-07-27 09:25:34.216","orderId":1595841867220,"payAmount":15341.11,"payPlatform":0,"provinceId":1}
{"createTime":"2020-07-27 09:25:34.698","orderId":1595841867221,"payAmount":37504.42,"payPlatform":0,"provinceId":0}
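You can also consume the topic from Python, for example with the third-party kafka-python package (pip install kafka-python), as in the sketch below. The bootstrap address localhost:9092 is an assumption that the compose file maps Kafka's port to the host; from inside the Docker network the broker address is kafka:9092.

# A consumer sketch using the third-party kafka-python package.
# localhost:9092 assumes the Kafka port is mapped to the host.
import json
from kafka import KafkaConsumer

consumer = KafkaConsumer(
    "payment_msg",
    bootstrap_servers="localhost:9092",
    auto_offset_reset="earliest",
    value_deserializer=lambda raw: json.loads(raw.decode("utf-8")),
)

for message in consumer:
    record = message.value
    print(record["provinceId"], record["payAmount"])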
You can also create a new topic by executing the following command:
$ docker-compose exec kafka kafka-topics.sh --bootstrap-server kafka:9092 --create --topic <YOUR-TOPIC-NAME> --partitions 8 --replication-factor 1
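The same topic creation can be done from Python with kafka-python's admin client. The topic name my_new_topic below is a hypothetical placeholder, and the bootstrap address is the same assumption as in the consumer sketch above.

# Topic creation via kafka-python's admin client.
# my_new_topic is a placeholder; localhost:9092 is an assumed host mapping.
from kafka.admin import KafkaAdminClient, NewTopic

admin = KafkaAdminClient(bootstrap_servers="localhost:9092")
admin.create_topics([
    NewTopic(name="my_new_topic", num_partitions=8, replication_factor=1)
])
admin.close()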
Next, submit the PyFlink job to the Flink cluster:
$ docker-compose exec jobmanager ./bin/flink run -py /opt/pyflink-walkthrough/payment_msg_proccessing.py -d
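For orientation, the sketch below shows the general shape of the pipeline that payment_msg_proccessing.py defines: a Kafka source table, a per-province aggregation, and an Elasticsearch sink. The connector options, table schemas, and the sink index name are illustrative assumptions based on the generated messages above, not a copy of the walkthrough's source.

# A hedged sketch of a Kafka -> aggregate -> Elasticsearch pipeline in
# PyFlink's Table API. Connector options and the index name are assumptions.
from pyflink.table import EnvironmentSettings, TableEnvironment

t_env = TableEnvironment.create(EnvironmentSettings.in_streaming_mode())

t_env.execute_sql("""
    CREATE TABLE payment_msg (
        createTime VARCHAR,
        orderId BIGINT,
        payAmount DOUBLE,
        payPlatform INT,
        provinceId INT
    ) WITH (
        'connector' = 'kafka',
        'topic' = 'payment_msg',
        'properties.bootstrap.servers' = 'kafka:9092',
        'scan.startup.mode' = 'latest-offset',
        'format' = 'json'
    )
""")

t_env.execute_sql("""
    CREATE TABLE es_sink (
        province INT,
        pay_amount DOUBLE,
        PRIMARY KEY (province) NOT ENFORCED
    ) WITH (
        'connector' = 'elasticsearch-7',
        'hosts' = 'http://elasticsearch:9200',
        'index' = 'payment_dashboard'
    )
""")

t_env.execute_sql("""
    INSERT INTO es_sink
    SELECT provinceId AS province, SUM(payAmount) AS pay_amount
    FROM payment_msg
    GROUP BY provinceId
""")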
Navigate to the Flink Web UI after the job is submitted successfully. There should be a job in the running job list. Click the job to see more details. You should see that the StreamGraph of payment_msg_proccessing consists of two nodes, each with a parallelism of 1. There is also a table at the bottom of the page that shows some metrics for each node (e.g. bytes received/sent, records received/sent). Note that Flink's metrics only report bytes and records communicated within the Flink cluster, and so will always report 0 bytes and 0 records received by sources, and 0 bytes and 0 records sent to sinks - so don't be confused that nothing is reported as being read from Kafka, or written to Elasticsearch.
Finally, navigate to Kibana and open the dashboard named payment_dashboard. There will be a vertical bar chart and a pie chart showing the total payment amount and the proportion for each province.
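To double-check that results are actually landing in Elasticsearch, you can query the index directly, as in the sketch below. The index name payment_dashboard is an assumption; check the sink table definition or the Kibana index pattern for the actual name.

# Query Elasticsearch for a few of the aggregated documents.
# The index name is an assumption; localhost:9200 is an assumed host mapping.
import json
import urllib.request

url = "http://localhost:9200/payment_dashboard/_search?size=5"
with urllib.request.urlopen(url, timeout=10) as resp:
    body = json.load(resp)

for hit in body["hits"]["hits"]:
    print(hit["_source"])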