Flink CDC is a distributed data integration tool for real-time and batch data. Flink CDC brings simplicity and elegance to data integration, using YAML to describe the data movement and transformation in a Data Pipeline.
Flink CDC prioritizes efficient end-to-end data integration and offers enhanced functionality such as full database synchronization, sharded table synchronization, schema evolution, and data transformation.
Flink CDC provides a CdcUp CLI utility to start a playground environment and run Flink CDC jobs. You will need a working Docker and Docker Compose environment to use it.
1. Run `git clone https://github.com/apache/flink-cdc.git --depth=1` to retrieve a copy of the Flink CDC source code.
2. Run `cd tools/cdcup/ && ./cdcup.sh init` to use the CdcUp tool to initialize a playground environment.
3. Run `./cdcup.sh up` to boot up the Docker containers, and wait for them to be ready.
4. Run `./cdcup.sh mysql` to open a MySQL session, and create at least one table:

```sql
-- initialize db and table
CREATE DATABASE cdc_playground;
USE cdc_playground;
CREATE TABLE test_table (id INT PRIMARY KEY, name VARCHAR(32));

-- insert test data
INSERT INTO test_table VALUES (1, 'alice'), (2, 'bob'), (3, 'cicada'), (4, 'derrida');

-- verify that it has been successfully inserted
SELECT * FROM test_table;
```
5. Run `./cdcup.sh pipeline pipeline-definition.yaml` to submit the pipeline job. You may also edit the pipeline definition file for further configuration.
6. Run `./cdcup.sh flink` to access the Flink Web UI.

To set up Flink CDC manually instead, prepare an Apache Flink cluster, set the `FLINK_HOME` environment variable, and put the required connector jars into the `lib` directory. If you're using macOS or Linux, you may run

```bash
brew install apache-flink-cdc
```

to install Flink CDC and compatible connectors quickly.
Here is an example pipeline definition (`mysql-to-doris.yaml`) that synchronizes a MySQL database to Doris:

```yaml
source:
  type: mysql
  hostname: localhost
  port: 3306
  username: root
  password: 123456
  tables: app_db.\.*

sink:
  type: doris
  fenodes: 127.0.0.1:8030
  username: root
  password: ""

transform:
  - source-table: adb.web_order01
    projection: \*, format('%S', product_name) as product_name
    filter: addone(id) > 10 AND order_id > 100
    description: project fields and filter
  - source-table: adb.web_order02
    projection: \*, format('%S', product_name) as product_name
    filter: addone(id) > 20 AND order_id > 200
    description: project fields and filter

route:
  - source-table: app_db.orders
    sink-table: ods_db.ods_orders
  - source-table: app_db.shipments
    sink-table: ods_db.ods_shipments
  - source-table: app_db.products
    sink-table: ods_db.ods_products

pipeline:
  name: Sync MySQL Database to Doris
  parallelism: 2
  user-defined-function:
    - name: addone
      classpath: com.example.functions.AddOneFunctionClass
    - name: format
      classpath: com.example.functions.FormatFunctionClass
```
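The `user-defined-function` entries above register Java classes by classpath, and the `addone` and `format` calls in the transform rules invoke them. As a rough sketch, such classes might look like the following (hypothetical implementations, assuming Flink CDC invokes each class's public `eval` method; in a real deployment the classes would live in the `com.example.functions` package and be packaged into a jar on the pipeline's classpath):

```java
// Hypothetical sketch of the UDF classes referenced in the YAML above.

class AddOneFunctionClass {
    // addone(id): increments an integer column by one,
    // as used in the transform filters, e.g. addone(id) > 10
    public Integer eval(Integer id) {
        return id == null ? null : id + 1;
    }
}

class FormatFunctionClass {
    // format('%S', product_name): formats a string column;
    // here it simply delegates to String.format, so '%S'
    // upper-cases the value.
    public String eval(String format, String value) {
        return String.format(format, value);
    }
}
```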
Submit the job to a Flink cluster with the `flink-cdc.sh` script:

```bash
bash bin/flink-cdc.sh /path/mysql-to-doris.yaml
```
Try it out yourself with our more detailed tutorial. You can also see the connector overview for a comprehensive catalog of the currently provided connectors and their detailed configuration options.
There are many ways to participate in the Apache Flink CDC community. The mailing lists are the primary place where all Flink committers are present. For user support and questions, use the user mailing list. If you've found a problem with Flink CDC, please create a Flink JIRA issue and tag it with the Flink CDC tag.
Bugs and feature requests can either be discussed on the dev mailing list or on Jira.
Contributions to Flink CDC are welcome; please see our Developer Guide and APIs Guide.
The Flink CDC community welcomes everyone who is willing to contribute, whether it's through submitting bug reports, enhancing the documentation, or submitting code contributions for bug fixes, test additions, or new feature development.
Thanks to all contributors for their enthusiastic contributions.