Doris provides the following ways to load data from MySQL:
Doris uses JDBC Catalog to map MySQL as an external catalog, allowing direct SQL queries against MySQL data. Combined with INSERT INTO or CREATE TABLE AS SELECT, this is suitable for one-time migration or periodic batch loading.
Doris uses Streaming Job to continuously sync full and incremental data from MySQL to Doris. By integrating Flink CDC reading capability, Doris keeps the job running, reads Binlog from MySQL and writes it to Doris tables with exactly-once semantics. Two modes are supported: SQL Mapping Sync and Auto Table Creation Sync. Available since Doris 4.1.0.
Use Flink Doris Connector together with Flink MySQL CDC for real-time synchronization. This is suitable for scenarios that require additional Flink stream processing logic. The connector also provides a one-click full-database synchronization tool. For details, see Flink Doris Connector.
Data integration tools such as DataX, SeaTunnel, and CloudCanal also support syncing data from MySQL to Doris.
In most cases, you can use JDBC Catalog directly for one-time data migration. When continuous full + incremental synchronization is required, Streaming Job is recommended.
Use JDBC Catalog to map MySQL as an external catalog, then use INSERT INTO or CREATE TABLE AS SELECT to load data. For detailed syntax, see JDBC MySQL Catalog.
CREATE TABLE test.students ( id INT PRIMARY KEY, name VARCHAR(64), age INT ); INSERT INTO test.students VALUES (1, 'Emily', 25), (2, 'Bob', 30);
CREATE CATALOG mysql_catalog PROPERTIES ( "type" = "jdbc", "user" = "root", "password" = "123456", "jdbc_url" = "jdbc:mysql://127.0.0.1:3306/test", "driver_url" = "mysql-connector-java-8.0.25.jar", "driver_class" = "com.mysql.cj.jdbc.Driver" );
CREATE DATABASE IF NOT EXISTS doris_db; CREATE TABLE doris_db.students ( id INT, name VARCHAR(64), age INT ) UNIQUE KEY(id) DISTRIBUTED BY HASH(id) BUCKETS 1 PROPERTIES ("replication_num" = "1");
INSERT INTO doris_db.students SELECT id, name, age FROM mysql_catalog.test.students;
If the target table does not exist yet, you can also use CREATE TABLE AS SELECT to create the table and load data in one step:
CREATE TABLE doris_db.students PROPERTIES ("replication_num" = "1") AS SELECT * FROM mysql_catalog.test.students;
SELECT * FROM doris_db.students; +----+-------+------+ | id | name | age | +----+-------+------+ | 1 | Emily | 25 | | 2 | Bob | 30 | +----+-------+------+
Streaming Job continuously reads MySQL Binlog via Flink CDC and writes it to Doris. Two modes are supported:
include_tables to sync one, several, or all tables). Provides at-least-once semantics.Before submitting a Streaming Job, Binlog must be enabled on the MySQL side and the user must be granted the corresponding REPLICATION privileges. For environment-specific setup steps, see:
Auto Table Creation Sync uses the FROM MYSQL ... TO DATABASE ... syntax. The target is a Doris database, and the downstream tables are automatically created on first sync.
CREATE TABLE test.students ( id INT PRIMARY KEY, name VARCHAR(64), age INT ); INSERT INTO test.students VALUES (1, 'Emily', 25), (2, 'Bob', 30);
Auto Table Creation Sync does not require pre-creating tables, but the target database that hosts them must exist:
CREATE DATABASE IF NOT EXISTS doris_db;
The example below uses include_tables to sync only the students table (multiple tables can be comma-separated; leave empty to sync the whole database):
CREATE JOB mysql_db_sync ON STREAMING FROM MYSQL ( "jdbc_url" = "jdbc:mysql://127.0.0.1:3306", "driver_url" = "mysql-connector-java-8.0.25.jar", "driver_class" = "com.mysql.cj.jdbc.Driver", "user" = "root", "password" = "123456", "database" = "test", "include_tables" = "students", "offset" = "initial" ) TO DATABASE doris_db ( "table.create.properties.replication_num" = "1" -- set to 1 in single-BE deployments );
SELECT * FROM jobs("type"="insert") WHERE ExecuteType = "STREAMING";
SHOW TABLES FROM doris_db; SELECT * FROM doris_db.students;
For more common operations and full parameter reference, see MySQL CDC with Auto Table Creation.
CREATE TABLE test.students ( id INT PRIMARY KEY, name VARCHAR(64), age INT ); INSERT INTO test.students VALUES (1, 'Emily', 25), (2, 'Bob', 30);
SQL Mapping Sync requires the target table to exist beforehand:
CREATE DATABASE IF NOT EXISTS doris_db; CREATE TABLE doris_db.students ( id INT, name VARCHAR(64), age INT ) UNIQUE KEY(id) DISTRIBUTED BY HASH(id) BUCKETS 1 PROPERTIES ("replication_num" = "1");
Use CREATE STREAMING JOB with the INSERT INTO ... SELECT * FROM cdc_stream(...) syntax:
CREATE JOB mysql_students_sync ON STREAMING DO INSERT INTO doris_db.students SELECT * FROM cdc_stream( "type" = "mysql", "jdbc_url" = "jdbc:mysql://127.0.0.1:3306", "driver_url" = "mysql-connector-java-8.0.25.jar", "driver_class" = "com.mysql.cj.jdbc.Driver", "user" = "root", "password" = "123456", "database" = "test", "table" = "students", "offset" = "initial" );
SELECT * FROM jobs("type"="insert") WHERE ExecuteType = "STREAMING";
SELECT * FROM doris_db.students;
For more common operations (pause, resume, delete, check task, etc.) and full parameter reference, see MySQL CDC with SQL Mapping.