The Syncer must be able to communicate with both the upstream and downstream FE (Frontend) and BE (Backend).
The downstream BE and upstream BE must use the IP of the Doris BE process (as seen in show frontends/backends), which must be directly accessible.
When syncing, the user must provide accounts for both upstream and downstream, and these accounts must have the following permissions:
Additionally, Admin privileges are required (this may be removed in the future). These are needed to check the enable binlog configuration, which currently requires admin rights.
Minimum version required: v2.0.15
:::caution Starting from versions 2.1.8/3.0.4, the minimum supported Doris version for the ccr syncer is 2.1. Version 2.0 will no longer be supported. :::
Doris Versions:
Start Syncer according to the configurations and save a pid file in the default or specified path. The name of the pid file should follow host_port.pid.
Output file structure
The file structure can be seen under the output path after compilation:
output_dir bin ccr_syncer enable_db_binlog.sh start_syncer.sh stop_syncer.sh db [ccr.db] # Generated after running with the default configurations. log [ccr_syncer.log] # Generated after running with the default configurations.
The start_syncer.sh in the following text refers to the start_syncer.sh under its corresponding path.
Start options
--daemon
Run Syncer in the background, set to false by default.
bash bin/start_syncer.sh --daemon
--db_type
Syncer can currently use two databases to store its metadata, sqlite3 (for local storage) and mysql (for local or remote storage).
bash bin/start_syncer.sh --db_type mysql
The default value is sqlite3.
When using MySQL to store metadata, Syncer will use CREATE IF NOT EXISTS to create a database called ccr, where the metadata table related to CCR will be saved.
--db_dir
This option only works when db uses sqlite3.
It allows you to specify the name and path of the db file generated by sqlite3.
bash bin/start_syncer.sh --db_dir /path/to/ccr.db
The default path is SYNCER_OUTPUT_DIR/db and the default file name is ccr.db.
--db_host & db_port & db_user & db_password
This option only works when db uses mysql.
bash bin/start_syncer.sh --db_host 127.0.0.1 --db_port 3306 --db_user root --db_password "qwe123456"
The default values of db_host and db_port are shown in the example. The default values of db_user and db_password are empty.
--log_dir
Output path of the logs:
bash bin/start_syncer.sh --log_dir /path/to/ccr_syncer.log
The default path isSYNCER_OUTPUT_DIR/log and the default file name is ccr_syncer.log.
--log_level
Used to specify the output level of Syncer logs.
bash bin/start_syncer.sh --log_level info
The format of the log is as follows, where the hook will only be printed when log_level > info :
# time level msg hooks [2023-07-18 16:30:18] TRACE This is trace type. ccrName=xxx line=xxx [2023-07-18 16:30:18] DEBUG This is debug type. ccrName=xxx line=xxx [2023-07-18 16:30:18] INFO This is info type. ccrName=xxx line=xxx [2023-07-18 16:30:18] WARN This is warn type. ccrName=xxx line=xxx [2023-07-18 16:30:18] ERROR This is error type. ccrName=xxx line=xxx [2023-07-18 16:30:18] FATAL This is fatal type. ccrName=xxx line=xxx
Under --daemon, the default value of log_level is info.
When running in the foreground, log_level defaults to trace, and logs are saved to log_dir using the tee command.
--host && --port
Used to specify the host and port of Syncer, where host only plays the role of distinguishing itself in the cluster, which can be understood as the name of Syncer, and the name of Syncer in the cluster is host: port.
bash bin/start_syncer.sh --host 127.0.0.1 --port 9190
The default value of host is 127.0.0.1, and the default value of port is 9190.
--pid_dir
Used to specify the storage path of the pid file
The pid file is the credentials for closing the Syncer. It is used in the stop_syncer.sh script. It saves the corresponding Syncer process number. In order to facilitate management of Syncer, you can specify the storage path of the pid file.
bash bin/start_syncer.sh --pid_dir /path/to/pids
The default value is SYNCER_OUTPUT_DIR/bin.
Stop the Syncer according to the process number in the pid file under the default or specified path. The name of the pid file should follow host_port.pid.
Output file structure
The file structure can be seen under the output path after compilation:
output_dir bin ccr_syncer enable_db_binlog.sh start_syncer.sh stop_syncer.sh db [ccr.db] # Generated after running with the default configurations. log [ccr_syncer.log] # Generated after running with the default configurations.
The start_syncer.sh in the following text refers to the start_syncer.sh under its corresponding path.
Stop options
Syncer can be stopped in three ways:
Specify the host and port of the Syncer to be stopped. Be sure to keep it consistent with the host specified when start_syncer
Specify the names of the pid files to be stopped, wrap the names in "" and separate them with spaces.
Follow the default configurations.
--pid_dir
Specify the directory where the pid file is located. The above three stopping methods all depend on the directory where the pid file is located for execution.
bash bin/stop_syncer.sh --pid_dir /path/to/pids
The effect of the above example is to close the Syncer corresponding to all pid files under /path/to/pids ( method 3 ). -- pid_dir can be used in combination with the above three Syncer stopping methods.
The default value is SYNCER_OUTPUT_DIR/bin.
--host && --port
Stop the Syncer corresponding to host: port in the pid_dir path.
bash bin/stop_syncer.sh --host 127.0.0.1 --port 9190
The default value of host is 127.0.0.1, and the default value of port is empty. That is, specifying the host alone will degrade method 1 to method 3. Method 1 will only take effect when neither the host nor the port is empty.
--files
Stop the Syncer corresponding to the specified pid file name in the pid_dir path.
bash bin/stop_syncer.sh --files "127.0.0.1_9190.pid 127.0.0.1_9191.pid"
The file names should be wrapped in " " and separated with spaces.
Template for requests
curl -X POST -H "Content-Type: application/json" -d {json_body} http://ccr_syncer_host:ccr_syncer_port/operator
json_body: send operation information in JSON format
operator: different operations for Syncer
The interface returns JSON. If successful, the “success” field will be true. Conversely, if there is an error, it will be false, and then there will be an ErrMsgs field.
{"success":true} or {"success":false,"error_msg":"job ccr_test not exist"}
curl -X POST -H "Content-Type: application/json" -d '{ "name": "ccr_test", "src": { "host": "localhost", "port": "9030", "thrift_port": "9020", "user": "root", "password": "", "database": "demo", "table": "example_tbl" }, "dest": { "host": "localhost", "port": "9030", "thrift_port": "9020", "user": "root", "password": "", "database": "ccrt", "table": "copy" } }' http://127.0.0.1:9190/create_ccr
curl -X POST -H "Content-Type: application/json" -d '{ "name": "job_name" }' http://ccr_syncer_host:ccr_syncer_port/get_lag
The job_name is the name specified when create_ccr.
curl -X POST -H "Content-Type: application/json" -d '{ "name": "job_name" }' http://ccr_syncer_host:ccr_syncer_port/pause
curl -X POST -H "Content-Type: application/json" -d '{ "name": "job_name" }' http://ccr_syncer_host:ccr_syncer_port/resume
curl -X POST -H "Content-Type: application/json" -d '{ "name": "job_name" }' http://ccr_syncer_host:ccr_syncer_port/delete
curl http://ccr_syncer_host:ccr_syncer_port/version # > return {"version": "2.0.1"}
curl -X POST -H "Content-Type: application/json" -d '{ "name": "job_name" }' http://ccr_syncer_host:ccr_syncer_port/job_status { "success": true, "status": { "name": "ccr_db_table_alias", "state": "running", "progress_state": "TableIncrementalSync" } }
Do not sync any more. Users can swap the source and target clusters.
curl -X POST -H "Content-Type: application/json" -d '{ "name": "job_name" }' http://ccr_syncer_host:ccr_syncer_port/desync
curl http://ccr_syncer_host:ccr_syncer_port/list_jobs {"success":true,"jobs":["ccr_db_table_alias"]}
Output file structure
The file structure can be seen under the output path after compilation:
output_dir bin ccr_syncer enable_db_binlog.sh start_syncer.sh stop_syncer.sh db [ccr.db] # Generated after running with the default configurations. log [ccr_syncer.log] # Generated after running with the default configurations.
The start_syncer.sh in the following text refers to the start_syncer.sh under its corresponding path.
Usage
bash bin/enable_db_binlog.sh -h host -p port -u user -P password -d db
Syncer high availability relies on MySQL. If MySQL is used as the backend storage, Syncer can detect other Syncers. If one crashes, others will take over its tasks.
IS_BEING_SYNCED AttributeWhen the CCR (Cluster-to-Cluster Replication) feature is enabled, a replica table (referred to as the target table, located in the target cluster) is created in the target cluster for each table in the source cluster’s sync scope (referred to as the source table, located in the source cluster). However, some features and attributes need to be disabled or erased during the creation of the replica table to ensure the correctness of the sync process.
For example:
storage_policy, which could cause the target table creation to fail or behave abnormally.Attributes that need to be erased due to invalidation during replication include:
storage_policycolocate_withFeatures that need to be disabled during synchronization include:
When creating the target table, these attributes will be controlled by Syncer, either added or removed. In the CCR functionality, there are two ways to create a target table:
Thus, there are two points of insertion for adding the is_being_synced attribute: during the restore process in full synchronization and during incremental synchronization via getDdlStmt.
During the restore process in full synchronization, Syncer triggers a restore of the snapshot in the original cluster via RPC. In this process, it will add the is_being_synced attribute to the RestoreStmt, which will be applied in the final restoreJob, executing the related isBeingSynced logic. In incremental synchronization, the getDdlStmt method will be enhanced with a boolean getDdlForSync parameter to distinguish if it’s a controlled transformation into target table DDL, and the isBeingSynced logic will be executed when creating the target table.
The erasure of invalid attributes needs no further explanation, but the disabling of the above features requires clarification:
false when getting the distribution information and checks the _auto_bucket attribute during table restoration. If the source table is an auto-bucketed table, the target table's autobucket field is set to true, bypassing the bucket count computation and directly applying the source table’s bucket count.olapTable.isBeingSynced() to the condition for performing add/drop partition operations. This ensures that during synchronization, the target table does not periodically perform add/drop partition operations.:::caution
Under normal circumstances, the is_being_synced attribute should be entirely controlled by Syncer, and users should not modify this attribute manually.
:::
restore_reset_index_id: If the table to be synced contains an inverted index, this must be set to false on the target cluster.ignore_backup_tmp_partitions: If the upstream creates temporary partitions, Doris will prohibit performing backups, causing the ccr-syncer synchronization to break. This can be avoided by setting ignore_backup_tmp_partitions=true in the FE configuration.max_backup_restore_job_num_per_db: This configures the number of backup/restore jobs per DB stored in memory. The default value is 10, and setting it to 2 should suffice.binlog.max_bytes: Maximum memory usage for binlogs. It is recommended to keep at least 4GB (default is unlimited).binlog.ttl_seconds: Binlog retention time. In versions prior to 2.0.5, the default is unlimited; in later versions, the default is one day (86400 seconds). For example, to set binlog TTL to one hour: ALTER TABLE table SET ("binlog.ttl_seconds"="3600")label_num_threshold: Controls the number of TXN labels.stream_load_default_timeout_second: Controls the TXN timeout duration.label_keep_max_second: Controls the retention time after TXN ends.streaming_label_keep_max_second: Same as above.max_backup_tablets_per_job: The maximum number of tablets involved in a single backup job. Adjust this based on the tablet count (default is 300,000). Too many tablets could risk FE OOM (Out of Memory), so consider reducing the tablet count if possible.thrift_max_message_size: Maximum RPC packet size allowed by the FE thrift server. The default is 100MB. If the snapshot info exceeds 100MB due to too many tablets, this limit needs to be adjusted (maximum is 2GB).snapshot response meta size: %d, job info size: %d. The snapshot info size is approximately meta size + job info size.fe_thrift_max_pkg_bytes: Another parameter for RPC packet size, which needs to be adjusted in version 2.0. The default value is 20MB.restore_download_task_num_per_be: The maximum number of download tasks per BE. The default is 3, which may be too small for restore jobs. It should be adjusted to 0 (i.e., disable this limit). From versions 2.1.8 and 3.0.4, this configuration is no longer needed.backup_upload_task_num_per_be: The maximum number of upload tasks per BE. The default is 3, which may be too small for backup jobs. It should be adjusted to 0 (i.e., disable this limit). From versions 2.1.8 and 3.0.4, this configuration is no longer needed.select/insert. Increase the following configuration to relax this limit, for example, adjusting it to the maximum of 1GB:[mysqld] max_allowed_packet = 1024MB
--mysql_max_allowed_packet parameter (in bytes). The default value is 1024MB.Note: Starting from versions 2.1.8 and 3.0.4, CCR syncer no longer stores snapshot info in the DB, so the default data packet size is already sufficient.
thrift_max_message_size: The maximum RPC packet size allowed by the BE thrift server. The default is 100MB. If the agent task size exceeds 100MB due to too many tablets, this limit needs to be adjusted (maximum is 2GB).be_thrift_max_pkg_bytes: The same parameter as above, only needs adjustment in version 2.0. The default value is 20MB.restore_job_compressed_serialization: Enable compression for restore jobs (affects metadata compatibility, default is off).backup_job_compressed_serialization: Enable compression for backup jobs (affects metadata compatibility, default is off).enable_restore_snapshot_rpc_compression: Enable compression for snapshot info, mainly affecting RPC (default is on).Note: Since identifying whether a backup/restore job is compressed requires additional code, and versions before 2.1.8 and 3.0.4 do not include this code, once a backup/restore job is generated, it cannot be rolled back to an earlier Doris version. There are two exceptions: Backup/restore jobs that are already canceled or finished will not be compressed. Therefore, waiting for the job to finish or canceling the job before rolling back will allow safe rollback.
label_regex_length can be adjusted to relax this limit (default value is 128).storage_policy attribute is not set on any table before creating the CCR job.If the user's data volume is very large, and the time required for backup and restore exceeds one day (the default value), the following parameters need to be adjusted as needed:
backup_job_default_timeout_ms: Timeout for backup/restore tasks. This needs to be configured on the FE of both the source and target clusters.ALTER DATABASE $db SET PROPERTIES ("binlog.ttl_seconds" = "xxxx")Downstream BE download speed is slow:
max_download_speed_kbps: Maximum download speed limit for a single download thread on a downstream BE, the default is 50MB/s.download_worker_count: Number of threads performing download tasks on the downstream BE, the default is 1. This should be adjusted based on the customer’s machine type. The goal is to increase the thread count to the maximum without affecting normal read/write operations. If this parameter is adjusted, there's no need to modify max_download_speed_kbps.download_worker_count should be set to 4, without changing the max_download_speed_kbps.Limit downstream BE's binlog download speed:
download_binlog_rate_limit_kbs=1024 # Limit the speed of binlog (including Local Snapshot) download from the source cluster to 1MB/s for each BE node.
Detailed parameters and explanations:
download_binlog_rate_limit_kbs parameter is configured on the source cluster BE nodes. Setting this parameter effectively limits the data pull speed.download_binlog_rate_limit_kbs parameter mainly controls the speed of a single BE node. If the overall speed of the cluster is calculated, the parameter value should be multiplied by the number of BE nodes in the cluster.