blob: 72ad6c9a61502afb2ec9d27ba7ac3af96855ce0f [file] [log] [blame]
[[debezium-component]]
= Debezium Component
:page-source: components/camel-debezium/src/main/docs/debezium-component.adoc
*Available as of Camel version 3.0*
The Debezium component is wrapper around https://debezium.io/[Debezium] using https://debezium.io/documentation/reference/0.9/operations/embedded.html[Debezium Embedded], which enables Change Data Capture from various databases using Debezium without the need for Kafka or Kafka Connect.
*Note on handling failures:* Per https://debezium.io/documentation/reference/0.9/operations/embedded.html#_handling_failures[Debezium Embedded Engine] documentation, the engines is actively recording source offsets and periodically flushes these offsets to a persistent storage, so when the application is restarted or crashed, the engine will resume from the last recorded offset.
Thus, at normal operation, your downstream routes will receive each event exactly once, however in case of an application crash (not having a graceful shutdown), the application will resume from the last recorded offset,
which may result in receiving duplicate events immediately after the restart. Therefore, your downstream routes should be tolerant enough of such case and deduplicate events if needed.
Maven users will need to add the following dependency to their `pom.xml`
for this component.
[source,xml]
------------------------------------------------------------
<dependency>
<groupId>org.apache.camel</groupId>
<artifactId>camel-debezium</artifactId>
<version>x.x.x</version>
<!-- use the same version as your Camel core version -->
</dependency>
------------------------------------------------------------
== URI format
[source,java]
---------------------------
debezium:connector-type[?options]
---------------------------
== Supported Debezium Connectors
- https://debezium.io/documentation/reference/0.9/connectors/mysql.html[MySQL].
*Note:* Other Debezium connectors are _not_ supported at the moment.
*Note:* Due to licensing issues, you will need to add the dependency for `mysql-conenctor-java` if you are using MySQL connector, just add the following to your POM file:
[source,xml]
------------------------------------------------------------
<dependency>
<groupId>mysql</groupId>
<artifactId>mysql-connector-java</artifactId>
<version>8.0.15</version>
</dependency>
------------------------------------------------------------
== Options
// component options: START
The Debezium component supports 2 options, which are listed below.
[width="100%",cols="2,5,^1,2",options="header"]
|===
| Name | Description | Default | Type
| *configuration* (consumer) | Allow pre-configured Configurations to be set, you will need to extend EmbeddedDebeziumConfiguration in order to create the configuration for the component | | EmbeddedDebeziumConfiguration
| *basicPropertyBinding* (advanced) | Whether the component should use basic property binding (Camel 2.x) or the newer property binding with additional capabilities | false | boolean
|===
// component options: END
// endpoint options: START
The Debezium endpoint is configured using URI syntax:
----
debezium:connectorType
----
with the following path and query parameters:
=== Path Parameters (1 parameters):
[width="100%",cols="2,5,^1,2",options="header"]
|===
| Name | Description | Default | Type
| *connectorType* | *Required* The Debezium connector type that is supported by Camel Debezium component. | | String
|===
=== Query Parameters (47 parameters):
[width="100%",cols="2,5,^1,2",options="header"]
|===
| Name | Description | Default | Type
| *bridgeErrorHandler* (consumer) | Allows for bridging the consumer to the Camel routing Error Handler, which mean any exceptions occurred while the consumer is trying to pickup incoming messages, or the likes, will now be processed as a message and handled by the routing Error Handler. By default the consumer will use the org.apache.camel.spi.ExceptionHandler to deal with exceptions, that will be logged at WARN or ERROR level and ignored. | false | boolean
| *internalKeyConverter* (consumer) | The Converter class that should be used to serialize and deserialize key data for offsets. The default is JSON converter. | org.apache.kafka.connect.json.JsonConverter | String
| *internalValueConverter* (consumer) | The Converter class that should be used to serialize and deserialize value data for offsets. The default is JSON converter. | org.apache.kafka.connect.json.JsonConverter | String
| *name* (consumer) | *Required* Unique name for the connector. Attempting to register again with the same name will fail. | | String
| *offsetCommitPolicy* (consumer) | The name of the Java class of the commit policy. It defines when offsets commit has to be triggered based on the number of events processed and the time elapsed since the last commit. This class must implement the interface .OffsetCommitPolicy. The default is a periodic commit policy based upon time intervals. | io.debezium.embedded.spi.OffsetCommitPolicy.PeriodicCommitOffsetPolicy | String
| *offsetCommitTimeoutMs* (consumer) | Maximum number of milliseconds to wait for records to flush and partition offset data to be committed to offset storage before cancelling the process and restoring the offset data to be committed in a future attempt. The default is 5 seconds. | 5000 | long
| *offsetFlushIntervalMs* (consumer) | Interval at which to try committing offsets. The default is 1 minute. | 60000 | long
| *offsetStorage* (consumer) | The name of the Java class that is responsible for persistence of connector offsets. | org.apache.kafka.connect.storage.FileOffsetBackingStore | String
| *offsetStorageFileName* (consumer) | Path to file where offsets are to be stored. Required when offset.storage is set to the FileOffsetBackingStore | | String
| *offsetStoragePartitions* (consumer) | The number of partitions used when creating the offset storage topic. Required when offset.storage is set to the .KafkaOffsetBackingStore. | | int
| *offsetStorageReplication Factor* (consumer) | Replication factor used when creating the offset storage topic. Required when offset.storage is set to the KafkaOffsetBackingStore | | int
| *offsetStorageTopic* (consumer) | The name of the Kafka topic where offsets are to be stored. Required when offset.storage is set to the KafkaOffsetBackingStore. | | String
| *exceptionHandler* (consumer) | To let the consumer use a custom ExceptionHandler. Notice if the option bridgeErrorHandler is enabled then this option is not in use. By default the consumer will deal with exceptions, that will be logged at WARN or ERROR level and ignored. | | ExceptionHandler
| *exchangePattern* (consumer) | Sets the exchange pattern when the consumer creates an exchange. | | ExchangePattern
| *basicPropertyBinding* (advanced) | Whether the endpoint should use basic property binding (Camel 2.x) or the newer property binding with additional capabilities | false | boolean
| *synchronous* (advanced) | Sets whether synchronous processing should be strictly used, or Camel is allowed to use asynchronous processing (if supported). | false | boolean
| *bigintUnsignedHandlingMode* (mysql) | Specifies how BIGINT UNSIGNED columns should be represented in change events, including: precise uses java.math.BigDecimal to represent values, which are encoded in the change events using a binary representation and Kafka Connects org.apache.kafka.connect.data.Decimal type; long (the default) represents values using Javas long, which may not offer the precision but will be far easier to use in consumers. long is usually the preferable setting. Only when working with values larger than 263, the precise setting should be used as those values cant be conveyed using long. See Data types. | long | String
| *columnBlacklist* (mysql) | An optional comma-separated list of regular expressions that match the fully-qualified names of columns that should be excluded from change event message values. Fully-qualified names for columns are of the form databaseName.tableName.columnName, or databaseName.schemaName.tableName.columnName. | | String
| *connectTimeoutMs* (mysql) | A positive integer value that specifies the maximum time in milliseconds this connector should wait after trying to connect to the MySQL database server before timing out. Defaults to 30 seconds. | 30000 | long
| *databaseBlacklist* (mysql) | An optional comma-separated list of regular expressions that match database names to be excluded from monitoring; any database name not included in the blacklist will be monitored. May not be used with database.whitelist. | | String
| *databaseHistory* (mysql) | The name of the DatabaseHistory class that should be used to store and recover database schema changes. | io.debezium.relational.history.FileDatabaseHistory | String
| *databaseHistoryFileName* (mysql) | The path to the file that will be used to record the database history | | String
| *databaseHistoryKafka BootstrapServers* (mysql) | The full name of the Kafka topic where the connector will store the database schema history. | | String
| *databaseHistoryKafkaTopic* (mysql) | The full name of the Kafka topic where the connector will store the database schema history. | | String
| *databaseHostName* (mysql) | *Required* IP address or hostname of the target database server. | | String
| *databasePassword* (mysql) | *Required* Password to use when connecting to the database server. | | String
| *databasePort* (mysql) | Integer port number of the database server. | 3306 | int
| *databaseServerId* (mysql) | *Required* A numeric ID of this database client, which must be unique across all currently-running database processes in the database cluster. This connector joins the database cluster as another server (with this unique ID) so it can read the binlog. | | int
| *databaseServerName* (mysql) | *Required* Logical name that identifies and provides a namespace for the particular database server/cluster being monitored. | | String
| *databaseUser* (mysql) | *Required* Name of the MySQL database to use when connecting to the database server. | | String
| *databaseWhitelist* (mysql) | An optional comma-separated list of regular expressions that match database names to be monitored; any database name not included in the whitelist will be excluded from monitoring. By default all databases will be monitored. May not be used with database.blacklist. | | String
| *ddlParserMode* (mysql) | Controls which parser should be used for parsing DDL statements when building up the meta-model of the captured database structure. Can be one of legacy (for the legacy hand-written parser implementation) or antlr (for new Antlr based implementation introduced in Debezium 0.8.0). While the legacy parser remains the default for Debezium 0.8.x, please try out the new implementation and report back any issues you encounter. The new parser is the default as of 0.9. The legacy parser as well as this configuration property has been removed as of 0.10. | antlr | String
| *decimalHandlingMode* (mysql) | Specifies how the connector should handle values for DECIMAL and NUMERIC columns: precise (the default) represents them precisely using java.math.BigDecimal values represented in change events in a binary form; or double represents them using double values, which may result in a loss of precision but will be far easier to use. string option encodes values as formatted string which is easy to consume but a semantic information about the real type is lost. See Decimal values. | precise | String
| *eventDeserializationFailure HandlingMode* (mysql) | Specifies how the connector should react to exceptions during deserialization of binlog events. fail will propagate the exception (indicating the problematic event and its binlog offset), causing the connector to stop. warn will cause the problematic event to be skipped and the problematic event and its binlog offset to be logged (make sure that the logger is set to the WARN or ERROR level). ignore will cause problematic event will be skipped. | fail | String
| *gtidNewChannelPosition* (mysql) | When set to latest, when the connector sees a new GTID channel, it will start consuming from the last executed transaction in that GTID channel. If set to earliest, the connector starts reading that channel from the first available (not purged) GTID position. earliest is useful when you have a active-passive MySQL setup where Debezium is connected to master, in this case during failover the slave with new UUID (and GTID channel) starts receiving writes before Debezium is connected. These writes would be lost when using latest. | latest | String
| *gtidSourceExcludes* (mysql) | A comma-separated list of regular expressions that match source UUIDs in the GTID set used to find the binlog position in the MySQL server. Only the GTID ranges that have sources matching none of these exclude patterns will be used. May not be used with gtid.source.includes. | | String
| *gtidSourceIncludes* (mysql) | A comma-separated list of regular expressions that match source UUIDs in the GTID set used to find the binlog position in the MySQL server. Only the GTID ranges that have sources matching one of these include patterns will be used. May not be used with gtid.source.excludes. | | String
| *includeQuery* (mysql) | Boolean value that specifies whether the connector should include the original SQL query that generated the change event. Note: This option requires MySQL be configured with the binlog_rows_query_log_events option set to ON. Query will not be present for events generated from the snapshot process. Warning: Enabling this option may expose tables or fields explicitly blacklisted or masked by including the original SQL statement in the change event. For this reason this option is defaulted to 'false'. | false | boolean
| *includeSchemaChanges* (mysql) | Boolean value that specifies whether the connector should publish changes in the database schema to a Kafka topic with the same name as the database server ID. Each schema change will be recorded using a key that contains the database name and whose value includes the DDL statement(s). This is independent of how the connector internally records database history. The default is true. | true | boolean
| *inconsistentSchemaHandling Mode* (mysql) | Specifies how the connector should react to binlog events that relate to tables that are not present in internal schema representation (i.e. internal representation is not consistent with database) fail will throw an exception (indicating the problematic event and its binlog offset), causing the connector to stop. warn will cause the problematic event to be skipped and the problematic event and its binlog offset to be logged (make sure that the logger is set to the WARN or ERROR level). ignore will cause the problematic event to be skipped. | fail | String
| *maxBatchSize* (mysql) | Positive integer value that specifies the maximum size of each batch of events that should be processed during each iteration of this connector. Defaults to 2048. | 2048 | int
| *maxQueueSize* (mysql) | Positive integer value that specifies the maximum size of the blocking queue into which change events read from the database log are placed before they are written to Kafka. This queue can provide backpressure to the binlog reader when, for example, writes to Kafka are slower or if Kafka is not available. Events that appear in the queue are not included in the offsets periodically recorded by this connector. Defaults to 8192, and should always be larger than the maximum batch size specified in the max.batch.size property. | 8192 | int
| *pollIntervalMs* (mysql) | Positive integer value that specifies the number of milliseconds the connector should wait during each iteration for new change events to appear. Defaults to 1000 milliseconds, or 1 second. | 1000 | long
| *tableBlacklist* (mysql) | An optional comma-separated list of regular expressions that match fully-qualified table identifiers for tables to be excluded from monitoring; any table not included in the blacklist will be monitored. Each identifier is of the form databaseName.tableName. May not be used with table.whitelist. | | String
| *tableWhitelist* (mysql) | An optional comma-separated list of regular expressions that match fully-qualified table identifiers for tables to be monitored; any table not included in the whitelist will be excluded from monitoring. Each identifier is of the form databaseName.tableName. By default the connector will monitor every non-system table in each monitored database. May not be used with table.blacklist. | | String
| *timePrecisionMode* (mysql) | Time, date, and timestamps can be represented with different kinds of precision, including: adaptive_time_microseconds (the default) captures the date, datetime and timestamp values exactly as in the database using either millisecond, microsecond, or nanosecond precision values based on the database columns type, with the exception of TIME type fields, which are always captured as microseconds; adaptive (deprecated) captures the time and timestamp values exactly as in the database using either millisecond, microsecond, or nanosecond precision values based on the database columns type; or connect always represents time and timestamp values using Kafka Connects built-in representations for Time, Date, and Timestamp, which uses millisecond precision regardless of the database columns' precision. See Temporal values. | adaptive_time_microseconds | String
| *tombstonesOnDelete* (mysql) | Controls whether a tombstone event should be generated after a delete event. When true the delete operations are represented by a delete event and a subsequent tombstone event. When false only a delete event is sent. Emitting the tombstone event (the default behavior) allows Kafka to completely delete all events pertaining to the given key once the source record got deleted. | false | boolean
|===
// endpoint options: END
// spring-boot-auto-configure options: START
== Spring Boot Auto-Configuration
When using Spring Boot make sure to use the following Maven dependency to have support for auto configuration:
[source,xml]
----
<dependency>
<groupId>org.apache.camel</groupId>
<artifactId>camel-debezium-starter</artifactId>
<version>x.x.x</version>
<!-- use the same version as your Camel core version -->
</dependency>
----
The component supports 3 options, which are listed below.
[width="100%",cols="2,5,^1,2",options="header"]
|===
| Name | Description | Default | Type
| *camel.component.debezium.basic-property-binding* | Whether the component should use basic property binding (Camel 2.x) or the newer property binding with additional capabilities | false | Boolean
| *camel.component.debezium.configuration* | Allow pre-configured Configurations to be set, you will need to extend EmbeddedDebeziumConfiguration in order to create the configuration for the component. The option is a org.apache.camel.component.debezium.configuration.EmbeddedDebeziumConfiguration type. | | String
| *camel.component.debezium.enabled* | Whether to enable auto configuration of the debezium component. This is enabled by default. | | Boolean
|===
// spring-boot-auto-configure options: END
For more information about configuration:
https://debezium.io/documentation/reference/0.9/operations/embedded.html#engine-properties[https://debezium.io/documentation/reference/0.9/operations/embedded.html#engine-properties]
https://debezium.io/documentation/reference/0.9/connectors/mysql.html#connector-properties[https://debezium.io/documentation/reference/0.9/connectors/mysql.html#connector-properties]
== Message headers
=== Consumer headers
The following headers are available when consuming change events from Debezium.
[width="100%",cols="2m,2m,1m,5",options="header"]
|===
| Header constant | Header value | Type | Description
| DebeziumConstants.HEADER_IDENTIFIER | "CamelDebeziumIdentifier" | String | The identifier of the connector, normally is this format "{server-name}.{database-name}.{table-name}".
| DebeziumConstants.HEADER_KEY | "CamelDebeziumKey" | Struct | The key of the event, normally is the table Primary Key.
| DebeziumConstants.HEADER_SOURCE_METADATA | "CamelDebeziumSourceMetadata" | Map | The metadata about the source event, for example `table` name, database `name`, log position, etc, please refer to the Debezium documentation for more info.
| DebeziumConstants.HEADER_OPERATION | "CamelDebeziumOperation" | String | If presents, the type of event operation. Values for the connector are `c` for create (or insert), `u` for update, `d` for delete or `r` in case of a snapshot event.
| DebeziumConstants.HEADER_TIMESTAMP | "CamelDebeziumTimestamp" | Long | If presents, the time (using the system clock in the JVM) at which the connector processed the event.
| DebeziumConstants.HEADER_BEFORE | "CamelDebeziumBefore" | Struct | If presents, contains the state of the row before the event occurred.
|===
== Message body
The message body if is not `null` (in case of tombstones), it contains the state of the row after the event occurred as `Struct` format or `Map` format if you use the included Type Converter from `Struct` to `Map` (please look below for more explanation).
== Samples
=== Consuming events
Here is a very simple route that you can use in order to listen to Debezium events from MySQL connector.
[source,java]
----
from("debezium:mysql?name=dbz-test-1&offsetStorageFileName=/usr/offset-file-1.dat&databaseHostName=localhost&databaseUser=debezium&databasePassword=dbz&databaseServerName=my-app-connector&databaseHistoryFileName=/usr/history-file-1.dat")
.log("Event received from Debezium : ${body}")
.log(" with this identifier ${headers.CamelDebeziumIdentifier}")
.log(" with these source metadata ${headers.CamelDebeziumSourceMetadata}")
.log(" the event occured upon this operation '${headers.CamelDebeziumSourceOperation}'")
.log(" on this database '${headers.CamelDebeziumSourceMetadata[db]}' and this table '${headers.CamelDebeziumSourceMetadata[table]}'")
.log(" with the key ${headers.CamelDebeziumKey}")
.log(" the previous value is ${headers.CamelDebeziumBefore}")
----
By default, the component will emit the events in the body and `CamelDebeziumBefore` header as https://kafka.apache.org/22/javadoc/org/apache/kafka/connect/data/Struct.html[`Struct`] data type, the reasoning behind this, is to perceive the schema information in case is needed.
However, the component as well contains a xref:manual::type-converter.adoc[Type Converter] that converts
from default output type of https://kafka.apache.org/22/javadoc/org/apache/kafka/connect/data/Struct.html[`Struct`] to `Map` in order to leverage Camel's rich xref:manual::data-format.adoc[Data Format] types which many of them work out of box with `Map` data type.
To use it, you can either add `Map.class` type when you access the message e.g: `exchange.getIn().getBody(Map.class)`, or you can convert the body always to `Map` from the route builder by adding `.convertBodyTo(Map.class)` to your Camel Route DSL after `from` statement.
We mentioned above about the schema, which can be used in case you need to perform advance data transformation and the schema is needed for that. If you choose not to convert your body to `Map`,
you can obtain the schema information as https://kafka.apache.org/22/javadoc/org/apache/kafka/connect/data/Schema.html[`Schema`] type from `Struct` like this:
[source,java]
----
from("debezium:[connectorType]?[options]])
.process(exchange -> {
final Struct bodyValue = exchange.getIn().getBody(Struct.class);
final Schema schemaValue = bodyValue.schema();
log.info("Body value is :" + bodyValue);
log.info("With Schema : " + schemaValue);
log.info("And fields of :" + schemaValue.fields());
log.info("Field name has `" + schemaValue.field("name").schema() + "` type");
});
----
*Important Note:* This component is a thin wrapper around Debezium Engine as mentioned, therefore before using this component in production, you need to understand how Debezium works and how configurations can reflect the expected behavior, especially in regards to https://debezium.io/documentation/reference/0.9/operations/embedded.html#_handling_failures[handling failures].