Since https://github.com/apache/seatunnel/issues/1608, SeaTunnel has added Connector V2. Connector V2 is a connector defined on top of the SeaTunnel Connector API. Unlike Connector V1, V2 supports the following features:
Source connectors have some common core features, and each source connector supports them to varying degrees.
If every record in the data source is sent downstream by the source only once, the source connector is considered to support exactly-once.
In SeaTunnel, the source can save each split it is reading, together with its offset (the position reached within the split at that time, such as a line number, byte count, or offset), as a StateSnapshot when checkpointing. If the task restarts, SeaTunnel retrieves the last StateSnapshot, locates the split and offset read last time, and continues sending data downstream from there.
For example, File and Kafka.
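The checkpoint-and-resume mechanism described above can be illustrated with a small, self-contained Java sketch. It does not use the SeaTunnel Connector API: `OffsetTrackingReader`, `poll`, and `snapshotState` are hypothetical names, splits are modeled as in-memory lists of lines, and the offset is simply the index of the next unsent line.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical reader, not SeaTunnel's SourceReader: each split is an
// in-memory list of lines, and the tracked offset is the index of the
// next line that has not yet been sent downstream.
class OffsetTrackingReader {
    private final Map<String, List<String>> splits;
    private final Map<String, Integer> offsets = new HashMap<>();

    // `restored` is the last StateSnapshot (empty on a fresh start).
    OffsetTrackingReader(Map<String, List<String>> splits, Map<String, Integer> restored) {
        this.splits = splits;
        for (String splitId : splits.keySet()) {
            offsets.put(splitId, restored.getOrDefault(splitId, 0));
        }
    }

    // Sends up to maxRecords lines from one split and advances its offset.
    List<String> poll(String splitId, int maxRecords) {
        List<String> data = splits.get(splitId);
        int start = offsets.get(splitId);
        int end = Math.min(start + maxRecords, data.size());
        offsets.put(splitId, end);
        return new ArrayList<>(data.subList(start, end));
    }

    // Called at checkpoint time: a copy of split id -> next unsent position.
    Map<String, Integer> snapshotState() {
        return new HashMap<>(offsets);
    }
}

class ResumeDemo {
    public static void main(String[] args) {
        Map<String, List<String>> splits = Map.of("file-a", List.of("l1", "l2", "l3", "l4"));
        OffsetTrackingReader reader = new OffsetTrackingReader(splits, Map.of());
        System.out.println(reader.poll("file-a", 2));     // [l1, l2]
        Map<String, Integer> checkpoint = reader.snapshotState();
        // Simulate a restart: a new reader is built from the last snapshot.
        OffsetTrackingReader restarted = new OffsetTrackingReader(splits, checkpoint);
        System.out.println(restarted.poll("file-a", 10)); // [l3, l4]
    }
}
```

In a real job, records sent after the last checkpoint are sent again after a restart; exactly-once then also depends on downstream state being rolled back to the same checkpoint, which this sketch does not model.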
A connector supports column projection if it can read only the specified columns from the data source. (Note that reading all columns first and then filtering out unnecessary columns through the schema is not real column projection.)
For example, JDBCSource can use SQL to define which columns to read. KafkaSource, by contrast, reads all content from the topic and then uses the schema to filter out unnecessary columns; this is not column projection.
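The difference between real column projection and filtering after the read can be sketched in Java. The names below (`ProjectionSketch`, `buildProjectedQuery`, `filterAfterRead`) are hypothetical and are not the JDBCSource or KafkaSource implementation; the query builder also assumes table and column names are trusted identifiers, so it does no quoting or escaping.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class ProjectionSketch {
    // Column projection: the requested columns become part of the query,
    // so only those columns are read from the database at all.
    // Assumes trusted, unquoted identifiers (no escaping is done).
    static String buildProjectedQuery(String table, List<String> columns) {
        return "SELECT " + String.join(", ", columns) + " FROM " + table;
    }

    // NOT projection: the full row (e.g. a whole Kafka message) has already
    // been read; dropping fields afterwards saves nothing on the read path.
    static Map<String, Object> filterAfterRead(Map<String, Object> fullRow, List<String> columns) {
        Map<String, Object> out = new LinkedHashMap<>();
        for (String column : columns) {
            if (fullRow.containsKey(column)) {
                out.put(column, fullRow.get(column));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> columns = List.of("id", "name");
        System.out.println(buildProjectedQuery("users", columns)); // SELECT id, name FROM users

        Map<String, Object> row = new LinkedHashMap<>();
        row.put("id", 1);
        row.put("name", "alice");
        row.put("payload", "large blob that was read anyway");
        System.out.println(filterAfterRead(row, columns));          // {id=1, name=alice}
    }
}
```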
Batch job mode: the data read is bounded, and the job stops after all data has been read.
Streaming job mode: the data read is unbounded, and the job never stops.
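A minimal Java sketch of how the two job modes differ when the source temporarily has no data. `JobMode`, `run`, and the convention that a `null` poll result means "nothing available right now" are assumptions of this sketch; a real bounded source signals the end of its input explicitly rather than through an empty poll. The same poll sequence is fed to both modes, and the `maxPolls` cap exists only so the streaming case terminates in a demo.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.Supplier;

enum JobMode { BATCH, STREAMING }

class JobLoopSketch {
    // Polls a hypothetical source up to maxPolls times.
    // A null result means "no record available right now".
    static List<String> run(JobMode mode, Supplier<String> poll, int maxPolls) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < maxPolls; i++) {
            String record = poll.get();
            if (record == null) {
                if (mode == JobMode.BATCH) {
                    break;    // bounded: input is exhausted, the job finishes
                }
                continue;     // unbounded: keep waiting for new data
            }
            out.add(record);
        }
        return out;
    }

    // A fresh source that yields a, b, then one empty poll, then c.
    static Supplier<String> source() {
        Iterator<String> it = Arrays.asList("a", "b", null, "c").iterator();
        return () -> it.hasNext() ? it.next() : null;
    }

    public static void main(String[] args) {
        System.out.println(run(JobMode.BATCH, source(), 10));     // [a, b]
        System.out.println(run(JobMode.STREAMING, source(), 10)); // [a, b, c]
    }
}
```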
A parallelism source connector supports configuring parallelism; each degree of parallelism creates a task to read the data. In a parallelism source connector, the source is divided into multiple splits, and the enumerator then allocates the splits to the SourceReaders for processing.
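The enumerator's job of handing splits to readers can be sketched as below. This is a hypothetical illustration, not the SeaTunnel enumerator API: splits are plain string IDs, readers are numbered 0 to parallelism - 1, and round-robin is only one possible assignment strategy; real enumerators may assign splits differently, for example on demand as readers register.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class SplitEnumeratorSketch {
    // Round-robin assignment: split i goes to reader i % parallelism.
    static Map<Integer, List<String>> assign(List<String> splits, int parallelism) {
        if (parallelism <= 0) {
            throw new IllegalArgumentException("parallelism must be positive");
        }
        Map<Integer, List<String>> assignment = new TreeMap<>();
        for (int reader = 0; reader < parallelism; reader++) {
            assignment.put(reader, new ArrayList<>());
        }
        for (int i = 0; i < splits.size(); i++) {
            assignment.get(i % parallelism).add(splits.get(i));
        }
        return assignment;
    }

    public static void main(String[] args) {
        // Five splits, parallelism 2 -> two reader tasks.
        System.out.println(assign(List.of("s0", "s1", "s2", "s3", "s4"), 2));
        // {0=[s0, s2, s4], 1=[s1, s3]}
    }
}
```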
Supports multimodal data integration, including structured and unstructured text data, video, images, binary files, etc.
Users can configure the split rule.
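One common kind of user-configurable split rule cuts a numeric key range into fixed-size chunks, one split per chunk. The sketch below assumes that rule; `RangeSplitter`, `KeyRange`, `toWhereClause`, and the `splitSize` parameter are hypothetical and do not correspond to a specific SeaTunnel option.

```java
import java.util.ArrayList;
import java.util.List;

class RangeSplitter {
    // A half-open key range [start, end).
    record KeyRange(long start, long end) {
        // How a JDBC-style reader might turn the range into a filter.
        String toWhereClause(String column) {
            return column + " >= " + start + " AND " + column + " < " + end;
        }
    }

    // Splits the inclusive key range [min, max] into consecutive ranges of at
    // most splitSize keys each. Assumes max + splitSize does not overflow.
    static List<KeyRange> split(long min, long max, long splitSize) {
        if (splitSize <= 0) {
            throw new IllegalArgumentException("splitSize must be positive");
        }
        List<KeyRange> ranges = new ArrayList<>();
        for (long start = min; start <= max; start += splitSize) {
            long end = Math.min(start + splitSize, max + 1);
            ranges.add(new KeyRange(start, end));
        }
        return ranges;
    }

    public static void main(String[] args) {
        // Keys 1..10 with a configured split size of 4 -> three splits.
        for (KeyRange range : split(1, 10, 4)) {
            System.out.println(range.toWhereClause("id"));
        }
        // id >= 1 AND id < 5
        // id >= 5 AND id < 9
        // id >= 9 AND id < 11
    }
}
```

Changing `splitSize` changes only how many splits the enumerator has to distribute, not which keys are read.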
Supports reading multiple tables in one SeaTunnel job.
Sink connectors have some common core features, and each sink connector supports them to varying degrees.
When data flows into a distributed system, the system provides exactly-once consistency if it processes every record exactly once over the whole pipeline and the processing results are correct.
A sink connector supports exactly-once if every record is written to the target only once. There are generally two ways to achieve this:
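1. The target database supports key deduplication. For example, MySQL and Kudu.
2. The target supports XA transactions (transactions that can be used across sessions: even if the program that created a transaction has ended, a newly started program only needs the ID of the last transaction to commit or roll it back). Two-phase commit can then be used to ensure exactly-once. For example, File and MySQL.
If a sink connector supports writing row kinds (INSERT/UPDATE_BEFORE/UPDATE_AFTER/DELETE) based on the primary key, it is considered to support CDC (change data capture).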
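A minimal Java sketch of a sink that applies row kinds by primary key. Because every write is keyed, replaying the same changes in order after a restart ends in the same target state, which is also the idea behind the key-deduplication route to exactly-once. `RowKind`, `ChangeRow`, and `KeyedCdcSink` are hypothetical stand-ins defined locally (the target table is an in-memory map), and ignoring UPDATE_BEFORE is a design choice of this sketch, not a rule every CDC sink follows.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-ins, defined locally for the sketch.
enum RowKind { INSERT, UPDATE_BEFORE, UPDATE_AFTER, DELETE }

record ChangeRow(RowKind kind, int id, String value) {}

class KeyedCdcSink {
    // In-memory stand-in for a target table whose primary key is `id`.
    final Map<Integer, String> table = new LinkedHashMap<>();

    void write(ChangeRow row) {
        switch (row.kind()) {
            // INSERT and UPDATE_AFTER are upserts on the primary key, so
            // writing the same row again leaves the table unchanged.
            case INSERT, UPDATE_AFTER -> table.put(row.id(), row.value());
            // Removing an already-absent key is a no-op, so DELETE is safe to replay.
            case DELETE -> table.remove(row.id());
            // UPDATE_BEFORE carries the old image; this upsert-style sink skips
            // it because the matching UPDATE_AFTER supplies the new value.
            case UPDATE_BEFORE -> { }
        }
    }

    void writeAll(List<ChangeRow> rows) {
        for (ChangeRow row : rows) {
            write(row);
        }
    }

    public static void main(String[] args) {
        List<ChangeRow> changes = List.of(
                new ChangeRow(RowKind.INSERT, 1, "a"),
                new ChangeRow(RowKind.UPDATE_BEFORE, 1, "a"),
                new ChangeRow(RowKind.UPDATE_AFTER, 1, "b"),
                new ChangeRow(RowKind.INSERT, 2, "x"),
                new ChangeRow(RowKind.DELETE, 2, "x"));
        KeyedCdcSink sink = new KeyedCdcSink();
        sink.writeAll(changes);
        System.out.println(sink.table); // {1=b}
        sink.writeAll(changes);         // replay, e.g. after a restart
        System.out.println(sink.table); // {1=b}
    }
}
```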
Supports writing multiple tables in one SeaTunnel job; users can dynamically specify each table's identifier by configuring placeholders.
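Placeholder resolution can be sketched as plain string templating. The placeholder names `${database_name}` and `${table_name}` and the `TablePlaceholderResolver` class are assumptions for illustration; check each sink connector's documentation for the placeholders it actually supports. Unknown placeholders are left untouched here so a typo stays visible instead of silently becoming an empty string.

```java
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TablePlaceholderResolver {
    // Matches ${name} where name is lowercase letters and underscores.
    private static final Pattern PLACEHOLDER = Pattern.compile("\\$\\{([a-z_]+)\\}");

    // Replaces each ${name} with values.get(name); unknown names are kept as-is.
    static String resolve(String template, Map<String, String> values) {
        Matcher m = PLACEHOLDER.matcher(template);
        StringBuilder sb = new StringBuilder();
        while (m.find()) {
            String replacement = values.getOrDefault(m.group(1), m.group(0));
            // quoteReplacement stops '$' and '\' in values being read as group refs.
            m.appendReplacement(sb, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        // One sink template; each upstream table supplies its own values.
        String template = "ods_${database_name}_${table_name}";
        for (String table : List.of("orders", "users")) {
            System.out.println(resolve(template, Map.of("database_name", "shop", "table_name", table)));
        }
        // ods_shop_orders
        // ods_shop_users
    }
}
```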
Supports multimodal data integration, including structured and unstructured text data, video, images, binary files, etc.