[Core][Metrics] Add Seatunnel Metrics module (#2888)

* seatunnel-metrics init commit

* seatunnel-metrics config add

* codeStyle update

* codeStyle update again

* codeStyle seatunnel-spark update

* [Improve][Connector-V2] Parameter verification for connector V2 kafka sink (#2866)

* parameter verification

* update

* update

* [Improve][DOC] Perfect the connector v2 doc (#2800)

* [Improve][DOC] Perfect the connector v2 doc

* Update seatunnel-connectors-v2/README.zh.md

Co-authored-by: Hisoka <fanjiaeminem@qq.com>

* [Improve][DOC] A little tinkering

* [Improve][DOC] A little tinkering

* [Doc][connector] add Console sink doc

close #2794

* [Doc][connector] add Console sink doc

close #2794

* fix some problem

* fix some problem

* fine tuning

Co-authored-by: Hisoka <fanjiaeminem@qq.com>

* add seatunnel-examples from gitignore (#2892)

* [Improve][connector-jdbc] Calculate splits only once in JdbcSourceSplitEnumerator (#2900)

* [Bug][Connector-V2] Fix wechat sink data serialization (#2856)

* [Improve][Connector-V2] Improve orc write strategy to support all data types (#2860)

* [Improve][Connector-V2] Improve orc write strategy to support all data types

Co-authored-by: tyrantlucifer <tyrantlucifer@gmail.com>

* [Bug][seatunnel-translation-base] Fix Source restore state NPE (#2878)

* [Improve][Connector-v2-Fake]Supports direct definition of data values(row) (#2839)

* [Improve][Connector-v2]Supports direct definition of data values(row)

* seatunnel-prometheus update

* seatunnel-prometheus update

* seatunnel-prometheus update

* 1. Unify SeaTunnel configuration naming
2. Use reflection to automate assembly
3. Modify the Flink/Spark startup function
4. Try packaging configuration (todo)

Co-authored-by: TaoZex <45089228+TaoZex@users.noreply.github.com>
Co-authored-by: liugddx <804167098@qq.com>
Co-authored-by: Hisoka <fanjiaeminem@qq.com>
Co-authored-by: Eric <gaojun2048@gmail.com>
Co-authored-by: Xiao Zhao <zhaomin1423@163.com>
Co-authored-by: hailin0 <wanghailin@apache.org>
Co-authored-by: tyrantlucifer <tyrantlucifer@gmail.com>
Co-authored-by: Laglangyue <35491928+laglangyue@users.noreply.github.com>
62 files changed
README.md

Apache SeaTunnel (Incubating)




SeaTunnel was formerly named Waterdrop and was renamed SeaTunnel on October 12, 2021.


SeaTunnel is an easy-to-use, ultra-high-performance distributed data integration platform that supports real-time synchronization of massive data. It can synchronize tens of billions of records stably and efficiently every day, and has been used in production by nearly 100 companies.

Why do we need SeaTunnel

SeaTunnel aims to solve problems commonly encountered in massive data synchronization:

  • Data loss and duplication
  • Task accumulation and delay
  • Low throughput
  • Long cycle to be applied in the production environment
  • Lack of application running status monitoring

SeaTunnel use scenarios

  • Mass data synchronization
  • Mass data integration
  • ETL with massive data
  • Mass data aggregation
  • Multi-source data processing

Features of SeaTunnel

  • Easy to use, flexible configuration, low code development
  • Real-time streaming
  • Offline multi-source data analysis
  • High-performance, massive data processing capabilities
  • Modular and plug-in mechanism, easy to extend
  • Support data processing and aggregation by SQL
  • Support Spark structured streaming
  • Support Spark 2.x

Workflow of SeaTunnel

seatunnel-workflow.svg

Source[Data Source Input] -> Transform[Data Processing] -> Sink[Result Output]

The data processing pipeline is composed of multiple filters to meet a variety of data processing needs. If you are accustomed to SQL, you can also construct a data processing pipeline directly in SQL, which is simple and efficient. The list of filters supported by SeaTunnel is still being expanded, and you can develop your own data processing plug-ins, because the whole system is easy to extend.
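As a rough sketch of the Source -> Transform -> Sink flow described above, a SeaTunnel job is declared in a single HOCON config file. The plugin names and options below (`FakeSource`, `Console`, the `sql` transform, and their parameters) are illustrative and vary between SeaTunnel versions; consult the documentation for your release.

```hocon
env {
  # Engine-level settings for the whole job.
  execution.parallelism = 1
}

source {
  # A test source that generates rows; registers its output as table "fake".
  FakeSource {
    result_table_name = "fake"
  }
}

transform {
  # Filters are plain SQL over the registered source table.
  sql {
    sql = "SELECT name, age FROM fake WHERE age > 18"
  }
}

sink {
  # Print results to stdout for verification.
  Console {}
}
```

Each stage is a pluggable block, so swapping `Console` for a JDBC or Kafka sink changes only the `sink` section, not the rest of the pipeline.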

Plugins supported by SeaTunnel

Environmental dependency

  1. Java runtime environment, Java >= 8

  2. If you want to run SeaTunnel in a cluster environment, any of the following Spark cluster environments is usable:

  • Spark on Yarn
  • Spark Standalone

If the data volume is small, or the goal is merely functional verification, you can also start in local mode without a cluster environment, since SeaTunnel supports standalone operation. Note: SeaTunnel 2.0 supports running on both Spark and Flink.
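For local-mode verification, the binary distribution ships launch scripts under `bin/`. The script name, flags, and config path below are assumptions based on typical SeaTunnel 2.x packaging and may differ in your version:

```
# Run a batch job locally on Spark, no cluster required.
# 'local[4]' asks Spark for a local master with 4 worker threads.
./bin/start-seatunnel-spark.sh \
  --master 'local[4]' \
  --deploy-mode client \
  --config ./config/spark.batch.conf
```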

Compiling project

Follow this document.

Downloads

Download address for the ready-to-run software package: https://seatunnel.apache.org/download

Quick start

Spark https://seatunnel.apache.org/docs/deployment

Flink https://seatunnel.apache.org/docs/deployment

Detailed documentation on SeaTunnel https://seatunnel.apache.org/docs/intro/about

Application practice cases

  • Weibo, Value-added Business Department Data Platform

The Weibo business uses an internal customized version of SeaTunnel, and its sub-project Guardian, for SeaTunnel-on-Yarn monitoring of hundreds of real-time streaming computing tasks.

  • Sina, Big Data Operation Analysis Platform

The Sina Data Operation Analysis Platform uses SeaTunnel to perform real-time and offline analysis of data operation and maintenance for Sina News, CDN, and other services, and writes the results into ClickHouse.

  • Sogou, Sogou Qiqian System

The Sogou Qiqian System uses SeaTunnel as an ETL tool to help build a real-time data warehouse system.

  • Qutoutiao, Qutoutiao Data Center

The Qutoutiao Data Center uses SeaTunnel to support MySQL-to-Hive offline ETL tasks and real-time Hive-to-ClickHouse backfill, covering most offline and real-time task needs well.

  • Yixia Technology, Yizhibo Data Platform

  • Yonghui Superstores Founders' Alliance-Yonghui Yunchuang Technology, Member E-commerce Data Analysis Platform

SeaTunnel provides real-time streaming and offline SQL computing of e-commerce user behavior data for Yonghui Life, a new retail brand of Yonghui Yunchuang Technology.

  • Shuidichou, Data Platform

Shuidichou adopts SeaTunnel for real-time streaming and regular offline batch processing on Yarn, processing an average of 3~4 TB of data daily and later writing the data into ClickHouse.

  • Tencent Cloud

Various logs from business services are collected into Apache Kafka; some of the data in Kafka is consumed and extracted through SeaTunnel and then stored into ClickHouse.

For more use cases, please refer to: https://seatunnel.apache.org/blog

Code of conduct

This project adheres to the Contributor Covenant code of conduct. By participating, you are expected to uphold this code. Please follow the REPORTING GUIDELINES to report unacceptable behavior.

Developer

Thanks to all developers!

Contact Us

Landscapes

Our Users

Various companies and organizations use SeaTunnel for research, production and commercial products. Visit our website to find the user page.

License

Apache 2.0 License.