Auxiliary tools for Apache SeaTunnel focusing on developer/operator productivity around configuration, conversion, LLM integration, packaging, and diagnostics.
| Tool | Purpose | Status |
|---|---|---|
| SeaTunnel Skill | Claude AI integration for SeaTunnel operations | ✅ New |
| SeaTunnel MCP Server | Model Context Protocol for LLM integration | ✅ Available |
| x2seatunnel | Configuration converter (DataX → SeaTunnel) | ✅ Available |
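For x2seatunnel, a hypothetical conversion run might look like the following (the actual entry point and flags may differ; check the tool's own README before use):

```bash
# hypothetical invocation; verify the real CLI flags in x2seatunnel's README
x2seatunnel --source datax-job.json --target seatunnel-job.conf
```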
Installation & Setup:
```bash
# 1. Clone this repository
git clone https://github.com/apache/seatunnel-tools.git
cd seatunnel-tools

# 2. Copy seatunnel-skill to Claude Code skills directory
cp -r seatunnel-skill ~/.claude/skills/

# 3. Restart Claude Code or reload skills
# Then use: /seatunnel-skill "your prompt here"
```
Quick Example:
```bash
# Query SeaTunnel documentation
/seatunnel-skill "How do I configure a MySQL to PostgreSQL job?"

# Get connector information
/seatunnel-skill "List all available Kafka connector options"

# Debug configuration issues
/seatunnel-skill "Why is my job failing with OutOfMemoryError?"
```
```bash
# Download binary (recommended)
wget https://archive.apache.org/dist/seatunnel/2.3.12/apache-seatunnel-2.3.12-bin.tar.gz
tar -xzf apache-seatunnel-2.3.12-bin.tar.gz
cd apache-seatunnel-2.3.12

# Verify installation
./bin/seatunnel.sh --version

# Run your first job
./bin/seatunnel.sh -c config/hello_world.conf -e spark
```
Step 1: Copy Skill File
```bash
mkdir -p ~/.claude/skills
cp -r seatunnel-skill ~/.claude/skills/
```
Step 2: Verify Installation
```bash
# In Claude Code, try:
/seatunnel-skill "What is SeaTunnel?"
```
Step 3: Start Using
```bash
# Help with configuration
/seatunnel-skill "Create a MySQL to Elasticsearch job config"

# Troubleshoot errors
/seatunnel-skill "My Kafka connector keeps timing out"

# Learn features
/seatunnel-skill "Explain CDC (Change Data Capture) in SeaTunnel"
```
Supported Platforms: Linux, macOS, Windows
```bash
# Download latest version
VERSION=2.3.12
wget https://archive.apache.org/dist/seatunnel/${VERSION}/apache-seatunnel-${VERSION}-bin.tar.gz

# Extract
tar -xzf apache-seatunnel-${VERSION}-bin.tar.gz
cd apache-seatunnel-${VERSION}

# Set environment
export JAVA_HOME=/path/to/java
export PATH=$PATH:$(pwd)/bin

# Verify
seatunnel.sh --version
```
```bash
# Clone repository
git clone https://github.com/apache/seatunnel.git
cd seatunnel

# Build
mvn clean install -DskipTests

# Run from distribution
cd seatunnel-dist/target/apache-seatunnel-*-bin/apache-seatunnel-*
./bin/seatunnel.sh --version
```
```bash
# Pull official image
docker pull apache/seatunnel:latest

# Run container
docker run -it apache/seatunnel:latest /bin/bash

# Run job directly
docker run -v /path/to/config:/config \
  apache/seatunnel:latest \
  seatunnel.sh -c /config/job.conf -e spark
```
config/mysql_to_postgres.conf
env { job.mode = "BATCH" job.name = "MySQL to PostgreSQL" } source { Jdbc { driver = "com.mysql.cj.jdbc.Driver" url = "jdbc:mysql://mysql-host:3306/mydb" user = "root" password = "password" query = "SELECT * FROM users" connection_check_timeout_sec = 100 } } sink { Jdbc { driver = "org.postgresql.Driver" url = "jdbc:postgresql://pg-host:5432/mydb" user = "postgres" password = "password" database = "mydb" table = "users" primary_keys = ["id"] connection_check_timeout_sec = 100 } }
Run:
```bash
seatunnel.sh -c config/mysql_to_postgres.conf -e spark
```
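To sanity-check the batch copy, you can compare row counts on both sides; a minimal sketch assuming the `mysql` and `psql` clients are installed, using the hosts and credentials from the example config:

```bash
# compare row counts between source and target
mysql -h mysql-host -u root -p -e "SELECT COUNT(*) FROM mydb.users;"
psql -h pg-host -U postgres -d mydb -c "SELECT COUNT(*) FROM users;"
```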
config/kafka_to_es.conf
env { job.mode = "STREAMING" job.name = "Kafka to Elasticsearch" parallelism = 2 } source { Kafka { bootstrap.servers = "kafka-host:9092" topic = "events" consumer.group = "seatunnel-group" format = "json" schema = { fields { event_id = "bigint" event_name = "string" timestamp = "bigint" } } } } sink { Elasticsearch { hosts = ["es-host:9200"] index = "events" username = "elastic" password = "password" } }
Run:
```bash
seatunnel.sh -c config/kafka_to_es.conf -e flink
```
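To exercise the pipeline end to end, you can publish a test event matching the declared schema and then query Elasticsearch; a sketch assuming the standard Kafka CLI tools and `curl` are on the PATH:

```bash
# publish one JSON event matching the schema in the config above
echo '{"event_id": 1, "event_name": "signup", "timestamp": 1700000000000}' | \
  kafka-console-producer.sh --bootstrap-server kafka-host:9092 --topic events

# check that it was indexed
curl "http://es-host:9200/events/_search?pretty"
```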
config/mysql_cdc_kafka.conf
env { job.mode = "STREAMING" job.name = "MySQL CDC to Kafka" } source { Mysql { server_id = 5400 hostname = "mysql-host" port = 3306 username = "root" password = "password" database = ["mydb"] table = ["users", "orders"] startup.mode = "initial" } } sink { Kafka { bootstrap.servers = "kafka-host:9092" topic = "mysql_cdc" format = "canal_json" semantic = "EXACTLY_ONCE" } }
Run:
```bash
seatunnel.sh -c config/mysql_cdc_kafka.conf -e flink
```
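To watch change events as they land, you can tail the target topic; a sketch assuming the standard Kafka CLI tools:

```bash
# tail the CDC topic; insert/update/delete rows in mydb.users to see canal-json events
kafka-console-consumer.sh --bootstrap-server kafka-host:9092 --topic mysql_cdc --from-beginning
```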
Source Connectors
- Jdbc - Generic JDBC databases (MySQL, PostgreSQL, Oracle, SQL Server)
- Kafka - Apache Kafka topics
- Mysql - MySQL with CDC support
- MongoDB - MongoDB collections
- PostgreSQL - PostgreSQL with CDC
- S3 - Amazon S3 and compatible storage
- Http - HTTP/HTTPS endpoints
- FakeSource - For testing

Sink Connectors
- Jdbc - Write to JDBC-compatible databases
- Kafka - Publish to Kafka topics
- Elasticsearch - Write to Elasticsearch indices
- S3 - Write to S3 buckets
- Redis - Write to Redis
- HBase - Write to HBase tables
- Console - Output to console

Transform Connectors
- Sql - Execute SQL transformations
- FieldMapper - Rename/map columns
- JsonPath - Extract data from JSON

Environment Variables:

```bash
# Java configuration
export JAVA_HOME=/path/to/java
export JVM_OPTS="-Xms1G -Xmx4G"

# Spark configuration (if using Spark engine)
export SPARK_HOME=/path/to/spark
export SPARK_MASTER=spark://master:7077

# Flink configuration (if using Flink engine)
export FLINK_HOME=/path/to/flink

# SeaTunnel configuration
export SEATUNNEL_HOME=/path/to/seatunnel
```
env { job.mode = "BATCH" parallelism = 8 # Increase for larger clusters } source { Jdbc { split_size = 100000 # Parallel reads fetch_size = 5000 } } sink { Jdbc { batch_size = 1000 # Batch inserts max_retries = 3 } }
env { job.mode = "STREAMING" parallelism = 4 checkpoint.interval = 30000 # 30 seconds } source { Kafka { consumer.group = "seatunnel-consumer" max_poll_records = 500 } }
```
seatunnel-tools/
├── seatunnel-skill/    # Claude Code AI skill
├── seatunnel-mcp/      # MCP server for LLM integration
├── x2seatunnel/        # DataX to SeaTunnel converter
└── README.md
```
```
seatunnel/
├── seatunnel-api/          # Core APIs
├── seatunnel-core/         # Execution engine
├── seatunnel-engines/      # Engine implementations
│   ├── seatunnel-engine-flink/
│   ├── seatunnel-engine-spark/
│   └── seatunnel-engine-zeta/
├── seatunnel-connectors/   # Connector implementations
└── seatunnel-dist/         # Distribution package
```
```bash
# Full build
git clone https://github.com/apache/seatunnel.git
cd seatunnel
mvn clean install -DskipTests

# Build specific module
mvn clean install -pl seatunnel-connectors/seatunnel-connectors-seatunnel-kafka -DskipTests
```
```bash
# Unit tests
mvn test

# Specific test class
mvn test -Dtest=MySqlConnectorTest

# Integration tests
mvn verify
```
Problem: JDBC driver class not found (the MySQL driver is missing from the lib directory).

Solution:
```bash
wget https://dev.mysql.com/get/Downloads/Connector-J/mysql-connector-java-8.0.33.jar
cp mysql-connector-java-8.0.33.jar $SEATUNNEL_HOME/lib/
seatunnel.sh -c config/job.conf -e spark
```
Problem: The job fails with OutOfMemoryError.

Solution:
```bash
export JVM_OPTS="-Xms2G -Xmx8G"
echo 'JVM_OPTS="-Xms2G -Xmx8G"' >> $SEATUNNEL_HOME/bin/seatunnel-env.sh
```
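To confirm the new heap settings are actually applied to the running process, one option is `jps` from the JDK:

```bash
# list JVMs with their arguments and check for -Xmx8G
jps -lvm | grep -i seatunnel
```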
Problem: The job cannot connect to the source database.

Solution:
```bash
# Verify connectivity
ping source-host
telnet source-host 3306

# Check credentials
mysql -h source-host -u root -p
```
Problem: MySQL CDC fails because the binlog is disabled.

Solution:
```sql
-- Check binlog status
SHOW VARIABLES LIKE 'log_bin';
```

Enable the binlog in my.cnf:

```ini
[mysqld]
log_bin = mysql-bin
binlog_format = row
```
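After restarting MySQL, it's worth confirming that row-based logging is active, since CDC requires `binlog_format = row`:

```sql
-- should report ROW after the restart
SHOW VARIABLES LIKE 'binlog_format';
```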
Problem: Throughput is lower than expected.

Solution:
```hocon
env {
  parallelism = 8          # Increase parallelism
}

source {
  Jdbc {
    fetch_size = 5000
    split_size = 100000
  }
}

sink {
  Jdbc {
    batch_size = 2000
  }
}
```
Problem: The Kafka source starts consuming from an unexpected offset.

Solution:
```hocon
source {
  Kafka {
    auto.offset.reset = "earliest"   # or "latest"
  }
}
```
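If offsets still look wrong, you can inspect what the group has committed; a sketch assuming the standard Kafka CLI tools and the consumer group from the examples above:

```bash
# show committed offsets and lag per partition
kafka-consumer-groups.sh --bootstrap-server kafka-host:9092 --describe --group seatunnel-group
```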
Q: What's the difference between BATCH and STREAMING mode?
A: BATCH mode reads a bounded dataset and the job exits once every record has been processed; STREAMING mode runs continuously against unbounded sources (such as Kafka or CDC) and relies on checkpoints for fault tolerance.
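In config terms, the mode is just the `job.mode` setting in the `env` block, reusing options shown elsewhere in this README:

```hocon
env {
  job.mode = "STREAMING"        # "BATCH" for a finite run that exits when the data is consumed
  checkpoint.interval = 30000   # streaming jobs checkpoint periodically for fault tolerance
}
```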
Q: How do I handle schema changes during CDC?
A: Configure auto-detection in source:
```hocon
source {
  Mysql {
    schema_change_mode = "auto"
  }
}
```
Q: Can I transform data during synchronization?
A: Yes, use SQL transform:
```hocon
transform {
  Sql {
    sql = "SELECT id, UPPER(name) AS name FROM source"
  }
}
```
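Besides Sql, the FieldMapper transform listed earlier can rename or project columns; a minimal sketch (the `field_mapper` option follows the upstream SeaTunnel docs, so verify the exact name against your version):

```hocon
transform {
  FieldMapper {
    # map source field -> output field; only mapped fields are emitted
    field_mapper = {
      id = id
      name = user_name
    }
  }
}
```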
Q: What's the maximum throughput?
A: Typical throughput is 100K - 1M records/second per executor. It depends on record size, source and sink performance, network bandwidth, and the configured parallelism.
Q: How do I handle errors in production?
A: Configure restart strategy:
```hocon
env {
  restart_strategy = "exponential_delay"
  restart_strategy.exponential_delay.initial_delay = 1000
  restart_strategy.exponential_delay.max_delay = 30000
  restart_strategy.exponential_delay.multiplier = 2.0
}
```
Q: Is there a web UI for job management?
A: Yes! Use the SeaTunnel Web project:
```bash
git clone https://github.com/apache/seatunnel-web.git
cd seatunnel-web
mvn clean install
java -jar target/seatunnel-web-*.jar

# Access at http://localhost:8080
```
Q: How do I use the SeaTunnel Skill with Claude Code?
A: After copying to ~/.claude/skills/, use:
/seatunnel-skill "your question about SeaTunnel"
Q: Which engine should I use: Spark, Flink, or Zeta?
A: Zeta is SeaTunnel's own lightweight engine and runs without an external cluster, which makes it the easiest way to start. Choose Spark or Flink if you already operate one of those clusters or depend on their ecosystems; the same job configs run on all three.
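The engine is selected at submit time with the same `-e` flag used throughout this README (this sketch assumes Zeta is used when no engine is specified; verify against your version's CLI help):

```bash
# same job config, different execution engine
seatunnel.sh -c config/job.conf -e spark
seatunnel.sh -c config/job.conf -e flink
seatunnel.sh -c config/job.conf            # Zeta (assumed default)
```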
```bash
cp -r seatunnel-skill ~/.claude/skills/
/seatunnel-skill "your question"
```

Issues and PRs are welcome!
For the main SeaTunnel engine, see Apache SeaTunnel.
For these tools, please contribute to SeaTunnel Tools.
Last Updated: 2026-01-28 | License: Apache 2.0