METALAKE

SeaTunnel task scripts normally have to contain database usernames, passwords, and other sensitive information in plaintext. This risks leaking credentials and makes the scripts hard to maintain: when data source information changes, each script usually has to be edited by hand.

Metalake addresses this. Data source information is stored in a Metalake system such as Apache Gravitino, and task scripts use a sourceId plus placeholders instead of the actual usernames and passwords. At runtime, the SeaTunnel engine retrieves the information from Metalake over HTTP and replaces the placeholders with the retrieved values.

To enable Metalake, you first need to modify the environment variables in seatunnel-env.sh:

  • METALAKE_ENABLED
  • METALAKE_TYPE
  • METALAKE_URL

Set METALAKE_ENABLED to true. METALAKE_TYPE currently supports only gravitino.

For Apache Gravitino, set METALAKE_URL to:

http://host:port/api/metalakes/your_metalake_name/catalogs/
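
As a rough sketch, the corresponding lines in seatunnel-env.sh might look like the following. The host localhost, port 8090, and metalake name test_metalake are example values. The GRAVITINO_HOST, GRAVITINO_PORT, and METALAKE_NAME helper variables are hypothetical, and the use of plain export statements is an assumption about how your seatunnel-env.sh is organised.

```shell
# seatunnel-env.sh (fragment)
# Hypothetical helper variables, used only to assemble the URL below.
GRAVITINO_HOST=localhost
GRAVITINO_PORT=8090
METALAKE_NAME=test_metalake

# The three Metalake settings described above.
export METALAKE_ENABLED=true
export METALAKE_TYPE=gravitino
# Note the trailing slash after "catalogs".
export METALAKE_URL="http://${GRAVITINO_HOST}:${GRAVITINO_PORT}/api/metalakes/${METALAKE_NAME}/catalogs/"
```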

Usage Example

First, create a catalog in Gravitino, for example:

curl -L 'http://localhost:8090/api/metalakes/test_metalake/catalogs' \
-H 'Content-Type: application/json' \
-H 'Accept: application/vnd.gravitino.v1+json' \
-d '{
    "name": "test_catalog",
    "type": "relational",
    "provider": "jdbc-mysql",
    "comment": "for metalake test",
    "properties": {
        "jdbc-driver": "com.mysql.cj.jdbc.Driver",
        "jdbc-url": "not used",
        "jdbc-user": "root",
        "jdbc-password": "Abc!@#135_seatunnel"
    }
}'

This creates the catalog test_catalog under the metalake test_metalake. The metalake itself must already exist before the catalog is created.
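
If the metalake does not exist yet, it can be created through the Gravitino REST API first. The sketch below assumes a Gravitino server without authentication. The function names metalake_payload and create_metalake are hypothetical helpers, and the comment text is an example value.

```shell
# Hypothetical helpers, not part of SeaTunnel or Gravitino.

# Build the JSON body for a new metalake with the given name.
metalake_payload() {
  printf '{"name": "%s", "comment": "for metalake test", "properties": {}}' "$1"
}

# POST the body to <base_url>/api/metalakes.
# $1 = Gravitino base URL (for example http://localhost:8090), $2 = metalake name.
create_metalake() {
  curl -L "$1/api/metalakes" \
    -H 'Content-Type: application/json' \
    -H 'Accept: application/vnd.gravitino.v1+json' \
    -d "$(metalake_payload "$2")"
}
```

Running create_metalake http://localhost:8090 test_metalake before the catalog request above would then prepare the metalake used in this example.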

Thus, METALAKE_URL can be set to:

http://localhost:8090/api/metalakes/test_metalake/catalogs/
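
Before running a job, it can be useful to check that this URL plus a catalog name resolves to the expected catalog. The sketch below assumes that appending the catalog name to METALAKE_URL yields the catalog's address in Gravitino, and that the server needs no authentication. Whether SeaTunnel builds its request in exactly this way is an assumption, and catalog_url and fetch_catalog are hypothetical helper names.

```shell
# Hypothetical helpers, not part of SeaTunnel or Gravitino.
METALAKE_URL="http://localhost:8090/api/metalakes/test_metalake/catalogs/"

# Build the URL of one catalog by appending its name to METALAKE_URL.
catalog_url() {
  printf '%s%s\n' "$METALAKE_URL" "$1"
}

# Fetch the catalog description as JSON from the Gravitino server.
fetch_catalog() {
  curl -sS -H 'Accept: application/vnd.gravitino.v1+json' "$(catalog_url "$1")"
}
```

Calling fetch_catalog test_catalog against a running server should return a JSON document describing the catalog if it exists.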

You can then define the source as:

source {
    Jdbc {
        url = "jdbc:mysql://mysql-e2e:3306/seatunnel?useSSL=false&serverTimezone=UTC&allowPublicKeyRetrieval=true"
        driver = "${jdbc-driver}"
        connection_check_timeout_sec = 100
        sourceId = "test_catalog"
        user = "${jdbc-user}"
        password = "${jdbc-password}"
        query = "select * from source"
    }
}

Here, sourceId is the catalog name. Once it is set, other fields can use ${} placeholders, which are replaced automatically at runtime. In the example, each placeholder name matches a key in the catalog's properties, such as jdbc-user. Sinks use the same option name, sourceId. A placeholder must always start with ${ and end with }. Each item can contain at most one placeholder, and it may also contain literal text before or after the placeholder.
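
The placeholder rules can be illustrated with a small shell sketch. This is not SeaTunnel's actual implementation. The function resolve_placeholder is hypothetical and only mimics the documented behaviour: one placeholder per value, with optional literal text around it.

```shell
# Hypothetical sketch of placeholder resolution; not SeaTunnel's actual code.
# $1 = config value, $2 = property key, $3 = property value from the catalog.
resolve_placeholder() {
  local value="$1"
  local key="$2"
  local replacement="$3"
  local token='${'"$key"'}'   # literal text such as ${jdbc-user}
  local prefix suffix
  case "$value" in
    *"$token"*)
      prefix=${value%%"$token"*}   # text before the placeholder
      suffix=${value#*"$token"}    # text after the placeholder
      printf '%s\n' "${prefix}${replacement}${suffix}"
      ;;
    *)
      # No matching placeholder: return the value unchanged.
      printf '%s\n' "$value"
      ;;
  esac
}

resolve_placeholder '${jdbc-user}' jdbc-user root             # prints: root
resolve_placeholder 'db_${jdbc-user}_ro' jdbc-user root       # prints: db_root_ro
resolve_placeholder 'select * from source' jdbc-user root     # prints: select * from source
```

The second call shows the case of content outside the placeholder: the surrounding text is kept and only the ${jdbc-user} token is replaced.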