| # Intro To Schema Feature |
| |
| ## Why We Need Schema |
| |
| Some NoSQL databases or message queue are not strongly limited schema, so the schema cannot be obtained through the api. |
| At this time, a schema needs to be defined to convert to TableSchema and obtain data. |
| |
| ## SchemaOptions |
| |
| We can use SchemaOptions to define schema, the SchemaOptions contains some configs to define the schema. e.g. columns, primaryKey, constraintKeys. |
| |
| ``` |
| schema = { |
| table = "database.schema.table" |
| schema_first = false |
| comment = "comment" |
| columns = [ |
| ... |
| ] |
| primaryKey { |
| ... |
| } |
| |
| constraintKeys { |
| ... |
| } |
| } |
| ``` |
| |
| ### table |
| |
| The table full name of the table identifier which the schema belongs to, it contains database, schema, table name. e.g. `database.schema.table`, `database.table`, `table`. |
| |
| ### schema_first |
| |
| Default is false. |
| |
| If the schema_first is true, the schema will be used first, this means if we set `table = "a.b"`, `a` will be parsed as schema rather than database, then we can support write `table = "schema.table"`. |
| |
| ### comment |
| |
| The comment of the CatalogTable which the schema belongs to. |
| |
| ### Columns |
| |
| Columns is a list of configs used to define the column in schema, each column can contains name, type, nullable, defaultValue, comment field. |
| |
| ``` |
| columns = [ |
| { |
| name = id |
| type = bigint |
| nullable = false |
| columnLength = 20 |
| defaultValue = 0 |
| comment = "primary key id" |
| } |
| ] |
| ``` |
| |
| | Field | Required | Default Value | Description | |
| |:-------------|:---------|:--------------|----------------------------------------------------------------------------------| |
| | name | Yes | - | The name of the column | |
| | type | Yes | - | The data type of the column | |
| | nullable | No | true | If the column can be nullable | |
| | columnLength | No | 0 | The length of the column which will be useful when you need to define the length | |
| | columnScale | No | - | The scale of the column which will be useful when you need to define the scale | |
| | defaultValue | No | null | The default value of the column | |
| | comment | No | null | The comment of the column | |
| |
| #### What type supported at now |
| |
| | Data type | Value type in Java | Description | |
| |:-------------|:---------------------------------------------------|:------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| |
| | string | `java.lang.String` | string | |
| | boolean | `java.lang.Boolean` | boolean | |
| | tinyint | `java.lang.Byte` | -128 to 127 regular. 0 to 255 unsigned*. Specify the maximum number of digits in parentheses. | |
| | smallint | `java.lang.Short` | -32768 to 32767 General. 0 to 65535 unsigned*. Specify the maximum number of digits in parentheses. | |
| | int | `java.lang.Integer` | All numbers from -2,147,483,648 to 2,147,483,647 are allowed. | |
| | bigint | `java.lang.Long` | All numbers between -9,223,372,036,854,775,808 and 9,223,372,036,854,775,807 are allowed. | |
| | float | `java.lang.Float` | Float-precision numeric data from -1.79E+308 to 1.79E+308. | |
| | double | `java.lang.Double` | Double precision floating point. Handle most decimals. | |
| | decimal | `java.math.BigDecimal` | Double type stored as a string, allowing a fixed decimal point. | |
| | null | `java.lang.Void` | null | |
| | bytes | `byte[]` | bytes | |
| | date | `java.time.LocalDate` | Only the date is stored. From January 1, 0001 to December 31, 9999. | |
| | time | `java.time.LocalTime` | Only store time. Accuracy is 100 nanoseconds. | |
| | timestamp | `java.time.LocalDateTime` | Stores date and time information without time zone. Represents the time of an event in local time. It does not include any offset or zone information. | |
| | timestamp_tz | `java.time.OffsetDateTime` | Stores date and time information with an offset from UTC. It includes both the local date-time and the offset from UTC, providing more precise temporal information when working with multiple time zones. | |
| | row | `org.apache.seatunnel.api.table.type.SeaTunnelRow` | Row type, can be nested. | |
| | map | `java.util.Map` | A Map is an object that maps keys to values. The key type includes `int` `string` `boolean` `tinyint` `smallint` `bigint` `float` `double` `decimal` `date` `time` `timestamp` `null` , and the value type includes `int` `string` `boolean` `tinyint` `smallint` `bigint` `float` `double` `decimal` `date` `time` `timestamp` `null` `array` `map` `row`. | |
| | array | `ValueType[]` | A array is a data type that represents a collection of elements. The element type includes `int` `string` `boolean` `tinyint` `smallint` `bigint` `float` `double`. | |
| |
| #### How to declare type supported |
| |
| SeaTunnel provides a simple and direct way to declare basic types. Basic type keywords include `string`, `boolean`, `tinyint`, `smallint`, `int`, `bigint`, `float`, `double`, `date`, `time`, `timestamp`, and `null`. The keyword names for basic types can be used directly as type declarations, and SeaTunnel is case-insensitive to type keywords. For example, if you need to declare a field with integer type, you can simply define the field as `int` or `"int"`. |
| |
| > The null type declaration must be enclosed in double quotes, like `"null"`. This approach helps avoid confusion with [HOCON](https://github.com/lightbend/config/blob/main/HOCON.md)'s `null` type which represents undefined object. |
| |
| When declaring complex types (such as **decimal**, **array**, **map**, and **row**), pay attention to specific considerations. |
| - When declaring a decimal type, precision and scale settings are required, and the type definition follows the format `decimal(precision, scale)`. It's essential to emphasize that the declaration of the decimal type must be enclosed in `"`; you cannot use the type name directly, as with basic types. For example, when declaring a decimal field with precision 10 and scale 2, you specify the field type as `"decimal(10,2)"`. |
| - When declaring an array type, you need to specify the element type, and the type definition follows the format `array<T>`, where `T` represents the element type. The element type includes `int`,`string`,`boolean`,`tinyint`,`smallint`,`bigint`,`float` and `double`. Similar to the decimal type declaration, it also be enclosed in `"`. For example, when declaring a field with an array of integers, you specify the field type as `"array<int>"`. |
| - When declaring a map type, you need to specify the key and value types. The map type definition follows the format `map<K,V>`, where `K` represents the key type and `V` represents the value type. `K` can be any basic type and decimal type, and `V` can be any type supported by SeaTunnel. Similar to previous type declarations, the map type declaration must be enclosed in double quotes. For example, when declaring a field with map type, where the key type is string and the value type is integer, you can declare the field as `"map<string, int>"`. |
| - When declaring a row type, you need to define a [HOCON](https://github.com/lightbend/config/blob/main/HOCON.md) object to describe the fields and their types. The field types can be any type supported by SeaTunnel. For example, when declaring a row type containing an integer field `a` and a string field `b`, you can declare it as `{a = int, b = string}`. Enclosing the definition in `"` as a string is also acceptable, so `"{a = int, b = string}"` is equivalent to `{a = int, c = string}`. Since HOCON is compatible with JSON, `"{\"a\":\"int\", \"b\":\"string\"}"` is equivalent to `"{a = int, b = string}"`. |
| |
| Here is an example of complex type declarations: |
| |
| ```hocon |
| schema { |
| fields { |
| c_decimal = "decimal(10, 2)" |
| c_array = "array<int>" |
| c_row = { |
| c_int = int |
| c_string = string |
| c_row = { |
| c_int = int |
| } |
| } |
| # Hocon style declare row type in generic type |
| map0 = "map<string, {c_int = int, c_string = string, c_row = {c_int = int}}>" |
| # Json style declare row type in generic type |
| map1 = "map<string, {\"c_int\":\"int\", \"c_string\":\"string\", \"c_row\":{\"c_int\":\"int\"}}>" |
| } |
| } |
| ``` |
| |
| ### PrimaryKey |
| |
| Primary key is a config used to define the primary key in schema, it contains name, columns field. |
| |
| ``` |
| primaryKey { |
| name = id |
| columns = [id] |
| } |
| ``` |
| |
| | Field | Required | Default Value | Description | |
| |:--------|:---------|:--------------|-----------------------------------| |
| | name | Yes | - | The name of the primaryKey | |
| | columns | Yes | - | The column list in the primaryKey | |
| |
| ### ConstraintKeys |
| |
| Constraint keys is a list of config used to define the constraint keys in schema, it contains constraintName, constraintType, constraintColumns field. |
| |
| ``` |
| constraintKeys = [ |
| { |
| constraintName = "id_index" |
| constraintType = KEY |
| constraintColumns = [ |
| { |
| columnName = "id" |
| sortType = ASC |
| } |
| ] |
| }, |
| ] |
| ``` |
| |
| | Field | Required | Default Value | Description | |
| |:------------------|:---------|:--------------|-------------------------------------------------------------------------------------------------------------------------------------------| |
| | constraintName | Yes | - | The name of the constraintKey | |
| | constraintType | No | KEY | The type of the constraintKey | |
| | constraintColumns | Yes | - | The column list in the primaryKey, each column should contains constraintType and sortType, sortType support ASC and DESC, default is ASC | |
| |
| #### What constraintType supported at now |
| |
| | ConstraintType | Description | |
| |:---------------|:------------| |
| | INDEX_KEY | key | |
| | UNIQUE_KEY | unique key | |
| |
| ## Multi table schemas |
| |
| ``` |
| tables_configs = [ |
| { |
| schema { |
| table = "database.schema.table1" |
| schema_first = false |
| comment = "comment" |
| columns = [ |
| ... |
| ] |
| primaryKey { |
| ... |
| } |
| constraintKeys { |
| ... |
| } |
| } |
| }, |
| { |
| schema = { |
| table = "database.schema.table2" |
| schema_first = false |
| comment = "comment" |
| columns = [ |
| ... |
| ] |
| primaryKey { |
| ... |
| } |
| constraintKeys { |
| ... |
| } |
| } |
| } |
| ] |
| |
| ``` |
| |
| ## How to use schema |
| |
| ### Recommended |
| |
| ``` |
| source { |
| FakeSource { |
| parallelism = 2 |
| plugin_output = "fake" |
| row.num = 16 |
| schema { |
| table = "FakeDatabase.FakeTable" |
| columns = [ |
| { |
| name = id |
| type = bigint |
| nullable = false |
| defaultValue = 0 |
| comment = "primary key id" |
| }, |
| { |
| name = name |
| type = "string" |
| nullable = true |
| comment = "name" |
| }, |
| { |
| name = age |
| type = int |
| nullable = true |
| comment = "age" |
| } |
| ] |
| primaryKey { |
| name = "id" |
| columnNames = [id] |
| } |
| constraintKeys = [ |
| { |
| constraintName = "unique_name" |
| constraintType = UNIQUE_KEY |
| constraintColumns = [ |
| { |
| columnName = "name" |
| sortType = ASC |
| } |
| ] |
| }, |
| ] |
| } |
| } |
| } |
| ``` |
| |
| ### Deprecated |
| |
| If you only need to define the column, you can use fields to define the column, this is a simple way but will be remove in the future. |
| |
| ``` |
| source { |
| FakeSource { |
| parallelism = 2 |
| plugin_output = "fake" |
| row.num = 16 |
| schema = { |
| fields { |
| id = bigint |
| c_map = "map<string, smallint>" |
| c_array = "array<tinyint>" |
| c_string = string |
| c_boolean = boolean |
| c_tinyint = tinyint |
| c_smallint = smallint |
| c_int = int |
| c_bigint = bigint |
| c_float = float |
| c_double = double |
| c_decimal = "decimal(2, 1)" |
| c_bytes = bytes |
| c_date = date |
| c_timestamp = timestamp |
| } |
| } |
| } |
| } |
| ``` |
| |
| ## When we should use it or not |
| |
| If there is a `schema` configuration project in Options,the connector can then customize the schema. Like `Fake` `Pulsar` `Http` source connector etc. |