| import ChangeLog from '../changelog/connector-fake.md'; |
| |
| # FakeSource |
| |
| > FakeSource connector |
| |
| ## Support Those Engines |
| |
| > Spark<br/> |
| > Flink<br/> |
| > SeaTunnel Zeta<br/> |
| |
| ## Description |
| |
| FakeSource is a virtual data source that randomly generates rows according to the user-defined schema. |
| It is intended for test cases such as type conversion or testing new connector features. |
| |
| ## Key Features |
| |
| - [x] [batch](../../concept/connector-v2-features.md) |
| - [x] [stream](../../concept/connector-v2-features.md) |
| - [ ] [exactly-once](../../concept/connector-v2-features.md) |
| - [x] [column projection](../../concept/connector-v2-features.md) |
| - [ ] [parallelism](../../concept/connector-v2-features.md) |
| - [ ] [support user-defined split](../../concept/connector-v2-features.md) |
| |
| ## Source Options |
| |
| | Name | Type | Required | Default | Description | |
| |-------------------------|----------|----------|-------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| |
| tables_configs | list | no | - | Defines multiple FakeSource tables. Each item can contain the complete FakeSource configuration described below |
| schema | config | yes | - | Defines the schema information |
| auto.increment.enabled | boolean | no | false | Enables auto-increment ID generation |
| auto.increment.start | int | no | | Starting value for the auto-increment ID |
| rows | config | no | - | The list of fake data rows output per degree of parallelism; see the section `Options rows Case` |
| row.num | int | no | 5 | The total number of rows generated per degree of parallelism |
| split.num | int | no | 1 | The number of splits generated by the enumerator for each degree of parallelism |
| split.read-interval | long | no | 1 | The interval (in milliseconds) between two split reads in a reader |
| map.size | int | no | 5 | The size of `map` type values generated by the connector |
| array.size | int | no | 5 | The size of `array` type values generated by the connector |
| bytes.length | int | no | 5 | The length of `bytes` type values generated by the connector |
| string.length | int | no | 5 | The length of `string` type values generated by the connector |
| string.fake.mode | string | no | range | The fake mode for generating string data. Supports `range` and `template`; the default is `range`. If set to `template`, `string.template` must also be configured |
| string.template | list | no | - | The template list for the string type. If configured, the connector randomly selects an item from the list |
| tinyint.fake.mode | string | no | range | The fake mode for generating tinyint data. Supports `range` and `template`; the default is `range`. If set to `template`, `tinyint.template` must also be configured |
| tinyint.min | tinyint | no | 0 | The min value of tinyint data generated by the connector |
| tinyint.max | tinyint | no | 127 | The max value of tinyint data generated by the connector |
| tinyint.template | list | no | - | The template list for the tinyint type. If configured, the connector randomly selects an item from the list |
| smallint.fake.mode | string | no | range | The fake mode for generating smallint data. Supports `range` and `template`; the default is `range`. If set to `template`, `smallint.template` must also be configured |
| smallint.min | smallint | no | 0 | The min value of smallint data generated by the connector |
| smallint.max | smallint | no | 32767 | The max value of smallint data generated by the connector |
| smallint.template | list | no | - | The template list for the smallint type. If configured, the connector randomly selects an item from the list |
| int.fake.mode | string | no | range | The fake mode for generating int data. Supports `range` and `template`; the default is `range`. If set to `template`, `int.template` must also be configured |
| int.min | int | no | 0 | The min value of int data generated by the connector |
| int.max | int | no | 0x7fffffff | The max value of int data generated by the connector |
| int.template | list | no | - | The template list for the int type. If configured, the connector randomly selects an item from the list |
| bigint.fake.mode | string | no | range | The fake mode for generating bigint data. Supports `range` and `template`; the default is `range`. If set to `template`, `bigint.template` must also be configured |
| bigint.min | bigint | no | 0 | The min value of bigint data generated by the connector |
| bigint.max | bigint | no | 0x7fffffffffffffff | The max value of bigint data generated by the connector |
| bigint.template | list | no | - | The template list for the bigint type. If configured, the connector randomly selects an item from the list |
| float.fake.mode | string | no | range | The fake mode for generating float data. Supports `range` and `template`; the default is `range`. If set to `template`, `float.template` must also be configured |
| float.min | float | no | 0 | The min value of float data generated by the connector |
| float.max | float | no | 0x1.fffffeP+127 | The max value of float data generated by the connector |
| float.template | list | no | - | The template list for the float type. If configured, the connector randomly selects an item from the list |
| double.fake.mode | string | no | range | The fake mode for generating double data. Supports `range` and `template`; the default is `range`. If set to `template`, `double.template` must also be configured |
| double.min | double | no | 0 | The min value of double data generated by the connector |
| double.max | double | no | 0x1.fffffffffffffP+1023 | The max value of double data generated by the connector |
| double.template | list | no | - | The template list for the double type. If configured, the connector randomly selects an item from the list |
| | vector.dimension | int | no | 4 | Dimension of the generated vector, excluding binary vectors | |
| | binary.vector.dimension | int | no | 8 | Dimension of the generated binary vector | |
| vector.float.min | float | no | 0 | The min value of float data in vectors generated by the connector |
| vector.float.max | float | no | 0x1.fffffeP+127 | The max value of float data in vectors generated by the connector |
| | common-options | | no | - | Source plugin common parameters, please refer to [Source Common Options](../source-common-options.md) for details | |
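| |
| The split options in the table have no dedicated example elsewhere in this document. The fragment below is a minimal sketch of how they might be combined. It assumes, based only on the option descriptions above, that the rows of each parallel reader are read split by split and that the reader waits `split.read-interval` milliseconds between two split reads; the field names `id` and `name` are illustrative. |
| |
| ```hocon |
| FakeSource { |
|   # Rows generated per degree of parallelism |
|   row.num = 10 |
|   # Splits created by the enumerator per degree of parallelism |
|   split.num = 5 |
|   # Assumed behavior: wait 500 ms between two split reads in a reader |
|   split.read-interval = 500 |
|   schema = { |
|     fields { |
|       id = int |
|       name = string |
|     } |
|   } |
| } |
| ``` |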
| |
| ## Task Example |
| |
| ### Simple |
| |
| > This example randomly generates data of the specified types. To learn how to declare field types, click [here](../../concept/schema-feature.md#how-to-declare-type-supported). |
| |
| ```hocon |
| schema = { |
| fields { |
| c_map = "map<string, array<int>>" |
| c_map_nest = "map<string, {c_int = int, c_string = string}>" |
| c_array = "array<int>" |
| c_string = string |
| c_boolean = boolean |
| c_tinyint = tinyint |
| c_smallint = smallint |
| c_int = int |
| c_bigint = bigint |
| c_float = float |
| c_double = double |
| c_decimal = "decimal(30, 8)" |
| c_null = "null" |
| c_bytes = bytes |
| c_date = date |
| c_timestamp = timestamp |
| c_row = { |
| c_map = "map<string, map<string, string>>" |
| c_array = "array<int>" |
| c_string = string |
| c_boolean = boolean |
| c_tinyint = tinyint |
| c_smallint = smallint |
| c_int = int |
| c_bigint = bigint |
| c_float = float |
| c_double = double |
| c_decimal = "decimal(30, 8)" |
| c_null = "null" |
| c_bytes = bytes |
| c_date = date |
| c_timestamp = timestamp |
| } |
| } |
| } |
| ``` |
| |
| ### Random Generation |
| |
| > Randomly generates 16 rows that match the declared field types |
| |
| ```hocon |
| source { |
| # This is an example input plugin, **only for testing and demonstrating the input plugin feature** |
| FakeSource { |
| row.num = 16 |
| schema = { |
| fields { |
| c_map = "map<string, string>" |
| c_array = "array<int>" |
| c_string = string |
| c_boolean = boolean |
| c_tinyint = tinyint |
| c_smallint = smallint |
| c_int = int |
| c_bigint = bigint |
| c_float = float |
| c_double = double |
| c_decimal = "decimal(30, 8)" |
| c_null = "null" |
| c_bytes = bytes |
| c_date = date |
| c_timestamp = timestamp |
| } |
| } |
| plugin_output = "fake" |
| } |
| } |
| ``` |
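| |
| The `source` block above is only one part of a job. The fragment below is a minimal sketch of a complete job that prints the generated rows. It assumes the `Console` sink, an `env` block with `parallelism` and `job.mode`, and a `plugin_input` option on the sink that matches the `plugin_output` of the source; the reduced schema is illustrative. |
| |
| ```hocon |
| env { |
|   parallelism = 1 |
|   job.mode = "BATCH" |
| } |
| |
| source { |
|   FakeSource { |
|     row.num = 16 |
|     schema = { |
|       fields { |
|         c_string = string |
|         c_int = int |
|       } |
|     } |
|     plugin_output = "fake" |
|   } |
| } |
| |
| sink { |
|   # Assumed sink: prints each row it receives from "fake" |
|   Console { |
|     plugin_input = "fake" |
|   } |
| } |
| ``` |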
| |
| ### Customized Data Content Example |
| |
| > This example defines the source data explicitly: each row specifies its change kind (insert, update, or delete) and the value stored in each field |
| |
| ```hocon |
| source { |
| FakeSource { |
| schema = { |
| fields { |
| c_map = "map<string, string>" |
| c_array = "array<int>" |
| c_string = string |
| c_boolean = boolean |
| c_tinyint = tinyint |
| c_smallint = smallint |
| c_int = int |
| c_bigint = bigint |
| c_float = float |
| c_double = double |
| c_decimal = "decimal(30, 8)" |
| c_null = "null" |
| c_bytes = bytes |
| c_date = date |
| c_timestamp = timestamp |
| } |
| } |
| rows = [ |
| { |
| kind = INSERT |
| fields = [{"a": "b"}, [101], "c_string", true, 117, 15987, 56387395, 7084913402530365000, 1.23, 1.23, "2924137191386439303744.39292216", null, "bWlJWmo=", "2023-04-22", "2023-04-22T23:20:58"] |
| } |
| { |
| kind = UPDATE_BEFORE |
| fields = [{"a": "c"}, [102], "c_string", true, 117, 15987, 56387395, 7084913402530365000, 1.23, 1.23, "2924137191386439303744.39292216", null, "bWlJWmo=", "2023-04-22", "2023-04-22T23:20:58"] |
| } |
| { |
| kind = UPDATE_AFTER |
| fields = [{"a": "e"}, [103], "c_string", true, 117, 15987, 56387395, 7084913402530365000, 1.23, 1.23, "2924137191386439303744.39292216", null, "bWlJWmo=", "2023-04-22", "2023-04-22T23:20:58"] |
| } |
| { |
| kind = DELETE |
| fields = [{"a": "f"}, [104], "c_string", true, 117, 15987, 56387395, 7084913402530365000, 1.23, 1.23, "2924137191386439303744.39292216", null, "bWlJWmo=", "2023-04-22", "2023-04-22T23:20:58"] |
| } |
| ] |
| } |
| } |
| ``` |
| |
| > Due to the constraints of the [HOCON](https://github.com/lightbend/config/blob/main/HOCON.md) specification, users cannot directly create byte sequence objects. FakeSource uses strings to assign `bytes` type values. In the example above, the `bytes` type field is assigned `"bWlJWmo="`, which is encoded from "miIZj" with **base64**. Hence, when assigning values to `bytes` type fields, please use strings encoded with **base64**. |
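| |
| As a further illustration of the base64 rule, the fragment below is a minimal sketch with a single `bytes` column. The value `"aGVsbG8="` is the base64 encoding of the ASCII string "hello"; the field names `c_id` and `c_bytes` are illustrative. |
| |
| ```hocon |
| FakeSource { |
|   schema = { |
|     fields { |
|       c_id = int |
|       c_bytes = bytes |
|     } |
|   } |
|   rows = [ |
|     { |
|       kind = INSERT |
|       # "aGVsbG8=" is base64 for the bytes of "hello" |
|       fields = [1, "aGVsbG8="] |
|     } |
|   ] |
| } |
| ``` |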
| |
| ### Specified Data Number Example |
| |
| > This example specifies the number of rows to generate and the size or length of the generated values |
| |
| ```hocon |
| FakeSource { |
| row.num = 10 |
| map.size = 10 |
| array.size = 10 |
| bytes.length = 10 |
| string.length = 10 |
| schema = { |
| fields { |
| c_map = "map<string, array<int>>" |
| c_array = "array<int>" |
| c_string = string |
| c_boolean = boolean |
| c_tinyint = tinyint |
| c_smallint = smallint |
| c_int = int |
| c_bigint = bigint |
| c_float = float |
| c_double = double |
| c_decimal = "decimal(30, 8)" |
| c_null = "null" |
| c_bytes = bytes |
| c_date = date |
| c_timestamp = timestamp |
| c_row = { |
| c_map = "map<string, map<string, string>>" |
| c_array = "array<int>" |
| c_string = string |
| c_boolean = boolean |
| c_tinyint = tinyint |
| c_smallint = smallint |
| c_int = int |
| c_bigint = bigint |
| c_float = float |
| c_double = double |
| c_decimal = "decimal(30, 8)" |
| c_null = "null" |
| c_bytes = bytes |
| c_date = date |
| c_timestamp = timestamp |
| } |
| } |
| } |
| } |
| ``` |
| |
| ### Template Data Example |
| |
| > Values are randomly selected from the specified template lists |
| |
| ```hocon |
| FakeSource { |
| row.num = 5 |
| string.fake.mode = "template" |
| string.template = ["tyrantlucifer", "hailin", "kris", "fanjia", "zongwen", "gaojun"] |
| tinyint.fake.mode = "template" |
| tinyint.template = [1, 2, 3, 4, 5, 6, 7, 8, 9] |
| smallint.fake.mode = "template" |
| smallint.template = [10, 11, 12, 13, 14, 15, 16, 17, 18, 19] |
| int.fake.mode = "template" |
| int.template = [20, 21, 22, 23, 24, 25, 26, 27, 28, 29] |
| bigint.fake.mode = "template" |
| bigint.template = [30, 31, 32, 33, 34, 35, 36, 37, 38, 39] |
| float.fake.mode = "template" |
| float.template = [40.0, 41.0, 42.0, 43.0] |
| double.fake.mode = "template" |
| double.template = [44.0, 45.0, 46.0, 47.0] |
| schema { |
| fields { |
| c_string = string |
| c_tinyint = tinyint |
| c_smallint = smallint |
| c_int = int |
| c_bigint = bigint |
| c_float = float |
| c_double = double |
| } |
| } |
| } |
| ``` |
| |
| ### Range Data Example |
| |
| > Values are randomly generated within the specified ranges |
| |
| ```hocon |
| FakeSource { |
| row.num = 5 |
| string.template = ["tyrantlucifer", "hailin", "kris", "fanjia", "zongwen", "gaojun"] |
| tinyint.min = 1 |
| tinyint.max = 9 |
| smallint.min = 10 |
| smallint.max = 19 |
| int.min = 20 |
| int.max = 29 |
| bigint.min = 30 |
| bigint.max = 39 |
| float.min = 40.0 |
| float.max = 43.0 |
| double.min = 44.0 |
| double.max = 47.0 |
| schema { |
| fields { |
| c_string = string |
| c_tinyint = tinyint |
| c_smallint = smallint |
| c_int = int |
| c_bigint = bigint |
| c_float = float |
| c_double = double |
| } |
| } |
| } |
| ``` |
| |
| ### Generate Multiple tables |
| |
| > This example generates data for multiple tables, `test.table1` and `test.table2` |
| |
| ```hocon |
| FakeSource { |
| tables_configs = [ |
| { |
| row.num = 16 |
| schema { |
| table = "test.table1" |
| fields { |
| c_string = string |
| c_tinyint = tinyint |
| c_smallint = smallint |
| c_int = int |
| c_bigint = bigint |
| c_float = float |
| c_double = double |
| } |
| } |
| }, |
| { |
| row.num = 17 |
| schema { |
| table = "test.table2" |
| fields { |
| c_string = string |
| c_tinyint = tinyint |
| c_smallint = smallint |
| c_int = int |
| c_bigint = bigint |
| c_float = float |
| c_double = double |
| } |
| } |
| } |
| ] |
| } |
| ``` |
| |
| ### Options `rows` Case |
| |
| ```hocon |
| rows = [ |
| { |
| kind = INSERT |
| fields = [1, "A", 100] |
| }, |
| { |
| kind = UPDATE_BEFORE |
| fields = [1, "A", 100] |
| }, |
| { |
| kind = UPDATE_AFTER |
| fields = [1, "A_1", 100] |
| }, |
| { |
| kind = DELETE |
| fields = [1, "A_1", 100] |
| } |
| ] |
| ``` |
| |
| ### Options `table-names` Case |
| |
| ```hocon |
| source { |
| # This is an example source plugin, **only for testing and demonstrating the source plugin feature** |
| FakeSource { |
| table-names = ["test.table1", "test.table2", "test.table3"] |
| parallelism = 1 |
| schema = { |
| fields { |
| name = "string" |
| age = "int" |
| } |
| } |
| } |
| } |
| ``` |
| |
| ### Options `defaultValue` Case |
| |
| Custom data can be generated through `rows` and `columns`. For time types, the current time can be obtained with |
| `CURRENT_TIMESTAMP`, `CURRENT_TIME`, or `CURRENT_DATE`. |
| |
| ```hocon |
| schema = { |
| fields { |
| pk_id = bigint |
| name = string |
| score = int |
| time1 = timestamp |
| time2 = time |
| time3 = date |
| } |
| } |
| # use rows |
| rows = [ |
| { |
| kind = INSERT |
| fields = [1, "A", 100, CURRENT_TIMESTAMP, CURRENT_TIME, CURRENT_DATE] |
| } |
| ] |
| ``` |
| |
| ```hocon |
| schema = { |
| # use columns |
| columns = [ |
| { |
| name = book_publication_time |
| type = timestamp |
| defaultValue = "2024-09-12 15:45:30" |
| comment = "book publication time" |
| }, |
| { |
| name = book_publication_time2 |
| type = timestamp |
| defaultValue = CURRENT_TIMESTAMP |
| comment = "book publication time2" |
| }, |
| { |
| name = book_publication_time3 |
| type = time |
| defaultValue = "15:45:30" |
| comment = "book publication time3" |
| }, |
| { |
| name = book_publication_time4 |
| type = time |
| defaultValue = CURRENT_TIME |
| comment = "book publication time4" |
| }, |
| { |
| name = book_publication_time5 |
| type = date |
| defaultValue = "2024-09-12" |
| comment = "book publication time5" |
| }, |
| { |
| name = book_publication_time6 |
| type = date |
| defaultValue = CURRENT_DATE |
| comment = "book publication time6" |
| } |
| ] |
| } |
| ``` |
| |
| ### Use Vector Example |
| |
| ```hocon |
| source { |
| FakeSource { |
| row.num = 10 |
| # Low priority |
| vector.dimension = 4 |
| binary.vector.dimension = 8 |
| # Low priority |
| schema = { |
| table = "simple_example" |
| columns = [ |
| { |
| name = book_id |
| type = bigint |
| nullable = false |
| defaultValue = 0 |
| comment = "primary key id" |
| }, |
| { |
| name = book_intro_1 |
| type = binary_vector |
| columnScale = 8 |
| comment = "vector" |
| }, |
| { |
| name = book_intro_2 |
| type = float16_vector |
| columnScale = 4 |
| comment = "vector" |
| }, |
| { |
| name = book_intro_3 |
| type = bfloat16_vector |
| columnScale = 4 |
| comment = "vector" |
| }, |
| { |
| name = book_intro_4 |
| type = sparse_float_vector |
| columnScale = 4 |
| comment = "vector" |
| } |
| ] |
| } |
| } |
| } |
| ``` |
| |
| ### Auto-increment primary key Example |
| |
| ```hocon |
| source { |
| # This is an example source plugin, **only for testing and demonstrating the source plugin feature** |
| FakeSource { |
| plugin_output = "fake" |
| auto.increment.enabled = true |
| auto.increment.start = 1000 |
| row.num = 50000 |
| schema = { |
| fields { |
| id = "int" |
| name = "string" |
| age = "int" |
| } |
| primaryKey { |
| name = "pk" |
| columnNames = [id] |
| } |
| } |
| } |
| } |
| ``` |
| |
| ## Changelog |
| |
| <ChangeLog /> |