| import Tabs from '@theme/Tabs'; |
| import TabItem from '@theme/TabItem'; |
| |
| # Fake |
| |
| > Fake source connector |
| |
| ## Description |
| |
| `Fake` generates user-specified data for use as input in functional verification, testing, and performance testing of SeaTunnel. |
| |
| :::note |
| |
| Supported engines and plugin names |
| |
| * [x] Spark: Fake, FakeStream |
| * [x] Flink: FakeSource, FakeSourceStream |
| * Flink `FakeSource` is mainly used to automatically generate data. The data has only two columns. The first column is of `String type`, and its content is picked at random from `["Gary", "Ricky Huo", "Kid Xiong"]`. The second column is of `Int type` and holds the current 13-digit timestamp. The data is used as input for functional verification and testing of `seatunnel`. |
| |
| ::: |
| |
| ## Options |
| |
| <Tabs |
| groupId="engine-type" |
| defaultValue="spark" |
| values={[ |
| {label: 'Spark', value: 'spark'}, |
| {label: 'Flink', value: 'flink'}, |
| ]}> |
| <TabItem value="spark"> |
| |
| :::note |
| |
| These options are for Spark `FakeStream`; Spark `Fake` does not have any options. |
| |
| ::: |
| |
| | name | type | required | default value | |
| | -------------- | ------ | -------- | ------------- | |
| | content | array | no | - | |
| | rate | number | yes | - | |
| | common-options | string | yes | - | |
| |
| ### content [array] |
| |
| List of test data strings |
| |
| ### rate [number] |
| |
| Number of test cases generated per second |
| |
| </TabItem> |
| <TabItem value="flink"> |
| |
| | name | type | required | default value | |
| |--------------------|----------------------|----------|---------------| |
| | parallelism | `Int` | no | - | |
| | common-options | `string` | no | - | |
| | mock_data_schema | list [column_config] | no | see details. | |
| | mock_data_size | int | no | 300 | |
| | mock_data_interval | int (second) | no | 1 | |
| |
| ### parallelism [`Int`] |
| |
| The parallelism of an individual operator, for `FakeSourceStream`. |
| |
| </TabItem> |
| </Tabs> |
| |
| ### common options [string] |
| |
| Source plugin common parameters. Please refer to [Source Plugin](common-options.mdx) for details. |
| |
| ### mock_data_schema Option [list[column_config]] |
| |
| Configures the schema of the mock data. Each item is a `column_config` option. |
| |
| When `mock_data_schema` is not defined, data is generated with a schema like this: |
| ```bash |
| mock_data_schema = [ |
| { |
| name = "name", |
| type = "string", |
| mock_config = { |
| string_seed = ["Gary", "Ricky Huo", "Kid Xiong"] |
| size_range = [1,1] |
| } |
| } |
| { |
| name = "age", |
| type = "int", |
| mock_config = { |
| int_range = [1, 100] |
| } |
| } |
| ] |
| ``` |
| |
| `column_config` options: |
| |
| | name | type | required | default value | support values | |
| |-------------|-------------|----------|---------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| |
| | name | string | yes | string | - | |
| | type | string | yes | string | int,integer,byte,boolean,char,<br/>character,short,long,float,double,<br/>date,timestamp,decimal,bigdecimal,<br/>bigint,int[],byte[],<br/>boolean[],char[],character[],short[],<br/>long[],float[],double[],string[],<br/>binary,varchar | |
| | mock_config | mock_config | no | - | - | |
| |
| `mock_config` options: |
| |
| | name | type | required | default value | sample | |
| |---------------|-----------------------|----------|---------------|------------------------------------------| |
| | byte_range | list[byte] [size=2] | no | - | [0,127] | |
| | boolean_seed | list[boolean] | no | - | [true, true, false] | |
| | char_seed | list[char] [size=2] | no | - | ['a','b','c'] | |
| | date_range | list[string] [size=2] | no | - | ["1970-01-01", "2100-12-31"] | |
| | decimal_scale | int | no | - | 2 | |
| | double_range | list[double] [size=2] | no | - | [0.0, 10000.0] | |
| | float_range | list[float] [size=2] | no | - | [0.0, 10000.0] | |
| | int_range | list[int] [size=2] | no | - | [0, 100] | |
| | long_range | list[long] [size=2] | no | - | [0, 100000] | |
| | number_regex | string | no | - | "[1-9]{1}\\d?" | |
| | time_range | list[int] [size=6] | no | - | [0,24,0,60,0,60] | |
| | size_range | list[int] [size=2] | no | - | [6,10] | |
| | string_regex | string | no | - | "[a-z0-9]{5}\\@\\w{3}\\.[a-z]{3}" | |
| | string_seed | list[string] | no | - | ["Gary", "Ricky Huo", "Kid Xiong"] | |
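| |
| As an illustration of how the options above combine, the following is a minimal sketch of a custom `mock_data_schema`. The column names `email`, `score`, and `birthday` are hypothetical and chosen only for this example; the `mock_config` values are taken from the sample column of the table above. |
| |
| ```bash |
| mock_data_schema = [ |
| { |
| // hypothetical column: strings generated from a regex |
| name = "email" |
| type = "string" |
| mock_config = { |
| string_regex = "[a-z0-9]{5}\\@\\w{3}\\.[a-z]{3}" |
| } |
| } |
| { |
| // hypothetical column: doubles drawn from a range |
| name = "score" |
| type = "double" |
| mock_config = { |
| double_range = [0.0, 10000.0] |
| } |
| } |
| { |
| // hypothetical column: dates drawn from a range |
| name = "birthday" |
| type = "date" |
| mock_config = { |
| date_range = ["1970-01-01", "2100-12-31"] |
| } |
| } |
| ] |
| ``` |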
| |
| ### mock_data_size Option [int] |
| |
| Configures the number of mock data rows to generate. |
| |
| ### mock_data_interval Option [int] |
| |
| Configures the interval at which mock data is generated. The unit is seconds. |
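| |
| A minimal sketch of setting both options together on the Flink stream source is shown below. The values are arbitrary, and the comments describe an assumed reading of the two options (a batch of `mock_data_size` rows per interval) rather than confirmed runtime behavior. |
| |
| ```bash |
| source { |
| FakeSourceStream { |
| result_table_name = "fake" |
| mock_data_size = 10 // assumption: 10 rows per generation round |
| mock_data_interval = 2 // assumption: one round every 2 seconds |
| } |
| } |
| ``` |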
| |
| ## Examples |
| |
| <Tabs |
| groupId="engine-type" |
| defaultValue="spark" |
| values={[ |
| {label: 'Spark', value: 'spark'}, |
| {label: 'Flink', value: 'flink'}, |
| ]}> |
| <TabItem value="spark"> |
| |
| ### Fake |
| |
| ```bash |
| Fake { |
| result_table_name = "my_dataset" |
| } |
| ``` |
| |
| ### FakeStream |
| |
| ```bash |
| FakeStream { |
| content = ['name=ricky&age=23', 'name=gary&age=28'] |
| rate = 5 |
| } |
| ``` |
| |
| The generated data looks as follows; each row is a string picked at random from the `content` list. |
| |
| ```bash |
| +-----------------+ |
| |raw_message | |
| +-----------------+ |
| |name=gary&age=28 | |
| |name=ricky&age=23| |
| +-----------------+ |
| ``` |
| |
| </TabItem> |
| <TabItem value="flink"> |
| |
| ### FakeSourceStream |
| |
| |
| |
| ```bash |
| source { |
| FakeSourceStream { |
| result_table_name = "fake" |
| field_name = "name,age" |
| } |
| } |
| ``` |
| |
| ### FakeSource |
| |
| ```bash |
| source { |
| FakeSource { |
| result_table_name = "fake" |
| field_name = "name,age" |
| mock_data_size = 100 // generates 100 rows of mock data |
| } |
| } |
| ``` |
| |
| </TabItem> |
| </Tabs> |