blob: 95980afca69f70bb32a9ab4feec53d9d4a019444 [file] [log] [blame]
import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';
# Fake
> Fake source connector
## Description
`Fake` is mainly used to conveniently generate user-specified data, which is used as input for functional verification, testing, and performance testing of seatunnel.
:::note
Engine Supported and plugin name
* [x] Spark: Fake, FakeStream
* [x] Flink: FakeSource, FakeSourceStream
* Flink `Fake Source` is mainly used to automatically generate data. The data has only two columns. The first column is of `String type` and the content is a random one from `["Gary", "Ricky Huo", "Kid Xiong"]` . The second column is of `Int type` , which is the current 13-digit timestamp is used as input for functional verification and testing of `seatunnel` .
:::
## Options
<Tabs
groupId="engine-type"
defaultValue="spark"
values={[
{label: 'Spark', value: 'spark'},
{label: 'Flink', value: 'flink'},
]}>
<TabItem value="spark">
:::note
These options is for Spark:`FakeStream`, and Spark:`Fake` do not have any options
:::
| name | type | required | default value |
| -------------- | ------ | -------- | ------------- |
| content | array | no | - |
| rate | number | yes | - |
| common-options | string | yes | - |
### content [array]
List of test data strings
### rate [number]
Number of test cases generated per second
</TabItem>
<TabItem value="flink">
| name | type | required | default value |
|--------------------|----------------------|----------|---------------|
| parallelism | `Int` | no | - |
| common-options | `string` | no | - |
| mock_data_schema | list [column_config] | no | see details. |
| mock_data_size | int | no | 300 |
| mock_data_interval | int (second) | no | 1 |
### parallelism [`Int`]
The parallelism of an individual operator, for Fake Source Stream
</TabItem>
</Tabs>
### common options [string]
Source plugin common parameters, please refer to [Source Plugin](common-options.mdx) for details
### mock_data_schema Option [list[column_config]]
Config mock data's schema. Each is column_config option.
When mock_data_schema is not defined. Data will generate with schema like this:
```bash
mock_data_schema = [
{
name = "name",
type = "string",
mock_config = {
string_seed = ["Gary", "Ricky Huo", "Kid Xiong"]
size_range = [1,1]
}
}
{
name = "age",
type = "int",
mock_config = {
int_range = [1, 100]
}
}
]
```
column_config option type.
| name | type | required | default value | support values |
|-------------|-------------|----------|---------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| name | string | yes | string | - |
| type | string | yes | string | int,integer,byte,boolean,char,<br/>character,short,long,float,double,<br/>date,timestamp,decimal,bigdecimal,<br/>bigint,int[],byte[],<br/>boolean[],char[],character[],short[],<br/>long[],float[],double[],string[],<br/>binary,varchar |
| mock_config | mock_config | no | - | - |
mock_config Option
| name | type | required | default value | sample |
|---------------|-----------------------|----------|---------------|------------------------------------------|
| byte_range | list[byte] [size=2] | no | - | [0,127] |
| boolean_seed | list[boolean] | no | - | [true, true, false] |
| char_seed | list[char] [size=2] | no | - | ['a','b','c'] |
| date_range | list[string] [size=2] | no | - | ["1970-01-01", "2100-12-31"] |
| decimal_scale | int | no | - | 2 |
| double_range | list[double] [size=2] | no | - | [0.0, 10000.0] |
| float_range | list[flout] [size=2] | no | - | [0.0, 10000.0] |
| int_range | list[int] [size=2] | no | - | [0, 100] |
| long_range | list[long] [size=2] | no | - | [0, 100000] |
| number_regex | string | no | - | "[1-9]{1}\\d?" |
| time_range | list[int] [size=6] | no | - | [0,24,0,60,0,60] |
| size_range | list[int] [size=2] | no | - | [6,10] |
| string_regex | string | no | - | "[a-z0-9]{5}\\@\\w{3}\\.[a-z]{3}" |
| string_seed | list[string] | no | - | ["Gary", "Ricky Huo", "Kid Xiong"] |
### mock_data_size Option [int]
Config mock data size.
### mock_data_interval Option [int]
Config the data can mock with interval, The unit is SECOND.
## Examples
<Tabs
groupId="engine-type"
defaultValue="spark"
values={[
{label: 'Spark', value: 'spark'},
{label: 'Flink', value: 'flink'},
]}>
<TabItem value="spark">
### Fake
```bash
Fake {
result_table_name = "my_dataset"
}
```
### FakeStream
```bash
fakeStream {
content = ['name=ricky&age=23', 'name=gary&age=28']
rate = 5
}
```
The generated data is as follows, randomly extract the string from the `content` list
```bash
+-----------------+
|raw_message |
+-----------------+
|name=gary&age=28 |
|name=ricky&age=23|
+-----------------+
```
</TabItem>
<TabItem value="flink">
### FakeSourceStream
```bash
source {
FakeSourceStream {
result_table_name = "fake"
field_name = "name,age"
}
}
```
### FakeSource
```bash
source {
FakeSource {
result_table_name = "fake"
field_name = "name,age"
mock_data_size = 100 // will generate 100 rows mock data.
}
}
```
</TabItem>
</Tabs>