| import Tabs from '@theme/Tabs'; |
| import TabItem from '@theme/TabItem'; |
| |
| # File |
| |
| > File sink connector |
| |
| ## Description |
| |
| Writes data to a local or HDFS file. |
| |
| :::tip |
| |
| Supported engines and plugin names |
| |
| * [x] Spark: File |
| * [x] Flink: File |
| |
| ::: |
| |
| ## Options |
| |
| <Tabs |
| groupId="engine-type" |
| defaultValue="spark" |
| values={[ |
| {label: 'Spark', value: 'spark'}, |
| {label: 'Flink', value: 'flink'}, |
| ]}> |
| <TabItem value="spark"> |
| |
| | name | type | required | default value | |
| | ---------------- | ------ | -------- | -------------- | |
| | options | object | no | - | |
| | partition_by | array | no | - | |
| | path | string | yes | - | |
| | path_time_format | string | no | yyyyMMddHHmmss | |
| | save_mode | string | no | error | |
| | serializer | string | no | json | |
| | common-options | string | no | - | |
| |
| ### options [object] |
| |
| Custom parameters |
| |
| ### partition_by [array] |
| |
| Partition data based on selected fields |
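| |
| As a minimal sketch, the block below partitions the output by two fields. The field names `dt` and `region` and the output path are hypothetical; substitute fields that exist in your data. |
| |
| ```bash |
| file { |
| # Hypothetical output directory |
| path = "hdfs:///tmp/seatunnel/partitioned" |
| # Hypothetical field names; each must exist in the upstream data |
| partition_by = ["dt", "region"] |
| serializer = "parquet" |
| } |
| ``` |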
| |
| ### path [string] |
| |
| The file path (required). HDFS paths start with `hdfs://` and local paths start with `file://`. |
| The path may contain the variables `${now}` and `${uuid}`, for example `hdfs:///test_${uuid}_${now}.txt`. |
| `${now}` is replaced with the current time, formatted according to the `path_time_format` option. |
| |
| ### path_time_format [string] |
| |
| When `path` contains `${now}`, for example `xxxx-${now}`, `path_time_format` sets the time format used to expand it. The default value is `yyyyMMddHHmmss`. Commonly used format symbols are: |
| |
| | Symbol | Description | |
| | ------ | ------------------ | |
| | y | Year | |
| | M | Month | |
| | d | Day of month | |
| | H | Hour in day (0-23) | |
| | m | Minute in hour | |
| | s | Second in minute | |
| |
| See [Java SimpleDateFormat](https://docs.oracle.com/javase/tutorial/i18n/format/simpleDateFormat.html) for detailed time format syntax. |
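| |
| The following sketch shows `path` and `path_time_format` working together. The output location is a hypothetical example; with the pattern below, `${now}` would presumably expand to a date such as `2024-01-02`. |
| |
| ```bash |
| file { |
| # Hypothetical location; ${now} is expanded by the connector |
| path = "hdfs:///tmp/seatunnel/output_${now}" |
| # SimpleDateFormat pattern applied to ${now} |
| path_time_format = "yyyy-MM-dd" |
| serializer = "json" |
| } |
| ``` |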
| |
| ### save_mode [string] |
| |
| Storage mode. Currently supports `overwrite`, `append`, `ignore` and `error`. For the meaning of each mode, see [save-modes](https://spark.apache.org/docs/latest/sql-programming-guide.html#save-modes). |
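| |
| For instance, a sink that adds new files to an existing directory instead of failing might look like the sketch below. The path is a hypothetical example. |
| |
| ```bash |
| file { |
| # Hypothetical directory that may already contain data |
| path = "file:///tmp/seatunnel/output" |
| # Append to existing data rather than raising an error |
| save_mode = "append" |
| serializer = "csv" |
| } |
| ``` |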
| |
| ### serializer [string] |
| |
| Serialization format. Currently supports `csv`, `json`, `parquet`, `orc` and `text`. |
| |
| ### common options [string] |
| |
| Common sink plugin parameters. See [Sink Plugin](common-options.md) for details. |
| |
| </TabItem> |
| <TabItem value="flink"> |
| |
| |
| | name | type | required | default value | |
| |-------------------|--------| -------- |----------------| |
| | format | string | yes | - | |
| | path | string | yes | - | |
| | path_time_format | string | no | yyyyMMddHHmmss | |
| | write_mode | string | no | - | |
| | common-options | string | no | - | |
| | parallelism | int | no | - | |
| | rollover_interval | long | no | 1 | |
| | max_part_size | long | no | 1024 | |
| | prefix | string | no | seatunnel | |
| | suffix | string | no | .ext | |
| |
| ### format [string] |
| |
| Currently supports `csv`, `json` and `text`. Streaming mode only supports `text`. |
| |
| ### path [string] |
| |
| The file path (required). HDFS paths start with `hdfs://` and local paths start with `file://`. |
| The path may contain the variables `${now}` and `${uuid}`, for example `hdfs:///test_${uuid}_${now}.txt`. |
| `${now}` is replaced with the current time, formatted according to the `path_time_format` option. |
| |
| ### path_time_format [string] |
| |
| When `path` contains `${now}`, for example `xxxx-${now}`, `path_time_format` sets the time format used to expand it. The default value is `yyyyMMddHHmmss`. Commonly used format symbols are: |
| |
| | Symbol | Description | |
| | ------ | ------------------ | |
| | y | Year | |
| | M | Month | |
| | d | Day of month | |
| | H | Hour in day (0-23) | |
| | m | Minute in hour | |
| | s | Second in minute | |
| |
| See [Java SimpleDateFormat](https://docs.oracle.com/javase/tutorial/i18n/format/simpleDateFormat.html) for detailed time format syntax. |
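| |
| A minimal sketch of a time-stamped output path for the Flink engine follows. The location is a hypothetical example, and the plugin name `FileSink` is taken from the Example section of this page. |
| |
| ```bash |
| FileSink { |
| format = "json" |
| # Hypothetical location; ${now} is expanded by the connector |
| path = "file:///tmp/seatunnel/flink_${now}" |
| # SimpleDateFormat pattern applied to ${now} |
| path_time_format = "yyyyMMdd" |
| } |
| ``` |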
| |
| ### write_mode [string] |
| |
| - NO_OVERWRITE |
| |
| - Do not overwrite; an error is raised if the path already exists |
| |
| - OVERWRITE |
| |
| - Overwrite; if the path exists, it is deleted before writing |
| |
| ### common options [string] |
| |
| Common sink plugin parameters. See [Sink Plugin](common-options.md) for details. |
| |
| ### parallelism [int] |
| |
| The parallelism of the FileSink operator. |
| |
| ### rollover_interval [long] |
| |
| The interval after which a new file part is started, in minutes. |
| |
| ### max_part_size [long] |
| |
| The maximum size of each file part, in MB. |
| |
| ### prefix [string] |
| |
| The prefix of each file part. |
| |
| ### suffix [string] |
| |
| The suffix of each file part. |
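| |
| The part-file options above can be combined as in the sketch below. It assumes a streaming job, hence `format = "text"`; the path, prefix and suffix values are hypothetical. |
| |
| ```bash |
| FileSink { |
| # Streaming mode only supports text |
| format = "text" |
| # Hypothetical output directory |
| path = "hdfs:///tmp/seatunnel/stream" |
| # Roll over to a new file part every 5 minutes |
| rollover_interval = 5 |
| # Cap each file part at 256 MB |
| max_part_size = 256 |
| # Hypothetical part-file naming |
| prefix = "events" |
| suffix = ".txt" |
| } |
| ``` |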
| |
| </TabItem> |
| </Tabs> |
| |
| ## Example |
| |
| <Tabs |
| groupId="engine-type" |
| defaultValue="spark" |
| values={[ |
| {label: 'Spark', value: 'spark'}, |
| {label: 'Flink', value: 'flink'}, |
| ]}> |
| <TabItem value="spark"> |
| |
| ```bash |
| file { |
| path = "file:///var/logs" |
| serializer = "text" |
| } |
| ``` |
| |
| </TabItem> |
| <TabItem value="flink"> |
| |
| ```bash |
| FileSink { |
| format = "json" |
| path = "hdfs://localhost:9000/flink/output/" |
| write_mode = "OVERWRITE" |
| } |
| ``` |
| |
| </TabItem> |
| </Tabs> |