# CSV options and configuration
The CSV parser of the HTTP Storage plugin can be configured using `csvOptions`. The following configuration lists every option with its default value:
```json
{
  "csvOptions": {
    "delimiter": ",",
    "quote": "\"",
    "quoteEscape": "\"",
    "lineSeparator": "\n",
    "headerExtractionEnabled": null,
    "numberOfRowsToSkip": 0,
    "numberOfRecordsToRead": -1,
    "lineSeparatorDetectionEnabled": true,
    "maxColumns": 512,
    "maxCharsPerColumn": 4096,
    "skipEmptyLines": true,
    "ignoreLeadingWhitespaces": true,
    "ignoreTrailingWhitespaces": true,
    "nullValue": null
  }
}
```
## Configuration options
- **delimiter**: The character used to separate individual values in a CSV record.
Default: `,`
- **quote**: The character used to enclose fields that may contain special characters (like the
delimiter or line separator).
Default: `"`
- **quoteEscape**: The character used to escape a quote inside a field enclosed by quotes.
Default: `"`
- **lineSeparator**: The string that represents a line break in the CSV file.
Default: `\n`
- **headerExtractionEnabled**: Determines whether the first row of the CSV contains the headers (field
names). If set to `true`, the parser uses the first row as headers.
Default: `null`
- **numberOfRowsToSkip**: Number of rows to skip before starting to read records. Useful for
skipping initial lines that are not records or headers.
Default: `0`
- **numberOfRecordsToRead**: Specifies the maximum number of records to read from the input. A
negative value (e.g., `-1`) means there's no limit.
Default: `-1`
- **lineSeparatorDetectionEnabled**: When set to `true`, the parser will automatically detect and
use the line separator present in the input. This is useful when you don't know the line separator
in advance.
Default: `true`
- **maxColumns**: The maximum number of columns a record can have. Any record with more columns than
this will cause an exception.
Default: `512`
- **maxCharsPerColumn**: The maximum number of characters a single field can have. Any field with
more characters than this will cause an exception.
Default: `4096`
- **skipEmptyLines**: When set to `true`, the parser will skip any lines that are empty or only
contain whitespace.
Default: `true`
- **ignoreLeadingWhitespaces**: When set to `true`, the parser will ignore any whitespace at the
start of a field.
Default: `true`
- **ignoreTrailingWhitespaces**: When set to `true`, the parser will ignore any whitespace at the
end of a field.
Default: `true`
- **nullValue**: Specifies a string that should be interpreted as a `null` value when reading. If a
field matches this string, it will be returned as `null`.
Default: `null`
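As an illustration of how several of these options combine, the sketch below configures the parser for a hypothetical semicolon-delimited export that starts with two preamble lines, followed by a header row, and uses `NA` for missing values. The file layout is an assumption made for this example. It also assumes that rows are skipped before the header is extracted, and that options left out keep the defaults listed above.

```json
{
  "csvOptions": {
    "delimiter": ";",
    "headerExtractionEnabled": true,
    "numberOfRowsToSkip": 2,
    "nullValue": "NA"
  }
}
```

With a configuration like this, a field containing exactly `NA` would be returned as `null`, and column names would come from the first row after the skipped lines.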
## Example
### Parse TSV
To parse `.tsv` files, you can use the following `csvOptions` config:
```json
{
  "csvOptions": {
    "delimiter": "\t"
  }
}
```
Then create the following connector plugin, which queries a `.tsv` file from GitHub. In this example
it is named `github`:
```json
{
  "type": "http",
  "connections": {
    "test-data": {
      "url": "https://raw.githubusercontent.com/semantic-web-company/wic-tsv/master/data/de/Test/test_examples.txt",
      "requireTail": false,
      "method": "GET",
      "authType": "none",
      "inputType": "csv",
      "xmlDataLevel": 1,
      "postParameterLocation": "QUERY_STRING",
      "csvOptions": {
        "delimiter": "\t",
        "quote": "\"",
        "quoteEscape": "\"",
        "lineSeparator": "\n",
        "numberOfRecordsToRead": -1,
        "lineSeparatorDetectionEnabled": true,
        "maxColumns": 512,
        "maxCharsPerColumn": 4096,
        "skipEmptyLines": true,
        "ignoreLeadingWhitespaces": true,
        "ignoreTrailingWhitespaces": true
      },
      "verifySSLCert": true
    }
  },
  "timeout": 5,
  "retryDelay": 1000,
  "proxyType": "direct",
  "authMode": "SHARED_USER",
  "enabled": true
}
```
The file can then be queried as follows:
```sql
SELECT * FROM github.`test-data`
```