CSV options and configuration

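The CSV parser of the HTTP Storage plugin can be configured using the csvOptions property of a connection. All supported options are shown below with their default values: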

{
  "csvOptions": {
    "delimiter": ",",
    "quote": "\"",
    "quoteEscape": "\"",
    "lineSeparator": "\n",
    "headerExtractionEnabled": null,
    "numberOfRowsToSkip": 0,
    "numberOfRecordsToRead": -1,
    "lineSeparatorDetectionEnabled": true,
    "maxColumns": 512,
    "maxCharsPerColumn": 4096,
    "skipEmptyLines": true,
    "ignoreLeadingWhitespaces": true,
    "ignoreTrailingWhitespaces": true,
    "nullValue": null
  }
}

Configuration options

  • delimiter: The character used to separate individual values in a CSV record. Default: ,

  • quote: The character used to enclose fields that may contain special characters (like the delimiter or line separator). Default: "

  • quoteEscape: The character used to escape a quote inside a field enclosed by quotes. Default: "

  • lineSeparator: The string that represents a line break in the CSV file. Default: \n

  • headerExtractionEnabled: Determines if the first row of the CSV contains the headers (field names). If set to true, the parser will use the first row as headers. Default: null

  • numberOfRowsToSkip: Number of rows to skip before starting to read records. Useful for skipping initial lines that are not records or headers. Default: 0

  • numberOfRecordsToRead: Specifies the maximum number of records to read from the input. A negative value (e.g., -1) means there's no limit. Default: -1

  • lineSeparatorDetectionEnabled: When set to true, the parser will automatically detect and use the line separator present in the input. This is useful when you don't know the line separator in advance. Default: true

  • maxColumns: The maximum number of columns a record can have. Any record with more columns than this will cause an exception. Default: 512

  • maxCharsPerColumn: The maximum number of characters a single field can have. Any field with more characters than this will cause an exception. Default: 4096

  • skipEmptyLines: When set to true, the parser will skip any lines that are empty or only contain whitespace. Default: true

  • ignoreLeadingWhitespaces: When set to true, the parser will ignore any whitespace at the start of a field. Default: true

  • ignoreTrailingWhitespaces: When set to true, the parser will ignore any whitespace at the end of a field. Default: true

  • nullValue: Specifies a string that should be interpreted as a null value when reading. If a field matches this string, it will be returned as null. Default: null
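The way several of these options interact can be approximated outside the plugin. The sketch below uses Python's standard csv module, not the plugin's own parser, so it only mirrors the behavior described above for delimiter, quote, quoteEscape, numberOfRowsToSkip, numberOfRecordsToRead, skipEmptyLines, and nullValue. The function name read_records and its parameter names are hypothetical, and the header row is treated as an ordinary record because header extraction is not modeled.

```python
import csv
import io
from itertools import islice


def read_records(text, delimiter=",", quote='"', quote_escape='"',
                 rows_to_skip=0, records_to_read=-1,
                 skip_empty_lines=True, null_value=None):
    """Approximate a few csvOptions semantics (illustration only)."""
    # numberOfRowsToSkip: drop leading physical lines before parsing.
    lines = islice(io.StringIO(text, newline=""), rows_to_skip, None)

    # quoteEscape equal to quote means a doubled quote stands for one quote;
    # otherwise the escape character precedes the quote.
    doubled = quote_escape == quote
    reader = csv.reader(
        lines,
        delimiter=delimiter,
        quotechar=quote,
        doublequote=doubled,
        escapechar=None if doubled else quote_escape,
    )

    records = []
    for row in reader:
        # numberOfRecordsToRead: a negative value means no limit.
        if 0 <= records_to_read <= len(records):
            break
        # skipEmptyLines: ignore blank or whitespace-only lines.
        if skip_empty_lines and (not row or (len(row) == 1 and not row[0].strip())):
            continue
        # nullValue: fields equal to this string become None.
        records.append(
            [None if null_value is not None and value == null_value else value
             for value in row]
        )
    return records


sample = 'generated by export tool\nid,name,note\n1,"Smith, John","said ""hi"""\n\n2,Ann,N/A\n3,Bob,ok\n'
result = read_records(sample, rows_to_skip=1, records_to_read=3, null_value="N/A")
# result == [['id', 'name', 'note'],
#            ['1', 'Smith, John', 'said "hi"'],
#            ['2', 'Ann', None]]
```

The first line is skipped, the quoted comma and the doubled quote are kept as field content, the blank line is ignored, and reading stops after three records.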

Example

Parse TSV

To parse .tsv files, override only the delimiter in csvOptions:

{
  "csvOptions": {
    "delimiter": "\t"
  }
}
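Note that "\t" in the JSON above is an escape sequence, so the parser receives a single tab character rather than a backslash followed by the letter t. A minimal Python check of that decoding, using only the standard json module (the sample line is made up for illustration):

```python
import json

# The same config as above, kept as raw text so the JSON escape is visible.
config_text = r'{"csvOptions": {"delimiter": "\t"}}'

options = json.loads(config_text)["csvOptions"]
delimiter = options["delimiter"]

# After JSON decoding, the delimiter is one tab character.
is_single_tab = (delimiter == "\t" and len(delimiter) == 1)

# Splitting a hypothetical TSV line on that delimiter yields its fields.
fields = "id\tword\tpos".split(delimiter)
```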

Then we can create the following storage plugin configuration, which reads a .tsv file from GitHub. Let's call it github:

{
  "type": "http",
  "connections": {
    "test-data": {
      "url": "https://raw.githubusercontent.com/semantic-web-company/wic-tsv/master/data/de/Test/test_examples.txt",
      "requireTail": false,
      "method": "GET",
      "authType": "none",
      "inputType": "csv",
      "xmlDataLevel": 1,
      "postParameterLocation": "QUERY_STRING",
      "csvOptions": {
        "delimiter": "\t",
        "quote": "\"",
        "quoteEscape": "\"",
        "lineSeparator": "\n",
        "numberOfRecordsToRead": -1,
        "lineSeparatorDetectionEnabled": true,
        "maxColumns": 512,
        "maxCharsPerColumn": 4096,
        "skipEmptyLines": true,
        "ignoreLeadingWhitespaces": true,
        "ignoreTrailingWhitespaces": true
      },
      "verifySSLCert": true
    }
  },
  "timeout": 5,
  "retryDelay": 1000,
  "proxyType": "direct",
  "authMode": "SHARED_USER",
  "enabled": true
}

We can then query it as follows:

SELECT * FROM github.`test-data`
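The query can also be submitted programmatically. The sketch below assumes the plugin is hosted by an Apache Drill instance whose REST API listens on http://localhost:8047 and accepts SQL at /query.json; neither the host nor the port is stated in this document, and build_query_request is a hypothetical helper name. The request is only constructed here, not sent.

```python
import json
import urllib.request


def build_query_request(sql, base_url="http://localhost:8047"):
    """Build a POST request for a Drill-style REST query endpoint (assumed URL)."""
    payload = json.dumps({"queryType": "SQL", "query": sql}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/query.json",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_query_request("SELECT * FROM github.`test-data`")

# To actually run the query against a live instance:
# with urllib.request.urlopen(request) as response:
#     result = json.load(response)
```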