# JSON Options and Configuration
Drill has a collection of JSON configuration options that allow you to control how Drill interprets JSON files. These are set at the global level; however, the HTTP plugin
allows you to override the Drill defaults and configure these options individually for each connection. The options are:
* `allowNanInf`: Allows the connection to accept `NaN` and `Inf` as numeric values
* `allTextMode`: By default, Drill attempts to infer data types from JSON data. If the data is malformed, Drill may throw schema change exceptions. If your data is
inconsistent, you can enable `allTextMode`; when set to `true`, Drill reads all JSON values as strings rather than trying to infer the data type.
* `readNumbersAsDouble`: By default, Drill attempts to distinguish between integers, floating point numbers, and strings. One challenge is that when data is inconsistent, Drill may
throw schema change exceptions. In addition to `allTextMode`, you can make Drill less sensitive by setting `readNumbersAsDouble` to `true`, which causes Drill to read all
numeric fields in JSON data as the `double` data type rather than trying to distinguish between ints and doubles.
* `enableEscapeAnyChar`: Allows a user to escape any character with a backslash (`\`).
* `skipMalformedRecords`: Allows Drill to skip malformed records and recover without throwing exceptions.
* `skipMalformedDocument`: Allows Drill to skip entire malformed documents without throwing errors.
All of these options can be set by adding a `jsonOptions` block to your connection configuration, as shown below:
```json
"jsonOptions": {
"allTextMode": true,
"readNumbersAsDouble": true
}
```
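For context, `jsonOptions` sits inside an individual connection in the HTTP plugin configuration. The sketch below shows roughly where the block goes; the connection name `example`, the URL, and the other connection properties are illustrative placeholders rather than values from this page:
```json
{
  "type": "http",
  "connections": {
    "example": {
      "url": "https://example.com/api/data",
      "method": "GET",
      "jsonOptions": {
        "allTextMode": true,
        "readNumbersAsDouble": true,
        "skipMalformedRecords": true
      }
    }
  },
  "enabled": true
}
```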
## Schema Provisioning
One of the challenges of querying APIs is inconsistent data. Drill allows you to provide a schema for individual endpoints. You can do this in one of two ways:
1. By providing a schema inline [See: Specifying Schema as Table Function Parameter](https://drill.apache.org/docs/plugin-configuration-basics/#specifying-the-schema-as-table-function-parameter)
2. By providing a schema in the configuration for the endpoint.
Schema provisioning currently supports the complex types ARRAY and MAP at any nesting level.
### Example Schema Provisioning:
```json
"jsonOptions": {
"schema": {
"columns":[
{
"name":"outer_map",
"type":"ARRAY<STRUCT<`bigint_col` BIGINT, `boolean_col` BOOLEAN, `date_col` DATE, `double_col` DOUBLE, `interval_col` INTERVAL, `int_col` BIGINT, `timestamp_col` TIMESTAMP, `time_col` TIME, `varchar_col` VARCHAR>>","mode":"REPEATED"
}, {
"name":"field1",
"type":"VARCHAR",
"mode":"OPTIONAL"
},
]
}
}
```
## Dealing With Inconsistent Schemas
One of the major challenges of interacting with JSON data is when the schema is inconsistent. Drill has a `UNION` data type which is marked as experimental. At the time of
writing, the HTTP plugin does not support the `UNION`, however supplying a schema can solve a lot of those issues.
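For example, if a field sometimes arrives as a number and sometimes as a string, a provided schema can pin it to a single type. The sketch below assumes a hypothetical field named `price`; it follows the same configuration-based schema format shown above:
```json
"jsonOptions": {
  "schema": {
    "columns": [
      {
        "name": "price",
        "type": "VARCHAR",
        "mode": "OPTIONAL"
      }
    ]
  }
}
```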
### JSON Mode
Drill offers the option of reading all JSON values as strings. While this can complicate downstream analytics, it can also be a more memory-efficient way of reading data with an
inconsistent schema. Unfortunately, at the time of writing, JSON mode is only available with a provided schema. However, future work will allow this mode to be enabled for
any JSON data.
#### Enabling JSON Mode:
You can enable JSON mode simply by adding the `drill.json-mode` property with a value of `json` to a field, as shown below:
```json
"jsonOptions": {
"readNumbersAsDouble": true,
"schema": {
"type": "tuple_schema",
"columns": [
{
"name": "custom_fields",
"type": "ARRAY<STRUCT<`value` VARCHAR PROPERTIES { 'drill.json-mode' = 'json' }>>",
"mode": "REPEATED"
}
]
}
}
```