docs/querying/arrays.md - druid - Git at Google

 ---
 id: arrays
 title: "Arrays"
 ---

 <!--
   ~ Licensed to the Apache Software Foundation (ASF) under one
   ~ or more contributor license agreements.  See the NOTICE file
   ~ distributed with this work for additional information
   ~ regarding copyright ownership.  The ASF licenses this file
   ~ to you under the Apache License, Version 2.0 (the
   ~ "License"); you may not use this file except in compliance
   ~ with the License.  You may obtain a copy of the License at
   ~
   ~   http://www.apache.org/licenses/LICENSE-2.0
   ~
   ~ Unless required by applicable law or agreed to in writing,
   ~ software distributed under the License is distributed on an
   ~ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
   ~ KIND, either express or implied.  See the License for the
   ~ specific language governing permissions and limitations
   ~ under the License.
   -->


 Apache Druid supports SQL standard `ARRAY` typed columns for `VARCHAR`, `BIGINT`, and `DOUBLE` types (native types `ARRAY<STRING>`, `ARRAY<LONG>`, and `ARRAY<DOUBLE>`). Other more complicated ARRAY types must be stored in [nested columns](nested-columns.md). Druid ARRAY types are distinct from [multi-value dimension](multi-value-dimensions.md), which have significantly different behavior than standard arrays.

 This document describes inserting, filtering, and grouping behavior for `ARRAY` typed columns.
 Refer to the [Druid SQL data type documentation](sql-data-types.md#arrays) and [SQL array function reference](sql-array-functions.md) for additional details
 about the functions available to use with ARRAY columns and types in SQL.

 The following sections describe inserting, filtering, and grouping behavior based on the following example data, which includes 3 array typed columns:

 ```json lines
 {"timestamp": "2023-01-01T00:00:00", "label": "row1", "arrayString": ["a", "b"],  "arrayLong":[1, null,3], "arrayDouble":[1.1, 2.2, null]}
 {"timestamp": "2023-01-01T00:00:00", "label": "row2", "arrayString": [null, "b"], "arrayLong":null,        "arrayDouble":[999, null, 5.5]}
 {"timestamp": "2023-01-01T00:00:00", "label": "row3", "arrayString": [],          "arrayLong":[1, 2, 3],   "arrayDouble":[null, 2.2, 1.1]}
 {"timestamp": "2023-01-01T00:00:00", "label": "row4", "arrayString": ["a", "b"],  "arrayLong":[1, 2, 3],   "arrayDouble":[]}
 {"timestamp": "2023-01-01T00:00:00", "label": "row5", "arrayString": null,        "arrayLong":[],          "arrayDouble":null}
 ```

 ## Ingesting arrays

 ### Native batch and streaming ingestion
 When using native [batch](../ingestion/native-batch.md) or streaming ingestion such as with [Apache Kafka](../ingestion/kafka-ingestion.md), arrays can be ingested using the [`"auto"`](../ingestion/ingestion-spec.md#dimension-objects) type dimension schema which is shared with [type-aware schema discovery](../ingestion/schema-design.md#type-aware-schema-discovery).

 When ingesting from TSV or CSV data, you can specify the array delimiters using the `listDelimiter` field in the `inputFormat`. JSON data must be formatted as a JSON array to be ingested as an array type. JSON data does not require `inputFormat` configuration.

 The following shows an example `dimensionsSpec` for native ingestion of the data used in this document:

 ```
 "dimensions": [
   {
     "type": "auto",
     "name": "label"
   },
   {
     "type": "auto",
     "name": "arrayString"
   },
   {
     "type": "auto",
     "name": "arrayLong"
   },
   {
     "type": "auto",
     "name": "arrayDouble"
   }
 ],
 ```

 ### SQL-based ingestion

 Arrays can be inserted with [SQL-based ingestion](../multi-stage-query/index.md).

 #### Examples

 ```sql
 REPLACE INTO "array_example" OVERWRITE ALL
 WITH "ext" AS (
   SELECT *
   FROM TABLE(
     EXTERN(
       '{"type":"inline","data":"{\"timestamp\": \"2023-01-01T00:00:00\", \"label\": \"row1\", \"arrayString\": [\"a\", \"b\"],  \"arrayLong\":[1, null,3], \"arrayDouble\":[1.1, 2.2, null]}\n{\"timestamp\": \"2023-01-01T00:00:00\", \"label\": \"row2\", \"arrayString\": [null, \"b\"], \"arrayLong\":null,        \"arrayDouble\":[999, null, 5.5]}\n{\"timestamp\": \"2023-01-01T00:00:00\", \"label\": \"row3\", \"arrayString\": [],          \"arrayLong\":[1, 2, 3],   \"arrayDouble\":[null, 2.2, 1.1]} \n{\"timestamp\": \"2023-01-01T00:00:00\", \"label\": \"row4\", \"arrayString\": [\"a\", \"b\"],  \"arrayLong\":[1, 2, 3],   \"arrayDouble\":[]}\n{\"timestamp\": \"2023-01-01T00:00:00\", \"label\": \"row5\", \"arrayString\": null,        \"arrayLong\":[],          \"arrayDouble\":null}"}',
       '{"type":"json"}'
     )
   ) EXTEND (
     "timestamp" VARCHAR,
     "label" VARCHAR,
     "arrayString" VARCHAR ARRAY,
     "arrayLong" BIGINT ARRAY,
     "arrayDouble" DOUBLE ARRAY
   )
 )
 SELECT
   TIME_PARSE("timestamp") AS "__time",
   "label",
   "arrayString",
   "arrayLong",
   "arrayDouble"
 FROM "ext"
 PARTITIONED BY DAY
 ```

 Arrays can also be used as `GROUP BY` keys for rollup:

 ```sql
 REPLACE INTO "array_example_rollup" OVERWRITE ALL
 WITH "ext" AS (
   SELECT *
   FROM TABLE(
     EXTERN(
       '{"type":"inline","data":"{\"timestamp\": \"2023-01-01T00:00:00\", \"label\": \"row1\", \"arrayString\": [\"a\", \"b\"],  \"arrayLong\":[1, null,3], \"arrayDouble\":[1.1, 2.2, null]}\n{\"timestamp\": \"2023-01-01T00:00:00\", \"label\": \"row2\", \"arrayString\": [null, \"b\"], \"arrayLong\":null,        \"arrayDouble\":[999, null, 5.5]}\n{\"timestamp\": \"2023-01-01T00:00:00\", \"label\": \"row3\", \"arrayString\": [],          \"arrayLong\":[1, 2, 3],   \"arrayDouble\":[null, 2.2, 1.1]} \n{\"timestamp\": \"2023-01-01T00:00:00\", \"label\": \"row4\", \"arrayString\": [\"a\", \"b\"],  \"arrayLong\":[1, 2, 3],   \"arrayDouble\":[]}\n{\"timestamp\": \"2023-01-01T00:00:00\", \"label\": \"row5\", \"arrayString\": null,        \"arrayLong\":[],          \"arrayDouble\":null}"}',
       '{"type":"json"}'
     )
   ) EXTEND (
     "timestamp" VARCHAR,
     "label" VARCHAR,
     "arrayString" VARCHAR ARRAY,
     "arrayLong" BIGINT ARRAY,
     "arrayDouble" DOUBLE ARRAY
   )
 )
 SELECT
   TIME_PARSE("timestamp") AS "__time",
   "label",
   "arrayString",
   "arrayLong",
   "arrayDouble",
   COUNT(*) as "count"
 FROM "ext"
 GROUP BY 1,2,3,4,5
 PARTITIONED BY DAY
 ```

 #### `arrayIngestMode`

 For seamless backwards compatible behavior with Druid versions older than 31, there is an `arrayIngestMode` query context flag.

 When `arrayIngestMode` is `array`, SQL ARRAY types are stored using Druid array columns. This is recommended for new
 tables and the default configuration for Druid 31 and newer.

 When `arrayIngestMode` is `mvd` (legacy), SQL `VARCHAR ARRAY` are implicitly wrapped in [`ARRAY_TO_MV`](sql-functions.md#array_to_mv).
 This causes them to be stored as [multi-value strings](multi-value-dimensions.md), using the same `STRING` column type
 as regular scalar strings. SQL `BIGINT ARRAY` and `DOUBLE ARRAY` cannot be loaded under `arrayIngestMode: mvd`. This
 mode is not recommended and will be removed in a future release, but provided for backwards compatibility.

 The following table summarizes the differences in SQL ARRAY handling between `arrayIngestMode: array` and
 `arrayIngestMode: mvd`.

 | SQL type | Stored type when `arrayIngestMode: array` (default) | Stored type when `arrayIngestMode: mvd` |
 |---|---|---|
 |`VARCHAR ARRAY`|`ARRAY<STRING>`|[multi-value `STRING`](multi-value-dimensions.md)|
 |`BIGINT ARRAY`|`ARRAY<LONG>`|not possible (validation error)|
 |`DOUBLE ARRAY`|`ARRAY<DOUBLE>`|not possible (validation error)|

 In either mode, you can explicitly wrap string arrays in `ARRAY_TO_MV` to cause them to be stored as
 [multi-value strings](multi-value-dimensions.md).

 When validating a SQL INSERT or REPLACE statement that contains arrays, Druid checks whether the statement would lead
 to mixing string arrays and multi-value strings in the same column. If this condition is detected, the statement fails
 validation unless the column is named under the `skipTypeVerification` context parameter. This parameter can be either
 a comma-separated list of column names, or a JSON array in string form. This validation is done to prevent accidentally
 mixing arrays and multi-value strings in the same column.

 ## Querying arrays

 ### Filtering

 All query types, as well as [filtered aggregators](aggregations.md#filtered-aggregator), can filter on array typed columns. Filters follow these rules for array types:

 - All filters match against the entire array value for the row
 - Native value filters like [equality](filters.md#equality-filter) and [range](filters.md#range-filter) match on entire array values, as do SQL constructs that plan into these native filters
 - The [`IS NULL`](filters.md#null-filter) filter will match rows where the entire array value is null
 - [Array specific functions](sql-array-functions.md) like `ARRAY_CONTAINS` and `ARRAY_OVERLAP` follow the behavior specified by those functions
 - All other filters do not directly support ARRAY types and will result in a query error

 #### Example: equality
 ```sql
 SELECT *
 FROM "array_example"
 WHERE arrayLong = ARRAY[1,2,3]
 ```

 ```json lines
 {"__time":"2023-01-01T00:00:00.000Z","label":"row3","arrayString":"[]","arrayLong":"[1,2,3]","arrayDouble":"[null,2.2,1.1]"}
 {"__time":"2023-01-01T00:00:00.000Z","label":"row4","arrayString":"[\"a\",\"b\"]","arrayLong":"[1,2,3]","arrayDouble":"[]"}
 ```

 #### Example: null
 ```sql
 SELECT *
 FROM "array_example"
 WHERE arrayLong IS NULL
 ```

 ```json lines
 {"__time":"2023-01-01T00:00:00.000Z","label":"row2","arrayString":"[null,\"b\"]","arrayLong":null,"arrayDouble":"[999.0,null,5.5]"}
 ```

 #### Example: range
 ```sql
 SELECT *
 FROM "array_example"
 WHERE arrayString >= ARRAY['a','b']
 ```

 ```json lines
 {"__time":"2023-01-01T00:00:00.000Z","label":"row1","arrayString":"[\"a\",\"b\"]","arrayLong":"[1,null,3]","arrayDouble":"[1.1,2.2,null]"}
 {"__time":"2023-01-01T00:00:00.000Z","label":"row4","arrayString":"[\"a\",\"b\"]","arrayLong":"[1,2,3]","arrayDouble":"[]"}
 ```

 #### Example: ARRAY_CONTAINS
 ```sql
 SELECT *
 FROM "array_example"
 WHERE ARRAY_CONTAINS(arrayString, 'a')
 ```

 ```json lines
 {"__time":"2023-01-01T00:00:00.000Z","label":"row1","arrayString":"[\"a\",\"b\"]","arrayLong":"[1,null,3]","arrayDouble":"[1.1,2.2,null]"}
 {"__time":"2023-01-01T00:00:00.000Z","label":"row4","arrayString":"[\"a\",\"b\"]","arrayLong":"[1,2,3]","arrayDouble":"[]"}
 ```

 ### Grouping

 When grouping on an array with SQL or a native [groupBy query](groupbyquery.md), grouping follows standard SQL behavior and groups on the entire array as a single value. The [`UNNEST`](sql.md#unnest) function allows grouping on the individual array elements.

 #### Example: SQL grouping query with no filtering
 ```sql
 SELECT label, arrayString
 FROM "array_example"
 GROUP BY 1,2
 ```
 results in:
 ```json lines
 {"label":"row1","arrayString":"[\"a\",\"b\"]"}
 {"label":"row2","arrayString":"[null,\"b\"]"}
 {"label":"row3","arrayString":"[]"}
 {"label":"row4","arrayString":"[\"a\",\"b\"]"}
 {"label":"row5","arrayString":null}
 ```

 #### Example: SQL grouping query with a filter
 ```sql
 SELECT label, arrayString
 FROM "array_example"
 WHERE arrayLong = ARRAY[1,2,3]
 GROUP BY 1,2
 ```

 results:
 ```json lines
 {"label":"row3","arrayString":"[]"}
 {"label":"row4","arrayString":"[\"a\",\"b\"]"}
 ```

 #### Example: UNNEST
 ```sql
 SELECT label, strings
 FROM "array_example" CROSS JOIN UNNEST(arrayString) as u(strings)
 GROUP BY 1,2
 ```

 results:
 ```json lines
 {"label":"row1","strings":"a"}
 {"label":"row1","strings":"b"}
 {"label":"row2","strings":null}
 {"label":"row2","strings":"b"}
 {"label":"row4","strings":"a"}
 {"label":"row4","strings":"b"}
 ```

 ## Differences between arrays and multi-value dimensions
 Avoid confusing string arrays with [multi-value dimensions](multi-value-dimensions.md). Arrays and multi-value dimensions are stored in different column types, and query behavior is different. You can use the functions `MV_TO_ARRAY` and `ARRAY_TO_MV` to convert between the two if needed. In general, we recommend using arrays whenever possible, since they are a newer and more powerful feature and have SQL compliant behavior.

 Use care during ingestion to ensure you get the type you want.

 To get arrays when performing an ingestion using JSON ingestion specs, such as [native batch](../ingestion/native-batch.md) or streaming ingestion such as with [Apache Kafka](../ingestion/kafka-ingestion.md), use dimension type `auto` or enable `useSchemaDiscovery`. When performing a [SQL-based ingestion](../multi-stage-query/index.md), write a query that generates arrays. Arrays may contain strings or numbers.

 To get multi-value dimensions when performing an ingestion using JSON ingestion specs, use dimension type `string` and do not enable `useSchemaDiscovery`. When performing a [SQL-based ingestion](../multi-stage-query/index.md), wrap arrays in [`ARRAY_TO_MV`](multi-value-dimensions.md#sql-based-ingestion), which ensures you get multi-value dimensions. Multi-value dimensions can only contain strings.

 You can tell which type you have by checking the `INFORMATION_SCHEMA.COLUMNS` table, using a query like:

 ```sql
 SELECT COLUMN_NAME, DATA_TYPE
 FROM INFORMATION_SCHEMA.COLUMNS
 WHERE TABLE_NAME = 'mytable'
 ```

 Arrays are type `ARRAY`, multi-value strings are type `VARCHAR`.
	---
	id: arrays
	title: "Arrays"
	---

	<!--
	~ Licensed to the Apache Software Foundation (ASF) under one
	~ or more contributor license agreements. See the NOTICE file
	~ distributed with this work for additional information
	~ regarding copyright ownership. The ASF licenses this file
	~ to you under the Apache License, Version 2.0 (the
	~ "License"); you may not use this file except in compliance
	~ with the License. You may obtain a copy of the License at
	~
	~ http://www.apache.org/licenses/LICENSE-2.0
	~
	~ Unless required by applicable law or agreed to in writing,
	~ software distributed under the License is distributed on an
	~ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
	~ KIND, either express or implied. See the License for the
	~ specific language governing permissions and limitations
	~ under the License.
	-->


	Apache Druid supports SQL standard `ARRAY` typed columns for `VARCHAR`, `BIGINT`, and `DOUBLE` types (native types `ARRAY<STRING>`, `ARRAY<LONG>`, and `ARRAY<DOUBLE>`). Other more complicated ARRAY types must be stored in [nested columns](nested-columns.md). Druid ARRAY types are distinct from [multi-value dimension](multi-value-dimensions.md), which have significantly different behavior than standard arrays.

	This document describes inserting, filtering, and grouping behavior for `ARRAY` typed columns.
	Refer to the [Druid SQL data type documentation](sql-data-types.md#arrays) and [SQL array function reference](sql-array-functions.md) for additional details
	about the functions available to use with ARRAY columns and types in SQL.

	The following sections describe inserting, filtering, and grouping behavior based on the following example data, which includes 3 array typed columns:

	```json lines
	{"timestamp": "2023-01-01T00:00:00", "label": "row1", "arrayString": ["a", "b"], "arrayLong":[1, null,3], "arrayDouble":[1.1, 2.2, null]}
	{"timestamp": "2023-01-01T00:00:00", "label": "row2", "arrayString": [null, "b"], "arrayLong":null, "arrayDouble":[999, null, 5.5]}
	{"timestamp": "2023-01-01T00:00:00", "label": "row3", "arrayString": [], "arrayLong":[1, 2, 3], "arrayDouble":[null, 2.2, 1.1]}
	{"timestamp": "2023-01-01T00:00:00", "label": "row4", "arrayString": ["a", "b"], "arrayLong":[1, 2, 3], "arrayDouble":[]}
	{"timestamp": "2023-01-01T00:00:00", "label": "row5", "arrayString": null, "arrayLong":[], "arrayDouble":null}
	```

	## Ingesting arrays

	### Native batch and streaming ingestion
	When using native [batch](../ingestion/native-batch.md) or streaming ingestion such as with [Apache Kafka](../ingestion/kafka-ingestion.md), arrays can be ingested using the [`"auto"`](../ingestion/ingestion-spec.md#dimension-objects) type dimension schema which is shared with [type-aware schema discovery](../ingestion/schema-design.md#type-aware-schema-discovery).

	When ingesting from TSV or CSV data, you can specify the array delimiters using the `listDelimiter` field in the `inputFormat`. JSON data must be formatted as a JSON array to be ingested as an array type. JSON data does not require `inputFormat` configuration.

	The following shows an example `dimensionsSpec` for native ingestion of the data used in this document:

	```
	"dimensions": [
	{
	"type": "auto",
	"name": "label"
	},
	{
	"type": "auto",
	"name": "arrayString"
	},
	{
	"type": "auto",
	"name": "arrayLong"
	},
	{
	"type": "auto",
	"name": "arrayDouble"
	}
	],
	```

	### SQL-based ingestion

	Arrays can be inserted with [SQL-based ingestion](../multi-stage-query/index.md).

	#### Examples

	```sql
	REPLACE INTO "array_example" OVERWRITE ALL
	WITH "ext" AS (
	SELECT *
	FROM TABLE(
	EXTERN(
	'{"type":"inline","data":"{\"timestamp\": \"2023-01-01T00:00:00\", \"label\": \"row1\", \"arrayString\": [\"a\", \"b\"], \"arrayLong\":[1, null,3], \"arrayDouble\":[1.1, 2.2, null]}\n{\"timestamp\": \"2023-01-01T00:00:00\", \"label\": \"row2\", \"arrayString\": [null, \"b\"], \"arrayLong\":null, \"arrayDouble\":[999, null, 5.5]}\n{\"timestamp\": \"2023-01-01T00:00:00\", \"label\": \"row3\", \"arrayString\": [], \"arrayLong\":[1, 2, 3], \"arrayDouble\":[null, 2.2, 1.1]} \n{\"timestamp\": \"2023-01-01T00:00:00\", \"label\": \"row4\", \"arrayString\": [\"a\", \"b\"], \"arrayLong\":[1, 2, 3], \"arrayDouble\":[]}\n{\"timestamp\": \"2023-01-01T00:00:00\", \"label\": \"row5\", \"arrayString\": null, \"arrayLong\":[], \"arrayDouble\":null}"}',
	'{"type":"json"}'
	)
	) EXTEND (
	"timestamp" VARCHAR,
	"label" VARCHAR,
	"arrayString" VARCHAR ARRAY,
	"arrayLong" BIGINT ARRAY,
	"arrayDouble" DOUBLE ARRAY
	)
	)
	SELECT
	TIME_PARSE("timestamp") AS "__time",
	"label",
	"arrayString",
	"arrayLong",
	"arrayDouble"
	FROM "ext"
	PARTITIONED BY DAY
	```

	Arrays can also be used as `GROUP BY` keys for rollup:

	```sql
	REPLACE INTO "array_example_rollup" OVERWRITE ALL
	WITH "ext" AS (
	SELECT *
	FROM TABLE(
	EXTERN(
	'{"type":"inline","data":"{\"timestamp\": \"2023-01-01T00:00:00\", \"label\": \"row1\", \"arrayString\": [\"a\", \"b\"], \"arrayLong\":[1, null,3], \"arrayDouble\":[1.1, 2.2, null]}\n{\"timestamp\": \"2023-01-01T00:00:00\", \"label\": \"row2\", \"arrayString\": [null, \"b\"], \"arrayLong\":null, \"arrayDouble\":[999, null, 5.5]}\n{\"timestamp\": \"2023-01-01T00:00:00\", \"label\": \"row3\", \"arrayString\": [], \"arrayLong\":[1, 2, 3], \"arrayDouble\":[null, 2.2, 1.1]} \n{\"timestamp\": \"2023-01-01T00:00:00\", \"label\": \"row4\", \"arrayString\": [\"a\", \"b\"], \"arrayLong\":[1, 2, 3], \"arrayDouble\":[]}\n{\"timestamp\": \"2023-01-01T00:00:00\", \"label\": \"row5\", \"arrayString\": null, \"arrayLong\":[], \"arrayDouble\":null}"}',
	'{"type":"json"}'
	)
	) EXTEND (
	"timestamp" VARCHAR,
	"label" VARCHAR,
	"arrayString" VARCHAR ARRAY,
	"arrayLong" BIGINT ARRAY,
	"arrayDouble" DOUBLE ARRAY
	)
	)
	SELECT
	TIME_PARSE("timestamp") AS "__time",
	"label",
	"arrayString",
	"arrayLong",
	"arrayDouble",
	COUNT(*) as "count"
	FROM "ext"
	GROUP BY 1,2,3,4,5
	PARTITIONED BY DAY
	```

	#### `arrayIngestMode`

	For seamless backwards compatible behavior with Druid versions older than 31, there is an `arrayIngestMode` query context flag.

	When `arrayIngestMode` is `array`, SQL ARRAY types are stored using Druid array columns. This is recommended for new
	tables and the default configuration for Druid 31 and newer.

	When `arrayIngestMode` is `mvd` (legacy), SQL `VARCHAR ARRAY` are implicitly wrapped in [`ARRAY_TO_MV`](sql-functions.md#array_to_mv).
	This causes them to be stored as [multi-value strings](multi-value-dimensions.md), using the same `STRING` column type
	as regular scalar strings. SQL `BIGINT ARRAY` and `DOUBLE ARRAY` cannot be loaded under `arrayIngestMode: mvd`. This
	mode is not recommended and will be removed in a future release, but provided for backwards compatibility.

	The following table summarizes the differences in SQL ARRAY handling between `arrayIngestMode: array` and
	`arrayIngestMode: mvd`.

	\| SQL type \| Stored type when `arrayIngestMode: array` (default) \| Stored type when `arrayIngestMode: mvd` \|
	\|---\|---\|---\|
	\|`VARCHAR ARRAY`\|`ARRAY<STRING>`\|[multi-value `STRING`](multi-value-dimensions.md)\|
	\|`BIGINT ARRAY`\|`ARRAY<LONG>`\|not possible (validation error)\|
	\|`DOUBLE ARRAY`\|`ARRAY<DOUBLE>`\|not possible (validation error)\|

	In either mode, you can explicitly wrap string arrays in `ARRAY_TO_MV` to cause them to be stored as
	[multi-value strings](multi-value-dimensions.md).

	When validating a SQL INSERT or REPLACE statement that contains arrays, Druid checks whether the statement would lead
	to mixing string arrays and multi-value strings in the same column. If this condition is detected, the statement fails
	validation unless the column is named under the `skipTypeVerification` context parameter. This parameter can be either
	a comma-separated list of column names, or a JSON array in string form. This validation is done to prevent accidentally
	mixing arrays and multi-value strings in the same column.

	## Querying arrays

	### Filtering

	All query types, as well as [filtered aggregators](aggregations.md#filtered-aggregator), can filter on array typed columns. Filters follow these rules for array types:

	- All filters match against the entire array value for the row
	- Native value filters like [equality](filters.md#equality-filter) and [range](filters.md#range-filter) match on entire array values, as do SQL constructs that plan into these native filters
	- The [`IS NULL`](filters.md#null-filter) filter will match rows where the entire array value is null
	- [Array specific functions](sql-array-functions.md) like `ARRAY_CONTAINS` and `ARRAY_OVERLAP` follow the behavior specified by those functions
	- All other filters do not directly support ARRAY types and will result in a query error

	#### Example: equality
	```sql
	SELECT *
	FROM "array_example"
	WHERE arrayLong = ARRAY[1,2,3]
	```

	```json lines
	{"__time":"2023-01-01T00:00:00.000Z","label":"row3","arrayString":"[]","arrayLong":"[1,2,3]","arrayDouble":"[null,2.2,1.1]"}
	{"__time":"2023-01-01T00:00:00.000Z","label":"row4","arrayString":"[\"a\",\"b\"]","arrayLong":"[1,2,3]","arrayDouble":"[]"}
	```

	#### Example: null
	```sql
	SELECT *
	FROM "array_example"
	WHERE arrayLong IS NULL
	```

	```json lines
	{"__time":"2023-01-01T00:00:00.000Z","label":"row2","arrayString":"[null,\"b\"]","arrayLong":null,"arrayDouble":"[999.0,null,5.5]"}
	```

	#### Example: range
	```sql
	SELECT *
	FROM "array_example"
	WHERE arrayString >= ARRAY['a','b']
	```

	```json lines
	{"__time":"2023-01-01T00:00:00.000Z","label":"row1","arrayString":"[\"a\",\"b\"]","arrayLong":"[1,null,3]","arrayDouble":"[1.1,2.2,null]"}
	{"__time":"2023-01-01T00:00:00.000Z","label":"row4","arrayString":"[\"a\",\"b\"]","arrayLong":"[1,2,3]","arrayDouble":"[]"}
	```

	#### Example: ARRAY_CONTAINS
	```sql
	SELECT *
	FROM "array_example"
	WHERE ARRAY_CONTAINS(arrayString, 'a')
	```

	```json lines
	{"__time":"2023-01-01T00:00:00.000Z","label":"row1","arrayString":"[\"a\",\"b\"]","arrayLong":"[1,null,3]","arrayDouble":"[1.1,2.2,null]"}
	{"__time":"2023-01-01T00:00:00.000Z","label":"row4","arrayString":"[\"a\",\"b\"]","arrayLong":"[1,2,3]","arrayDouble":"[]"}
	```

	### Grouping

	When grouping on an array with SQL or a native [groupBy query](groupbyquery.md), grouping follows standard SQL behavior and groups on the entire array as a single value. The [`UNNEST`](sql.md#unnest) function allows grouping on the individual array elements.

	#### Example: SQL grouping query with no filtering
	```sql
	SELECT label, arrayString
	FROM "array_example"
	GROUP BY 1,2
	```
	results in:
	```json lines
	{"label":"row1","arrayString":"[\"a\",\"b\"]"}
	{"label":"row2","arrayString":"[null,\"b\"]"}
	{"label":"row3","arrayString":"[]"}
	{"label":"row4","arrayString":"[\"a\",\"b\"]"}
	{"label":"row5","arrayString":null}
	```

	#### Example: SQL grouping query with a filter
	```sql
	SELECT label, arrayString
	FROM "array_example"
	WHERE arrayLong = ARRAY[1,2,3]
	GROUP BY 1,2
	```

	results:
	```json lines
	{"label":"row3","arrayString":"[]"}
	{"label":"row4","arrayString":"[\"a\",\"b\"]"}
	```

	#### Example: UNNEST
	```sql
	SELECT label, strings
	FROM "array_example" CROSS JOIN UNNEST(arrayString) as u(strings)
	GROUP BY 1,2
	```

	results:
	```json lines
	{"label":"row1","strings":"a"}
	{"label":"row1","strings":"b"}
	{"label":"row2","strings":null}
	{"label":"row2","strings":"b"}
	{"label":"row4","strings":"a"}
	{"label":"row4","strings":"b"}
	```

	## Differences between arrays and multi-value dimensions
	Avoid confusing string arrays with [multi-value dimensions](multi-value-dimensions.md). Arrays and multi-value dimensions are stored in different column types, and query behavior is different. You can use the functions `MV_TO_ARRAY` and `ARRAY_TO_MV` to convert between the two if needed. In general, we recommend using arrays whenever possible, since they are a newer and more powerful feature and have SQL compliant behavior.

	Use care during ingestion to ensure you get the type you want.

	To get arrays when performing an ingestion using JSON ingestion specs, such as [native batch](../ingestion/native-batch.md) or streaming ingestion such as with [Apache Kafka](../ingestion/kafka-ingestion.md), use dimension type `auto` or enable `useSchemaDiscovery`. When performing a [SQL-based ingestion](../multi-stage-query/index.md), write a query that generates arrays. Arrays may contain strings or numbers.

	To get multi-value dimensions when performing an ingestion using JSON ingestion specs, use dimension type `string` and do not enable `useSchemaDiscovery`. When performing a [SQL-based ingestion](../multi-stage-query/index.md), wrap arrays in [`ARRAY_TO_MV`](multi-value-dimensions.md#sql-based-ingestion), which ensures you get multi-value dimensions. Multi-value dimensions can only contain strings.

	You can tell which type you have by checking the `INFORMATION_SCHEMA.COLUMNS` table, using a query like:

	```sql
	SELECT COLUMN_NAME, DATA_TYPE
	FROM INFORMATION_SCHEMA.COLUMNS
	WHERE TABLE_NAME = 'mytable'
	```

	Arrays are type `ARRAY`, multi-value strings are type `VARCHAR`.