:::info Apache Druid supports two query languages: Druid SQL and native queries. This document describes the SQL language. :::
This page provides a reference of Apache Druid® SQL functions in alphabetical order. For more details on a function, refer to the following:
The examples on this page use the following example datasources:
array-example
created with SQL-based ingestionflight-carriers
using FlightCarrierOnTime (1 month)
included with Druidkttm
using KoalasToTheMax one day
included with Druidmvd-example
using SQL-based ingestiontaxi-trips
using NYC Taxi cabs (3 files)
included with DruidTo load a datasource included with Druid, access the web console and go to Load data > Batch - SQL > Example data. Select Connect data, and parse using the default settings. On the page to configure the schema, select the datasource label and enter the name of the datasource listed above.
Use the following query to create the array-example
datasource:
REPLACE INTO "array-example" OVERWRITE ALL WITH "ext" AS ( SELECT * FROM TABLE( EXTERN( '{"type":"inline","data":"{\"timestamp\": \"2023-01-01T00:00:00\", \"label\": \"row1\", \"arrayString\": [\"a\", \"b\"], \"arrayLong\":[1, null,3], \"arrayDouble\":[1.1, 2.2, null]}\n{\"timestamp\": \"2023-01-01T00:00:00\", \"label\": \"row2\", \"arrayString\": [null, \"b\"], \"arrayLong\":null, \"arrayDouble\":[999, null, 5.5]}\n{\"timestamp\": \"2023-01-01T00:00:00\", \"label\": \"row3\", \"arrayString\": [], \"arrayLong\":[1, 2, 3], \"arrayDouble\":[null, 2.2, 1.1]} \n{\"timestamp\": \"2023-01-01T00:00:00\", \"label\": \"row4\", \"arrayString\": [\"a\", \"b\"], \"arrayLong\":[1, 2, 3], \"arrayDouble\":[]}\n{\"timestamp\": \"2023-01-01T00:00:00\", \"label\": \"row5\", \"arrayString\": null, \"arrayLong\":[], \"arrayDouble\":null}"}', '{"type":"json"}' ) ) EXTEND ( "timestamp" VARCHAR, "label" VARCHAR, "arrayString" VARCHAR ARRAY, "arrayLong" BIGINT ARRAY, "arrayDouble" DOUBLE ARRAY ) ) SELECT TIME_PARSE("timestamp") AS "__time", "label", "arrayString", "arrayLong", "arrayDouble" FROM "ext" PARTITIONED BY DAY
Use the following query to create the mvd-example
datasource:
REPLACE INTO "mvd-example" OVERWRITE ALL WITH "ext" AS ( SELECT * FROM TABLE( EXTERN( '{"type":"inline","data":"{\"timestamp\": \"2011-01-12T00:00:00.000Z\", \"label\": \"row1\", \"tags\": [\"t1\",\"t2\",\"t3\"]}\n{\"timestamp\": \"2011-01-13T00:00:00.000Z\", \"label\": \"row2\", \"tags\": [\"t3\",\"t4\",\"t5\"]}\n{\"timestamp\": \"2011-01-14T00:00:00.000Z\", \"label\": \"row3\", \"tags\": [\"t5\",\"t6\",\"t7\"]}\n{\"timestamp\": \"2011-01-14T00:00:00.000Z\", \"label\": \"row4\", \"tags\": []}"}', '{"type":"json"}', '[{"name":"timestamp", "type":"STRING"},{"name":"label", "type":"STRING"},{"name":"tags", "type":"ARRAY<STRING>"}]' ) ) ) SELECT TIME_PARSE("timestamp") AS "__time", "label", ARRAY_TO_MV("tags") AS "tags" FROM "ext" PARTITIONED BY DAY
Calculates the absolute value of a numeric expression.
ABS(<NUMERIC>)
The following example applies the ABS function to the ArrDelay
column from the flight-carriers
datasource.
SELECT "ArrDelay" AS "arrival_delay", ABS("ArrDelay") AS "absolute_arrival_delay" FROM "flight-carriers" WHERE "ArrDelay" < 0 LIMIT 1
Returns the following:
arrival_delay | absolute_arrival_delay |
---|---|
-27 | 27 |
Calculates the arc cosine (arccosine) of a numeric expression.
ACOS(expr)
The following example calculates the arc cosine of 0
.
SELECT ACOS(0) AS "arc_cosine"
Returns the following:
arc_cosine |
---|
1.5707963267948966 |
Returns any value of the specified expression.
ANY_VALUE(expr, [maxBytesPerValue, [aggregateMultipleValues]])
The following example returns the state abbreviation, state name, and average flight time grouped by each state in flight-carriers
:
SELECT "OriginState", ANY_VALUE("OriginStateName") AS "OriginStateName", AVG("ActualElapsedTime") AS "AverageFlightTime" FROM "flight-carriers" GROUP BY 1 LIMIT 3
Returns the following:
OriginState | OriginStateName | AverageFlightTime |
---|---|---|
AK | Alaska | 113.2777967841259 |
AL | Alabama | 92.28766697732215 |
AR | Arkansas | 95.0391382405745 |
Counts distinct values of a regular column or a prebuilt sketch column using an approximate algorithm.
APPROX_COUNT_DISTINCT(expr)
The following example counts the number of distinct airlines reported in flight-carriers
:
SELECT APPROX_COUNT_DISTINCT("Reporting_Airline") AS "num_airlines" FROM "flight-carriers"
Returns the following:
num_airlines |
---|
20 |
Counts distinct values of a string, numeric, or hyperUnique
column using Druid's built-in cardinality
or hyperUnique
aggregators. Consider using APPROX_COUNT_DISTINCT_DS_HLL
instead, which offers better accuracy in many cases.
APPROX_COUNT_DISTINCT_BUILTIN(expr)
The following example counts the number of distinct airlines reported in flight-carriers
:
SELECT APPROX_COUNT_DISTINCT_BUILTIN("Reporting_Airline") AS "num_airlines" FROM "flight-carriers"
Returns the following:
num_airlines |
---|
20 |
Returns the approximate number of distinct values in a HLL sketch column or a regular column. See DataSketches HLL Sketch module for a description of optional parameters.
APPROX_COUNT_DISTINCT_DS_HLL(expr, [lgK, tgtHllType])
The following example returns the approximate number of distinct tail numbers in the flight-carriers
datasource.
SELECT APPROX_COUNT_DISTINCT_DS_HLL("Tail_Number") AS "estimate" FROM "flight-carriers"
Returns the following:
estimate |
---|
4686 |
Returns the approximate number of distinct values in a Theta sketch column or a regular column. See DataSketches Theta Sketch module for a description of optional parameters.
APPROX_COUNT_DISTINCT_DS_THETA(expr, [size])
The following example returns the approximate number of distinct tail numbers in the Tail_Number
column of the flight-carriers
datasource.
SELECT APPROX_COUNT_DISTINCT_DS_THETA("Tail_Number") AS "estimate" FROM "flight-carriers"
Returns the following:
estimate |
---|
4667 |
:::info Deprecated in favor of APPROX_QUANTILE_DS
. :::
APPROX_QUANTILE(expr, probability, [k])
Computes approximate quantiles on a Quantiles sketch column or a regular numeric column. See DataSketches Quantiles Sketch module for a description of parameters.
APPROX_QUANTILE_DS(expr, probability, [k])
The following example approximates the median of the Distance
column from the flight-carriers
datasource. The query may return a different approximation on each execution.
SELECT APPROX_QUANTILE_DS("Distance", 0.5, 128) AS "estimate_median" FROM "flight-carriers"
Returns a result similar to the following:
estimate_median |
---|
569 |
Computes approximate quantiles on fixed buckets histogram column or a regular numeric column. See Fixed buckets histogram for a description of parameters.
APPROX_QUANTILE_FIXED_BUCKETS(expr, probability, numBuckets, lowerLimit, upperLimit, [outlierHandlingMode])
The following example approximates the median of a histogram on the Distance
column from the flight-carriers
datasource. The histogram has 10 buckets, a lower limit of zero, an upper limit of 2500, and ignores outlier values.
SELECT APPROX_QUANTILE_FIXED_BUCKETS("Distance", 0.5, 10, 0, 2500, 'ignore') AS "estimate_median" FROM "flight-carriers"
Returns the following:
estimate_median |
---|
571.6983032226562 |
Constructs a SQL ARRAY
literal from the provided expression arguments. All arguments must be of the same type.
ARRAY[expr1, expr2, ...]
The following example constructs arrays from the values of the agent_category
, browser
, and browser_version
columns in the kttm
datasource.
SELECT ARRAY["agent_category", "browser", "browser_version"] AS "user_agent_details" FROM "kttm" LIMIT 5
Returns the following:
user_agent_details |
---|
["Personal computer","Chrome","76.0.3809.100"] |
["Smartphone","Chrome Mobile","50.0.2661.89"] |
["Personal computer","Chrome","76.0.3809.100"] |
["Personal computer","Opera","62.0.3331.116"] |
["Smartphone","Mobile Safari","12.0"] |
Returns an array of all values of the specified expression. To include only unique values, specify DISTINCT
.
ARRAY_AGG([DISTINCT] expr, [size])
The following example returns arrays of unique values from the OriginState
column in the flight-carriers
datasource, grouped by Reporting_Airline
.
SELECT "Reporting_Airline", ARRAY_AGG(DISTINCT "OriginState", 50000) AS "Origin" FROM "flight-carriers" GROUP BY "Reporting_Airline" LIMIT 5
Returns the following:
Reporting_Airline | Origin |
---|---|
AA | ["AL","AR","AZ","CA","CO","CT","FL","GA","HI","IL","IN","KS","KY","LA","MA","MD","MI","MN","MO","NC","NE","NJ","NM","NV","NY","OH","OK","OR","PA","PR","RI","TN","TX","UT","VA","VI","WA"] |
AS | ["AK","AZ","CA","CO","FL","ID","IL","MA","NJ","NV","OR","TX","VA","WA"] |
B6 | ["AZ","CA","CO","FL","LA","MA","NJ","NV","NY","OR","PR","UT","VA","VT","WA"] |
CO | ["AK","AL","AZ","CA","CO","CT","FL","GA","HI","IL","IN","LA","MA","MD","MI","MN","MO","MS","NC","NE","NH","NJ","NM","NV","NY","OH","OK","OR","PA","PR","RI","SC","TN","TX","UT","VA","VI","WA"] |
DH | ["AL","CA","CT","FL","GA","IL","MA","ME","MI","NC","NH","NJ","NV","NY","OH","PA","RI","SC","TN","VA","VT","WA","WV"] |
Appends the expression to the array. The source array type determines the resulting array type.
ARRAY_APPEND(arr, expr)
The following example appends c
to the values in the arrayString
column from the array-example
datasource.
SELECT ARRAY_APPEND("arrayString",'c') AS "array_appended" FROM "array-example"
Returns the following:
array_appended |
---|
[a, b, c] |
[null,"b","c"] |
[c] |
[a, b, c] |
null |
Concatenates two arrays. The type of arr1
determines the resulting array type.
ARRAY_CONCAT(arr1, arr2)
The following example concatenates the arrays in the arrayLong
and arrayDouble
columns from the array-example
datasource.
SELECT ARRAY_CONCAT("arrayLong", "arrayDouble") AS "arrayConcatenated" FROM "array-example"
Returns the following:
arrayConcatenated |
---|
[1,null,3,1.1,2.2,null] |
null |
[1,2,3,null,2.2,1.1] |
[1,2,3] |
null |
Concatenates array inputs into a single array. To include only unique values, specify DISTINCT
.
ARRAY_CONCAT_AGG([DISTINCT] expr, [size])
The following example concatenates the array inputs from the arrayDouble
column of the array-example
datasource into a single array.
SELECT ARRAY_CONCAT_AGG( DISTINCT "arrayDouble") AS "array_concat_agg_distinct" FROM "array-example"
Returns the following:
array_concat_agg_distinct |
---|
[null,1.1,2.2,5.5,999] |
Checks if the array contains the specified expression.
If the specified expression is a scalar value, returns true if the source array contains the value.
ARRAY_CONTAINS(arr, expr)
The following example returns true if the arraySring
column from the array-example
datasource contains 2
.
SELECT "arrayLong", ARRAY_CONTAINS("arrayLong", 2) AS "arrayContains" FROM "array-example"
Returns the following:
arrayLong | arrayContains |
---|---|
[1,null,3] | false |
null | null |
[1,2,3] | true |
[1,2,3] | true |
[] | false |
If the specified expression is an array, returns true if the source array contains all elements of the expression.
ARRAY_CONTAINS(arr, expr)
The following example returns true if the arrayLong
column from the array-example
datasource contains all elements of the provided expression.
SELECT "label", "arrayLong", ARRAY_CONTAINS("arrayLong", ARRAY[1,2,3]) AS "arrayContains" FROM "array-example"
Returns the following:
label | arrayLong | arrayContains |
---|---|---|
row1 | [1,null,3] | false |
row2 | null | null |
row3 | [1,2,3] | true |
row4 | [1,2,3] | true |
row5 | [] | false |
Returns the length of the array.
ARRAY_LENGTH(arr)
The following example returns the length of array expressions in the arrayDouble
column from the array-example
datasource.
SELECT "arrayDouble" AS "array", ARRAY_LENGTH("arrayDouble") AS "arrayLength" FROM "array-example"
Returns the following:
larray | arrayLength |
---|---|
row1 | 3 |
row2 | 3 |
row3 | 3 |
row4 | 0 |
row5 | null |
Returns the array element at the specified zero-based index. Returns null if the index is out of bounds.
ARRAY_OFFSET(arr, long)
The following example returns the element at the specified zero-based index from the arrays in the arrayLong
column of the array-example
datasource.
SELECT "arrayLong" as "array", ARRAY_OFFSET("arrayLong", 2) AS "elementAtIndex" FROM "array-example"
Returns the following:
array | elementAtIndex |
---|---|
[1,null,3] | 3 |
null | null |
[1,2,3] | 3 |
[1,2,3] | 3 |
[] | null |
Returns the zero-based index of the first occurrence of the expression in the array. Returns null if the value isn't present.
ARRAY_OFFSET_OF(arr, expr)
The following example returns the zero-based index of the fist occurrence of 3
in the arrays in the arrayLong
column of the array-example
datasource.
SELECT "arrayLong" as "array", ARRAY_OFFSET_OF("arrayLong", 3) AS "offset" FROM "array-example"
Returns the following:
array | offset |
---|---|
[1,null,3] | 2 |
null | null |
[1,2,3] | 2 |
[1,2,3] | 2 |
[] | null |
Returns the array element at the specified one-based index. Returns null if the index is out of bounds.
ARRAY_ORDINAL(arr, long)
The following example returns the element at the specified one-based index from the arrays in the arrayLong
column of the array-example
datasource.
SELECT "arrayLong" as "array", ARRAY_ORDINAL("arrayLong", 2) AS "elementAtIndex" FROM "array-example"
Returns the following:
array | elementAtIndex |
---|---|
[1,null,3] | null |
null | null |
[1,2,3] | 2 |
[1,2,3] | 2 |
[] | null |
Returns the one-based index of the first occurrence of the expression in the array. Returns null if the value isn't present.
ARRAY_ORDINAL_OF(arr, expr)
The following example returns the one-based index of the fist occurrence of 3
in the arrays in the arrayLong
column of the array-example
datasource.
SELECT "arrayLong" as "array", ARRAY_ORDINAL_OF("arrayLong", 3) AS "ordinal" FROM "array-example"
Returns the following:
array | ordinal |
---|---|
[1,null,3] | 3 |
null | null |
[1,2,3] | 3 |
[1,2,3] | 3 |
[] | null |
Returns true if two arrays have any elements in common. Treats NULL
values as known elements.
ARRAY_OVERLAP(arr1, arr2)
The following example returns true if columns arrayString
and arrayDouble
from the array-example
datasource have common elements.
SELECT "arrayString", "arrayDouble", ARRAY_OVERLAP("arrayString", "arrayDouble") AS "overlap" FROM "array-example"
Returns the following:
arrayString | arrayDouble | overlap |
---|---|---|
["a","b"] | [1.1,2.2,null] | false |
[null,"b"] | [999,null,5.5] | true |
[] | [null,2.2,1.1] | false |
["a","b"] | [] | false |
null | null | null |
Checks if the scalar value is present in the array. Returns false if the value is non-null, or UNKNOWN
if the value is NULL
. Returns UNKNOWN
if the array is NULL
.
SCALAR_IN_ARRAY(expr, arr)
The following example returns true if the value 36
is present in the array generated from the elements in the DestStateFips
column from the flight-carriers
datasource.
SELECT "Reporting_Airline", ARRAY_AGG(DISTINCT "DestStateFips") AS "StateFipsArray", SCALAR_IN_ARRAY(36, ARRAY_AGG(DISTINCT "DestStateFips")) AS "ValueInArray" FROM "flight-carriers" GROUP BY "Reporting_Airline" LIMIT 5
Returns the following:
Reporting_Airline | StateFipsArray | ValueInArray |
---|---|---|
AA | [1,4,5,6,8,9,12,13,15,17,18,20,21,22,24,25,26,27,29,31,32,34,35,36,37,39,40,41,42,44,47,48,49,51,53,72,78] | true |
AS | [2,4,6,8,12,16,17,25,32,34,41,48,51,53] | false |
B6 | [4,6,8,12,22,25,32,34,36,41,49,50,51,53,72] | true |
CO | [1,2,4,6,8,9,12,13,15,17,18,22,24,25,26,27,28,29,31,32,33,34,35,36,37,39,40,41,42,44,45,47,48,49,51,53,72,78] | true |
DH | [1,6,9,12,13,17,23,25,26,32,33,34,36,37,39,42,44,45,47,50,51,53,54] | true |
Prepends the expression to the array. The source array type determines the resulting array type.
ARRAY_PREPEND(expr, arr)
The following example prepends c
to the arrays in the arrayString
column from the array-example
datasource.
SELECT ARRAY_PREPEND('c', "arrayString") AS "arrayPrepended" FROM "array-example"
Returns the following:
arrayPrepended |
---|
[c, a, b] |
["c",null,"b"] |
[c] |
[c,a,b] |
null |
Returns a subset of the array from the zero-based index start
(inclusive) to end
(exclusive). Returns null if start
is less than 0, greater than the length of the array, or greater than end
.
ARRAY_SLICE(arr, start, end)
The following example constructs a new array from the elements of arrays in the arrayDouble
column from the array-example
datasource.
SELECT "arrayDouble", ARRAY_SLICE("arrayDouble", 0, 2) AS "arrayNew" FROM "array-example"
Returns the following:
arrayDouble | arrayNew |
---|---|
[1.1,2.2,null] | [1.1,2.2] |
[999,null,5.5] | [999,null] |
[null,2.2,1.1] | [null,2.2] |
[] | [null,null] |
null | null |
Converts an array of any type into a multi-value string.
ARRAY_TO_MV(arr)
The following example converts the arrays in the arrayDouble
column from the array-example
datasource into multi-value strings.
SELECT ARRAY_TO_MV("arrayDouble") AS "multiValueString" FROM "array-example"
Returns the following:
multiValueString |
---|
["1.1","2.2",null] |
["999.0",null,"5.5"] |
[null,"2.2","1.1"] |
[] |
null |
Joins all elements of the array into a string using the specified delimiter.
ARRAY_TO_STRING(arr, delimiter)
The following example converts the arrays in the arrayDouble
column of the array-example
datasource into concatenated strings.
SELECT ARRAY_TO_STRING("arrayDouble", '') AS "notSeparated" FROM "array-example"
Returns the following:
multiValueString |
---|
1.12.2null |
999.0null5.5 |
null2.21.1 |
|
null |
Calculates the arc sine (arcsine) of a numeric expression.
ASIN(expr)
The following example calculates the arc sine of 1
.
SELECT ASIN(1) AS "arc_sine"
Returns the following:
arc_sine |
---|
1.5707963267948966 |
Calculates the arc tangent (arctangent) of a numeric expression.
ATAN(expr)
The following example calculates the arc tangent of 1
.
SELECT ATAN(1) AS "arc_tangent"
Returns the following:
arc_tangent |
---|
0.7853981633974483 |
Calculates the arc tangent (arctangent) of a specified x and y coordinate.
ATAN2(x, y)
The following example calculates the arc tangent of the coordinate (1, -1)
SELECT ATAN2(1,-1) AS "arc_tangent_2"
Returns the following:
arc_tangent_2 |
---|
2.356194490192345 |
Calculates the average of a set of values.
AVG(<NUMERIC>)
The following example calculates the average minutes of delay for a particular airlines in flight-carriers
:
SELECT AVG("DepDelayMinutes") AS avg_delay FROM "flight-carriers" WHERE "Reporting_Airline" = 'AA'
Returns the following:
avg_delay |
---|
8.936 |
Performs a bitwise AND operation on all input values.
BIT_AND(expr)
The following example returns the bitwise AND operation for all values in passenger-count
from taxi-trips
:
SELECT BIT_AND("passenger_count") AS "bit_and" FROM "taxi-trips"
Returns the following:
bit_and |
---|
0 |
Performs a bitwise OR operation on all input values.
BIT_OR(expr)
The following example returns the bitwise OR operation for all values in passenger-count
from taxi-trips
:
SELECT BIT_OR("passenger_count") AS "bit_or" FROM "taxi-trips"
Returns the following:
bit_or |
---|
15 |
Performs a bitwise XOR operation on all input values.
BIT_XOR(expr)
The following example returns the bitwise XOR operation for all values in passenger-count
from taxi-trips
:
SELECT BIT_OR("passenger_count") AS "bit_xor" FROM "taxi-trips"
Returns the following:
bit_xor |
---|
6 |
Returns the bitwise AND between two expressions: expr1 & expr2
.
BITWISE_AND(expr1, expr2)
The following example performs the bitwise AND operation 12 & 10
.
SELECT BITWISE_AND(12, 10) AS "bitwise_and"
Returns the following:
bitwise_and |
---|
8 |
Returns the bitwise complement (bitwise not) for the expression: ~expr
.
BITWISE_COMPLEMENT(expr)
The following example performs the bitwise complement operation ~12
.
SELECT BITWISE_COMPLEMENT(12) AS "bitwise_complement"
Returns the following:
bitwise_complement |
---|
-13 |
Converts the bits of an IEEE 754 floating-point double value to long.
BITWISE_CONVERT_DOUBLE_TO_LONG_BITS(expr)
The following example returns the IEEE 754 floating-point double representation of 255
as a long.
SELECT BITWISE_CONVERT_DOUBLE_TO_LONG_BITS(255) AS "ieee_754_double_to_long"
Returns the following:
ieee_754_double_to_long |
---|
4643176031446892544 |
Converts a long to the IEEE 754 floating-point double specified by the bits stored in the long.
BITWISE_CONVERT_LONG_BITS_TO_DOUBLE(expr)
The following example returns the long representation of 4643176031446892544
as an IEEE 754 floating-point double.
SELECT BITWISE_CONVERT_LONG_BITS_TO_DOUBLE(4643176031446892544) AS "long_to_ieee_754_double"
Returns the following:
long_to_ieee_754_double |
---|
255 |
Returns the bitwise OR between the two expressions: expr1 | expr2
.
BITWISE_OR(expr1, expr2)
The following example performs the bitwise OR operation 12 | 10
.
SELECT BITWISE_OR(12, 10) AS "bitwise_or"
Returns the following:
bitwise_or |
---|
14 |
Returns the bitwise left shift by x positions of an expr: expr << x
.
BITWISE_SHIFT_LEFT(expr, x)
The following example performs the bitwise SHIFT operation 2 << 3
.
SELECT BITWISE_SHIFT_LEFT(2, 3) AS "bitwise_shift_left"
Returns the following:
bitwise_shift_left |
---|
16 |
Returns the bitwise right shift by x positions of an expr: expr >> x
.
BITWISE_SHIFT_RIGHT(expr, x)
The following example performs the bitwise SHIFT operation 16 >> 3
.
SELECT BITWISE_SHIFT_RIGHT(16, 3) AS "bitwise_shift_right"
Returns the following:
bitwise_shift_right |
---|
2 |
Returns the bitwise exclusive OR between the two expressions: expr1 ^ expr2
.
BITWISE_XOR(expr1, expr2)
The following example performs the bitwise XOR operation 12 ^ 10
.
SELECT BITWISE_XOR(12, 10) AS "bitwise_xor"
Returns the following:
bitwise_xor |
---|
6 |
Computes a Bloom filter from values provided in an expression.
BLOOM_FILTER(expr, numEntries)
numEntries
specifies the maximum number of distinct values before the false positive rate increases.The following example returns a Base64-encoded Bloom filter representing the set of devices, agent_category
, used in Albania:
SELECT "country", BLOOM_FILTER(agent_category, 10) as albanian_bloom FROM "kttm" WHERE "country" = 'Albania' GROUP BY "country"
Returns the following:
country | albanian_bloom |
---|---|
Albania | BAAAAAgAAACAAEAAAAAAAAAAAEIAAAAAAAAAAAAAAAAAAAAAAAIIAAAAAAAAAAAAAAAAAAIAAAAAAQAAAAAAAAAAAAAA |
Returns true if an expression is contained in a Base64-encoded Bloom filter string.
BLOOM_FILTER_TEST(expr, <STRING>)
The following example returns true
when a device type, agent_category
, exists in the Bloom filter representing the set of devices used in Albania:
SELECT agent_category, BLOOM_FILTER_TEST("agent_category", 'BAAAAAgAAACAAEAAAAAAAAAAAEIAAAAAAAAAAAAAAAAAAAAAAAIIAAAAAAAAAAAAAAAAAAIAAAAAAQAAAAAAAAAAAAAA') AS bloom_test FROM "kttm" GROUP BY 1
Returns the following:
agent_category | bloom_test |
---|---|
empty | false |
Game console | false |
Personal computer | true |
Smart TV | false |
Smartphone | true |
Tablet | false |
Trims characters from both the leading and trailing ends of an expression. Defaults chars
to a space if none is provided.
BTRIM(expr[, chars])
The following example trims the _
characters from both ends of the string expression.
SELECT '___abc___' AS "original_string", BTRIM('___abc___', '_') AS "trim_both_ends"
Returns the following:
original_string | trim_both_ends |
---|---|
___abc___ | abc |
Returns a result based on given conditions.
Compares an expression to a set of values or expressions.
CASE expr WHEN value1 THEN result1 \[ WHEN value2 THEN result2 ... \] \[ ELSE resultN \] END
The following example returns a UI type based on the value of agent_category
from the kttm
datasource.
SELECT "agent_category" AS "device_type", CASE "agent_category" WHEN 'Personal computer' THEN 'Large UI' WHEN 'Smartphone' THEN 'Mobile UI' ELSE 'other' END AS "UI_type" FROM "kttm" LIMIT 2
Returns the following:
device_type | UI_type |
---|---|
Personal computer | Large UI |
Smartphone | Mobile UI |
Evaluates a set of Boolean expressions.
CASE WHEN boolean_expr1 THEN result1 \[ WHEN boolean_expr2 THEN result2 ... \] \[ ELSE resultN \] END
The following example returns the departure location corresponding to the value of the OriginStateName
column from the flight-carriers
datasource.
SELECT "OriginStateName" AS "flight_origin", CASE WHEN "OriginStateName" = 'Puerto Rico' THEN 'U.S. Territory' WHEN "OriginStateName" = 'U.S. Virgin Islands' THEN 'U.S. Territory' ELSE 'U.S. State' END AS "state_status" FROM "flight-carriers" LIMIT 2
Returns the following:
flight_origin | departure_location |
---|---|
Puerto Rico | U.S. Territory |
Massachusetts | U.S. State |
Converts a value into the specified data type.
CAST(value AS TYPE)
The following example converts the values in the Distance
column from the flight-carriers
datasource from DOUBLE
to VARCHAR
.
SELECT "Distance" AS "original_column", CAST("Distance" AS VARCHAR) "cast_to_string" FROM "flight-carriers" LIMIT 1
Returns the following:
original_column | cast_to_string |
---|---|
1571 | 1571.0 |
Rounds up a timestamp by a given time unit.
CEIL(timestamp_expr TO unit>)
The following example rounds up the __time
column from the taxi-trips
datasource to the nearest year.
SELECT "__time" AS "original_time", CEIL("__time" TO YEAR) AS "ceiling" FROM "taxi-trips" LIMIT 1
Returns the following:
original_time | ceiling |
---|---|
2013-08-01T08:14:37.000Z | 2014-01-01T00:00:00.000Z |
Calculates the smallest integer value greater than or equal to the numeric expression.
CEIL(<NUMERIC>)
The following example applies the CEIL function to the fare_amount
column from the taxi-trips
datasource.
SELECT "fare_amount" AS "fare_amount", CEIL("fare_amount") AS "ceiling_fare_amount" FROM "taxi-trips" LIMIT 1
Returns the following:
fare_amount | ceiling_fare_amount |
---|---|
21.25 | 22 |
Alias for LENGTH
.
CHAR_LENGTH(expr)
Alias for LENGTH
.
CHARACTER_LENGTH(expr)
Returns the first non-null value.
COALESCE(expr, expr, ...)
The following example returns the first non-null value from the list of parameters.
SELECT COALESCE(null, null, 5, 'abc') AS "first_non_null"
Returns the following:
first_non_null |
---|
5 |
Concatenates a list of expressions.
CONCAT(expr[, expr,...])
The following example concatenates the OriginCityName
column from flight-carriers
, the string to
, and the DestCityName
column from flight-carriers
.
SELECT "OriginCityName" AS "origin_city", "DestCityName" AS "destination_city", CONCAT("OriginCityName", ' to ', "DestCityName") AS "concatenate_flight_details" FROM "flight-carriers" LIMIT 1
Returns the following:
origin_city | destination_city | concatenate_flight_details |
---|---|---|
San Juan, PR | Washington, DC | San Juan, PR to Washington, DC |
Returns true if str
is a substring of expr
, case-sensitive. Otherwise, returns false.
CONTAINS_STRING(expr, str)
The following example returns true if the OriginCityName
column from the flight-carriers
datasource contains the substring San
.
SELECT "OriginCityName" AS "origin_city", CONTAINS_STRING("OriginCityName", 'San') AS "contains_string" FROM "flight-carriers" LIMIT 2
Returns the following:
origin_city | contains_string |
---|---|
San Juan, PR | true |
Boston, MA | false |
Calculates the trigonometric cosine of an angle expressed in radians.
COS(expr)
The following example calculates the cosine of angle PI/3
radians.
SELECT COS(PI / 3) AS "cosine"
Returns the following:
cosine |
---|
0.5000000000000001 |
Calculates the trigonometric cotangent of an angle expressed in radians.
COT(expr)
The following example calculates the cotangent of angle PI/3
radians.
SELECT COT(PI / 3) AS "cotangent"
Returns the following:
cotangent |
---|
0.577350269189626 |
Counts the number of rows.
COUNT([DISTINCT] expr)
COUNT(*)
APPROX_COUNT_DISTINCT
.The following example counts the number of distinct flights per day after '2005-01-01 00:00:00'
in flight-carriers
:
SELECT TIME_FLOOR(__time, 'P1D') AS "flight_day", COUNT(*) AS "num_flights" FROM "flight-carriers" WHERE __time > '2005-01-01 00:00:00' GROUP BY 1 LIMIT 3
Returns the following:
flight_day | num_flights |
---|---|
2005-11-01T00:00:00.000Z | 18961 |
2005-11-02T00:00:00.000Z | 19434 |
2005-11-03T00:00:00.000Z | 19745 |
Returns the cumulative distribution of the current row within the window calculated as number of window rows at the same rank or higher than current row
/ total window rows
. The return value ranges between 1/number of rows
and 1.
CUME_DIST()
The following example returns the cumulative distribution of number of flights by airline from two airports on a single day.
SELECT FLOOR("__time" TO DAY) AS "flight_day", "Origin" AS "airport", "Reporting_Airline" as "airline", COUNT("Flight_Number_Reporting_Airline") as "num_flights", CUME_DIST() OVER (PARTITION BY "Origin" ORDER BY COUNT("Flight_Number_Reporting_Airline") DESC) AS "cume_dist" FROM "flight-carriers" WHERE FLOOR("__time" TO DAY) = '2005-11-01' AND "Origin" IN ('KOA', 'LIH') GROUP BY 1, 2, 3
Returns the following:
flight_day | airport | airline | num_flights | cume_dist |
---|---|---|---|---|
2005-11-01T00:00:00.000Z | KOA | HA | 11 | 0.25 |
2005-11-01T00:00:00.000Z | KOA | UA | 4 | 0.5 |
2005-11-01T00:00:00.000Z | KOA | AA | 1 | 1 |
2005-11-01T00:00:00.000Z | KOA | NW | 1 | 1 |
2005-11-01T00:00:00.000Z | LIH | HA | 15 | 0.3333333333333333 |
2005-11-01T00:00:00.000Z | LIH | AA | 2 | 1 |
2005-11-01T00:00:00.000Z | LIH | UA | 2 | 1 |
Returns the current date in UTC time, unless you specify a different timezone in the query context.
CURRENT_DATE
The following example returns the current date.
SELECT CURRENT_DATE AS "current_date"
Returns the following:
current_date |
---|
2024-08-14T00:00:00.000Z |
Returns the current timestamp in UTC time, unless you specify a different timezone in the query context.
CURRENT_TIMESTAMP
The following example returns the current timestamp.
SELECT CURRENT_TIMESTAMP AS "current_timestamp"
Returns the following:
current_timestamp |
---|
2024-08-14T21:30:13.793Z |
Rounds down a timestamp by a given time unit.
DATE_TRUNC(unit, timestamp_expr)
The following example truncates a timestamp from the __time
column from the taxi-trips
datasource to the most recent decade
.
SELECT "__time" AS "original_timestamp", DATE_TRUNC('decade', "__time") AS "truncate_timestamp" FROM "taxi-trips" LIMIT 1
Returns the following:
original_timestamp | truncate_time |
---|---|
2013-08-01T08:14:37.000Z | 2010-01-01T00:00:00.000Z |
Decodes a Base64-encoded expression into a complex data type.
You can use the function to ingest data when a column contains an encoded data sketch such as Theta or HLL.
The function supports hyperUnique
and serializablePairLongString
data types by default. To enable support for a complex data type, load the corresponding extension:
druid-bloom-filter
: bloom
druid-datasketches
: arrayOfDoublesSketch
, HLLSketch
, KllDoublesSketch
, KllFloatsSketch
, quantilesDoublesSketch
, thetaSketch
druid-histogram
: approximateHistogram
, fixedBucketsHistogram
druid-stats
: variance
druid-compressed-bigdecimal
: compressedBigDecimal
druid-momentsketch
: momentSketch
druid-tdigestsketch
: tDigestSketch
DECODE_BASE64_COMPLEX(dataType, expr)
The following example returns a Theta sketch complex type from a Base64-encoded string representation of the sketch:
SELECT DECODE_BASE64_COMPLEX('thetaSketch','AgMDAAAazJNBAAAAAACAP+k/tkWGkSoFYWMAG0y+3gVabvKcIUNrBv0jAkGsw7sK5szX1k0ScwtMfCQmFP/rDhFK6yU7PPkObZ/Ugw5fcBQZ+GaO+Nt6FP+Whz6TmxkWyRJ+gaQLFhcts1+c0Q/vF9FLFfaVlOkb3/XpXaZ3JhyZ2dG8Di2/HO10sMs9C0AdM4FdHuye6SB+GYinIhTOITOHzB5SAfIiph3de9qIGSM89V+s/TkdI/WZVzK9wF0npfi4ZrmgBSnVjphCtQA5K2fp0x59UCwvMopZarsSkzEo81OIxjznNNXLr1BbQBo1Ei3OxJOoNzVs0x9xzsm4NfgAZSvZQvI1c2TmPsZvlzpW7tmIlizOOsr6pGWoh0U99/tV8RFwhz0SJoWyU1Z2P0hZ5d7KRnZBjlWC+e/FLEKrWsu14rlFRXhsOuxRId9FboEuH9PqMUixI2lB8MhLS803hJDoZ7tMy7Egl+YNU04QM11stXX4Tu96NHHcGiZRuCyciGiTGVQflMLmNt6lW6zIwJy0baNdbwjMCTjtUF7oZOtugWLYYJE9sJU3HuVijc0J10l6SmPslbfY6Fw0Za9w/Zdhn/5nIuKc1WMrYWnAJQJKXY73bHYWq7gI6dRvYdC2fLJyv3F8qwQcOJgFc0GaGXw8KRF3w3IVCwxsMntWhdTkaJ88e++5NFyM1Hd/D79wg0b9vH8=') AS "theta_sketch"
You can perform Theta sketch operations on the resulting COMPLEX<thetaSketch>
value which resembles the input string.
Decodes a Base64-encoded expression into a UTF-8 encoded string.
DECODE_BASE64_UTF8(expr)
The following example decodes the Base64-encoded representation of “Hello, World!”:
SELECT DECODE_BASE64_UTF8('SGVsbG8sIFdvcmxkIQ==') as decoded
Returns the following:
decoded |
---|
Hello, World! |
Converts an angle from radians to degrees.
DEGREES(expr)
The following example converts an angle of PI
radians to degrees
SELECT DEGREES(PI) AS "degrees"
Returns the following:
degrees |
---|
180 |
Returns the rank for a row within a window without gaps. For example, if two rows tie for a rank of 1, the subsequent row is ranked 2.
DENSE_RANK()
The following example returns the dense rank by airline for flights from two airports on a single day.
SELECT FLOOR("__time" TO DAY) AS "flight_day", "Origin" AS "airport", "Reporting_Airline" as "airline", COUNT("Flight_Number_Reporting_Airline") as "num_flights", DENSE_RANK() OVER (PARTITION BY "Origin" ORDER BY COUNT("Flight_Number_Reporting_Airline") DESC) AS "dense_rank" FROM "flight-carriers" WHERE FLOOR("__time" TO DAY) = '2005-11-01' AND "Origin" IN ('KOA', 'LIH') GROUP BY 1, 2, 3
Returns the following:
flight_day | airport | airline | num_flights | dense_rank |
---|---|---|---|---|
2005-11-01T00:00:00.000Z | KOA | HA | 11 | 1 |
2005-11-01T00:00:00.000Z | KOA | UA | 4 | 2 |
2005-11-01T00:00:00.000Z | KOA | AA | 1 | 3 |
2005-11-01T00:00:00.000Z | KOA | NW | 1 | 3 |
2005-11-01T00:00:00.000Z | LIH | HA | 15 | 1 |
2005-11-01T00:00:00.000Z | LIH | AA | 2 | 2 |
2005-11-01T00:00:00.000Z | LIH | UA | 2 | 2 |
Returns the result of integer division of x
by y
.
:::info The DIV
function is not implemented in Druid versions 30.0.0 or earlier. Consider using SAFE_DIVIDE
instead. :::
DIV(x, y)
The following calculates integer divisions of 78
by 10
.
SELECT DIV(78, 10) as "division"
Returns the following:
division |
---|
7 |
Returns a string representing an approximation to the cumulative distribution function given a list of split points that define the edges of the bins from a Quantiles sketch.
DS_CDF(expr, splitPoint0, splitPoint1, ...)
The following example specifies three split points to return cumulative distribution function approximations on the Distance
column from the flight-carriers
datasource. The query may return a different approximation for each bin on each execution.
SELECT DS_CDF( DS_QUANTILES_SKETCH("Distance"), 750, 1500, 2250) AS "estimate_cdf" FROM "flight-carriers"
Returns a result similar to the following:
estimate_cdf |
---|
[0.6332237016416492,0.8908411023460711,0.9612303007393957,1.0] |
Returns the quantile estimate corresponding to the fraction from a Quantiles sketch.
DS_GET_QUANTILE(expr, fraction)
The following example approximates the median of the Distance
column from the flight-carriers
datasource. The query may return a different approximation with each execution.
SELECT DS_GET_QUANTILE( DS_QUANTILES_SKETCH("Distance"), 0.5) AS "estimate_median" FROM "flight-carriers"
Returns a result similar to the following:
estimate_median |
---|
569 |
Returns a string representing an array of quantile estimates corresponding to a list of fractions from a Quantiles sketch.
DS_GET_QUANTILES(expr, fraction0, fraction1, ...)
The following example approximates the 25th, 50th, and 75th percentiles of the Distance
column from the flight-carriers
datasource. The query may return a different approximation for each percentile on each execution.
SELECT DS_GET_QUANTILES( DS_QUANTILES_SKETCH("Distance"), 0.25, 0.5, 0.75) AS "estimate_fractions" FROM "flight-carriers"
Returns a result similar to the following:
estimate_fractions |
---|
[316.0,571.0,951.0] |
Returns an approximation to the histogram from a Quantiles sketch. The split points define the histogram bins.
DS_HISTOGRAM(expr, splitPoint0, splitPoint1, ...)
The following example specifies three split points to approximate a histogram on the Distance
column from the flight-carriers
datasource. The query may return a different approximation for each bin on each execution.
SELECT DS_HISTOGRAM( DS_QUANTILES_SKETCH("Distance"), 750, 1500, 2250) AS "estimate_histogram" FROM "flight-carriers"
Returns a result similar to the following:
estimate_histogram |
---|
[358496.0,153974.99999999997,39909.99999999999,13757.000000000005] |
Creates a HLL sketch on a column containing HLL sketches or a regular column. See DataSketches HLL Sketch module for a description of optional parameters.
DS_HLL(expr, [lgK, tgtHllType])
The following example creates a HLL sketch on the Tail_number
column of the flight-carriers
datasource grouping by OriginState
and DestState
.
SELECT "OriginState" AS "origin_state", "DestState" AS "destination_state", DS_HLL("Tail_Number") AS "hll_tail_number" FROM "flight-carriers" GROUP BY 1,2 LIMIT 1
Returns the following:
origin_state | destination_state | hll_tail_number |
---|---|---|
AK | AK | "AwEHDAcIAAFBAAAAfY..." |
Returns a string summary of a Quantiles sketch.
DS_QUANTILE_SUMMARY(expr)
The following example returns a summary of a Quantiles sketch on the Distance
column from the flight-carriers
datasource.
SELECT DS_QUANTILE_SUMMARY( DS_QUANTILES_SKETCH("Distance") ) AS "summary" FROM "flight-carriers"
Returns the following:
### Quantiles DirectCompactDoublesSketch SUMMARY: Empty : false Memory, Capacity bytes : true, 6128 Estimation Mode : true K : 128 N : 566,138 Levels (Needed, Total, Valid): 12, 12, 5 Level Bit Pattern : 100010100011 BaseBufferCount : 122 Combined Buffer Capacity : 762 Retained Items : 762 Compact Storage Bytes : 6,128 Updatable Storage Bytes : 14,368 Normalized Rank Error : 1.406% Normalized Rank Error (PMF) : 1.711% Min Item : 2.400000e+01 Max Item : 4.962000e+03 ### END SKETCH SUMMARY
Creates a Quantiles sketch on a Quantiles sketch column or a regular column. See DataSketches Quantiles Sketch module for a description of parameters.
DS_QUANTILES_SKETCH(expr, [k])
The following example creates a Quantile sketch on the Distance
column from the flight-carriers
datasource.
SELECT DS_QUANTILES_SKETCH("Distance") AS "quantile_sketch" FROM "flight-carriers"
Returns the following:
quantile_sketch |
---|
AgMIGoAAAAB6owgAA... |
Returns an approximate rank of a given value in a distribution. The rank represents the fraction of the distribution less than the given value.
DS_RANK(expr, value)
The following example estimates the fraction of records in the flight-carriers
datasource where the value in the Distance
column is less than 500. The query may return a different approximation on each execution.
SELECT DS_RANK( DS_QUANTILES_SKETCH("Distance"), 500) AS "estimate_rank" FROM "flight-carriers"
Returns a result similar to the following:
estimate_rank |
---|
0.43837721544923675 |
Creates a Theta sketch on a column containing Theta sketches or a regular column. See DataSketches Theta Sketch module for a description of optional parameters.
DS_THETA(expr, [size])
The following example creates a Theta sketch on the Tail_number
column of the flight-carriers
datasource grouping by OriginState
and DestState
.
SELECT "OriginState" AS "origin_state", "DestState" AS "destination_state", DS_THETA("Tail_Number") AS "theta_tail_number" FROM "flight-carriers" GROUP BY 1,2 LIMIT 1
Returns the following:
origin_state | destination_state | theta_tail_number |
---|---|---|
AK | AK | AgMDAAAazJNBAAAAA... |
Creates a Tuple sketch on raw data or a precomputed sketch column. See DataSketches Tuple Sketch module for a description of parameters.
DS_TUPLE_DOUBLES(expr[, nominalEntries])
DS_TUPLE_DOUBLES(dimensionColumnExpr, metricColumnExpr1[, metricColumnExpr2, ...], [nominalEntries])
The following example creates a Tuples sketch column that stores the arrival and departure delay minutes for each airline in flight-carriers
:
SELECT "Reporting_Airline", DS_TUPLE_DOUBLES("Reporting_Airline", "ArrDelayMinutes", "DepDelayMinutes") AS tuples_delay FROM "flight-carriers" GROUP BY 1 LIMIT 2
Returns the following:
Reporting_Airline | tuples_delay |
---|---|
AA | 1.0 |
AS | 1.0 |
Returns an intersection of Tuple sketches which each contain an array of double values as their Summary Objects. The values contained in the Summary Objects are summed when combined. If the last value of the array is a numeric literal, Druid assumes that the value is an override parameter for nominal entries.
DS_TUPLE_DOUBLES_INTERSECT(expr, ..., [nominalEntries])
The following example calculates the total minutes of arrival delay for airlines flying out of SFO
or LAX
. An airline that doesn't fly out of both airports returns a value of 0.
SELECT "Reporting_Airline", DS_TUPLE_DOUBLES_METRICS_SUM_ESTIMATE( DS_TUPLE_DOUBLES_INTERSECT( DS_TUPLE_DOUBLES("Reporting_Airline", "ArrDelayMinutes") FILTER(WHERE "Origin" = 'SFO'), DS_TUPLE_DOUBLES("Reporting_Airline", "ArrDelayMinutes") FILTER(WHERE "Origin" = 'LAX') ) ) AS arrival_delay_sfo_lax FROM "flight-carriers" GROUP BY 1 LIMIT 5
Returns the following:
Reporting_Airline | arrival_delay_sfo_lax |
---|---|
AA | [33296] |
AS | [13694] |
B6 | [0] |
CO | [13582] |
DH | [0] |
Computes approximate sums of the values contained within a Tuple sketch which contains an array of double values as the Summary Object.
DS_TUPLE_DOUBLES_METRICS_SUM_ESTIMATE(expr)
The following example calculates the sum of arrival and departure delay minutes for each airline in flight-carriers
:
SELECT "Reporting_Airline", DS_TUPLE_DOUBLES_METRICS_SUM_ESTIMATE(DS_TUPLE_DOUBLES("Reporting_Airline", "ArrDelayMinutes", "DepDelayMinutes")) AS sum_delays FROM "flight-carriers" GROUP BY 1 LIMIT 2
Returns the following:
Reporting_Airline | sum_delays |
---|---|
AA | [612831,474309] |
AS | [157340,141462] |
Compare this example with an analogous SQL statement that doesn't use approximations:
SELECT "Reporting_Airline", SUM("ArrDelayMinutes") AS sum_arrival_delay, SUM("DepDelayMinutes") AS sum_departure_delay FROM "flight-carriers" GROUP BY 1 LIMIT 2
Returns the following:
Reporting_Airline | sum_arrival_delay | sum_departure_delay |
---|---|---|
AA | 612831 | 475735 |
AS | 157340 | 143620 |
Returns a set difference of Tuple sketches which each contain an array of double values as their Summary Objects. The values contained in the Summary Object are preserved as is. If the last value of the array is a numeric literal, Druid assumes that the value is an override parameter for nominal entries.
DS_TUPLE_DOUBLES_NOT(expr, ..., [nominalEntries])
The following example calculates the total minutes of arrival delay for airlines that fly out of SFO
but not LAX
.
SELECT "Reporting_Airline", DS_TUPLE_DOUBLES_METRICS_SUM_ESTIMATE( DS_TUPLE_DOUBLES_NOT( DS_TUPLE_DOUBLES("Reporting_Airline", "ArrDelayMinutes") FILTER(WHERE "Origin" = 'SFO'), DS_TUPLE_DOUBLES("Reporting_Airline", "ArrDelayMinutes") FILTER(WHERE "Origin" = 'LAX') ) ) AS arrival_delay_sfo_lax FROM "flight-carriers" GROUP BY 1 LIMIT 5
Returns the following:
Reporting_Airline | arrival_delay_sfo_lax |
---|---|
AA | [0] |
AS | [0] |
B6 | [0] |
CO | [0] |
DH | [93] |
Returns a union of Tuple sketches which each contain an array of double values as their Summary Objects. The values contained in the Summary Objects are summed when combined. If the last value of the array is a numeric literal, Druid assumes that the value is an override parameter for nominal entries.
DS_TUPLE_DOUBLES_UNION(expr, ..., [nominalEntries])
The following example calculates the total minutes of arrival delay for airlines flying out of either SFO
or LAX
.
SELECT "Reporting_Airline", DS_TUPLE_DOUBLES_METRICS_SUM_ESTIMATE( DS_TUPLE_DOUBLES_UNION( DS_TUPLE_DOUBLES("Reporting_Airline", "ArrDelayMinutes") FILTER(WHERE "Origin" = 'SFO'), DS_TUPLE_DOUBLES("Reporting_Airline", "ArrDelayMinutes") FILTER(WHERE "Origin" = 'LAX') ) ) AS arrival_delay_sfo_lax FROM "flight-carriers" GROUP BY 1 LIMIT 5
Returns the following:
Reporting_Airline | arrival_delay_sfo_lax |
---|---|
AA | [33296] |
AS | [13694] |
B6 | [0] |
CO | [13582] |
DH | [93] |
Returns the value of a numeric or string expression corresponding to the earliest __time
value.
EARLIEST(expr, [maxBytesPerValue])
The following example returns the origin airport code associated with the earliest departing flight daily after '2005-01-01 00:00:00'
in flight-carriers
:
SELECT TIME_FLOOR(__time, 'P1D') AS "departure_day", EARLIEST("Origin") AS "origin" FROM "flight-carriers" WHERE __time >= TIMESTAMP '2005-01-01 00:00:00' GROUP BY 1 LIMIT 2
Returns the following:
departure_day | origin |
---|---|
2005-11-01T00:00:00.000Z | LAS |
2005-11-02T00:00:00.000Z | SDF |
Returns the value of a numeric or string expression corresponding to the earliest time value from timestampExpr
.
EARLIEST_BY(expr, timestampExpr, [maxBytesPerValue])
The following example returns the destination airport code associated with the earliest arriving flight daily after '2005-01-01 00:00:00'
in flight-carriers
:
SELECT TIME_FLOOR(TIME_PARSE("arrivalime"), 'P1D') AS "arrival_day", EARLIEST_BY("Dest", TIME_PARSE("arrivalime")) AS "dest" FROM "flight-carriers" WHERE TIME_PARSE("arrivalime") >= TIMESTAMP '2005-01-01 00:00:00' GROUP BY 1 LIMIT 2
Returns the following:
arrival_day | origin |
---|---|
2005-11-01T00:00:00.000Z | RSW |
2005-11-02T00:00:00.000Z | CLE |
Calculates e raised to the power of the numeric expression.
EXP(<NUMERIC>)
The following example calculates e to the power of 1.
SELECT EXP(1) AS "exponential"
Returns the following:
exponential |
---|
2.7182818284590455 |
Extracts the value of some unit from the timestamp.
EXTRACT(unit FROM timestamp_expr)
The following example extracts the year from the __time
column from the taxi-trips
datasource.
SELECT "__time" AS "original_time", EXTRACT(YEAR FROM "__time" ) AS "year" FROM "taxi-trips" LIMIT 1
Returns the following:
original_time | year |
---|---|
2013-08-01T08:14:37.000Z | 2013 |
Returns the value evaluated for the expression for the first row within the window.
FIRST_VALUE(expr)
The following example returns the name of the first airline in the window of flights by airline for two airports on a single day.
SELECT FLOOR("__time" TO DAY) AS "flight_day", "Origin" AS "airport", "Reporting_Airline" as "airline", COUNT("Flight_Number_Reporting_Airline") as "num_flights", FIRST_VALUE("Reporting_Airline") OVER (PARTITION BY "Origin" ORDER BY COUNT("Flight_Number_Reporting_Airline") DESC) AS "first_val" FROM "flight-carriers" WHERE FLOOR("__time" TO DAY) = '2005-11-01' AND "Origin" IN ('KOA', 'LIH') GROUP BY 1, 2, 3
Returns the following:
flight_day | airport | airline | num_flights | first_val |
---|---|---|---|---|
2005-11-01T00:00:00.000Z | KOA | HA | 11 | HA |
2005-11-01T00:00:00.000Z | KOA | UA | 4 | HA |
2005-11-01T00:00:00.000Z | KOA | AA | 1 | HA |
2005-11-01T00:00:00.000Z | KOA | NW | 1 | HA |
2005-11-01T00:00:00.000Z | LIH | HA | 15 | HA |
2005-11-01T00:00:00.000Z | LIH | AA | 2 | HA |
2005-11-01T00:00:00.000Z | LIH | UA | 2 | HA |
Rounds down a timestamp by a given time unit.
FLOOR(timestamp_expr TO unit)
The following example rounds down the __time
column from the taxi-trips
datasource to the nearest year.
SELECT "__time" AS "original_time", FLOOR("__time" TO YEAR) AS "floor" FROM "taxi-trips" LIMIT 1
Returns the following:
original_time | floor |
---|---|
2013-08-01T08:14:37.000Z | 2013-01-01T00:00:00.000Z |
Calculates the largest integer less than or equal to the numeric expression.
FLOOR(expr)
The following example applies the FLOOR function to the fare_amount
column from the taxi-trips
datasource.
SELECT "fare_amount" AS "fare_amount", FLOOR("fare_amount") AS "floor_fare_amount" FROM "taxi-trips" LIMIT 1
Returns the following:
fare_amount | floor_fare_amount |
---|---|
21.25 | 21 |
Returns the maximum value from the provided expressions. For information on how Druid interprets the arguments passed into the function, see Reduction functions.
GREATEST([expr1, ...])
The following example returns the greatest value between the numeric constant PI
, the integer number 4
, and the double -5.0
. Druid interprets these arguments as DOUBLE data type.
SELECT GREATEST(PI, 4, -5.0) AS "greatest"
Returns the following:
greatest |
---|
4 |
Returns a number for each output row of a groupBy query, indicating whether the specified dimension is included for that row.
GROUPING(expr, expr...)
The following example returns the total minutes of flight delay for each day of the week in flight-carriers
. The GROUP BY clause creates two grouping sets, one for the day of the week and one for the grand total.
For more information, refer to CASE and grouping sets with SQL GROUP BY.
SELECT CASE WHEN GROUPING("DayOfWeek") = 1 THEN 'Total' ELSE "DayOfWeek" END AS "DayOfWeek", GROUPING("DayOfWeek") AS Subgroup, SUM("DepDelayMinutes") AS "MinutesDelayed" FROM "flight-carriers" GROUP BY GROUPING SETS("DayOfWeek", ())
Returns the following:
DayOfWeek | Subgroup | MinutesDelayed |
---|---|---|
1 | 0 | 998505 |
2 | 0 | 1031599 |
3 | 0 | 884677 |
4 | 0 | 525351 |
5 | 0 | 519413 |
6 | 0 | 354601 |
7 | 0 | 848704 |
Total | 1 | 5162850 |
Returns the distinct count estimate from a HLL sketch. To round the distinct count estimate, set round
to true. round
defaults to false.
HLL_SKETCH_ESTIMATE(expr, [round])
The following example estimates the distinct number of unique tail numbers in the flight-carriers
datasource.
SELECT HLL_SKETCH_ESTIMATE(DS_HLL("Tail_Number")) AS "estimate" FROM "flight-carriers"
Returns the following:
estimate |
---|
4685.8815405960595 |
Returns the distinct count estimate and error bounds from a HLL sketch. To specify the number of standard bound deviations, use numStdDev
.
HLL_SKETCH_ESTIMATE_WITH_ERROR_BOUNDS(expr, [numStdDev])
The following example estimates the number of unique tail numbers in the flight-carriers
datasource with error bounds at plus or minus one standard deviation.
SELECT HLL_SKETCH_ESTIMATE_WITH_ERROR_BOUNDS(DS_HLL("Tail_Number"), 1) AS "estimate_with_errors" FROM "flight-carriers"
Returns the following:
estimate_with_errors |
---|
[4685.8815405960595,4611.381540678335,4762.978259800803] |
Returns a human-readable string representation of a HLL sketch for debugging.
HLL_SKETCH_TO_STRING(expr)
The following example returns the HLL sketch on column Tail_Number
from the flight-carriers
datasource as a human-readable string.
SELECT HLL_SKETCH_TO_STRING( DS_HLL("Tail_Number") ) AS "summary" FROM "flight-carriers"
Returns the following:
### HLL SKETCH SUMMARY: Log Config K : 12 Hll Target : HLL_4 Current Mode : HLL Memory : false LB : 4611.381540678335 Estimate : 4685.8815405960595 UB : 4762.978259800803 OutOfOrder Flag: true CurMin : 0 NumAtCurMin : 1316 HipAccum : 0.0 KxQ0 : 2080.7755126953125 KxQ1 : 0.0 Rebuild KxQ Flg: false
Returns a union of HLL sketches. See DataSketches HLL Sketch module for a description of optional parameters.
HLL_SKETCH_UNION([lgK, tgtHllType], expr0, expr1, ...)
The following example estimates the union of the HLL sketch of tail numbers that took off from CA
and the HLL sketch of tail numbers that took off from TX
. The example uses the Tail_Number
and OriginState
columns from the flight-carriers
datasource.
SELECT HLL_SKETCH_ESTIMATE( HLL_SKETCH_UNION( DS_HLL("Tail_Number") FILTER(WHERE "OriginState" = 'CA'), DS_HLL("Tail_Number") FILTER(WHERE "OriginState" = 'TX') ) ) AS "estimate_union" FROM "flight-carriers"
Returns the following:
estimate_union |
---|
4204.798431046455 |
Converts an integer byte size into human-readable IEC format.
HUMAN_READABLE_BINARY_BYTE_FORMAT(value[, precision])
The following example converts 1000000
into IEC format.
SELECT HUMAN_READABLE_BINARY_BYTE_FORMAT(1000000, 2) AS "iec_format"
Returns the following:
iec_format |
---|
976.56 KiB |
Converts a byte size into human-readable SI format.
HUMAN_READABLE_DECIMAL_BYTE_FORMAT(value[, precision])
The following example converts 1000000
into SI format.
SELECT HUMAN_READABLE_DECIMAL_BYTE_FORMAT(1000000, 2) AS "si_format"
Returns the following:
si_format |
---|
1.00 MB |
Converts a byte size into human-readable SI format with single-character units.
HUMAN_READABLE_DECIMAL_FORMAT(value[, precision])
The following example converts 1000000
into single character SI format.
SELECT HUMAN_READABLE_DECIMAL_FORMAT(1000000, 2) AS "single_character_si_format"
Returns the following:
single_character_si_format |
---|
1.00 M |
Returns true if str
is a substring of expr
, case-insensitive. Otherwise, returns false.
ICONTAINS_STRING(expr, str)
The following example returns true if the OriginCityName
column from the flight-carriers
datasource contains the case-insensitive substring san
.
SELECT "OriginCityName" AS "origin_city", ICONTAINS_STRING("OriginCityName", 'san') AS "contains_case_insensitive_string" FROM "flight-carriers" LIMIT 2
Returns the following:
origin_city | contains_case_insensitive_string |
---|---|
San Juan, PR | true |
Boston, MA | false |
Returns true if the IPv4 address
belongs to the subnet
literal, otherwise returns false.
IPV4_MATCH(address, subnet)
The following example returns true if the IPv4 address in the forward_for
column from the kttm
datasource belongs to the subnet 181.13.41.0/24
.
SELECT "forwarded_for" AS "ipv4_address", IPV4_MATCH("forwarded_for", '181.13.41.0/24') AS "belongs_in_subnet" FROM "kttm" LIMIT 2
Returns the following:
ipv4_address | belongs_in_subnet |
---|---|
181.13.41.82 | true |
177.242.100.0 | false |
Parses an IPv4 address
into its integer notation.
IPV4_PARSE(address)
The following example returns an integer that represents the IPv4 address 5.5.5.5
.
SELECT '5.5.5.5' AS "ipv4_address", IPV4_PARSE('5.5.5.5') AS "integer"
Returns the following:
ipv4_address | integer |
---|---|
5.5.5.5 | 84215045 |
Converts an IPv4 address
in integer notation into dot-decimal notation.
IPV4_STRINGIFY(address)
The following example returns the integer 84215045
in IPv4 dot-decimal notation.
SELECT '84215045' AS "integer", IPV4_STRINGIFY(84215045) AS "dot_decimal_notation"
Returns the following:
integer | dot_decimal_notation |
---|---|
84215045 | 5.5.5.5 |
Returns true if the IPv6 address
belongs to the subnet
literal. Otherwise, returns false.
IPV6_MATCH(address, subnet)
The following example returns true because 75e9:efa4:29c6:85f6::232c
is in the subnet of 75e9:efa4:29c6:85f6::/64
.
SELECT '75e9:efa4:29c6:85f6::232c' AS "ipv6_address", IPV6_MATCH('75e9:efa4:29c6:85f6::232c', '75e9:efa4:29c6:85f6::/64') AS "belongs_in_subnet"
Returns the following:
ipv6_address | belongs_in_subnet |
---|---|
75e9:efa4:29c6:85f6::232c | true |
Returns an array of field names from an expression, at a specified path.
JSON_KEYS(expr, path)
The following example returns an array of field names from the nested column agent
:
SELECT JSON_KEYS(agent, '$.') AS agent_keys FROM "kttm_nested" LIMIT 1
Returns the following:
agent_keys |
---|
[type, category, browser, browser_version, os, platform] |
Merges two or more JSON STRING
or COMPLEX<json>
expressions into one, preserving the rightmost value when there are key overlaps. Returns NULL
if any argument is NULL
. The function always returns a COMPLEX<json>
object.
JSON_MERGE(expr1, expr2[, expr3 ...])
The following example merges the event
object with a static string example_string
:
SELECT event, JSON_MERGE(event, '{"example_string": 123}') as event_with_string FROM "kttm_nested" LIMIT 1
Returns the following:
event | event_with_string |
---|---|
{"type":"PercentClear","percentage":55} | {"type":"PercentClear","percentage":55,"example_string":123} |
Constructs a new COMPLEX<json>
object from one or more expressions. The KEY
expressions must evaluate to string types. The VALUE
expressions can be composed of any input type, including other COMPLEX<json>
objects. The function can accept colon-separated key-value pairs.
JSON_OBJECT(KEY expr1 VALUE expr2[, KEY expr3 VALUE expr4, ...])
JSON_OBJECT(expr1:expr2[, expr3:expr4, ...])
The following example creates a new object combinedJSON
from continent
in geo_ip
and type
in event
:
SELECT JSON_OBJECT( KEY 'geo_ip' VALUE JSON_QUERY(geo_ip, '$.continent'), KEY 'event' VALUE JSON_QUERY(event, '$.type') ) as combined_JSON FROM "kttm_nested" LIMIT 1
Returns the following:
combined_JSON |
---|
{"geo_ip": {"continent": "South America"},"event": {"type": "PercentClear"}} |
Returns an array of all paths which refer to primitive values in an expression, in JSONPath format.
JSON_PATHS(expr)
The following example returns an array of distinct paths in the geo_ip
nested column:
SELECT ARRAY_CONCAT_AGG(DISTINCT JSON_PATHS(geo_ip)) AS geo_ip_paths from "kttm_nested"
Returns the following:
geo_ip_paths |
---|
[$.city, $.continent, $.country, $.region] |
Extracts a COMPLEX<json>
value from an expression at a specified path.
JSON_QUERY(expr, path)
The following example returns the values of percentage
in the event
nested column:
SELECT "event", JSON_QUERY("event", '$.percentage') FROM "kttm_nested" LIMIT 2
Returns the following:
event | percentage |
---|---|
{"type":"PercentClear","percentage":55} | 55 |
{"type":"PercentClear","percentage":80} | 80 |
Extracts an ARRAY<COMPLEX<json>>
value from an expression at a specified path.
If the value isn't an array, the function translates it into a single element ARRAY
containing the value at path
. This function is mainly used to extract arrays of objects to use as inputs to other array functions.
JSON_QUERY_ARRAY(expr, path)
The following example returns an array of percentage
values in the event
nested column:
SELECT "event", JSON_QUERY_ARRAY("event", '$.percentage') FROM "kttm_nested" LIMIT 2
Returns the following:
event | percentage |
---|---|
{"type":"PercentClear","percentage":55} | [55] |
{"type":"PercentClear","percentage":80} | [80] |
Extracts a primitive value from an expression at a specified path.
If you include RETURNING
and specify a SQL type (such as VARCHAR
, BIGINT
, DOUBLE
) the function plans the query using the suggested type. If RETURNING
isn‘t included, the function attempts to infer the type based on the context. If the function can’t infer the type, it defaults to VARCHAR
. Primitive arrays can also be returned, but only if RETURNING
is specified as an ARRAY
type, e.g. RETURNING VARCHAR ARRAY
.
JSON_VALUE(expr, path [RETURNING sqlType])
The following example returns the value of city
in the geo_ip
nested column:
SELECT geo_ip, JSON_VALUE(geo_ip, '$.city' RETURNING VARCHAR) as city FROM "kttm_nested" WHERE JSON_VALUE(geo_ip, '$.continent') = 'Asia' LIMIT 2
Returns the following:
geo_ip | city |
---|---|
{"continent":"Asia","country":"Taiwan","region":"Taipei City","city":"Taipei"} | Taipei |
{"continent":"Asia","country":"Thailand","region":"Bangkok","city":"Bangkok"} | Bangkok |
If you do not supply an offset
, returns the value evaluated at the row preceding the current row. Specify an offset number n
to return the value evaluated at n
rows preceding the current one.
LAG(expr[, offset])
The following example returns the preceding airline in the window for flights by airline from two airports on a single day.
SELECT FLOOR("__time" TO DAY) AS "flight_day", "Origin" AS "airport", "Reporting_Airline" as "airline", COUNT("Flight_Number_Reporting_Airline") as "num_flights", LAG("Reporting_Airline") OVER (PARTITION BY "Origin" ORDER BY COUNT("Flight_Number_Reporting_Airline") DESC) AS "lag" FROM "flight-carriers" WHERE FLOOR("__time" TO DAY) = '2005-11-01' AND "Origin" IN ('KOA', 'LIH') GROUP BY 1, 2, 3
Returns the following:
flight_day | airport | airline | num_flights | lag |
---|---|---|---|---|
2005-11-01T00:00:00.000Z | KOA | HA | 11 | null |
2005-11-01T00:00:00.000Z | KOA | UA | 4 | HA |
2005-11-01T00:00:00.000Z | KOA | AA | 1 | UA |
2005-11-01T00:00:00.000Z | KOA | NW | 1 | AA |
2005-11-01T00:00:00.000Z | LIH | HA | 15 | null |
2005-11-01T00:00:00.000Z | LIH | AA | 2 | HA |
2005-11-01T00:00:00.000Z | LIH | UA | 2 | AA |
Returns the value evaluated for the expression for the last row within the window.
LAST_VALUE(expr)
The following example returns the last airline name in the window for flights for two airports on a single day. Note that the RANGE BETWEEN clause defines the window frame between the current row and the final row in the window instead of the default of RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW when using ORDER BY.
SELECT FLOOR("__time" TO DAY) AS "flight_day", "Origin" AS "airport", "Reporting_Airline" as "airline", COUNT("Flight_Number_Reporting_Airline") as "num_flights", LAST_VALUE("Reporting_Airline") OVER (PARTITION BY "Origin" ORDER BY COUNT("Flight_Number_Reporting_Airline") DESC RANGE BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) AS "last_value" FROM "flight-carriers" WHERE FLOOR("__time" TO DAY) = '2005-11-01' AND "Origin" IN ('KOA', 'LIH') GROUP BY 1, 2, 3
Returns the following:
flight_day | airport | airline | num_flights | last_value |
---|---|---|---|---|
2005-11-01T00:00:00.000Z | KOA | HA | 11 | NW |
2005-11-01T00:00:00.000Z | KOA | UA | 4 | NW |
2005-11-01T00:00:00.000Z | KOA | AA | 1 | NW |
2005-11-01T00:00:00.000Z | KOA | NW | 1 | NW |
2005-11-01T00:00:00.000Z | LIH | HA | 15 | UA |
2005-11-01T00:00:00.000Z | LIH | AA | 2 | UA |
2005-11-01T00:00:00.000Z | LIH | UA | 2 | UA |
Returns the value of a numeric or string expression corresponding to the latest __time
value.
LATEST(expr, [maxBytesPerValue])
The following example returns the origin airport code associated with the latest departing flight daily after '2005-01-01 00:00:00'
in flight-carriers
:
SELECT TIME_FLOOR(__time, 'P1D') AS "departure_day", LATEST("Origin") AS "origin" FROM "flight-carriers" WHERE __time >= TIMESTAMP '2005-01-01 00:00:00' GROUP BY 1 LIMIT 2
Returns the following:
departure_day | origin |
---|---|
2005-11-01T00:00:00.000Z | LAS |
2005-11-02T00:00:00.000Z | LAX |
Returns the value of a numeric or string expression corresponding to the latest time value from timestampExpr
.
LATEST_BY(expr, timestampExpr, [maxBytesPerValue])
The following example returns the destination airport code associated with the latest arriving flight daily after '2005-01-01 00:00:00'
in flight-carriers
:
SELECT TIME_FLOOR(TIME_PARSE("arrivalime"), 'P1D') AS "arrival_day", LATEST_BY("Dest", TIME_PARSE("arrivalime")) AS "dest" FROM "flight-carriers" WHERE TIME_PARSE("arrivalime") >= TIMESTAMP '2005-01-01 00:00:00' GROUP BY 1 LIMIT 2
Returns the following:
arrival_day | origin |
---|---|
2005-11-01T00:00:00.000Z | MCO |
2005-11-02T00:00:00.000Z | BUF |
If you do not supply an offset
, returns the value evaluated at the row following the current row. Specify an offset number n
to return the value evaluated at n
rows following the current one; if there is no such row, returns the given default value.
LEAD(expr[, offset])
The following example returns the subsequent value for an airline in the window for flights from two airports on a single day.
SELECT FLOOR("__time" TO DAY) AS "flight_day", "Origin" AS "airport", "Reporting_Airline" as "airline", COUNT("Flight_Number_Reporting_Airline") as "num_flights", LEAD("Reporting_Airline") OVER (PARTITION BY "Origin" ORDER BY COUNT("Flight_Number_Reporting_Airline") DESC) AS "lead" FROM "flight-carriers" WHERE FLOOR("__time" TO DAY) = '2005-11-01' AND "Origin" IN ('KOA', 'LIH') GROUP BY 1, 2, 3
Returns the following:
flight_day | airport | airline | num_flights | lead |
---|---|---|---|---|
2005-11-01T00:00:00.000Z | KOA | HA | 11 | UA |
2005-11-01T00:00:00.000Z | KOA | UA | 4 | AA |
2005-11-01T00:00:00.000Z | KOA | AA | 1 | NW |
2005-11-01T00:00:00.000Z | KOA | NW | 1 | null |
2005-11-01T00:00:00.000Z | LIH | HA | 15 | AA |
2005-11-01T00:00:00.000Z | LIH | AA | 2 | UA |
2005-11-01T00:00:00.000Z | LIH | UA | 2 | null |
Returns the minimum value from the provided expressions. For information on how Druid interprets the arguments passed into the function, see Reduction functions.
LEAST([expr1, ...])
The following example returns the minimum value between the strings apple
, orange
, and pear
. Druid interprets these arguments as STRING data type.
SELECT LEAST( 'apple', 'orange', 'pear') AS "least"
Returns the following:
least |
---|
apple |
Returns the N
leftmost characters of an expression, where N
is an integer value.
LEFT(expr, N)
The following example returns the 3
leftmost characters of the expression ABCDEFG
.
SELECT 'ABCDEFG' AS "expression", LEFT('ABCDEFG', 3) AS "leftmost_characters"
Returns the following:
expression | leftmost_characters |
---|---|
ABCDEFG | ABC |
Returns the length of the expression in UTF-16 code units.
LENGTH(expr)
The following example returns the character length of the OriginCityName
column from the flight-carriers
datasource.
SELECT "OriginCityName" AS "origin_city_name", LENGTH("OriginCityName") AS "city_name_length" FROM "flight-carriers" LIMIT 1
Returns the following:
origin_city_name | city_name_length |
---|---|
San Juan, PR | 12 |
Alias for STRING_AGG
.
LISTAGG([DISTINCT] expr, [separator, [size]])
Calculates the natural logarithm of the numeric expression.
LN(<NUMERIC>)
The following example applies the LN function to the max_temperature
column from the taxi-trips
datasource.
SELECT "max_temperature" AS "max_temperature", LN("max_temperature") AS "natural_log_max_temp" FROM "taxi-trips" LIMIT 1
Returns the following:
max_temperature | natural_log_max_temp |
---|---|
76 | 4.330733340286331 |
Calculates the base-10 logarithm of the numeric expression.
LOG10(<NUMERIC>)
The following example applies the LOG10 function to the max_temperature
column from the taxi-trips
datasource.
SELECT "max_temperature" AS "max_temperature", LOG10("max_temperature") AS "log10_max_temp" FROM "taxi-trips" LIMIT 1
Returns the following:
max_temperature | log10_max_temp |
---|---|
76 | 1.8808135922807914 |
Searches for expr
in a registered query-time lookup table named lookupName
and returns the mapped value. If expr
is null or not contained in the lookup, returns defaultValue
if supplied, otherwise returns null.
LOOKUP(expr, lookupName[, defaultValue])
The following example uses a map
type lookup table named code_to_name
, which contains the following key-value pairs:
{ "SJU": "Luis Munoz Marin International Airport", "IAD": "Dulles International Airport" }
The example uses code_to_name
to map the Origin
column from the flight-carriers
datasource to the corresponding full airport name. Returns key not found
if no matching key exists in the lookup table.
SELECT "Origin" AS "origin_airport", LOOKUP("Origin", 'code_to_name','key not found') AS "full_airport_name" FROM "flight-carriers" LIMIT 2
Returns the following:
origin_airport | full_airport_name |
---|---|
SJU | Luis Munoz Marin International Airport |
BOS | key not found |
Returns the expression in lowercase.
LOWER(expr)
The following example converts the OriginCityName
column from the flight-carriers
datasource to lowercase.
SELECT "OriginCityName" AS "origin_city", LOWER("OriginCityName") AS "lowercase" FROM "flight-carriers" LIMIT 1
Returns the following:
origin_city | lowercase |
---|---|
San Juan, PR | san juan, pr |
Returns a string of size length
from expr
. When the length of expr
is less than length
, left pads expr
with chars
, which defaults to the space character. Truncates expr
to length
if length
is shorter than the length of expr
.
LPAD(expr, length[, chars])
The following example left pads the value of OriginStateName
from the flight-carriers
datasource to return a total of 11 characters.
SELECT "OriginStateName" AS "origin_state", LPAD("OriginStateName", 11, '+') AS "add_left_padding" FROM "flight-carriers" LIMIT 3
Returns the following:
origin_state | add_left_padding |
---|---|
Puerto Rico | Puerto Rico |
Massachusetts | Massachuset |
Florida | ++++Florida |
Trims characters from the leading end of an expression. Defaults chars
to a space if none is provided.
LTRIM(expr[, chars])
The following example trims the _
characters from the leading end of the string expression.
SELECT '___abc___' AS "original_string", LTRIM('___abc___', '_') AS "trim_leading_end_of_expression"
Returns the following:
original_string | trim_leading_end_of_expression |
---|---|
___abc___ | abc___ |
Returns the maximum value of a set of values.
MAX(expr)
The following example calculates the maximum delay in minutes for an airline in flight-carriers
:
SELECT MAX("DepDelayMinutes") AS max_delay FROM "flight-carriers" WHERE "Reporting_Airline" = 'AA'
Returns the following:
max_delay |
---|
1210 |
Converts a number of milliseconds since epoch into a timestamp.
MILLIS_TO_TIMESTAMP(millis_expr)
The following example converts 1375344877000 milliseconds from epoch into a timestamp.
SELECT MILLIS_TO_TIMESTAMP(1375344877000) AS "timestamp"
Returns the following:
timestamp |
---|
2013-08-01T08:14:37.000Z |
Returns the minimum value of a set of values.
MIN(expr)
The following example calculates the minimum delay in minutes for an airline in flight-carriers
:
SELECT MIN("DepDelayMinutes") AS min_delay FROM "flight-carriers" WHERE "Reporting_Airline" = 'AA'
Returns the following:
min_delay |
---|
0 |
Calculates x modulo y, or the remainder of x divided by y. Where x and y are numeric expressions.
MOD(x, y)
The following calculates 78 MOD 10.
SELECT MOD(78, 10) as "modulo"
Returns the following:
modulo |
---|
8 |
Adds the expression to the end of the array.
MV_APPEND(arr1, expr)
The following example appends the string label
to the multi-value string tags
from mvd-example
:
SELECT MV_APPEND("tags", "label") AS append FROM "mvd-example" LIMIT 1
Returns the following:
append |
---|
["t1","t2","t3","row1"] |
Concatenates two arrays.
MV_CONCAT(arr1, arr2)
The following example concatenates tags
from mvd-example
to itself:
SELECT MV_CONCAT("tags", "tags") AS cat FROM "mvd-example" LIMIT 1
Returns the following:
cat |
---|
["t1","t2","t3","t1","t2","t3"] |
Returns true if the expression is in the array, false otherwise.
MV_CONTAINS(arr, expr)
The following example checks if the string t3
exists within tags
from mvd-example
:
SELECT "tags", MV_CONTAINS("tags", 't3') AS contained FROM "mvd-example"
Returns the following:
tags | contained |
---|---|
["t1","t2","t3"] | true |
["t3","t4","t5"] | true |
["t5","t6","t7"] | false |
null | false |
Filters a multi-value expression to exclude values from an array.
MV_FILTER_NONE(expr, arr)
The following example filters tags
from mvd-example
to remove values t1
or t3
, if present:
SELECT MV_FILTER_NONE("tags", ARRAY['t1', 't3']) AS exclude FROM "mvd-example" LIMIT 3
Returns the following:
exclude |
---|
t2 |
["t4", "t5"] |
["t5","t6","t7"] |
Filters a multi-value expression to include only values contained in the array.
MV_FILTER_ONLY(expr, arr)
The following example filters tags
from mvd-example
to only contain the values t1
or t3
:
SELECT MV_FILTER_ONLY("tags", ARRAY['t1', 't3']) AS filt FROM "mvd-example" LIMIT 3
Returns the following:
filt |
---|
["t1","t3"] |
t3 |
null |
Filters a multi-value expression to include only values matching the specified regular expression pattern.
MV_FILTER_REGEX(expr, pattern)
The following example filters the tags
multi-value string from the mvd-example
datasource to include only values starting with the letter t
:
SELECT MV_FILTER_REGEX("tags", '^t.*') AS regex_filtered FROM "mvd-example" LIMIT 4
Returns the following:
regex_filtered |
---|
["t1","t2","t3"] |
["t3","t4","t5"] |
["t5","t6","t7"] |
[] |
Filters multi-value expr
to include values that match pattern
.
Filters a multi-value expression to include only values that start with the specified prefix.
MV_FILTER_PREFIX(expr, prefix)
The following example filters the tags
multi-value string from the mvd-example
datasource to include only values starting with t3
:
SELECT MV_FILTER_PREFIX("tags", 't3') AS prefix_filtered FROM "mvd-example" LIMIT 4
Returns the following:
prefix_filtered |
---|
[] |
["t3"] |
[] |
[] |
Filters multi-value expr
to include values that have prefix prefix
.
Returns the length of an array expression.
MV_LENGTH(arr)
The following example returns the length of the tags
multi-value strings from mvd-example
:
SELECT MV_LENGTH("tags") AS len FROM "mvd-example" LIMIT 1
Returns the following:
len |
---|
3 |
Returns the array element at the given zero-based index.
MV_OFFSET(arr, long)
The following example returns tags
and the element at the third position of tags
in mvd-example
:
SELECT "tags", MV_OFFSET("tags", 2) AS elem FROM "mvd-example"
Returns the following:
tags | elem |
---|---|
["t1","t2","t3"] | t3 |
["t3","t4","t5"] | t5 |
["t5","t6","t7"] | t7 |
null | null |
Returns the zero-based index of the first occurrence of a given expression in the array.
MV_OFFSET_OF(arr, expr)
The following example returns tags
and the zero-based index of the string t3
from tags
in mvd-example
:
SELECT "tags", MV_OFFSET_OF("tags", 't3') AS index FROM "mvd-example"
Returns the following:
tags | index |
---|---|
["t1","t2","t3"] | 2 |
["t3","t4","t5"] | 0 |
["t5","t6","t7"] | null |
null | null |
Returns the array element at the given one-based index.
MV_ORDINAL(arr, long)
The following example returns tags
and the element at the third position of tags
in mvd-example
:
SELECT "tags", MV_ORDINAL("tags", 3) AS elem FROM "mvd-example"
Returns the following:
tags | elem |
---|---|
["t1","t2","t3"] | t3 |
["t3","t4","t5"] | t5 |
["t5","t6","t7"] | t7 |
null | null |
Returns the one-based index of the first occurrence of a given expression.
MV_ORDINAL_OF(arr, expr)
The following example returns tags
and the one-based index of the string t3
from tags
in mvd-example
:
SELECT "tags", MV_ORDINAL_OF("tags", 't3') AS index FROM "mvd-example"
Returns the following:
tags | index |
---|---|
["t1","t2","t3"] | 3 |
["t3","t4","t5"] | 1 |
["t5","t6","t7"] | null |
null | null |
Returns true if the two arrays have any elements in common, false otherwise.
MV_OVERLAP(arr1, arr2)
The following example identifies rows that contain t1
or t3
in tags
from mvd-example
:
SELECT "tags", MV_OVERLAP("tags", ARRAY['t1', 't3']) AS overlap FROM "mvd_example"
Returns the following:
tags | overlap |
---|---|
["t1","t2","t3"] | true |
["t3","t4","t5"] | true |
["t5","t6","t7"] | false |
null | false |
Adds the expression to the beginning of the array.
MV_PREPEND(expr, arr)
The following example prepends the string dimension label
to the multi-value string dimension tags
from mvd-example
:
SELECT MV_PREPEND("label", "tags") AS prepend FROM "mvd-example" LIMIT 1
Returns the following:
prepend |
---|
["row1","t1","t2","t3"] |
Returns a slice of the array from the zero-based start and end indexes.
MV_SLICE(arr, start, end)
The following example returns tags
and the second and third values of tags
from mvd-example
:
SELECT "tags", MV_SLICE(tags, 1, 3) AS slice FROM "mvd-example"
Returns the following:
tags | slice |
---|---|
["t1"","t2","t3"] | ["t2","t3"] |
["t3"","t4","t5"] | ["t4","t5"] |
["t5"","t6","t7"] | ["t6","t7"] |
null | null |
Converts a multi-value string from a VARCHAR
to a VARCHAR ARRAY
.
MV_TO_ARRAY(str)
The following example transforms the tags
column from mvd-example
to arrays:
SELECT MV_TO_ARRAY(tags) AS arr FROM "mvd-example" LIMIT 1
Returns the following:
arr |
---|
[t1, t2, t3] |
Joins all elements of the array together by the given delimiter.
MV_TO_STRING(arr, str)
The following example transforms the tags
column from mvd-example
to strings delimited by a space character:
SELECT MV_TO_STRING("tags", ' ') AS str FROM mvd-example LIMIT 1
Returns the following:
str |
---|
t1 t2 t3 |
Divides the rows within a window as evenly as possible into the number of tiles, also called buckets, and returns the value of the tile that the row falls into.
NTILE(tiles)
The following example returns the results for flights by airline from two airports on a single day divided into 3 tiles.
SELECT FLOOR("__time" TO DAY) AS "flight_day", "Origin" AS "airport", "Reporting_Airline" as "airline", COUNT("Flight_Number_Reporting_Airline") as "num_flights", NTILE(3) OVER (PARTITION BY "Origin" ORDER BY COUNT("Flight_Number_Reporting_Airline") DESC) AS "ntile" FROM "flight-carriers" WHERE FLOOR("__time" TO DAY) = '2005-11-01' AND "Origin" IN ('KOA', 'LIH') GROUP BY 1, 2, 3
Returns the following:
flight_day | airport | airline | lead | ntile |
---|---|---|---|---|
2005-11-01T00:00:00.000Z | KOA | HA | 11 | 1 |
2005-11-01T00:00:00.000Z | KOA | UA | 4 | 1 |
2005-11-01T00:00:00.000Z | KOA | AA | 1 | 2 |
2005-11-01T00:00:00.000Z | KOA | NW | 1 | 3 |
2005-11-01T00:00:00.000Z | LIH | HA | 15 | 1 |
2005-11-01T00:00:00.000Z | LIH | AA | 2 | 2 |
2005-11-01T00:00:00.000Z | LIH | UA | 2 | 3 |
Returns null if two values are equal, else returns the first value.
NULLIF(value1, value2)
The following example returns null if the OriginState
column from the flight-carriers
datasource is PR
.
SELECT "OriginState" AS "origin_state", NULLIF("OriginState", 'PR') AS "remove_pr" FROM "flight-carriers" LIMIT 2
Returns the following:
origin_state | remove_pr |
---|---|
PR | null |
MA | MA |
Returns value1
if value1
is not null, otherwise returns value2
.
NVL(value1, value1)
The following example replaces each null value in the Tail_Number
column of the flight-carriers
datasource with the string “No tail number.”
SELECT "Tail_Number" AS "original_column", NVL("Tail_Number", 'No tail number') AS "remove_null" FROM "flight-carriers" WHERE "OriginState" = 'CT' LIMIT 2
Returns the following:
original_column | remove_null |
---|---|
N951DL | N951DL |
null | No tail number |
Parses an expression into a COMPLEX<json>
object.
The function deserializes JSON values when processing them, translating stringified JSON into a nested structure. If the input is invalid JSON or not a VARCHAR
, it returns an error.
PARSE_JSON(expr)
The following example creates a COMPLEX<json>
object gus
from a string of fields:
SELECT PARSE_JSON('{"name":"Gus","email":"gus_cat@example.com","type":"Pet"}') as gus
Returns the following:
gus |
---|
{"name":"Gus","email":"gus_cat@example.com","type":"Pet"} |
Converts a string into a long(BIGINT) with the given radix, or into DECIMAL(base 10) if a radix is not provided.
PARSE_LONG(string[, radix])
The following example converts the string representation of the binary, radix 2, number 1100
into its long (BIGINT) equivalent.
SELECT '1100' AS "binary_as_string", PARSE_LONG('1110', 2) AS "bigint_value"
Returns the following:
binary_as_string | bigint_value |
---|---|
1100 | 14 |
Returns the relative rank of the row calculated as a percentage according to the formula: RANK() OVER (window) / COUNT(1) OVER (window)
.
PERCENT_RANK()
The following example returns the percent rank within the window for flights by airline from two airports on a single day.
SELECT FLOOR("__time" TO DAY) AS "flight_day", "Origin" AS "airport", "Reporting_Airline" as "airline", COUNT("Flight_Number_Reporting_Airline") as "num_flights", PERCENT_RANK() OVER (PARTITION BY "Origin" ORDER BY COUNT("Flight_Number_Reporting_Airline") DESC) AS "pct_rank" FROM "flight-carriers" WHERE FLOOR("__time" TO DAY) = '2005-11-01' AND "Origin" IN ('KOA', 'LIH') GROUP BY 1, 2, 3
Returns the following:
flight_day | airport | airline | num_flights | pct_rank |
---|---|---|---|---|
2005-11-01T00:00:00.000Z | KOA | HA | 11 | 0 |
2005-11-01T00:00:00.000Z | KOA | UA | 4 | 0.3333333333333333 |
2005-11-01T00:00:00.000Z | KOA | AA | 1 | 0.6666666666666666 |
2005-11-01T00:00:00.000Z | KOA | NW | 1 | 0.6666666666666666 |
2005-11-01T00:00:00.000Z | LIH | HA | 15 | 0 |
2005-11-01T00:00:00.000Z | LIH | AA | 2 | 0.5 |
2005-11-01T00:00:00.000Z | LIH | UA | 2 | 0.5 |
Returns the one-based index position of a substring within an expression, optionally starting from a given one-based index. If substring
is not found, returns 0.
POSITION(substring IN expr [FROM startingIndex])
The following example returns the one-based index of the substring PR
in the OriginCityName
column from the flight-carriers
datasource starting from index 5.
SELECT "OriginCityName" AS "origin_city", POSITION('PR' IN "OriginCityName" FROM 5) AS "index" FROM "flight-carriers" LIMIT 2
Returns the following:
origin_city | index |
---|---|
San Juan, PR | 11 |
Boston, MA | 0 |
Calculates a numerical expression raised to the specified power.
POWER(base, exponent)
The following example raises 5 to the power of 2.
SELECT POWER(5, 2) AS "power"
Returns the following:
power |
---|
25 |
Converts an angle from degrees to radians.
RADIANS(expr)
The following example converts an angle of 180
degrees to radians
SELECT RADIANS(180) AS "radians"
Returns the following:
radians |
---|
3.141592653589793 |
Returns the rank with gaps for a row within a window. For example, if two rows tie for rank 1, the next rank is 3.
RANK()
The following example returns the rank within the window for flights by airline from two airports on a single day.
SELECT FLOOR("__time" TO DAY) AS "flight_day", "Origin" AS "airport", "Reporting_Airline" as "airline", COUNT("Flight_Number_Reporting_Airline") as "num_flights", RANK() OVER (PARTITION BY "Origin" ORDER BY COUNT("Flight_Number_Reporting_Airline") DESC) AS "rank" FROM "flight-carriers" WHERE FLOOR("__time" TO DAY) = '2005-11-01' AND "Origin" IN ('KOA', 'LIH') GROUP BY 1, 2, 3
Returns the following:
flight_day | airport | airline | num_flights | rank |
---|---|---|---|---|
2005-11-01T00:00:00.000Z | KOA | HA | 11 | 1 |
2005-11-01T00:00:00.000Z | KOA | UA | 4 | 2 |
2005-11-01T00:00:00.000Z | KOA | AA | 1 | 3 |
2005-11-01T00:00:00.000Z | KOA | NW | 1 | 3 |
2005-11-01T00:00:00.000Z | LIH | HA | 15 | 1 |
2005-11-01T00:00:00.000Z | LIH | AA | 2 | 2 |
2005-11-01T00:00:00.000Z | LIH | UA | 2 | 3 |
Apply regular expression pattern
to expr
and extract the Nth capture group. If N
is unspecified or zero, returns the first substring that matches the pattern. Returns null if there is no matching pattern.
REGEXP_EXTRACT(expr, pattern[, N])
The following example uses regular expressions to find city names inside the OriginCityName
column from the flight-carriers
datasource by matching what comes before the comma.
SELECT "OriginCityName" AS "origin_city", REGEXP_EXTRACT("OriginCityName", '([^,]+)', 0) AS "pattern_match" FROM "flight-carriers" LIMIT 1
Returns the following:
origin_city | pattern_match |
---|---|
San Juan, PR | San Juan |
Returns true
if the regular expression pattern
finds a match in expr
. Returns false
otherwise.
REGEXP_LIKE(expr, pattern)
The following example returns true
when the OriginCityName
column from flight-carriers
has a city name containing a space.
SELECT "OriginCityName" AS "origin_city", REGEXP_LIKE("OriginCityName", '[A-Za-z]+\s[A-Za-z]+') AS "pattern_found" FROM "flight-carriers" LIMIT 2
Returns the following:
origin_city | pattern_found |
---|---|
San Juan, PR | true |
Boston, MA | false |
Replaces all occurrences of a regular expression in a string expression with a replacement string. Refer to capture groups in the replacement string using $group
syntax. For example: $1
or $2
.
REGEXP_REPLACE(expr, pattern, replacement)
The following example matches three consecutive words, where each word is its own capture group, and replaces the matched words with the word in the second capture group punctuated with exclamation marks.
SELECT 'foo bar baz' AS "original_string", REGEXP_REPLACE('foo bar baz', '([A-Za-z]+) ([A-Za-z]+) ([A-Za-z]+)' , '$2!') AS "modified_string"
Returns the following:
original_string | modified_string |
---|---|
foo bar baz | bar! |
Repeats the string expression N
times, where N
is an integer.
REPEAT(expr, N)
The following example returns the string expression abc
repeated 3
times.
SELECT 'abc' AS "original_string", REPEAT('abc', 3) AS "with_repetition"
Returns the following:
original_string | with_repetition |
---|---|
abc | abcabcabc |
Replaces instances of a substring with a replacement string in the given expression.
REPLACE(expr, substring, replacement)
The following example replaces instances of the substring abc
with XYZ
.
SELECT 'abc 123 abc 123' AS "original_string", REPLACE('abc 123 abc 123', 'abc', 'XYZ') AS "modified_string"
Returns the following:
original_string | modified_string |
---|---|
abc 123 abc 123 | XYZ 123 XYZ 123 |
Reverses the given expression.
REVERSE(expr)
The following example reverses the string expression abc
.
SELECT 'abc' AS "original_string", REVERSE('abc') AS "reversal"
Returns the following:
original_string | reversal |
---|---|
abc | cba |
Returns the N
rightmost characters of an expression, where N
is an integer value.
RIGHT(expr, N)
The following example returns the 3
rightmost characters of the expression ABCDEFG
.
SELECT 'ABCDEFG' AS "expression", RIGHT('ABCDEFG', 3) AS "rightmost_characters"
Returns the following:
expression | rightmost_characters |
---|---|
ABCDEFG | EFG |
Calculates the rounded value for a numerical expression.
ROUND(expr[, digits])
The following applies the ROUND function to 0 decimal points on the pickup_longitude
column from the taxi-trips
datasource.
SELECT "pickup_longitude" AS "pickup_longitude", ROUND("pickup_longitude", 0) as "rounded_pickup_longitude" FROM "taxi-trips" WHERE "pickup_longitude" IS NOT NULL LIMIT 1
Returns the following:
pickup_longitude | rounded_pickup_longitude |
---|---|
-73.9377670288086 | -74 |
Returns the number of the row within the window starting from 1.
ROW_NUMBER()
The following example returns the row number within the window for flights by airline from two airports on a single day.
SELECT FLOOR("__time" TO DAY) AS "flight_day", "Origin" AS "airport", "Reporting_Airline" as "airline", COUNT("Flight_Number_Reporting_Airline") as "num_flights", ROW_NUMBER() OVER (PARTITION BY "Origin" ORDER BY COUNT("Flight_Number_Reporting_Airline") DESC) AS "row_num" FROM "flight-carriers" WHERE FLOOR("__time" TO DAY) = '2005-11-01' AND "Origin" IN ('KOA', 'LIH') GROUP BY 1, 2, 3
Returns the following:
flight_day | airport | airline | num_flights | row_num |
---|---|---|---|---|
2005-11-01T00:00:00.000Z | KOA | HA | 11 | 1 |
2005-11-01T00:00:00.000Z | KOA | UA | 4 | 2 |
2005-11-01T00:00:00.000Z | KOA | AA | 1 | 3 |
2005-11-01T00:00:00.000Z | KOA | NW | 1 | 4 |
2005-11-01T00:00:00.000Z | LIH | HA | 15 | 1 |
2005-11-01T00:00:00.000Z | LIH | AA | 2 | 2 |
2005-11-01T00:00:00.000Z | LIH | UA | 2 | 3 |
Returns a string of size length
from expr
. When the length of expr
is less than length
, right pads expr
with chars
, which defaults to the space character. Truncates expr
to length
if length
is shorter than the length of expr
.
RPAD(expr, length[, chars])
The following example right pads the value of OriginStateName
from the flight-carriers
datasource to return a total of 11 characters.
SELECT "OriginStateName" AS "origin_state", RPAD("OriginStateName", 11, '+') AS "add_right_padding" FROM "flight-carriers" LIMIT 3
Returns the following:
origin_state | add_right_padding |
---|---|
Puerto Rico | Puerto Rico |
Massachusetts | Massachuset |
Florida | Florida++++ |
Trims characters from the trailing end of an expression. Defaults chars
to a space if none is provided.
RTRIM(expr[, chars])
The following example trims the _
characters from the trailing end of the string expression.
SELECT '___abc___' AS "original_string", RTRIM('___abc___', '_') AS "trim_end"
Returns the following:
original_string | trim_end |
---|---|
___abc___ | ___abc |
Returns x
divided by y
, guarded on division by 0.
SAFE_DIVIDE(x, y)
The following example calculates divisions of integer 78
by integer 10
.
SELECT SAFE_DIVIDE(78, 10) AS "safe_division"
Returns the following:
safe_division |
---|
7 |
Calculates the trigonometric sine of an angle expressed in radians.
SIN(expr)
The following example calculates the sine of angle PI/3
radians.
SELECT SIN(PI / 3) AS "sine"
Returns the following:
sine |
---|
0.8660254037844386 |
Calculates the square root of a numeric expression.
SQRT(<NUMERIC>)
The following example calculates the square root of 25.
SELECT SQRT(25) AS "square_root"
Returns the following:
square_root |
---|
5 |
Alias for STDDEV_SAMP
.
Requires the druid-stats
extension.
STDDEV(expr)
Calculates the population standard deviation of a set of values.
Requires the druid-stats
extension.
STDDEV_POP(expr)
The following example calculates the population standard deviation for minutes of delay for an airline in flight-carriers
:
SELECT STDDEV_POP("DepDelayMinutes") AS sd_delay FROM "flight-carriers" WHERE "Reporting_Airline" = 'AA'
Returns the following:
sd_delay |
---|
27.083557 |
Calculates the sample standard deviation of a set of values.
Requires the druid-stats
extension.
STDDEV_SAMP(expr)
The following example calculates the sample standard deviation for minutes of delay for an airline in flight-carriers
:
SELECT STDDEV_SAMP("DepDelayMinutes") AS sd_delay FROM "flight-carriers" WHERE "Reporting_Airline" = 'AA'
Returns the following:
sd_delay |
---|
27.083811 |
Collects all values of an expression into a single string.
STRING_AGG(expr, separator, [size])
The following example returns all the distinct airlines from flight-carriers
as a single space-delimited string:
SELECT STRING_AGG(DISTINCT "Reporting_Airline", ' ') AS "AllCarriers" FROM "flight-carriers"
Returns the following:
AllCarriers |
---|
AA AS B6 CO DH DL EV F9 FL HA HP MQ NW OH OO TZ UA US WN XE |
Returns a string formatted in the manner of Java's String.format.
STRING_FORMAT(pattern[, args...])
The following example uses Java String format to pass in Flight_Number_Reporting_Airline
and origin_airport
columns, from the flight-carriers
datasource, as arguments into the string.
SELECT "Flight_Number_Reporting_Airline" AS "flight_number", "Origin" AS "origin_airport", STRING_FORMAT('Flight No.%d departing from %s', "Flight_Number_Reporting_Airline", "Origin") AS "departure_announcement" FROM "flight-carriers" LIMIT 1
Returns the following:
flight_number | origin_airport | departure_announcement |
---|---|---|
314 | SJU | Flight No.314 departing from SJU |
Splits the string into an array of substrings using the specified delimiter. The delimiter must be a valid regular expression.
STRING_TO_ARRAY(string, delimiter)
Splits str1
into an multi-value string on the delimiter specified by str2
, which is a regular expression.
STRING_TO_MV(str1, str2)
The following example splits a street address by whitespace characters:
SELECT STRING_TO_MV('123 Rose Lane', '\s+') AS mv
Returns the following:
mv |
---|
["123","Rose","Lane"] |
Alias for LENGTH
.
STRLEN(expr)
Returns the one-based index position of a substring within an expression. If substring
is not found, returns 0.
STRPOS(expr, substring)
The following example returns the one-based index position of World
.
SELECT 'Hello World!' AS "original_string", STRPOS('Hello World!', 'World') AS "index"
Returns the following:
original_string | index |
---|---|
Hello World! | 7 |
Alias for SUBSTRING
.
SUBSTR(expr, index[, length])
Returns a substring of the expression starting at a given one-based index. If length
is omitted, extracts characters to the end of the string, otherwise returns a substring of length
characters.
SUBSTRING(expr, index[, length])
The following example extracts a substring from the string expression abcdefghi
of length 3
starting at index 4
SELECT 'abcdefghi' AS "original_string", SUBSTRING('abcdefghi', 4, 3) AS "substring"
Returns the following:
original_string | substring |
---|---|
abcdefghi | def |
Calculates the sum of a set of values.
SUM(expr)
The following example calculates the total minutes of delay for an airline in flight-carriers
:
SELECT SUM("DepDelayMinutes") AS tot_delay FROM "flight-carriers" WHERE "Reporting_Airline" = 'AA'
Returns the following:
tot_delay |
---|
475735 |
Calculates the trigonometric tangent of an angle expressed in radians.
TAN(expr)
The following example calculates the tangent of angle PI/3
radians.
SELECT TAN(PI / 3) AS "tangent"
Returns the following:
tangent |
---|
1.7320508075688767 |
Concatenates two string expressions.
TEXTCAT(expr, expr)
The following example concatenates the OriginState
column from the flight-carriers
datasource to , USA
.
SELECT "OriginState" AS "origin_state", TEXTCAT("OriginState", ', USA') AS "concatenate_state_with_USA" FROM "flight-carriers" LIMIT 1
Returns the following:
origin_state | concatenate_state_with_USA |
---|---|
PR | PR, USA |
Returns the distinct count estimate from a Theta sketch. The expr
argument must return a Theta sketch.
THETA_SKETCH_ESTIMATE(expr)
The following example estimates the distinct number of tail numbers in the Tail_Number
column of the flight-carriers
datasource.
SELECT THETA_SKETCH_ESTIMATE( DS_THETA("Tail_Number") ) AS "estimate" FROM "flight-carriers"
Returns the following:
estimate |
---|
4667 |
Returns the distinct count estimate and error bounds from a Theta sketch. The expr
argument must return a Theta sketch. Use errorBoundsStdDev
to specify the number of standard error bound deviations.
THETA_SKETCH_ESTIMATE_WITH_ERROR_BOUNDS(expr, errorBoundsStdDev)
The following example estimates the number of distinct tail numbers in the Tail_Number
column of the flight-carriers
datasource with error bounds at plus or minus one standard deviation.
SELECT THETA_SKETCH_ESTIMATE_WITH_ERROR_BOUNDS(DS_THETA("Tail_Number", 4096), 1) AS "estimate_with_error" FROM "flight-carriers"
Returns the following:
estimate_with_error |
---|
{"estimate":4691.201541339628,"highBound":4718.4577807143205,"lowBound":4664.093801991001,"numStdDev":1} |
Returns an intersection of Theta sketches. Each input expression must return a Theta sketch. See DataSketches Theta Sketch module for a description of optional parameters.
THETA_SKETCH_INTERSECT([size], expr0, expr1, ...)
The following example estimates the intersection of distinct tail numbers in the flight-carriers
datasource for flights originating in CA, TX, and NY.
SELECT THETA_SKETCH_ESTIMATE( THETA_SKETCH_INTERSECT( DS_THETA("Tail_Number") FILTER(WHERE "OriginState" = 'CA'), DS_THETA("Tail_Number") FILTER(WHERE "OriginState" = 'TX'), DS_THETA("Tail_Number") FILTER(WHERE "OriginState" = 'NY') ) ) AS "estimate_intersection" FROM "flight-carriers"
Returns the following:
estimate_intersection |
---|
1701 |
Returns a set difference of Theta sketches. Each input expression must return a Theta sketch. See DataSketches Theta Sketch module for a description of optional parameters.
THETA_SKETCH_NOT([size], expr0, expr1, ...)
The following example estimates the number of distinct tail numbers in the flight-carriers
datasource for flights not originating in CA, TX, or NY.
SELECT THETA_SKETCH_ESTIMATE( THETA_SKETCH_NOT( DS_THETA("Tail_Number"), DS_THETA("Tail_Number") FILTER(WHERE "OriginState" = 'CA'), DS_THETA("Tail_Number") FILTER(WHERE "OriginState" = 'TX'), DS_THETA("Tail_Number") FILTER(WHERE "OriginState" = 'NY') ) ) AS "estimate_not" FROM "flight-carriers"
Returns the following:
estimate_not |
---|
145 |
Returns a union of Theta sketches. Each input expression must return a Theta sketch. See DataSketches Theta Sketch module for a description of optional parameters.
THETA_SKETCH_UNION([size], expr0, expr1, ...)
The following example estimates the number of distinct tail numbers that depart from CA, TX, or NY.
SELECT THETA_SKETCH_ESTIMATE( THETA_SKETCH_UNION( DS_THETA("Tail_Number") FILTER(WHERE "OriginState" = 'CA'), DS_THETA("Tail_Number") FILTER(WHERE "OriginState" = 'TX'), DS_THETA("Tail_Number") FILTER(WHERE "OriginState" = 'NY') ) ) AS "estimate_union" FROM "flight-carriers"
Returns the following:
estimate_union |
---|
4522 |
Rounds up a timestamp to a given ISO 8601 time period. You can specify origin
to provide a reference timestamp from which to start rounding. If provided, timezone
should be a time zone name like America/Los_Angeles
or an offset like -08:00
.
TIME_CEIL(timestamp_expr, period[, origin[, timezone]])
The following example rounds up the __time
column from the taxi-trips
datasource to the nearest 45th minute in reference to the timestamp 2013-08-01 08:0:00
.
SELECT "__time" AS "original_timestamp", TIME_CEIL("__time", 'PT45M', TIMESTAMP '2013-08-01 08:0:00') AS "time_ceiling" FROM "taxi-trips" LIMIT 2
Returns the following:
original_timestamp | time_ceiling |
---|---|
2013-08-01T08:14:37.000Z | 2013-08-01T08:45:00.000Z |
2013-08-01T09:13:00.000Z | 2013-08-01T09:30:00.000Z |
Extracts the value of unit
from the timestamp and returns it as a number. If provided, timezone
should be a time zone name like America/Los_Angeles
or an offset like -08:00
.
TIME_EXTRACT(timestamp_expr[, unit[, timezone]])
The following example extracts the hour from the __time
column in the taxi-trips
datasource and offsets its timezone by -04:00
hours.
SELECT "__time" AS "original_timestamp", TIME_EXTRACT("__time", 'hour', '-04:00') AS "extract_hour" FROM "taxi-trips" LIMIT 2
Returns the following:
original_timestamp | extract_hour |
---|---|
2013-08-01T08:14:37.000Z | 4 |
2013-08-01T09:13:00.000Z | 5 |
Rounds down a timestamp to a given ISO 8601 time period. You can specify origin
to provide a reference timestamp from which to start rounding. If provided, timezone
should be a time zone name like America/Los_Angeles
or an offset like -08:00
.
TIME_FLOOR(timestamp_expr, period[, origin[, timezone]])
The following example rounds down the __time
column from the taxi-trips
datasource to the nearest 45th minute in reference to the timestamp 2013-08-01 08:0:00
.
SELECT "__time" AS "original_timestamp", TIME_FLOOR("__time", 'PT45M', TIMESTAMP '2013-08-01 08:0:00') AS "time_floor" FROM "taxi-trips" LIMIT 2
Returns the following:
original_timestamp | time_floor |
---|---|
2013-08-01T08:14:37.000Z | 2013-08-01T08:00:00.000Z |
2013-08-01T09:13:00.000Z | 2013-08-01T08:45:00.000Z |
Formats a timestamp as a string in a provided Joda DateTimeFormat pattern. If no pattern is provided, pattern
defaults to ISO 8601. If provided, timezone
should be a time zone name like America/Los_Angeles
or an offset like -08:00
.
TIME_FORMAT(timestamp_expr[, pattern[, timezone]])
The following example formats the __time
column from the flight-carriers
datasource into a string format and offsets the result's timezone by -05:00
hours.
SELECT "__time" AS "original_time", TIME_FORMAT( "__time", 'dd-MM-YYYY hh:mm aa zzz', '-05:00') AS "string" FROM "taxi-trips" LIMIT 1
Returns the following:
original_time | string |
---|---|
2013-08-01T08:14:37.000Z | 01-08-2013 03:14 AM -05:00 |
Returns true if a timestamp is contained within a particular interval. Intervals must be formatted as a string literal containing any ISO 8601 interval. The start instant of an interval is inclusive, and the end instant is exclusive.
TIME_IN_INTERVAL(timestamp_expr, interval)
The following example returns true when a timestamp in the __time
column of the taxi-trips
datasource is within a hour interval starting from 2013-08-01T08:00:00
.
SELECT "__time" AS "original_time", TIME_IN_INTERVAL("__time", '2013-08-01T08:00:00/PT1H') AS "in_interval" FROM "taxi-trips" LIMIT 2
Returns the following:
original_time | in_interval |
---|---|
2013-08-01T08:14:37.000Z | true |
2013-08-01T09:13:00.000Z | false |
Parses a string into a timestamp using a given Joda DateTimeFormat pattern. If no pattern is provided, pattern
defaults to ISO 8601. Returns NULL if string cannot be parsed. If provided, timezone
should be a time zone name like America/Los_Angeles
or an offset like -08:00
.
TIME_PARSE(string_expr[, pattern[, timezone]])
The following example parses the FlightDate
STRING column from the flight-carriers
datasource into a valid timestamp with an offset of -05:00
hours.
SELECT "FlightDate" AS "original_string", TIME_PARSE("FlightDate", 'YYYY-MM-dd', '-05:00') AS "timestamp" FROM "flight-carriers" LIMIT 1
Returns the following:
original_string | timestamp |
---|---|
2005-11-01 | 2005-11-01T05:00:00.000Z |
Shifts a timestamp by a given number of time units. The period
parameter can be any ISO 8601 period. The step
parameter can be negative. If provided, timezone
should be a time zone name like America/Los_Angeles
or an offset like -08:00
.
TIME_SHIFT(timestamp_expr, period, step[, timezone])
The following example shifts the __time
column from the taxi-trips
datasource back by 24 hours.
SELECT "__time" AS "original_timestamp", TIME_SHIFT("__time", 'PT1H', -24) AS "shift_back" FROM "taxi-trips" LIMIT 1
Returns the following:
original_timestamp | shift_back |
---|---|
2013-08-01T08:14:37.000Z | 2013-07-31T08:14:37.000Z |
Returns the number of milliseconds since epoch for the given timestamp.
TIMESTAMP_TO_MILLIS(timestamp_expr)
The following example converts the __time
column from the taxi-trips
datasource into milliseconds since epoch.
SELECT "__time" AS "original_time", TIMESTAMP_TO_MILLIS("__time") AS "miliseconds" FROM "taxi-trips" LIMIT 1
Returns the following:
original_time | miliseconds |
---|---|
2013-08-01T08:14:37.000Z | 1375344877000 |
Add a unit
of time multiplied by count
to timestamp
.
TIMESTAMPADD(unit, count, timestamp)
The following example adds five months to the timestamp 2000-01-01 00:00:00
.
SELECT TIMESTAMP '2000-01-01 00:00:00' AS "original_time", TIMESTAMPADD (MONTH, 5, TIMESTAMP '2000-01-01 00:00:00') AS "new_time"
Returns the following:
original_time | new_time |
---|---|
2000-01-01T00:00:00.000Z | 2000-06-01T00:00:00.000Z |
Returns the difference between two timestamps in a given unit.
TIMESTAMPDIFF(unit, timestamp1, timestamp2)
The following example calculates the taxi trip length in minutes by subtracting the __time
column from the dropoff_datetime
column in the taxi-trips
datasource.
SELECT "__time" AS "pickup_time", "dropoff_datetime" AS "dropoff_time", TIMESTAMPDIFF (MINUTE, "__time", TIME_PARSE("dropoff_datetime")) AS "trip_length" FROM "taxi-trips" LIMIT 1
Returns the following:
pickup_time | dropoff_time | trip_length |
---|---|---|
2013-08-01T08:14:37.000Z | 2013-08-01 09:09:06 | 54 |
Serializes an expression into a JSON string.
TO_JSON_STRING(expr)
The following example writes the distinct column names in the events
nested column to a JSON string:
SELECT TO_JSON_STRING(ARRAY_CONCAT_AGG(DISTINCT JSON_KEYS(event, '$.'))) as json_string FROM "kttm_nested"
Returns the following:
json_string |
---|
["error","layer","percentage","saveNumber","type","url","userAgent"] |
Trims the leading and/or trailing characters of an expression. Defaults chars
to a space if none is provided. Defaults to BOTH
if no directional argument is provided.
TRIM([BOTH|LEADING|TRAILING] [chars FROM] expr)
The following example trims _
characters from both ends of the string expression.
SELECT '___abc___' AS "original_string", TRIM( BOTH '_' FROM '___abc___') AS "trim_expression"
Returns the following:
original_string | trim_expression |
---|---|
___abc___ | abc |
Alias for TRUNCATE
.
TRUNC(expr[, digits])
Truncates a numerical expression to a specific number of decimal digits.
TRUNCATE(expr[, digits])
The following applies the TRUNCATE function to 1 decimal place on the pickup_longitude
column from the taxi-trips
datasource.
SELECT "pickup_longitude" as "pickup_longitude", TRUNCATE("pickup_longitude", 1) as "truncate_pickup_longitude" FROM "taxi-trips" WHERE "pickup_longitude" IS NOT NULL LIMIT 1
Returns the following:
pickup_longitude | truncate_pickup_longitude |
---|---|
-73.9377670288086 | -73.9 |
Parses an expression into a COMPLEX<json>
object.
This function deserializes JSON values when processing them, translating stringified JSON into a nested structure. If the input is invalid JSON or not a VARCHAR
, it returns a NULL
value.
You can use this function instead of PARSE_JSON to insert a null value when processing invalid data, instead of producing an error.
TRY_PARSE_JSON(expr)
The following example creates a COMPLEX<json>
object gus
from a string of fields:
SELECT TRY_PARSE_JSON('{"name":"Gus","email":"gus_cat@example.com","type":"Pet"}') as gus
Returns the following:
gus |
---|
{"name":"Gus","email":"gus_cat@example.com","type":"Pet"} |
The following example contains invalid data x:x
:
SELECT TRY_PARSE_JSON('{"name":"Gus","email":"gus_cat@example.com","type":"Pet",x:x}') as gus
Returns the following:
gus |
---|
null |
Returns the expression in uppercase.
UPPER(expr)
The following example converts the OriginCityName
column from the flight-carriers
datasource to uppercase.
SELECT "OriginCityName" AS "origin_city", UPPER("OriginCityName") AS "uppercase" FROM "flight-carriers" LIMIT 1
Returns the following:
origin_city | uppercase |
---|---|
San Juan, PR | SAN JUAN, PR |
Calculates the population variance of a set of values.
Requires the druid-stats
extension.
VAR_POP(expr)
The following example calculates the population variance for minutes of delay by a particular airlines in flight-carriers
:
SELECT VAR_POP("DepDelayMinutes") AS varpop_delay FROM "flight-carriers" WHERE "Reporting_Airline" = 'AA'
Returns the following:
varpop_delay |
---|
733.51908 |
Calculates the sample variance of a set of values.
Requires the druid-stats
extension.
VAR_SAMP(expr)
The following example calculates the sample variance for minutes of delay for an airline in flight-carriers
:
SELECT VAR_SAMP("DepDelayMinutes") AS varsamp_delay FROM "flight-carriers" WHERE "Reporting_Airline" = 'AA'
Returns the following:
varsamp_delay |
---|
733.53286 |
Alias for VAR_SAMP
.
Requires the druid-stats
extension.
VARIANCE(expr)