Aggregations are specifications of processing over metrics available in Druid. Available aggregations are:
count
computes the row count that match the filters
{ "type" : "count", "name" : <output_name> }
longSum
aggregatorcomputes the sum of values as a 64-bit, signed integer
{ "type" : "longSum", "name" : <output_name>, "fieldName" : <metric_name> }
name
– output name for the summed value fieldName
– name of the metric column to sum over
doubleSum
aggregatorComputes the sum of values as 64-bit floating point value. Similar to longSum
{ "type" : "doubleSum", "name" : <output_name>, "fieldName" : <metric_name> }
min
aggregatormin
computes the minimum metric value
{ "type" : "min", "name" : <output_name>, "fieldName" : <metric_name> }
max
aggregatormax
computes the maximum metric value
{ "type" : "max", "name" : <output_name>, "fieldName" : <metric_name> }
Computes an arbitrary JavaScript function over a set of columns (both metrics and dimensions).
All JavaScript functions must return numerical values.
{ "type": "javascript", "name": "<output_name>", "fieldNames" : [ <column1>, <column2>, ... ], "fnAggregate" : "function(current, column1, column2, ...) { <updates partial aggregate (current) based on the current row values> return <updated partial aggregate> }", "fnCombine" : "function(partialA, partialB) { return <combined partial results>; }", "fnReset" : "function() { return <initial value>; }" }
Example
{ "type": "javascript", "name": "sum(log(x)/y) + 10", "fieldNames": ["x", "y"], "fnAggregate" : "function(current, a, b) { return current + (Math.log(a) * b); }", "fnCombine" : "function(partialA, partialB) { return partialA + partialB; }", "fnReset" : "function() { return 10; }" }
Computes the cardinality of a set of Druid dimensions, using HyperLogLog to estimate the cardinality.
{ "type": "cardinality", "name": "<output_name>", "fieldNames": [ <dimension1>, <dimension2>, ... ], "byRow": <false | true> # (optional, defaults to false) }
When setting byRow
to false
(the default) it computes the cardinality of the set composed of the union of all dimension values for all the given dimensions.
SELECT COUNT(DISCTINCT(dimension)) FROM <datasource>
SELECT COUNT(DISTINCT(value)) FROM ( SELECT dim_1 as value FROM <datasource> UNION SELECT dim_2 as value FROM <datasource> UNION SELECT dim_3 as value FROM <datasource> )
When setting byRow
to true
it computes the cardinality by row, i.e. the cardinality of distinct dimension combinations This is equivalent to something akin to
SELECT COUNT(*) FROM ( SELECT DIM1, DIM2, DIM3 FROM <datasource> GROUP BY DIM1, DIM2, DIM3
Example
Determine the number of distinct categories items are assigned to.
{ "type": "cardinality", "name": "distinct_values", "fieldNames": [ "main_category", "secondary_category" ] }
Determine the number of distinct are assigned to.
{ "type": "cardinality", "name": "distinct_values", "fieldNames": [ "", "secondary_category" ], "byRow" : true }
Uses HyperLogLog to compute the estimated cardinality of a dimension that has been aggregated as a “hyperUnique” metric at indexing time.
{ "type" : "hyperUnique", "name" : <output_name>, "fieldName" : <metric_name> }