To use this Apache Druid extension, include the druid-distinctcount
in the extensions load list.
Additionally, follow these steps:
There are some limitations, when used with groupBy, the groupBy keys' numbers should not exceed maxIntermediateRows in every segment. If exceeded the result will be wrong. When used with topN, numValuesPerPass should not be too big. If too big the distinctCount will use a lot of memory and might cause the JVM to go our of memory.
Example:
{ "queryType": "timeseries", "dataSource": "sample_datasource", "granularity": "day", "aggregations": [ { "type": "distinctCount", "name": "uv", "fieldName": "visitor_id" } ], "intervals": [ "2016-03-01T00:00:00.000/2013-03-20T00:00:00.000" ] }
{ "queryType": "topN", "dataSource": "sample_datasource", "dimension": "sample_dim", "threshold": 5, "metric": "uv", "granularity": "all", "aggregations": [ { "type": "distinctCount", "name": "uv", "fieldName": "visitor_id" } ], "intervals": [ "2016-03-06T00:00:00/2016-03-06T23:59:59" ] }
{ "queryType": "groupBy", "dataSource": "sample_datasource", "dimensions": ["sample_dim"], "granularity": "all", "aggregations": [ { "type": "distinctCount", "name": "uv", "fieldName": "visitor_id" } ], "intervals": [ "2016-03-06T00:00:00/2016-03-06T23:59:59" ] }