To use this Apache Druid extension, make sure to include the druid-distinctcount
extension.
Additionally, follow these steps:
There are some limitations. When used with groupBy, the number of groupBy keys should not exceed maxIntermediateRows in any segment; if it does, the result will be wrong. When used with topN, numValuesPerPass should not be too large; if it is, distinctCount will use a lot of memory and might cause the JVM to run out of memory.
Examples:
{
  "queryType": "timeseries",
  "dataSource": "sample_datasource",
  "granularity": "day",
  "aggregations": [
    { "type": "distinctCount", "name": "uv", "fieldName": "visitor_id" }
  ],
  "intervals": ["2016-03-01T00:00:00.000/2016-03-20T00:00:00.000"]
}

{
  "queryType": "topN",
  "dataSource": "sample_datasource",
  "dimension": "sample_dim",
  "threshold": 5,
  "metric": "uv",
  "granularity": "all",
  "aggregations": [
    { "type": "distinctCount", "name": "uv", "fieldName": "visitor_id" }
  ],
  "intervals": ["2016-03-06T00:00:00/2016-03-06T23:59:59"]
}

{
  "queryType": "groupBy",
  "dataSource": "sample_datasource",
  "dimensions": ["sample_dim"],
  "granularity": "all",
  "aggregations": [
    { "type": "distinctCount", "name": "uv", "fieldName": "visitor_id" }
  ],
  "intervals": ["2016-03-06T00:00:00/2016-03-06T23:59:59"]
}
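Query bodies like the ones above can also be assembled programmatically, which makes it easy to catch a reversed interval before the query is sent. The following is a minimal sketch, not part of the extension: the function name distinct_count_timeseries and its parameters are hypothetical, and it only builds the JSON body for a timeseries query using the distinctCount aggregator.

```python
import json
from datetime import datetime


def distinct_count_timeseries(data_source, field_name, start, end,
                              granularity="day", name="uv"):
    """Build a timeseries query body using the distinctCount aggregator.

    Hypothetical helper for illustration; it is not part of Druid.
    """
    # Fail early if the interval is reversed or empty.
    if datetime.fromisoformat(start) >= datetime.fromisoformat(end):
        raise ValueError(f"interval start {start} must precede end {end}")
    return {
        "queryType": "timeseries",
        "dataSource": data_source,
        "granularity": granularity,
        "aggregations": [
            {"type": "distinctCount", "name": name, "fieldName": field_name}
        ],
        "intervals": [f"{start}/{end}"],
    }


if __name__ == "__main__":
    query = distinct_count_timeseries(
        "sample_datasource",
        "visitor_id",
        "2016-03-01T00:00:00.000",
        "2016-03-20T00:00:00.000",
    )
    # Serialize to the JSON text that would be submitted as a native query.
    print(json.dumps(query, indent=2))
```

The same pattern extends to the topN and groupBy queries by adding their extra fields (dimension, threshold, and metric for topN; dimensions for groupBy).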