docs/content/GroupByQuery.md - druid - Git at Google

 ---
 layout: doc_page
 ---
 # groupBy Queries
 These types of queries take a groupBy query object and return an array of JSON objects where each object represents a grouping asked for by the query. Note: If you only want to do straight aggregates for some time range, we highly recommend using [TimeseriesQueries](TimeseriesQuery.html) instead. The performance will be substantially better.
 An example groupBy query object is shown below:

 ``` json
 {
   "queryType": "groupBy",
   "dataSource": "sample_datasource",
   "granularity": "day",
   "dimensions": ["dim1", "dim2"],
   "limitSpec": { "type": "default", "limit": 5000, "columns": ["dim1", "metric1"] },
   "filter": {
     "type": "and",
     "fields": [
       { "type": "selector", "dimension": "sample_dimension1", "value": "sample_value1" },
       { "type": "or",
         "fields": [
           { "type": "selector", "dimension": "sample_dimension2", "value": "sample_value2" },
           { "type": "selector", "dimension": "sample_dimension3", "value": "sample_value3" }
         ]
       }
     ]
   },
   "aggregations": [
     { "type": "longSum", "name": "sample_name1", "fieldName": "sample_fieldName1" },
     { "type": "doubleSum", "name": "sample_name2", "fieldName": "sample_fieldName2" }
   ],
   "postAggregations": [
     { "type": "arithmetic",
       "name": "sample_divide",
       "fn": "/",
       "fields": [
         { "type": "fieldAccess", "name": "sample_name1", "fieldName": "sample_fieldName1" },
         { "type": "fieldAccess", "name": "sample_name2", "fieldName": "sample_fieldName2" }
       ]
     }
   ],
   "intervals": [ "2012-01-01T00:00:00.000/2012-01-03T00:00:00.000" ],
   "having": { "type": "greaterThan", "aggregation": "sample_name1", "value": 0 }
 }
 ```

 There are 11 main parts to a groupBy query:

 |property|description|required?|
 |--------|-----------|---------|
 |queryType|This String should always be "groupBy"; this is the first thing Druid looks at to figure out how to interpret the query|yes|
 |dataSource|A String defining the data source to query, very similar to a table in a relational database, or a [DataSource](DataSource.html) structure.|yes|
 |dimensions|A JSON list of dimensions to do the groupBy over; or see [DimensionSpec](DimensionSpecs) for ways to extract dimensions. |yes|
 |limitSpec|See [LimitSpec](LimitSpec.html).|no|
 |having|See [Having](Having.html).|no|
 |granularity|Defines the granularity of the query. See [Granularities](Granularities.html)|yes|
 |filter|See [Filters](Filters.html)|no|
 |aggregations|See [Aggregations](Aggregations.html)|yes|
 |postAggregations|See [Post Aggregations](Post-aggregations.html)|no|
 |intervals|A JSON Object representing ISO-8601 Intervals. This defines the time ranges to run the query over.|yes|
 |context|An additional JSON Object which can be used to specify certain flags.|no|

 To pull it all together, the above query would return *n\*m* data points, up to a maximum of 5000 points, where n is the cardinality of the "dim1" dimension, m is the cardinality of the "dim2" dimension, each day between 2012-01-01 and 2012-01-03, from the "sample_datasource" table. Each data point contains the (long) sum of sample_fieldName1 if the value of the data point is greater than 0, the (double) sum of sample_fieldName2 and the (double) the result of sample_fieldName1 divided by sample_fieldName2 for the filter set for a particular grouping of "dim1" and "dim2". The output looks like this:

 ```json
 [
   {
     "version" : "v1",
     "timestamp" : "2012-01-01T00:00:00.000Z",
     "event" : {
       "dim1" : <some_dim_value_one>,
       "dim2" : <some_dim_value_two>,
       "sample_name1" : <some_sample_name_value_one>,
       "sample_name2" :<some_sample_name_value_two>,
       "sample_divide" : <some_sample_divide_value>
     }
   },
   {
     "version" : "v1",
     "timestamp" : "2012-01-01T00:00:00.000Z",
     "event" : {
       "dim1" : <some_other_dim_value_one>,
       "dim2" : <some_other_dim_value_two>,
       "sample_name1" : <some_other_sample_name_value_one>,
       "sample_name2" :<some_other_sample_name_value_two>,
       "sample_divide" : <some_other_sample_divide_value>
     }
   },
 ...
 ]
 ```
	---
	layout: doc_page
	---
	# groupBy Queries
	These types of queries take a groupBy query object and return an array of JSON objects where each object represents a grouping asked for by the query. Note: If you only want to do straight aggregates for some time range, we highly recommend using [TimeseriesQueries](TimeseriesQuery.html) instead. The performance will be substantially better.
	An example groupBy query object is shown below:

	``` json
	{
	"queryType": "groupBy",
	"dataSource": "sample_datasource",
	"granularity": "day",
	"dimensions": ["dim1", "dim2"],
	"limitSpec": { "type": "default", "limit": 5000, "columns": ["dim1", "metric1"] },
	"filter": {
	"type": "and",
	"fields": [
	{ "type": "selector", "dimension": "sample_dimension1", "value": "sample_value1" },
	{ "type": "or",
	"fields": [
	{ "type": "selector", "dimension": "sample_dimension2", "value": "sample_value2" },
	{ "type": "selector", "dimension": "sample_dimension3", "value": "sample_value3" }
	]
	}
	]
	},
	"aggregations": [
	{ "type": "longSum", "name": "sample_name1", "fieldName": "sample_fieldName1" },
	{ "type": "doubleSum", "name": "sample_name2", "fieldName": "sample_fieldName2" }
	],
	"postAggregations": [
	{ "type": "arithmetic",
	"name": "sample_divide",
	"fn": "/",
	"fields": [
	{ "type": "fieldAccess", "name": "sample_name1", "fieldName": "sample_fieldName1" },
	{ "type": "fieldAccess", "name": "sample_name2", "fieldName": "sample_fieldName2" }
	]
	}
	],
	"intervals": [ "2012-01-01T00:00:00.000/2012-01-03T00:00:00.000" ],
	"having": { "type": "greaterThan", "aggregation": "sample_name1", "value": 0 }
	}
	```

	There are 11 main parts to a groupBy query:

	\|property\|description\|required?\|
	\|--------\|-----------\|---------\|
	\|queryType\|This String should always be "groupBy"; this is the first thing Druid looks at to figure out how to interpret the query\|yes\|
	\|dataSource\|A String defining the data source to query, very similar to a table in a relational database, or a [DataSource](DataSource.html) structure.\|yes\|
	\|dimensions\|A JSON list of dimensions to do the groupBy over; or see [DimensionSpec](DimensionSpecs) for ways to extract dimensions. \|yes\|
	\|limitSpec\|See [LimitSpec](LimitSpec.html).\|no\|
	\|having\|See [Having](Having.html).\|no\|
	\|granularity\|Defines the granularity of the query. See [Granularities](Granularities.html)\|yes\|
	\|filter\|See [Filters](Filters.html)\|no\|
	\|aggregations\|See [Aggregations](Aggregations.html)\|yes\|
	\|postAggregations\|See [Post Aggregations](Post-aggregations.html)\|no\|
	\|intervals\|A JSON Object representing ISO-8601 Intervals. This defines the time ranges to run the query over.\|yes\|
	\|context\|An additional JSON Object which can be used to specify certain flags.\|no\|

	To pull it all together, the above query would return n\m* data points, up to a maximum of 5000 points, where n is the cardinality of the "dim1" dimension, m is the cardinality of the "dim2" dimension, each day between 2012-01-01 and 2012-01-03, from the "sample_datasource" table. Each data point contains the (long) sum of sample_fieldName1 if the value of the data point is greater than 0, the (double) sum of sample_fieldName2 and the (double) the result of sample_fieldName1 divided by sample_fieldName2 for the filter set for a particular grouping of "dim1" and "dim2". The output looks like this:

	```json
	[
	{
	"version" : "v1",
	"timestamp" : "2012-01-01T00:00:00.000Z",
	"event" : {
	"dim1" : <some_dim_value_one>,
	"dim2" : <some_dim_value_two>,
	"sample_name1" : <some_sample_name_value_one>,
	"sample_name2" :<some_sample_name_value_two>,
	"sample_divide" : <some_sample_divide_value>
	}
	},
	{
	"version" : "v1",
	"timestamp" : "2012-01-01T00:00:00.000Z",
	"event" : {
	"dim1" : <some_other_dim_value_one>,
	"dim2" : <some_other_dim_value_two>,
	"sample_name1" : <some_other_sample_name_value_one>,
	"sample_name2" :<some_other_sample_name_value_two>,
	"sample_divide" : <some_other_sample_divide_value>
	}
	},
	...
	]
	```