| --- |
| layout: doc_page |
| --- |
| |
| # Tutorial: All About Queries |
| Hello! This tutorial provides a more in-depth look at Druid queries. It is somewhat incomplete right now, but we hope to add more content in the near future.
| |
| Setup |
| ----- |
| |
| Before we start digging into how to query Druid, make sure you've gone through the other tutorials and are comfortable with spinning up a local cluster and loading data into Druid. |
| |
| #### Booting a Druid Cluster |
| |
| Let's start up a simple Druid cluster so we can query all the things. |
| |
| Note: If Zookeeper and MySQL aren't running, you'll have to start them again as described in [The Druid Cluster](Tutorial%3A-The-Druid-Cluster.html). |
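| 
| If you need to bring them back up, the commands look roughly like this (a sketch; the Zookeeper path and the way MySQL is managed depend on your install):
| 
| ```bash
| # From your Zookeeper install directory (location is an assumption):
| ./bin/zkServer.sh start
| 
| # Start MySQL; the exact command depends on your OS and install method:
| mysqld_safe &
| ```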
| |
| To start a Coordinator node: |
| |
| ```bash |
| java -Xmx256m -Duser.timezone=UTC -Dfile.encoding=UTF-8 -classpath lib/*:config/coordinator io.druid.cli.Main server coordinator |
| ``` |
| |
| To start a Historical node: |
| |
| ```bash |
| java -Xmx256m -Duser.timezone=UTC -Dfile.encoding=UTF-8 -classpath lib/*:config/historical io.druid.cli.Main server historical |
| ``` |
| |
| To start a Broker node: |
| |
| ```bash |
| java -Xmx256m -Duser.timezone=UTC -Dfile.encoding=UTF-8 -classpath lib/*:config/broker io.druid.cli.Main server broker |
| ``` |
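| 
| Each command runs in the foreground, so give each node its own terminal. Before querying, you can wait for the Broker's HTTP port to come up (port 8080 here is an assumption taken from the curl examples below):
| 
| ```bash
| # Poll until the Broker's HTTP port (assumed 8080) accepts connections.
| while ! nc -z localhost 8080; do sleep 1; done
| echo "Broker is up"
| ```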
| |
| Querying Your Data |
| ------------------ |
| |
| Make sure you've completed [Loading Your Data](Loading-Your-Data-Part-1.html) so we have some data to query. Having done that, it's time to query our data! For a complete specification of queries, see [Querying](Querying.html). |
| |
| #### Construct a Query |
| ```json |
| { |
| "queryType": "groupBy", |
| "dataSource": "wikipedia", |
| "granularity": "all", |
| "dimensions": [], |
| "aggregations": [ |
| {"type": "count", "name": "rows"}, |
| {"type": "longSum", "name": "edit_count", "fieldName": "count"}, |
| {"type": "doubleSum", "name": "chars_added", "fieldName": "added"} |
| ], |
|   "intervals": ["2010-01-01T00:00/2020-01-01T00:00"]
| } |
| ``` |
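| 
| The curl command in the next step reads the query from a file, so save the JSON above to a file named query.body in your working directory:
| 
| ```bash
| # Save the query; paste the JSON above, then press Ctrl-D.
| cat > query.body
| ```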
| |
| #### Query That Data |
| Run the query against your broker: |
| |
| ```bash |
| curl -X POST "http://localhost:8080/druid/v2/?pretty" -H 'Content-type: application/json' -d @query.body |
| ``` |
| |
| And get: |
| |
| ```json |
| [ { |
| "version" : "v1", |
| "timestamp" : "2010-01-01T00:00:00.000Z", |
| "event" : { |
| "chars_added" : 1545.0, |
| "edit_count" : 5, |
| "rows" : 5 |
| } |
| } ] |
| ``` |
| |
| This result tells us that the data set contains 5 edits across 5 rows, and that those edits added a total of 1,545 characters.
| |
| #### What can I query for? |
| |
| How do we know what we can query? Although [Querying](Querying.html) is a helpful index, to get a handle on querying our data we need to look at our ingestion schema. A few fields in that schema are of particular interest, and all of them should be present in both the real-time and batch ingestion schemas.
| |
| Datasource: |
| |
| ```json |
| "dataSource":"wikipedia" |
| ``` |
| |
| The dataSource field names the relation/table, or 'source of data'. The name we choose at ingestion time must match the dataSource we specify in our queries.
| |
| Granularity: |
| |
| ```json |
| "indexGranularity": "none", |
| ``` |
| |
| Druid rolls up data at ingestion time unless the index/rollup granularity is specified as "none". A query's granularity cannot be finer than the index granularity: data rolled up by hour, for example, cannot be queried at minute granularity. Since our index granularity is "none", any query granularity is valid.
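| 
| For example, this variant of our earlier query buckets results by day instead of collapsing everything into a single row (a sketch; the exact results depend on what you ingested):
| 
| ```json
| {
|   "queryType": "groupBy",
|   "dataSource": "wikipedia",
|   "granularity": "day",
|   "dimensions": [],
|   "aggregations": [
|     {"type": "longSum", "name": "edit_count", "fieldName": "count"}
|   ],
|   "intervals": ["2010-01-01T00:00/2020-01-01T00:00"]
| }
| ```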
| |
| Aggregators: |
| |
| ```json |
| "aggregators" : [{ |
| "type" : "count", |
| "name" : "count" |
| }, { |
| "type" : "doubleSum", |
| "name" : "added", |
| "fieldName" : "added" |
| }, { |
| "type" : "doubleSum", |
| "name" : "deleted", |
| "fieldName" : "deleted" |
| }, { |
| "type" : "doubleSum", |
| "name" : "delta", |
| "fieldName" : "delta" |
| }] |
| ``` |
| |
| The [Aggregations](Aggregations.html) specified at ingestion time correspond directly to the metrics that can be queried.
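| 
| For example, because "deleted" and "delta" were also ingested as metrics, any query in this tutorial could be extended with aggregations over them (a sketch; the output names here are our own choice):
| 
| ```json
| {"type": "doubleSum", "name": "chars_deleted", "fieldName": "deleted"},
| {"type": "doubleSum", "name": "net_change", "fieldName": "delta"}
| ```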
| |
| Dimensions: |
| |
| ```json |
| "dimensions" : ["page","language","user","unpatrolled","newPage","robot","anonymous","namespace","continent","country","region","city"] |
| ``` |
| |
| These specify the dimensions we can filter and group our data on. If we add a dimension to our groupBy query:
| |
| ```json |
| { |
| "queryType": "groupBy", |
| "dataSource": "wikipedia", |
| "granularity": "all", |
| "dimensions": ["namespace"], |
| "aggregations": [ |
| {"type": "longSum", "name": "edit_count", "fieldName": "count"}, |
| {"type": "doubleSum", "name": "chars_added", "fieldName": "added"} |
| ], |
|   "intervals": ["2010-01-01T00:00/2020-01-01T00:00"]
| } |
| ``` |
| |
| we get the data back grouped by the namespace dimension:
| |
| ```json |
| [ { |
| "version" : "v1", |
| "timestamp" : "2010-01-01T00:00:00.000Z", |
| "event" : { |
| "chars_added" : 180.0, |
| "edit_count" : 2, |
| "namespace" : "article" |
| } |
| }, { |
| "version" : "v1", |
| "timestamp" : "2010-01-01T00:00:00.000Z", |
| "event" : { |
| "chars_added" : 1365.0, |
| "edit_count" : 3, |
| "namespace" : "wikipedia" |
| } |
| } ] |
| ``` |
| |
| We can also filter our query to narrow down our metric values:
| |
| ```json |
| { |
| "queryType": "groupBy", |
| "dataSource": "wikipedia", |
| "granularity": "all", |
| "filter": { "type": "selector", "dimension": "namespace", "value": "article" }, |
| "aggregations": [ |
| {"type": "longSum", "name": "edit_count", "fieldName": "count"}, |
| {"type": "doubleSum", "name": "chars_added", "fieldName": "added"} |
| ], |
|   "intervals": ["2010-01-01T00:00/2020-01-01T00:00"]
| } |
| ``` |
| |
| This returns metrics for only those edits where the namespace is 'article':
| |
| ```json |
| [ { |
| "version" : "v1", |
| "timestamp" : "2010-01-01T00:00:00.000Z", |
| "event" : { |
| "chars_added" : 180.0, |
| "edit_count" : 2 |
| } |
| } ] |
| ``` |
| |
| Check out [Filters](Filters.html) for more information. |
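| 
| Filters can also be combined. For instance, an "and" filter matches only rows that satisfy every sub-filter (a sketch; "robot" is another dimension from our ingestion schema, and its values here are an assumption about the data):
| 
| ```json
| "filter": {
|   "type": "and",
|   "fields": [
|     {"type": "selector", "dimension": "namespace", "value": "article"},
|     {"type": "selector", "dimension": "robot", "value": "false"}
|   ]
| }
| ```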
| |
| What Types of Queries to Use |
| ---------------------------- |
| |
| The type of query you should use depends on your use case. [TimeBoundary queries](TimeBoundaryQuery.html) are useful for understanding the time range of your data. [Timeseries queries](TimeseriesQuery.html) are useful for aggregates and filters over a time range, and offer significant speed improvements over [GroupBy queries](GroupByQuery.html). To find the top values for a given dimension, prefer [TopN queries](TopNQuery.html) to groupBy queries as well.
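| 
| As a taste of those alternatives, here is a topN query that finds the five most-edited pages in our wikipedia data (a sketch; "page" is a dimension and "count" a metric from our ingestion schema):
| 
| ```json
| {
|   "queryType": "topN",
|   "dataSource": "wikipedia",
|   "granularity": "all",
|   "dimension": "page",
|   "metric": "edit_count",
|   "threshold": 5,
|   "aggregations": [
|     {"type": "longSum", "name": "edit_count", "fieldName": "count"}
|   ],
|   "intervals": ["2010-01-01T00:00/2020-01-01T00:00"]
| }
| ```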
| |
| |
| ## Learn More ## |
| |
| You can learn more about querying at [Querying](Querying.html)! If you are ready to evaluate Druid more in depth, check out [Booting a production cluster](Booting-a-production-cluster.html)! |