layout: doc_page title: “Tutorial: Querying data”

Tutorial: Querying data

This tutorial will demonstrate how to query data in Druid, with examples for Druid's native query format and Druid SQL.

The tutorial assumes that you've already completed one of the 4 ingestion tutorials, as we will be querying the sample Wikipedia edits data.

Native JSON queries

Druid's native query format is expressed in JSON. We have included a sample native TopN query under quickstart/tutorial/wikipedia-top-pages.json:

{
  "queryType" : "topN",
  "dataSource" : "wikipedia",
  "intervals" : ["2015-09-12/2015-09-13"],
  "granularity" : "all",
  "dimension" : "page",
  "metric" : "count",
  "threshold" : 10,
  "aggregations" : [
    {
      "type" : "count",
      "name" : "count"
    }
  ]
}

This query retrieves the 10 Wikipedia pages with the most page edits on 2015-09-12.

Let's submit this query to the Druid broker:

curl -X 'POST' -H 'Content-Type:application/json' -d @quickstart/tutorial/wikipedia-top-pages.json http://localhost:8082/druid/v2?pretty

You should see the following query results:

[ {
  "timestamp" : "2015-09-12T00:46:58.771Z",
  "result" : [ {
    "count" : 33,
    "page" : "Wikipedia:Vandalismusmeldung"
  }, {
    "count" : 28,
    "page" : "User:Cyde/List of candidates for speedy deletion/Subpage"
  }, {
    "count" : 27,
    "page" : "Jeremy Corbyn"
  }, {
    "count" : 21,
    "page" : "Wikipedia:Administrators' noticeboard/Incidents"
  }, {
    "count" : 20,
    "page" : "Flavia Pennetta"
  }, {
    "count" : 18,
    "page" : "Total Drama Presents: The Ridonculous Race"
  }, {
    "count" : 18,
    "page" : "User talk:Dudeperson176123"
  }, {
    "count" : 18,
    "page" : "Wikipédia:Le Bistro/12 septembre 2015"
  }, {
    "count" : 17,
    "page" : "Wikipedia:In the news/Candidates"
  }, {
    "count" : 17,
    "page" : "Wikipedia:Requests for page protection"
  } ]
} ]

Druid SQL queries

Druid also supports a dialect of SQL for querying. Let's run a SQL query that is equivalent to the native JSON query shown above:

SELECT page, COUNT(*) AS Edits FROM wikipedia WHERE "__time" BETWEEN TIMESTAMP '2015-09-12 00:00:00' AND TIMESTAMP '2015-09-13 00:00:00' GROUP BY page ORDER BY Edits DESC LIMIT 10;

The SQL queries are submitted as JSON over HTTP.

TopN query example

The tutorial package includes an example file that contains the SQL query shown above at quickstart/tutorial/wikipedia-top-pages-sql.json. Let's submit that query to the Druid broker:

curl -X 'POST' -H 'Content-Type:application/json' -d @quickstart/tutorial/wikipedia-top-pages-sql.json http://localhost:8082/druid/v2/sql

The following results should be returned:

[
  {
    "page": "Wikipedia:Vandalismusmeldung",
    "Edits": 33
  },
  {
    "page": "User:Cyde/List of candidates for speedy deletion/Subpage",
    "Edits": 28
  },
  {
    "page": "Jeremy Corbyn",
    "Edits": 27
  },
  {
    "page": "Wikipedia:Administrators' noticeboard/Incidents",
    "Edits": 21
  },
  {
    "page": "Flavia Pennetta",
    "Edits": 20
  },
  {
    "page": "Total Drama Presents: The Ridonculous Race",
    "Edits": 18
  },
  {
    "page": "User talk:Dudeperson176123",
    "Edits": 18
  },
  {
    "page": "Wikipédia:Le Bistro/12 septembre 2015",
    "Edits": 18
  },
  {
    "page": "Wikipedia:In the news/Candidates",
    "Edits": 17
  },
  {
    "page": "Wikipedia:Requests for page protection",
    "Edits": 17
  }
]

dsql client

For convenience, the Druid package includes a SQL command-line client, located at bin/dsql from the Druid package root.

Let's now run bin/dsql; you should see the following prompt:

Welcome to dsql, the command-line client for Druid SQL.
Type "\h" for help.
dsql> 

To submit the query, paste it to the dsql prompt and press enter:

dsql> SELECT page, COUNT(*) AS Edits FROM wikipedia WHERE "__time" BETWEEN TIMESTAMP '2015-09-12 00:00:00' AND TIMESTAMP '2015-09-13 00:00:00' GROUP BY page ORDER BY Edits DESC LIMIT 10;
┌──────────────────────────────────────────────────────────┬───────┐
 page                                                      Edits 
├──────────────────────────────────────────────────────────┼───────┤
 Wikipedia:Vandalismusmeldung                                 33 
 User:Cyde/List of candidates for speedy deletion/Subpage     28 
 Jeremy Corbyn                                                27 
 Wikipedia:Administrators' noticeboard/Incidents          │    21 │
│ Flavia Pennetta                                          │    20 │
│ Total Drama Presents: The Ridonculous Race               │    18 │
│ User talk:Dudeperson176123                               │    18 │
│ Wikipédia:Le Bistro/12 septembre 2015                    │    18 │
│ Wikipedia:In the news/Candidates                         │    17 │
│ Wikipedia:Requests for page protection                   │    17 │
└──────────────────────────────────────────────────────────┴───────┘
Retrieved 10 rows in 0.06s.

Additional Druid SQL queries

Timeseries

SELECT FLOOR(__time to HOUR) AS HourTime, SUM(deleted) AS LinesDeleted FROM wikipedia WHERE "__time" BETWEEN TIMESTAMP '2015-09-12 00:00:00' AND TIMESTAMP '2015-09-13 00:00:00' GROUP BY FLOOR(__time to HOUR);

dsql> SELECT FLOOR(__time to HOUR) AS HourTime, SUM(deleted) AS LinesDeleted FROM wikipedia WHERE "__time" BETWEEN TIMESTAMP '2015-09-12 00:00:00' AND TIMESTAMP '2015-09-13 00:00:00' GROUP BY FLOOR(__time to HOUR);
┌──────────────────────────┬──────────────┐
 HourTime                  LinesDeleted 
├──────────────────────────┼──────────────┤
 2015-09-12T00:00:00.000Z          1761 
 2015-09-12T01:00:00.000Z         16208 
 2015-09-12T02:00:00.000Z         14543 
 2015-09-12T03:00:00.000Z         13101 
 2015-09-12T04:00:00.000Z         12040 
 2015-09-12T05:00:00.000Z          6399 
 2015-09-12T06:00:00.000Z          9036 
 2015-09-12T07:00:00.000Z         11409 
 2015-09-12T08:00:00.000Z         11616 
 2015-09-12T09:00:00.000Z         17509 
 2015-09-12T10:00:00.000Z         19406 
 2015-09-12T11:00:00.000Z         16284 
 2015-09-12T12:00:00.000Z         18672 
 2015-09-12T13:00:00.000Z         30520 
 2015-09-12T14:00:00.000Z         18025 
 2015-09-12T15:00:00.000Z         26399 
 2015-09-12T16:00:00.000Z         24759 
 2015-09-12T17:00:00.000Z         19634 
 2015-09-12T18:00:00.000Z         17345 
 2015-09-12T19:00:00.000Z         19305 
 2015-09-12T20:00:00.000Z         22265 
 2015-09-12T21:00:00.000Z         16394 
 2015-09-12T22:00:00.000Z         16379 
 2015-09-12T23:00:00.000Z         15289 
└──────────────────────────┴──────────────┘
Retrieved 24 rows in 0.08s.

GroupBy

SELECT channel, SUM(added) FROM wikipedia WHERE "__time" BETWEEN TIMESTAMP '2015-09-12 00:00:00' AND TIMESTAMP '2015-09-13 00:00:00' GROUP BY channel ORDER BY SUM(added) DESC LIMIT 5;

dsql> SELECT channel, SUM(added) FROM wikipedia WHERE "__time" BETWEEN TIMESTAMP '2015-09-12 00:00:00' AND TIMESTAMP '2015-09-13 00:00:00' GROUP BY channel ORDER BY SUM(added) DESC LIMIT 5;
┌───────────────┬─────────┐
 channel        EXPR$1  
├───────────────┼─────────┤
 #en.wikipedia │ 3045299 │
 #it.wikipedia │  711011 │
 #fr.wikipedia │  642555 │
 #ru.wikipedia │  640698 │
 #es.wikipedia │  634670 │
└───────────────┴─────────┘
Retrieved 5 rows in 0.05s.

Scan

SELECT user, page FROM wikipedia WHERE "__time" BETWEEN TIMESTAMP '2015-09-12 02:00:00' AND TIMESTAMP '2015-09-12 03:00:00' LIMIT 5;

 dsql> SELECT user, page FROM wikipedia WHERE "__time" BETWEEN TIMESTAMP '2015-09-12 02:00:00' AND TIMESTAMP '2015-09-12 03:00:00' LIMIT 5;
┌────────────────────────┬────────────────────────────────────────────────────────┐
 user                    page                                                   
├────────────────────────┼────────────────────────────────────────────────────────┤
 Thiago89                Campeonato Mundial de Voleibol Femenino Sub-20 de 2015 
 91.34.200.249           Friede von Schönbrunn                                  
 TuHan-Bot               Trĩ vàng                                               
 Lowercase sigmabot III  User talk:ErrantX                                      
 BattyBot                Hans W. Jung                                           
└────────────────────────┴────────────────────────────────────────────────────────┘
Retrieved 5 rows in 0.04s.

EXPLAIN PLAN FOR

By prepending EXPLAIN PLAN FOR to a Druid SQL query, it is possible to see what native Druid queries a SQL query will plan into.

Using the TopN query above as an example:

EXPLAIN PLAN FOR SELECT page, COUNT(*) AS Edits FROM wikipedia WHERE "__time" BETWEEN TIMESTAMP '2015-09-12 00:00:00' AND TIMESTAMP '2015-09-13 00:00:00' GROUP BY page ORDER BY Edits DESC LIMIT 10;

dsql> EXPLAIN PLAN FOR SELECT page, COUNT(*) AS Edits FROM wikipedia WHERE "__time" BETWEEN TIMESTAMP '2015-09-12 00:00:00' AND TIMESTAMP '2015-09-13 00:00:00' GROUP BY page ORDER BY Edits DESC LIMIT 10;
┌─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
 PLAN                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    
├─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
 DruidQueryRel(query=[{"queryType":"topN","dataSource":{"type":"table","name":"wikipedia"},"virtualColumns":[],"dimension":{"type":"default","dimension":"page","outputName":"d0","outputType":"STRING"},"metric":{"type":"numeric","metric":"a0"},"threshold":10,"intervals":{"type":"intervals","intervals":["2015-09-12T00:00:00.000Z/2015-09-13T00:00:00.001Z"]},"filter":null,"granularity":{"type":"all"},"aggregations":[{"type":"count","name":"a0"}],"postAggregations":[],"context":{},"descending":false}], signature=[{d0:STRING, a0:LONG}]) 
└─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
Retrieved 1 row in 0.03s.

Further reading

The Queries documentation has more information on Druid's native JSON queries.

The Druid SQL documentation has more information on using Druid SQL queries.