This tutorial will demonstrate how to query data in Druid, with examples for Druid's native query format and Druid SQL.
The tutorial assumes that you've already completed one of the 4 ingestion tutorials, as we will be querying the sample Wikipedia edits data.
Druid's native query format is expressed in JSON. We have included a sample native TopN query under quickstart/tutorial/wikipedia-top-pages.json
:
{ "queryType" : "topN", "dataSource" : "wikipedia", "intervals" : ["2015-09-12/2015-09-13"], "granularity" : "all", "dimension" : "page", "metric" : "count", "threshold" : 10, "aggregations" : [ { "type" : "count", "name" : "count" } ] }
This query retrieves the 10 Wikipedia pages with the most page edits on 2015-09-12.
Let's submit this query to the Druid broker:
curl -X 'POST' -H 'Content-Type:application/json' -d @quickstart/tutorial/wikipedia-top-pages.json http://localhost:8082/druid/v2?pretty
You should see the following query results:
[ { "timestamp" : "2015-09-12T00:46:58.771Z", "result" : [ { "count" : 33, "page" : "Wikipedia:Vandalismusmeldung" }, { "count" : 28, "page" : "User:Cyde/List of candidates for speedy deletion/Subpage" }, { "count" : 27, "page" : "Jeremy Corbyn" }, { "count" : 21, "page" : "Wikipedia:Administrators' noticeboard/Incidents" }, { "count" : 20, "page" : "Flavia Pennetta" }, { "count" : 18, "page" : "Total Drama Presents: The Ridonculous Race" }, { "count" : 18, "page" : "User talk:Dudeperson176123" }, { "count" : 18, "page" : "Wikipédia:Le Bistro/12 septembre 2015" }, { "count" : 17, "page" : "Wikipedia:In the news/Candidates" }, { "count" : 17, "page" : "Wikipedia:Requests for page protection" } ] } ]
Druid also supports a dialect of SQL for querying. Let's run a SQL query that is equivalent to the native JSON query shown above:
SELECT page, COUNT(*) AS Edits FROM wikipedia WHERE "__time" BETWEEN TIMESTAMP '2015-09-12 00:00:00' AND TIMESTAMP '2015-09-13 00:00:00' GROUP BY page ORDER BY Edits DESC LIMIT 10;
The SQL queries are submitted as JSON over HTTP.
The tutorial package includes an example file that contains the SQL query shown above at quickstart/tutorial/wikipedia-top-pages-sql.json
. Let's submit that query to the Druid broker:
curl -X 'POST' -H 'Content-Type:application/json' -d @quickstart/tutorial/wikipedia-top-pages-sql.json http://localhost:8082/druid/v2/sql
The following results should be returned:
[ { "page": "Wikipedia:Vandalismusmeldung", "Edits": 33 }, { "page": "User:Cyde/List of candidates for speedy deletion/Subpage", "Edits": 28 }, { "page": "Jeremy Corbyn", "Edits": 27 }, { "page": "Wikipedia:Administrators' noticeboard/Incidents", "Edits": 21 }, { "page": "Flavia Pennetta", "Edits": 20 }, { "page": "Total Drama Presents: The Ridonculous Race", "Edits": 18 }, { "page": "User talk:Dudeperson176123", "Edits": 18 }, { "page": "Wikipédia:Le Bistro/12 septembre 2015", "Edits": 18 }, { "page": "Wikipedia:In the news/Candidates", "Edits": 17 }, { "page": "Wikipedia:Requests for page protection", "Edits": 17 } ]
For convenience, the Druid package includes a SQL command-line client, located at bin/dsql
from the Druid package root.
Let's now run bin/dsql
; you should see the following prompt:
Welcome to dsql, the command-line client for Druid SQL. Type "\h" for help. dsql>
To submit the query, paste it to the dsql
prompt and press enter:
dsql> SELECT page, COUNT(*) AS Edits FROM wikipedia WHERE "__time" BETWEEN TIMESTAMP '2015-09-12 00:00:00' AND TIMESTAMP '2015-09-13 00:00:00' GROUP BY page ORDER BY Edits DESC LIMIT 10; ┌──────────────────────────────────────────────────────────┬───────┐ │ page │ Edits │ ├──────────────────────────────────────────────────────────┼───────┤ │ Wikipedia:Vandalismusmeldung │ 33 │ │ User:Cyde/List of candidates for speedy deletion/Subpage │ 28 │ │ Jeremy Corbyn │ 27 │ │ Wikipedia:Administrators' noticeboard/Incidents │ 21 │ │ Flavia Pennetta │ 20 │ │ Total Drama Presents: The Ridonculous Race │ 18 │ │ User talk:Dudeperson176123 │ 18 │ │ Wikipédia:Le Bistro/12 septembre 2015 │ 18 │ │ Wikipedia:In the news/Candidates │ 17 │ │ Wikipedia:Requests for page protection │ 17 │ └──────────────────────────────────────────────────────────┴───────┘ Retrieved 10 rows in 0.06s.
SELECT FLOOR(__time to HOUR) AS HourTime, SUM(deleted) AS LinesDeleted FROM wikipedia WHERE "__time" BETWEEN TIMESTAMP '2015-09-12 00:00:00' AND TIMESTAMP '2015-09-13 00:00:00' GROUP BY FLOOR(__time to HOUR);
dsql> SELECT FLOOR(__time to HOUR) AS HourTime, SUM(deleted) AS LinesDeleted FROM wikipedia WHERE "__time" BETWEEN TIMESTAMP '2015-09-12 00:00:00' AND TIMESTAMP '2015-09-13 00:00:00' GROUP BY FLOOR(__time to HOUR); ┌──────────────────────────┬──────────────┐ │ HourTime │ LinesDeleted │ ├──────────────────────────┼──────────────┤ │ 2015-09-12T00:00:00.000Z │ 1761 │ │ 2015-09-12T01:00:00.000Z │ 16208 │ │ 2015-09-12T02:00:00.000Z │ 14543 │ │ 2015-09-12T03:00:00.000Z │ 13101 │ │ 2015-09-12T04:00:00.000Z │ 12040 │ │ 2015-09-12T05:00:00.000Z │ 6399 │ │ 2015-09-12T06:00:00.000Z │ 9036 │ │ 2015-09-12T07:00:00.000Z │ 11409 │ │ 2015-09-12T08:00:00.000Z │ 11616 │ │ 2015-09-12T09:00:00.000Z │ 17509 │ │ 2015-09-12T10:00:00.000Z │ 19406 │ │ 2015-09-12T11:00:00.000Z │ 16284 │ │ 2015-09-12T12:00:00.000Z │ 18672 │ │ 2015-09-12T13:00:00.000Z │ 30520 │ │ 2015-09-12T14:00:00.000Z │ 18025 │ │ 2015-09-12T15:00:00.000Z │ 26399 │ │ 2015-09-12T16:00:00.000Z │ 24759 │ │ 2015-09-12T17:00:00.000Z │ 19634 │ │ 2015-09-12T18:00:00.000Z │ 17345 │ │ 2015-09-12T19:00:00.000Z │ 19305 │ │ 2015-09-12T20:00:00.000Z │ 22265 │ │ 2015-09-12T21:00:00.000Z │ 16394 │ │ 2015-09-12T22:00:00.000Z │ 16379 │ │ 2015-09-12T23:00:00.000Z │ 15289 │ └──────────────────────────┴──────────────┘ Retrieved 24 rows in 0.08s.
SELECT channel, SUM(added) FROM wikipedia WHERE "__time" BETWEEN TIMESTAMP '2015-09-12 00:00:00' AND TIMESTAMP '2015-09-13 00:00:00' GROUP BY channel ORDER BY SUM(added) DESC LIMIT 5;
dsql> SELECT channel, SUM(added) FROM wikipedia WHERE "__time" BETWEEN TIMESTAMP '2015-09-12 00:00:00' AND TIMESTAMP '2015-09-13 00:00:00' GROUP BY channel ORDER BY SUM(added) DESC LIMIT 5; ┌───────────────┬─────────┐ │ channel │ EXPR$1 │ ├───────────────┼─────────┤ │ #en.wikipedia │ 3045299 │ │ #it.wikipedia │ 711011 │ │ #fr.wikipedia │ 642555 │ │ #ru.wikipedia │ 640698 │ │ #es.wikipedia │ 634670 │ └───────────────┴─────────┘ Retrieved 5 rows in 0.05s.
SELECT user, page FROM wikipedia WHERE "__time" BETWEEN TIMESTAMP '2015-09-12 02:00:00' AND TIMESTAMP '2015-09-12 03:00:00' LIMIT 5;
dsql> SELECT user, page FROM wikipedia WHERE "__time" BETWEEN TIMESTAMP '2015-09-12 02:00:00' AND TIMESTAMP '2015-09-12 03:00:00' LIMIT 5; ┌────────────────────────┬────────────────────────────────────────────────────────┐ │ user │ page │ ├────────────────────────┼────────────────────────────────────────────────────────┤ │ Thiago89 │ Campeonato Mundial de Voleibol Femenino Sub-20 de 2015 │ │ 91.34.200.249 │ Friede von Schönbrunn │ │ TuHan-Bot │ Trĩ vàng │ │ Lowercase sigmabot III │ User talk:ErrantX │ │ BattyBot │ Hans W. Jung │ └────────────────────────┴────────────────────────────────────────────────────────┘ Retrieved 5 rows in 0.04s.
By prepending EXPLAIN PLAN FOR
to a Druid SQL query, it is possible to see what native Druid queries a SQL query will plan into.
Using the TopN query above as an example:
EXPLAIN PLAN FOR SELECT page, COUNT(*) AS Edits FROM wikipedia WHERE "__time" BETWEEN TIMESTAMP '2015-09-12 00:00:00' AND TIMESTAMP '2015-09-13 00:00:00' GROUP BY page ORDER BY Edits DESC LIMIT 10;
dsql> EXPLAIN PLAN FOR SELECT page, COUNT(*) AS Edits FROM wikipedia WHERE "__time" BETWEEN TIMESTAMP '2015-09-12 00:00:00' AND TIMESTAMP '2015-09-13 00:00:00' GROUP BY page ORDER BY Edits DESC LIMIT 10; ┌─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐ │ PLAN │ ├─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤ │ DruidQueryRel(query=[{"queryType":"topN","dataSource":{"type":"table","name":"wikipedia"},"virtualColumns":[],"dimension":{"type":"default","dimension":"page","outputName":"d0","outputType":"STRING"},"metric":{"type":"numeric","metric":"a0"},"threshold":10,"intervals":{"type":"intervals","intervals":["2015-09-12T00:00:00.000Z/2015-09-13T00:00:00.001Z"]},"filter":null,"granularity":{"type":"all"},"aggregations":[{"type":"count","name":"a0"}],"postAggregations":[],"context":{},"descending":false}], signature=[{d0:STRING, a0:LONG}]) │ └─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘ Retrieved 1 row in 0.03s.
The Queries documentation has more information on Druid's native JSON queries.
The Druid SQL documentation has more information on using Druid SQL queries.