| --- |
| id: tutorial-query |
| title: Query data |
| sidebar_label: Query data |
| --- |
| |
| <!-- |
| ~ Licensed to the Apache Software Foundation (ASF) under one |
| ~ or more contributor license agreements. See the NOTICE file |
| ~ distributed with this work for additional information |
| ~ regarding copyright ownership. The ASF licenses this file |
| ~ to you under the Apache License, Version 2.0 (the |
| ~ "License"); you may not use this file except in compliance |
| ~ with the License. You may obtain a copy of the License at |
| ~ |
| ~ http://www.apache.org/licenses/LICENSE-2.0 |
| ~ |
| ~ Unless required by applicable law or agreed to in writing, |
| ~ software distributed under the License is distributed on an |
| ~ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY |
| ~ KIND, either express or implied. See the License for the |
| ~ specific language governing permissions and limitations |
| ~ under the License. |
| --> |
| |
| |
| This tutorial demonstrates how to query data in Apache Druid using SQL. |
| |
| It assumes that you've completed the [Quickstart](../tutorials/index.md) |
| or one of the following tutorials, since we'll query datasources that you would have created |
| by following one of them: |
| |
| * [Load a file](../tutorials/tutorial-batch.md) |
| * [Load stream data from Kafka](../tutorials/tutorial-kafka.md) |
| * [Load a file using Hadoop](../tutorials/tutorial-batch-hadoop.md) |
| |
| There are various ways to run Druid SQL queries: from the web console, using a command line utility |
| and by posting the query by HTTP. We'll look at each of these. |
| |
| |
| ## Query SQL from the web console |
| |
| The web console includes a view that makes it easier to build and test queries, and |
| view their results. |
| |
| 1. Start up the Druid cluster, if it's not already running, and open the web console in your web |
| browser. |
| |
| 2. Click **Query** from the header to open the Query view: |
| |
|  |
| |
| You can always write queries directly in the edit pane, but the Query view also provides |
| facilities to help you construct SQL queries, which we will use to generate a starter query. |
| |
| 3. Expand the wikipedia datasource tree in the left pane. We'll |
| create a query for the page dimension. |
| |
| 4. Click `page` and then **Show:page** from the menu: |
| |
|  |
| |
| A SELECT query appears in the query edit pane and immediately runs. However, in this case, the query |
| returns no data, since by default the query filters for data from the last day, while our data is considerably |
| older than that. Let's remove the filter. |
| |
| 5. Click **Run** to run the query. |
| |
| You should now see two columns of data, a page name and the count: |
| |
|  |
| |
| Notice that the results are limited in the console to about a hundred, by default, due to the **Smart query limit** |
| feature. This helps users avoid inadvertently running queries that return an excessive amount of data, possibly |
| overwhelming their system. |
| |
| 6. Let's edit the query directly and take a look at a few more query building features in the editor. |
| Click in the query edit pane and make the following changes: |
| |
| 1. Add a line after the first column, `"page"` and Start typing the name of a new column, `"countryName"`. Notice that the autocomplete menu suggests column names, functions, keywords, and more. Choose "countryName" and |
| add the new column to the GROUP BY clause as well, either by name or by reference to its position, `2`. |
| |
| 2. For readability, replace `Count` column name with `Edits`, since the `COUNT()` function actually |
| returns the number of edits for the page. Make the same column name change in the ORDER BY clause as well. |
| |
| The `COUNT()` function is one of many functions available for use in Druid SQL queries. You can mouse over a function name |
| in the autocomplete menu to see a brief description of a function. Also, you can find more information in the Druid |
| documentation; for example, the `COUNT()` function is documented in |
| [Aggregation functions](../querying/sql-aggregations.md). |
| |
| The query should now be: |
| |
| ```sql |
| SELECT |
| "page", |
| "countryName", |
| COUNT(*) AS "Edits" |
| FROM "wikipedia" |
| GROUP BY 1, 2 |
| ORDER BY "Edits" DESC |
| ``` |
| |
| When you run the query again, notice that we're getting the new dimension,`countryName`, but for most of the rows, its value |
| is null. Let's |
| show only rows with a `countryName` value. |
| |
| 7. Click the `countryName` dimension in the left pane and choose the first filtering option. It's not exactly what we want, but |
| we'll edit it by hand. The new WHERE clause should appear in your query. |
| |
| 8. Modify the WHERE clause to exclude results that do not have a value for countryName: |
| |
| ```sql |
| WHERE "countryName" IS NOT NULL |
| ``` |
| Run the query again. You should now see the top edits by country: |
| |
|  |
| |
| 9. Under the covers, every Druid SQL query is translated into a query in the JSON-based _Druid native query_ format before it runs |
| on data nodes. You can view the native query for this query by clicking `...` and **Explain SQL Query**. |
| |
| While you can use Druid SQL for most purposes, familiarity with native query is useful for composing complex queries and for troubleshooting |
| performance issues. For more information, see [Native queries](../querying/querying.md). |
| |
|  |
| |
| :::info |
| Another way to view the explain plan is by adding EXPLAIN PLAN FOR to the front of your query, as follows: |
| |
| ```sql |
| EXPLAIN PLAN FOR |
| SELECT |
| "page", |
| "countryName", |
| COUNT(*) AS "Edits" |
| FROM "wikipedia" |
| WHERE "countryName" IS NOT NULL |
| GROUP BY 1, 2 |
| ORDER BY "Edits" DESC |
| ``` |
| This is particularly useful when running queries |
| from the command line or over HTTP. |
| ::: |
| |
| |
| 9. Finally, click `...` and **Edit context** to see how you can add additional parameters controlling the execution of the query execution. In the field, enter query context options as JSON key-value pairs, as described in [Context flags](../querying/query-context.md). |
| |
| That's it! We've built a simple query using some of the query builder features built into the web console. The following |
| sections provide a few more example queries you can try. |
| |
| See [Query SQL over HTTP](#query-sql-over-http) for an example of how to use the Druid SQL HTTP API. |
| |
| ## More Druid SQL examples |
| |
| Try the following queries to learn a few more Druid SQL tricks: |
| |
| ### Query over time |
| |
| ```sql |
| SELECT FLOOR(__time to HOUR) AS HourTime, SUM(deleted) AS LinesDeleted |
| FROM wikipedia WHERE TIME_IN_INTERVAL("__time", '2016-06-27/2016-06-28') |
| GROUP BY 1 |
| ``` |
| |
|  |
| |
| ### General group by |
| |
| ```sql |
| SELECT channel, page, SUM(added) |
| FROM wikipedia WHERE TIME_IN_INTERVAL("__time", '2016-06-27/2016-06-28') |
| GROUP BY channel, page |
| ORDER BY SUM(added) DESC |
| ``` |
| |
|  |
| |
| ## Query SQL over HTTP |
| |
| |
| You can submit native queries [directly to the Druid Broker over HTTP](../api-reference/sql-api.md#submit-a-query). The request body should be a JSON object, with the value for the key `query` containing text of the query: |
| |
| ```json |
| { |
| "query": "SELECT page, COUNT(*) AS Edits FROM wikipedia WHERE TIME_IN_INTERVAL(\"__time\", '2016-06-27/2016-06-28') GROUP BY page ORDER BY Edits DESC LIMIT 10" |
| } |
| ``` |
| |
| The tutorial package includes an example file that contains the SQL query shown above at `quickstart/tutorial/wikipedia-top-pages-sql.json`. Let's submit that query to the Druid Broker: |
| |
| ```bash |
| curl -X 'POST' -H 'Content-Type:application/json' -d @quickstart/tutorial/wikipedia-top-pages-sql.json http://localhost:8888/druid/v2/sql |
| ``` |
| |
| The following results should be returned: |
| |
| ```json |
| [ |
| { |
| "page": "Copa América Centenario", |
| "Edits": 29 |
| }, |
| { |
| "page": "User:Cyde/List of candidates for speedy deletion/Subpage", |
| "Edits": 16 |
| }, |
| { |
| "page": "Wikipedia:Administrators' noticeboard/Incidents", |
| "Edits": 16 |
| }, |
| { |
| "page": "2016 Wimbledon Championships – Men's Singles", |
| "Edits": 15 |
| }, |
| { |
| "page": "Wikipedia:Administrator intervention against vandalism", |
| "Edits": 15 |
| }, |
| { |
| "page": "Wikipedia:Vandalismusmeldung", |
| "Edits": 15 |
| }, |
| { |
| "page": "The Winds of Winter (Game of Thrones)", |
| "Edits": 12 |
| }, |
| { |
| "page": "ولاية الجزائر", |
| "Edits": 12 |
| }, |
| { |
| "page": "Copa América", |
| "Edits": 10 |
| }, |
| { |
| "page": "Lionel Messi", |
| "Edits": 10 |
| } |
| ] |
| ``` |
| |
| ## Further reading |
| |
| See the [Druid SQL documentation](../querying/sql.md) for more information on using Druid SQL queries. |
| |
| See the [Queries documentation](../querying/querying.md) for more information on Druid native queries. |