---
id: sql-translation
title: "SQL query translation"
sidebar_label: "SQL query translation"
---

<!--
~ Licensed to the Apache Software Foundation (ASF) under one
~ or more contributor license agreements. See the NOTICE file
~ distributed with this work for additional information
~ regarding copyright ownership. The ASF licenses this file
~ to you under the Apache License, Version 2.0 (the
~ "License"); you may not use this file except in compliance
~ with the License. You may obtain a copy of the License at
~
~ http://www.apache.org/licenses/LICENSE-2.0
~
~ Unless required by applicable law or agreed to in writing,
~ software distributed under the License is distributed on an
~ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
~ KIND, either express or implied. See the License for the
~ specific language governing permissions and limitations
~ under the License.
-->

> Apache Druid supports two query languages: Druid SQL and [native queries](querying.md).
> This document describes the SQL language.

Druid uses [Apache Calcite](https://calcite.apache.org/) to parse and plan SQL queries.
Druid translates SQL statements into its [native JSON-based query language](querying.md).
In general, the slight overhead of translating SQL on the Broker is the only performance penalty of using Druid SQL
compared to native queries.

This topic includes best practices and tools to help you achieve good performance and minimize the impact of translation.

## Best practices

Consider this (non-exhaustive) list of things to watch for when evaluating the performance implications of
how your SQL queries are translated to native queries.

1. If you wrote a filter on the primary time column `__time`, make sure it is being correctly translated to an
`"intervals"` filter, as described in the [Time filters](#time-filters) section below. If not, you may need to change
the way you write the filter.

2. Try to avoid subqueries underneath joins: they affect both performance and scalability. This includes implicit
subqueries generated by conditions on mismatched types, and implicit subqueries generated by conditions that use
expressions to refer to the right-hand side.

3. Currently, Druid does not support pushing down predicates (conditions and filters) past a join (i.e., into the
join's children). Druid only supports pushing predicates into the join if they originated from above the join. Hence,
the location of predicates and filters in your Druid SQL is very important. Also, as a result of this, comma joins
should be avoided.

4. Read through the [Query execution](query-execution.md) page to understand how various types of native queries
will be executed.

5. Be careful when interpreting EXPLAIN PLAN output, and use request logging if in doubt. Request logs will show the
exact native query that was run. See the [next section](#interpreting-explain-plan-output) for more details.

6. If you encounter a query that could be planned better, feel free to
[raise an issue on GitHub](https://github.com/apache/druid/issues/new/choose). A reproducible test case is always
appreciated.

## Interpreting EXPLAIN PLAN output

The [EXPLAIN PLAN](sql.md#explain-plan) functionality can help you understand how a given SQL query will
be translated to native. For simple queries that do not involve subqueries or joins, the output of EXPLAIN PLAN
is easy to interpret. The native query that will run is embedded as JSON inside a "DruidQueryRel" line:

```
> EXPLAIN PLAN FOR SELECT COUNT(*) FROM wikipedia

DruidQueryRel(query=[{"queryType":"timeseries","dataSource":"wikipedia","intervals":"-146136543-09-08T08:23:32.096Z/146140482-04-24T15:36:27.903Z","granularity":"all","aggregations":[{"type":"count","name":"a0"}]}], signature=[{a0:LONG}])
```

For more complex queries that do involve subqueries or joins, EXPLAIN PLAN is somewhat more difficult to interpret.
For example, consider this query:

```
> EXPLAIN PLAN FOR
> SELECT
>   channel,
>   COUNT(*)
> FROM wikipedia
> WHERE channel IN (SELECT page FROM wikipedia GROUP BY page ORDER BY COUNT(*) DESC LIMIT 10)
> GROUP BY channel

DruidJoinQueryRel(condition=[=($1, $3)], joinType=[inner], query=[{"queryType":"groupBy","dataSource":{"type":"table","name":"__join__"},"intervals":{"type":"intervals","intervals":["-146136543-09-08T08:23:32.096Z/146140482-04-24T15:36:27.903Z"]},"granularity":"all","dimensions":["channel"],"aggregations":[{"type":"count","name":"a0"}]}], signature=[{d0:STRING, a0:LONG}])
  DruidQueryRel(query=[{"queryType":"scan","dataSource":{"type":"table","name":"wikipedia"},"intervals":{"type":"intervals","intervals":["-146136543-09-08T08:23:32.096Z/146140482-04-24T15:36:27.903Z"]},"resultFormat":"compactedList","columns":["__time","channel","page"],"granularity":"all"}], signature=[{__time:LONG, channel:STRING, page:STRING}])
  DruidQueryRel(query=[{"queryType":"topN","dataSource":{"type":"table","name":"wikipedia"},"dimension":"page","metric":{"type":"numeric","metric":"a0"},"threshold":10,"intervals":{"type":"intervals","intervals":["-146136543-09-08T08:23:32.096Z/146140482-04-24T15:36:27.903Z"]},"granularity":"all","aggregations":[{"type":"count","name":"a0"}]}], signature=[{d0:STRING}])
```

Here, there is a join with two inputs. The way to read this is to consider each line of the EXPLAIN PLAN output as
something that might become a query, or might just become a simple datasource. The `query` field on each line is
called a "partial query" and represents the query that would run on the datasource represented by that line, if that
line ran by itself. In some cases, like the "scan" query in the second line of this example, the partial query does
not actually run, and it ends up being translated to a simple table datasource. See the [Join translation](#joins)
section for more details about how this works.

We can see this for ourselves using Druid's [request logging](../configuration/index.md#request-logging) feature. After
enabling logging and running this query, we can see that it actually runs as the following native query.

```json
{
  "queryType": "groupBy",
  "dataSource": {
    "type": "join",
    "left": "wikipedia",
    "right": {
      "type": "query",
      "query": {
        "queryType": "topN",
        "dataSource": "wikipedia",
        "dimension": {"type": "default", "dimension": "page", "outputName": "d0"},
        "metric": {"type": "numeric", "metric": "a0"},
        "threshold": 10,
        "intervals": "-146136543-09-08T08:23:32.096Z/146140482-04-24T15:36:27.903Z",
        "granularity": "all",
        "aggregations": [
          { "type": "count", "name": "a0"}
        ]
      }
    },
    "rightPrefix": "j0.",
    "condition": "(\"page\" == \"j0.d0\")",
    "joinType": "INNER"
  },
  "intervals": "-146136543-09-08T08:23:32.096Z/146140482-04-24T15:36:27.903Z",
  "granularity": "all",
  "dimensions": [
    {"type": "default", "dimension": "channel", "outputName": "d0"}
  ],
  "aggregations": [
    { "type": "count", "name": "a0"}
  ]
}
```

## Query types

Druid SQL uses four different native query types.

- [Scan](scan-query.md) is used for queries that do not aggregate (no GROUP BY, no DISTINCT).

- [Timeseries](timeseriesquery.md) is used for queries that GROUP BY `FLOOR(__time TO unit)` or `TIME_FLOOR(__time,
period)`, have no other grouping expressions, no HAVING or LIMIT clauses, no nesting, and either no ORDER BY, or an
ORDER BY that orders by the same expression as the GROUP BY. Druid also uses Timeseries for "grand total" queries that
have aggregation functions but no GROUP BY. This query type takes advantage of the fact that Druid segments are sorted
by time.

- [TopN](topnquery.md) is used by default for queries that group by a single expression, do have ORDER BY and LIMIT
clauses, do not have HAVING clauses, and are not nested. However, the TopN query type will deliver approximate ranking
and results in some cases; if you want to avoid this, set "useApproximateTopN" to "false". TopN results are always
computed in memory. See the TopN documentation for more details.

- [GroupBy](groupbyquery.md) is used for all other aggregations, including any nested aggregation queries. Druid's
GroupBy is a traditional aggregation engine: it delivers exact results and rankings and supports a wide variety of
features. GroupBy aggregates in memory if it can, but it may spill to disk if it doesn't have enough memory to complete
your query. Results are streamed back from data processes through the Broker if you ORDER BY the same expressions in your
GROUP BY clause, or if you don't have an ORDER BY at all. If your query has an ORDER BY referencing expressions that
don't appear in the GROUP BY clause (like aggregation functions) then the Broker will materialize a list of results in
memory, up to a max of your LIMIT, if any. See the GroupBy documentation for details about tuning performance and memory
use.
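
To make the Timeseries case above concrete, here is a minimal sketch of the kind of native query that a statement like
`SELECT FLOOR(__time TO DAY), COUNT(*) FROM wikipedia GROUP BY 1` would typically translate to. The datasource,
aggregator name, and exact serialization are illustrative assumptions rather than captured EXPLAIN PLAN output:

```json
{
  "queryType": "timeseries",
  "dataSource": "wikipedia",
  "intervals": ["-146136543-09-08T08:23:32.096Z/146140482-04-24T15:36:27.903Z"],
  "granularity": {"type": "period", "period": "P1D", "timeZone": "UTC"},
  "aggregations": [
    {"type": "count", "name": "a0"}
  ]
}
```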

## Time filters

For all native query types, filters on the `__time` column will be translated into top-level query "intervals" whenever
possible, which allows Druid to use its global time index to quickly prune the set of data that must be scanned.
Consider this (non-exhaustive) list of time filters that will be recognized and translated to "intervals":

- `__time >= TIMESTAMP '2000-01-01 00:00:00'` (comparison to absolute time)
- `__time >= CURRENT_TIMESTAMP - INTERVAL '8' HOUR` (comparison to relative time)
- `FLOOR(__time TO DAY) = TIMESTAMP '2000-01-01 00:00:00'` (specific day)

Refer to the [Interpreting EXPLAIN PLAN output](#interpreting-explain-plan-output) section for details on confirming
that time filters are being translated as you expect.
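
For example, here is a rough sketch of how the first filter above might surface in a translated non-aggregating query.
The upper bound is the maximum time used when no upper limit is specified, as in the examples above; the datasource and
column list are illustrative:

```json
{
  "queryType": "scan",
  "dataSource": "wikipedia",
  "intervals": ["2000-01-01T00:00:00.000Z/146140482-04-24T15:36:27.903Z"],
  "columns": ["__time", "channel", "page"],
  "resultFormat": "compactedList"
}
```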

## Joins

SQL join operators are translated to native join datasources as follows:

1. Joins that the native layer can handle directly are translated literally to a [join datasource](datasource.md#join)
whose `left`, `right`, and `condition` are faithful translations of the original SQL. This includes any SQL join where
the right-hand side is a lookup or subquery, and where the condition is an equality in which one side is an expression
based on the left-hand table, the other side is a simple column reference to the right-hand table, and both sides of
the equality have the same data type.

2. If a join cannot be handled directly by a native [join datasource](datasource.md#join) as written, Druid SQL
will insert subqueries to make it runnable. For example, `foo INNER JOIN bar ON foo.abc = LOWER(bar.def)` cannot be
directly translated, because there is an expression on the right-hand side instead of a simple column access. A subquery
will be inserted that effectively transforms this clause to
`foo INNER JOIN (SELECT LOWER(def) AS def FROM bar) t ON foo.abc = t.def`. See the sketch after this list.

3. Druid SQL does not currently reorder joins to optimize queries.
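
For illustration, here is a minimal sketch of how the rewritten `LOWER(bar.def)` join from item 2 might appear as a
native join datasource. The table names, the virtual column name `v0`, and the `j0.` prefix are illustrative
assumptions, not captured planner output:

```json
{
  "type": "join",
  "left": "foo",
  "right": {
    "type": "query",
    "query": {
      "queryType": "scan",
      "dataSource": "bar",
      "intervals": ["-146136543-09-08T08:23:32.096Z/146140482-04-24T15:36:27.903Z"],
      "virtualColumns": [
        {"type": "expression", "name": "v0", "expression": "lower(\"def\")", "outputType": "STRING"}
      ],
      "columns": ["v0"],
      "resultFormat": "compactedList"
    }
  },
  "rightPrefix": "j0.",
  "condition": "(\"abc\" == \"j0.v0\")",
  "joinType": "INNER"
}
```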

Refer to the [Interpreting EXPLAIN PLAN output](#interpreting-explain-plan-output) section for details on confirming
that joins are being translated as you expect.

Refer to the [Query execution](query-execution.md#join) page for information about how joins are executed.

## Subqueries

Subqueries in SQL are generally translated to native query datasources. Refer to the
[Query execution](query-execution.md#query) page for information about how subqueries are executed.

> Note: Subqueries in the WHERE clause, like `WHERE col1 IN (SELECT foo FROM ...)`, are translated to inner joins.

## Approximations

Druid SQL will use approximate algorithms in some situations:

- The `COUNT(DISTINCT col)` aggregation function by default uses a variant of
[HyperLogLog](http://algo.inria.fr/flajolet/Publications/FlFuGaMe07.pdf), a fast approximate distinct counting
algorithm. Druid SQL will switch to exact distinct counts if you set "useApproximateCountDistinct" to "false", either
through query context or through Broker configuration.

- GROUP BY queries over a single column with ORDER BY and LIMIT may be executed using the TopN engine, which uses an
approximate algorithm. Druid SQL will switch to an exact grouping algorithm if you set "useApproximateTopN" to "false",
either through query context or through Broker configuration.

- Aggregation functions that are labeled as using sketches or approximations, such as APPROX_COUNT_DISTINCT, are always
approximate, regardless of configuration.
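
For example, here is a minimal sketch of disabling both approximations for a single query by passing context parameters
in the JSON body of a request to the SQL API endpoint (`/druid/v2/sql/`). The query text is illustrative:

```json
{
  "query": "SELECT COUNT(DISTINCT channel) FROM wikipedia",
  "context": {
    "useApproximateCountDistinct": false,
    "useApproximateTopN": false
  }
}
```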

**A known issue with approximate functions based on data sketches**

The `APPROX_QUANTILE_DS` and `DS_QUANTILES_SKETCH` functions can fail with an `IllegalStateException` if one of the sketches for
the query hits `maxStreamLength`: the maximum number of items to store in each sketch.
See [GitHub issue 11544](https://github.com/apache/druid/issues/11544) for more details.
To work around the issue, increase the maximum stream length with the `approxQuantileDsMaxStreamLength` parameter
in the query context. Since it is set to 1,000,000,000 by default, you don't need to override it in most cases.
See [accuracy information](https://datasketches.apache.org/docs/Quantiles/OrigQuantilesSketch) in the DataSketches documentation for how many bytes are required per stream length.
This query context parameter is a temporary solution to avoid the known issue. It may be removed in a future release after the bug is fixed.
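
A minimal sketch of raising the limit through the query context in a SQL API request body follows; the query and the
value shown are illustrative:

```json
{
  "query": "SELECT APPROX_QUANTILE_DS(added, 0.98) FROM wikipedia",
  "context": {
    "approxQuantileDsMaxStreamLength": 2000000000
  }
}
```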

## Unsupported features

Druid does not support all SQL features. In particular, the following features are not supported.

- JOIN between native datasources (table, lookup, subquery) and [system tables](sql-metadata-tables.md).
- JOIN conditions that are not an equality between expressions from the left- and right-hand sides.
- JOIN conditions containing a constant value inside the condition.
- JOIN conditions on a column which contains a multi-value dimension.
- OVER clauses, and analytic functions such as `LAG` and `LEAD`.
- ORDER BY for a non-aggregating query, except for `ORDER BY __time` or `ORDER BY __time DESC`, which are supported.
This restriction only applies to non-aggregating queries; you can ORDER BY any column in an aggregating query.
- DDL and DML.
- Using Druid-specific functions like `TIME_PARSE` and `APPROX_QUANTILE_DS` on [system tables](sql-metadata-tables.md).

Additionally, some Druid native query features are not supported by the SQL language. Some unsupported Druid features
include:

- [Inline datasources](datasource.md#inline).
- [Spatial filters](../development/geo.md).
- [Multi-value dimensions](sql-data-types.md#multi-value-strings) are only partially implemented in Druid SQL. There are known
inconsistencies between their behavior in SQL queries and in native queries due to how they are currently treated by
the SQL planner.
