commit | 202c78c8f32bd3ec5e70e021b1494d878f38d678 | |
---|---|---|
author | Gian Merlino <gianmerlino@gmail.com> | Wed Apr 14 10:49:27 2021 -0700 |
committer | GitHub <noreply@github.com> | Wed Apr 14 10:49:27 2021 -0700 |
tree | 7b2f3dc6d420dabdbedc96a65e15a7903d022284 | |
parent | b51632b0bf38d1ef396e62a4851d84433029713d | |
Enable rewriting certain inner joins as filters. (#11068)

* Enable rewriting certain inner joins as filters.

The main logic for doing the rewrite is in JoinableFactoryWrapper's
segmentMapFn method. The requirements are:

- It must be an inner equi-join.
- The right-hand columns referenced by the condition must not contain any
  duplicate values. (If they did, the inner join would not be guaranteed to
  return at most one row for each left-hand-side row.)
- No columns from the right-hand side can be used by anything other than the
  join condition itself.

HashJoinSegmentStorageAdapter is also modified to pass through to the base
adapter (even allowing vectorization!) in the case where 100% of join clauses
could be rewritten as filters.

In support of this goal:

- Add Query getRequiredColumns() method to help us figure out whether the
  right-hand side of a join datasource is being used or not.
- Add JoinConditionAnalysis getRequiredColumns() method to help us figure out
  if the right-hand side of a join is being used by later join clauses acting
  on the same base.
- Add Joinable getNonNullColumnValuesIfAllUnique method to enable retrieving
  the set of values that will form the "in" filter.
- Add LookupExtractor canGetKeySet() and keySet() methods to support
  LookupJoinable in its efforts to implement the new Joinable method.
- Add "enableRewriteJoinToFilter" feature flag to JoinFilterRewriteConfig.
  The default is disabled.

* Test improvements.

* Test fixes.

* Avoid slow size() call.

* Remove invalid test.

* Fix style.

* Fix mistaken default.

* Small fixes.

* Fix logic error.
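The rewrite conditions in the commit message can be illustrated with a small standalone sketch. This is not Druid's code: the class `JoinToFilterSketch` and its methods are hypothetical, rows are reduced to a single string key column, and the handling of nulls (skipped on both sides) is an assumption made for the example. It only shows why an inner equi-join whose right-hand keys are unique, and whose right-hand columns are otherwise unused, returns the same left-hand rows as an "in" filter over those keys.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Optional;
import java.util.Set;

// Hypothetical illustration only; none of these names exist in Druid.
class JoinToFilterSketch {

    // Loosely modeled on the idea described for getNonNullColumnValuesIfAllUnique:
    // return the non-null key values, or empty if any non-null value repeats.
    static Optional<Set<String>> nonNullValuesIfAllUnique(List<String> rightKeys) {
        Set<String> seen = new HashSet<>();
        for (String key : rightKeys) {
            if (key == null) {
                continue; // assumption: null keys never match, so they are skipped
            }
            if (!seen.add(key)) {
                return Optional.empty(); // duplicate found: the rewrite is not safe
            }
        }
        return Optional.of(seen);
    }

    // Naive inner equi-join that keeps only the left-hand value,
    // i.e. no right-hand column is used beyond the join condition.
    static List<String> innerJoinLeftOnly(List<String> left, List<String> rightKeys) {
        List<String> out = new ArrayList<>();
        for (String l : left) {
            for (String r : rightKeys) {
                if (l != null && l.equals(r)) {
                    out.add(l);
                }
            }
        }
        return out;
    }

    // The rewritten form: keep left-hand rows whose key is in the value set.
    static List<String> filterIn(List<String> left, Set<String> values) {
        List<String> out = new ArrayList<>();
        for (String l : left) {
            if (l != null && values.contains(l)) {
                out.add(l);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> left = Arrays.asList("a", "b", "b", "c", null);
        List<String> uniqueRight = Arrays.asList("b", "c", "d", null);

        Optional<Set<String>> values = nonNullValuesIfAllUnique(uniqueRight);
        if (values.isPresent()) {
            // Unique right-hand keys: the filter and the join agree.
            System.out.println(filterIn(left, values.get()));
            System.out.println(innerJoinLeftOnly(left, uniqueRight));
        }

        // Duplicate right-hand keys: the join would multiply rows, so no rewrite.
        List<String> duplicatedRight = Arrays.asList("b", "b", "c");
        System.out.println(nonNullValuesIfAllUnique(duplicatedRight).isPresent());
    }
}
```

With duplicated right-hand keys the join emits each matching left-hand row more than once while the filter emits it once, which is why the uniqueness requirement is listed above.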
Website | Documentation | Developer Mailing List | User Mailing List | Slack | Twitter | Download
Druid is a high-performance, real-time analytics database. Druid's main value add is to reduce time to insight and action.
Druid is designed for workflows where fast queries and ingest really matter. Druid excels at powering UIs, running operational (ad-hoc) queries, or handling high concurrency. Consider Druid as an open source alternative to data warehouses for a variety of use cases.
You can get started with Druid with our local or Docker quickstart.
Druid provides a rich set of APIs (via HTTP and JDBC) for loading, managing, and querying your data. You can also interact with Druid via the built-in console (shown below).
Load streaming and batch data using a point-and-click wizard to guide you through ingestion setup. Monitor one-off tasks and ingestion supervisors.
Manage your cluster with ease. Get a view of your datasources, segments, ingestion tasks, and services from one convenient location. Each view is powered by SQL system tables, so you can see the underlying query for it.
Use the built-in query workbench to prototype DruidSQL and native queries or connect one of the many tools that help you make the most out of Druid.
You can find the documentation for the latest Druid release on the project website.
If you would like to contribute documentation, please do so under /docs in this repository and submit a pull request.
Community support is available on the druid-user mailing list, which is hosted at Google Groups.
Development discussions occur on dev@druid.apache.org, which you can subscribe to by emailing dev-subscribe@druid.apache.org.
Chat with Druid committers and users in real time on the #druid channel in the Apache Slack team. Please use this invitation link to join the ASF Slack, and once joined, go into the #druid channel.
Please note that JDK 8 is required to build Druid.
For instructions on building Druid from source, see docs/development/build.md.
Please follow the community guidelines for contributing.
For instructions on setting up IntelliJ, see dev/intellij-setup.md.