IMPALA-14796: Show effective runtime filter targets in profile

This patch adds an "Eff. Tgt. Node(s)" (Effective Target Node(s)) column
to the "Final filter table" in the query profile. This shows which scan
nodes actually had rows rejected by each runtime filter, distinguishing
filters that were effective from those that were applied but rejected no
data. E.g.

  ID  Src. Node  Tgt. Node(s)  Eff. Tgt. Node(s)  Target type     ...
  --------------------------------------------------------------------
  10  6          2             2                  LOCAL           ...
   8  7          1             1                  REMOTE          ...
   5  8          2             2                  LOCAL           ...
   4  8          0             N                  REMOTE          ...
   2  9          0, 3          0, 3               REMOTE, REMOTE  ...
   0  10         4             4                  REMOTE          ...

In the above example, filter 4 has "N" in the "Eff. Tgt. Node(s)" column,
which means it doesn't filter out any rows, i.e. effective target node is
"None". All the other filters are effective.

Implementation
- In ScanNode::Close(), collect the effective runtime filter ids by
  checking the "rejected" counters of all the FilterStats. These counters
  correspond to "Files rejected", "RowGroups rejected", "Rows rejected",
  "Splits rejected" in the query profile. If any of them is non-zero, the
  filter has rejected some data so it's effective.
- Executor reports this info to coordinator via ReportExecStatus RPCs. A
  list of (filter_id, scan_node_id) pairs is added in
  ReportExecStatusRequestPB to carry this info.
- Coordinator aggregates the effective filter targets when processing the
  status reports.
- In FilterDebugString(), add a column to show the node ids where the
  runtime filter is effective.

Other minor changes
- In coordinator.cc, move the code of setting the "Final filter table"
  from ReleaseExecResources() to ComputeQuerySummary() to ensure the final
  status reports from backends all arrive.
- Removed temp_object_pool and temp_mem_tracker from FilterDebugString()
  as they have been unused since commit a985e11.
- Replaced boost::lexical_cast<string> with std::to_string in converting
  int to string which is more optimized.
- Sort node ids in "Tgt. Node(s)" and "Eff. Tgt. Node(s)" columns to make
  the output consistent across different runs.

Limitation
- Kudu scanner doesn't expose metrics reflecting effect of individual
  filters so we can't detect effective runtime filters on KuduScanNode.
  Currently the "Eff. Tgt. Node(s)" column of them always has value "N"
  (IMPALA-15002).

Tests
- Added e2e test for TPCH-Q5 where some filters are ineffective in both
  the original profile and aggregated profile modes.
- Added checks in runtime_filters.test for queries that have only one
  runtime filter.
- Updated in_list_filters.test for the new column.
- Ran tests on both the original planner and the calcite planner.

Assisted-by: Claude Sonnet 4.5
Change-Id: Iccf4b87ac4579a70273f3306ec7b58850f06b17c
Reviewed-on: http://gerrit.cloudera.org:8080/24123
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
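
The detection and aggregation steps described in the commit message above can be illustrated with a small Python model. This is only a sketch of the stated logic, not Impala's C++ code: the counter names, the report shape (a list of `(filter_id, scan_node_id)` pairs per status report), and all function names are assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical counter names, modelled on the "Files rejected", "RowGroups rejected",
# "Rows rejected" and "Splits rejected" counters named in the commit message.
REJECTED_COUNTERS = (
    "files_rejected",
    "row_groups_rejected",
    "rows_rejected",
    "splits_rejected",
)


def effective_filter_ids(filter_stats):
    """Return sorted ids of filters whose stats show any rejected data.

    filter_stats maps filter_id -> {counter_name: count} for one scan node.
    """
    return sorted(
        filter_id
        for filter_id, counters in filter_stats.items()
        if any(counters.get(name, 0) > 0 for name in REJECTED_COUNTERS)
    )


def aggregate_reports(reports):
    """Merge (filter_id, scan_node_id) pairs from all status reports.

    A set per filter removes duplicates when several backends report
    the same scan node for the same filter.
    """
    effective = defaultdict(set)
    for report in reports:
        for filter_id, scan_node_id in report:
            effective[filter_id].add(scan_node_id)
    return effective


def format_effective_targets(effective, filter_id):
    """Render the "Eff. Tgt. Node(s)" cell: sorted node ids, or "N" for none."""
    nodes = effective.get(filter_id)
    if not nodes:
        return "N"
    return ", ".join(str(node) for node in sorted(nodes))
```

Sorting the node ids before joining them mirrors the commit's goal of making the column stable across runs, since the order in which backend reports arrive is not deterministic.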
Lightning-fast, distributed SQL queries for petabytes of data stored in open data and table formats.
Impala is a modern, massively-distributed, massively-parallel, C++ query engine that lets you analyze, transform and combine data from a variety of data sources.
The fastest way to try out Impala is a quickstart Docker container. You can run queries and process data sets in Impala on a single machine without installing dependencies. The container can automatically load test data sets into Apache Kudu and Apache Parquet formats, so you can start experimenting with Apache Impala SQL within minutes.
To learn more about Impala as a user or administrator, or to try Impala, please visit the Impala homepage. Detailed documentation for administrators and users is available at Apache Impala documentation.
If you are interested in contributing to Impala as a developer, or learning more about Impala's internals and architecture, visit the Impala wiki.
Impala currently supports only Linux, on x86_64 and arm64 (as of Impala 4.4). See Impala Requirements for more detailed information on the minimum CPU requirements.
Impala runs on Linux systems only. Distributions outside the set tested by the community, e.g. SLES15, may also be supported but are not tested.
This distribution uses cryptographic software and may be subject to export controls. Please refer to EXPORT_CONTROL.md for more information.
See Impala's developer documentation to get started.
Detailed build notes has more information on the project layout and build.