commit | 603091ed772f3f82511fd8fec355fe9b0126933b |
---|---|
author | Csaba Ringhofer <csringhofer@cloudera.com> | Sat May 01 16:19:19 2021 +0200
committer | Csaba Ringhofer <csringhofer@cloudera.com> | Wed May 05 05:41:29 2021 +0000
tree | 8e2bfe524790bb00a833db2eaf879b341a36d46d |
parent | f0f083e45e2c77b1499fa6fa08ff8d9dc4a2785f |
IMPALA-10692: Fix acid insert when event polling is enabled

IMPALA-10656 broke inserts to acid tables when HMS event polling is enabled. The issue was that the new partitions created during insert were not added to the catalog table yet when createInsertEvents is called, as the table is reloaded only after firing the events and committing the transaction.

The fix is to create the INSERT event based on the partition name and the fileset alone for new partitions. Already existing partitions need the Partition object as we add the event to the list of the partition's in-flight events to detect self-events, but luckily new partitions don't need self event-handling because:
- new partitions fire events only if the table is ACID
- ACID inserts don't fire any INSERT event visible to Impala, so it cannot cause an unnecessary metadata reload

ACID inserts from Hive work differently, they always cause an ALTER_TABLE or ALTER_PARTITION event which are detected by Impala and lead to metadata reload. I think that this situation is hacky at best because these events come before COMMIT event (which is currently ignored by Impala), so Impala may reload the table too early (before the commit is finished).

Testing:
- added acid tables to TestEventProcessing.test_self_events

Change-Id: I8c2d0702232538a746410539ad55f87b7fde57e7
Reviewed-on: http://gerrit.cloudera.org:8080/17380
Reviewed-by: Csaba Ringhofer <csringhofer@cloudera.com>
Tested-by: Csaba Ringhofer <csringhofer@cloudera.com>
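
The logic described in the commit message can be illustrated with a small sketch. This is not Impala's actual code (the catalog is not written in Python); `Partition`, `InsertEvent`, and `create_insert_events` are hypothetical names chosen for illustration, and the event ids are a stand-in for whatever the metastore would return. It only shows the idea: existing partitions record the event among their in-flight events for self-event detection, while new partitions get an event built from the partition name and file set alone.

```python
from dataclasses import dataclass, field


@dataclass
class Partition:
    """Hypothetical stand-in for a partition already known to the catalog."""
    name: str
    in_flight_events: list = field(default_factory=list)


@dataclass
class InsertEvent:
    """Hypothetical INSERT event: only a partition name and a file set."""
    event_id: int
    partition_name: str
    files: tuple


def create_insert_events(catalog_partitions, inserted_files, is_acid,
                         first_event_id=1):
    """Build INSERT events for every partition touched by an insert.

    catalog_partitions: dict of partition name -> Partition (known partitions)
    inserted_files:     dict of partition name -> list of newly written files
    is_acid:            whether the target table is transactional
    """
    events = []
    next_id = first_event_id
    for part_name, files in sorted(inserted_files.items()):
        partition = catalog_partitions.get(part_name)
        if partition is None and not is_acid:
            # Per the commit message, new partitions fire events only
            # if the table is ACID.
            continue
        event = InsertEvent(next_id, part_name, tuple(files))
        next_id += 1
        if partition is not None:
            # Existing partition: remember the event so that it can later
            # be recognized as a self-event.
            partition.in_flight_events.append(event.event_id)
        # New partition: no Partition object exists in the catalog yet, so
        # the event is built from the name and file set alone and no
        # self-event bookkeeping is done.
        events.append(event)
    return events
```

For an ACID table with one known partition `p=1` and one new partition `p=2`, this sketch produces two events but registers an in-flight event only on `p=1`.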
Lightning-fast, distributed SQL queries for petabytes of data stored in Apache Hadoop clusters.
Impala is a modern, massively-distributed, massively-parallel, C++ query engine that lets you analyze, transform and combine data from a variety of data sources.
The fastest way to try out Impala is a quickstart Docker container. You can run queries and process data sets in Impala on a single machine without installing dependencies. The container can automatically load test data sets into Apache Kudu and Apache Parquet formats, so you can start experimenting with Apache Impala SQL within minutes.
To learn more about Impala as a user or administrator, or to try Impala, please visit the Impala homepage. Detailed documentation for administrators and users is available at Apache Impala documentation.
If you are interested in contributing to Impala as a developer, or learning more about Impala's internals and architecture, visit the Impala wiki.
Impala only supports Linux at the moment. Impala supports x86_64 and has experimental support for arm64 (as of Impala 4.0). Impala Requirements contains more detailed information on the minimum CPU requirements.
This distribution uses cryptographic software and may be subject to export controls. Please refer to EXPORT_CONTROL.md for more information.
See Impala's developer documentation to get started.
The detailed build notes contain further information on the project layout and build.