Starting with version 4.2.0, Apache Impala provides full support for querying data stored in Apache Ozone. To utilize this functionality, ensure that your Ozone version is 1.4.0 or later.
Impala supports the following protocols for accessing Ozone data: ofs, and S3A through the Ozone S3 Gateway.
Note: The o3fs protocol is NOT supported by Impala.
Impala is compatible with Ozone buckets configured with either:
Impala provides two approaches to interact with Ozone: managed tables and external tables.
If the Hive Warehouse Directory is located in Ozone, you can execute Impala queries without any changes, treating the Ozone file system like HDFS. For example:
CREATE DATABASE d1;
CREATE TABLE t1 (x INT, s STRING);
The data will be stored under the Hive Warehouse Directory path in Ozone.
You can create managed databases, tables, or partitions at a specific Ozone path using the LOCATION clause. For example:
CREATE DATABASE d1 LOCATION 'ofs://ozone1/vol1/bucket1/d1.db';
CREATE TABLE t1 (x INT, s STRING) LOCATION 'ofs://ozone1/vol1/bucket1/table1';
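The statements above cover databases and tables but not partitions. A minimal sketch of the partition case follows, assuming a hypothetical partitioned table named sales; the table name, columns, and Ozone paths are illustrative and not taken from the original examples.

```sql
-- Hypothetical partitioned table; name, columns, and paths are illustrative only.
CREATE TABLE sales (id INT, amount DOUBLE)
PARTITIONED BY (year INT)
LOCATION 'ofs://ozone1/vol1/bucket1/sales';

-- Place a single partition at an explicit Ozone path instead of the default
-- subdirectory under the table location.
ALTER TABLE sales ADD PARTITION (year = 2024)
LOCATION 'ofs://ozone1/vol1/bucket1/sales_2024';
```

Partitions added without a LOCATION clause are stored under the table's own path.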
You can create an external table in Impala to query Ozone data. For example:
CREATE EXTERNAL TABLE external_table ( id INT, name STRING ) LOCATION 'ofs://ozone1/vol1/bucket1/table1';
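Once the external table exists, it can be queried like any other Impala table. The sketch below assumes data files matching the declared columns are already present at the table location; the REFRESH step is only needed if files are added there after the table is created.

```sql
-- Reload file metadata if files were added under the LOCATION outside Impala.
REFRESH external_table;

-- Read a few rows from the Ozone-backed external table.
SELECT id, name FROM external_table LIMIT 10;
```

Dropping an external table removes only the table metadata; the files in Ozone are left in place.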
In addition to ofs, Impala can access Ozone via the S3 Gateway using the S3A file system. For more details, refer to
For additional information, see "Using Impala with Apache Ozone Storage" in the Apache Impala User Documentation.