Apache Hive has supported Apache Ozone since Hive 4.0. To enable Hive to work with Ozone paths, ensure that the ozone-filesystem-hadoop3
JAR is added to the Hive classpath.
Hive supports the following protocols for accessing Ozone data:
Hive is compatible with Ozone buckets configured with either:
Hive provides two methods to interact with data in Ozone:
To store managed tables in Ozone, update the following properties in the hive-site.xml
configuration file:
<property> <name>hive.metastore.warehouse.dir</name> <value>ofs://ozone1/vol1/bucket1/warehouse/</value> </property>
You can create a managed table with a standard CREATE TABLE
statement:
CREATE TABLE myTable ( id INT, name STRING );
Data can be loaded into a Hive table from an Ozone location:
LOAD DATA INPATH 'ofs://ozone1/vol1/bucket1/table.csv' INTO TABLE myTable;
You can define a custom Ozone path for a database using the MANAGEDLOCATION
clause:
CREATE DATABASE d1 MANAGEDLOCATION 'ofs://ozone1/vol1/bucket1/data';
Tables created in the database d1 will be stored under the specified path: ofs://ozone1/vol1/bucket1/data
You can confirm that Hive references the correct Ozone path using:
SHOW CREATE DATABASE d1;
Output Example:
+----------------------------------------------------+ | createdb_stmt | +----------------------------------------------------+ | CREATE DATABASE `d1` | | LOCATION | | 'ofs://ozone1/vol1/bucket1/external/d1.db' | | MANAGEDLOCATION | | 'ofs://ozone1/vol1/bucket1/data' | +----------------------------------------------------+
Hive allows the creation of external tables to query existing data stored in Ozone.
CREATE EXTERNAL TABLE external_table ( id INT, name STRING ) LOCATION 'ofs://ozone1/vol1/bucket1/table1';
To set a default path for external tables, configure the following property in the hive-site.xml
file:
<property> <name>hive.metastore.warehouse.external.dir</name> <value>ofs://ozone1/vol1/bucket1/external/</value> </property>
This property specifies the base directory for external tables when no explicit LOCATION
is provided.
To confirm the table's metadata and location, use:
SHOW CREATE TABLE external_table;
Output Example:
+----------------------------------------------------+ | createtab_stmt | +----------------------------------------------------+ | CREATE EXTERNAL TABLE `external_table`( | | `id` int, | | `name` string) | | ROW FORMAT SERDE | | 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe' | | STORED AS INPUTFORMAT | | 'org.apache.hadoop.mapred.TextInputFormat' | | OUTPUTFORMAT | | 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat' | | LOCATION | | 'ofs://ozone1/vol1/bucket1/table1' | | TBLPROPERTIES ( | | 'bucketing_version'='2', | | 'transient_lastDdlTime'='1734725573') | +----------------------------------------------------+
In addition to ofs, Hive can access Ozone using the S3 Gateway via the S3A file system.
For more information, consult: