<?xml version="1.0" encoding="UTF-8"?>
<!--
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
-->
<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd">
<concept rev="4.2.0" id="impala_ozone">
<title>Using Impala with Apache Ozone Storage</title>
<titlealts audience="PDF">
<navtitle>Ozone Storage</navtitle>
</titlealts>
<prolog>
<metadata>
<data name="Category" value="Impala"/>
<data name="Category" value="Ozone"/>
<data name="Category" value="Disk Storage"/>
<data name="Category" value="Administrators"/>
<data name="Category" value="Developers"/>
<data name="Category" value="Data Analysts"/>
</metadata>
</prolog>
<conbody>
<p>
<indexterm audience="hidden">Ozone</indexterm>
You can use Impala to query data files that reside on Apache Ozone distributed storage,
rather than in HDFS. The combination of the Impala query engine and Apache Ozone storage
is certified on <keyword keyref="impala42"/> or higher.
</p>
<p>
For more information on Ozone, see <xref keyref="upstream_ozone_site"/>.
</p>
<p>
The typical use case for Impala and Ozone together is to use Ozone as the default
filesystem, replacing HDFS entirely. In this configuration, when you create a database,
table, or partition, the data always resides on Ozone storage and you do not need to
specify a <codeph>LOCATION</codeph> attribute. If you do specify a
<codeph>LOCATION</codeph> attribute, its value refers to a path within the Ozone
filesystem. For example:
</p>
<codeblock>-- If the default filesystem is Ozone, all Impala data resides there
-- and all Impala databases and tables are located there.
CREATE TABLE t1 (x INT, s STRING);
-- You can specify LOCATION for database, table, or partition,
-- using values from the Ozone filesystem.
CREATE DATABASE d1 LOCATION '/some/path/on/ozone/server/d1.db';
CREATE TABLE d1.t2 (a TINYINT, b BOOLEAN);
</codeblock>
<p>
Impala can write to, delete, and rename data files and database, table, and partition
directories on Ozone storage. Therefore, Impala statements such as <codeph>CREATE
TABLE</codeph>, <codeph>DROP TABLE</codeph>, <codeph>CREATE DATABASE</codeph>,
<codeph>DROP DATABASE</codeph>, <codeph>ALTER TABLE</codeph>, and <codeph>INSERT</codeph>
work the same with Ozone storage as with HDFS.
</p>
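<p>
As a minimal sketch of the statements listed above, the following sequence assumes that
Ozone is the default filesystem. The database name <codeph>ozone_demo</codeph>, the table
names, and the sample rows are hypothetical and chosen only for illustration:
</p>
<codeblock>-- Hypothetical names; assumes the default filesystem is Ozone.
CREATE DATABASE ozone_demo;
CREATE TABLE ozone_demo.events (id BIGINT, msg STRING) PARTITIONED BY (year INT);
-- INSERT writes new data files under the partition directory on Ozone.
INSERT INTO ozone_demo.events PARTITION (year=2024) VALUES (1, 'first'), (2, 'second');
-- ALTER TABLE can add partition directories or rename the table.
ALTER TABLE ozone_demo.events ADD PARTITION (year=2025);
ALTER TABLE ozone_demo.events RENAME TO ozone_demo.events_archive;
-- DROP statements remove the table and database.
DROP TABLE ozone_demo.events_archive;
DROP DATABASE ozone_demo;
</codeblock>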
<p>
Ozone supports multiple protocols: <codeph>ofs</codeph>, <codeph>o3fs</codeph>, and
<codeph>s3a</codeph>. Impala can read data through <codeph>ofs</codeph> and
<codeph>o3fs</codeph>, and can also read through <codeph>s3a</codeph> (see
<xref href="impala_s3.xml#s3"/>). However, <codeph>ofs</codeph> is the newer of the Ozone
protocols and the only one that Impala supports as a default filesystem. We recommend
using <codeph>ofs</codeph> for <xref href="impala_ddl.xml#ddl"/> to avoid access
limitations, and for <xref href="impala_dml.xml#dml"/> and
<xref href="impala_select.xml#select"/> for performance.
</p>
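<p>
To illustrate addressing Ozone data through <codeph>ofs</codeph> explicitly, the following
sketch points an external table at a full <codeph>ofs</codeph> URI. The service address
<codeph>ozone-host:9862</codeph>, the volume <codeph>vol1</codeph>, the bucket
<codeph>bucket1</codeph>, and the table definition are all placeholders for illustration,
not values from a real deployment:
</p>
<codeblock>-- Placeholder host, port, volume, bucket, and path; substitute your own values.
CREATE EXTERNAL TABLE sales_ofs (id BIGINT, amount DECIMAL(10,2))
  STORED AS PARQUET
  LOCATION 'ofs://ozone-host:9862/vol1/bucket1/sales';
-- Queries read the data files through the ofs protocol.
SELECT COUNT(*) FROM sales_ofs;
</codeblock>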
<p conref="../shared/impala_common.xml#common/ozone_block_size_caveat"/>
<p>
You can configure Impala's spill-to-disk feature to use Ozone storage by specifying a full
URI (for example, <codeph>ofs://host:port/volume/bucket/key</codeph>) for the spill
location. See <xref href="impala_disk_space.xml#disk_space"/> for details on configuring
remote spill-to-disk.
</p>
<!-- <p outputclass="toc inpage"/> -->
</conbody>
</concept>