| <?xml version="1.0" encoding="UTF-8"?><!-- |
| Licensed to the Apache Software Foundation (ASF) under one |
| or more contributor license agreements. See the NOTICE file |
| distributed with this work for additional information |
| regarding copyright ownership. The ASF licenses this file |
| to you under the Apache License, Version 2.0 (the |
| "License"); you may not use this file except in compliance |
| with the License. You may obtain a copy of the License at |
| |
| http://www.apache.org/licenses/LICENSE-2.0 |
| |
| Unless required by applicable law or agreed to in writing, |
| software distributed under the License is distributed on an |
| "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY |
| KIND, either express or implied. See the License for the |
| specific language governing permissions and limitations |
| under the License. |
| --> |
| <!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd"> |
| <concept rev="5.4.3" id="impala_isilon"> |
| |
| <title>Using Impala with Isilon Storage</title> |
| <titlealts audience="PDF"><navtitle>Isilon Storage</navtitle></titlealts> |
| |
| <prolog> |
| <metadata> |
| <data name="Category" value="Impala"/> |
| <data name="Category" value="Isilon"/> |
| <data name="Category" value="Disk Storage"/> |
| <data name="Category" value="Administrators"/> |
| <data name="Category" value="Developers"/> |
| <data name="Category" value="Data Analysts"/> |
| </metadata> |
| </prolog> |
| |
| <conbody> |
| |
| <p> |
| <indexterm audience="hidden">Isilon</indexterm> |
| You can use Impala to query data files that reside on EMC Isilon storage devices, rather than in HDFS. |
| This capability allows convenient query access to a storage system where you might already be |
| managing large volumes of data. The combination of the Impala query engine and Isilon storage is |
| certified on <keyword keyref="impala224"/> or higher. |
| </p> |
| |
| <p conref="../shared/impala_common.xml#common/isilon_block_size_caveat"/> |
| |
| <p> |
| The typical use case for Impala and Isilon together is to use Isilon for the |
| default filesystem, replacing HDFS entirely. In this configuration, |
| when you create a database, table, or partition, the data always resides on |
| Isilon storage and you do not need to specify any special <codeph>LOCATION</codeph> |
| attribute. If you do specify a <codeph>LOCATION</codeph> attribute, its value refers |
| to a path within the Isilon filesystem. |
| For example: |
| </p> |
| <codeblock>-- If the default filesystem is Isilon, all Impala data resides there |
| -- and all Impala databases and tables are located there. |
| CREATE TABLE t1 (x INT, s STRING); |
| |
| -- You can specify LOCATION for database, table, or partition, |
| -- using values from the Isilon filesystem. |
| CREATE DATABASE d1 LOCATION '/some/path/on/isilon/server/d1.db'; |
| -- With no LOCATION of its own, this table resides under the database directory. |
| CREATE TABLE d1.t2 (a TINYINT, b BOOLEAN); |
| </codeblock> |
| |
| <p> |
| Impala can write to, delete, and rename data files and database, table, |
| and partition directories on Isilon storage. Therefore, Impala statements such as |
| <codeph>CREATE TABLE</codeph>, <codeph>DROP TABLE</codeph>, |
| <codeph>CREATE DATABASE</codeph>, <codeph>DROP DATABASE</codeph>, |
| <codeph>ALTER TABLE</codeph>, and |
| <codeph>INSERT</codeph> work the same with Isilon storage as with HDFS. |
| </p> |
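| |
| <p> |
| As an illustrative sketch of the statements listed above, the following sequence uses only |
| standard Impala SQL. The database, table, and column names are hypothetical placeholders, |
| and the example assumes that Isilon is the default filesystem. |
| </p> |
| <codeblock>-- Hypothetical names throughout; assumes Isilon is the default filesystem. |
| CREATE DATABASE sales_demo; |
| CREATE TABLE sales_demo.orders (id BIGINT, item STRING) PARTITIONED BY (year INT); |
| |
| -- ALTER TABLE adds a partition, which creates a partition directory on Isilon storage. |
| ALTER TABLE sales_demo.orders ADD PARTITION (year=2015); |
| |
| -- INSERT writes new data files into the partition directory. |
| INSERT INTO sales_demo.orders PARTITION (year=2015) VALUES (1, 'widget'), (2, 'gadget'); |
| |
| -- Dropping the managed table and then the empty database removes their directories. |
| DROP TABLE sales_demo.orders; |
| DROP DATABASE sales_demo; |
| </codeblock> |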
| |
| <p> |
| When the Impala spill-to-disk feature is activated by a query that approaches |
| the memory limit, Impala writes all the temporary data to a local (not Isilon) |
| storage device. Because the I/O bandwidth for the temporary data depends on |
| the number of local disks, and clusters using Isilon storage might not have |
| as many local disks attached, pay special attention on Isilon-enabled clusters |
| to any queries that use the spill-to-disk feature. Where practical, tune the |
| queries or allocate extra memory for Impala to avoid spilling. |
| Although you can specify an Isilon storage device as the destination for |
| the temporary data for the spill-to-disk feature, that configuration is |
| not recommended, because the temporary data would travel over remote I/O in both directions: once when written and again when read back. |
| </p> |
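| |
| <p> |
| One possible way to give a memory-intensive query more headroom, sketched here under stated |
| assumptions, is the <codeph>MEM_LIMIT</codeph> query option, which sets the memory limit that |
| applies to a query on each node. The <codeph>8g</codeph> value and the table and column names |
| are hypothetical; a suitable limit depends on the memory actually available on your cluster. |
| </p> |
| <codeblock>-- Illustrative value only; choose a limit that fits the memory on each node. |
| SET MEM_LIMIT=8g; |
| |
| -- Hypothetical aggregation query that might otherwise approach the memory limit and spill. |
| SELECT customer_id, COUNT(*) AS order_count |
| FROM big_orders |
| GROUP BY customer_id |
| ORDER BY order_count DESC |
| LIMIT 10; |
| </codeblock> |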
| |
| <p> |
| When tuning Impala queries on HDFS, you typically try to avoid any remote reads. |
| When the data resides on Isilon storage, all the I/O consists of remote reads. |
| Do not be alarmed when you see non-zero numbers for remote read measurements |
| in query profile output. The benefit of the Impala and Isilon integration is |
| primarily the convenience of not having to move or copy large volumes of data to HDFS, |
| rather than raw query performance. You can increase the performance of Impala |
| I/O for Isilon systems by increasing the value for the |
| <codeph>--num_remote_hdfs_io_threads</codeph> startup option for the |
| <cmdname>impalad</cmdname> daemon. |
| </p> |
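| |
| <p> |
| To look at the remote read measurements mentioned above, one approach is to run the query in |
| <cmdname>impala-shell</cmdname> and then issue the <codeph>PROFILE</codeph> command. This is a |
| minimal sketch; the table name is a hypothetical placeholder. |
| </p> |
| <codeblock>-- Run in impala-shell. The table name is a hypothetical placeholder. |
| SELECT COUNT(*) FROM big_orders; |
| |
| -- impala-shell command (not a SQL statement): prints the profile of the most recent query. |
| PROFILE; |
| </codeblock> |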
| |
| <!-- <p outputclass="toc inpage"/> --> |
| </conbody> |
| |
| </concept> |