| <?xml version="1.0" encoding="UTF-8"?> |
| <!-- |
| Licensed to the Apache Software Foundation (ASF) under one |
| or more contributor license agreements. See the NOTICE file |
| distributed with this work for additional information |
| regarding copyright ownership. The ASF licenses this file |
| to you under the Apache License, Version 2.0 (the |
| "License"); you may not use this file except in compliance |
| with the License. You may obtain a copy of the License at |
| |
| http://www.apache.org/licenses/LICENSE-2.0 |
| |
| Unless required by applicable law or agreed to in writing, |
| software distributed under the License is distributed on an |
| "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY |
| KIND, either express or implied. See the License for the |
| specific language governing permissions and limitations |
| under the License. |
| --> |
| <!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd"> |
| <concept id="config_performance"> |
| |
| <title>Post-Installation Configuration for Impala</title> |
| <prolog> |
| <metadata> |
| <data name="Category" value="Performance"/> |
| <data name="Category" value="Impala"/> |
| <data name="Category" value="Configuring"/> |
| <data name="Category" value="Administrators"/> |
| </metadata> |
| </prolog> |
| |
| <conbody> |
| |
| <p id="p_24"> |
| This section describes the mandatory and recommended configuration settings for Impala. If Impala is |
| installed using cluster management software, some of these configurations might be completed automatically; you must still |
| configure short-circuit reads manually. If you want to customize your environment, consider making the changes described in this topic. |
| </p> |
| |
| <ul> |
| <li> |
| You must enable short-circuit reads, whether or not Impala was installed with cluster |
| management software. This setting goes in the Impala configuration settings, not the Hadoop-wide settings. |
| </li> |
| |
| <li> |
| You must enable block location tracking, and you can optionally enable native checksumming for optimal performance. |
| </li> |
| </ul> |
| |
| <section id="section_fhq_wyv_ls"> |
| <title>Mandatory: Short-Circuit Reads</title> |
| <p> Enabling short-circuit reads allows Impala to read local data directly |
| from the file system. This removes the need to communicate through the |
| DataNodes, improving performance. This setting also minimizes the number |
| of additional copies of data. Short-circuit reads requires |
| <codeph>libhadoop.so</codeph> |
| (the Hadoop Native Library) to be accessible to both the server and the |
| client. <codeph>libhadoop.so</codeph> is not available if you have |
| installed from a tarball. You must install from an |
| <codeph>.rpm</codeph>, <codeph>.deb</codeph>, or parcel to use |
| short-circuit local reads. |
| </p> |
| <p> |
| <b>To configure DataNodes for short-circuit reads:</b> |
| </p> |
| <ol id="ol_qlq_wyv_ls"> |
| <li id="copy_config_files"> Copy the client |
| <codeph>core-site.xml</codeph> and <codeph>hdfs-site.xml</codeph> |
| configuration files from the Hadoop configuration directory to the |
| Impala configuration directory. The default Impala configuration |
| location is <codeph>/etc/impala/conf</codeph>. </li> |
| <li> |
| <indexterm audience="hidden">dfs.client.read.shortcircuit</indexterm> |
| <indexterm audience="hidden">dfs.domain.socket.path</indexterm> |
| <indexterm audience="hidden">dfs.client.file-block-storage-locations.timeout.millis</indexterm> |
| On all Impala nodes, configure the following properties in <!-- Exact timing is unclear, since we say farther down to copy /etc/hadoop/conf/hdfs-site.xml to /etc/impala/conf. |
| Which wouldn't work if we already modified the Impala version of the file here. Also on a 'managed' |
| cluster, these /etc files might not exist in those locations. --> |
| <!-- <codeph>/etc/impala/conf/hdfs-site.xml</codeph> as shown: --> |
| Impala's copy of <codeph>hdfs-site.xml</codeph> as shown: <codeblock><property> |
| <name>dfs.client.read.shortcircuit</name> |
| <value>true</value> |
| </property> |
| |
| <property> |
| <name>dfs.domain.socket.path</name> |
| <value>/var/run/hdfs-sockets/dn</value> |
| </property> |
| |
| <property> |
| <name>dfs.client.file-block-storage-locations.timeout.millis</name> |
| <value>10000</value> |
| </property></codeblock> |
| <!-- Former socket.path value: <value>/var/run/hadoop-hdfs/dn._PORT</value> --> |
| <!-- |
| <note> |
| The text <codeph>_PORT</codeph> appears just as shown; you do not need to |
| substitute a number. |
| </note> |
| --> |
| </li> |
| <li> |
| <p> If <codeph>/var/run/hadoop-hdfs/</codeph> is group-writable, make |
| sure its group is <codeph>root</codeph>. </p> |
| <note> If you are also going to enable block location tracking, you |
| can skip copying configuration files and restarting DataNodes and go |
| straight to <xref href="#config_performance/block_location_tracking" |
| >Optional: Block Location Tracking</xref>. |
| Configuring short-circuit reads and block location tracking require |
| the same process of copying files and restarting services, so you |
| can complete that process once when you have completed all |
| configuration changes. Whether you copy files and restart services |
| now or during configuring block location tracking, short-circuit |
| reads are not enabled until you complete those final steps. </note> |
| </li> |
| <li id="restart_all_datanodes"> After applying these changes, restart |
| all DataNodes. </li> |
| </ol> |
| </section> |
| |
| <section id="block_location_tracking"> |
| |
| <title>Mandatory: Block Location Tracking</title> |
| |
| <p> |
| Enabling block location metadata allows Impala to know which disk data blocks are located on, allowing |
| better utilization of the underlying disks. Impala will not start unless this setting is enabled. |
| </p> |
| |
| <p> |
| <b>To enable block location tracking:</b> |
| </p> |
| |
| <ol> |
| <li> |
| For each DataNode, adding the following to theĀ <codeph>hdfs-site.xml</codeph> file: |
| <codeblock><property> |
| <name>dfs.datanode.hdfs-blocks-metadata.enabled</name> |
| <value>true</value> |
| </property> </codeblock> |
| </li> |
| |
| <li conref="#config_performance/copy_config_files"/> |
| |
| <li conref="#config_performance/restart_all_datanodes"/> |
| </ol> |
| </section> |
| |
| <section id="native_checksumming"> |
| |
| <title>Optional: Native Checksumming</title> |
| |
| <p> |
| Enabling native checksumming causes Impala to use an optimized native library for computing checksums, if |
| that library is available. |
| </p> |
| |
| <p id="p_29"> |
| <b>To enable native checksumming:</b> |
| </p> |
| |
| <p> |
| If you installed <keyword keyref="distro"/> from packages, the native checksumming library is installed and setup correctly. In |
| such a case, no additional steps are required. Conversely, if you installed by other means, such as with |
| tarballs, native checksumming may not be available due to missing shared objects. Finding the message |
| "<codeph>Unable to load native-hadoop library for your platform... using builtin-java classes where |
| applicable</codeph>" in the Impala logs indicates native checksumming may be unavailable. To enable native |
| checksumming, you must build and install <codeph>libhadoop.so</codeph> (the |
| <!-- Another instance of stale link. --> |
| <!-- <xref href="http://hadoop.apache.org/docs/r0.19.1/native_libraries.html" scope="external" format="html">Hadoop Native Library</xref>). --> |
| Hadoop Native Library). |
| </p> |
| </section> |
| </conbody> |
| </concept> |