| <?xml version="1.0" encoding="UTF-8"?> |
| <!-- |
| Licensed to the Apache Software Foundation (ASF) under one |
| or more contributor license agreements. See the NOTICE file |
| distributed with this work for additional information |
| regarding copyright ownership. The ASF licenses this file |
| to you under the Apache License, Version 2.0 (the |
| "License"); you may not use this file except in compliance |
| with the License. You may obtain a copy of the License at |
| |
| http://www.apache.org/licenses/LICENSE-2.0 |
| |
| Unless required by applicable law or agreed to in writing, |
| software distributed under the License is distributed on an |
| "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY |
| KIND, either express or implied. See the License for the |
| specific language governing permissions and limitations |
| under the License. |
| --> |
| <!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd"> |
| <concept id="perf_skew"> |
| |
| <title>Detecting and Correcting HDFS Block Skew Conditions</title> |
| <titlealts audience="PDF"><navtitle>HDFS Block Skew</navtitle></titlealts> |
| <prolog> |
| <metadata> |
| <data name="Category" value="Impala"/> |
| <data name="Category" value="Performance"/> |
| <data name="Category" value="HDFS"/> |
| <data name="Category" value="Proof of Concept"/> |
| <data name="Category" value="Administrators"/> |
| <data name="Category" value="Developers"/> |
| <data name="Category" value="Data Analysts"/> |
| </metadata> |
| </prolog> |
| |
| <conbody> |
| |
| <p> |
      Impala parallel queries perform best when the work is divided evenly across the hosts in the
      cluster, so that all hosts take approximately the same time to finish their portion of the work.
      If one host takes substantially longer than the others, the extra time needed by that slow host
      can become the dominant factor in query performance. Therefore, one of the first steps in
      performance tuning for Impala is to detect and correct such conditions.
| </p> |
| |
| <p> |
| The main cause of uneven performance that you can correct within Impala is <term>skew</term> in the number of |
| HDFS data blocks processed by each host, where some hosts process substantially more data blocks than others. |
      This condition can occur because of uneven distribution of the data values themselves, for
      example when skewed values cause certain data files or partitions to be large while others are
      very small. (It is also possible for data values to be unevenly distributed without causing any
      problems in the distribution of HDFS blocks.) Block skew could also
| be due to the underlying block allocation policies within HDFS, the replication factor of the data files, and |
| the way that Impala chooses the host to process each data block. |
| </p> |
| |
| <p> |
| The most convenient way to detect block skew, or slow-host issues in general, is to examine the <q>executive |
| summary</q> information from the query profile after running a query: |
| </p> |
| |
| <ul> |
| <li> |
| <p> |
          In <cmdname>impala-shell</cmdname>, issue the <codeph>SUMMARY</codeph> command immediately after the
          query completes, to see just the summary information. If you detect issues involving skew, you might
          switch to issuing the <codeph>PROFILE</codeph> command, which displays the summary information followed
          by a detailed performance analysis. (See the example session after this list.)
| </p> |
| </li> |
| |
| <li> |
| <p> |
| In the Impala debug web UI, click on the <uicontrol>Profile</uicontrol> link associated with the query after it is |
| complete. The executive summary information is displayed early in the profile output. |
| </p> |
| </li> |
| </ul> |
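
    <p>
      For example, an <cmdname>impala-shell</cmdname> session to capture both levels of detail might
      look like the following sketch. (The query and table name are hypothetical, and the output is
      elided.)
    </p>

    <codeblock>[impalad-host:21000] > SELECT COUNT(*) FROM big_table WHERE c1 > 100;
...query result appears here...
[impalad-host:21000] > SUMMARY;
...executive summary table appears here...
[impalad-host:21000] > PROFILE;
...executive summary followed by detailed per-host counters...</codeblock>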
| |
| <p> |
| For each phase of the query, you see an <uicontrol>Avg Time</uicontrol> and a <uicontrol>Max Time</uicontrol> |
| value, along with <uicontrol>#Hosts</uicontrol> indicating how many hosts are involved in that query phase. |
| For all the phases with <uicontrol>#Hosts</uicontrol> greater than one, look for cases where the maximum time |
| is substantially greater than the average time. Focus on the phases that took the longest, for example, those |
| taking multiple seconds rather than milliseconds or microseconds. |
| </p> |
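
    <p>
      For example, in this abbreviated, hypothetical excerpt of the executive summary, the scan phase
      runs on 10 hosts and its <uicontrol>Max Time</uicontrol> is roughly double its
      <uicontrol>Avg Time</uicontrol>, suggesting that one host did substantially more work than the
      others:
    </p>

    <codeblock>Operator       #Hosts  Avg Time  Max Time   #Rows  Est. #Rows  Detail
-----------------------------------------------------------------------
01:AGGREGATE       10    1.35ms    2.42ms      10          10
00:SCAN HDFS       10     8.13s    16.21s    100M        100M  db.big_table</codeblock>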
| |
| <p> |
      If you detect that some hosts take longer than others, first rule out non-Impala causes. Some
      hosts could be slower than others because they have less capacity, or because they are
      substantially busier due to unevenly distributed non-Impala workloads:
| </p> |
| |
| <ul> |
| <li> |
| <p> |
| For clusters running Impala, keep the relative capacities of all hosts roughly equal. Any cost savings |
| from including some underpowered hosts in the cluster will likely be outweighed by poor or uneven |
| performance, and the time spent diagnosing performance issues. |
| </p> |
| </li> |
| |
| <li> |
| <p> |
| If non-Impala workloads cause slowdowns on some hosts but not others, use the appropriate load-balancing |
| techniques for the non-Impala components to smooth out the load across the cluster. |
| </p> |
| </li> |
| </ul> |
| |
| <p> |
| If the hosts on your cluster are evenly powered and evenly loaded, examine the detailed profile output to |
| determine which host is taking longer than others for the query phase in question. Examine how many bytes are |
| processed during that phase on that host, how much memory is used, and how many bytes are transmitted across |
| the network. |
| </p> |
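
    <p>
      For example, if you save the <codeph>PROFILE</codeph> output to a text file, you can compare the
      per-host values of counters such as <codeph>BytesRead</codeph> for the scan nodes. (The file name
      and counter values in this sketch are illustrative.)
    </p>

    <codeblock>$ grep 'BytesRead:' query_profile.txt
   - BytesRead: 1.2 GB
   - BytesRead: 1.1 GB
   - BytesRead: 2.4 GB
   - BytesRead: 1.2 GB</codeblock>

    <p>
      Here, the third host read roughly twice as much data as the others, pointing to block skew on
      that host.
    </p>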
| |
| <p> |
| The most common symptom is a higher number of bytes read on one host than others, due to one host being |
| requested to process a higher number of HDFS data blocks. This condition is more likely to occur when the |
| number of blocks accessed by the query is relatively small. For example, if you have a 10-node cluster and |
| the query processes 10 HDFS blocks, each node might not process exactly one block. If one node sits idle |
      while another node processes two blocks, the query could take twice as long as if the data were
      perfectly distributed.
| </p> |
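
    <p>
      To estimate how many files and HDFS blocks a query must read, you can examine the table from
      <cmdname>impala-shell</cmdname>, and optionally from HDFS itself. (The table name and path in
      this sketch are hypothetical.)
    </p>

    <codeblock>[impalad-host:21000] > SHOW TABLE STATS small_table;
...output includes #Rows, #Files, Size, and Format columns...

$ hdfs fsck /user/hive/warehouse/small_table -files -blocks
...lists each data file and the HDFS blocks behind it...</codeblock>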
| |
| <p> |
| Possible solutions in this case include: |
| </p> |
| |
| <ul> |
| <li> |
| <p> |
| If the query is artificially small, perhaps for benchmarking purposes, scale it up to process a larger |
          data set. For example, if some nodes read 10 HDFS data blocks while others read 11, the overall effect of
          the uneven distribution is much smaller than when some nodes do twice as much work as others. As a
| guideline, aim for a <q>sweet spot</q> where each node reads 2 GB or more from HDFS per query. Queries |
| that process lower volumes than that could experience inconsistent performance that smooths out as |
| queries become more data-intensive. |
| </p> |
| </li> |
| |
| <li> |
| <p> |
| If the query processes only a few large blocks, so that many nodes sit idle and cannot help to |
          parallelize the query, consider reducing the overall block size. For example, you might adjust the
          <codeph>PARQUET_FILE_SIZE</codeph> query option before copying or converting data into a Parquet table,
          as in the sketch after this list. Or you might adjust the granularity of data files produced earlier in
          the ETL pipeline by non-Impala
| components. In Impala 2.0 and later, the default Parquet block size is 256 MB, reduced from 1 GB, to |
| improve parallelism for common cluster sizes and data volumes. |
| </p> |
| </li> |
| |
| <li> |
| <p> |
| Reduce the amount of compression applied to the data. For text data files, the highest degree of |
| compression (gzip) produces unsplittable files that are more difficult for Impala to process in parallel, |
| and require extra memory during processing to hold the compressed and uncompressed data simultaneously. |
| For binary formats such as Parquet and Avro, compression can result in fewer data blocks overall, but |
| remember that when queries process relatively few blocks, there is less opportunity for parallel |
          execution and many nodes in the cluster might sit idle. Note that when Impala writes Parquet data with
          the query option <codeph>COMPRESSION_CODEC=NONE</codeph> in effect, the data is still typically compact due
          to the encoding schemes used by Parquet, independent of the final compression step.
| </p> |
| </li> |
| </ul> |
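
    <p>
      For example, this sketch shows how you might apply the block size and compression adjustments
      from the preceding list before repopulating a Parquet table. (The table names are hypothetical;
      choose a file size appropriate for your cluster and data volume.)
    </p>

    <codeblock>-- Produce smaller Parquet files so that more hosts can participate in each scan.
set PARQUET_FILE_SIZE=128m;
INSERT OVERWRITE parquet_table SELECT * FROM source_table;

-- Skip the final compression step and rely on Parquet's built-in encoding schemes.
set COMPRESSION_CODEC=NONE;
INSERT OVERWRITE parquet_table SELECT * FROM source_table;</codeblock>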
| </conbody> |
| </concept> |