| <html> |
| |
| <!-- |
| Licensed to the Apache Software Foundation (ASF) under one or more |
| contributor license agreements. See the NOTICE file distributed with |
| this work for additional information regarding copyright ownership. |
| The ASF licenses this file to You under the Apache License, Version 2.0 |
| (the "License"); you may not use this file except in compliance with |
| the License. You may obtain a copy of the License at |
| |
| http://www.apache.org/licenses/LICENSE-2.0 |
| |
| Unless required by applicable law or agreed to in writing, software |
| distributed under the License is distributed on an "AS IS" BASIS, |
| WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. |
| See the License for the specific language governing permissions and |
| limitations under the License. |
| --> |
| |
| <body> |
This package consists of three map/reduce applications for Hadoop that
compete in the annual <a
href="http://www.hpl.hp.com/hosted/sortbenchmark" target="_top">terabyte sort</a>
competition.
| |
| <ul> |
| <li><b>TeraGen</b> is a map/reduce program to generate the data. |
| <li><b>TeraSort</b> samples the input data and uses map/reduce to |
| sort the data into a total order. |
| <li><b>TeraValidate</b> is a map/reduce program that validates the |
| output is sorted. |
| </ul> |
| |
| <p> |
| |
<b>TeraGen</b> generates output data that is byte-for-byte
equivalent to the C version, including the newlines and the specific
keys. It divides the desired number of rows by the desired number of
tasks and assigns a range of rows to each map. The map jumps the random
number generator forward to the value for its first row and then
generates the rows that follow.
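The row-range assignment can be sketched as follows. This is a minimal
illustration of the splitting scheme described above; the class and method
names are hypothetical, not the actual Hadoop implementation:

```java
// Hypothetical sketch of TeraGen's row-range assignment: divide the total
// row count by the task count and give each map a contiguous range.
public class RowRanges {

    // Starting row for map task `task` out of `numTasks`.
    static long startRow(long totalRows, int numTasks, int task) {
        return totalRows / numTasks * task;
    }

    // Number of rows assigned to map task `task`; the final task picks up
    // the remainder so that every row is generated exactly once.
    static long rowCount(long totalRows, int numTasks, int task) {
        long start = startRow(totalRows, numTasks, task);
        long end = (task == numTasks - 1)
                ? totalRows
                : startRow(totalRows, numTasks, task + 1);
        return end - start;
    }
}
```

Because each map knows its starting row, it only needs to advance the random
number generator to that row's state before it begins emitting records.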
| |
| <p> |
| |
<b>TeraSort</b> is a standard map/reduce sort, except for a custom
partitioner that uses a sorted list of <i>N-1</i> sampled keys to define
the key range for each reduce. In particular, all keys such that
<i>sample[i-1] &lt;= key &lt; sample[i]</i> are sent to reduce
<i>i</i>. This guarantees that the outputs of reduce <i>i</i> are all
less than the outputs of reduce <i>i+1</i>. To speed up the
partitioning, the partitioner builds a two-level trie that quickly
indexes into the list of sample keys based on the first two bytes of
the key. TeraSort generates the sample keys by sampling the input
before the job is submitted and writes the list of keys into HDFS.
The input and output formats, which are used by all three applications,
read and write the text files in the right format. The output of the
reduce has its replication set to 1, instead of the default 3, because the
contest does not require the output data to be replicated onto multiple
nodes.
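The partition rule above (keys with <i>sample[i-1] &lt;= key &lt;
sample[i]</i> go to reduce <i>i</i>) can be sketched with a binary search
over the sample list. The real partitioner answers the same question with
the two-level trie described above; the class below is a hypothetical
illustration, not the Hadoop code:

```java
// Hypothetical sketch of TeraSort's partition rule. Given the N-1 sorted
// sample keys, return the reduce index i such that
// sample[i-1] <= key < sample[i]. Binary search is used here for clarity;
// the actual partitioner uses a two-level trie on the first two key bytes.
public class KeyPartitioner {

    static int partition(String key, String[] samples) {
        int lo = 0, hi = samples.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (samples[mid].compareTo(key) <= 0) {
                lo = mid + 1;   // sample[mid] <= key: answer lies to the right
            } else {
                hi = mid;       // key < sample[mid]: answer is mid or to the left
            }
        }
        return lo;              // equals the number of samples <= key
    }
}
```

With <i>N-1</i> samples the result ranges over <i>0..N-1</i>, one value per
reduce, and every key falls into exactly one range.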
| |
| <p> |
| |
<b>TeraValidate</b> ensures that the output is globally sorted. It
creates one map per file in the output directory, and each map ensures that
each key is greater than or equal to the previous one. The map also emits
records with the first and last keys of its file, and the reduce
ensures that the first key of file <i>i</i> is greater than the last key of
file <i>i-1</i>. Any problems are reported as output of the reduce, with the
keys that are out of order.
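The per-file check each map performs can be sketched as below; this is a
minimal illustration under the description above, not the actual Hadoop
code:

```java
// Hypothetical sketch of TeraValidate's per-map check: in a sorted file,
// every key must be greater than or equal to the one before it.
public class OrderCheck {

    // Returns the index of the first out-of-order key, or -1 if sorted.
    static int firstOutOfOrder(String[] keys) {
        for (int i = 1; i < keys.length; i++) {
            if (keys[i].compareTo(keys[i - 1]) < 0) {
                return i;
            }
        }
        return -1;
    }
}
```

The per-file first/last keys emitted by the maps let a single reduce apply
the same comparison across file boundaries, establishing a global order.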
| |
| <p> |
| |
In May 2008, Owen O'Malley ran this code on a 910-node cluster and
sorted 10 billion records (1 TB) in 209 seconds (3.48 minutes) to
win the annual general-purpose (Daytona)
<a href="http://www.hpl.hp.com/hosted/sortbenchmark/">terabyte sort
benchmark</a>.
| |
| <p> |
| |
| The cluster statistics were: |
| <ul> |
| <li> 910 nodes |
<li> 4 dual-core Xeons @ 2.0 GHz per node
<li> 4 SATA disks per node
<li> 8 GB RAM per node
<li> 1 gigabit Ethernet on each node
<li> 40 nodes per rack
<li> 8 gigabit Ethernet uplinks from each rack to the core
| <li> Red Hat Enterprise Linux Server Release 5.1 (kernel 2.6.18) |
| <li> Sun Java JDK 1.6.0_05-b13 |
| </ul> |
| |
| <p> |
| |
| The test was on Hadoop trunk (pre-0.18) patched with <a |
| href="http://issues.apache.org/jira/browse/HADOOP-3443">HADOOP-3443</a> |
| and <a |
| href="http://issues.apache.org/jira/browse/HADOOP-3446">HADOOP-3446</a>, |
| which were required to remove intermediate writes to disk. |
| TeraGen used |
| 1800 tasks to generate a total of 10 billion rows in HDFS, with a |
| block size of 1024 MB. |
| TeraSort was configured with 1800 maps and 1800 reduces, and |
| <i>mapreduce.task.io.sort.mb</i>, |
| <i>mapreduce.task.io.sort.factor</i>, |
<i>fs.inmemory.size.mb</i>, and the task heap size set large enough
that transient data was never spilled to disk, other than at the
end of the map. The sampler looked at 100,000 keys to determine the
reduce boundaries, which led to imperfect balancing, with reduce
outputs ranging from 337 MB to 872 MB.
| |
| </body> |
| </html> |