<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<META http-equiv="Content-Type" content="text/html; charset=UTF-8">
<title>Hadoop 1.0.4 Release Notes</title>
<STYLE type="text/css">
H1 {font-family: sans-serif}
H2 {font-family: sans-serif; margin-left: 7mm}
TABLE {margin-left: 7mm}
</STYLE>
<h1>Hadoop 1.0.4 Release Notes</h1>
These release notes include new developer and user-facing incompatibilities, features, and major improvements.
<a name="changes"/>
<h2>Changes since Hadoop 1.0.3</h2>
<h3>Jiras with Release Notes (describe major or incompatible changes)</h3>
<h3>Other Jiras (describe bug fixes and minor changes)</h3>
<li> <a href="">HADOOP-7154</a>.
Minor improvement reported by tlipcon and fixed by tlipcon (scripts)<br>
<b>Should set MALLOC_ARENA_MAX in</b><br>
<blockquote>New versions of glibc present in RHEL6 include a new arena allocator design. In several clusters we&apos;ve seen this new allocator cause huge amounts of virtual memory to be used, since when multiple threads perform allocations, they each get their own memory arena. On a 64-bit system, these arenas are 64M mappings, and the maximum number of arenas is 8 times the number of cores. We&apos;ve observed a DN process using 14GB of vmem for only 300M of resident set. This causes all kinds of nasty issues fo...</blockquote></li>
<li> <a href="">HDFS-3652</a>.
Blocker bug reported by tlipcon and fixed by tlipcon (name-node)<br>
<b>1.x: FSEditLog failure removes the wrong edit stream when storage dirs have same name</b><br>
<blockquote>In {{FSEditLog.removeEditsForStorageDir}}, we iterate over the edits streams trying to find the stream corresponding to a given dir. To check equality, we currently use the following condition:<br>{code}<br> File parentDir = getStorageDirForStream(idx);<br> if (parentDir.getName().equals(sd.getRoot().getName())) {<br>{code}<br>... which is horribly incorrect. If two or more storage dirs happen to have the same terminal path component (eg /data/1/nn and /data/2/nn) then it will pick the wrong strea...</blockquote></li>
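<p>A minimal sketch of why the name-only comparison quoted in HDFS-3652 picks the wrong stream. The class and method names below are hypothetical and are not taken from FSEditLog; the sketch only contrasts comparing the last path component with comparing the whole path, and does not reproduce the actual patch.</p>

```java
import java.io.File;

// Hypothetical helper for illustration; not part of FSEditLog.
class StorageDirMatchSketch {
    // Mirrors the flawed condition: only the terminal path component is compared.
    static boolean sameByName(File a, File b) {
        return a.getName().equals(b.getName());
    }

    // Stricter alternative: compare the complete absolute path.
    static boolean sameByPath(File a, File b) {
        return a.getAbsoluteFile().equals(b.getAbsoluteFile());
    }

    public static void main(String[] args) {
        File first = new File("/data/1/nn");
        File second = new File("/data/2/nn");
        System.out.println("same by name: " + sameByName(first, second));
        System.out.println("same by path: " + sameByPath(first, second));
    }
}
```

<p>With /data/1/nn and /data/2/nn, the name-only check treats the two directories as the same one, while the full-path check keeps them apart.</p>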
<li> <a href="">MAPREDUCE-4399</a>.
Major bug reported by vicaya and fixed by vicaya (performance, tasktracker)<br>
<b>Fix performance regression in shuffle </b><br>
<blockquote>There is a significant (up to 3x) performance regression in shuffle (vs 0.20.2) in the Hadoop 1.x series. Most noticeable with high-end switches.</blockquote></li>
<h2>Changes since Hadoop 1.0.2</h2>
<h3>Jiras with Release Notes (describe major or incompatible changes)</h3>
<li> <a href="">HADOOP-5528</a>.
Major new feature reported by klbostee and fixed by klbostee <br>
<b>Binary partitioner</b><br>
<blockquote> New BinaryPartitioner that partitions BinaryComparable keys by hashing a configurable part of the byte array corresponding to the key.</blockquote></li>
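<p>The idea of hashing only part of a key's bytes can be sketched as follows. This is an illustration under stated assumptions, not the BinaryPartitioner code: the class name, the inclusive <code>left</code>/<code>right</code> offsets and the use of <code>Arrays.hashCode</code> are all hypothetical choices made for the example.</p>

```java
import java.util.Arrays;

// Hypothetical sketch; the real BinaryPartitioner API is not reproduced here.
class ByteRangePartitionerSketch {
    // Hashes key[left..right] (inclusive, assumed offsets) and maps the
    // hash to one of numPartitions partitions.
    static int partition(byte[] key, int left, int right, int numPartitions) {
        int hash = Arrays.hashCode(Arrays.copyOfRange(key, left, right + 1));
        // Clear the sign bit so the modulo result is never negative.
        return (hash & Integer.MAX_VALUE) % numPartitions;
    }

    public static void main(String[] args) {
        byte[] a = {1, 2, 3};
        byte[] b = {1, 2, 9};
        // Both keys share bytes 0..1, so they land in the same partition.
        System.out.println(partition(a, 0, 1, 4) == partition(b, 0, 1, 4));
    }
}
```

<p>Keys that agree on the selected byte range always go to the same partition, whatever the rest of the key contains.</p>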
<li> <a href="">HADOOP-8352</a>.
Major improvement reported by owen.omalley and fixed by owen.omalley <br>
<b>We should always generate a new configure script for the c++ code</b><br>
<blockquote>If you are compiling c++, the configure script will now be automatically regenerated as it should be.<br>This requires autoconf version 2.61 or greater.</blockquote></li>
<li> <a href="">MAPREDUCE-4017</a>.
Trivial improvement reported by knoguchi and fixed by tgraves (jobhistoryserver, jobtracker)<br>
<b>Add jobname to jobsummary log</b><br>
<blockquote> The Job Summary log may contain commas in values that are escaped by a &#39;\&#39; character. This was true before, but is more likely to be exposed now.</blockquote></li>
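<p>Tools that parse the Job Summary log need to split on unescaped commas only. The sketch below shows one way to do that, assuming that a backslash makes the next character literal; the class name and the sample field names are hypothetical, and the exact escaping rules of the log format beyond commas are an assumption.</p>

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical parser sketch; not part of Hadoop.
class EscapedCommaSplitterSketch {
    // Splits a line on commas that are not preceded by a backslash and
    // removes the escaping backslash from the returned fields.
    static List<String> split(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == '\\' && i + 1 < line.length()) {
                // Escaped character: keep the next character as-is.
                current.append(line.charAt(++i));
            } else if (c == ',') {
                fields.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString());
        return fields;
    }

    public static void main(String[] args) {
        // The Java literal below is the text: jobName=a\,b,user=x
        System.out.println(split("jobName=a\\,b,user=x"));
    }
}
```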
<h3>Other Jiras (describe bug fixes and minor changes)</h3>
<li> <a href="">HADOOP-6924</a>.
Major bug reported by wattsteve and fixed by devaraj <br>
<b>Build fails with non-Sun JREs due to different pathing to the operating system architecture shared libraries</b><br>
<blockquote>The src/native/configure script used to build the native libraries has an environment variable called JNI_LDFLAGS which is set as follows:<br><br>JNI_LDFLAGS=&quot;-L$JAVA_HOME/jre/lib/$OS_ARCH/server&quot;<br><br>This pathing convention to the shared libraries for the operating system architecture is unique to Oracle/Sun Java and thus on other flavors of Java the path will not exist and will result in a build failure with the following exception:<br><br> [exec] gcc -shared ../src/org/apache/hadoop/io/compress/zlib...</blockquote></li>
<li> <a href="">HADOOP-6941</a>.
Major bug reported by wattsteve and fixed by devaraj <br>
<b>Support non-SUN JREs in UserGroupInformation</b><br>
<blockquote>Attempting to format the namenode or attempting to start Hadoop using Apache Harmony or the IBM Java JREs results in the following exception:<br><br>10/09/07 16:35:05 ERROR namenode.NameNode: java.lang.NoClassDefFoundError:<br> at;clinit&gt;(<br> at java.lang.J9VMInternals.initializeImpl(Native Method)<br> at java.lang.J9VMInternals.initialize(<br> at org.apache.hadoop.hdfs.ser...</blockquote></li>
<li> <a href="">HADOOP-6963</a>.
Critical bug reported by owen.omalley and fixed by raviprak (fs)<br>
<b>Fix FileUtil.getDU. It should not include the size of the directory or follow symbolic links</b><br>
<blockquote>The getDU method should not include the size of the directory. The Java interface says that the value is undefined and in Linux/Sun it gets the 4096 for the inode. Clearly this isn&apos;t useful.<br>It also recursively calls itself. In case the directory has a symbolic link forming a cycle, getDU keeps spinning in the cycle. In our case, we saw this in the org.apache.hadoop.mapred.JobLocalizer.downloadPrivateCacheObjects call. This prevented other tasks on the same node from committing, causing the T...</blockquote></li>
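<p>The behaviour asked for in HADOOP-6963 (count file sizes only, and never follow symbolic links) can be sketched with <code>java.nio.file</code>. This is not the FileUtil.getDU implementation: the class and method names are hypothetical, and the sketch relies on the Java 7 file API, which is an assumption about the runtime rather than something Hadoop 1.x used.</p>

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;

// Hypothetical helper, not the FileUtil.getDU implementation.
class DiskUsageSketch {
    // Sums the sizes of regular files under root. Directory entries and
    // symbolic links contribute nothing, and links are never followed.
    static long sizeOf(Path root) throws IOException {
        final long[] total = {0L};
        // Without FileVisitOption.FOLLOW_LINKS, walkFileTree reports a
        // symbolic link as a plain entry instead of descending into it.
        Files.walkFileTree(root, new SimpleFileVisitor<Path>() {
            @Override
            public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) {
                if (attrs.isRegularFile()) {
                    total[0] += attrs.size();
                }
                return FileVisitResult.CONTINUE;
            }
        });
        return total[0];
    }

    // Builds a small temporary tree (5 + 3 bytes, plus a link back to the
    // root where the platform allows it) and measures it.
    static long demoTotal() {
        try {
            Path root = Files.createTempDirectory("du-sketch");
            Files.write(root.resolve("a.bin"), new byte[5]);
            Path sub = Files.createDirectory(root.resolve("sub"));
            Files.write(sub.resolve("b.bin"), new byte[3]);
            try {
                Files.createSymbolicLink(sub.resolve("loop"), root);
            } catch (IOException | UnsupportedOperationException e) {
                // Symbolic links may be unavailable; the sum is unaffected.
            }
            return sizeOf(root);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demoTotal());
    }
}
```

<p>Because the link that points back to the root is never followed, the walk terminates even though the tree contains a cycle.</p>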
<li> <a href="">HADOOP-7381</a>.
Major bug reported by jrottinghuis and fixed by jrottinghuis (build)<br>
<b>FindBugs OutOfMemoryError</b><br>
<blockquote>When running the findbugs target from Jenkins, I get an OutOfMemory error.<br>The &quot;effort&quot; in FindBugs is set to Max which ends up using a lot of memory to go through all the classes. The jvmargs passed to FindBugs is hardcoded to 512 MB max.<br><br>We can leave the default to 512M, as long as we pass this as an ant parameter which can be overwritten in individual cases through -D, or in the file (either basedir, or user&apos;s home directory).<br></blockquote></li>
<li> <a href="">HADOOP-8027</a>.
Minor improvement reported by qwertymaniac and fixed by atm (metrics)<br>
<b>Visiting /jmx on the daemon web interfaces may print unnecessary error in logs</b><br>
<blockquote>Logs that follow a {{/jmx}} servlet visit:<br><br>{code}<br>11/11/22 12:09:52 ERROR jmx.JMXJsonServlet: getting attribute UsageThreshold of java.lang:type=MemoryPool,name=Par Eden Space threw an exception<br> java.lang.UnsupportedOperationException: Usage threshold is not supported<br> at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.rethrow(<br>...<br>{code}</blockquote></li>
<li> <a href="">HADOOP-8151</a>.
Major bug reported by tlipcon and fixed by mattf (io, native)<br>
<b>Error handling in snappy decompressor throws invalid exceptions</b><br>
<blockquote>SnappyDecompressor.c has the following code in a few places:<br>{code}<br> THROW(env, &quot;Ljava/lang/InternalError&quot;, &quot;Could not decompress data. Buffer length is too small.&quot;);<br>{code}<br>this is incorrect, though, since the THROW macro doesn&apos;t need the &quot;L&quot; before the class name. This results in a ClassNotFoundException for Ljava.lang.InternalError being thrown, instead of the intended exception.</blockquote></li>
<li> <a href="">HADOOP-8188</a>.
Major improvement reported by devaraj and fixed by devaraj <br>
<b>Fix the build process to do with jsvc, with IBM&apos;s JDK as the underlying jdk</b><br>
<blockquote>When IBM JDK is used as the underlying JDK for the build process, the build of jsvc fails. I just needed to add an extra &quot;os arch&quot; expression in the condition that sets os-arch.</blockquote></li>
<li> <a href="">HADOOP-8251</a>.
Blocker bug reported by tlipcon and fixed by tlipcon (security)<br>
<b>SecurityUtil.fetchServiceTicket broken after HADOOP-6941</b><br>
<blockquote>HADOOP-6941 replaced direct references to some classes with reflective access so as to support other JDKs. Unfortunately there was a mistake in the name of the Krb5Util class, which broke fetchServiceTicket. This manifests itself as the inability to run checkpoints or other krb5-SSL HTTP-based transfers:<br><br>java.lang.ClassNotFoundException:</blockquote></li>
<li> <a href="">HADOOP-8293</a>.
Major bug reported by owen.omalley and fixed by owen.omalley (build)<br>
<b>The native library&apos;s doesn&apos;t include JNI path</b><br>
<blockquote>When compiling on centos 6, I get the following error when compiling the native library:<br><br>{code}<br> [exec] /usr/bin/ld: cannot find -ljvm<br>{code}<br><br>The problem is simply that the libhadoop_la_LDFLAGS doesn&apos;t include AM_LDFLAGS.</blockquote></li>
<li> <a href="">HADOOP-8294</a>.
Critical bug reported by kihwal and fixed by kihwal (ipc)<br>
<b>IPC Connection becomes unusable even if server address was temporarily unresolvable</b><br>
<blockquote>This is the same as HADOOP-7428, but was observed on 1.x data nodes. This can happen more frequently after HADOOP-7472, which allows IPC Connection to re-resolve the name. HADOOP-7428 needs to be back-ported.</blockquote></li>
<li> <a href="">HADOOP-8338</a>.
Major bug reported by owen.omalley and fixed by owen.omalley (security)<br>
<b>Can&apos;t renew or cancel HDFS delegation tokens over secure RPC</b><br>
<blockquote>The fetchdt tool is failing for secure deployments when given --renew or --cancel on tokens fetched using RPC. (The tokens fetched over HTTP can be renewed and canceled fine.)</blockquote></li>
<li> <a href="">HADOOP-8346</a>.
Blocker bug reported by tucu00 and fixed by devaraj (security)<br>
<b>Changes to support Kerberos with non Sun JVM (HADOOP-6941) broke SPNEGO</b><br>
<blockquote>Before HADOOP-6941, the hadoop-auth testcases pass with Kerberos ON (*mvn test -PtestKerberos*).<br><br>After HADOOP-6941, the tests fail with the error below.<br><br>Doing some IDE debugging I&apos;ve found out that the changes in HADOOP-6941 make the JVM Kerberos libraries append an extra element to the Kerberos principal of the server (on the client side when creating the token), so *HTTP/localhost* ends up being *HTTP/localhost/localhost*. Then, when contacting the KDC to get the granting ticket, the serv...</blockquote></li>
<li> <a href="">HDFS-119</a>.
Major bug reported by shv and fixed by sureshms (name-node)<br>
<b>logSync() may block NameNode forever.</b><br>
<blockquote># {{FSEditLog.logSync()}} first waits until {{isSyncRunning}} is false and then performs syncing to file streams by calling {{EditLogOutputStream.flush()}}.<br>If an exception is thrown after {{isSyncRunning}} is set to {{true}} all threads will always wait on this condition.<br>An {{IOException}} may be thrown by {{EditLogOutputStream.setReadyToFlush()}} or a {{RuntimeException}} may be thrown by {{EditLogOutputStream.flush()}} or by {{processIOError()}}.<br># The loop that calls {{eStream.flush()}} ...</blockquote></li>
<li> <a href="">HDFS-1041</a>.
Major bug reported by szetszwo and fixed by szetszwo (hdfs client)<br>
<b>DFSClient does not retry in getFileChecksum(..)</b><br>
<blockquote>If connection to the first datanode fails, DFSClient does not retry in getFileChecksum(..).</blockquote></li>
<li> <a href="">HDFS-3061</a>.
Blocker bug reported by alex.holmes and fixed by kihwal (name-node)<br>
<b>Cached directory size in INodeDirectory can get permanently out of sync with computed size, causing quota issues</b><br>
<blockquote>It appears that there&apos;s a condition under which a HDFS directory with a space quota set can get to a point where the cached size for the directory can permanently differ from the computed value. When this happens the following command:<br><br>{code}<br>hadoop fs -count -q /tmp/quota-test<br>{code}<br><br>results in the following output in the NameNode logs:<br><br>{code}<br>WARN org.apache.hadoop.hdfs.server.namenode.NameNode: Inconsistent diskspace for directory quota-test. Cached: 6000 Computed: 6072<br>{code}<br><br>I&apos;ve ob...</blockquote></li>
<li> <a href="">HDFS-3127</a>.
Major bug reported by brandonli and fixed by brandonli (name-node)<br>
<b>failure in recovering removed storage directories should not stop checkpoint process</b><br>
<blockquote>When a restore fails, rollEditLog() also fails even if there are healthy directories. Any exceptions from recovering the removed directories should not fail checkpoint process.</blockquote></li>
<li> <a href="">HDFS-3265</a>.
Major bug reported by kumarr and fixed by kumarr (build)<br>
<b>PowerPc Build error.</b><br>
<blockquote>When attempting to build branch-1, the following error is seen and ant exits.<br>[exec] configure: error: Unsupported CPU architecture &quot;powerpc64&quot;<br><br>The following command was used to build hadoop-common<br><br>ant -Dlibhdfs=true -Dcompile.native=true -Dfusedfs=true -Dcompile.c++=true -Dforrest.home=$FORREST_HOME compile-core-native compile-c++ compile-c++-examples task-controller tar record-parser compile-hdfs-classes package -Djava5.home=/opt/ibm/ibm-java2-ppc64-50/ </blockquote></li>
<li> <a href="">HDFS-3310</a>.
Major bug reported by cmccabe and fixed by cmccabe <br>
<b>Make sure that we abort when no edit log directories are left</b><br>
<blockquote>We should make sure to abort when there are no edit log directories left to write to. It seems that there is at least one case that is slipping through the cracks right now in branch-1.</blockquote></li>
<li> <a href="">HDFS-3374</a>.
Major bug reported by owen.omalley and fixed by owen.omalley (name-node)<br>
<b>hdfs&apos; TestDelegationToken fails intermittently with a race condition</b><br>
<blockquote>The testcase is failing because the MiniDFSCluster is shut down before the secret manager can change the key, which calls system.exit with no edit streams available.<br><br>{code}<br><br> [junit] 2012-05-04 15:03:51,521 WARN common.Storage ( - Removing storage dir /home/horton/src/hadoop/build/test/data/dfs/name1<br> [junit] 2012-05-04 15:03:51,522 FATAL namenode.FSNamesystem ( - No edit streams are accessible<br> [junit] java.lang.Exce...</blockquote></li>
<li> <a href="">MAPREDUCE-1238</a>.
Major bug reported by rramya and fixed by tgraves (jobtracker)<br>
<b>mapred metrics shows negative count of waiting maps and reduces </b><br>
<blockquote>Negative waiting_maps and waiting_reduces count is observed in the mapred metrics</blockquote></li>
<li> <a href="">MAPREDUCE-3377</a>.
Major bug reported by jxchen and fixed by jxchen <br>
<b>Compatibility issue with 0.20.203.</b><br>
<blockquote>I have an OutputFormat which implements Configurable. I set new config entries to a job configuration during checkOutputSpec() so that the tasks will get the config entries through the job configuration. This works fine in 0.20.2, but stopped working starting from 0.20.203. With 0.20.203, my OutputFormat still has the configuration set, but the copy a task gets does not have the new entries that are set as part of checkOutputSpec(). <br><br>I believe that the problem is with JobClient. The job...</blockquote></li>
<li> <a href="">MAPREDUCE-3857</a>.
Major bug reported by jeagles and fixed by jeagles (examples)<br>
<b>Grep example ignores</b><br>
<blockquote>Grep example creates two jobs as part of its implementation. The first job correctly uses the configuration settings. The second job ignores configuration settings.</blockquote></li>
<li> <a href="">MAPREDUCE-4003</a>.
Major bug reported by zaozaowang and fixed by knoguchi (task-controller, tasktracker)<br>
<b>log.index (No such file or directory) AND Task process exit with nonzero status of 126</b><br>
<blockquote>Hello, I have been stuck on this Hadoop (cdhu3) problem for 2 days and have tried every method I could find on Google. This is the issue: when running the Hadoop example &quot;wordcount&quot;, the tasktracker&apos;s log on one slave node showed the following errors:<br><br>1. WARN org.apache.hadoop.mapred.DefaultTaskController: Task wrapper stderr: bash: /var/tmp/mapred/local/ttprivate/taskTracker/hdfs/jobcache/job_201203131751_0003/attempt_201203131751_0003_m_000006_0/ Permission denied<br><br>2. WARN org.apache.hadoop.mapred.TaskRunner: attempt_...</blockquote></li>
<li> <a href="">MAPREDUCE-4012</a>.
Minor bug reported by knoguchi and fixed by tgraves <br>
<b>Hadoop Job setup error leaves no useful info to users (when LinuxTaskController is used)</b><br>
<blockquote>When a distributed cache pull fails on the TaskTracker, the job web UI only shows <br>{noformat}<br>Job initialization failed (255)<br>{noformat}<br>leaving users confused. <br><br>The TaskTracker log, however, contains an entry with useful info <br>{noformat}<br>2012-03-14 21:44:17,083 INFO org.apache.hadoop.mapred.TaskController: <br>Permission denied: user=user1, access=READ, inode=&quot;testfile&quot;:user3:users:rw-------<br>...<br>2012-03-14 21...</blockquote></li>
<li> <a href="">MAPREDUCE-4154</a>.
Major bug reported by thejas and fixed by devaraj <br>
<b>streaming MR job succeeds even if the streaming command fails</b><br>
<blockquote>Hadoop 1.0.1 behaves as expected: the task fails for a streaming MR job if the streaming command fails. In Hadoop 1.0.2, however, the task succeeds.<br></blockquote></li>
<li> <a href="">MAPREDUCE-4207</a>.
Major bug reported by kihwal and fixed by kihwal (mrv1)<br>
<b>Remove System.out.println() in FileInputFormat</b><br>
<blockquote>MAPREDUCE-3607 accidentally left the println statement. </blockquote></li>
<h2>Changes since Hadoop 1.0.1</h2>
<h3>Jiras with Release Notes (describe major or incompatible changes)</h3>
<li> <a href="">HADOOP-1722</a>.
Major improvement reported by runping and fixed by klbostee <br>
<b>Make streaming to handle non-utf8 byte array</b><br>
<blockquote> Streaming allows binary (or other non-UTF8) streams.</blockquote></li>
<li> <a href="">MAPREDUCE-3851</a>.
Major bug reported by kihwal and fixed by tgraves (tasktracker)<br>
<b>Allow more aggressive action on detection of the jetty issue</b><br>
<blockquote> Added new configuration variables to control when the TT aborts if it sees a certain number of exceptions: <br/>
&nbsp;&nbsp;&nbsp;&nbsp;// Percent of shuffle exceptions (out of sample size) seen before it&#39;s <br/>
&nbsp;&nbsp;&nbsp;&nbsp;// fatal - acceptable values are from 0 to 1.0, 0 disables the check. <br/>
&nbsp;&nbsp;&nbsp;&nbsp;// ie. 0.3 = 30% of the last X number of requests matched the exception, <br/>
&nbsp;&nbsp;&nbsp;&nbsp;// so abort. <br/>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;conf.getFloat( <br/>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&quot;mapreduce.reduce.shuffle.catch.exception.percent.limit.fatal&quot;, 0); <br/>
&nbsp;&nbsp;&nbsp;&nbsp;// The number of trailing requests we track, used for the fatal <br/>
&nbsp;&nbsp;&nbsp;&nbsp;// limit calculation <br/>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;conf.getInt(&quot;mapreduce.reduce.shuffle.catch.exception.sample.size&quot;, 1000);</blockquote></li>
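<p>How the two settings above could interact is sketched below as a sliding window over the most recent requests. This is a hedged illustration, not the TaskTracker code: the class name is hypothetical, and waiting for a full window before evaluating the limit is an assumption the release note does not confirm.</p>

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical sketch of a trailing-window failure check.
class ShuffleExceptionWindowSketch {
    private final Deque<Boolean> window = new ArrayDeque<>();
    private final int sampleSize;       // number of trailing requests tracked
    private final float fatalFraction;  // 0 to 1.0; 0 disables the check
    private int failures = 0;

    ShuffleExceptionWindowSketch(int sampleSize, float fatalFraction) {
        this.sampleSize = sampleSize;
        this.fatalFraction = fatalFraction;
    }

    // Records one request and returns true when the fatal limit is reached.
    boolean record(boolean matchedException) {
        window.addLast(matchedException);
        if (matchedException) {
            failures++;
        }
        // Drop the oldest sample once the window exceeds the sample size.
        if (window.size() > sampleSize && window.removeFirst()) {
            failures--;
        }
        if (fatalFraction <= 0f) {
            return false; // check disabled
        }
        if (window.size() < sampleSize) {
            return false; // assumption: only judge a full window
        }
        return (float) failures / sampleSize >= fatalFraction;
    }

    public static void main(String[] args) {
        ShuffleExceptionWindowSketch w = new ShuffleExceptionWindowSketch(10, 0.3f);
        boolean fatal = false;
        for (int i = 0; i < 6; i++) {
            fatal = w.record(false);
        }
        for (int i = 0; i < 4; i++) {
            fatal = w.record(true);
        }
        System.out.println("fatal: " + fatal);
    }
}
```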
<h3>Other Jiras (describe bug fixes and minor changes)</h3>
<li> <a href="">HADOOP-5450</a>.
Blocker improvement reported by klbostee and fixed by klbostee <br>
<b>Add support for application-specific typecodes to typed bytes</b><br>
<blockquote>For serializing objects of types that are not supported by typed bytes serialization, applications might want to use a custom serialization format. Right now, typecode 0 has to be used for the bytes resulting from this custom serialization, which could lead to problems when deserializing the objects because the application cannot know if a byte sequence following typecode 0 is a customly serialized object or just a raw sequence of bytes. Therefore, a range of typecodes that are treated as ali...</blockquote></li>
<li> <a href="">HADOOP-7206</a>.
Major new feature reported by eli and fixed by tucu00 <br>
<b>Integrate Snappy compression</b><br>
<blockquote>Google released Zippy as an open source (APLv2) project called Snappy ( This tracks integrating it into Hadoop.<br><br>{quote}<br>Snappy is a compression/decompression library. It does not aim for maximum compression, or compatibility with any other compression library; instead, it aims for very high speeds and reasonable compression. For instance, compared to the fastest mode of zlib, Snappy is an order of magnitude faster for most inputs, but the resulting compressed ...</blockquote></li>
<li> <a href="">HADOOP-8050</a>.
Major bug reported by kihwal and fixed by kihwal (metrics)<br>
<b>Deadlock in metrics</b><br>
<blockquote>The metrics serving thread and the periodic snapshot thread can deadlock.<br>It happened a few times on one of our namenodes. When it happens, RPC works but the web UI and hftp stop working. I haven&apos;t looked at the trunk too closely, but it might happen there too.</blockquote></li>
<li> <a href="">HADOOP-8088</a>.
Major bug reported by kihwal and fixed by (security)<br>
<b>User-group mapping cache incorrectly does negative caching on transient failures</b><br>
<blockquote>We&apos;ve seen a case where some getGroups() calls fail when the ldap server or the network is having transient failures. Looking at the code, the shell-based and the JNI-based implementations swallow exceptions and return an empty or partial list. The caller, Groups#getGroups() adds this likely empty list into the mapping cache for the user. This will function as negative caching until the cache expires. I don&apos;t think we want negative caching here, but even if we do, it should be intelligent eno...</blockquote></li>
<li> <a href="">HADOOP-8090</a>.
Major improvement reported by gkesavan and fixed by gkesavan <br>
<b>rename hadoop 64 bit rpm/deb package name</b><br>
<blockquote>Change the hadoop rpm/deb name from hadoop-&lt;version&gt;.amd64.rpm/deb to hadoop-&lt;version&gt;.x86_64.rpm/deb </blockquote></li>
<li> <a href="">HADOOP-8132</a>.
Major bug reported by arpitgupta and fixed by arpitgupta <br>
<b>64bit secure datanodes do not start as the jsvc path is wrong</b><br>
<blockquote>64bit secure datanodes were looking for /usr/libexec/../libexec/jsvc. instead of /usr/libexec/../libexec/jsvc.amd64</blockquote></li>
<li> <a href="">HADOOP-8201</a>.
Blocker bug reported by gkesavan and fixed by gkesavan <br>
<b>create the configure script for native compilation as part of the build</b><br>
<blockquote>The configure script is checked into svn and is not regenerated during the build. Ideally the configure script should not be checked into svn and should instead be generated during the build using autoreconf.</blockquote></li>
<li> <a href="">HDFS-2701</a>.
Major improvement reported by eli and fixed by eli (name-node)<br>
<b>Cleanup FS* processIOError methods</b><br>
<blockquote>Let&apos;s rename the various &quot;processIOError&quot; methods to be more descriptive. The current code makes it difficult to identify and reason about bug fixes. While we&apos;re at it let&apos;s remove &quot;Fatal&quot; from the &quot;Unable to sync the edit log&quot; log since it&apos;s not actually a fatal error (this is confusing to users). And 2NN &quot;Checkpoint done&quot; should be info, not a warning (also confusing to users).<br><br>Thanks to HDFS-1073 these issues don&apos;t exist on trunk or 23.</blockquote></li>
<li> <a href="">HDFS-2702</a>.
Critical bug reported by eli and fixed by eli (name-node)<br>
<b>A single failed name dir can cause the NN to exit </b><br>
<blockquote>There&apos;s a bug in FSEditLog#rollEditLog which results in the NN process exiting if a single name dir has failed. Here&apos;s the relevant code:<br><br>{code}<br>close() // So editStreams.size() is 0 <br>foreach edits dir {<br> ..<br> eStream = new ... // Might get an IOE here<br> editStreams.add(eStream);<br>} catch (IOException ioe) {<br> removeEditsForStorageDir(sd); // exits if editStreams.size() &lt;= 1 <br>}<br>{code}<br><br>If we get an IOException before we&apos;ve added two edits streams to the list we&apos;ll exit, eg if there&apos;s an ...</blockquote></li>
<li> <a href="">HDFS-2703</a>.
Major bug reported by eli and fixed by eli (name-node)<br>
<b>removedStorageDirs is not updated everywhere we remove a storage dir</b><br>
<blockquote>There are a number of places (FSEditLog#open, purgeEditLog, and rollEditLog) where we remove a storage directory but don&apos;t add it to the removedStorageDirs list. This means a storage dir may have been removed but we don&apos;t see it in the log or Web UI. This doesn&apos;t affect trunk/23 since the code there is totally different.</blockquote></li>
<li> <a href="">HDFS-2978</a>.
Major new feature reported by atm and fixed by atm (name-node)<br>
<b>The NameNode should expose name dir statuses via JMX</b><br>
<blockquote>We currently display this info on the NN web UI, so users who wish to monitor this must either do it manually or parse HTML. We should publish this information via JMX.</blockquote></li>
<li> <a href="">HDFS-3006</a>.
Major bug reported by bcwalrus and fixed by szetszwo (name-node)<br>
<b>Webhdfs &quot;SETOWNER&quot; call returns incorrect content-type</b><br>
<blockquote>The SETOWNER call returns an empty body. But the header has &quot;Content-Type: application/json&quot;, which is a contradiction (empty string is not valid json). This appears to happen for SETTIMES and SETPERMISSION as well.</blockquote></li>
<li> <a href="">HDFS-3075</a>.
Major improvement reported by brandonli and fixed by brandonli (name-node)<br>
<b>Backport HADOOP-4885 to branch-1</b><br>
<blockquote>When a storage directory is inaccessible, namenode removes it from the valid storage dir list to a removedStorageDirs list. Those storage directories will not be restored when they become healthy again. <br><br>The proposed solution is to restore the previous failed directories at the beginning of checkpointing, say, rollEdits, by copying necessary metadata files from healthy directory to unhealthy ones. In this way, whenever a failed storage directory is recovered by the administrator, he/she can ...</blockquote></li>
<li> <a href="">HDFS-3101</a>.
Major bug reported by wangzw and fixed by szetszwo (hdfs client)<br>
<b>cannot read empty file using webhdfs</b><br>
<blockquote>STEP:<br>1, create a new EMPTY file<br>2, read it using webhdfs.<br><br>RESULT:<br>expected: get an empty file<br>I got: {&quot;RemoteException&quot;:{&quot;exception&quot;:&quot;IOException&quot;,&quot;javaClassName&quot;:&quot;;,&quot;message&quot;:&quot;Offset=0 out of the range [0, 0); OPEN, path=/testFile&quot;}}<br><br>First of all, [0, 0) is not a valid range, and I think reading an empty file should be OK.</blockquote></li>
<li> <a href="">MAPREDUCE-764</a>.
Blocker bug reported by klbostee and fixed by klbostee (contrib/streaming)<br>
<b>TypedBytesInput&apos;s readRaw() does not preserve custom type codes</b><br>
<blockquote>The typed bytes format supports byte sequences of the form {{&lt;custom type code&gt; &lt;length&gt; &lt;bytes&gt;}}. When reading such a sequence via {{TypedBytesInput}}&apos;s {{readRaw()}} method, however, the returned sequence currently is {{0 &lt;length&gt; &lt;bytes&gt;}} (0 is the type code for a bytes array), which leads to bugs such as the one described [here|].</blockquote></li>
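<p>A raw read that keeps the custom type code instead of rewriting it to 0 might look like the sketch below. It assumes a one-byte type code followed by a four-byte big-endian length, which is an assumption about the wire layout made for this example; the class and method names are hypothetical and this is not the TypedBytesInput code.</p>

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical sketch; layout (1-byte code, 4-byte length) is assumed.
class RawTypedBytesSketch {
    // Reads <type code> <length> <bytes> and returns the same raw
    // sequence, with the original type code left in place.
    static byte[] readRaw(DataInputStream in) throws IOException {
        int code = in.readUnsignedByte();
        int length = in.readInt();
        byte[] payload = new byte[length];
        in.readFully(payload);

        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buffer);
        out.writeByte(code);   // keep the custom code rather than writing 0
        out.writeInt(length);
        out.write(payload);
        return buffer.toByteArray();
    }

    // Convenience wrapper for in-memory input.
    static byte[] roundTrip(byte[] encoded) {
        try {
            return readRaw(new DataInputStream(new ByteArrayInputStream(encoded)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] encoded = {100, 0, 0, 0, 2, 7, 8}; // code 100, length 2
        System.out.println(roundTrip(encoded)[0]);
    }
}
```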
<li> <a href="">MAPREDUCE-3583</a>.
Critical bug reported by and fixed by <br>
<b>ProcfsBasedProcessTree#constructProcessInfo() may throw NumberFormatException</b><br>
<blockquote>HBase PreCommit builds frequently gave us NumberFormatException.<br><br>From<br>{code}<br>2011-12-20 01:44:01,180 WARN [main] mapred.JobClient(784): No job jar file set. User classes may not be found. See JobConf(Class) or JobConf#setJar(String).<br>java.lang.NumberFormatException: For input string: &quot;18446743988060683582&quot;<br> at</blockquote></li>
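<p>The input string in the trace above is larger than <code>Long.MAX_VALUE</code> (9223372036854775807), which is why parsing it as a <code>long</code> throws. The sketch below reproduces that and shows <code>BigInteger</code> accepting the same text; the class and method names are hypothetical, and the sketch does not claim to be the change made in ProcfsBasedProcessTree.</p>

```java
import java.math.BigInteger;

// Hypothetical sketch showing the overflow behind the exception.
class LargeCounterParseSketch {
    // Returns false when the text cannot be held in a signed 64-bit long.
    static boolean fitsInLong(String text) {
        try {
            Long.parseLong(text);
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    // BigInteger has no fixed width, so the same text parses cleanly.
    static BigInteger parse(String text) {
        return new BigInteger(text);
    }

    public static void main(String[] args) {
        String value = "18446743988060683582";
        System.out.println(fitsInLong(value));
        System.out.println(parse(value));
    }
}
```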
<li> <a href="">MAPREDUCE-3773</a>.
Major new feature reported by owen.omalley and fixed by owen.omalley (jobtracker)<br>
<b>Add queue metrics with buckets for job run times</b><br>
<blockquote>It would be nice to have queue metrics that reflect the number of jobs in each queue that have been running for different ranges of time.<br><br>Reasonable time ranges are probably 0-1 hr, 1-5 hr, 5-24 hr, 24+ hrs; but they should be configurable.</blockquote></li>
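<p>The bucketing described in MAPREDUCE-3773 can be sketched as a lookup against configurable upper bounds. The class name is hypothetical, and treating each range as half-open (a job at exactly 1 hour falls into the 1-5 hr bucket) is an assumption, since the issue text does not say how boundaries are handled.</p>

```java
// Hypothetical sketch of run-time bucketing; not the JobTracker metrics code.
class RunTimeBucketsSketch {
    // upperBoundsHours holds the exclusive upper bound of every bucket
    // except the last, e.g. {1, 5, 24} for 0-1, 1-5, 5-24 and 24+ hours.
    static int bucketFor(double hoursRunning, double[] upperBoundsHours) {
        for (int i = 0; i < upperBoundsHours.length; i++) {
            if (hoursRunning < upperBoundsHours[i]) {
                return i;
            }
        }
        return upperBoundsHours.length; // open-ended final bucket
    }

    public static void main(String[] args) {
        double[] bounds = {1, 5, 24};
        System.out.println(bucketFor(0.5, bounds));
        System.out.println(bucketFor(30, bounds));
    }
}
```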
<li> <a href="">MAPREDUCE-3824</a>.
Critical bug reported by aw and fixed by tgraves (distributed-cache)<br>
<b>Distributed caches are not removed properly</b><br>
<blockquote>Distributed caches are not being properly removed by the TaskTracker when they are expected to be expired. </blockquote></li>
<h2>Changes since Hadoop 1.0.0</h2>
<h3>Jiras with Release Notes (describe major or incompatible changes)</h3>
<li> <a href="">HADOOP-8009</a>.
Critical improvement reported by tucu00 and fixed by tucu00 (build)<br>
<b>Create hadoop-client and hadoop-minicluster artifacts for downstream projects </b><br>
<blockquote> Generate integration artifacts &quot;org.apache.hadoop:hadoop-client&quot; and &quot;org.apache.hadoop:hadoop-minicluster&quot; containing all the jars needed to use Hadoop client APIs, and to run Hadoop MiniClusters, respectively. Push these artifacts to the Maven repository on mvn-deploy, along with the existing artifacts.</blockquote></li>
<li> <a href="">HADOOP-8037</a>.
Blocker bug reported by mattf and fixed by gkesavan (build)<br>
<b>Binary tarball does not preserve platform info for native builds, and RPMs fail to provide needed symlinks for</b><br>
<blockquote> This fix is marked &quot;incompatible&quot; only because it changes the bin-tarball directory structure to be consistent with the source tarball directory structure. The source tarball is unchanged. RPMs and DEBs now use an intermediate bin-tarball with an &quot;${os.arch}&quot; tag (like the packages themselves). The un-tagged bin-tarball is now multi-platform and retains the structure of the source tarball; it is in fact generated by target &quot;tar&quot;, not by target &quot;binary&quot;. Finally, in the 64-bit RPMs and DEBs, the native libs go in the &quot;lib64&quot; directory instead of &quot;lib&quot;.</blockquote></li>
<li> <a href="">MAPREDUCE-3184</a>.
Major improvement reported by tlipcon and fixed by tlipcon (jobtracker)<br>
<b>Improve handling of fetch failures when a tasktracker is not responding on HTTP</b><br>
<blockquote> The TaskTracker now has a thread which monitors for a known Jetty bug in which the selector thread starts spinning and map output can no longer be served. If the bug is detected, the TaskTracker will shut itself down. This feature can be disabled by setting mapred.tasktracker.jetty.cpu.check.enabled to false.</blockquote></li>
<h3>Other Jiras (describe bug fixes and minor changes)</h3>
<li> <a href="">HADOOP-7470</a>.
Minor improvement reported by and fixed by enis (util)<br>
<b>move up to Jackson 1.8.8</b><br>
<blockquote>I see that hadoop-core still depends on Jackson 1.0.1, but that project is now up to 1.8.2 in releases. Upgrading will make it easier for other Jackson-using apps that are more up to date to keep their classpath consistent.<br><br>The patch would be updating the ivy file to pull in the later version; no test</blockquote></li>
<li> <a href="">HADOOP-7960</a>.
Major bug reported by gkesavan and fixed by mattf <br>
<b>Port HADOOP-5203 to branch-1, build version comparison is too restrictive</b><br>
<blockquote>hadoop services should not be using the build timestamp to verify version difference in the cluster installation. Instead it should use the source checksum as in HADOOP-5203.<br> </blockquote></li>
<li> <a href="">HADOOP-7964</a>.
Blocker bug reported by kihwal and fixed by daryn (security, util)<br>
<b>Deadlock in class init.</b><br>
<blockquote>After HADOOP-7808, client-side commands hang occasionally. There are cyclic dependencies in NetUtils and SecurityUtil class initialization. From an initial look at the stack trace, two threads deadlock when they hit either class initializer at the same time.</blockquote></li>
<li> <a href="">HADOOP-7987</a>.
Major improvement reported by devaraj and fixed by jnp (security)<br>
<b>Support setting the run-as user in unsecure mode</b><br>
<blockquote>Some applications need to be able to perform actions (such as launch MR jobs) from map or reduce tasks. In earlier unsecure versions of hadoop (20.x), it was possible to do this by setting in the configuration. But in 20.205 and 1.0, when running in unsecure mode, this does not work. (In secure mode, you can do this using the kerberos credentials).</blockquote></li>
<li> <a href="">HADOOP-7988</a>.
Major bug reported by jnp and fixed by jnp <br>
<b>Upper case in hostname part of the principals doesn&apos;t work with kerberos.</b><br>
<blockquote>Kerberos doesn&apos;t like upper case in the hostname part of the principals.<br>This issue has been seen in 23 as well as 1.0.</blockquote></li>
<li> <a href="">HADOOP-8010</a>.
Minor bug reported by rvs and fixed by rvs (scripts)<br>
<b> spews error message when HADOOP_HOME_WARN_SUPPRESS is set to true and HADOOP_HOME is present</b><br>
<blockquote>Running hadoop daemon commands when HADOOP_HOME_WARN_SUPPRESS is set to true and HADOOP_HOME is present produces:<br>{noformat}<br> [: 76: true: unexpected operator<br>{noformat}</blockquote></li>
<li> <a href="">HADOOP-8052</a>.
Major bug reported by reznor and fixed by reznor (metrics)<br>
<b>Hadoop Metrics2 should emit Float.MAX_VALUE (instead of Double.MAX_VALUE) to avoid making Ganglia&apos;s gmetad core</b><br>
<blockquote>Ganglia&apos;s gmetad converts the doubles emitted by Hadoop&apos;s Metrics2 system to strings, and the buffer it uses is 256 bytes wide.<br><br>When the SampleStat.MinMax class (in org.apache.hadoop.metrics2.util) emits its default min value (currently initialized to Double.MAX_VALUE), it ends up causing a buffer overflow in gmetad, which causes it to core, effectively rendering Ganglia useless (for some, the core is continuous; for others who are more fortunate, it&apos;s only a one-time Hadoop-startup-time thi...</blockquote></li>
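The size difference behind HADOOP-8052 can be illustrated with a minimal sketch. It assumes the receiving side prints the value in plain fixed-point notation (a "%f"-style conversion); that is an assumption about gmetad, and the class name MaxValueWidth is hypothetical.

```java
import java.util.Locale;

// Hypothetical demo class; not part of Hadoop or Ganglia.
class MaxValueWidth {
    // Number of characters needed to print v in plain fixed-point notation.
    // ASSUMPTION: the consumer formats doubles with a "%f"-style conversion.
    static int fixedPointWidth(double v) {
        return String.format(Locale.ROOT, "%f", v).length();
    }

    public static void main(String[] args) {
        // Double.MAX_VALUE is about 1.8e308, so it needs more than 300 digits.
        System.out.println("Double.MAX_VALUE width: " + fixedPointWidth(Double.MAX_VALUE));
        // Float.MAX_VALUE is about 3.4e38, so it fits easily in a 256-byte buffer.
        System.out.println("Float.MAX_VALUE width:  " + fixedPointWidth(Float.MAX_VALUE));
    }
}
```

Under that assumption, the first value cannot fit in a 256-byte buffer while the second can, which is consistent with the change described in the issue title.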
<li> <a href="">HDFS-2379</a>.
Critical bug reported by tlipcon and fixed by tlipcon (data-node)<br>
<b>0.20: Allow block reports to proceed without holding FSDataset lock</b><br>
<blockquote>As disks are getting larger and more plentiful, we&apos;re seeing DNs with multiple millions of blocks on a single machine. When page cache space is tight, block reports can take multiple minutes to generate. Currently, during the scanning of the data directories to generate a report, the FSVolumeSet lock is held. This causes writes and reads to block, timeout, etc, causing big problems especially for clients like HBase.<br><br>This JIRA is to explore some of the ideas originally discussed in HADOOP-458...</blockquote></li>
<li> <a href="">HDFS-2814</a>.
Minor improvement reported by hitesh and fixed by hitesh <br>
<b>NamenodeMXBean does not account for svn revision in the version information</b><br>
<blockquote>Unlike the jobtracker, where both the UI and jmx information report the version as &quot;x.y.z, r&lt;svn revision&gt;&quot;, in the case of the namenode the UI displays x.y.z and svn revision info but the jmx output only contains the x.y.z version.</blockquote></li>
<li> <a href="">MAPREDUCE-3343</a>.
Major bug reported by ahmed.radwan and fixed by zhaoyunjiong (mrv1)<br>
<b>TaskTracker Out of Memory because of distributed cache</b><br>
<blockquote>This Out of Memory happens when you run large number of jobs (using the distributed cache) on a TaskTracker. <br><br>Seems the basic issue is with the distributedCacheManager (instance of TrackerDistributedCacheManager in, this gets created during TaskTracker.initialize(), and it keeps references to TaskDistributedCacheManager for every submitted job via the jobArchives Map, also references to CacheStatus via cachedArchives map. I am not seeing these cleaned up between jobs, so th...</blockquote></li>
<li> <a href="">MAPREDUCE-3607</a>.
Major improvement reported by tomwhite and fixed by tomwhite (client)<br>
<b>Port missing new API mapreduce lib classes to 1.x</b><br>
<blockquote>There are a number of classes under mapreduce.lib that are not present in the 1.x series. Including these would help users and downstream projects using the new MapReduce API migrate to later versions of Hadoop in the future.<br><br>A few examples of where this would help:<br>* Sqoop uses mapreduce.lib.db.DBWritable and mapreduce.lib.input.CombineFileInputFormat (SQOOP-384).<br>* Mahout uses mapreduce.lib.output.MultipleOutputs (MAHOUT-822).<br>* HBase has a backport of mapreduce.lib.partition.InputSampler ...</blockquote></li>
<h2>Changes since Hadoop</h2>
<h3>Jiras with Release Notes (describe major or incompatible changes)</h3>
<li> <a href="">HADOOP-7728</a>.
Major bug reported by rramya and fixed by rramya (conf)<br>
<b> should be modified to enable task memory manager</b><br>
<blockquote> Enable task memory management to be configurable via hadoop config setup script.</blockquote></li>
<li> <a href="">HADOOP-7740</a>.
Minor bug reported by arpitgupta and fixed by arpitgupta (conf)<br>
<b>security audit logger is not on by default, fix the log4j properties to enable the logger</b><br>
<blockquote> Fixed security audit logger configuration. (Arpit Gupta via Eric Yang)</blockquote></li>
<li> <a href="">HADOOP-7923</a>.
Major task reported by szetszwo and fixed by szetszwo (build, documentation)<br>
<b>Update doc versions from 0.20 to 1.0</b><br>
<blockquote> Docs version number is now automatically updated by reference to the build number.</blockquote></li>
<li> <a href="">HDFS-617</a>.
Major improvement reported by kzhang and fixed by kzhang (hdfs client, name-node)<br>
<b>Support for non-recursive create() in HDFS</b><br>
<blockquote> New DFSClient.create(...) allows option of not creating missing parent(s).</blockquote></li>
<li> <a href="">HDFS-2246</a>.
Major improvement reported by sanjay.radia and fixed by jnp <br>
<b>Shortcut a local client reads to a Datanodes files directly</b><br>
<blockquote> 1. New configurations <br/>
a. dfs.block.local-path-access.user is the key in datanode configuration to specify the user allowed to do short circuit read. <br/>
b. is the key to enable short circuit read at the client side configuration. <br/>
c. is the key to bypass checksum check at the client side. <br/>
2. By default none of the above are enabled and short circuit read will not kick in. <br/>
3. If security is on, the feature can be used only by users that have kerberos credentials at the client; therefore map reduce tasks cannot benefit from it in general. <br/></blockquote></li>
<li> <a href="">HDFS-2316</a>.
Major new feature reported by szetszwo and fixed by szetszwo <br>
<b>[umbrella] webhdfs: a complete FileSystem implementation for accessing HDFS over HTTP</b><br>
<blockquote> Provide webhdfs as a complete FileSystem implementation for accessing HDFS over HTTP. <br/>
The previous hftp feature was a read-only FileSystem and did not provide &quot;write&quot; access.</blockquote></li>
<h3>Other Jiras (describe bug fixes and minor changes)</h3>
<li> <a href="">HADOOP-5124</a>.
Major improvement reported by hairong and fixed by hairong <br>
<b>A few optimizations to FsNamesystem#RecentInvalidateSets</b><br>
<blockquote>This jira proposes a few optimizations to FsNamesystem#RecentInvalidateSets:<br>1. When removing all replicas of a block, it does not traverse all nodes in the map. Instead it traverses only the nodes on which the block is located.<br>2. When dispatching blocks to datanodes in ReplicationMonitor, it randomly chooses a predefined number of datanodes and dispatches blocks to those datanodes. This strategy provides fairness to all datanodes. The current strategy always starts from the first datanode.</blockquote></li>
<li> <a href="">HADOOP-6840</a>.
Minor improvement reported by nspiegelberg and fixed by jnp (fs, io)<br>
<b>Support non-recursive create() in FileSystem &amp; SequenceFile.Writer</b><br>
<blockquote>The proposed solution for HBASE-2312 requires the sequence file to handle a non-recursive create. This is already supported by HDFS, but needs to have an equivalent FileSystem &amp; SequenceFile.Writer API.</blockquote></li>
<li> <a href="">HADOOP-6886</a>.
Minor improvement reported by nspiegelberg and fixed by (fs)<br>
<b>LocalFileSystem Needs createNonRecursive API</b><br>
<blockquote>While running sanity check tests for HBASE-2312, I noticed that HDFS-617 did not include createNonRecursive() support for the LocalFileSystem. This is a problem for HBase, which allows the user to run over the LocalFS instead of HDFS for local cluster testing. I think this only affects 0.20-append, but may affect the trunk based upon how exactly FileContext handles non-recursive creates.</blockquote></li>
<li> <a href="">HADOOP-7461</a>.
Major bug reported by rbodkin and fixed by gkesavan (build)<br>
<b>Jackson Dependency Not Declared in Hadoop POM</b><br>
<blockquote>(COMMENT: This bug still affects, four months after the bug was filed. This causes total failure, and the fix is trivial for whoever manages the POM -- just add the missing dependency! --ben)<br><br>This issue was identified and the fix &amp; workaround was documented at <br><br><br><br>The issue affects use of Hadoop from the Maven central repo. I built a job using that maven repo and ran it, resulting in this failure:<br><br>Exception in thread &quot;main&quot; ...</blockquote></li>
<li> <a href="">HADOOP-7664</a>.
Minor improvement reported by raviprak and fixed by raviprak (conf)<br>
<b>o.a.h.conf.Configuration complains of overriding final parameter even if the value with which its attempting to override is the same. </b><br>
<blockquote>o.a.h.conf.Configuration complains of overriding final parameter even if the value with which its attempting to override is the same. </blockquote></li>
<li> <a href="">HADOOP-7765</a>.
Major bug reported by eyang and fixed by eyang (build)<br>
<b>Debian package contain both system and tar ball layout</b><br>
<blockquote>When packaging is invoked as &quot;ant clean tar deb&quot;, the system creates both the system layout and the tarball layout in the same build directory. The Debian packaging target then picks up files for both layouts. As a result, a Debian package built this way ends up installing README.txt, LICENSE.txt, and jar files in /usr.</blockquote></li>
<li> <a href="">HADOOP-7784</a>.
Major bug reported by arpitgupta and fixed by eyang <br>
<b>secure datanodes fail to come up stating jsvc not found </b><br>
<blockquote>building 205.1 and trying to startup a secure dn leads to the following<br><br>/usr/libexec/../bin/hadoop: line 386: /usr/libexec/../libexec/jsvc.amd64: No such file or directory<br>/usr/libexec/../bin/hadoop: line 386: exec: /usr/libexec/../libexec/jsvc.amd64: cannot execute: No such file or directory</blockquote></li>
<li> <a href="">HADOOP-7804</a>.
Major improvement reported by arpitgupta and fixed by arpitgupta (conf)<br>
<b>enable hadoop config generator to set dfs.block.local-path-access.user to enable short circuit read</b><br>
<blockquote>We have a new config that selects which user can have access for short circuit read. We should make that configurable through the config generator scripts.</blockquote></li>
<li> <a href="">HADOOP-7815</a>.
Minor bug reported by rramya and fixed by rramya (conf)<br>
<b>Map memory mb is being incorrectly set by</b><br>
<blockquote>HADOOP-7728 enabled task memory management to be configurable in the However, the default value for is being set incorrectly.</blockquote></li>
<li> <a href="">HADOOP-7816</a>.
Major bug reported by davet and fixed by davet <br>
<b>Allow HADOOP_HOME deprecated warning suppression based on config specified in</b><br>
<blockquote>Move suppression check for &quot;Warning: $HADOOP_HOME is deprecated&quot; to after sourcing of so that people can set HADOOP_HOME_WARN_SUPPRESS inside the config.</blockquote></li>
<li> <a href="">HADOOP-7853</a>.
Blocker bug reported by daryn and fixed by daryn (security)<br>
<b>multiple javax security configurations cause conflicts</b><br>
<blockquote>Both UGI and the SPNEGO KerberosAuthenticator set the global javax security configuration. SPNEGO stomps on UGI&apos;s security config which leads to kerberos/SASL authentication errors.<br></blockquote></li>
<li> <a href="">HADOOP-7854</a>.
Critical bug reported by daryn and fixed by daryn (security)<br>
<b>UGI getCurrentUser is not synchronized</b><br>
<blockquote>Sporadic {{ConcurrentModificationExceptions}} are originating from {{UGI.getCurrentUser}} when it needs to create a new instance. The problem was specifically observed in a JT under heavy load when a post-job cleanup is accessing the UGI while a new job is being processed.</blockquote></li>
<li> <a href="">HADOOP-7865</a>.
Major bug reported by jnp and fixed by jnp <br>
<b>Test Failures in 1.0.0 hdfs/common</b><br>
<blockquote>Following tests in hdfs and common are failing<br>1. TestFileAppend2<br>2. TestFileConcurrentReader<br>3. TestDoAsEffectiveUser </blockquote></li>
<li> <a href="">HADOOP-7869</a>.
Critical bug reported by owen.omalley and fixed by owen.omalley (scripts)<br>
<b>HADOOP_HOME warning happens all of the time</b><br>
<blockquote>With HADOOP-7816, the check for HADOOP_HOME has moved after it is set by hadoop-config so that it always happens unless HADOOP_HOME_WARN_SUPPRESS is set in hadoop-env or the environment.</blockquote></li>
<li> <a href="">HDFS-611</a>.
Major bug reported by dhruba and fixed by zshao (data-node)<br>
<b>Heartbeats times from Datanodes increase when there are plenty of blocks to delete</b><br>
<blockquote>I am seeing that when we delete a large directory that has plenty of blocks, the heartbeat times from datanodes increase significantly from the normal value of 3 seconds to as large as 50 seconds or so. The heartbeat thread in the Datanode deletes a bunch of blocks sequentially, this causes the heartbeat times to increase.</blockquote></li>
<li> <a href="">HDFS-1257</a>.
Major bug reported by rvadali and fixed by eepayne (name-node)<br>
<b>Race condition on FSNamesystem#recentInvalidateSets introduced by HADOOP-5124</b><br>
<blockquote>HADOOP-5124 provided some improvements to FSNamesystem#recentInvalidateSets. But it introduced unprotected access to the data structure recentInvalidateSets. Specifically, FSNamesystem.computeInvalidateWork accesses recentInvalidateSets without read-lock protection. If there is concurrent activity (like reducing replication on a file) that adds to recentInvalidateSets, the name-node crashes with a ConcurrentModificationException.</blockquote></li>
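The failure mode in HDFS-1257 can be reproduced in miniature. The sketch below is not the NameNode code: InvalidateSetsDemo and its methods are hypothetical stand-ins, and the concurrent writer is simulated on a single thread so the exception is deterministic.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.ConcurrentModificationException;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for a structure like recentInvalidateSets.
class InvalidateSetsDemo {
    // storage ID -> block IDs waiting to be invalidated
    private final Map<String, Collection<Long>> sets = new TreeMap<String, Collection<Long>>();

    // Writers take the lock before changing the map.
    synchronized void add(String storageId, long blockId) {
        Collection<Long> blocks = sets.get(storageId);
        if (blocks == null) {
            blocks = new ArrayList<Long>();
            sets.put(storageId, blocks);
        }
        blocks.add(blockId);
    }

    // Unprotected scan: a structural change during iteration (here simulated
    // in-line, standing in for a concurrent writer) makes the fail-fast
    // iterator throw ConcurrentModificationException.
    boolean unsafeScanFails() {
        try {
            for (String id : sets.keySet()) {
                sets.put(id + "-new", new ArrayList<Long>());
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    // Protected scan: holds the same lock as add(), so no writer can interleave.
    synchronized int safeCount() {
        int n = 0;
        for (Collection<Long> blocks : sets.values()) {
            n += blocks.size();
        }
        return n;
    }

    public static void main(String[] args) {
        InvalidateSetsDemo demo = new InvalidateSetsDemo();
        demo.add("dn1", 1L);
        demo.add("dn2", 2L);
        System.out.println("safe count: " + demo.safeCount());
        System.out.println("unsafe scan failed: " + demo.unsafeScanFails());
    }
}
```

The point of the sketch is only that readers and writers must share one lock; how the actual patch protects the structure is not shown here.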
<li> <a href="">HDFS-1943</a>.
Blocker bug reported by weiyj and fixed by mattf (scripts)<br>
<b>fail to start datanode while is executed by root user</b><br>
<blockquote>When is run by root user, we got the following error message:<br>#<br>Starting namenodes on [localhost ]<br>localhost: namenode running as process 2556. Stop it first.<br>localhost: starting datanode, logging to /usr/hadoop/hadoop-common-0.23.0-SNAPSHOT/bin/../logs/hadoop-root-datanode-cspf01.out<br>localhost: Unrecognized option: -jvm<br>localhost: Could not create the Java virtual machine.<br><br>The -jvm options should be passed to jsvc when we starting a secure<br>datanode, but it still pa...</blockquote></li>
<li> <a href="">HDFS-2065</a>.
Major bug reported by bharathm and fixed by umamaheswararao <br>
<b>Fix NPE in DFSClient.getFileChecksum</b><br>
<blockquote>The following code can throw NPE if callGetBlockLocations returns null.<br><br>If server returns null <br><br>{code}<br> List&lt;LocatedBlock&gt; locatedblocks<br> = callGetBlockLocations(namenode, src, 0, Long.MAX_VALUE).getLocatedBlocks();<br>{code}<br><br>The right fix for this is server should throw right exception.<br><br></blockquote></li>
<li> <a href="">HDFS-2346</a>.
Blocker bug reported by umamaheswararao and fixed by lakshman (test)<br>
<b>TestHost2NodesMap &amp; TestReplicasMap will fail depending upon execution order of test methods</b><br>
<li> <a href="">HDFS-2416</a>.
Major sub-task reported by arpitgupta and fixed by jnp <br>
<b>distcp with a webhdfs uri on a secure cluster fails</b><br>
<li> <a href="">HDFS-2424</a>.
Major sub-task reported by arpitgupta and fixed by szetszwo <br>
<b>webhdfs liststatus json does not convert to a valid xml document</b><br>
<li> <a href="">HDFS-2427</a>.
Major sub-task reported by arpitgupta and fixed by szetszwo <br>
<b>webhdfs mkdirs api call creates path with 777 permission, we should default it to 755</b><br>
<li> <a href="">HDFS-2428</a>.
Major sub-task reported by arpitgupta and fixed by szetszwo <br>
<b>webhdfs api parameter validation should be better</b><br>
<blockquote>PUT Request: http://localhost:50070/webhdfs/some_path?op=MKDIRS&amp;permission=955<br><br>Exception returned<br><br><br>HTTP/1.1 500 Internal Server Error<br>{&quot;RemoteException&quot;:{&quot;className&quot;:&quot;com.sun.jersey.api.ParamException$QueryParamException&quot;,&quot;message&quot;:&quot;java.lang.NumberFormatException: For input string: \&quot;955\&quot;&quot;}} <br><br><br>We should return a 400 with appropriate error message</blockquote></li>
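The error in HDFS-2428 is what an octal parse produces for a digit outside 0-7. A minimal sketch of validating the parameter up front follows; PermissionParam and its accepted pattern are hypothetical, not the WebHDFS implementation, and the caller is assumed to map IllegalArgumentException to a 400 response.

```java
// Hypothetical validator; not the WebHDFS parameter class.
class PermissionParam {
    // Parses an octal permission string such as "755".
    // ASSUMPTION: 1 to 4 octal digits are accepted; anything else is a client
    // error that the caller would translate into HTTP 400.
    static short parseOctal(String s) {
        if (s == null || !s.matches("[0-7]{1,4}")) {
            throw new IllegalArgumentException(
                "Invalid permission \"" + s + "\": expected 1 to 4 octal digits");
        }
        return Short.parseShort(s, 8);
    }

    public static void main(String[] args) {
        System.out.println(parseOctal("755")); // decimal value of octal 755
        try {
            parseOctal("955");
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Checking the format before parsing lets the server report a descriptive client error instead of surfacing a raw NumberFormatException as a 500.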
<li> <a href="">HDFS-2432</a>.
Major sub-task reported by arpitgupta and fixed by szetszwo <br>
<b>webhdfs setreplication api should return a 403 when called on a directory</b><br>
<blockquote>Currently the set replication api on a directory leads to a 200.<br><br>Request URI http://NN:50070/webhdfs/tmp/webhdfs_data/dir_replication_tests?op=SETREPLICATION&amp;replication=5<br>Request Method: PUT<br>Status Line: HTTP/1.1 200 OK<br>Response Content: {&quot;boolean&quot;:false}<br><br>Since we can determine that this call did not succeed (boolean=false) we should rather just return a 403</blockquote></li>
<li> <a href="">HDFS-2439</a>.
Major sub-task reported by arpitgupta and fixed by szetszwo <br>
<b>webhdfs open an invalid path leads to a 500 which states a npe, we should return a 404 with appropriate error message</b><br>
<li> <a href="">HDFS-2441</a>.
Major sub-task reported by arpitgupta and fixed by szetszwo <br>
<b>webhdfs returns two content-type headers</b><br>
<blockquote>$ curl -i &quot;http://localhost:50070/webhdfs/path?op=GETFILESTATUS&quot;<br>HTTP/1.1 200 OK<br>Content-Type: text/html; charset=utf-8<br>Expires: Thu, 01-Jan-1970 00:00:00 GMT<br>........<br>Content-Type: application/json<br>Transfer-Encoding: chunked<br>Server: Jetty(6.1.26)<br><br><br>It should only return one content type header = application/json</blockquote></li>
<li> <a href="">HDFS-2450</a>.
Major bug reported by rajsaha and fixed by daryn <br>
<b>Only complete hostname is supported to access data via hdfs://</b><br>
<blockquote>If my complete hostname is, only complete hostname must be used to access data via hdfs://<br><br>I am running following in .20.205 Client to get data from .20.205 NN (host1)<br>$hadoop dfs -copyFromLocal /etc/passwd hdfs://host1/tmp<br>copyFromLocal: Wrong FS: hdfs://host1/tmp, expected: hdfs://<br>Usage: java FsShell [-copyFromLocal &lt;localsrc&gt; ... &lt;dst&gt;]<br><br>$hadoop dfs -copyFromLocal /etc/passwd hdfs://<br>copyFromLocal: Wrong FS: hdfs://, exp...</blockquote></li>
<li> <a href="">HDFS-2453</a>.
Major sub-task reported by arpitgupta and fixed by szetszwo <br>
<b>tail using a webhdfs uri throws an error</b><br>
<blockquote>/usr//bin/hadoop --config /etc/hadoop dfs -tail webhdfs://NN:50070/file <br>tail: HTTP_PARTIAL expected, received 200<br></blockquote></li>
<li> <a href="">HDFS-2494</a>.
Major sub-task reported by umamaheswararao and fixed by umamaheswararao (data-node)<br>
<b>[webhdfs] When Getting the file using OP=OPEN with DN http address, ESTABLISHED sockets are growing.</b><br>
<blockquote>As part of the reliable test,<br>Scenario:<br>Initially check the socket count: there are around 42 sockets.<br>Open the file with the DataNode http address using the op=OPEN request parameter about 500 times in a loop.<br>Wait for some time and check the socket count again: the number of ESTABLISHED sockets has grown into the thousands (~2052).<br><br>Here is the netstat result:<br><br>C:\Users\uma&gt;netstat | grep | grep ESTABLISHED |wc -l<br>2042<br>C:\Users\uma&gt;netstat | grep | grep ESTABLISHED |wc -l<br>2042<br>C:\...</blockquote></li>
<li> <a href="">HDFS-2501</a>.
Major sub-task reported by szetszwo and fixed by szetszwo <br>
<b>add version prefix and root methods to webhdfs</b><br>
<li> <a href="">HDFS-2527</a>.
Major sub-task reported by szetszwo and fixed by szetszwo <br>
<b>Remove the use of Range header from webhdfs</b><br>
<li> <a href="">HDFS-2528</a>.
Major sub-task reported by arpitgupta and fixed by szetszwo <br>
<b>webhdfs rest call to a secure dn fails when a token is sent</b><br>
<blockquote>curl -L -u : --negotiate -i &quot;http://NN:50070/webhdfs/v1/tmp/webhdfs_data/file_small_data.txt?op=OPEN&quot;<br><br>the following exception is thrown by the datanode when the redirect happens.<br>{&quot;RemoteException&quot;:{&quot;exception&quot;:&quot;IOException&quot;,&quot;javaClassName&quot;:&quot;;,&quot;message&quot;:&quot;Call to failed on local exception: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]&quot;}}<br>...</blockquote></li>
<li> <a href="">HDFS-2539</a>.
Major sub-task reported by szetszwo and fixed by szetszwo <br>
<b>Support doAs and GETHOMEDIRECTORY in webhdfs</b><br>
<li> <a href="">HDFS-2540</a>.
Major sub-task reported by szetszwo and fixed by szetszwo <br>
<b>Change WebHdfsFileSystem to two-step create/append</b><br>
<li> <a href="">HDFS-2552</a>.
Major task reported by szetszwo and fixed by szetszwo (documentation)<br>
<b>Add WebHdfs Forrest doc</b><br>
<li> <a href="">HDFS-2589</a>.
Major bug reported by daryn and fixed by daryn (security)<br>
<b>unnecessary hftp token fetch and renewal thread</b><br>
<blockquote>Instantiation of the hftp filesystem is causing a token to be implicitly created and added to a custom token renewal thread. With the new token renewal feature in the JT, this causes the mapreduce {{obtainTokensForNamenodes}} to fetch two tokens (an implicit and uncancelled token, and an explicit token) and leave a spurious renewal thread running. This thread should not be running in the JT.<br><br>After speaking with Owen, the quick solution is to lazy fetch the token, and to lazy start the rene...</blockquote></li>
<li> <a href="">HDFS-2590</a>.
Major bug reported by szetszwo and fixed by szetszwo (documentation)<br>
<b>Some links in WebHDFS forrest doc do not work</b><br>
<blockquote>Some links are pointing to DistributedFileSystem javadoc but the javadoc of DistributedFileSystem is not generated by default.</blockquote></li>
<li> <a href="">HDFS-2604</a>.
Minor improvement reported by szetszwo and fixed by szetszwo (data-node, documentation, name-node)<br>
<b>Add a log message to show if WebHDFS is enabled</b><br>
<blockquote>WebHDFS can be enabled/disabled by the conf key {{dfs.webhdfs.enabled}}. Let&apos;s add a log message to show if it is enabled.</blockquote></li>
<li> <a href="">HDFS-2673</a>.
Trivial bug reported by umamaheswararao and fixed by umamaheswararao (name-node)<br>
<b>While Namenode processing the blocksBeingWrittenReport, it will log incorrect number blocks count</b><br>
<blockquote>In NameNode#blocksBeingWrittenReport<br> we have the following stateChangeLog<br>{code}<br>;*BLOCK* NameNode.blocksBeingWrittenReport: &quot;<br> +&quot;from &quot;+nodeReg.getName()+&quot; &quot;+blocks.length +&quot; blocks&quot;);<br>{code}<br><br>Here blocks is a long array in which every three consecutive elements represent one block (length, blockid, genstamp).<br><br>So, in the log message, blocks.length should be blocks.length/3.<br><br> </blockquote></li>
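The arithmetic in HDFS-2673 is small enough to show directly. This is a sketch with a hypothetical helper class; only the three-longs-per-block layout comes from the description above.

```java
// Hypothetical helper; the layout (length, blockid, genstamp) is taken from the issue text.
class BlockReportCount {
    static final int LONGS_PER_BLOCK = 3;

    // Number of blocks encoded in a flattened report array.
    static int blockCount(long[] blocks) {
        return blocks.length / LONGS_PER_BLOCK;
    }

    public static void main(String[] args) {
        // Two blocks, each stored as (length, blockid, genstamp).
        long[] report = {1024L, 1L, 1001L, 2048L, 2L, 1002L};
        System.out.println(report.length + " longs encode " + blockCount(report) + " blocks");
    }
}
```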
<li> <a href="">MAPREDUCE-3169</a>.
Major improvement reported by tlipcon and fixed by ahmed.radwan (mrv1, mrv2, test)<br>
<b>Create a new MiniMRCluster equivalent which only provides client APIs cross MR1 and MR2</b><br>
<blockquote>Many dependent projects like HBase, Hive, Pig, etc, depend on MiniMRCluster for writing tests. Many users do as well. MiniMRCluster, however, exposes MR implementation details like the existence of TaskTrackers, JobTrackers, etc, since it was used by MR1 for testing the server implementations as well.<br><br>This JIRA is to create a new interface which could be implemented either by MR1 or MR2 that exposes only the client-side portions of the MR framework. Ideally it would be &quot;recompile-compatible&quot;...</blockquote></li>
<li> <a href="">MAPREDUCE-3319</a>.
Blocker bug reported by rvs and fixed by subrotosanyal (examples)<br>
<b>multifilewc from hadoop examples seems to be broken in</b><br>
<blockquote>{noformat}<br>/usr/lib/hadoop/bin/hadoop jar /usr/lib/hadoop/hadoop-examples- multifilewc examples/text examples-output/multifilewc<br>11/10/31 16:50:26 INFO mapred.FileInputFormat: Total input paths to process : 2<br>11/10/31 16:50:26 INFO mapred.JobClient: Running job: job_201110311350_0220<br>11/10/31 16:50:27 INFO mapred.JobClient: map 0% reduce 0%<br>11/10/31 16:50:42 INFO mapred.JobClient: Task Id : attempt_201110311350_0220_m_000000_0, Status : FAILED<br>java.lang.ClassCastException: ...</blockquote></li>
<li> <a href="">MAPREDUCE-3374</a>.
Major bug reported by rvs and fixed by (task-controller)<br>
<b>src/c++/task-controller/configure is not set executable in the tarball and that prevents task-controller from rebuilding</b><br>
<blockquote>ant task-controller fails because src/c++/task-controller/configure is not set executable</blockquote></li>
<li> <a href="">MAPREDUCE-3475</a>.
Major bug reported by daryn and fixed by daryn (jobtracker)<br>
<b>JT can&apos;t renew its own tokens</b><br>
<blockquote>When external systems submit jobs whose tasks need to submit additional jobs (such as oozie/pig), they include their own MR token used to submit the job. The token&apos;s renewer may not allow the JT to renew the token. The JT log will include very long SASL/GSSAPI exceptions when the job is submitted. It is also dubious for the JT to renew its token because it renders the expiry as meaningless since the JT will renew its own token until the max lifetime is exceeded.<br><br>After speaking with Owen &amp;...</blockquote></li>
<li> <a href="">MAPREDUCE-3480</a>.
Major bug reported by jnp and fixed by jnp <br>
<b>TestJvmReuse fails in 1.0</b><br>
<blockquote>TestJvmReuse is failing in apache builds, although it passes in my local machine.</blockquote></li>
<h2>Changes since Hadoop</h2>
<li> <a href="">HADOOP-6722</a>.
Major bug reported by tlipcon and fixed by tlipcon (util)<br>
<b>NetUtils.connect should check that it hasn&apos;t connected a socket to itself</b><br>
<blockquote>I had no idea this was possible, but it turns out that a TCP connection will be established in the rare case that the local side of the socket binds to the ephemeral port that you later try to connect to. This can present itself in very very rare occasion when an RPC client is trying to connect to a daemon running on the same node, but that daemon is down. To see what I&apos;m talking about, run &quot;while true ; do telnet localhost 60020 ; done&quot; on a multicore box and wait several minutes.<br><br>This can ...</blockquote></li>
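A check of the kind the HADOOP-6722 title asks for can be sketched as a comparison of the two socket endpoints. SelfConnectCheck is a hypothetical class, not the NetUtils code; in real use the two addresses would presumably come from Socket.getLocalSocketAddress() and Socket.getRemoteSocketAddress() after connecting.

```java
import java.net.InetAddress;
import java.net.InetSocketAddress;

// Hypothetical helper; not the actual NetUtils implementation.
class SelfConnectCheck {
    // A socket is connected to itself when its local endpoint (address and
    // port) is identical to its remote endpoint.
    static boolean isSelfConnection(InetSocketAddress local, InetSocketAddress remote) {
        return local != null && local.equals(remote);
    }

    public static void main(String[] args) {
        InetAddress lo = InetAddress.getLoopbackAddress();
        InetSocketAddress daemon = new InetSocketAddress(lo, 60020);
        // Ephemeral port happened to equal the target port: self-connection.
        System.out.println(isSelfConnection(new InetSocketAddress(lo, 60020), daemon));
        // Ordinary case: a different ephemeral port.
        System.out.println(isSelfConnection(new InetSocketAddress(lo, 51234), daemon));
    }
}
```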
<li> <a href="">HADOOP-6833</a>.
Blocker bug reported by tlipcon and fixed by tlipcon <br>
<b>IPC leaks call parameters when exceptions thrown</b><br>
<blockquote>HADOOP-6498 moved the calls.remove() call lower into the SUCCESS clause of receiveResponse(), but didn&apos;t put a similar calls.remove into the ERROR clause. So, any RPC call that throws an exception ends up orphaning the Call object in the connection&apos;s &quot;calls&quot; hashtable. This prevents cleanup of the connection and is a memory leak for the call parameters.</blockquote></li>
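The leak pattern in HADOOP-6833 can be sketched with a small call table. CallTable below is hypothetical and far simpler than the IPC client; it only illustrates that the pending entry has to be removed on the error path as well as on the success path.

```java
import java.util.Hashtable;

// Hypothetical miniature of a per-connection call table.
class CallTable {
    static final int SUCCESS = 0;
    static final int ERROR = 1;

    // call ID -> call parameters held until a response arrives
    private final Hashtable<Integer, String> calls = new Hashtable<Integer, String>();

    void register(int id, String params) {
        calls.put(id, params);
    }

    void receiveResponse(int id, int state) {
        if (state == SUCCESS) {
            calls.remove(id);
        } else if (state == ERROR) {
            // Without this removal, every failed call would stay in the table,
            // keeping its parameters reachable.
            calls.remove(id);
        }
    }

    int pending() {
        return calls.size();
    }

    public static void main(String[] args) {
        CallTable table = new CallTable();
        table.register(1, "params-a");
        table.register(2, "params-b");
        table.receiveResponse(1, SUCCESS);
        table.receiveResponse(2, ERROR);
        System.out.println("pending calls: " + table.pending());
    }
}
```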
<li> <a href="">HADOOP-6889</a>.
Major new feature reported by hairong and fixed by johnvijoe (ipc)<br>
<b>Make RPC to have an option to timeout</b><br>
<blockquote>Currently Hadoop RPC does not time out when the RPC server is alive. Instead, an RPC client sends a ping to the server whenever a socket timeout happens. If the server is still alive, the client continues to wait instead of throwing a SocketTimeoutException. This keeps a client from retrying when a server is busy and thus making the server even busier. This works great if the RPC server is NameNode.<br><br>But Hadoop RPC is also used for some of client to DataNode communications, for e...</blockquote></li>
<li> <a href="">HADOOP-7119</a>.
Major new feature reported by tucu00 and fixed by tucu00 (security)<br>
<b>add Kerberos HTTP SPNEGO authentication support to Hadoop JT/NN/DN/TT web-consoles</b><br>
<blockquote> Adding support for Kerberos HTTP SPNEGO authentication to the Hadoop web-consoles<br><br> <br></blockquote></li>
<li> <a href="">HADOOP-7314</a>.
Major improvement reported by naisbitt and fixed by naisbitt <br>
<b>Add support for throwing UnknownHostException when a host doesn&apos;t resolve</b><br>
<blockquote>As part of MAPREDUCE-2489, we need support for having the resolve methods (for DNS mapping) throw UnknownHostExceptions. (Currently, they hide the exception). Since the existing &apos;resolve&apos; method is ultimately used by several other locations/components, I propose we add a new &apos;resolveValidHosts&apos; method.</blockquote></li>
<li> <a href="">HADOOP-7343</a>.
Minor improvement reported by tgraves and fixed by tgraves (test)<br>
<b>backport HADOOP-7008 and HADOOP-7042 to branch-0.20-security</b><br>
<blockquote>Backport HADOOP-7008 and HADOOP-7042 to branch-0.20-security so that we can configure the number of acceptable findbugs and javadoc warnings.</blockquote></li>
<li> <a href="">HADOOP-7400</a>.
Major bug reported by gkesavan and fixed by gkesavan (build)<br>
<b>HdfsProxyTests fails when the and -Dbuild.test is set </b><br>
<blockquote>HdfsProxyTests fails when the and -Dbuild.test is set a dir other than build dir<br><br>test-junit:<br> [copy] Copying 1 file to /home/y/var/builds/thread2/workspace/Cloud-Hadoop-0.20.1xx-Secondary/src/contrib/hdfsproxy/src/test/resources/proxy-config<br> [junit] Running org.apache.hadoop.hdfsproxy.TestHdfsProxy<br> [junit] Tests run: 1, Failures: 0, Errors: 1, Time elapsed: 0 sec<br> [junit] Test org.apache.hadoop.hdfsproxy.TestHdfsProxy FAILED</blockquote></li>
<li> <a href="">HADOOP-7432</a>.
Major improvement reported by sherri_chen and fixed by sherri_chen <br>
<b>Back-port HADOOP-7110 to 0.20-security</b><br>
<blockquote>HADOOP-7110 implemented chmod in the NativeIO library so we can have good performance (ie not fork) and still not be prone to races. This should fix build failures (and probably task failures too).</blockquote></li>
<li> <a href="">HADOOP-7472</a>.
Minor improvement reported by kihwal and fixed by kihwal (ipc)<br>
<b>RPC client should deal with the IP address changes</b><br>
<blockquote>The current RPC client implementation and the client-side callers assume that the hostname-address mappings of servers never change. The resolved address is stored in an immutable InetSocketAddress object above/outside RPC, and the reconnect logic in the RPC Connection implementation also trusts the resolved address that was passed down.<br><br>If the NN suffers a failure that requires migration, it may be started on a different node with a different IP address. In this case, even if the name-addre...</blockquote></li>
<li> <a href="">HADOOP-7510</a>.
Major improvement reported by daryn and fixed by daryn (security)<br>
<b>Tokens should use original hostname provided instead of ip</b><br>
<blockquote>Tokens currently store the ip:port of the remote server. This precludes tokens from being used after a host&apos;s ip is changed. Tokens should store the hostname used to make the RPC connection. This will enable new processes to use their existing tokens.</blockquote></li>
<li> <a href="">HADOOP-7539</a>.
Major bug reported by johnvijoe and fixed by johnvijoe <br>
<b>merge hadoop archive goodness from trunk to .20</b><br>
<blockquote>hadoop archive in branch-0.20-security is outdated. When run recently, it produced some bugs which were all fixed in trunk. This JIRA aims to bring in all these JIRAs to branch-0.20-security.<br></blockquote></li>
<li> <a href="">HADOOP-7594</a>.
Major new feature reported by szetszwo and fixed by szetszwo <br>
<b>Support HTTP REST in HttpServer</b><br>
<blockquote>Provide an API in HttpServer for supporting HTTP REST.<br><br>This is a part of HDFS-2284.</blockquote></li>
<li> <a href="">HADOOP-7596</a>.
Major bug reported by eyang and fixed by eyang (build)<br>
<b>Enable jsvc to work with Hadoop RPM package</b><br>
<blockquote>On a secure Hadoop 0.20.2xx cluster, the datanode can only run with a 32-bit JVM because Hadoop only packages a 32-bit jsvc. The build process should download the proper jsvc version based on the build architecture. In addition, the shell script should be enhanced to locate hadoop jar files in the proper location.</blockquote></li>
<li> <a href="">HADOOP-7599</a>.
Major bug reported by eyang and fixed by eyang (scripts)<br>
<b>Improve hadoop setup conf script to setup secure Hadoop cluster</b><br>
<blockquote>Setting up a secure Hadoop cluster requires a lot of manual setup. The motivation of this jira is to provide setup scripts that automate setting up a secure Hadoop cluster.</blockquote></li>
<li> <a href="">HADOOP-7602</a>.
Major bug reported by johnvijoe and fixed by johnvijoe <br>
<b>wordcount, sort etc on har files fails with NPE</b><br>
<blockquote>wordcount, sort etc on har files fails with NPE@createSocketAddr( </blockquote></li>
<li> <a href="">HADOOP-7603</a>.
Major bug reported by eyang and fixed by eyang <br>
<b>Set default hdfs, mapred uid, and hadoop group gid for RPM packages</b><br>
<blockquote>Set hdfs, mapred uid, and hadoop uid to fixed numbers. (Eric Yang)</blockquote></li>
<li> <a href="">HADOOP-7610</a>.
Major bug reported by eyang and fixed by eyang (scripts)<br>
<b>/etc/profile.d does not exist on Debian</b><br>
<blockquote>As part of post installation script, there is a symlink created in /etc/profile.d/ to source /etc/hadoop/ Therefore, users do not need to configure HADOOP_* environment. Unfortunately, /etc/profile.d only exists in Ubuntu. [Section 9.9 of the Debian Policy|] states:<br><br>{quote}<br>A program must not depend on environment variables to get reasonable defaults. (That&apos;s because these environment variables would ha...</blockquote></li>
<li> <a href="">HADOOP-7615</a>.
Major bug reported by eyang and fixed by eyang (scripts)<br>
<b>Binary layout does not put share/hadoop/contrib/*.jar into the class path</b><br>
<blockquote>For contrib projects, contrib jar files are not included in HADOOP_CLASSPATH in the binary layout. Several projects jar files should be copied to $HADOOP_PREFIX/share/hadoop/lib for binary deployment. The interesting jar files to include in $HADOOP_PREFIX/share/hadoop/lib are: capacity-scheduler, thriftfs, fairscheduler.</blockquote></li>
<li> <a href="">HADOOP-7625</a>.
Major bug reported by owen.omalley and fixed by owen.omalley <br>
<b>TestDelegationToken is failing in 205</b><br>
<blockquote>After the patches on Friday, is failing.</blockquote></li>
<li> <a href="">HADOOP-7626</a>.
Major bug reported by eyang and fixed by eyang (scripts)<br>
<b>Allow overwrite of HADOOP_CLASSPATH and HADOOP_OPTS</b><br>
<blockquote>Quote email from Ashutosh Chauhan:<br><br>bq. There is a bug in which prevents hcatalog server to start in secure settings. Instead of adding classpath, it overrides them. I was not able to verify where the bug belongs to, in HMS or in hadoop scripts. Looks like is generated from in installation process by HMS. Hand crafted patch follows:<br><br>bq. - export HADOOP_CLASSPATH=$f<br>bq. +export HADOOP_CLASSPATH=${HADOOP_CLASSPATH}:$f<br><br>bq. -export HADOOP_OPTS=...</blockquote></li>
<li> <a href="">HADOOP-7630</a>.
Major bug reported by arpitgupta and fixed by eyang (conf)<br>
<b> should have a property *.period set to a default value for metrics</b><br>
<blockquote>currently the file does not have a value set for *.period<br><br>This property is useful for metrics to determine when the property will refresh. We should set it to default of 60</blockquote></li>
<li> <a href="">HADOOP-7631</a>.
Major bug reported by rramya and fixed by eyang (conf)<br>
<b>In mapred-site.xml, stream.tmpdir is mapped to ${mapred.temp.dir} which is undeclared.</b><br>
<blockquote>Streaming jobs seem to fail with the following exception:<br><br>{noformat}<br>Exception in thread &quot;main&quot; No such file or directory<br> at Method)<br> at<br> at<br> at org.apache.hadoop.streaming.StreamJob.packageJobJar(<br> at org.apache.hadoop.streaming.StreamJob.setJobConf(<br> a...</blockquote></li>
<li> <a href="">HADOOP-7633</a>.
Major bug reported by arpitgupta and fixed by eyang (conf)<br>
<b> should be added to the hadoop conf on deploy</b><br>
<blockquote>currently the log4j properties are not present in the hadoop conf dir. We should add them so that log rotation happens appropriately and also define other logs that hadoop can generate for example the audit and the auth logs as well as the mapred summary logs etc.</blockquote></li>
<li> <a href="">HADOOP-7637</a>.
Major bug reported by eyang and fixed by eyang (build)<br>
<b>Fair scheduler configuration file is not bundled in RPM</b><br>
<blockquote>205 build of tar is fine, but rpm failed with:<br><br>{noformat}<br> [rpm] Processing files: hadoop-<br> [rpm] warning: File listed twice: /usr/libexec<br> [rpm] warning: File listed twice: /usr/libexec/<br> [rpm] warning: File listed twice: /usr/libexec/jsvc.i386<br> [rpm] Checking for unpackaged file(s): /usr/lib/rpm/check-files /tmp/hadoop_package_build_hortonfo/BUILD<br> [rpm] error: Installed (but unpackaged) file(s) found:<br> [rpm] /etc/hadoop/fai...</blockquote></li>
<li> <a href="">HADOOP-7644</a>.
Blocker bug reported by owen.omalley and fixed by owen.omalley (security)<br>
<b>Fix the delegation token tests to use the new style renewers</b><br>
<blockquote>Currently, TestDelegationTokenRenewal and TestDelegationTokenFetcher use the old style renewal and fail.<br><br></blockquote></li>
<li> <a href="">HADOOP-7645</a>.
Blocker bug reported by atm and fixed by jnp (security)<br>
<b>HTTP auth tests requiring Kerberos infrastructure are not disabled on branch-0.20-security</b><br>
<blockquote>The back-port of HADOOP-7119 to branch-0.20-security included tests which require Kerberos infrastructure in order to run. In trunk and 0.23, these are disabled unless one enables the {{testKerberos}} maven profile. In branch-0.20-security, these tests are always run regardless, and so fail most of the time.<br><br>See this Jenkins build for an example:</blockquote></li>
<li> <a href="">HADOOP-7649</a>.
Blocker bug reported by kihwal and fixed by jnp (security, test)<br>
<b>TestMapredGroupMappingServiceRefresh and TestRefreshUserMappings fail after HADOOP-7625</b><br>
<blockquote>TestMapredGroupMappingServiceRefresh and TestRefreshUserMappings fail after HADOOP-7625.<br>The classpath has been changed, so they try to create the rsrc file in a jar and fail.<br></blockquote></li>
<li> <a href="">HADOOP-7655</a>.
Major improvement reported by arpitgupta and fixed by arpitgupta <br>
<b>provide a small validation script that smoke tests the installed cluster</b><br>
<blockquote>Currently we have scripts that will set up a hadoop cluster, create users etc. We should add a script that will smoke test the installed cluster. The script could run 3 small MR jobs (teragen, terasort and teravalidate) and clean up once it&apos;s done.</blockquote></li>
<li> <a href="">HADOOP-7658</a>.
Major bug reported by gkesavan and fixed by eyang <br>
<b>to fix hadoop config template</b><br>
<blockquote>hadoop rpm config template by default sets the HADOOP_SECURE_DN_USER, HADOOP_SECURE_DN_LOG_DIR &amp; HADOOP_SECURE_DN_PID_DIR <br>the above values should only be set for secured deployment ; <br># On secure datanodes, user to run the datanode as after dropping privileges<br>export HADOOP_SECURE_DN_USER=${HADOOP_HDFS_USER}<br><br># Where log files are stored. $HADOOP_HOME/logs by default.<br>export HADOOP_LOG_DIR=${HADOOP_LOG_DIR}/$USER<br><br># Where log files are stored in the secure data environment.<br>export HADOOP_SE...</blockquote></li>
<li> <a href="">HADOOP-7661</a>.
Major bug reported by jnp and fixed by jnp <br>
<b>FileSystem.getCanonicalServiceName throws NPE for any file system uri that doesn&apos;t have an authority.</b><br>
<blockquote>FileSystem.getCanonicalServiceName throws NPE for any file system uri that doesn&apos;t have an authority. <br><br>....<br>java.lang.NullPointerException<br>at<br>at<br>at org.apache.hadoop.fs.FileSystem.getCanonicalServiceName(<br>....</blockquote></li>
<li> <a href="">HADOOP-7674</a>.
Major bug reported by jnp and fixed by jnp <br>
<b>TestKerberosName fails in 20 branch.</b><br>
<blockquote>TestKerberosName fails in 20 branch. In fact this test has got duplicated in 20, with a little change to the rules.</blockquote></li>
<li> <a href="">HADOOP-7676</a>.
Major bug reported by gkesavan and fixed by gkesavan <br>
<b>add rules to the core-site.xml template</b><br>
<blockquote>add rules for master and region in core-site.xml template.</blockquote></li>
<li> <a href="">HADOOP-7679</a>.
Major bug reported by rramya and fixed by rramya (conf)<br>
<b> templates does not define mapred.jobsummary.logger</b><br>
<blockquote>In templates/conf/, HADOOP_JOBTRACKER_OPTS is defined as -Dsecurity.audit.logger=INFO,DRFAS -Dmapred.audit.logger=INFO,MRAUDIT -Dmapred.jobsummary.logger=INFO,JSA ${HADOOP_JOBTRACKER_OPTS}<br>However, in templates/conf/, instead of mapred.jobsummary.logger, hadoop.mapreduce.jobsummary.logger is defined as follows:<br>hadoop.mapreduce.jobsummary.logger=${hadoop.root.logger}<br>This is preventing collection of jobsummary logs.<br><br>We have to consistently use mapred.jobsummary.logg...</blockquote></li>
<li> <a href="">HADOOP-7681</a>.
Minor bug reported by arpitgupta and fixed by arpitgupta (conf)<br>
<b> is missing properties for security audit and hdfs audit should be changed to info</b><br>
<blockquote>(Arpit Gupta via Eric Yang)<br></blockquote></li>
<li> <a href="">HADOOP-7683</a>.
Minor bug reported by arpitgupta and fixed by arpitgupta <br>
<b>hdfs-site.xml template has properties that are not used in 20</b><br>
<blockquote>properties dfs.namenode.http-address and dfs.namenode.https-address should be removed</blockquote></li>
<li> <a href="">HADOOP-7684</a>.
Major bug reported by eyang and fixed by eyang (scripts)<br>
<b>jobhistory server and secondarynamenode should have init.d script</b><br>
<blockquote>Added init.d script for jobhistory server and secondary namenode. (Eric Yang)</blockquote></li>
<li> <a href="">HADOOP-7685</a>.
Major bug reported by devaraj.k and fixed by eyang (scripts)<br>
<b>Issues with hadoop-common-project\hadoop-common\src\main\packages\ file </b><br>
<blockquote>hadoop-common-project\hadoop-common\src\main\packages\ has following issues<br>1. check_permission does not work as expected if there are two folders with $NAME as part of their name inside $PARENT<br>e.g. /home/hadoop/conf, /home/hadoop/someconf, <br>The result of `ls -ln $PARENT | grep -w $NAME| awk &apos;{print $3}&apos;` is non is 0 0 and hence the following if check becomes true.<br>{code:xml}<br>if [ &quot;$OWNER&quot; != &quot;0&quot; ]; then<br>RESULT=1<br>break<br>fi <br>{code}<br><br>2. Spelling mistake<br>{code:xml}<br>H...</blockquote></li>
<li> <a href="">HADOOP-7691</a>.
Major bug reported by gkesavan and fixed by eyang <br>
<b>hadoop deb pkg should take a diff group id</b><br>
<blockquote>Fixed conflict uid for install packages. (Eric Yang)</blockquote></li>
<li> <a href="">HADOOP-7707</a>.
Major improvement reported by arpitgupta and fixed by arpitgupta (conf)<br>
<b>improve config generator to allow users to specify proxy user, turn append on or off, turn webhdfs on or off</b><br>
<blockquote>Added toggle for, webhdfs and hadoop proxy user to setup config script. (Arpit Gupta via Eric Yang)</blockquote></li>
<li> <a href="">HADOOP-7708</a>.
Critical bug reported by arpitgupta and fixed by eyang (conf)<br>
<b>config generator does not update the properties file if one exists already</b><br>
<blockquote>Fixed to handle config file consistently. (Eric Yang)</blockquote></li>
<li> <a href="">HADOOP-7710</a>.
Major improvement reported by arpitgupta and fixed by arpitgupta <br>
<b>create a script to set up applications in order to create root directories for applications such as hbase, hcat, hive etc</b><br>
<li> <a href="">HADOOP-7711</a>.
Major bug reported by arpitgupta and fixed by arpitgupta (conf)<br>
<b> generated from templates has duplicate info</b><br>
<blockquote>Fixed recursive sourcing of HADOOP_OPTS environment variables (Arpit Gupta via Eric Yang)</blockquote></li>
<li> <a href="">HADOOP-7715</a>.
Major bug reported by arpitgupta and fixed by eyang (conf)<br>
<b>see log4j Error when running mr jobs and certain dfs calls</b><br>
<blockquote>Removed unnecessary security logger configuration. (Eric Yang)</blockquote></li>
<li> <a href="">HADOOP-7720</a>.
Major improvement reported by arpitgupta and fixed by arpitgupta (conf)<br>
<b>improve the to read in the hbase user and setup the configs</b><br>
<blockquote>Added parameter for HBase user to setup config script. (Arpit Gupta via Eric Yang)</blockquote></li>
<li> <a href="">HADOOP-7721</a>.
Major bug reported by arpitgupta and fixed by jnp <br>
<b>dfs.web.authentication.kerberos.principal expects the full hostname and does not replace _HOST with the hostname</b><br>
<li> <a href="">HADOOP-7724</a>.
Major bug reported by gkesavan and fixed by arpitgupta <br>
<b> should put proxy user info into the core-site.xml </b><br>
<blockquote>Fixed to put proxy user in core-site.xml. (Arpit Gupta via Eric Yang)</blockquote></li>
<li> <a href="">HDFS-142</a>.
Blocker bug reported by rangadi and fixed by dhruba <br>
<b>In 0.20, move blocks being written into a blocksBeingWritten directory</b><br>
<blockquote>Before 0.18, when a Datanode restarts, it deletes files under the data-dir/tmp directory since these files are no longer valid. But in 0.18 it incorrectly moves these files to the normal directory, making them valid blocks. One of the following would work:<br><br>- remove the tmp files during upgrade, or<br>- if the files under /tmp are in pre-18 format (i.e. no generation), delete them.<br><br>Currently the effect of this bug is that these files end up failing block verification and eventually get deleted. But cause...</blockquote></li>
<li> <a href="">HDFS-200</a>.
Blocker new feature reported by szetszwo and fixed by dhruba <br>
<b>In HDFS, sync() not yet guarantees data available to the new readers</b><br>
<blockquote>In the append design doc (, it says<br>* A reader is guaranteed to be able to read data that was &apos;flushed&apos; before the reader opened the file<br><br>However, this feature is not yet implemented. Note that the operation &apos;flushed&apos; is now called &quot;sync&quot;.</blockquote></li>
<li> <a href="">HDFS-561</a>.
Major sub-task reported by kzhang and fixed by kzhang (data-node, hdfs client)<br>
<b>Fix write pipeline READ_TIMEOUT</b><br>
<blockquote>When writing a file, the pipeline status read timeouts for datanodes are not set up properly.</blockquote></li>
<li> <a href="">HDFS-606</a>.
Major bug reported by shv and fixed by shv (name-node)<br>
<b>ConcurrentModificationException in invalidateCorruptReplicas()</b><br>
<blockquote>{{BlockManager.invalidateCorruptReplicas()}} iterates over DatanodeDescriptor-s while removing corrupt replicas from the descriptors. This causes {{ConcurrentModificationException}} if there is more than one replica of the block. I ran into this exception debugging different scenarios in append, but it should be fixed in the trunk too.</blockquote></li>
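The failure mode in HDFS-606 is the general Java one: structurally modifying a collection while a for-each loop is iterating over it. The following is only a minimal sketch of one common remedy, iterating over a snapshot. The class <code>CorruptReplicaSketch</code> and its map of block ids to datanode names are hypothetical stand-ins, not the actual BlockManager data structures or the fix that was committed.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for corrupt-replica bookkeeping; not Hadoop code.
final class CorruptReplicaSketch {
    // block id -> names of datanodes holding a corrupt replica
    private final Map<String, Set<String>> corruptReplicas = new HashMap<>();

    void markCorrupt(String blockId, String datanode) {
        corruptReplicas.computeIfAbsent(blockId, k -> new HashSet<>()).add(datanode);
    }

    // Removing from the live set inside a for-each over that same set would
    // typically throw ConcurrentModificationException once the set holds more
    // than one entry. Iterating over a snapshot copy avoids that.
    int invalidateAll(String blockId) {
        Set<String> nodes = corruptReplicas.get(blockId);
        if (nodes == null) {
            return 0;
        }
        List<String> snapshot = new ArrayList<>(nodes);
        int removed = 0;
        for (String datanode : snapshot) {
            if (nodes.remove(datanode)) {
                removed++;
            }
        }
        if (nodes.isEmpty()) {
            corruptReplicas.remove(blockId);
        }
        return removed;
    }

    int corruptCount(String blockId) {
        Set<String> nodes = corruptReplicas.get(blockId);
        return nodes == null ? 0 : nodes.size();
    }
}
```

Using an explicit <code>Iterator</code> and calling its <code>remove()</code> method is an equally valid alternative when the removal happens on the collection being iterated.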
<li> <a href="">HDFS-630</a>.
Major improvement reported by mry.maillist and fixed by clehene (hdfs client, name-node)<br>
<b>In DFSOutputStream.nextBlockOutputStream(), the client can exclude specific datanodes when locating the next block.</b><br>
<blockquote>created from hdfs-200.<br><br>If during a write, the dfsclient sees that a block replica location for a newly allocated block is not-connectable, it re-requests the NN to get a fresh set of replica locations of the block. It tries this dfs.client.block.write.retries times (default 3), sleeping 6 seconds between each retry (see DFSClient.nextBlockOutputStream).<br><br>This setting works well when you have a reasonable size cluster; if you have few datanodes in the cluster, every retry may pick the dead-d...</blockquote></li>
<li> <a href="">HDFS-724</a>.
Blocker bug reported by szetszwo and fixed by hairong (data-node, hdfs client)<br>
<b>Pipeline close hangs if one of the datanode is not responsive.</b><br>
<blockquote>In the new pipeline design, pipeline close is implemented by sending an additional empty packet. If one of the datanodes does not respond to this empty packet, the pipeline hangs. It seems that there is no timeout.</blockquote></li>
<li> <a href="">HDFS-826</a>.
Major improvement reported by dhruba and fixed by dhruba (hdfs client)<br>
<b>Allow a mechanism for an application to detect that datanode(s) have died in the write pipeline</b><br>
<blockquote>HDFS does not replicate the last block of the file that is being currently written to by an application. Every datanode death in the write pipeline decreases the reliability of the last block of the currently-being-written block. This situation can be improved if the application can be notified of a datanode death in the write pipeline. Then, the application can decide what is the right course of action to be taken on this event.<br><br>In our use-case, the application can close the file on the fir...</blockquote></li>
<li> <a href="">HDFS-895</a>.
Major improvement reported by dhruba and fixed by tlipcon (hdfs client)<br>
<b>Allow hflush/sync to occur in parallel with new writes to the file</b><br>
<blockquote>In the current trunk, the HDFS client methods writeChunk() and hflush/sync are synchronized. This means that if a hflush/sync is in progress, an application cannot write data to the HDFS client buffer. This reduces the write throughput of the transaction log in HBase. <br><br>The hflush/sync should allow new writes to happen to the HDFS client even when a hflush/sync is in progress. It can record the seqno of the message for which it should receive the ack, indicate to the DataStream thread to sta...</blockquote></li>
<li> <a href="">HDFS-988</a>.
Blocker bug reported by dhruba and fixed by eli (name-node)<br>
<b>saveNamespace race can corrupt the edits log</b><br>
<blockquote>The administrator puts the namenode in safemode and then issues the savenamespace command. This can corrupt the edits log. The problem is that when the NN enters safemode, there could still be pending logSyncs occurring from other threads. Now, the saveNamespace command, when executed, would save an edits log with partial writes. I have seen this happen on 0.20.<br><br>;page=com.atlassian.jira.plugin.system.issuetabpanels:comment-...</blockquote></li>
<li> <a href="">HDFS-1054</a>.
Major improvement reported by tlipcon and fixed by tlipcon (hdfs client)<br>
<b>Remove unnecessary sleep after failure in nextBlockOutputStream</b><br>
<blockquote>If DFSOutputStream fails to create a pipeline, it currently sleeps 6 seconds before retrying. I don&apos;t see a great reason to wait at all, much less 6 seconds (especially now that HDFS-630 ensures that a retry won&apos;t go back to the bad node). We should at least make it configurable, and perhaps something like backoff makes some sense.</blockquote></li>
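HDFS-1054 suggests making the retry delay configurable and possibly using backoff. The sketch below shows one way a capped exponential backoff could be computed. The class <code>RetryBackoffSketch</code>, its method and its parameters are hypothetical illustrations, not Hadoop configuration keys or the change that was actually committed.

```java
// Hypothetical helper; names and parameters are illustrative, not Hadoop APIs.
final class RetryBackoffSketch {
    private RetryBackoffSketch() { }

    // Delay before retry number `attempt` (0-based): baseMillis * 2^attempt,
    // capped at maxMillis. A baseMillis of 0 means "retry immediately".
    static long delayMillis(int attempt, long baseMillis, long maxMillis) {
        if (attempt < 0 || baseMillis < 0 || maxMillis < 0) {
            throw new IllegalArgumentException("arguments must be non-negative");
        }
        long delay = Math.min(baseMillis, maxMillis);
        for (int i = 0; i < attempt; i++) {
            if (delay > maxMillis / 2) {
                return maxMillis; // doubling would exceed the cap (or overflow)
            }
            delay *= 2;
        }
        return delay;
    }
}
```

With a base of 100 ms and a cap of 6000 ms, the first retry would wait 100 ms and later retries would never wait longer than the original fixed 6 seconds.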
<li> <a href="">HDFS-1057</a>.
Blocker sub-task reported by tlipcon and fixed by rash37 (data-node)<br>
<b>Concurrent readers hit ChecksumExceptions if following a writer to very end of file</b><br>
<blockquote>In BlockReceiver.receivePacket, it calls replicaInfo.setBytesOnDisk before calling flush(). Therefore, if there is a concurrent reader, it&apos;s possible to race here - the reader will see the new length while those bytes are still in the buffers of BlockReceiver. Thus the client will potentially see checksum errors or EOFs. Additionally, the last checksum chunk of the file is made accessible to readers even though it is not stable.</blockquote></li>
<li> <a href="">HDFS-1118</a>.
Major bug reported by zshao and fixed by zshao <br>
<b>DFSOutputStream socket leak when cannot connect to DataNode</b><br>
<blockquote>The offending code is in {{DFSOutputStream.nextBlockOutputStream}}<br><br>This function retries several times to call {{createBlockOutputStream}}. Each time when it fails, it leaves a {{Socket}} object in {{DFSOutputStream.s}}.<br>That object is never closed, but overwritten the next time {{createBlockOutputStream}} is called.<br></blockquote></li>
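The leak described in HDFS-1118 follows a common pattern: a retry loop allocates a new resource on each attempt and overwrites the reference to the failed one without closing it. A minimal sketch of a retry loop that closes the failed resource first is shown below. <code>RetryWithoutLeakSketch</code> and its <code>Connector</code> interface are hypothetical and only stand in for the socket handling in DFSOutputStream.

```java
import java.io.Closeable;
import java.io.IOException;

// Hypothetical sketch; Connector stands in for "create a socket, then set up
// the block output stream on it". Not the actual DFSOutputStream code.
final class RetryWithoutLeakSketch {
    interface Connector<T extends Closeable> {
        T open() throws IOException;               // allocate the resource
        void setup(T resource) throws IOException; // may fail after allocation
    }

    static <T extends Closeable> T connectWithRetries(Connector<T> connector, int maxAttempts)
            throws IOException {
        IOException last = null;
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            T resource = null;
            try {
                resource = connector.open();
                connector.setup(resource);
                return resource; // success: the caller now owns the resource
            } catch (IOException e) {
                last = e;
                if (resource != null) {
                    try {
                        resource.close(); // release it before the next attempt
                    } catch (IOException ignored) {
                        // best effort; the original failure is the one reported
                    }
                }
            }
        }
        throw last != null ? last : new IOException("no connection attempts were made");
    }
}
```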
<li> <a href="">HDFS-1122</a>.
Major sub-task reported by rash37 and fixed by rash37 <br>
<b>client block verification may result in blocks in DataBlockScanner prematurely</b><br>
<blockquote>found that when the DN uses client verification of a block that is open for writing, it will add it to the DataBlockScanner prematurely. </blockquote></li>
<li> <a href="">HDFS-1141</a>.
Blocker bug reported by tlipcon and fixed by tlipcon (name-node)<br>
<b>completeFile does not check lease ownership</b><br>
<blockquote>completeFile should check that the caller still owns the lease of the file that it&apos;s completing. This is for the &apos;testCompleteOtherLeaseHoldersFile&apos; case in HDFS-1139.</blockquote></li>
<li> <a href="">HDFS-1164</a>.
Major bug reported by eli and fixed by tlipcon (contrib/hdfsproxy)<br>
<b>TestHdfsProxy is failing</b><br>
<blockquote>TestHdfsProxy is failing on trunk, seen in HDFS-1132 and HDFS-1143. It doesn&apos;t look like hudson posts test results for contrib and it&apos;s hard to see what&apos;s going on from the raw console output. Can someone with access to hudson upload the individual test output for TestHdfsProxy so we can see what the issue is?</blockquote></li>
<li> <a href="">HDFS-1186</a>.
Blocker bug reported by tlipcon and fixed by tlipcon (data-node)<br>
<b>0.20: DNs should interrupt writers at start of recovery</b><br>
<blockquote>When block recovery starts (eg due to NN recovering lease) it needs to interrupt any writers currently writing to those blocks. Otherwise, an old writer (who hasn&apos;t realized he lost his lease) can continue to write+sync to the blocks, and thus recovery ends up truncating data that has been sync()ed.</blockquote></li>
<li> <a href="">HDFS-1197</a>.
Major bug reported by tlipcon and fixed by (data-node, hdfs client, name-node)<br>
<b>Blocks are considered &quot;complete&quot; prematurely after commitBlockSynchronization or DN restart</b><br>
<blockquote>I saw this failure once on my internal Hudson job that runs the append tests 48 times a day:<br>junit.framework.AssertionFailedError: expected:&lt;114688&gt; but was:&lt;98304&gt;<br> at org.apache.hadoop.hdfs.AppendTestUtil.check(<br> at org.apache.hadoop.hdfs.TestFileAppend3.testTC2(<br></blockquote></li>
<li> <a href="">HDFS-1202</a>.
Major bug reported by tlipcon and fixed by tlipcon (data-node)<br>
<b>DataBlockScanner throws NPE when updated before initialized</b><br>
<blockquote>Missing an isInitialized() check in updateScanStatusInternal</blockquote></li>
<li> <a href="">HDFS-1204</a>.
Major bug reported by tlipcon and fixed by rash37 <br>
<b>0.20: Lease expiration should recover single files, not entire lease holder</b><br>
<blockquote>This was brought up in HDFS-200 but didn&apos;t make it into the branch on Apache.</blockquote></li>
<li> <a href="">HDFS-1207</a>.
Major bug reported by tlipcon and fixed by tlipcon (name-node)<br>
<b>0.20-append: stallReplicationWork should be volatile</b><br>
<blockquote>the stallReplicationWork member in FSNamesystem is accessed by multiple threads without synchronization, but isn&apos;t marked volatile. I believe this is responsible for about 1% failure rate on TestFileAppend4.testAppendSyncChecksum* on my 8-core test boxes (looking at logs I see replication happening even though we&apos;ve supposedly disabled it)</blockquote></li>
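The point of HDFS-1207 is that a flag written by one thread and read by another without synchronization needs to be <code>volatile</code> for the reader to be guaranteed to see the update. A minimal sketch follows. Only the field name <code>stallReplicationWork</code> comes from the issue; the class <code>StallFlagSketch</code> and its methods are hypothetical and do not mirror FSNamesystem.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch; only the field name is taken from the issue text.
final class StallFlagSketch {
    // volatile guarantees that a write by one thread is visible to later
    // reads by other threads; a plain boolean gives no such guarantee.
    private volatile boolean stallReplicationWork = false;

    private final AtomicInteger workDone = new AtomicInteger();

    void setStall(boolean stall) {
        stallReplicationWork = stall;
    }

    // Performs one unit of (simulated) replication work unless stalled.
    // Returns true if work was performed.
    boolean computeWorkOnce() {
        if (stallReplicationWork) {
            return false;
        }
        workDone.incrementAndGet();
        return true;
    }

    int workDone() {
        return workDone.get();
    }
}
```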
<li> <a href="">HDFS-1210</a>.
Trivial improvement reported by tlipcon and fixed by tlipcon (hdfs client)<br>
<b>DFSClient should log exception when block recovery fails</b><br>
<blockquote>Right now we just retry without necessarily showing the exception. It can be useful to see what the error was that prevented the recovery RPC from succeeding.<br>(I believe this only applies in 0.20 style of block recovery)</blockquote></li>
<li> <a href="">HDFS-1211</a>.
Minor improvement reported by tlipcon and fixed by tlipcon (data-node)<br>
<b>0.20 append: Block receiver should not log &quot;rewind&quot; packets at INFO level</b><br>
<blockquote>In the 0.20 append implementation, it logs an INFO level message for every packet that &quot;rewinds&quot; the end of the block file. This is really noisy for applications like HBase which sync every edit.</blockquote></li>
<li> <a href="">HDFS-1218</a>.
Critical bug reported by tlipcon and fixed by tlipcon (data-node)<br>
<b>20 append: Blocks recovered on startup should be treated with lower priority during block synchronization</b><br>
<blockquote>When a datanode experiences power loss, it can come back up with truncated replicas (due to local FS journal replay). Those replicas should not be allowed to truncate the block during block synchronization if there are other replicas from DNs that have _not_ restarted.</blockquote></li>
<li> <a href="">HDFS-1242</a>.
Major test reported by tlipcon and fixed by tlipcon <br>
<b>0.20 append: Add test for appendFile() race solved in HDFS-142</b><br>
<blockquote>This is a unit test that didn&apos;t make it into branch-0.20-append, but worth having in TestFileAppend4.</blockquote></li>
<li> <a href="">HDFS-1252</a>.
Major test reported by tlipcon and fixed by tlipcon (test)<br>
<b>TestDFSConcurrentFileOperations broken in 0.20-append</b><br>
<blockquote>This test currently has several flaws:<br> - It calls DN.updateBlock with a BlockInfo instance, which then causes java.lang.RuntimeException: java.lang.NoSuchMethodException: org.apache.hadoop.hdfs.server.namenode.BlocksMap$BlockInfo.&lt;init&gt;() in the logs when the DN tries to send blockReceived for the block<br> - It assumes that getBlockLocations returns an up-to-date length block after a sync, which is false. It happens to work because it calls getBlockLocations directly on the NN, and thus gets a...</blockquote></li>
<li> <a href="">HDFS-1260</a>.
Critical bug reported by tlipcon and fixed by tlipcon <br>
<b>0.20: Block lost when multiple DNs trying to recover it to different genstamps</b><br>
<blockquote>Saw this issue on a cluster where some ops people were doing network changes without shutting down DNs first. So, recovery ended up getting started at multiple different DNs at the same time, and some race condition occurred that caused a block to get permanently stuck in recovery mode. What seems to have happened is the following:<br>- FSDataset.tryUpdateBlock called with old genstamp 7091, new genstamp 7094, while the block in the volumeMap (and on filesystem) was genstamp 7093<br>- we find the b...</blockquote></li>
<li> <a href="">HDFS-1346</a>.
Major bug reported by hairong and fixed by hairong (data-node, hdfs client)<br>
<b>DFSClient receives out of order packet ack</b><br>
<blockquote>When running 0.20 patched with HDFS-101, we sometimes see an error as follows:<br>WARN hdfs.DFSClient: DFSOutputStream ResponseProcessor exception for block Responseprocessor: Expecting seq<br>no for block blk_-2871223654872350746_21421120 10280 but received 10281<br>at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$<br><br>This indicates that DFS client expects an ack for packet N, but receives an ack for packe...</blockquote></li>
<li> <a href="">HDFS-1520</a>.
Major new feature reported by hairong and fixed by hairong (name-node)<br>
<b>HDFS 20 append: Lightweight NameNode operation to trigger lease recovery</b><br>
<blockquote>Currently HBase uses append to trigger the close of HLog during HLog split. Append is a very expensive operation, which involves not only NameNode operations but also creating a write pipeline. If one of the datanodes on the pipeline has a problem, this recovery may take minutes. I&apos;d like to implement a lightweight NameNode operation to trigger lease recovery and make HBase use this instead.</blockquote></li>
<li> <a href="">HDFS-1554</a>.
Major improvement reported by hairong and fixed by hairong <br>
<b>Append 0.20: New semantics for recoverLease</b><br>
<blockquote>Change the recoverLease API to return whether the file is closed or not. It also changes the semantics of recoverLease to start lease recovery immediately.</blockquote></li>
<li> <a href="">HDFS-1555</a>.
Major improvement reported by hairong and fixed by hairong <br>
<b>HDFS 20 append: Disallow pipeline recovery if a file is already being lease recovered</b><br>
<blockquote>When a file is under lease recovery and the writer is still alive, the write pipeline will be killed and then the writer will start a pipeline recovery. Sometimes the pipeline recovery may race before the lease recovery and as a result fail the lease recovery. This is very bad if we want to support the strong recoverLease semantics in HDFS-1554. So it would be nice if we could disallow a file&apos;s pipeline recovery while its lease recovery is in progress.</blockquote></li>
<li> <a href="">HDFS-1779</a>.
Major bug reported by umamaheswararao and fixed by umamaheswararao (data-node, name-node)<br>
<b>After NameNode restart , Clients can not read partial files even after client invokes Sync.</b><br>
<blockquote>In the Append HDFS-200 issue:<br>If a file has 10 blocks and the client invokes the sync method after writing 5 blocks, the NN will persist the block information in edits. <br>If we then restart the NN, all the DataNodes will reregister with the NN, but the DataNodes do not send the blocks-being-written information to the NN. DNs send the blocksBeingWritten information at DN startup. So the NameNode cannot determine which datanodes the 5 persisted blocks belong to. This information can build based o...</blockquote></li>
<li> <a href="">HDFS-1836</a>.
Major bug reported by hkdennis2k and fixed by bharathm (hdfs client)<br>
<b>Thousands of CLOSE_WAIT sockets</b><br>
<blockquote>$ /usr/sbin/lsof -i TCP:50010 | grep -c CLOSE_WAIT<br>4471<br><br>Things are fine when everything runs normally. <br>However, from time to time some &quot;DataStreamer Exception:; and &quot;DFSClient.processDatanodeError(2507) | Error Recovery for&quot; messages can be found in the log file, and the number of CLOSE_WAIT sockets just keeps increasing.<br><br>The CLOSE_WAIT handles may remain for hours and days, eventually leading to &quot;Too many open file&quot;.<br></blockquote></li>
<li> <a href="">HDFS-2053</a>.
Minor bug reported by miguno and fixed by miguno (name-node)<br>
<b>Bug in INodeDirectory#computeContentSummary warning</b><br>
<blockquote>*How to reproduce*<br><br>{code}<br># create test directories<br>$ hadoop fs -mkdir /hdfs-1377/A<br>$ hadoop fs -mkdir /hdfs-1377/B<br>$ hadoop fs -mkdir /hdfs-1377/C<br><br># ...add some test data (few kB or MB) to all three dirs...<br><br># set space quota for subdir C only<br>$ hadoop dfsadmin -setSpaceQuota 1g /hdfs-1377/C<br><br># the following two commands _on the parent dir_ trigger the warning<br>$ hadoop fs -dus /hdfs-1377<br>$ hadoop fs -count -q /hdfs-1377<br>{code}<br><br>Warning message in the namenode logs:<br><br>{code}<br>2011-06-09 09:42...</blockquote></li>
<li> <a href="">HDFS-2117</a>.
Minor bug reported by eli and fixed by eli (data-node)<br>
<b>DiskChecker#mkdirsWithExistsAndPermissionCheck may return true even when the dir is not created</b><br>
<blockquote>In branch-0.20-security as part of HADOOP-6566, DiskChecker#mkdirsWithExistsAndPermissionCheck will return true even if it wasn&apos;t able to create the directory, which means instead of throwing a DiskErrorException the code will proceed to getFileStatus and throw a FNF exception. Post HADOOP-7040, which modified makeInstance to catch not just DiskErrorExceptions but IOExceptions as well, this is not an issue since now the exception is caught either way. But for future modifications we should st...</blockquote></li>
<li> <a href="">HDFS-2190</a>.
Major bug reported by atm and fixed by atm (name-node)<br>
<b>NN fails to start if it encounters an empty or malformed fstime file</b><br>
<blockquote>On startup, the NN reads the fstime file of all the configured storage directories to determine which one to load. However, if any of the searched directories contains an empty or malformed fstime file, the NN will fail to start. The NN should be able to just proceed with starting and ignore the directory containing the bad fstime file.</blockquote></li>
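<p>The intended behavior described above can be sketched as follows. This is a minimal illustration, not the NameNode code: the class <code>FstimeScan</code> and its methods are hypothetical, and it assumes for illustration that each fstime file holds a single non-negative 8-byte long.</p>

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.util.Map;

// Hypothetical helper: picks the storage directory with the newest readable
// checkpoint time and skips directories whose fstime contents are unusable.
class FstimeScan {
    /** Parses a checkpoint time, or returns -1 if the bytes are empty or too short. */
    static long readTimeOrMinusOne(byte[] contents) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(contents))) {
            return in.readLong(); // throws EOFException when fewer than 8 bytes are present
        } catch (IOException e) {
            return -1L; // bad fstime file: the caller skips this directory
        }
    }

    /** Returns the directory with the newest readable time, or null if none is readable. */
    static String newestReadable(Map<String, byte[]> fstimeByDir) {
        String best = null;
        long bestTime = -1L;
        for (Map.Entry<String, byte[]> e : fstimeByDir.entrySet()) {
            long t = readTimeOrMinusOne(e.getValue());
            if (t > bestTime) { // unreadable entries return -1 and never win
                best = e.getKey();
                bestTime = t;
            }
        }
        return best;
    }
}
```

<p>Returning a sentinel instead of propagating the exception is what lets startup continue with the remaining directories.</p>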
<li> <a href="">HDFS-2202</a>.
Major new feature reported by eepayne and fixed by eepayne (balancer, data-node)<br>
<b>Changes to balancer bandwidth should not require datanode restart.</b><br>
<blockquote> New dfsadmin command added: [-setBalancerBandwidth &lt;bandwidth&gt;] where bandwidth is the maximum network bandwidth, in bytes per second, that the balancer is allowed to use on each datanode during balancing.<br><br>This is an incompatible change in 0.23. The versions of ClientProtocol and DatanodeProtocol are changed.<br></blockquote></li>
<li> <a href="">HDFS-2259</a>.
Minor bug reported by eli and fixed by eli (data-node)<br>
<b>DN web-UI doesn&apos;t work with paths that contain html </b><br>
<blockquote>The 20-based DN web UI doesn&apos;t work with paths that contain html. The paths need to be unescaped when used to access the file and escaped when printed for navigation.</blockquote></li>
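<p>A minimal sketch of the two directions described above, assuming the path arrives as a URL-encoded request parameter. The class <code>PathEscaping</code> and its method names are hypothetical and are not taken from the DN web UI code.</p>

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

// Hypothetical helper: decode a path before using it to open a file,
// and HTML-escape it before printing it into a page.
class PathEscaping {
    /** Decodes a URL-encoded request parameter into the raw file path. */
    static String unescapeForAccess(String requestParam) {
        try {
            // Note: URLDecoder also turns '+' into a space (form encoding rules).
            return URLDecoder.decode(requestParam, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    /** Escapes the characters that are significant in HTML text and attributes. */
    static String escapeForDisplay(String path) {
        StringBuilder sb = new StringBuilder(path.length());
        for (int i = 0; i < path.length(); i++) {
            char c = path.charAt(i);
            switch (c) {
                case '&': sb.append("&amp;"); break;
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                case '"': sb.append("&quot;"); break;
                case '\'': sb.append("&#39;"); break;
                default: sb.append(c);
            }
        }
        return sb.toString();
    }
}
```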
<li> <a href="">HDFS-2284</a>.
Major sub-task reported by sanjay.radia and fixed by szetszwo <br>
<b>Write Http access to HDFS</b><br>
<blockquote>HFTP allows only read access to HDFS via HTTP. Add write HTTP access to HDFS.</blockquote></li>
<li> <a href="">HDFS-2300</a>.
Major bug reported by jnp and fixed by jnp <br>
<b>TestFileAppend4 and TestMultiThreadedSync fail on 20.append and 20-security.</b><br>
<blockquote>TestFileAppend4 and TestMultiThreadedSync fail on the 20.append and 20-security branch.<br></blockquote></li>
<li> <a href="">HDFS-2309</a>.
Major bug reported by jnp and fixed by jnp <br>
<b>TestRenameWhileOpen fails in branch-0.20-security</b><br>
<blockquote>TestRenameWhileOpen is failing in branch-0.20-security.</blockquote></li>
<li> <a href="">HDFS-2317</a>.
Major sub-task reported by szetszwo and fixed by szetszwo <br>
<b>Read access to HDFS using HTTP REST</b><br>
<li> <a href="">HDFS-2318</a>.
Major sub-task reported by szetszwo and fixed by szetszwo <br>
<b>Provide authentication to webhdfs using SPNEGO</b><br>
<blockquote> Added two new conf properties dfs.web.authentication.kerberos.principal and dfs.web.authentication.kerberos.keytab for the SPNEGO servlet filter.<br><br> <br></blockquote></li>
<li> <a href="">HDFS-2320</a>.
Major bug reported by sureshms and fixed by sureshms (data-node, hdfs client, name-node)<br>
<b>Make merged protocol changes from 0.20-append to 0.20-security compatible with previous releases.</b><br>
<blockquote>0.20-append changes have been merged to 0.20-security. The merge changes the version numbers of several protocols. This jira makes the protocol changes compatible with older releases, allowing clients running older versions to talk to servers running version 205, and clients running version 205 to talk to older servers running 203 and 204.</blockquote></li>
<li> <a href="">HDFS-2325</a>.
Blocker bug reported by charlescearl and fixed by kihwal (contrib/fuse-dfs, libhdfs)<br>
<b>Fuse-DFS fails to build on Hadoop 20.203.0</b><br>
<blockquote>In building fuse-dfs, the compile fails due to an argument mismatch between call to hdfsConnectAsUser on line 40 of src/contrib/fuse-dfs/src/fuse_connect.c and an earlier definition of hdfsConnectAsUser given in src/c++/libhdfs/hdfs.h.<br>I suggest changing hdfs.h. I made the following change in hdfs.h in my local copy:<br><br>106c106,107<br>&lt; hdfsFS hdfsConnectAsUser(const char* host, tPort port, const char *user);<br>---<br>&gt; // hdfsFS hdfsConnectAsUser(const char* host, tPort port, const char *us...</blockquote></li>
<li> <a href="">HDFS-2328</a>.
Critical bug reported by daryn and fixed by owen.omalley <br>
<b>hftp throws NPE if security is not enabled on remote cluster</b><br>
<blockquote>If hftp cannot locate either a hdfs or hftp token in the ugi, it will call {{getDelegationToken}} to acquire one from the remote nn. This method may return a null {{Token}} if security is disabled(*) on the remote nn. Hftp will internally call its {{setDelegationToken}} which will throw a NPE when the token is {{null}}.<br><br>(*) Actually, if any problem happens while acquiring the token it assumes security is disabled! However, it&apos;s a pre-existing issue beyond the scope of the token renewal c...</blockquote></li>
<li> <a href="">HDFS-2331</a>.
Major bug reported by abhijit.shingate and fixed by abhijit.shingate (hdfs client)<br>
<b>Hdfs compilation fails</b><br>
<blockquote>I am trying to perform complete build from trunk folder but the compilation fails.<br><br>*Commandline:*<br>mvn clean install <br><br>*Error Message:*<br><br>[ERROR] Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:2.<br>3.2:compile (default-compile) on project hadoop-hdfs: Compilation failure<br>[ERROR] \Hadoop\SVN\trunk\hadoop-hdfs-project\hadoop-hdfs\src\main\java\org<br>\apache\hadoop\hdfs\web\[209,21] type parameters of &lt;T&gt;T<br>cannot be determined; no unique maximal instance...</blockquote></li>
<li> <a href="">HDFS-2333</a>.
Major bug reported by ikelly and fixed by szetszwo <br>
<b>HDFS-2284 introduced 2 findbugs warnings on trunk</b><br>
<blockquote>When HDFS-2284 was submitted it made DFSOutputStream public, which triggered two SC_START_IN_CTOR findbugs warnings.</blockquote></li>
<li> <a href="">HDFS-2338</a>.
Major sub-task reported by jnp and fixed by jnp <br>
<b>Configuration option to enable/disable webhdfs.</b><br>
<blockquote> Added a conf property dfs.webhdfs.enabled for enabling/disabling webhdfs.<br><br> <br></blockquote></li>
<li> <a href="">HDFS-2340</a>.
Major sub-task reported by szetszwo and fixed by szetszwo <br>
<b>Support getFileBlockLocations and getDelegationToken in webhdfs</b><br>
<li> <a href="">HDFS-2342</a>.
Blocker bug reported by kihwal and fixed by szetszwo (build)<br>
<b>TestSleepJob and TestHdfsProxy broken after HDFS-2284</b><br>
<blockquote>After HDFS-2284, TestSleepJob and TestHdfsProxy are failing.<br>They both work in rev 1167444 and fail in rev 1167663.<br>It will be great if they can be fixed for 205.</blockquote></li>
<li> <a href="">HDFS-2348</a>.
Major sub-task reported by szetszwo and fixed by szetszwo <br>
<b>Support getContentSummary and getFileChecksum in webhdfs</b><br>
<li> <a href="">HDFS-2356</a>.
Major sub-task reported by szetszwo and fixed by szetszwo <br>
<b>webhdfs: support case insensitive query parameter names</b><br>
<li> <a href="">HDFS-2358</a>.
Major bug reported by rajsaha and fixed by daryn (name-node)<br>
<b>NPE when the default filesystem&apos;s uri has no authority</b><br>
<blockquote> Give meaningful error message instead of NPE.<br><br> <br></blockquote></li>
<li> <a href="">HDFS-2359</a>.
Major bug reported by rajsaha and fixed by jeagles (data-node)<br>
<b>NPE found in Datanode log while Disk failed during different HDFS operation</b><br>
<blockquote>Scenario:<br>I have a cluster of 4 DNs, each with 12 disks.<br><br>In hdfs-site.xml I have &quot;dfs.datanode.failed.volumes.tolerated=3&quot; <br><br>During the execution of distcp (hdfs-&gt;hdfs), I fail 3 disks in one DataNode by setting the data directory permissions to 000. The distcp job succeeds, but I get some NullPointerExceptions in the DataNode log<br><br>In one thread<br>$hadoop distcp /user/$HADOOPQA_USER/data1 /user/$HADOOPQA_USER/data3<br><br>In another thread in a datanode<br>$ chmod 000 /xyz/{0,1,2}/hadoop/v...</blockquote></li>
<li> <a href="">HDFS-2361</a>.
Critical bug reported by rajsaha and fixed by jnp (name-node)<br>
<b>hftp is broken</b><br>
<blockquote>Distcp with hftp is failing.<br><br>{noformat}<br>$hadoop distcp hftp://&lt;NNhostname&gt;:50070/user/hadoopqa/1316814737/newtemp 1316814737/as<br>11/09/23 21:52:33 INFO tools.DistCp: srcPaths=[hftp://&lt;NNhostname&gt;:50070/user/hadoopqa/1316814737/newtemp]<br>11/09/23 21:52:33 INFO tools.DistCp: destPath=1316814737/as<br>Retrieving token from: https://&lt;NN IP&gt;:50470/getDelegationToken<br>Retrieving token from: https://&lt;NN IP&gt;:50470/getDelegationToken?renewer=mapred<br>11/09/23 21:52:34 INFO security.TokenCache: Got dt for h...</blockquote></li>
<li> <a href="">HDFS-2366</a>.
Major bug reported by arpitgupta and fixed by szetszwo <br>
<b>webhdfs throws a npe when ugi is null from getDelegationToken</b><br>
<li> <a href="">HDFS-2368</a>.
Major bug reported by arpitgupta and fixed by szetszwo <br>
<b>defaults created for web keytab and principal, these properties should not have defaults</b><br>
<blockquote>the following defaults are set in hdfs-defaults.xml<br><br>&lt;property&gt;<br> &lt;name&gt;dfs.web.authentication.kerberos.principal&lt;/name&gt;<br> &lt;value&gt;HTTP/${dfs.web.hostname}@${kerberos.realm}&lt;/value&gt;<br> &lt;description&gt;<br> The HTTP Kerberos principal used by Hadoop-Auth in the HTTP endpoint.<br><br> The HTTP Kerberos principal MUST start with &apos;HTTP/&apos; per Kerberos<br> HTTP SPENGO specification.<br> &lt;/description&gt;<br>&lt;/property&gt;<br><br>&lt;property&gt;<br> &lt;name&gt;dfs.web.authentication.kerberos.keytab&lt;/name&gt;<br> &lt;value&gt;${user.home}/dfs.web....</blockquote></li>
<li> <a href="">HDFS-2373</a>.
Major bug reported by arpitgupta and fixed by arpitgupta <br>
<b>Commands using webhdfs and hftp print unnecessary debug information on the console with security enabled</b><br>
<blockquote>Run an hdfs command using either hftp or webhdfs and it prints the following line to the console (system out)<br><br>Retrieving token from: https://NN_HOST:50470/getDelegationToken<br><br><br>This probably comes from the code where we get the delegation token. It should be removed, because people using dfs commands to get a handle on content, such as dfs -cat, will now get an extra line that is not part of the actual content. The message should either appear only in the log or not be logged at all.</blockquote></li>
<li> <a href="">HDFS-2375</a>.
Blocker bug reported by sureshms and fixed by sureshms (hdfs client)<br>
<b>TestFileAppend4 fails in 0.20.205 branch</b><br>
<blockquote>TestFileAppend4 fails due to a change from HDFS-2333. The test uses reflection to get to the method DFSOutputStream#getNumCurrentReplicas(). Since the HDFS-2333 patch changed this method from public to private, the reflective lookup of the method fails, resulting in test failures.</blockquote></li>
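<p>The failure mode can be reproduced in isolation. The sketch below is illustrative only: <code>ReflectionVisibilityDemo</code> and its nested <code>Stream</code> class are hypothetical stand-ins, and it assumes the lookup goes through <code>Class.getMethod</code>, which returns only public methods.</p>

```java
import java.lang.reflect.Method;

// Hypothetical demo: a lookup via getMethod stops working once the target
// method is no longer public, while getDeclaredMethod still finds it.
class ReflectionVisibilityDemo {
    static class Stream {
        public int publicCount() { return 3; }
        private int privateCount() { return 3; }
    }

    /** True if Class.getMethod can see the named no-argument method. */
    static boolean foundByGetMethod(String name) {
        try {
            Method m = Stream.class.getMethod(name); // public members only
            return m != null;
        } catch (NoSuchMethodException e) {
            return false;
        }
    }

    /** True if the method is declared on the class with any access level. */
    static boolean foundByGetDeclaredMethod(String name) {
        try {
            return Stream.class.getDeclaredMethod(name) != null;
        } catch (NoSuchMethodException e) {
            return false;
        }
    }
}
```

<p>Callers that depend on reflective access therefore break with a <code>NoSuchMethodException</code> as soon as the method's visibility is narrowed.</p>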
<li> <a href="">HDFS-2385</a>.
Major sub-task reported by szetszwo and fixed by szetszwo <br>
<b>Support delegation token renewal in webhdfs</b><br>
<li> <a href="">HDFS-2392</a>.
Critical bug reported by rajsaha and fixed by daryn (name-node)<br>
<b>Dist with hftp is failing again</b><br>
<blockquote>$ hadoop distcp hftp://&lt;NN Hostname&gt;:50070/user/hadoopqa/input1/part-00000 /user/hadoopqa/out3<br>11/09/30 18:57:59 INFO tools.DistCp: srcPaths=[hftp://&lt;NN Hostname&gt;:50070/user/hadoopqa/input1/part-00000]<br>11/09/30 18:57:59 INFO tools.DistCp: destPath=/user/hadoopqa/out3<br>11/09/30 18:58:00 INFO security.TokenCache: Got dt for<br>hftp://&lt;NN Hostname&gt;:50070/user/hadoopqa/input1/part-00000;uri=&lt;NN IP&gt;:50470;t.service=&lt;NN IP&gt;:50470<br>11/09/30 18:58:00 INFO hdfs.DFSClient: Created HDFS_DELEGATION_TOKEN toke...</blockquote></li>
<li> <a href="">HDFS-2395</a>.
Critical bug reported by arpitgupta and fixed by szetszwo <br>
<b>webhdfs api&apos;s should return a root element in the json response</b><br>
<li> <a href="">HDFS-2403</a>.
Major bug reported by szetszwo and fixed by szetszwo <br>
<b>The renewer in NamenodeWebHdfsMethods.generateDelegationToken(..) is not used</b><br>
<blockquote>Below are some suggestions from Suresh.<br># renewer not used in #generateDelegationToken<br># put() does not use InputStream in and should not throw URISyntaxException<br># post() does not use InputStream in and should not throw URISyntaxException<br># get() should not throw URISyntaxException<br></blockquote></li>
<li> <a href="">HDFS-2404</a>.
Major bug reported by arpitgupta and fixed by sureshms <br>
<b>webhdfs liststatus json response is not correct</b><br>
<li> <a href="">HDFS-2408</a>.
Blocker bug reported by stack and fixed by stack (hdfs client)<br>
<b>DFSClient#getNumCurrentReplicas is package private in 205 but public in branch-0.20-append</b><br>
<blockquote>The below commit broke hdfs-826 for hbase in 205 rc1. It changes the accessibility from public to package private on getNumCurrentReplicas and now current shipping hbase&apos;s at least cannot get at this method.<br><br>{code}<br>Revision 1174483 - (view) (download) (annotate) - [select for diffs] <br>Modified Fri Sep 23 01:30:18 2011 UTC (13 days, 4 hours ago) by szetszwo <br>File length: 136876 byte(s) <br>Diff to previous 1174479 (colored)<br>svn merge -c 1171137 from branch-0.20-security for HDFS-2333.<br>{code}<br><br>Her...</blockquote></li>
<li> <a href="">HDFS-2411</a>.
Major bug reported by arpitgupta and fixed by jnp <br>
<b>with webhdfs enabled in secure mode the auth to local mappings are not being respected.</b><br>
<li> <a href="">MAPREDUCE-1734</a>.
Blocker improvement reported by tomwhite and fixed by tlipcon (documentation)<br>
<b>Un-deprecate the old MapReduce API in the 0.20 branch</b><br>
<blockquote>This issue is to un-deprecate the &quot;old&quot; MapReduce API (in o.a.h.mapred) in the next 0.20 release, as discussed at</blockquote></li>
<li> <a href="">MAPREDUCE-2187</a>.
Major bug reported by azaroth and fixed by anupamseth <br>
<b>map tasks timeout during sorting</b><br>
<blockquote> I just committed this. Thanks Anupam!<br><br> <br></blockquote></li>
<li> <a href="">MAPREDUCE-2324</a>.
Major bug reported by tlipcon and fixed by revans2 <br>
<b>Job should fail if a reduce task can&apos;t be scheduled anywhere</b><br>
<blockquote>If there&apos;s a reduce task that needs more disk space than is available on any mapred.local.dir in the cluster, that task will stay pending forever. For example, we produced this in a QA cluster by accidentally running terasort with one reducer - since no mapred.local.dir had 1T free, the job remained in pending state for several days. The reason for the &quot;stuck&quot; task wasn&apos;t clear from a user perspective until we looked at the JT logs.<br><br>Probably better to just fail the job if a reduce task goes ...</blockquote></li>
<li> <a href="">MAPREDUCE-2489</a>.
Major bug reported by naisbitt and fixed by naisbitt (jobtracker)<br>
<b>Jobsplits with random hostnames can make the queue unusable</b><br>
<blockquote>We saw an issue where a custom InputSplit was returning invalid hostnames for the splits that were then causing the JobTracker to attempt to excessively resolve host names. This caused a major slowdown for the JobTracker. We should prevent invalid InputSplit hostnames from affecting everyone else.<br><br>I propose we implement some verification for the hostnames to try to ensure that we only do DNS lookups on valid hostnames (and fail otherwise). We could also fail the job after a certain number...</blockquote></li>
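<p>One possible form of such a check is a purely syntactic test applied before any DNS lookup. The sketch below is an assumption about what that verification could look like, not the committed fix: <code>HostnameCheck</code> is a hypothetical class, and the rules follow the usual letter/digit/hyphen label syntax.</p>

```java
import java.util.regex.Pattern;

// Hypothetical pre-check: reject strings that cannot be hostnames
// before spending a DNS lookup on them.
class HostnameCheck {
    // One label: 1-63 chars, letters/digits/hyphens, no leading or trailing hyphen.
    private static final String LABEL = "[A-Za-z0-9]([A-Za-z0-9-]{0,61}[A-Za-z0-9])?";
    // Whole name: dot-separated labels, at most 253 characters in total.
    private static final Pattern HOSTNAME =
        Pattern.compile("(?=.{1,253}$)" + LABEL + "(\\." + LABEL + ")*");

    /** True if the string is syntactically plausible as a hostname. */
    static boolean looksValid(String host) {
        return host != null && HOSTNAME.matcher(host).matches();
    }
}
```

<p>A syntactic filter like this only screens out obviously invalid names; a well-formed name can still fail to resolve.</p>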
<li> <a href="">MAPREDUCE-2494</a>.
Major improvement reported by revans2 and fixed by revans2 (distributed-cache)<br>
<b>Make the distributed cache delete entries using LRU priority</b><br>
<blockquote> Added config option mapreduce.tasktracker.cache.local.keep.pct to the TaskTracker. It is the target percentage of the local distributed cache that should be kept in between garbage collection runs. In practice it will delete unused distributed cache entries in LRU order until the size of the cache is less than mapreduce.tasktracker.cache.local.keep.pct of the maximum cache size. This is a floating point value between 0.0 and 1.0. The default is 0.95.<br></blockquote></li>
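<p>The deletion rule described above can be sketched as follows. This is a simplified model, not the TaskTracker implementation: <code>LruCacheTrim</code>, <code>Entry</code> and <code>selectForDeletion</code> are hypothetical names, and sizes and timestamps are plain longs.</p>

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical model: pick unused cache entries, least recently used first,
// until the cache drops below keepPct of its maximum size.
class LruCacheTrim {
    static class Entry {
        final String name;
        final long sizeBytes;
        final long lastUsed;
        final boolean inUse;

        Entry(String name, long sizeBytes, long lastUsed, boolean inUse) {
            this.name = name;
            this.sizeBytes = sizeBytes;
            this.lastUsed = lastUsed;
            this.inUse = inUse;
        }
    }

    /** Returns the names to delete, oldest first. Entries still in use are never chosen. */
    static List<String> selectForDeletion(List<Entry> entries, long maxSizeBytes, double keepPct) {
        long total = 0;
        for (Entry e : entries) {
            total += e.sizeBytes;
        }
        long target = (long) (maxSizeBytes * keepPct);

        List<Entry> candidates = new ArrayList<>();
        for (Entry e : entries) {
            if (!e.inUse) {
                candidates.add(e);
            }
        }
        candidates.sort(Comparator.comparingLong((Entry e) -> e.lastUsed));

        List<String> toDelete = new ArrayList<>();
        for (Entry e : candidates) {
            if (total < target) {
                break; // cache is now below the target size
            }
            toDelete.add(e.name);
            total -= e.sizeBytes;
        }
        return toDelete;
    }
}
```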
<li> <a href="">MAPREDUCE-2549</a>.
Major bug reported by devaraj.k and fixed by devaraj.k (contrib/eclipse-plugin, contrib/streaming)<br>
<b>Potential resource leaks in, and</b><br>
<li> <a href="">MAPREDUCE-2610</a>.
Major bug reported by jrottinghuis and fixed by jrottinghuis (client)<br>
<b>Inconsistent API JobClient.getQueueAclsForCurrentUser</b><br>
<blockquote>Client needs access to the current user&apos;s queue name.<br>Public method JobClient.getQueueAclsForCurrentUser() returns QueueAclsInfo[].<br>The QueueAclsInfo class has default access. A public method should not return a package-private class.<br><br>The QueueAclsInfo class, its two constructors, getQueueName, and getOperations methods should be public.</blockquote></li>
<li> <a href="">MAPREDUCE-2650</a>.
Major bug reported by sherri_chen and fixed by sherri_chen <br>
<b>back-port MAPREDUCE-2238 to 0.20-security</b><br>
<blockquote>Dev had seen the attempt directory permission getting set to 000 or 111 in the CI builds and tests run on dev desktops with 0.20-security.<br>MAPREDUCE-2238 reported and fixed the issue for 0.22.0, back-port to 0.20-security is needed.<br></blockquote></li>
<li> <a href="">MAPREDUCE-2705</a>.
Major bug reported by tgraves and fixed by tgraves (tasktracker)<br>
<b>tasks localized and launched serially by TaskLauncher - causing other tasks to be delayed</b><br>
<blockquote>The current TaskLauncher serially launches new tasks one at a time. During the launch it does the localization and then starts the map/reduce task. This can cause any other tasks to be blocked waiting for the current task to be localized and started. In some instances we have seen a task that has a large file to localize (1.2MB) block another task for about 40 minutes. This particular task being blocked was a cleanup task which caused the job to be delayed finishing for the 40 minutes.<br></blockquote></li>
<li> <a href="">MAPREDUCE-2729</a>.
Major improvement reported by sherri_chen and fixed by sherri_chen <br>
<b>Reducers are always counted having &quot;pending tasks&quot; even if they can&apos;t be scheduled yet because not enough of their mappers have completed</b><br>
<blockquote>In the capacity scheduler, the number of users in a queue needing slots is calculated based on whether the users&apos; jobs have any pending tasks.<br>This works fine for map tasks. However, for reduce tasks, jobs do not need reduce slots until the minimum number of map tasks have been completed.<br><br>Here, we add a check of whether reduce is ready to schedule (i.e. whether a job has completed enough map tasks) when we increment the number of users in a queue needing reduce slots.<br></blockquote></li>
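<p>A minimal sketch of the added check, under the assumption that readiness is expressed as a required fraction of completed maps. <code>ReduceReadiness</code>, its methods and the <code>requiredFraction</code> parameter are hypothetical and are not the scheduler's actual API.</p>

```java
// Hypothetical model of the check: a job only counts toward reduce-slot
// demand once enough of its map tasks have completed.
class ReduceReadiness {
    /** True once the completed maps reach the required fraction of all maps. */
    static boolean reduceReadyToSchedule(int completedMaps, int totalMaps, double requiredFraction) {
        int needed = (int) Math.ceil(totalMaps * requiredFraction);
        return completedMaps >= needed;
    }

    /** True if the job should be counted as needing reduce slots right now. */
    static boolean countsTowardReduceDemand(int pendingReduces, int completedMaps,
                                            int totalMaps, double requiredFraction) {
        return pendingReduces > 0
            && reduceReadyToSchedule(completedMaps, totalMaps, requiredFraction);
    }
}
```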
<li> <a href="">MAPREDUCE-2764</a>.
Major bug reported by daryn and fixed by owen.omalley <br>
<b>Fix renewal of dfs delegation tokens</b><br>
<blockquote> Generalizes token renewal and canceling to a common interface and provides a plugin interface for adding renewers for new kinds of tokens. Hftp changed to store the tokens as HFTP and renew them over http.<br><br> <br></blockquote></li>
<li> <a href="">MAPREDUCE-2777</a>.
Major new feature reported by jeagles and fixed by amar_kamat <br>
<b>Backport MAPREDUCE-220 to Hadoop 20 security branch</b><br>
<blockquote> Adds cumulative cpu usage and total heap usage to task counters. This is a backport of MAPREDUCE-220 (Collecting cpu and memory usage for MapReduce tasks) and MAPREDUCE-2469 (Task counters should also report the total heap usage of the task).<br></blockquote></li>
<li> <a href="">MAPREDUCE-2780</a>.
Major sub-task reported by daryn and fixed by daryn <br>
<b>Standardize the value of token service</b><br>
<blockquote>The token&apos;s service field must (currently) be set to &quot;ip:port&quot;. All the producers of a token are independently building the service string. This should be done via a common method to reduce the chance of error, and to facilitate the field value being easily changed in the (near) future.</blockquote></li>
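<p>A common builder of the kind proposed above could look like the sketch below. <code>TokenService.buildService</code> is a hypothetical name, not the method introduced by this jira. It only shows the &quot;ip:port&quot; format being produced in one place.</p>

```java
import java.net.InetSocketAddress;

// Hypothetical single point of truth for the token service string.
class TokenService {
    /** Builds the "ip:port" service string for a resolved socket address. */
    static String buildService(InetSocketAddress addr) {
        if (addr.isUnresolved()) {
            throw new IllegalArgumentException("unresolved address: " + addr);
        }
        return addr.getAddress().getHostAddress() + ":" + addr.getPort();
    }
}
```

<p>With every producer calling the same method, a later change to the field's format would need to be made in only one place.</p>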
<li> <a href="">MAPREDUCE-2852</a>.
Major bug reported by eli and fixed by kihwal (tasktracker)<br>
<b>Jira for YDH bug 2854624 </b><br>
<blockquote>The DefaultTaskController and LinuxTaskController reference Yahoo! internal bug 2854624:<br><br>{code}<br>FileSystem rawFs = FileSystem.getLocal(getConf()).getRaw();<br>long logSize = 0; //TODO: Ref BUG:2854624<br>{code}<br><br>This jira tracks this TODO. If someone w/ access to Yahoo&apos;s bugzilla could update this jira with what the bug is that would be great.</blockquote></li>
<li> <a href="">MAPREDUCE-2915</a>.
Major bug reported by kihwal and fixed by kihwal (task-controller)<br>
<b>LinuxTaskController does not work when JniBasedUnixGroupsNetgroupMapping or JniBasedUnixGroupsMapping is enabled</b><br>
<blockquote>When a job is submitted, LinuxTaskController launches the native task-controller binary for job initialization. The native program does a series of prep steps and calls execv() to run JobLocalizer. It was observed that JobLocalizer fails to run when JniBasedUnixGroupsNetgroupMapping or JniBasedUnixGroupsMapping is enabled, resulting in 100% job failures.<br><br>JobLocalizer normally does not need the native library (libhadoop) for its functioning, but enabling a JNI user-to-group mapping functi...</blockquote></li>
<li> <a href="">MAPREDUCE-2928</a>.
Major sub-task reported by eli and fixed by eli (tasktracker)<br>
<b>MR-2413 improvements</b><br>
<blockquote>Tracks improvements to MR-2413. See comment-13095073.</blockquote></li>
<li> <a href="">MAPREDUCE-2981</a>.
Major improvement reported by matei and fixed by matei (contrib/fair-share)<br>
<b>Backport trunk fairscheduler to 0.20-security branch</b><br>
<blockquote>A lot of improvements have been made to the fair scheduler in 0.21, 0.22 and trunk, but have not been ported back to the new 0.20.20X releases that are currently considered the stable branch of Hadoop.</blockquote></li>
<li> <a href="">MAPREDUCE-3076</a>.
Blocker bug reported by acmurthy and fixed by acmurthy (test)<br>
<b>TestSleepJob fails </b><br>
<blockquote>TestSleepJob fails; it was intended to be used in other tests for MAPREDUCE-2981.</blockquote></li>
<li> <a href="">MAPREDUCE-3081</a>.
Major bug reported by vitthal_gogate and fixed by (contrib/vaidya)<br>
<b>Change the name format for hadoop core and vaidya jar to be hadoop-{core/vaidya}-{version}.jar in</b><br>
<blockquote> contrib/vaidya/bin/ script fixed to use appropriate jars and classpath <br><br> <br></blockquote></li>
<li> <a href="">MAPREDUCE-3112</a>.
Major bug reported by eyang and fixed by eyang (contrib/streaming)<br>
<b>Calling hadoop cli inside mapreduce job leads to errors</b><br>
<blockquote> Removed inheritance of certain server environment variables (HADOOP_OPTS and HADOOP_ROOT_LOGGER) in the task attempt process.<br><br><br></blockquote></li>
<h2>Changes since Hadoop</h2>
<li> <a href="">MAPREDUCE-2846</a>.
Blocker bug reported by aw and fixed by owen.omalley (task, task-controller, tasktracker)<br>
<b>a small % of all tasks fail with DefaultTaskController</b><br>
<blockquote>Fixed a race condition in writing the log index file that caused tasks to &apos;fail&apos;.</blockquote></li>
<li> <a href="">MAPREDUCE-2804</a>.
Blocker bug reported by aw and fixed by owen.omalley <br>
<b>&quot;Creation of symlink to attempt log dir failed.&quot; message is not useful</b><br>
<blockquote>Removed duplicate chmods of job log dir that were vulnerable to race conditions between tasks. Also improved the messages when the symlinks failed to be created.</blockquote></li>
<li> <a href="">MAPREDUCE-2651</a>.
Major bug reported by bharathm and fixed by bharathm (task-controller)<br>
<b>Race condition in Linux Task Controller for job log directory creation</b><br>
<blockquote>There is a rare race condition in the Linux task controller when concurrent task processes try to create the job log directory at the same time. </blockquote></li>
<li> <a href="">MAPREDUCE-2621</a>.
Minor bug reported by sherri_chen and fixed by sherri_chen <br>
<b>TestCapacityScheduler fails with &quot;Queue &quot;q1&quot; does not exist&quot;</b><br>
<blockquote>{quote}<br>Error Message<br><br>Queue &quot;q1&quot; does not exist<br><br>Stacktrace<br><br> Queue &quot;q1&quot; does not exist<br> at org.apache.hadoop.mapred.JobInProgress.&lt;init&gt;(<br> at org.apache.hadoop.mapred.TestCapacityScheduler$FakeJobInProgress.&lt;init&gt;(<br> at org.apache.hadoop.mapred.TestCapacityScheduler.submitJob(<br> at org.apache.hadoop.mapred.TestCapacityScheduler.submitJob(<br> at org.apache.hadoop.mapred.TestCapacityScheduler.submitJobAndInit(<br> at org.apache.hadoop.mapred.TestCapacityScheduler.testMultiTaskAssignmentInMultipleQueues(<br>{quote}<br><br>When queue name is invalid, an exception is thrown now. <br><br></blockquote></li>
<li> <a href="">MAPREDUCE-2558</a>.
Major new feature reported by naisbitt and fixed by naisbitt (jobtracker)<br>
<b>Add queue-level metrics 0.20-security branch</b><br>
<blockquote>We would like to record and present the jobtracker metrics on a per-queue basis.</blockquote></li>
<li> <a href="">MAPREDUCE-2555</a>.
Minor bug reported by tgraves and fixed by tgraves (tasktracker)<br>
<b>JvmInvalidate errors in the gridmix TT logs</b><br>
<blockquote>Observing a lot of jvmValidate exceptions in TT logs for grid mix run<br><br><br><br>************************<br>2011-04-28 02:00:37,578 INFO org.apache.hadoop.ipc.Server: IPC Server handler 7 on 46121, call<br>statusUpdate(attempt_201104270735_5993_m_003305_0, org.apache.hadoop.mapred.MapTaskStatus@1840a9c,<br>org.apache.hadoop.mapred.JvmContext@1d4ab6b) from error: JvmValidate Failed.<br>Ignoring request from task: attempt_201104270735_5993_m_003305_0, with JvmId:<br>jvm_201104270735_5993_m_103399012gsbl20430: JvmValidate Failed. Ignoring request from task:<br>attempt_201104270735_5993_m_003305_0, with JvmId: jvm_201104270735_5993_m_103399012gsbl20430: --<br> at org.apache.hadoop.ipc.Server$Handler$<br> at Method)<br> at<br> at<br> at org.apache.hadoop.ipc.Server$<br><br><br>*********************<br><br></blockquote></li>
<li> <a href="">MAPREDUCE-2529</a>.
Major bug reported by tgraves and fixed by tgraves (tasktracker)<br>
<b>Recognize Jetty bug 1342 and handle it</b><br>
<blockquote>Added 2 new config parameters:<br><br><br><br>mapreduce.reduce.shuffle.catch.exception.stack.regex<br><br>mapreduce.reduce.shuffle.catch.exception.message.regex</blockquote></li>
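<p>How two such regular expressions might be applied to an exception can be sketched as below. The semantics are an assumption made for illustration (both patterns must match when set, and a null pattern means no constraint); <code>ShuffleExceptionMatcher</code> is a hypothetical class and not the TaskTracker code.</p>

```java
import java.io.PrintWriter;
import java.io.StringWriter;
import java.util.regex.Pattern;

// Hypothetical matcher: tests an exception's message and printed stack trace
// against optional regular expressions taken from configuration.
class ShuffleExceptionMatcher {
    /** True if every non-null regex matches its target in full. */
    static boolean matches(Throwable t, String messageRegex, String stackRegex) {
        String message = t.getMessage() == null ? "" : t.getMessage();

        StringWriter sw = new StringWriter();
        t.printStackTrace(new PrintWriter(sw, true));
        String stack = sw.toString();

        boolean messageOk = messageRegex == null || Pattern.matches(messageRegex, message);
        // DOTALL lets '.' span the line breaks in a multi-line stack trace.
        boolean stackOk = stackRegex == null
            || Pattern.compile(stackRegex, Pattern.DOTALL).matcher(stack).matches();
        return messageOk && stackOk;
    }
}
```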
<li> <a href="">MAPREDUCE-2524</a>.
Minor improvement reported by tgraves and fixed by tgraves (tasktracker)<br>
<b>Backport trunk heuristics for failing maps when we get fetch failures retrieving map output during shuffle</b><br>
<blockquote>Added a new configuration option: mapreduce.reduce.shuffle.maxfetchfailures, and removed a no longer used option: mapred.reduce.copy.backoff.</blockquote></li>
<li> <a href="">MAPREDUCE-2514</a>.
Trivial bug reported by jeagles and fixed by jeagles (tasktracker)<br>
<b>ReinitTrackerAction class name misspelled RenitTrackerAction in task tracker log</b><br>
<li> <a href="">MAPREDUCE-2495</a>.
Minor improvement reported by revans2 and fixed by revans2 (distributed-cache)<br>
<b>The distributed cache cleanup thread has no monitoring to check to see if it has died for some reason</b><br>
<blockquote>The cleanup thread in the distributed cache handles IOExceptions and the like correctly, but just to be a bit more defensive it would be good to monitor the thread, and check that it is still alive regularly, so that the distributed cache does not fill up the entire disk on the node. </blockquote></li>
<li> <a href="">MAPREDUCE-2490</a>.
Trivial improvement reported by jeagles and fixed by jeagles (jobtracker)<br>
<b>Log blacklist debug count</b><br>
<blockquote>Gain some insight into blacklist increments/decrements by enhancing the debug logging</blockquote></li>
<li> <a href="">MAPREDUCE-2479</a>.
Major improvement reported by revans2 and fixed by revans2 (tasktracker)<br>
<b>Backport MAPREDUCE-1568 to hadoop security branch</b><br>
<blockquote>Added mapreduce.tasktracker.distributedcache.checkperiod to the task tracker, which defines the period to wait while cleaning up the distributed cache. The default is 1 min.</blockquote></li>
<li> <a href="">MAPREDUCE-2456</a>.
Trivial improvement reported by naisbitt and fixed by naisbitt (jobtracker)<br>
<b>Show the reducer taskid and map/reduce tasktrackers for &quot;Failed fetch notification #_ for task attempt...&quot; log messages</b><br>
<blockquote>This jira is to provide more useful log information for debugging the &quot;Too many fetch-failures&quot; error.<br><br>Looking at the JobTracker node, we see messages like this:<br>&quot;2010-12-14 00:00:06,911 INFO org.apache.hadoop.mapred.JobInProgress: Failed fetch notification #8 for task<br>attempt_201011300729_189729_m_007458_0&quot;.<br><br>It would be useful to see which reducer is reporting the error here.<br><br>So, I propose we add the following to these log messages:<br> 1. reduce task ID<br> 2. TaskTracker nodenames for both the mapper and the reducer<br></blockquote></li>
<li> <a href="">MAPREDUCE-2451</a>.
Trivial bug reported by tgraves and fixed by tgraves (jobtracker)<br>
<b>Log the reason string of healthcheck script</b><br>
<blockquote>The information on why a specific TaskTracker got blacklisted is not stored anywhere. The jobtracker web ui will show the detailed reason string until the TT gets unblacklisted. After that it is lost.</blockquote></li>
<li> <a href="">MAPREDUCE-2447</a>.
Minor bug reported by sseth and fixed by sseth <br>
<b>Set JvmContext sooner for a task - MR2429</b><br>
<blockquote>TaskTracker.validateJVM() is throwing NPE when setupWorkDir() throws IOException. This is because<br>taskFinal.setJvmContext() is not executed yet</blockquote></li>
<li> <a href="">MAPREDUCE-2443</a>.
Minor bug reported by sseth and fixed by sseth (test)<br>
<b>Fix FI build - broken after MR-2429</b><br>
<blockquote>src/test/system/aop/org/apache/hadoop/mapred/TaskAspect.aj:72 [warning] advice defined in org.apache.hadoop.mapred.TaskAspect has not been applied [Xlint:adviceDidNotMatch]<br><br>After the fix in MR-2429, the call to ping in TaskAspect needs to be fixed.</blockquote></li>
<li> <a href="">MAPREDUCE-2429</a>.
Major bug reported by acmurthy and fixed by sseth (tasktracker)<br>
<b>Check jvmid during task status report</b><br>
<blockquote>Currently TT doesn&apos;t check to ensure jvmid is relevant during communication with the Child via TaskUmbilicalProtocol.</blockquote></li>
<li> <a href="">MAPREDUCE-2418</a>.
Minor bug reported by sseth and fixed by sseth <br>
<b>Errors not shown in the JobHistory servlet (specifically Counter Limit Exceeded)</b><br>
<blockquote>Job error details are not displayed in the JobHistory servlet. e.g. Errors like &apos;Counter limit exceeded for a job&apos;. <br>jobdetails.jsp has &apos;Failure Info&apos;, but this is missing in jobdetailshistory.jsp</blockquote></li>
<li> <a href="">MAPREDUCE-2415</a>.
Major sub-task reported by bharathm and fixed by bharathm (task-controller, tasktracker)<br>
<b>Distribute TaskTracker userlogs onto multiple disks</b><br>
<blockquote>Currently, userlogs directory in TaskTracker is placed under hadoop.log.dir like &lt;hadoop.log.dir&gt;/userlogs. I am proposing to spread these userlogs onto multiple configured mapred.local.dirs to strengthen TaskTracker reliability w.r.t disk failures. </blockquote></li>
<li> <a href="">MAPREDUCE-2413</a>.
Major sub-task reported by bharathm and fixed by ravidotg (task-controller, tasktracker)<br>
<b>TaskTracker should handle disk failures at both startup and runtime</b><br>
<blockquote>At present, TaskTracker doesn&apos;t handle disk failures properly both at startup and runtime.<br><br>(1) Currently TaskTracker doesn&apos;t come up if any of the mapred-local-dirs is on a bad disk. TaskTracker should ignore that particular mapred-local-dir and start up and use only the remaining good mapred-local-dirs.<br>(2) If a disk goes bad while TaskTracker is running, currently TaskTracker doesn&apos;t do anything special. This results in either<br> (a) TaskTracker continues to &quot;try to use that bad disk&quot; and this results in lots of task failures and possibly job failures(because of multiple TTs having bad disks) and eventually these TTs getting graylisted for all jobs. And this needs manual restart of TT with modified configuration of mapred-local-dirs avoiding the bad disk. OR<br> (b) Health check script identifying the disk as bad and the TT gets blacklisted. And this also needs manual restart of TT with modified configuration of mapred-local-dirs avoiding the bad disk.<br><br>This JIRA is to make TaskTracker more fault-tolerant to disk failures solving (1) and (2). i.e. TT should start even if at least one of the mapred-local-dirs is on a good disk and TT should adjust its in-memory list of mapred-local-dirs and avoid using bad mapred-local-dirs.<br></blockquote></li>
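The startup behavior proposed in MAPREDUCE-2413 can be sketched as follows. This is an illustration only, not the TaskTracker code: the class LocalDirFilter and its method usableDirs are hypothetical names, and the health test used here (an existing directory that is readable and writable) is an assumed stand-in for whatever disk check the actual patch performs.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Hadoop: keeps only the configured local
// dirs that currently look usable, so startup can proceed on the good disks.
final class LocalDirFilter {

  static List<File> usableDirs(List<File> configuredDirs) {
    List<File> good = new ArrayList<File>();
    for (File dir : configuredDirs) {
      // Assumed health check: an existing directory we can read and write.
      if (dir.isDirectory() && dir.canRead() && dir.canWrite()) {
        good.add(dir);
      }
    }
    return good;
  }

  public static void main(String[] args) {
    List<File> configured = new ArrayList<File>();
    configured.add(new File(System.getProperty("java.io.tmpdir")));
    configured.add(new File("/no/such/mapred-local-dir")); // simulated bad disk
    List<File> good = usableDirs(configured);
    // Refuse to start only when every configured dir is unusable.
    if (good.isEmpty()) {
      throw new IllegalStateException("no usable local dirs");
    }
    System.out.println("Starting with local dirs: " + good);
  }
}
```

The same filter could be re-run periodically at runtime to refresh an in-memory list of good dirs, which is the second half of what the description asks for.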
<li> <a href="">MAPREDUCE-2411</a>.
Minor bug reported by dking and fixed by dking <br>
<b>When you submit a job to a queue with no ACLs you get an inscrutable NPE</b><br>
<blockquote>With this patch we&apos;ll check for that, and print a message in the logs. Then at submission time you find out about it.</blockquote></li>
<li> <a href="">MAPREDUCE-2409</a>.
Major bug reported by sseth and fixed by sseth (distributed-cache)<br>
<b>Distributed Cache does not differentiate between file /archive for files with the same path</b><br>
<blockquote>If a &apos;global&apos; file is specified as a &apos;file&apos; by one job - subsequent jobs cannot override this source file to be an &apos;archive&apos; (until the TT cleans up its cache or a TT restart).<br>The other way around as well -&gt; &apos;archive&apos; to &apos;file&apos;<br><br>In case of an accidental submission using the wrong type - some of the tasks for the second job will end up seeing the source file as an archive, others as a file.</blockquote></li>
<li> <a href="">MAPREDUCE-2366</a>.
Major bug reported by owen.omalley and fixed by dking (tasktracker)<br>
<b>TaskTracker can&apos;t retrieve stdout and stderr from web UI</b><br>
<blockquote>Problem where the task browser UI can&apos;t retrieve the stdxxx printouts of streaming jobs that abend in the unix code, in the common case where the containing job doesn&apos;t reuse JVM&apos;s.</blockquote></li>
<li> <a href="">MAPREDUCE-2364</a>.
Major bug reported by owen.omalley and fixed by devaraj (tasktracker)<br>
<b>Shouldn&apos;t hold lock on rjob while localizing resources.</b><br>
<blockquote>There is a deadlock while localizing resources on the TaskTracker.</blockquote></li>
<li> <a href="">MAPREDUCE-2362</a>.
Major bug reported by owen.omalley and fixed by roelofs (test)<br>
<b>Unit test failures: TestBadRecords and TestTaskTrackerMemoryManager</b><br>
<blockquote>Fix unit-test failures: TestBadRecords (NPE due to rearranged MapTask code) and TestTaskTrackerMemoryManager (need hostname in output-string pattern).</blockquote></li>
<li> <a href="">MAPREDUCE-2360</a>.
Major bug reported by owen.omalley and fixed by (client)<br>
<b>Pig fails when using non-default FileSystem</b><br>
<blockquote>The job client strips the file system from the user&apos;s job jar, which causes breakage when it isn&apos;t the default file system.</blockquote></li>
<li> <a href="">MAPREDUCE-2359</a>.
Major bug reported by owen.omalley and fixed by ramach <br>
<b>Distributed cache doesn&apos;t use non-default FileSystems correctly</b><br>
<blockquote>We are passing as viewfs:/// in core site.xml on oozie server.<br>We have default name node in configuration also viewfs:///<br><br>We are using hdfs://path in our path for application.<br>Its giving following error:<br><br>IllegalArgumentException: Wrong FS:<br>hdfs://nn/user/strat_ci/oozie-oozi/0000002-110217014830452-oozie-oozi-W/hadoop1--map-reduce/map-reduce-launcher.jar,<br>expected: viewfs:/</blockquote></li>
<li> <a href="">MAPREDUCE-2358</a>.
Major bug reported by owen.omalley and fixed by ramach <br>
<b>MapReduce assumes HDFS as the default filesystem</b><br>
<blockquote>Mapred assumes hdfs as the default fs even when defined otherwise.</blockquote></li>
<li> <a href="">MAPREDUCE-2357</a>.
Major bug reported by owen.omalley and fixed by vicaya (task)<br>
<b>When extending inputsplit (non-FileSplit), all exceptions are ignored</b><br>
<blockquote>if you&apos;re using a custom RecordReader/InputFormat setup and using an<br>InputSplit that does NOT extend FileSplit, then any exceptions you throw in your RecordReader.nextKeyValue() function<br>are silently ignored.</blockquote></li>
<li> <a href="">MAPREDUCE-2356</a>.
Major bug reported by owen.omalley and fixed by vicaya <br>
<b>A task succeeded even though there were errors on all attempts.</b><br>
<blockquote>From Luke Lu:<br><br>Here is a summary of why the failed map task was considered &quot;successful&quot; (Thanks to Mahadev, Arun and Devaraj<br>for insightful discussions).<br><br>1. The map task was hanging BEFORE being initialized (probably in localization, but it doesn&apos;t matter in this case).<br>Its state is UNASSIGNED.<br><br>2. The jt decided to kill it due to timeout and scheduled a cleanup task on the same node.<br><br>3. The cleanup task has the same attempt id (by design) but runs in a different JVM. Its initial state is<br>FAILED_UNCLEAN.<br><br>4. The JVM of the original attempt, while being killed, proceeded to setupWorkDir and threw an<br>IllegalStateException in FileSystem.getLocal, which caused taskFinal.taskCleanup to be called in Child and<br>triggered the NPE because the task was not yet initialized (committer is null). Before the NPE, however, it sent a<br>statusUpdate to TT, and in tip.reportProgress, changed the task state (currently FAILED_UNCLEAN) to UNASSIGNED.<br><br>5. The cleanup attempt succeeded and reported done to TT. In tip.reportDone, the isCleanup() check returned false due to<br>the UNASSIGNED state and set the task state as SUCCEEDED.<br></blockquote></li>
<li> <a href="">MAPREDUCE-517</a>.
Critical bug reported by acmurthy and fixed by acmurthy <br>
<b>The capacity-scheduler should assign multiple tasks per heartbeat</b><br>
<blockquote>HADOOP-3136 changed the default o.a.h.mapred.JobQueueTaskScheduler to assign multiple tasks per TaskTracker heartbeat, the capacity-scheduler should do the same.</blockquote></li>
<li> <a href="">MAPREDUCE-118</a>.
Blocker bug reported by amar_kamat and fixed by amareshwari (client)<br>
<b>Job.getJobID() will always return null</b><br>
<blockquote>JobContext is used for a read-only view of a job&apos;s info. Hence all the read-only fields in JobContext are set in the constructor. Job extends JobContext. When a Job is created, the jobid is not known, and hence there is no way to set the JobID once the Job is created. The JobID is obtained only when the JobClient queries the jobTracker for a job-id, which happens later, i.e. upon job submission.</blockquote></li>
<li> <a href="">HDFS-2218</a>.
Blocker test reported by mattf and fixed by mattf (contrib/hdfsproxy, test)<br>
<b>Disable TestHdfsProxy.testHdfsProxyInterface in automated test suite for 0.20-security-204 release</b><br>
<blockquote>Test case TestHdfsProxy.testHdfsProxyInterface has been temporarily disabled for this release, due to failure in the Hudson automated test environment.</blockquote></li>
<li> <a href="">HDFS-2057</a>.
Major bug reported by bharathm and fixed by bharathm (data-node)<br>
<b>Wait time to terminate the threads causing unit tests to take longer time</b><br>
<blockquote>As a part of datanode process hang, this part of code was introduced in 0.20.204 to clean up all the waiting threads.<br><br>- try {<br>- readPool.awaitTermination(10, TimeUnit.SECONDS);<br>- } catch (InterruptedException e) {<br>-;Exception occured in doStop:&quot; + e.getMessage());<br>- }<br>- readPool.shutdownNow();<br><br>This was clearly meant for production, but all the unit tests use minidfscluster and minimrcluster, whose shutdown waits on this part of the code. Due to this, we saw an increase in unit test run times. So removing this code. <br></blockquote></li>
<li> <a href="">HDFS-2044</a>.
Major test reported by mattf and fixed by mattf (test)<br>
<b>TestQueueProcessingStatistics failing automatic test due to timing issues</b><br>
<blockquote>The test makes assumptions about timing issues that hold true in workstation environments but not in Hudson auto-test.</blockquote></li>
<li> <a href="">HDFS-2023</a>.
Major bug reported by bharathm and fixed by bharathm (data-node)<br>
<b>Backport of NPE for File.list and File.listFiles</b><br>
<blockquote>Since we have multiple Jiras in trunk for common and hdfs, I am creating another jira for this issue. <br><br>This patch addresses the following:<br><br>1. Provides FileUtil API for list and listFiles which throws IOException for null cases. <br>2. Replaces most uses of the JDK file API with the FileUtil API. </blockquote></li>
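The idea behind HDFS-2023 can be illustrated with a minimal sketch. java.io.File.list() and File.listFiles() return null, rather than throwing, when the path is not a directory or an I/O error occurs, so callers that iterate over the result hit an NPE. The wrapper below is hypothetical (the class name SafeFileListing and the message text are assumptions); the actual FileUtil methods added by the patch may differ.

```java
import java.io.File;
import java.io.IOException;

// Hypothetical wrapper illustrating the idea; not the Hadoop FileUtil class.
final class SafeFileListing {

  // Like File.list(), but turns the null "error" result into an IOException.
  static String[] list(File dir) throws IOException {
    String[] names = dir.list();
    if (names == null) {
      throw new IOException("Invalid directory or I/O error occurred for dir: " + dir);
    }
    return names;
  }

  // Like File.listFiles(), with the same null handling.
  static File[] listFiles(File dir) throws IOException {
    File[] files = dir.listFiles();
    if (files == null) {
      throw new IOException("Invalid directory or I/O error occurred for dir: " + dir);
    }
    return files;
  }
}
```

Callers then handle a checked IOException at the point of failure instead of a NullPointerException somewhere further down.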
<li> <a href="">HDFS-1878</a>.
Minor bug reported by mattf and fixed by mattf (name-node)<br>
<b>TestHDFSServerPorts unit test failure - race condition in FSNamesystem.close() causes NullPointerException without serious consequence</b><br>
<blockquote>In 20.204, TestHDFSServerPorts was observed to intermittently throw a NullPointerException. This only happens when FSNamesystem.close() is called, which means system termination for the Namenode, so this is not a serious bug for .204. TestHDFSServerPorts is more likely than normal execution to stimulate the race, because it runs two Namenodes in the same JVM, causing more interleaving and more potential to see a race condition.<br><br>The race is in FSNamesystem.close(), line 566, we have:<br> if (replthread != null) replthread.interrupt();<br> if (replmon != null) replmon = null;<br><br>Since the interrupted replthread is not waited on, there is a potential race condition with replmon being nulled before replthread is dead, but replthread references replmon in computeDatanodeWork() where the NullPointerException occurs.<br><br>The solution is either to wait on replthread or just don&apos;t null replmon. The latter is preferred, since none of the sibling Namenode processing threads are waited on in close().<br><br>I&apos;ll attach a patch for .205.<br></blockquote></li>
<li> <a href="">HDFS-1822</a>.
Blocker bug reported by sureshms and fixed by sureshms (name-node)<br>
<b>Editlog opcodes overlap between 20 security and later releases</b><br>
<blockquote>The same opcodes are used for different operations between 20 security, 0.22 and 0.23. This results in failure to load editlogs on later releases, especially during upgrades.</blockquote></li>
<li> <a href="">HDFS-1773</a>.
Minor improvement reported by tanping and fixed by tanping (name-node)<br>
<b>Remove a datanode from cluster if include list is not empty and this datanode is removed from both include and exclude lists</b><br>
<blockquote>Our service engineering team, which operates the clusters on a daily basis, finds it confusing that after a data node is decommissioned, there is no way to make the cluster forget about this data node, and it always remains in the dead node list.</blockquote></li>
<li> <a href="">HDFS-1767</a>.
Major sub-task reported by mattf and fixed by mattf (data-node)<br>
<b>Namenode should ignore non-initial block reports from datanodes when in safemode during startup</b><br>
<blockquote>Consider a large cluster that takes 40 minutes to start up. The datanodes compete to register and send their Initial Block Reports (IBRs) as fast as they can after startup (subject to a small sub-two-minute random delay, which isn&apos;t relevant to this discussion). <br><br>As each datanode succeeds in sending its IBR, it schedules the starting time for its regular cycle of reports, every hour (or other configured value of dfs.blockreport.intervalMsec). In order to spread the reports evenly across the block report interval, each datanode picks a random fraction of that interval, for the starting point of its regular report cycle. For example, if a particular datanode ends up randomly selecting 18 minutes after the hour, then that datanode will send a Block Report at 18 minutes after the hour every hour as long as it remains up. Other datanodes will start their cycles at other randomly selected times. This code is in DataNode.blockReport() and DataNode.scheduleBlockReport().<br><br>The &quot;second Block Report&quot; (2BR), is the start of these hourly reports. The problem is that some of these 2BRs get scheduled sooner rather than later, and actually occur within the startup period. For example, if the cluster takes 40 minutes (2/3 of an hour) to start up, then out of the datanodes that succeed in sending their IBRs during the first 10 minutes, between 1/2 and 2/3 of them will send their 2BR before the 40-minute startup time has completed!<br><br>2BRs sent within the startup time actually compete with the remaining IBRs, and thereby slow down the overall startup process. 
This can be seen in the following data, which shows the startup process for a 3700-node cluster that took about 17 minutes to finish startup:<br><br>{noformat}<br> time starts sum regs sum IBR sum 2nd_BR sum total_BRs/min<br>0 1299799498 3042 3042 1969 1969 151 151 0 151<br>1 1299799558 665 3707 1470 3439 248 399 0 248<br>2 1299799618 3707 224 3663 270 669 0 270<br>3 1299799678 3707 14 3677 261 930 3 3 264<br>4 1299799738 3707 23 3700 288 1218 1 4 289<br>5 1299799798 3707 7 3707 258 1476 3 7 261<br>6 1299799858 3707 3707 317 1793 4 11 321<br>7 1299799918 3707 3707 292 2085 6 17 298<br>8 1299799978 3707 3707 292 2377 8 25 300<br>9 1299800038 3707 3707 272 2649 25 272<br>10 1299800098 3707 3707 280 2929 15 40 295<br>11 1299800158 3707 3707 223 3152 14 54 237<br>12 1299800218 3707 3707 143 3295 54 143<br>13 1299800278 3707 3707 141 3436 20 74 161<br>14 1299800338 3707 3707 195 3631 78 152 273<br>15 1299800398 3707 3707 51 3682 209 361 260<br>16 1299800458 3707 3707 25 3707 369 730 394<br>17 1299800518 3707 3707 3707 166 896 166<br>18 1299800578 3707 3707 3707 72 968 72<br>19 1299800638 3707 3707 3707 67 1035 67<br>20 1299800698 3707 3707 3707 75 1110 75<br>21 1299800758 3707 3707 3707 71 1181 71<br>22 1299800818 3707 3707 3707 67 1248 67<br>23 1299800878 3707 3707 3707 62 1310 62<br>24 1299800938 3707 3707 3707 56 1366 56<br>25 1299800998 3707 3707 3707 60 1426 60<br>{noformat}<br><br>This data was harvested from the startup logs of all the datanodes, and correlated into one-minute buckets. Each row of the table represents the progress during one elapsed minute of clock time. It seems that every cluster startup is different, but this one showed the effect fairly well.<br><br>The &quot;starts&quot; column shows that all the nodes started up within the first 2 minutes, and the &quot;regs&quot; column shows that all succeeded in registering by minute 6. 
The IBR column shows a sustained rate of Initial Block Report processing of 250-300/minute for the first 10 minutes.<br><br>The question is why, during minutes 11 through 16, the rate of IBR processing slowed down. Why didn&apos;t the startup just finish? In the &quot;2nd_BR&quot; column, we see the rate of 2BRs ramping up as more datanodes complete their IBRs. As the rate increases, they become more effective at competing with the IBRs, and slow down the IBR processing even more. After the IBRs finally finish in minute 16, the rate of 2BRs settles down to a steady ~60-70/minute.<br><br>In order to decrease competition for locks and other resources, to speed up IBR processing during startup, we propose to delay 2BRs until later into the cycle.</blockquote></li>
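The &quot;between 1/2 and 2/3&quot; figure quoted in HDFS-1767 follows from simple arithmetic. The sketch below assumes the delay from a datanode&apos;s IBR to its second block report is uniformly distributed over the report interval; that distribution, the class name, and the method name are assumptions for illustration (the description says only that each datanode picks a random fraction of the interval).

```java
// Hypothetical estimate, not Hadoop code: fraction of datanodes whose second
// block report (2BR) lands before cluster startup has finished.
final class SecondBlockReportEstimate {

  // ibrMinute:       minute at which the datanode sent its initial report
  // startupMinutes:  total cluster startup time
  // intervalMinutes: block report interval (60 for the hourly default)
  static double fractionBeforeStartupEnds(double ibrMinute,
                                          double startupMinutes,
                                          double intervalMinutes) {
    // Under the uniform assumption, the 2BR falls inside the remaining
    // startup window with probability (window / interval), clamped to [0, 1].
    double window = startupMinutes - ibrMinute;
    return Math.max(0.0, Math.min(1.0, window / intervalMinutes));
  }

  public static void main(String[] args) {
    // 40-minute startup, hourly reports, IBR sent at minute 0 and at minute 10.
    System.out.println(fractionBeforeStartupEnds(0, 40, 60));
    System.out.println(fractionBeforeStartupEnds(10, 40, 60));
  }
}
```

With a 40-minute startup, a datanode that reported at minute 0 has a 40/60 chance of sending its 2BR during startup, and one that reported at minute 10 has a 30/60 chance, which brackets the range given in the description.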
<li> <a href="">HDFS-1758</a>.
Minor bug reported by tanping and fixed by tanping (tools)<br>
<b>Web UI JSP pages thread safety issue</b><br>
<blockquote>The set of JSP pages that the web UI uses is not thread safe. We have observed that when requesting the Live/Dead/Decommissioning pages from the web UI, the wrong page is sometimes displayed. To be more specific, a request for the Dead node list page sometimes returns the Live node page, and a request for the decommissioning page sometimes returns the dead page.<br><br>The root cause of this problem is that a JSP page is not thread safe by default. When multiple requests come in, each request is assigned to a different thread, and multiple threads access the same instance of the servlet class generated from the JSP page. A class variable is therefore shared by multiple threads. The JSP code in the 20 branch, for example dfsnodelist.jsp, has<br>{code}<br>&lt;%!<br> int rowNum = 0;<br> int colNum = 0;<br> String sorterField = null;<br> String sorterOrder = null;<br> String whatNodes = &quot;LIVE&quot;;<br> ...<br>%&gt;<br>{code}<br><br>declared as class variables. (This set of variables is declared within &lt;%! code %&gt; declarations, which makes them class members.) Multiple threads share the same set of class member variables, so one request can step on another&apos;s toes. <br><br>However, due to the JSP code refactor, HADOOP-5857, all of these class member variables were moved to become function local variables. So this bug does not appear in Apache trunk. Hence, we have proposed to take a simple fix for this bug on the 20 branch alone, to be more specific, branch-0.20-security.<br><br>The simple fix is to add the JSP page directive attribute isThreadSafe=&quot;false&quot; to the related JSP pages, dfshealth.jsp and dfsnodelist.jsp, to make them thread safe, i.e. only one request is processed at a time. 
<br><br>We did evaluate the thread safety issue for other JSP pages on trunk. We noticed a potential problem when retrieving some statistics from the namenode; for example, we make the call to <br>{code}<br>NamenodeJspHelper.getInodeLimitText(fsn);<br>{code}<br>in dfshealth.jsp, which eventually is <br><br>{code}<br> static String getInodeLimitText(FSNamesystem fsn) {<br> long inodes = fsn.dir.totalInodes();<br> long blocks = fsn.getBlocksTotal();<br> long maxobjects = fsn.getMaxObjects();<br> ....<br>{code}<br><br>Some of the function calls are already guarded by readwritelock, e.g. dir.totalInodes, but others are not. As a result, the web ui results are not 100% thread safe. But after evaluating the pros and cons of adding a giant lock into the JSP pages, we decided not to issue FSNamesystem ReadWrite locks into JSPs.<br><br></blockquote></li>
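The difference between class-member state and function-local state described in HDFS-1758 can be shown in plain Java, without a servlet container. The two classes below are hypothetical stand-ins for the servlet generated from a JSP page; the interleaving of two requests is simulated sequentially so that the outcome is deterministic.

```java
// Hypothetical model of a page whose state lives in an instance field,
// analogous to variables declared in a JSP declaration block.
final class SharedStatePage {
  private String whatNodes = "LIVE"; // shared by every request on this instance

  void parse(String requested) {
    whatNodes = requested;
  }

  String render() {
    return whatNodes + " node list";
  }
}

// Hypothetical model of the refactored page: state is local to each call,
// so concurrent requests cannot overwrite each other's value.
final class LocalStatePage {
  static String handle(String requested) {
    String whatNodes = requested;
    return whatNodes + " node list";
  }
}

final class JspStateDemo {
  public static void main(String[] args) {
    SharedStatePage page = new SharedStatePage();
    page.parse("DEAD");  // request A asks for the dead node list
    page.parse("LIVE");  // request B runs before A has rendered
    System.out.println("Request A sees: " + page.render());
    System.out.println("Refactored:     " + LocalStatePage.handle("DEAD"));
  }
}
```

In the shared-field version, request A renders the value written by request B, which matches the symptom of the Live page being returned for a Dead node list request.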
<li> <a href="">HDFS-1750</a>.
Major bug reported by szetszwo and fixed by szetszwo <br>
<b>fs -ls hftp://file not working</b><br>
<blockquote>{noformat}<br>hadoop dfs -touchz /tmp/file1 # create file. OK<br>hadoop dfs -ls /tmp/file1 # OK<br>hadoop dfs -ls hftp://namenode:50070/tmp/file1 # FAILED: not seeing the file<br>{noformat}</blockquote></li>
<li> <a href="">HDFS-1692</a>.
Major bug reported by bharathm and fixed by bharathm (data-node)<br>
<b>In secure mode, Datanode process doesn&apos;t exit when disks fail.</b><br>
<blockquote>In secure mode, when more disks fail than the number of volumes tolerated, the datanode process doesn&apos;t exit properly and just hangs even though the shutdown method is called. <br><br></blockquote></li>
<li> <a href="">HDFS-1592</a>.
Major bug reported by bharathm and fixed by bharathm <br>
<b>Datanode startup doesn&apos;t honor volumes.tolerated </b><br>
<blockquote>Datanode startup doesn&apos;t honor volumes.tolerated for hadoop 20 version.</blockquote></li>
<li> <a href="">HDFS-1541</a>.
Major sub-task reported by hairong and fixed by hairong (name-node)<br>
<b>Not marking datanodes dead When namenode in safemode</b><br>
<blockquote>In a big cluster, when the namenode starts up, it takes a long time for the namenode to process block reports from all datanodes. Because heartbeat processing gets delayed, some datanodes are erroneously marked as dead; later on they have to register again, thus wasting time.<br><br>It would speed up startup if the checking of dead nodes were disabled while the namenode is in safemode.</blockquote></li>
<li> <a href="">HDFS-1445</a>.
Major sub-task reported by mattf and fixed by mattf (data-node)<br>
<b>Batch the calls in DataStorage to FileUtil.createHardLink(), so we call it once per directory instead of once per file</b><br>
<blockquote>Batch hardlinking during &quot;upgrade&quot; snapshots, cutting time from approximately 8 minutes per volume to approximately 8 seconds. Validated in both Linux and Windows. Depends on prior integration with patch for HADOOP-7133.</blockquote></li>
<li> <a href="">HDFS-1377</a>.
Blocker bug reported by eli and fixed by eli (name-node)<br>
<b>Quota bug for partial blocks allows quotas to be violated </b><br>
<blockquote>There&apos;s a bug in the quota code that causes quotas not to be respected when a file is not an exact multiple of the block size. Here&apos;s an example:<br><br>{code}<br>$ hadoop fs -mkdir /test<br>$ hadoop dfsadmin -setSpaceQuota 384M /test<br>$ ls dir/ | wc -l # dir contains 101 files<br>101<br>$ du -ms dir # each is 3mb<br>304 dir<br>$ hadoop fs -put dir /test<br>$ hadoop fs -count -q /test<br> none inf 402653184 -550502400 2 101 317718528 hdfs://<br>$ hadoop fs -stat &quot;%o %r&quot; /test/dir/f30<br>134217728 3 # three 128mb blocks<br>{code}<br><br>INodeDirectoryWithQuota caches the number of bytes consumed by its children in {{diskspace}}. The quota adjustment code has a bug that causes {{diskspace}} to get updated incorrectly when a file is not an exact multiple of the block size (the value ends up being negative). <br><br>This causes the quota checking code to think that the files in the directory consume less space than they actually do, so verifyQuota does not throw a QuotaExceededException even when the directory is over quota. However, the bug isn&apos;t visible to users because {{fs count -q}} reports the numbers generated by INode#getContentSummary, which adds up the sizes of the blocks rather than using the cached INodeDirectoryWithQuota#diskspace value.<br><br>In FSDirectory#addBlock the disk space consumed is set conservatively to the full block size * the number of replicas:<br><br>{code}<br>updateCount(inodes, inodes.length-1, 0,<br> fileNode.getPreferredBlockSize()*fileNode.getReplication(), true);<br>{code}<br><br>In FSNamesystem#addStoredBlock we adjust for this conservative estimate by subtracting out the difference between the conservative estimate and the number of bytes actually stored:<br><br>{code}<br>//Updated space consumed if required.<br>INodeFile file = (storedBlock != null) ? storedBlock.getINode() : null;<br>long diff = (file == null) ? 
0 :<br> (file.getPreferredBlockSize() - storedBlock.getNumBytes());<br><br>if (diff &gt; 0 &amp;&amp; file.isUnderConstruction() &amp;&amp;<br> cursize &lt; storedBlock.getNumBytes()) {<br>...<br> dir.updateSpaceConsumed(path, 0, -diff*file.getReplication());<br>{code}<br><br>We do the same in FSDirectory#replaceNode when completing the file, but at a file granularity (I believe the intent here is to correct for the cases when there&apos;s a failure replicating blocks and recovery). Since oldnode is under construction, INodeFile#diskspaceConsumed will use the preferred block size (vs. the Block#getNumBytes used by newnode), so we will again subtract out the difference between the full block size and the number of bytes actually stored:<br><br>{code}<br>long dsOld = oldnode.diskspaceConsumed();<br>...<br>//check if disk space needs to be updated.<br>long dsNew = 0;<br>if (updateDiskspace &amp;&amp; (dsNew = newnode.diskspaceConsumed()) != dsOld) {<br> try {<br> updateSpaceConsumed(path, 0, dsNew-dsOld);<br>...<br>{code}<br><br>So in the above example we started with diskspace at 384mb (3 * 128mb) and then we subtract 375mb (to reflect that only 9mb raw was actually used) twice, so for each file the diskspace for the directory is -366mb (384mb minus 2 * 375mb). This is why the quota goes negative and yet we can still write more files.<br><br>So a directory with lots of single block files (if you have multiple blocks, only the final partial block ends up subtracting from the diskspace used) ends up having a quota that&apos;s way off.<br><br>I think the fix is for FSDirectory#replaceNode not to have the diskspaceConsumed calculations differ when the old and new INode have the same blocks. I&apos;ll work on a patch which also adds a quota test for blocks that are not multiples of the block size and warns in INodeDirectory#computeContentSummary if the computed size does not reflect the cached value.</blockquote></li>
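The per-file arithmetic in HDFS-1377 can be checked with a few lines of Java. This is a worked example of the numbers quoted in the description (128mb blocks, 3mb files, replication 3), not the namenode code; the class and method names are hypothetical.

```java
// Hypothetical worked example of the double subtraction; all sizes in mb.
final class QuotaDoubleSubtraction {

  // Space charged after the conservative estimate is corrected once (intended).
  static long afterSingleAdjustment(long blockMb, long fileMb, int replication) {
    long reserved = blockMb * replication; // full block size * replicas
    long actual = fileMb * replication;    // bytes really stored * replicas
    long diff = reserved - actual;
    return reserved - diff;
  }

  // Space charged when the same correction is applied twice (the bug).
  static long afterDoubleAdjustment(long blockMb, long fileMb, int replication) {
    long reserved = blockMb * replication;
    long actual = fileMb * replication;
    long diff = reserved - actual;
    return reserved - 2 * diff;
  }

  public static void main(String[] args) {
    System.out.println("intended: " + afterSingleAdjustment(128, 3, 3) + "mb");
    System.out.println("buggy:    " + afterDoubleAdjustment(128, 3, 3) + "mb");
  }
}
```

Here reserved is 384, actual is 9, and diff is 375, so a single correction leaves 9mb charged while the double correction leaves 384 - 2 * 375 = -366mb, matching the figure in the description.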
<li> <a href="">HDFS-1258</a>.
Blocker bug reported by atm and fixed by atm (name-node)<br>
<b>Clearing namespace quota on &quot;/&quot; corrupts FS image</b><br>
<blockquote>The HDFS root directory starts out with a default namespace quota of Integer.MAX_VALUE. If you clear this quota (using &quot;hadoop dfsadmin -clrQuota /&quot;), the fsimage gets corrupted immediately. Subsequent 2NN rolls will fail, and the NN will not come back up from a restart.</blockquote></li>
<li> <a href="">HDFS-1189</a>.
Major bug reported by xiaokang and fixed by johnvijoe (name-node)<br>
<b>Quota counts missed between clear quota and set quota</b><br>
<blockquote>HDFS Quota counts will be missed between a clear quota operation and a set quota.<br><br>When setting quota for a dir, the INodeDirectory will be replaced by INodeDirectoryWithQuota and dir.isQuotaSet() becomes true. When INodeDirectoryWithQuota is newly created, quota counting will be performed. However, when clearing quota, the quota conf is set to -1 and dir.isQuotaSet() becomes false while INodeDirectoryWithQuota will NOT be replaced back to INodeDirectory.<br><br>FSDirectory.updateCount only updates the quota count for inodes for which isQuotaSet() is true. So after clearing the quota for a dir, its quota counts will not be updated, which is reasonable. But when resetting the quota for this dir, quota counting will not be performed and some counts will be missed.</blockquote></li>
<li> <a href="">HADOOP-7475</a>.
Blocker bug reported by eyang and fixed by eyang <br>
<b> is broken</b><br>
<blockquote>When running, the system can not find the templates configuration directory:<br><br>{noformat}<br>cat: /usr/libexec/../templates/conf/core-site.xml: No such file or directory<br>cat: /usr/libexec/../templates/conf/hdfs-site.xml: No such file or directory<br>cat: /usr/libexec/../templates/conf/mapred-site.xml: No such file or directory<br>cat: /usr/libexec/../templates/conf/ No such file or directory<br>chown: cannot access `;: No such file or directory<br>chmod: cannot access `;: No such file or directory<br>cp: cannot stat `*.xml&apos;: No such file or directory<br>cp: cannot stat `;: No such file or directory<br>{noformat}</blockquote></li>
<li> <a href="">HADOOP-7398</a>.
Major new feature reported by owen.omalley and fixed by owen.omalley <br>
<b>create a mechanism to suppress the HADOOP_HOME deprecated warning</b><br>
<blockquote>Create a new mechanism to suppress the warning about HADOOP_HOME deprecation.<br><br>I&apos;ll create a HADOOP_HOME_WARN_SUPPRESS environment variable that suppresses the warning.</blockquote></li>
<li> <a href="">HADOOP-7373</a>.
Major bug reported by owen.omalley and fixed by owen.omalley <br>
<b>Tarball deployment doesn&apos;t work with {start,stop}-{dfs,mapred}</b><br>
<blockquote>The overrides the variable &quot;bin&quot;, which makes the scripts use libexec for hadoop-daemon(s).</blockquote></li>
<li> <a href="">HADOOP-7364</a>.
Major bug reported by tgraves and fixed by tgraves (test)<br>
<b>TestMiniMRDFSCaching fails if is set to something other than build/test</b><br>
<blockquote>TestMiniMRDFSCaching fails if is set to something other than build/test. </blockquote></li>
<li> <a href="">HADOOP-7356</a>.
Blocker bug reported by eyang and fixed by eyang <br>
<b>RPM packages broke bin/hadoop script for hadoop 0.20.205</b><br>
<blockquote> has been moved to libexec for binary package, but developers prefer to have in bin. Hadoop shell scripts should be modified to support both scenarios.</blockquote></li>
<li> <a href="">HADOOP-7330</a>.
Major bug reported by vicaya and fixed by vicaya (metrics)<br>
<b>The metrics source mbean implementation should return the attribute value instead of the object</b><br>
<blockquote>The MetricsSourceAdapter#getAttribute in 0.20.203 is returning the attribute object instead of the value.</blockquote></li>
<li> <a href="">HADOOP-7324</a>.
Blocker bug reported by vicaya and fixed by priyomustafi (metrics)<br>
<b>Ganglia plugins for metrics v2</b><br>
<blockquote>Although, all metrics in metrics v2 are exposed via the standard JMX mechanisms, most users are using Ganglia to collect metrics.</blockquote></li>
<li> <a href="">HADOOP-7277</a>.
Minor improvement reported by naisbitt and fixed by naisbitt (build)<br>
<b>Add Eclipse launch tasks for the 0.20-security branch</b><br>
<blockquote>This is to add the eclipse launchers from HADOOP-5911 to the 0.20 security branch.<br><br>Eclipse has a notion of &quot;run configuration&quot;, which encapsulates what&apos;s needed to run or debug an application. I use this quite a bit to start various Hadoop daemons in debug mode, with breakpoints set, to inspect state and what not.<br><br>This is simply configuration, so no tests are provided. After running &quot;ant eclipse&quot; and refreshing your project, you should see entries in the Run Configurations and Debug Configurations for launching the various hadoop daemons from within eclipse. There&apos;s a template for testing a specific test, and also templates to run all the tests, the job tracker, and a task tracker. It&apos;s likely that some parameters need to be further tweaked to have the same behavior as &quot;ant test&quot;, but for most tests, this works.<br><br>This also requires a small change to build.xml for the eclipse classpath.</blockquote></li>
<li> <a href="">HADOOP-7274</a>.
Minor bug reported by jeagles and fixed by jeagles (util)<br>
<b>CLONE - IOUtils.readFully and IOUtils.skipFully have typo in exception creation&apos;s message</b><br>
<blockquote>Same fix as for HADOOP-7057 for the Hadoop security branch<br><br>{noformat}<br> throw new IOException( &quot;Premeture EOF from inputStream&quot;);<br>{noformat}</blockquote></li>
<li> <a href="">HADOOP-7248</a>.
Minor improvement reported by cos and fixed by tgraves (build)<br>
<b>Have a way to automatically update Eclipse .classpath file when new libs are added to the classpath through Ivy for 0.20-* based sources</b><br>
<blockquote>Backport HADOOP-6407 into 0.20 based source trees</blockquote></li>
<li> <a href="">HADOOP-7232</a>.
Blocker bug reported by owen.omalley and fixed by owen.omalley (documentation)<br>
<b>Fix javadoc warnings</b><br>
<blockquote>The javadoc is currently generating 31 warnings.</blockquote></li>
<li> <a href="">HADOOP-7144</a>.
Major new feature reported by vicaya and fixed by revans2 <br>
<b>Expose JMX with something like JMXProxyServlet </b><br>
<blockquote>Much of the Hadoop metrics and status info is available via JMX, especially since 0.20.100, and 0.22+ (HDFS-1318, HADOOP-6728 etc.) For operations staff not familiar with JMX setup, especially JMX with SSL and firewall tunnelling, the usage can be daunting. Using a JMXProxyServlet (a la Tomcat) to translate JMX attributes into JSON output would make a lot of non-Java admins happy.<br><br>We could probably use Tomcat&apos;s JMXProxyServlet code directly, if it already outputs some standard format (JSON or XML etc.) The code is simple enough to port over and can probably integrate with the common HttpServer as one of the default servlets (maybe /jmx) for the pluggable security.</blockquote></li>
<li> <a href="">HADOOP-6255</a>.
Major new feature reported by owen.omalley and fixed by eyang <br>
<b>Create an rpm integration project</b><br>
<blockquote>Added RPM/DEB packages to build system.</blockquote></li>
<h2>Changes Since Hadoop 0.20.2</h2>
<li> <a href="">HADOOP-7190</a>. Add metrics v1 back for backwards compatibility. (omalley)
<li> <a href="">MAPREDUCE-2360</a>. Remove stripping of scheme, authority from submit dir in
support of viewfs. (cdouglas)
<li> <a href="">MAPREDUCE-2359</a> Use correct file system to access distributed cache objects.
(Krishna Ramachandran)
<li> <a href="">MAPREDUCE-2361</a>. "Fix Distributed Cache is not adding files to class paths
correctly" - Drop the host/scheme/fragment from URI (cdouglas)
<li> <a href="">MAPREDUCE-2362</a>. Fix unit-test failures: TestBadRecords (NPE due to
rearranged MapTask code) and TestTaskTrackerMemoryManager
(need hostname in output-string pattern). (Greg Roelofs, Krishna
<li> <a href="">HDFS-1729</a>. Add statistics logging for better visibility into
startup time costs. (Matt Foley)
<li> <a href="">MAPREDUCE-2363</a>. When a queue is built without any access rights we
explain the problem. (Richard King)
<li> <a href="">MAPREDUCE-1563</a>. TaskDiagnosticInfo may be missed sometime. (Krishna
<li> <a href="">MAPREDUCE-2364</a>. Don't hold the rjob lock while localizing resources. (ddas
via omalley)
<li> <a href="">HDFS-1598</a>. Directory listing on hftp:// does not show
.*.crc files. (szetszwo)
<li> <a href="">MAPREDUCE-2365</a>. New counters for FileInputFormat (BYTES_READ) and
FileOutputFormat (BYTES_WRITTEN).
New counter MAP_OUTPUT_MATERIALIZED_BYTES for compressed MapOutputSize.
(Siddharth Seth)
<li> <a href="">HADOOP-7040</a>. Change DiskErrorException to IOException (boryas)
<li> <a href="">HADOOP-7104</a>. Remove unnecessary DNS reverse lookups from RPC layer
<li> <a href="">MAPREDUCE-2366</a>. Fix a problem where the task browser UI can't retrieve the
stdxxx printouts of streaming jobs that abend in the unix code, in
the common case where the containing job doesn't reuse JVMs.
(Richard King)
<li> <a href="">HADOOP-6977</a>. Herriot daemon clients should vend statistics (cos)
<li> <a href="">HADOOP-6971</a>. Clover build doesn't generate per-test coverage (cos)
<li> <a href="">HADOOP-6879</a>. Provide SSH based (Jsch) remote execution API for system
tests. (cos)
<li> <a href="">MAPREDUCE-2355</a>. Add a configuration knob
mapreduce.tasktracker.outofband.heartbeat.damper that limits out of band
heartbeats (acmurthy)
<li> <a href="">MAPREDUCE-2356</a>. Fix a race-condition that corrupted a task's state on the
JobTracker. (Luke Lu)
<li> <a href="">MAPREDUCE-2357</a>. Always propagate IOExceptions that are thrown by
non-FileInputFormat. (Luke Lu)
<li> <a href="">HADOOP-7163</a>. RPC handles SocketTimeOutException during SASL negotiation.
<li> <a href="">MAPREDUCE-2358</a>. MapReduce assumes the default FileSystem is HDFS.
(Krishna Ramachandran)
<li> <a href="">MAPREDUCE-1904</a>. Reducing locking contention in TaskTracker's
MapOutputServlet LocalDirAllocator. (Rajesh Balamohan via acmurthy)
<li> <a href="">HDFS-1626</a>. Make BLOCK_INVALIDATE_LIMIT configurable. (szetszwo)
<li> <a href="">HDFS-1584</a>. Adds a check for whether relogin is needed to
getDelegationToken in HftpFileSystem. (Kan Zhang via ddas)
<li> <a href="">HADOOP-7115</a>. Reduces the number of calls to getpwuid_r and
getpwgid_r, by implementing a cache in NativeIO. (ddas)
<li> <a href="">HADOOP-6882</a>. An XSS security exploit in jetty-6.1.14. jetty upgraded to
6.1.26. (ddas)
<li> <a href="">MAPREDUCE-2278</a>. Fixes a memory leak in the TaskTracker. (cdouglas)
<li> <a href="">HDFS-1353</a> redux. Modulate original 1353 to not bump RPC version.
<li> <a href="">MAPREDUCE-2082</a> Race condition in writing the jobtoken password file when
launching pipes jobs (jitendra and ddas)
<li> <a href="">HADOOP-6978</a>. Fixes task log servlet vulnerabilities via symlinks.
(Todd Lipcon and Devaraj Das)
<li> <a href="">MAPREDUCE-2178</a>. Write task initialization to avoid race
conditions leading to privilege escalation and resource leakage by
performing more actions as the user. (Owen O'Malley, Devaraj Das,
Chris Douglas via cdouglas)
<li> <a href="">HDFS-1364</a>. HFTP client should support relogin from keytab
<li> <a href="">HADOOP-6907</a>. Make RPC client use per-proxy configuration.
(Kan Zhang via ddas)
<li> <a href="">MAPREDUCE-2055</a>. Fix JobTracker to decouple job retirement from copy of
job-history file to HDFS and enhance RetiredJobInfo to carry aggregated
job-counters to prevent a disk roundtrip on job-completion to fetch
counters for the JobClient. (Krishna Ramachandran via acmurthy)
<li> <a href="">HDFS-1353</a>. Remove most of getBlockLocation optimization (jghoman)
<li> <a href="">MAPREDUCE-2023</a>. TestDFSIO read test may not read specified bytes. (htang)
<li> <a href="">HDFS-1340</a>. A null delegation token is appended to the url if security is
disabled when browsing filesystem. (boryas)
<li> <a href="">HDFS-1352</a>. Fix jsvc.location. (jghoman)
<li> <a href="">HADOOP-6860</a>. 'compile-fault-inject' should never be called directly. (cos)
<li> <a href="">MAPREDUCE-2005</a>. TestDelegationTokenRenewal fails (boryas)
<li> <a href="">MAPREDUCE-2000</a>. Rumen is not able to extract counters for Job history logs
from Hadoop 0.20. (htang)
<li> <a href="">MAPREDUCE-1961</a>. ConcurrentModificationException when shutting down Gridmix.
<li> <a href="">HADOOP-6899</a>. RawLocalFileSystem set working directory does
not work for relative names. (suresh)
<li> <a href="">HDFS-495</a>. New clients should be able to take over files lease if the old
client died. (shv)
<li> <a href="">HADOOP-6728</a>. Re-design and overhaul of the Metrics framework. (Luke Lu via
<li> <a href="">MAPREDUCE-1966</a>. Change blacklisting of tasktrackers on task failures to be
a simple graylist to fingerpoint bad tasktrackers. (Greg Roelofs via
<li> <a href="">HADOOP-6864</a>. Add ability to get netgroups (as returned by getent
netgroup command) using native code (JNI) instead of forking. (Erik Steffl)
<li> <a href="">HDFS-1318</a>. HDFS Namenode and Datanode WebUI information needs to be
accessible programmatically for scripts. (Tanping Wang via suresh)
<li> <a href="">HDFS-1315</a>. Add fsck event to audit log and remove other audit log events
corresponding to FSCK listStatus and open calls. (suresh)
<li> <a href="">MAPREDUCE-1941</a>. Provides access to JobHistory file (raw) with job user/acl
permission. (Srikanth Sundarrajan via ddas)
<li> <a href="">MAPREDUCE-291</a>. Optionally a separate daemon should serve JobHistory.
(Srikanth Sundarrajan via ddas)
<li> <a href="">MAPREDUCE-1936</a>. Make Gridmix3 more customizable (sync changes from trunk).
<li> <a href="">HADOOP-5981</a>. Fix variable substitution during parsing of child environment
variables. (Krishna Ramachandran via acmurthy)
<li> <a href="">MAPREDUCE-339</a>. Greedily schedule failed tasks to cause early job failure.
<li> <a href="">MAPREDUCE-1872</a>. Hardened CapacityScheduler to have comprehensive, coherent
limits on tasks/jobs for jobs/users/queues. Also, added the ability to
refresh queue definitions without the need to restart the JobTracker.
<li> <a href="">HDFS-1161</a>. Make DN minimum valid volumes configurable. (shv)
<li> <a href="">HDFS-457</a>. Reintroduce volume failure tolerance for DataNodes. (shv)
<li> <a href="">HDFS-1307</a>. Add start time, end time and total time taken for FSCK
to FSCK report. (suresh)
<li> <a href="">MAPREDUCE-1207</a>. Sanitize user environment of map/reduce tasks and allow
admins to set environment and java options. (Krishna Ramachandran via
<li> <a href="">HDFS-1298</a> - Add support in HDFS for new statistics added in FileSystem
to track the file system operations (suresh)
<li> <a href="">HDFS-1301</a>. TestHDFSProxy needs to use server side conf for ProxyUser
<li> <a href="">HADOOP-6859</a> - Introduce additional statistics to FileSystem to track
file system operations (suresh)
<li> <a href="">HADOOP-6818</a>. Provides a JNI implementation of Unix Group resolution. The
config should be set to to enable this
implementation. (ddas)
<li> <a href="">MAPREDUCE-1938</a>. Introduces a configuration for putting user classes before
the system classes during job submission and in task launches. Two things
need to be done in order to use this feature -
(1) mapreduce.user.classpath.first : this should be set to true in the
jobconf, and, (2) HADOOP_USER_CLASSPATH_FIRST : this is relevant for job
submissions done using bin/hadoop shell script. HADOOP_USER_CLASSPATH_FIRST
should be defined in the environment with some non-empty value
(like "true"), and then bin/hadoop should be executed. (ddas)
<li> <a href="">HADOOP-6669</a>. Respect compression configuration when creating DefaultCodec
compressors. (Koji Noguchi via cdouglas)
<li> <a href="">HADOOP-6855</a>. Add support for netgroups, as returned by command
getent netgroup. (Erik Steffl)
<li> <a href="">HDFS-599</a>. Allow NameNode to have a separate port for service requests from
client requests. (Dmytro Molkov via hairong)
<li> <a href="">HDFS-132</a>. Fix namenode to not report files deleted metrics for deletions
done while replaying edits during startup. (shv)
<li> <a href="">MAPREDUCE-1521</a>. Protection against incorrectly configured reduces
<li> <a href="">MAPREDUCE-1936</a>. Make Gridmix3 more customizable. (htang)
<li> <a href="">MAPREDUCE-517</a>. Enhance the CapacityScheduler to assign multiple tasks
per-heartbeat. (acmurthy)
<li> <a href="">MAPREDUCE-323</a>. Re-factor layout of JobHistory files on HDFS to improve
operability. (Dick King via acmurthy)
<li> <a href="">MAPREDUCE-1921</a>. Ensure exceptions during reading of input data in map
tasks are augmented by information about actual input file which caused
the exception. (Krishna Ramachandran via acmurthy)
<li> <a href="">MAPREDUCE-1118</a>. Enhance the JobTracker web-ui to ensure tabular columns
are sortable, also added a /scheduler servlet to CapacityScheduler for
enhanced UI for queue information. (Krishna Ramachandran via acmurthy)
<li> <a href="">HADOOP-5913</a>. Add support for starting/stopping queues. (cdouglas)
<li> <a href="">HADOOP-6835</a>. Add decode support for concatenated gzip files. (Greg Roelofs)
<li> <a href="">HDFS-1158</a>. Revert <a href="">HDFS-457</a>. (shv)
<li> <a href="">MAPREDUCE-1699</a>. Ensure JobHistory isn't disabled for any reason. (Krishna
Ramachandran via acmurthy)
<li> <a href="">MAPREDUCE-1682</a>. Fix speculative execution to ensure tasks are not
scheduled after job failure. (acmurthy)
<li> <a href="">MAPREDUCE-1914</a>. Ensure unique sub-directories for artifacts in the
DistributedCache are cleaned up. (Dick King via acmurthy)
<li> <a href="">HADOOP-6713</a>. Multiple RPC Reader Threads (Bharathm)
<li> <a href="">HDFS-1250</a>. Namenode should reject block reports and block received
requests from dead datanodes (suresh)
<li> <a href="">MAPREDUCE-1863</a>. [Rumen] Null failedMapAttemptCDFs in job traces generated
by Rumen. (htang)
<li> <a href="">MAPREDUCE-1309</a>. Rumen refactory. (htang)
<li> <a href="">HDFS-1114</a>. Implement LightWeightGSet for BlocksMap in order to reduce
NameNode memory footprint. (szetszwo)
<li> <a href="">MAPREDUCE-572</a>. Fixes DistributedCache.checkURIs to throw error if link is
missing for uri in cache archives. (amareshwari)
<li> <a href="">MAPREDUCE-787</a>. Fix JobSubmitter to honor user given symlink in the path.
<li> <a href="">HADOOP-6815</a>. refreshSuperUserGroupsConfiguration should use
server side configuration for the refresh (boryas)
<li> <a href="">MAPREDUCE-1868</a>. Add a read and connection timeout to JobClient while
pulling tasklogs. (Krishna Ramachandran via acmurthy)
<li> <a href="">HDFS-1119</a>. Introduce a GSet interface to BlocksMap. (szetszwo)
<li> <a href="">MAPREDUCE-1778</a>. Ensure failure to setup CompletedJobStatusStore is not
silently ignored by the JobTracker. (Krishna Ramachandran via acmurthy)
<li> <a href="">MAPREDUCE-1538</a>. Add a limit on the number of artifacts in the
DistributedCache to ensure we cleanup aggressively. (Dick King via
<li> <a href="">MAPREDUCE-1850</a>. Add information about the host from which a job is
submitted. (Krishna Ramachandran via acmurthy)
<li> <a href="">HDFS-1110</a>. Reuses objects for commonly used file names in namenode to
reduce the heap usage. (suresh)
<li> <a href="">HADOOP-6810</a>. Extract a subset of tests for smoke (DOA) validation. (cos)
<li> <a href="">HADOOP-6642</a>. Remove debug stmt left from original patch. (cdouglas)
<li> <a href="">HADOOP-6808</a>. Add comments on how to setup File/Ganglia Context for
kerberos metrics (Erik Steffl)
<li> <a href="">HDFS-1061</a>. INodeFile memory optimization. (bharathm)
<li> <a href="">HDFS-1109</a>. HFTP supports filenames that contain the character "+".
(Dmytro Molkov via dhruba, backported by szetszwo)
<li> <a href="">HDFS-1085</a>. Check file length and bytes read when reading a file through
hftp in order to detect failure. (szetszwo)
<li> <a href="">HDFS-1311</a>. Running tests with 'testcase' causes triple execution of the
same test case (cos)
<li> <a href="">HDFS-1150</a>.FIX. Verify datanodes' identities to clients in secure clusters.
Update to patch to improve handling of jsvc source in build.xml (jghoman)
<li> <a href="">HADOOP-6752</a>. Remote cluster control functionality needs JavaDocs
improvement. (Balaji Rajagopalan via cos)
<li> <a href="">MAPREDUCE-1288</a>. Fixes TrackerDistributedCacheManager to take into account
the owner of the localized file in the mapping from cache URIs to
CacheStatus objects. (ddas)
<li> <a href="">MAPREDUCE-1900</a>. Fixes an FS leak that I missed in the earlier patch.
<li> <a href="">MAPREDUCE-1900</a>. Makes JobTracker/TaskTracker close filesystems, created
on behalf of users, when they are no longer needed. (ddas)
<li> <a href="">HADOOP-6832</a>. Add a static user plugin for web auth for external users.
<li> <a href="">HDFS-1007</a>. Fixes a bug in SecurityUtil.buildDTServiceName to do
with handling of null hostname. (omalley)
<li> <a href="">HDFS-1007</a>. Makes long-running servers using hftp work. Also has some
refactoring in the MR code to do with handling of delegation tokens.
(omalley &amp; ddas)
<li> <a href="">HDFS-1178</a>. The NameNode servlets should not use RPC to connect to the
NameNode. (omalley)
<li> <a href="">MAPREDUCE-1807</a>. Re-factor TestQueueManager. (Richard King via acmurthy)
<li> <a href="">HDFS-1150</a>. Fixes the earlier patch to do logging in the right directory
and also adds facility for monitoring processes (via -Dprocname in the
command line). (Jakob Homan via ddas)
<li> <a href="">HADOOP-6781</a>. security audit log shouldn't have exception in it. (boryas)
<li> <a href="">HADOOP-6776</a>. Fixes the javadoc in UGI.createProxyUser. (ddas)
<li> <a href="">HDFS-1150</a>. building jsvc from source tar. source tar is also checked in.
<li> <a href="">HDFS-1150</a>. Bugfix in the hadoop shell script. (ddas)
<li> <a href="">HDFS-1153</a>. The navigation to /dfsnodelist.jsp with invalid input
parameters produces NPE and HTTP 500 error (rphulari)
<li> <a href="">MAPREDUCE-1664</a>. Bugfix to enable queue administrators of a queue to
view job details of jobs submitted to that queue even though they
are not part of acl-view-job.
<li> <a href="">HDFS-1150</a>. Bugfix to add more knobs to secure datanode starter.
<li> <a href="">HDFS-1157</a>. Modifications introduced by <a href="">HDFS-1150</a> are breaking aspect's
bindings (cos)
<li> <a href="">HDFS-1130</a>. Adds a configuration dfs.cluster.administrators for
controlling access to the default servlets in hdfs. (ddas)
<li> <a href="">HADOOP-6706</a>.FIX. Relogin behavior for RPC clients could be improved
<li> <a href="">HDFS-1150</a>. Verify datanodes' identities to clients in secure clusters.
<li> <a href="">MAPREDUCE-1442</a>. Fixed regex in job-history related to parsing Counter
values. (Luke Lu via acmurthy)
<li> <a href="">HADOOP-6760</a>. WebServer shouldn't increase port number in case of negative
port setting caused by Jetty's race. (cos)
<li> <a href="">HDFS-1146</a>. Javadoc for getDelegationTokenSecretManager in FSNamesystem.
<li> <a href="">HADOOP-6706</a>. Fix on top of the earlier patch. Closes the connection
on a SASL connection failure, and retries again with a new
connection. (ddas)
<li> <a href="">MAPREDUCE-1716</a>. Fix on top of earlier patch for logs truncation a.k.a
<a href="">MAPREDUCE-1100</a>. Addresses log truncation issues when binary data is
written to log files and adds a header to a truncated log file to
inform users of the truncation.
<li> <a href="">HDFS-1383</a>. Improve the error messages when using hftp://.
<li> <a href="">MAPREDUCE-1744</a>. Fixed DistributedCache apis to take a user-supplied
FileSystem to allow for better proxy behaviour for Oozie. (Richard King)
<li> <a href="">MAPREDUCE-1733</a>. Authentication between pipes processes and java
counterparts. (jitendra)
<li> <a href="">MAPREDUCE-1664</a>. Bugfix on top of the previous patch. (ddas)
<li> <a href="">HDFS-1136</a>. FileChecksumServlets.RedirectServlet doesn't carry forward
the delegation token (boryas)
<li> <a href="">HADOOP-6756</a>. Change value of FS_DEFAULT_NAME_KEY from fs.defaultFS
to which is a correct name for 0.20 (steffl)
<li> <a href="">HADOOP-6756</a>. Document (javadoc comments) and cleanup configuration
keys in (steffl)
<li> <a href="">MAPREDUCE-1759</a>. Exception message for unauthorized user doing killJob,
killTask, setJobPriority needs to be improved. (gravi via vinodkv)
<li> <a href="">HADOOP-6715</a>. AccessControlList.toString() returns empty string when
we set acl to "*". (gravi via vinodkv)
<li> <a href="">HADOOP-6757</a>. NullPointerException for hadoop clients launched from
streaming tasks. (amarrk via vinodkv)
<li> <a href="">HADOOP-6631</a>. FileUtil.fullyDelete() should continue to delete other files
despite failure at any level. (vinodkv)
<li> <a href="">MAPREDUCE-1317</a>. NPE in setHostName in Rumen. (rksingh)
<li> <a href="">MAPREDUCE-1754</a>. Replace mapred.permissions.supergroup with an acl:
mapreduce.cluster.administrators and <a href="">HADOOP-6748</a>: Remove
hadoop.cluster.administrators. Contributed by Amareshwari Sriramadasu.
<li> <a href="">HADOOP-6701</a>. Incorrect exit codes for "dfs -chown", "dfs -chgrp"
<li> <a href="">HADOOP-6640</a>. FileSystem.get() does RPC retries within a static
synchronized block. (hairong)
<li> <a href="">HDFS-1006</a>. Removes unnecessary logins from the previous patch. (ddas)
<li> <a href="">HADOOP-6745</a>. adding some java doc to Server.RpcMetrics, UGI (boryas)
<li> <a href="">MAPREDUCE-1707</a>. TaskRunner can get NPE in getting ugi from TaskTracker.
<li> <a href="">HDFS-1104</a>. Fsck triggers full GC on NameNode. (hairong)
<li> <a href="">HADOOP-6332</a>. Large-scale Automated Test Framework (sharad, Sreekanth
Ramakrishnan, et al. via cos)
<li> <a href="">HADOOP-6526</a>. Additional fix for test context on top of existing one. (cos)
<li> <a href="">HADOOP-6710</a>. Symbolic umask for file creation is not conformant with posix.
<li> <a href="">HADOOP-6693</a>. Added metrics to track kerberos login success and failure.
<li> <a href="">MAPREDUCE-1711</a>. Gridmix should provide an option to submit jobs to the same
queues as specified in the trace. (rksing via htang)
<li> <a href="">MAPREDUCE-1687</a>. Stress submission policy does not always stress the
cluster. (htang)
<li> <a href="">MAPREDUCE-1641</a>. Bug-fix to ensure command line options such as
-files/-archives are checked for duplicate artifacts in the
DistributedCache. (Amareshwari Sreeramadasu via acmurthy)
<li> <a href="">MAPREDUCE-1641</a>. Fix DistributedCache to ensure same files cannot be put in
both the archives and files sections. (Richard King via acmurthy)
<li> <a href="">HADOOP-6670</a>. Fixes a testcase issue introduced by the earlier commit
of the <a href="">HADOOP-6670</a> patch. (ddas)
<li> <a href="">MAPREDUCE-1718</a>. Fixes a problem to do with correctly constructing
service name for the delegation token lookup in HftpFileSystem
(borya via ddas)
<li> <a href="">HADOOP-6674</a>. Fixes the earlier patch to handle pings correctly (ddas).
<li> <a href="">MAPREDUCE-1664</a>. Job Acls affect when Queue Acls are set.
(Ravi Gummadi via vinodkv)
<li> <a href="">HADOOP-6718</a>. Fixes a problem to do with clients not closing RPC
connections on a SASL failure. (ddas)
<li> <a href="">MAPREDUCE-1397</a>. NullPointerException observed during task failures.
(Amareshwari Sriramadasu via vinodkv)
<li> <a href="">HADOOP-6670</a>. Use the UserGroupInformation's Subject as the criteria for
equals and hashCode. (omalley)
<li> <a href="">HADOOP-6716</a>. System won't start in non-secure mode when kerb5.conf
( on Mac) is not present. (boryas)
<li> <a href="">MAPREDUCE-1607</a>. Task controller may not set permissions for a
task cleanup attempt's log directory. (Amareshwari Sreeramadasu via
<li> <a href="">MAPREDUCE-1533</a>. JobTracker performance enhancements. (Amar Kamat via
<li> <a href="">MAPREDUCE-1701</a>. AccessControlException while renewing a delegation token
is not correctly handled in the JobTracker. (boryas)
<li> <a href="">HDFS-481</a>. Incremental patch to fix broken unit test in contrib/hdfsproxy
<li> <a href="">HADOOP-6706</a>. Fixes a bug in the earlier version of the same patch (ddas)
<li> <a href="">HDFS-1096</a>. allow dfsadmin/mradmin refresh of superuser proxy group
<li> <a href="">HDFS-1012</a>. Support for cluster specific path entries in ldap for hdfsproxy
(Srikanth Sundarrajan via Nicholas)
<li> <a href="">HDFS-1011</a>. Improve Logging in HDFSProxy to include cluster name associated
with the request (Srikanth Sundarrajan via Nicholas)
<li> <a href="">HDFS-1010</a>. Retrieve group information from UnixUserGroupInformation
instead of LdapEntry (Srikanth Sundarrajan via Nicholas)
<li> <a href="">HDFS-481</a>. Bug fix - hdfsproxy: Stack overflow + Race conditions
(Srikanth Sundarrajan via Nicholas)
<li> <a href="">MAPREDUCE-1657</a>. After task logs directory is deleted, tasklog servlet
displays wrong error message about job ACLs. (Ravi Gummadi via vinodkv)
<li> <a href="">MAPREDUCE-1692</a>. Remove TestStreamedMerge from the streaming tests.
(Amareshwari Sriramadasu and Sreekanth Ramakrishnan via vinodkv)
<li> <a href="">HDFS-1081</a>. Performance regression in
DistributedFileSystem::getFileBlockLocations in secure systems (jhoman)
<li> <a href="">MAPREDUCE-1656</a>. JobStory should provide queue info. (htang)
<li> <a href="">MAPREDUCE-1317</a>. Reducing memory consumption of rumen objects. (htang)
<li> <a href="">MAPREDUCE-1317</a>. Reverting the patch since it caused build failures. (htang)
<li> <a href="">MAPREDUCE-1683</a>. Fixed jobtracker web-ui to correctly display heap-usage.
<li> <a href="">HADOOP-6706</a>. Fixes exception handling for saslConnect. The ideal
solution is to use the Refreshable interface but as Owen noted in
<a href="">HADOOP-6656</a>, it doesn't seem to work as expected. (ddas)
<li> <a href="">MAPREDUCE-1617</a>. TestBadRecords failed once in our test runs. (Amar
Kamat via vinodkv).
<li> <a href="">MAPREDUCE-587</a>. Stream test TestStreamingExitStatus fails with Out of
Memory. (Amar Kamat via vinodkv).
<li> <a href="">HDFS-1096</a>. Reverting the patch since it caused build failures. (ddas)
<li> <a href="">MAPREDUCE-1317</a>. Reducing memory consumption of rumen objects. (htang)
<li> <a href="">MAPREDUCE-1680</a>. Add a metric to track number of heartbeats processed by the
JobTracker. (Richard King via acmurthy)
<li> <a href="">MAPREDUCE-1683</a>. Removes JNI calls to get jvm current/max heap usage in
ClusterStatus by default. (acmurthy)
<li> <a href="">HADOOP-6687</a>. user object in the subject in UGI should be reused in case
of a relogin. (jitendra)
<li> <a href="">HADOOP-5647</a>. TestJobHistory fails if /tmp/_logs is not writable to.
Testcase should not depend on /tmp. (Ravi Gummadi via vinodkv)
<li> <a href="">MAPREDUCE-181</a>. Bug fix for Secure job submission. (Ravi Gummadi via
<li> <a href="">MAPREDUCE-1635</a>. ResourceEstimator does not work after <a href="">MAPREDUCE-842</a>.
(Amareshwari Sriramadasu via vinodkv)
<li> <a href="">MAPREDUCE-1526</a>. Cache the job related information while submitting the
job. (rksingh)
<li> <a href="">HADOOP-6674</a>. Turn off SASL checksums for RPCs. (jitendra via omalley)
<li> <a href="">HADOOP-5958</a>. Replace fork of DF with library call. (cdouglas via omalley)
<li> <a href="">HDFS-999</a>. Secondary namenode should login using kerberos if security
is configured. Bugfix to original patch. (jhoman)
<li> <a href="">MAPREDUCE-1594</a>. Support for SleepJobs in Gridmix (rksingh)
<li> <a href="">HDFS-1007</a>. Fix. ServiceName for delegation token for Hftp has hftp
port and not RPC port.
<li> <a href="">MAPREDUCE-1376</a>. Support for varied user submissions in Gridmix (rksingh)
<li> <a href="">HDFS-1080</a>. SecondaryNameNode image transfer should use the defined
http address rather than local ip address (jhoman)
<li> <a href="">HADOOP-6661</a>. User document for UserGroupInformation.doAs for secure
impersonation. (jitendra)
<li> <a href="">MAPREDUCE-1624</a>. Documents the job credentials and associated details
to do with delegation tokens (ddas)
<li> <a href="">HDFS-1036</a>. Documentation for fetchdt for forrest (boryas)
<li> <a href="">HDFS-1039</a>. New patch on top of previous patch. Gets namenode address
from conf. (jitendra)