Apache Hadoop 0.20.205.0 Release Notes

These release notes cover new developer and user-facing incompatibilities, important issues, features, and major improvements.

HADOOP-7724 | Major | hadoop-setup-conf.sh should put proxy user info into the core-site.xml

Fixed hadoop-setup-conf.sh to put proxy user in core-site.xml. (Arpit Gupta via Eric Yang)

HADOOP-7720 | Major | improve the hadoop-setup-conf.sh to read in the hbase user and setup the configs

Added parameter for HBase user to setup config script. (Arpit Gupta via Eric Yang)

HADOOP-7715 | Major | see log4j Error when running mr jobs and certain dfs calls

Removed unnecessary security logger configuration. (Eric Yang)

HADOOP-7711 | Major | hadoop-env.sh generated from templates has duplicate info

Fixed recursive sourcing of HADOOP_OPTS environment variables (Arpit Gupta via Eric Yang)

HADOOP-7708 | Critical | config generator does not update the properties file if on exists already

Fixed hadoop-setup-conf.sh to handle config file consistently. (Eric Yang)

HADOOP-7707 | Major | improve config generator to allow users to specify proxy user, turn append on or off, turn webhdfs on or off

Added toggle for dfs.support.append, webhdfs and hadoop proxy user to setup config script. (Arpit Gupta via Eric Yang)

HADOOP-7691 | Major | hadoop deb pkg should take a diff group id

Fixed conflict uid for install packages. (Eric Yang)

HADOOP-7684 | Major | jobhistory server and secondarynamenode should have init.d script

Added init.d script for jobhistory server and secondary namenode. (Eric Yang)

HADOOP-7681 | Minor | log4j.properties is missing properties for security audit and hdfs audit should be changed to info

HADOOP-7681. Fixed security and hdfs audit log4j properties (Arpit Gupta via Eric Yang)

HADOOP-7655 | Major | provide a small validation script that smoke tests the installed cluster

Committed to trunk and v23, since code reviewed by Eric.

HADOOP-7603 | Major | Set default hdfs, mapred uid, and hadoop group gid for RPM packages

Set hdfs uid, mapred uid, and hadoop gid to fixed numbers (201, 202, and 123, respectively).

HADOOP-7119 | Major | add Kerberos HTTP SPNEGO authentication support to Hadoop JT/NN/DN/TT web-consoles

Adding support for Kerberos HTTP SPNEGO authentication to the Hadoop web-consoles

HDFS-2358 | Major | NPE when the default filesystem's uri has no authority

Give meaningful error message instead of NPE.

HDFS-2338 | Major | Configuration option to enable/disable webhdfs.

Added a conf property dfs.webhdfs.enabled for enabling/disabling webhdfs.

HDFS-2318 | Major | Provide authentication to webhdfs using SPNEGO

Added two new conf properties dfs.web.authentication.kerberos.principal and dfs.web.authentication.kerberos.keytab for the SPNEGO servlet filter.

HDFS-2202 | Major | Changes to balancer bandwidth should not require datanode restart.

New dfsadmin command added: [-setBalancerBandwidth <bandwidth>] where bandwidth is max network bandwidth in bytes per second that the balancer is allowed to use on each datanode during balacing.

This is an incompatible change in 0.23. The versions of ClientProtocol and DatanodeProtocol are changed.

HDFS-1554 | Major | Append 0.20: New semantics for recoverLease

Change recoverLease API to return if the file is closed or not. It also change the semantics of recoverLease to start lease recovery immediately.

HDFS-630 | Major | In DFSOutputStream.nextBlockOutputStream(), the client can exclude specific datanodes when locating the next block.

WARNING: No release note provided for this incompatible change.

MAPREDUCE-3112 | Major | Calling hadoop cli inside mapreduce job leads to errors

Removed inheritance of certain server environment variables (HADOOP_OPTS and HADOOP_ROOT_LOGGER) in task attempt process.

MAPREDUCE-3081 | Major | Change the name format for hadoop core and vaidya jar to be hadoop-{core/vaidya}-{version}.jar in vaidya.sh

contrib/vaidya/bin/vaidya.sh script fixed to use appropriate jars and classpath

MAPREDUCE-2777 | Major | Backport MAPREDUCE-220 to Hadoop 20 security branch

Adds cumulative cpu usage and total heap usage to task counters. This is a backport of MAPREDUCE-220 and MAPREDUCE-2469.

MAPREDUCE-2764 | Major | Fix renewal of dfs delegation tokens

Generalizes token renewal and canceling to a common interface and provides a plugin interface for adding renewers for new kinds of tokens. Hftp changed to store the tokens as HFTP and renew them over http.

MAPREDUCE-2494 | Major | Make the distributed cache delete entires using LRU priority

Added config option mapreduce.tasktracker.cache.local.keep.pct to the TaskTracker. It is the target percentage of the local distributed cache that should be kept in between garbage collection runs. In practice it will delete unused distributed cache entries in LRU order until the size of the cache is less than mapreduce.tasktracker.cache.local.keep.pct of the maximum cache size. This is a floating point value between 0.0 and 1.0. The default is 0.95.

MAPREDUCE-2187 | Major | map tasks timeout during sorting

I just committed this. Thanks Anupam!