Hadoop Change Log
Release 0.22.0 - Unreleased
INCOMPATIBLE CHANGES
NEW FEATURES
HADOOP-6791. Refresh for proxy superuser config
(common part for HDFS-1096) (boryas)
HADOOP-6581. Add authenticated TokenIdentifiers to UGI so that
they can be used for authorization (Kan Zhang and Jitendra Pandey
via jghoman)
HADOOP-6584. Provide Kerberized SSL encryption for webservices.
(jghoman and Kan Zhang via jghoman)
HADOOP-6853. Common component of HDFS-1045. (jghoman)
HADOOP-6859. Introduce additional statistics to FileSystem to track
file system operations (suresh)
HADOOP-6870. Add a new API getFiles to FileSystem and FileContext that
lists all files under the input path or the subtree rooted at the
input path if recursive is true. Block locations are returned together
with each file's status. (hairong)
HADOOP-6888. Add a new FileSystem API closeAllForUGI(..) for closing all
file systems associated with a particular UGI. (Devaraj Das and Kan Zhang
via szetszwo)
HADOOP-6892. Common component of HDFS-1150 (Verify datanodes' identities
to clients in secure clusters) (jghoman)
HADOOP-6889. Make RPC to have an option to timeout. (hairong)
HADOOP-6996. Allow CodecFactory to return a codec object given a codec's
class name. (hairong)
HADOOP-7013. Add boolean field isCorrupt to BlockLocation.
(Patrick Kling via hairong)
HADOOP-6978. Adds support for NativeIO using JNI.
(Todd Lipcon, Devaraj Das & Owen O'Malley via ddas)
IMPROVEMENTS
HADOOP-6644. util.Shell getGROUPS_FOR_USER_COMMAND method name
should use common naming convention (boryas)
HADOOP-6778. add isRunning() method to
AbstractDelegationTokenSecretManager (for HDFS-1044) (boryas)
HADOOP-6633. normalize property names for JT/NN kerberos principal
names in configuration (boryas)
HADOOP-6627. "Bad Connection to FS" message in FSShell should print
message from the exception (boryas)
HADOOP-6600. mechanism for authorization check for inter-server
protocols. (boryas)
HADOOP-6623. Add StringUtils.split for non-escaped single-character
separator. (Todd Lipcon via tomwhite)
HADOOP-6761. The Trash Emptier has the ability to run more frequently.
(Dmytro Molkov via dhruba)
HADOOP-6714. Resolve compressed files using CodecFactory in FsShell::text.
(Patrick Angeles via cdouglas)
HADOOP-6661. User document for UserGroupInformation.doAs.
(Jitendra Pandey via jghoman)
HADOOP-6674. Makes use of the SASL authentication options in the
SASL RPC. (Jitendra Pandey via ddas)
HADOOP-6526. Need mapping from long principal names to local OS
user names. (boryas)
HADOOP-6814. Adds an API in UserGroupInformation to get the real
authentication method of a passed UGI. (Jitendra Pandey via ddas)
HADOOP-6756. Documentation for common configuration keys.
(Erik Steffl via shv)
HADOOP-6835. Add support for concatenated gzip input. (Greg Roelofs via
cdouglas)
HADOOP-6845. Renames the TokenStorage class to Credentials.
(Jitendra Pandey via ddas)
HADOOP-6826. FileStatus needs unit tests. (Rodrigo Schmidt via Eli
Collins)
HADOOP-6905. add buildDTServiceName method to SecurityUtil
(as part of MAPREDUCE-1718) (boryas)
HADOOP-6632. Adds support for using different keytabs for different
servers in a Hadoop cluster. In the earlier implementation, all servers
of a certain type (like TaskTracker), would have the same keytab and the
same principal. Now the principal name is a pattern that has _HOST in it.
(Kan Zhang & Jitendra Pandey via ddas)
HADOOP-6861. Adds new non-static methods in Credentials to read and
write token storage file. (Jitendra Pandey & Owen O'Malley via ddas)
HADOOP-6877. Common part of HDFS-1178 (NameNode servlets should communicate
with NameNode directly). (Kan Zhang via jghoman)
HADOOP-6475. Adding some javadoc to Server.RpcMetrics, UGI.
(Jitendra Pandey and borya via jghoman)
HADOOP-6656. Adds a thread in the UserGroupInformation to renew TGTs
periodically. (Owen O'Malley and ddas via ddas)
HADOOP-6890. Improve listFiles API introduced by HADOOP-6870. (hairong)
HADOOP-6862. Adds api to add/remove user and group to AccessControlList
(amareshwari)
HADOOP-6911. doc update for DelegationTokenFetcher (boryas)
HADOOP-6900. Make the iterator returned by FileSystem#listLocatedStatus to
throw IOException rather than RuntimeException when there is an IO error
fetching the next file. (hairong)
HADOOP-6905. Better logging messages when a delegation token is invalid.
(Kan Zhang via jghoman)
HADOOP-6693. Add metrics to track Kerberos login activity. (suresh)
HADOOP-6803. Add native gzip read/write coverage to TestCodec.
(Eli Collins via tomwhite)
HADOOP-6950. Suggest that HADOOP_CLASSPATH should be preserved in
hadoop-env.sh.template. (Philip Zeyliger via Eli Collins)
HADOOP-6922. Make AccessControlList a writable and update documentation
for Job ACLs. (Ravi Gummadi via vinodkv)
HADOOP-6965. Introduces checks for whether the original tgt is valid
in the reloginFromKeytab method.
HADOOP-6856. Simplify constructors for SequenceFile, and MapFile. (omalley)
HADOOP-6987. Use JUnit Rule to optionally fail test cases that run more
than 10 seconds (jghoman)
HADOOP-7005. Update test-patch.sh to remove callback to Hudson. (nigel)
HADOOP-6985. Suggest that HADOOP_OPTS be preserved in
hadoop-env.sh.template. (Ramkumar Vadali via cutting)
HADOOP-7007. Update the hudson-test-patch ant target to work with the
latest test-patch.sh script (gkesavan)
HADOOP-7010. Typo in FileSystem.java. (Jingguo Yao via eli)
HADOOP-7009. MD5Hash provides a public factory method that creates an
instance of thread local MessageDigest. (hairong)
HADOOP-7008. Enable test-patch.sh to have a configured number of acceptable
findbugs and javadoc warnings. (nigel and gkesavan)
HADOOP-6818. Provides a JNI implementation of group resolution. (ddas)
HADOOP-6943. The GroupMappingServiceProvider interface should be public.
(Aaron T. Myers via tomwhite)
HADOOP-4675. Current Ganglia metrics implementation is incompatible with
Ganglia 3.1. (Brian Bockelman via tomwhite)
HADOOP-6977. Herriot daemon clients should vend statistics (cos)
HADOOP-7024. Create a test method for adding file systems during tests.
(Kan Zhang via jghoman)
HADOOP-6903. Make AbstractFileSystem methods and some FileContext methods public. (Sanjay Radia via Sanjay Radia)
HADOOP-7034. Add TestPath tests to cover dot, dot dot, and slash normalization. (eli)
HADOOP-7032. Assert type constraints in the FileStatus constructor. (eli)
HADOOP-6562. FileContextSymlinkBaseTest should use FileContextTestHelper. (eli)
HADOOP-7028. ant eclipse does not include requisite ant.jar in the
classpath. (Patrick Angeles via eli)
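Note on HADOOP-7009 above: the entry describes a factory method that hands out a thread-local MessageDigest. The following is a minimal sketch of that general pattern using only the JDK. The class and method names (ThreadLocalDigestSketch, getDigester, md5Hex) are hypothetical and are not taken from the MD5Hash source.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical illustration; not the Hadoop MD5Hash implementation.
final class ThreadLocalDigestSketch {
    // MessageDigest is not thread-safe, so each thread lazily gets its own instance.
    private static final ThreadLocal<MessageDigest> MD5 = ThreadLocal.withInitial(() -> {
        try {
            return MessageDigest.getInstance("MD5");
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    });

    // Factory method: returns this thread's digest, reset so no stale state leaks in.
    static MessageDigest getDigester() {
        MessageDigest digest = MD5.get();
        digest.reset();
        return digest;
    }

    // Convenience helper: lowercase hex MD5 of a UTF-8 string.
    static String md5Hex(String input) {
        byte[] hash = getDigester().digest(input.getBytes(StandardCharsets.UTF_8));
        StringBuilder hex = new StringBuilder(hash.length * 2);
        for (byte b : hash) {
            hex.append(String.format("%02x", b & 0xff));
        }
        return hex.toString();
    }
}
```

Reusing one digest per thread avoids repeated MessageDigest.getInstance lookups without sharing mutable state across threads.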
OPTIMIZATIONS
HADOOP-6884. Add LOG.isDebugEnabled() guard for each LOG.debug(..).
(Erik Steffl via szetszwo)
HADOOP-6683. ZlibCompressor does not fully utilize the buffer.
(Kang Xiao via eli)
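Note on HADOOP-6884 above: guarding a debug call avoids building the log message when debug output is disabled. The sketch below shows the idea with java.util.logging as a stand-in (Logger.isLoggable(Level.FINE) plays the role of LOG.isDebugEnabled()); the class and helper names are hypothetical.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical illustration of a debug guard; uses java.util.logging as a stand-in.
final class DebugGuardSketch {
    private static final Logger LOG = Logger.getLogger("DebugGuardSketch");

    // Counts how often the (potentially expensive) message was built.
    static int describeCalls = 0;

    static {
        // Pin the level so FINE (debug) messages are disabled in this sketch.
        LOG.setLevel(Level.INFO);
    }

    // Stand-in for an expensive message-building step.
    static String describe(int[] blocks) {
        describeCalls++;
        StringBuilder sb = new StringBuilder();
        for (int b : blocks) {
            sb.append(b).append(' ');
        }
        return sb.toString().trim();
    }

    // The argument is evaluated before the logger can discard the message.
    static void unguarded(int[] blocks) {
        LOG.fine("blocks: " + describe(blocks));
    }

    // The guard skips message construction entirely when debug is off.
    static void guarded(int[] blocks) {
        if (LOG.isLoggable(Level.FINE)) {
            LOG.fine("blocks: " + describe(blocks));
        }
    }
}
```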
BUG FIXES
HADOOP-6638. try to relogin in a case of failed RPC connection (expired
tgt) only in case the subject is loginUser or proxyUgi.realUser. (boryas)
HADOOP-6781. security audit log shouldn't have exception in it. (boryas)
HADOOP-6612. Protocols RefreshUserToGroupMappingsProtocol and
RefreshAuthorizationPolicyProtocol will fail with security enabled (boryas)
HADOOP-6764. Remove verbose logging from the Groups class. (Boris Shkolnik)
HADOOP-6730. Bug in FileContext#copy and provide base class for
FileContext tests. (Ravi Phulari via jghoman)
HADOOP-6669. Respect compression configuration when creating DefaultCodec
instances. (Koji Noguchi via cdouglas)
HADOOP-6747. TestNetUtils fails on Mac OS X. (Todd Lipcon via jghoman)
HADOOP-6787. Factor out glob pattern code from FileContext and
Filesystem. Also fix bugs identified in HADOOP-6618 and make the
glob pattern code less restrictive and more POSIX standard
compliant. (Luke Lu via eli)
HADOOP-6649. login object in UGI should be inside the subject (jnp via
boryas)
HADOOP-6687. user object in the subject in UGI should be reused in case
of a relogin. (jnp via boryas)
HADOOP-6603. Provide workaround for issue with Kerberos not resolving
cross-realm principal (Kan Zhang and Jitendra Pandey via jghoman)
HADOOP-6620. NPE if renewer is passed as null in getDelegationToken.
(Jitendra Pandey via jghoman)
HADOOP-6613. Moves the RPC version check ahead of the AuthMethod check.
(Kan Zhang via ddas)
HADOOP-6682. NetUtils:normalizeHostName does not process hostnames starting
with [a-f] correctly. (jghoman)
HADOOP-6652. Removes the unnecessary cache from
ShellBasedUnixGroupsMapping. (ddas)
HADOOP-6815. refreshSuperUserGroupsConfiguration should use server side
configuration for the refresh (boryas)
HADOOP-6648. Adds a check for null tokens in Credentials.addToken api.
(ddas)
HADOOP-6647. balancer fails with "is not authorized for protocol
interface NamenodeProtocol" in secure environment (boryas)
HADOOP-6834. TFile.append compares initial key against null lastKey
(hong tang via mahadev)
HADOOP-6670. Use the UserGroupInformation's Subject as the criteria for
equals and hashCode. (Owen O'Malley and Kan Zhang via ddas)
HADOOP-6536. Fixes FileUtil.fullyDelete() not to delete the contents of
the sym-linked directory. (Ravi Gummadi via amareshwari)
HADOOP-6873. using delegation token over hftp for long
running clients (boryas)
HADOOP-6706. Improves the sasl failure handling due to expired tickets,
and other server detected failures. (Jitendra Pandey and ddas via ddas)
HADOOP-6715. Fixes AccessControlList.toString() to return a descriptive
String representation of the ACL. (Ravi Gummadi via amareshwari)
HADOOP-6885. Fix java doc warnings in Groups and
RefreshUserMappingsProtocol. (Eli Collins via jghoman)
HADOOP-6482. GenericOptionsParser constructor that takes Options and
String[] ignores options. (Eli Collins via jghoman)
HADOOP-6906. FileContext copy() utility doesn't work with recursive
copying of directories. (vinod k v via mahadev)
HADOOP-6453. Hadoop wrapper script shouldn't ignore an existing
JAVA_LIBRARY_PATH. (Chad Metcalf via jghoman)
HADOOP-6932. Namenode start (init) fails because of invalid kerberos
key, even when security set to "simple" (boryas)
HADOOP-6913. Circular initialization between UserGroupInformation and
KerberosName (Kan Zhang via boryas)
HADOOP-6907. Rpc client doesn't use the per-connection conf to figure
out server's Kerberos principal (Kan Zhang via hairong)
HADOOP-6938. ConnectionId.getRemotePrincipal() should check if security
is enabled. (Kan Zhang via hairong)
HADOOP-6930. AvroRpcEngine doesn't work with generated Avro code.
(sharad)
HADOOP-6940. RawLocalFileSystem's markSupported method misnamed
markSupport. (Tom White via eli)
HADOOP-6951. Distinct minicluster services (e.g. NN and JT) overwrite each
other's service policies. (Aaron T. Myers via tomwhite)
HADOOP-6879. Provide SSH based (Jsch) remote execution API for system
tests (cos)
HADOOP-6989. Correct the parameter for SetFile to set the value type
for SetFile to be NullWritable instead of the key. (cdouglas via omalley)
HADOOP-6984. Combine the compress kind and the codec in the same option
for SequenceFiles. (cdouglas via omalley)
HADOOP-6933. TestListFiles is flaky. (Todd Lipcon via tomwhite)
HADOOP-6947. Kerberos relogin should set refreshKrb5Config to true.
(Todd Lipcon via tomwhite)
HADOOP-7006. Fix 'fs -getmerge' command to not be a no-op.
(Chris Nauroth via cutting)
HADOOP-6663. BlockDecompressorStream gets an EOF exception when decompressing
a file compressed from an empty file. (Kang Xiao via tomwhite)
HADOOP-6991. Fix SequenceFile::Reader to honor file lengths and call
openFile (cdouglas via omalley)
HADOOP-7011. Fix KerberosName.main() to not throw an NPE.
(Aaron T. Myers via tomwhite)
HADOOP-6975. Integer overflow in S3InputStream for blocks > 2GB.
(Patrick Kling via tomwhite)
HADOOP-6758. MapFile.fix does not allow index interval definition.
(Gianmarco De Francisci Morales via tomwhite)
HADOOP-6926. SocketInputStream incorrectly implements read().
(Todd Lipcon via tomwhite)
HADOOP-6899. RawLocalFileSystem#setWorkingDir() does not work for relative names
(Sanjay Radia)
HADOOP-6496. HttpServer sends wrong content-type for CSS files
(and others). (Todd Lipcon via tomwhite)
HADOOP-7057. IOUtils.readFully and IOUtils.skipFully have typo in
exception creation's message. (cos)
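Note on HADOOP-6975 above: the entry reports an integer overflow for blocks larger than 2GB. The sketch below is not the S3InputStream code; it only shows, under assumed names, how narrowing a long to an int before comparing can go negative once a length exceeds Integer.MAX_VALUE, and one way to avoid it.

```java
// Hypothetical illustration of a long-to-int overflow; not the S3InputStream code.
final class BlockOffsetSketch {
    // Narrows to int before comparing, so a remaining length above 2^31 - 1 wraps.
    static int bytesToReadBuggy(long blockLength, long offset, int len) {
        int remaining = (int) (blockLength - offset);
        return Math.min(len, remaining);
    }

    // Compares as long first; the result is bounded by len, so the cast is safe.
    static int bytesToReadFixed(long blockLength, long offset, int len) {
        long remaining = blockLength - offset;
        return (int) Math.min((long) len, remaining);
    }
}
```

With a 3 GB block (3L << 30) read from offset 0 with a 4096-byte buffer, the first method returns a negative count while the second returns 4096.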
Release 0.21.1 - Unreleased
IMPROVEMENTS
HADOOP-6934. Test for ByteWritable comparator.
(Johannes Zillmann via Eli Collins)
HADOOP-6786. test-patch needs to verify Herriot integrity (cos)
BUG FIXES
HADOOP-6925. BZip2Codec incorrectly implements read().
(Todd Lipcon via Eli Collins)
HADOOP-6833. IPC leaks call parameters when exceptions thrown.
(Todd Lipcon via Eli Collins)
HADOOP-6971. Clover build doesn't generate per-test coverage (cos)
HADOOP-6993. Broken link on cluster setup page of docs. (eli)
HADOOP-6944. [Herriot] Implement functionality for getting proxy user
definitions such as groups and hosts. (Vinay Thota via cos)
HADOOP-6954. Sources JARs are not correctly published to the Maven
repository. (tomwhite)
HADOOP-7052. misspelling of threshold in conf/log4j.properties.
(Jingguo Yao via eli)
HADOOP-7053. wrong FSNamesystem Audit logging setting in
conf/log4j.properties. (Jingguo Yao via eli)
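Note on HADOOP-6925 above: the entry does not say what was wrong with read(). One frequent mistake in single-byte read() implementations, assumed here purely for illustration, is returning the byte without masking, so 0xFF is sign-extended to -1 and looks like end of stream. The class names below are hypothetical.

```java
import java.io.InputStream;

// Hypothetical illustration of the InputStream.read() contract; not the BZip2Codec code.
final class SingleByteReadSketch {
    // Returns the raw byte, so values 0x80..0xFF come back negative.
    static final class SignedRead extends InputStream {
        private final byte[] data;
        private int pos;

        SignedRead(byte[] data) {
            this.data = data;
        }

        @Override
        public int read() {
            return pos < data.length ? data[pos++] : -1;
        }
    }

    // Masks with 0xff so data bytes map to 0..255 and only EOF yields -1.
    static final class MaskedRead extends InputStream {
        private final byte[] data;
        private int pos;

        MaskedRead(byte[] data) {
            this.data = data;
        }

        @Override
        public int read() {
            return pos < data.length ? (data[pos++] & 0xff) : -1;
        }
    }
}
```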
Release 0.21.0 - 2010-08-13
INCOMPATIBLE CHANGES
HADOOP-4895. Remove deprecated methods DFSClient.getHints(..) and
DFSClient.isDirectory(..). (szetszwo)
HADOOP-4941. Remove deprecated FileSystem methods: getBlockSize(Path f),
getLength(Path f) and getReplication(Path src). (szetszwo)
HADOOP-4648. Remove obsolete, deprecated InMemoryFileSystem and
ChecksumDistributedFileSystem. (cdouglas via szetszwo)
HADOOP-4940. Remove a deprecated method FileSystem.delete(Path f). (Enis
Soztutar via szetszwo)
HADOOP-4010. Change semantics for LineRecordReader to read an additional
line per split- rather than moving back one character in the stream- to
work with splittable compression codecs. (Abdul Qadeer via cdouglas)
HADOOP-5094. Show hostname and separate live/dead datanodes in DFSAdmin
report. (Jakob Homan via szetszwo)
HADOOP-4942. Remove deprecated FileSystem methods getName() and
getNamed(String name, Configuration conf). (Jakob Homan via szetszwo)
HADOOP-5486. Removes the CLASSPATH string from the command line and instead
exports it in the environment. (Amareshwari Sriramadasu via ddas)
HADOOP-2827. Remove deprecated NetUtils::getServerAddress. (cdouglas)
HADOOP-5681. Change examples RandomWriter and RandomTextWriter to
use new mapreduce API. (Amareshwari Sriramadasu via sharad)
HADOOP-5680. Change org.apache.hadoop.examples.SleepJob to use new
mapreduce api. (Amareshwari Sriramadasu via sharad)
HADOOP-5699. Change org.apache.hadoop.examples.PiEstimator to use
new mapreduce api. (Amareshwari Sriramadasu via sharad)
HADOOP-5720. Introduces new task types - JOB_SETUP, JOB_CLEANUP
and TASK_CLEANUP. Removes the isMap methods from TaskID/TaskAttemptID
classes. (ddas)
HADOOP-5668. Change TotalOrderPartitioner to use new API. (Amareshwari
Sriramadasu via cdouglas)
HADOOP-5738. Split "waiting_tasks" JobTracker metric into waiting maps and
waiting reduces. (Sreekanth Ramakrishnan via cdouglas)
HADOOP-5679. Resolve findbugs warnings in core/streaming/pipes/examples.
(Jothi Padmanabhan via sharad)
HADOOP-4359. Support for data access authorization checking on Datanodes.
(Kan Zhang via rangadi)
HADOOP-5690. Change org.apache.hadoop.examples.DBCountPageView to use
new mapreduce api. (Amareshwari Sriramadasu via sharad)
HADOOP-5694. Change org.apache.hadoop.examples.dancing to use new
mapreduce api. (Amareshwari Sriramadasu via sharad)
HADOOP-5696. Change org.apache.hadoop.examples.Sort to use new
mapreduce api. (Amareshwari Sriramadasu via sharad)
HADOOP-5698. Change org.apache.hadoop.examples.MultiFileWordCount to
use new mapreduce api. (Amareshwari Sriramadasu via sharad)
HADOOP-5913. Provide ability to an administrator to stop and start
job queues. (Rahul Kumar Singh and Hemanth Yamijala via yhemanth)
MAPREDUCE-711. Removed Distributed Cache from Common, to move it
under Map/Reduce. (Vinod Kumar Vavilapalli via yhemanth)
HADOOP-6201. Change FileSystem::listStatus contract to throw
FileNotFoundException if the directory does not exist, rather than letting
this be implementation-specific. (Jakob Homan via cdouglas)
HADOOP-6230. Moved process tree and memory calculator related classes
from Common to Map/Reduce. (Vinod Kumar Vavilapalli via yhemanth)
HADOOP-6203. FsShell rm/rmr error message indicates exceeding Trash quota
and suggests using -skipTrash, when moving to trash fails.
(Boris Shkolnik via suresh)
HADOOP-6303. Eclipse .classpath template has outdated jar files and is
missing some new ones. (cos)
HADOOP-6396. Fix uninformative exception message when unable to parse
umask. (jghoman)
HADOOP-6299. Reimplement the UserGroupInformation to use the OS
specific and Kerberos JAAS login. (omalley)
HADOOP-6686. Remove redundant exception class name from the exception
message for the exceptions thrown at RPC client. (suresh)
HADOOP-6701. Fix incorrect exit codes returned from chmod, chown and chgrp
commands from FsShell. (Ravi Phulari via suresh)
NEW FEATURES
HADOOP-6332. Large-scale Automated Test Framework. (sharad, Sreekanth
Ramakrishnan, et al. via cos)
HADOOP-4268. Change fsck to use ClientProtocol methods so that the
corresponding permission requirement for running the ClientProtocol
methods will be enforced. (szetszwo)
HADOOP-3953. Implement sticky bit for directories in HDFS. (Jakob Homan
via szetszwo)
HADOOP-4368. Implement df in FsShell to show the status of a FileSystem.
(Craig Macdonald via szetszwo)
HADOOP-3741. Add a web ui to the SecondaryNameNode for showing its status.
(szetszwo)
HADOOP-5018. Add pipelined writers to Chukwa. (Ari Rabkin via cdouglas)
HADOOP-5052. Add an example computing exact digits of pi using the
Bailey-Borwein-Plouffe algorithm. (Tsz Wo (Nicholas), SZE via cdouglas)
HADOOP-4927. Adds a generic wrapper around outputformat to allow creation of
output on demand (Jothi Padmanabhan via ddas)
HADOOP-5144. Add a new DFSAdmin command for changing the setting of restore
failed storage replicas in namenode. (Boris Shkolnik via szetszwo)
HADOOP-5258. Add a new DFSAdmin command to print a tree of the rack and
datanode topology as seen by the namenode. (Jakob Homan via szetszwo)
HADOOP-4756. A command line tool to access JMX properties on NameNode
and DataNode. (Boris Shkolnik via rangadi)
HADOOP-4539. Introduce backup node and checkpoint node. (shv)
HADOOP-5363. Add support for proxying connections to multiple clusters with
different versions to hdfsproxy. (Zhiyong Zhang via cdouglas)
HADOOP-5528. Add a configurable hash partitioner operating on ranges of
BinaryComparable keys. (Klaas Bosteels via shv)
HADOOP-5257. HDFS servers may start and stop external components through
a plugin interface. (Carlos Valiente via dhruba)
HADOOP-5450. Add application-specific data types to streaming's typed bytes
interface. (Klaas Bosteels via omalley)
HADOOP-5518. Add contrib/mrunit, a MapReduce unit test framework.
(Aaron Kimball via cutting)
HADOOP-5469. Add /metrics servlet to daemons, providing metrics
over HTTP as either text or JSON. (Philip Zeyliger via cutting)
HADOOP-5467. Introduce offline fsimage image viewer. (Jakob Homan via shv)
HADOOP-5752. Add a new hdfs image processor, Delimited, to oiv. (Jakob
Homan via szetszwo)
HADOOP-5266. Adds the capability to do mark/reset of the reduce values
iterator in the Context object API. (Jothi Padmanabhan via ddas)
HADOOP-5745. Allow setting the default value of maxRunningJobs for all
pools. (dhruba via matei)
HADOOP-5643. Adds a way to decommission TaskTrackers while the JobTracker
is running. (Amar Kamat via ddas)
HADOOP-4829. Allow FileSystem shutdown hook to be disabled.
(Todd Lipcon via tomwhite)
HADOOP-5815. Sqoop: A database import tool for Hadoop.
(Aaron Kimball via tomwhite)
HADOOP-4861. Add disk usage with human-readable size (-duh).
(Todd Lipcon via tomwhite)
HADOOP-5844. Use mysqldump when connecting to local mysql instance in Sqoop.
(Aaron Kimball via tomwhite)
HADOOP-5976. Add a new command, classpath, to the hadoop script. (Owen
O'Malley and Gary Murry via szetszwo)
HADOOP-6120. Add support for Avro specific and reflect data.
(sharad via cutting)
HADOOP-6226. Moves BoundedByteArrayOutputStream from the tfile package to
the io package and makes it available to other users (MAPREDUCE-318).
(Jothi Padmanabhan via ddas)
HADOOP-6105. Adds support for automatically handling deprecation of
configuration keys. (V.V.Chaitanya Krishna via yhemanth)
HADOOP-6235. Adds new method to FileSystem for clients to get server
defaults. (Kan Zhang via suresh)
HADOOP-6234. Add new option dfs.umaskmode to set umask in configuration
to use octal or symbolic instead of decimal. (Jakob Homan via suresh)
HADOOP-5073. Add annotation mechanism for interface classification.
(Jakob Homan via suresh)
HADOOP-4012. Provide splitting support for bzip2 compressed files. (Abdul
Qadeer via cdouglas)
HADOOP-6246. Add backward compatibility support to use deprecated decimal
umask from old configuration. (Jakob Homan via suresh)
HADOOP-4952. Add new improved file system interface FileContext for the
application writer (Sanjay Radia via suresh)
HADOOP-6170. Add facility to tunnel Avro RPCs through Hadoop RPCs.
This permits one to take advantage of both Avro's RPC versioning
features and Hadoop's proven RPC scalability. (cutting)
HADOOP-6267. Permit building contrib modules located in external
source trees. (Todd Lipcon via cutting)
HADOOP-6240. Add new FileContext rename operation that is POSIX compliant
and allows overwriting an existing destination. (suresh)
HADOOP-6204. Implementing aspects development and fault injection
framework for Hadoop (cos)
HADOOP-6313. Implement Syncable interface in FSDataOutputStream to expose
flush APIs to application users. (Hairong Kuang via suresh)
HADOOP-6284. Add a new parameter, HADOOP_JAVA_PLATFORM_OPTS, to
hadoop-config.sh so that it allows setting java command options for
JAVA_PLATFORM. (Koji Noguchi via szetszwo)
HADOOP-6337. Updates FilterInitializer class to be more visible,
and the init of the class is made to take a Configuration argument.
(Jakob Homan via ddas)
HADOOP-6223. Add new file system interface AbstractFileSystem with
implementation of some file systems that delegate to old FileSystem.
(Sanjay Radia via suresh)
HADOOP-6433. Introduce asynchronous deletion of files via a pool of
threads. This can be used to delete files in the Distributed
Cache. (Zheng Shao via dhruba)
HADOOP-6415. Adds a common token interface for both job token and
delegation token. (Kan Zhang via ddas)
HADOOP-6408. Add a /conf servlet to dump running configuration.
(Todd Lipcon via tomwhite)
HADOOP-6520. Adds APIs to read/write Token and secret keys. Also
adds the automatic loading of tokens into UserGroupInformation
upon login. The tokens are read from a file specified in the
environment variable. (ddas)
HADOOP-6419. Adds SASL based authentication to RPC.
(Kan Zhang via ddas)
HADOOP-6510. Adds a way for superusers to impersonate other users
in a secure environment. (Jitendra Nath Pandey via ddas)
HADOOP-6421. Adds Symbolic links to FileContext, AbstractFileSystem.
It also adds a limited implementation for the local file system
(RawLocalFs) that allows local symlinks. (Eli Collins via Sanjay Radia)
HADOOP-6577. Add hidden configuration option "ipc.server.max.response.size"
to change the default 1 MB, the maximum size when large IPC handler
response buffer is reset. (suresh)
HADOOP-6568. Adds authorization for the default servlets.
(Vinod Kumar Vavilapalli via ddas)
HADOOP-6586. Log authentication and authorization failures and successes
for RPC (boryas)
HADOOP-6580. UGI should contain authentication method. (jnp via boryas)
HADOOP-6657. Add a capitalization method to StringUtils for MAPREDUCE-1545.
(Luke Lu via Steve Loughran)
HADOOP-6692. Add FileContext#listStatus that returns an iterator.
(hairong)
HADOOP-6869. Functionality to create file or folder on a remote daemon
side (Vinay Thota via cos)
IMPROVEMENTS
HADOOP-6798. Align Ivy version for all Hadoop subprojects. (cos)
HADOOP-6777. Implement functionality to suspend and resume a process.
(Vinay Thota via cos)
HADOOP-6772. Utilities specific to system tests. (Vinay Thota via cos)
HADOOP-6771. Herriot's artifact id for Maven deployment should be set to
hadoop-core-instrumented (cos)
HADOOP-6752. Remote cluster control functionality needs JavaDocs
improvement (Balaji Rajagopalan via cos)
HADOOP-4565. Added CombineFileInputFormat to use data locality information
to create splits. (dhruba via zshao)
HADOOP-4936. Improvements to TestSafeMode. (shv)
HADOOP-4985. Remove unnecessary "throw IOException" declarations in
FSDirectory related methods. (szetszwo)
HADOOP-5017. Change NameNode.namesystem declaration to private. (szetszwo)
HADOOP-4794. Add branch information from the source version control into
the version information that is compiled into Hadoop. (cdouglas via
omalley)
HADOOP-5070. Increment copyright year to 2009, remove assertions of ASF
copyright to licensed files. (Tsz Wo (Nicholas), SZE via cdouglas)
HADOOP-5037. Deprecate static FSNamesystem.getFSNamesystem(). (szetszwo)
HADOOP-5088. Include releaseaudit target as part of developer test-patch
target. (Giridharan Kesavan via nigel)
HADOOP-2721. Uses setsid when creating new tasks so that subprocesses of
this process will be within this new session (and this process will be
the process leader for all the subprocesses). Killing the process leader,
or the main Java task in Hadoop's case, kills the entire subtree of
processes. (Ravi Gummadi via ddas)
HADOOP-5097. Remove static variable JspHelper.fsn, a static reference to
a non-singleton FSNamesystem object. (szetszwo)
HADOOP-3327. Improves handling of READ_TIMEOUT during map output copying.
(Amareshwari Sriramadasu via ddas)
HADOOP-5124. Choose datanodes randomly instead of starting from the first
datanode for providing fairness. (hairong via szetszwo)
HADOOP-4930. Implement a Linux native executable that can be used to
launch tasks as users. (Sreekanth Ramakrishnan via yhemanth)
HADOOP-5122. Fix format of fs.default.name value in libhdfs test conf.
(Craig Macdonald via tomwhite)
HADOOP-5038. Direct daemon trace to debug log instead of stdout. (Jerome
Boulon via cdouglas)
HADOOP-5101. Improve packaging by adding 'all-jars' target building core,
tools, and example jars. Let findbugs depend on this rather than the 'tar'
target. (Giridharan Kesavan via cdouglas)
HADOOP-4868. Splits the hadoop script into three parts - bin/hadoop,
bin/mapred and bin/hdfs. (Sharad Agarwal via ddas)
HADOOP-1722. Adds support for TypedBytes and RawBytes in Streaming.
(Klaas Bosteels via ddas)
HADOOP-4220. Changes the JobTracker restart tests so that they take much
less time. (Amar Kamat via ddas)
HADOOP-4885. Try to restore failed name-node storage directories at
checkpoint time. (Boris Shkolnik via shv)
HADOOP-5209. Update year to 2009 for javadoc. (szetszwo)
HADOOP-5279. Remove unnecessary targets from test-patch.sh.
(Giridharan Kesavan via nigel)
HADOOP-5120. Remove the use of FSNamesystem.getFSNamesystem() from
UpgradeManagerNamenode and UpgradeObjectNamenode. (szetszwo)
HADOOP-5222. Add offset to datanode clienttrace. (Lei Xu via cdouglas)
HADOOP-5240. Skip re-building javadoc when it is already
up-to-date. (Aaron Kimball via cutting)
HADOOP-5042. Add a cleanup stage to log rollover in Chukwa appender.
(Jerome Boulon via cdouglas)
HADOOP-5264. Removes redundant configuration object from the TaskTracker.
(Sharad Agarwal via ddas)
HADOOP-5232. Enable patch testing to occur on more than one host.
(Giri Kesavan via nigel)
HADOOP-4546. Fix DF reporting for AIX. (Bill Habermaas via cdouglas)
HADOOP-5023. Add Tomcat support to HdfsProxy. (Zhiyong Zhang via cdouglas)
HADOOP-5317. Provide documentation for LazyOutput Feature.
(Jothi Padmanabhan via johan)
HADOOP-5455. Document rpc metrics context to the extent dfs, mapred, and
jvm contexts are documented. (Philip Zeyliger via cdouglas)
HADOOP-5358. Provide scripting functionality to the synthetic load
generator. (Jakob Homan via hairong)
HADOOP-5442. Paginate jobhistory display and added some search
capabilities. (Amar Kamat via acmurthy)
HADOOP-4842. Streaming now allows specifying a command for the combiner.
(Amareshwari Sriramadasu via ddas)
HADOOP-5196. avoiding unnecessary byte[] allocation in
SequenceFile.CompressedBytes and SequenceFile.UncompressedBytes.
(hong tang via mahadev)
HADOOP-4655. New method FileSystem.newInstance() that always returns
a newly allocated FileSystem object. (dhruba)
HADOOP-4788. Set Fair scheduler to assign both a map and a reduce on each
heartbeat by default. (matei)
HADOOP-5491. In contrib/index, better control memory usage.
(Ning Li via cutting)
HADOOP-5423. Include option of preserving file metadata in
SequenceFile::sort. (Michael Tamm via cdouglas)
HADOOP-5331. Add support for KFS appends. (Sriram Rao via cdouglas)
HADOOP-4365. Make Configuration::getProps protected in support of
meaningful subclassing. (Steve Loughran via cdouglas)
HADOOP-2413. Remove the static variable FSNamesystem.fsNamesystemObject.
(Konstantin Shvachko via szetszwo)
HADOOP-4584. Improve datanode block reports and associated file system
scan to avoid interfering with normal datanode operations.
(Suresh Srinivas via rangadi)
HADOOP-5502. Documentation for backup and checkpoint nodes.
(Jakob Homan via shv)
HADOOP-5485. Mask actions in the fair scheduler's servlet UI based on
value of webinterface.private.actions.
(Vinod Kumar Vavilapalli via yhemanth)
HADOOP-5581. HDFS should throw FileNotFoundException when opening
a file that does not exist. (Brian Bockelman via rangadi)
HADOOP-5509. PendingReplicationBlocks does not start monitor in the
constructor. (shv)
HADOOP-5494. Modify sorted map output merger to lazily read values,
rather than buffering at least one record for each segment. (Devaraj Das
via cdouglas)
HADOOP-5396. Provide ability to refresh queue ACLs in the JobTracker
without having to restart the daemon.
(Sreekanth Ramakrishnan and Vinod Kumar Vavilapalli via yhemanth)
HADOOP-4490. Provide ability to run tasks as job owners.
(Sreekanth Ramakrishnan via yhemanth)
HADOOP-5697. Change org.apache.hadoop.examples.Grep to use new
mapreduce api. (Amareshwari Sriramadasu via sharad)
HADOOP-5625. Add operation duration to clienttrace. (Lei Xu via cdouglas)
HADOOP-5705. Improve TotalOrderPartitioner efficiency by updating the trie
construction. (Dick King via cdouglas)
HADOOP-5589. Eliminate source limit of 64 for map-side joins imposed by
TupleWritable encoding. (Jingkei Ly via cdouglas)
HADOOP-5734. Correct block placement policy description in HDFS
Design document. (Konstantin Boudnik via shv)
HADOOP-5657. Validate data in TestReduceFetch to improve merge test
coverage. (cdouglas)
HADOOP-5613. Change S3Exception to checked exception.
(Andrew Hitchcock via tomwhite)
HADOOP-5717. Create public enum class for the Framework counters in
org.apache.hadoop.mapreduce. (Amareshwari Sriramadasu via sharad)
HADOOP-5217. Split AllTestDriver for core, hdfs and mapred. (sharad)
HADOOP-5364. Add certificate expiration warning to HsftpFileSystem and HDFS
proxy. (Zhiyong Zhang via cdouglas)
HADOOP-5733. Add map/reduce slot capacity and blacklisted capacity to
JobTracker metrics. (Sreekanth Ramakrishnan via cdouglas)
HADOOP-5596. Add EnumSetWritable. (He Yongqiang via szetszwo)
HADOOP-5727. Simplify hashcode for ID types. (Shevek via cdouglas)
HADOOP-5500. In DBOutputFormat, where field names are absent permit the
number of fields to be sufficient to construct the select query. (Enis
Soztutar via cdouglas)
HADOOP-5081. Split TestCLI into HDFS, Mapred and Core tests. (sharad)
HADOOP-5015. Separate block management code from FSNamesystem. (Suresh
Srinivas via szetszwo)
HADOOP-5080. Add new test cases to TestMRCLI and TestHDFSCLI
(V.Karthikeyan via nigel)
HADOOP-5135. Splits the tests into different directories based on the
package. Four new test targets have been defined - run-test-core,
run-test-mapred, run-test-hdfs and run-test-hdfs-with-mr.
(Sharad Agarwal via ddas)
HADOOP-5771. Implements unit tests for LinuxTaskController.
(Sreekanth Ramakrishnan and Vinod Kumar Vavilapalli via yhemanth)
HADOOP-5419. Provide a facility to query the Queue ACLs for the
current user.
(Rahul Kumar Singh via yhemanth)
HADOOP-5780. Improve per block message printed by "-metaSave" in HDFS.
(Raghu Angadi)
HADOOP-5823. Added a new class DeprecatedUTF8 to help with removing
UTF8 related javac warnings. These warnings are removed in
FSEditLog.java as a use case. (Raghu Angadi)
HADOOP-5824. Deprecate DataTransferProtocol.OP_READ_METADATA and remove
the corresponding unused codes. (Kan Zhang via szetszwo)
HADOOP-5721. Factor out EditLogFileInputStream and EditLogFileOutputStream
into independent classes. (Luca Telloli & Flavio Junqueira via shv)
HADOOP-5838. Fix a few javac warnings in HDFS. (Raghu Angadi)
HADOOP-5854. Fix a few "Inconsistent Synchronization" warnings in HDFS.
(Raghu Angadi)
HADOOP-5369. Small tweaks to reduce MapFile index size. (Ben Maurer
via sharad)
HADOOP-5858. Eliminate UTF8 and fix warnings in test/hdfs-with-mr package.
(shv)
HADOOP-5866. Move DeprecatedUTF8 from o.a.h.io to o.a.h.hdfs since it may
not be used outside hdfs. (Raghu Angadi)
HADOOP-5857. Move normal java methods from hdfs .jsp files to .java files.
(szetszwo)
HADOOP-5873. Remove deprecated methods randomDataNode() and
getDatanodeByIndex(..) in FSNamesystem. (szetszwo)
HADOOP-5572. Improves the progress reporting for the sort phase for both
maps and reduces. (Ravi Gummadi via ddas)
HADOOP-5839. Fix EC2 scripts to allow remote job submission.
(Joydeep Sen Sarma via tomwhite)
HADOOP-5877. Fix javac warnings in TestHDFSServerPorts, TestCheckpoint,
TestNameEditsConfig, TestStartup and TestStorageRestore.
(Jakob Homan via shv)
HADOOP-5438. Provide a single FileSystem method to create or
open-for-append to a file. (He Yongqiang via dhruba)
HADOOP-5472. Change DistCp to support globbing of input paths. (Dhruba
Borthakur and Rodrigo Schmidt via szetszwo)
HADOOP-5175. Don't unpack libjars on classpath. (Todd Lipcon via tomwhite)
HADOOP-5620. Add an option to DistCp for preserving modification and access
times. (Rodrigo Schmidt via szetszwo)
HADOOP-5664. Change map serialization so a lock is obtained only where
contention is possible, rather than for each write. (cdouglas)
HADOOP-5896. Remove the dependency of GenericOptionsParser on
Option.withArgPattern. (Giridharan Kesavan and Sharad Agarwal via
sharad)
HADOOP-5784. Makes the number of heartbeats that should arrive a second
at the JobTracker configurable. (Amareshwari Sriramadasu via ddas)
HADOOP-5955. Changes TestFileOuputFormat so that it uses LOCAL_MR
instead of CLUSTER_MR. (Jothi Padmanabhan via das)
HADOOP-5948. Changes TestJavaSerialization to use LocalJobRunner
instead of MiniMR/DFS cluster. (Jothi Padmanabhan via das)
HADOOP-2838. Add mapred.child.env to pass environment variables to
tasktracker's child processes. (Amar Kamat via sharad)
HADOOP-5961. DataNode process understands generic hadoop command line
options (like -Ddfs.property=value). (Raghu Angadi)
HADOOP-5938. Change org.apache.hadoop.mapred.jobcontrol to use new
api. (Amareshwari Sriramadasu via sharad)
HADOOP-2141. Improves the speculative execution heuristic. The heuristic
is currently based on the progress-rates of tasks and the expected time
to complete. Also, statistics about trackers are collected, and speculative
tasks are not given to the ones deduced to be slow.
(Andy Konwinski and ddas)
HADOOP-5952. Change "-1 tests included" wording in test-patch.sh.
(Gary Murry via szetszwo)
HADOOP-6106. Provides an option in ShellCommandExecutor to timeout
commands that do not complete within a certain amount of time.
(Sreekanth Ramakrishnan via yhemanth)
HADOOP-5925. EC2 scripts should exit on error. (tomwhite)
HADOOP-6109. Change Text to grow its internal buffer exponentially, rather
than the max of the current length and the proposed length to improve
performance reading large values. (thushara wijeratna via cdouglas)
HADOOP-2366. Support trimmed strings in Configuration. (Michele Catasta
via szetszwo)
HADOOP-6099. The RPC module can be configured to not send periodic pings.
The default behaviour of sending periodic pings remains unchanged. (dhruba)
HADOOP-6142. Update documentation and use of harchives for relative paths
added in MAPREDUCE-739. (Mahadev Konar via cdouglas)
HADOOP-6148. Implement a fast, pure Java CRC32 calculator which outperforms
java.util.zip.CRC32. (Todd Lipcon and Scott Carey via szetszwo)
HADOOP-6146. Upgrade to JetS3t version 0.7.1. (tomwhite)
HADOOP-6161. Add get/setEnum methods to Configuration. (cdouglas)
HADOOP-6160. Fix releaseaudit target to run on specific directories.
(gkesavan)
HADOOP-6169. Removing deprecated method calls in TFile. (hong tang via
mahadev)
HADOOP-6176. Add a couple package private methods to AccessTokenHandler
for testing. (Kan Zhang via szetszwo)
HADOOP-6182. Fix ReleaseAudit warnings (Giridharan Kesavan and Lee Tucker
via gkesavan)
HADOOP-6173. Change src/native/packageNativeHadoop.sh to package all
native library files. (Hong Tang via szetszwo)
HADOOP-6184. Provide an API to dump Configuration in a JSON format.
(V.V.Chaitanya Krishna via yhemanth)
HADOOP-6224. Add a method to WritableUtils performing a bounded read of an
encoded String. (Jothi Padmanabhan via cdouglas)
HADOOP-6133. Add a caching layer to Configuration::getClassByName to
alleviate a performance regression introduced in a compatibility layer.
(Todd Lipcon via cdouglas)
HADOOP-6252. Provide a method to determine if a deprecated key is set in
config file. (Jakob Homan via suresh)
HADOOP-5879. Read compression level and strategy from Configuration for
gzip compression. (He Yongqiang via cdouglas)
HADOOP-6216. Support comments in host files. (Ravi Phulari and Dmytro
Molkov via szetszwo)
HADOOP-6217. Update documentation for project split. (Corinne Chandel via
omalley)
HADOOP-6268. Add ivy jar to .gitignore. (Todd Lipcon via cdouglas)
HADOOP-6270. Support deleteOnExit in FileContext. (Suresh Srinivas via
szetszwo)
HADOOP-6233. Rename configuration keys towards API standardization and
backward compatibility. (Jithendra Pandey via suresh)
HADOOP-6260. Add additional unit tests for FileContext util methods.
(Gary Murry via suresh).
HADOOP-6309. Change build.xml to run tests with java asserts. (Eli
Collins via szetszwo)
HADOOP-6326. Hudson runs should check for AspectJ warnings and report
failure if any is present (cos)
HADOOP-6329. Add build-fi directory to the ignore lists. (szetszwo)
HADOOP-5107. Use Maven ant tasks to publish the subproject jars.
(Giridharan Kesavan via omalley)
HADOOP-6343. Log unexpected throwable object caught in RPC. (Jitendra Nath
Pandey via szetszwo)
HADOOP-6367. Removes Access Token implementation from common.
(Kan Zhang via ddas)
HADOOP-6395. Upgrade some libraries to be consistent across common, hdfs,
and mapreduce. (omalley)
HADOOP-6398. Build is broken after HADOOP-6395 patch has been applied (cos)
HADOOP-6413. Move TestReflectionUtils to Common. (Todd Lipcon via tomwhite)
HADOOP-6283. Improve the exception messages thrown by
FileUtil$HardLink.getLinkCount(..). (szetszwo)
HADOOP-6279. Add Runtime::maxMemory to JVM metrics. (Todd Lipcon via
cdouglas)
HADOOP-6305. Unify build property names to facilitate cross-projects
modifications (cos)
HADOOP-6312. Remove unnecessary debug logging in Configuration constructor.
(Aaron Kimball via cdouglas)
HADOOP-6366. Reduce ivy console output to observable level (cos)
HADOOP-6400. Log errors getting Unix UGI. (Todd Lipcon via tomwhite)
HADOOP-6346. Add support for specifying unpack pattern regex to
RunJar.unJar. (Todd Lipcon via tomwhite)
HADOOP-6422. Make RPC backend pluggable, protocol-by-protocol, to
ease evolution towards Avro. (cutting)
HADOOP-5958. Use JDK 1.6 File APIs in DF.java wherever possible.
(Aaron Kimball via tomwhite)
HADOOP-6222. Core doesn't have TestCommonCLI facility. (cos)
HADOOP-6394. Add a helper class to simplify FileContext related tests and
improve code reusability. (Jitendra Nath Pandey via suresh)
HADOOP-4656. Add a user to groups mapping service. (boryas, acmurthy)
HADOOP-6435. Make RPC.waitForProxy with timeout public. (Steve Loughran
via tomwhite)
HADOOP-6472. add tokenCache option to GenericOptionsParser for passing
file with secret keys to a map reduce job. (boryas)
HADOOP-3205. Read multiple chunks directly from FSInputChecker subclass
into user buffers. (Todd Lipcon via tomwhite)
HADOOP-6479. TestUTF8 assertions could fail with better text.
(Steve Loughran via tomwhite)
HADOOP-6155. Deprecate RecordIO anticipating Avro. (Tom White via cdouglas)
HADOOP-6492. Make some Avro serialization APIs public.
(Aaron Kimball via cutting)
HADOOP-6497. Add an adapter for Avro's SeekableInput interface, so
that Avro can read FileSystem data.
(Aaron Kimball via cutting)
HADOOP-6495. Identifier should be serialized after the password is
created in Token constructor (jnp via boryas)
HADOOP-6518. Makes the UGI honor the env var KRB5CCNAME.
(Owen O'Malley via ddas)
HADOOP-6531. Enhance FileUtil with an API to delete all contents of a
directory. (Amareshwari Sriramadasu via yhemanth)
HADOOP-6547. Move DelegationToken into Common, so that it can be used by
MapReduce also. (devaraj via omalley)
HADOOP-6552. Puts renewTGT=true and useTicketCache=true for the keytab
kerberos options. (ddas)
HADOOP-6534. Trim whitespace from directory lists initializing
LocalDirAllocator. (Todd Lipcon via cdouglas)
HADOOP-6559. Makes the RPC client automatically re-login when the SASL
connection setup fails. This is applicable only to keytab based logins.
(Devaraj Das)
HADOOP-6551. Delegation token renewing and cancelling should provide
meaningful exceptions when there are failures instead of returning
false. (omalley)
HADOOP-6583. Captures authentication and authorization metrics. (ddas)
HADOOP-6543. Allows secure clients to talk to unsecure clusters.
(Kan Zhang via ddas)
HADOOP-6579. Provide a mechanism for encoding/decoding Tokens from
a url-safe string and change the commons-codec library to 1.4. (omalley)
HADOOP-6596. Add a version field to the AbstractDelegationTokenIdentifier's
serialized value. (omalley)
HADOOP-6573. Support for persistent delegation tokens.
(Jitendra Pandey via shv)
HADOOP-6594. Provide a fetchdt tool via bin/hdfs. (jhoman via acmurthy)
HADOOP-6589. Provide better error messages when RPC authentication fails.
(Kan Zhang via omalley)
HADOOP-6599. Split existing RpcMetrics into RpcMetrics & RpcDetailedMetrics.
(Suresh Srinivas via Sanjay Radia)
HADOOP-6537. Declare more detailed exceptions in FileContext and
AbstractFileSystem (Suresh Srinivas via Sanjay Radia)
HADOOP-6486. fix common classes to work with Avro 1.3 reflection.
(cutting via tomwhite)
HADOOP-6591. HarFileSystem can handle paths with the whitespace characters.
(Rodrigo Schmidt via dhruba)
HADOOP-6407. Have a way to automatically update Eclipse .classpath file
when new libs are added to the classpath through Ivy. (tomwhite)
HADOOP-3659. Patch to allow hadoop native to compile on Mac OS X.
(Colin Evans and Allen Wittenauer via tomwhite)
HADOOP-6471. StringBuffer -> StringBuilder - conversion of references
as necessary. (Kay Kay via tomwhite)
HADOOP-6646. Move HarfileSystem out of Hadoop Common. (mahadev)
HADOOP-6566. Add methods supporting, enforcing narrower permissions on
local daemon directories. (Arun Murthy and Luke Lu via cdouglas)
HADOOP-6705. Fix to work with 1.5 version of jiracli
(Giridharan Kesavan)
HADOOP-6658. Exclude Private elements from generated Javadoc. (tomwhite)
HADOOP-6635. Install/deploy source jars to Maven repo.
(Patrick Angeles via jghoman)
HADOOP-6717. Log levels in o.a.h.security.Groups too high
(Todd Lipcon via jghoman)
HADOOP-6667. RPC.waitForProxy should retry through NoRouteToHostException.
(Todd Lipcon via tomwhite)
HADOOP-6677. InterfaceAudience.LimitedPrivate should take a string not an
enum. (tomwhite)
HADOOP-6678. Remove FileContext#isFile, isDirectory, and exists.
(Eli Collins via hairong)
HADOOP-6515. Make maximum number of http threads configurable.
(Scott Chen via zshao)
HADOOP-6563. Add more symlink tests to cover intermediate symlinks
in paths. (Eli Collins via suresh)
HADOOP-6585. Add FileStatus#isDirectory and isFile. (Eli Collins via
tomwhite)
HADOOP-6738. Move cluster_setup.xml from MapReduce to Common.
(Tom White via tomwhite)
HADOOP-6794. Move configuration and script files post split. (tomwhite)
HADOOP-6403. Deprecate EC2 bash scripts. (tomwhite)
HADOOP-6769. Add an API in FileSystem to get FileSystem instances based
on users. (ddas via boryas)
HADOOP-6813. Add a new newInstance method in FileSystem that takes
a "user" as argument (ddas via boryas)
HADOOP-6668. Apply audience and stability annotations to classes in
common. (tomwhite)
HADOOP-6821. Document changes to memory monitoring. (Hemanth Yamijala
via tomwhite)
OPTIMIZATIONS
HADOOP-5595. NameNode does not need to run a replicator to choose a
random DataNode. (hairong)
HADOOP-5603. Improve NameNode's block placement performance. (hairong)
HADOOP-5638. More improvement on block placement performance. (hairong)
HADOOP-6180. NameNode slowed down when many files with same filename
were moved to Trash. (Boris Shkolnik via hairong)
HADOOP-6166. Further improve the performance of the pure-Java CRC32
implementation. (Tsz Wo (Nicholas), SZE via cdouglas)
HADOOP-6271. Add recursive and non recursive create and mkdir to
FileContext. (Sanjay Radia via suresh)
HADOOP-6261. Add URI based tests for FileContext.
(Ravi Pulari via suresh).
HADOOP-6307. Add a new SequenceFile.Reader constructor in order to support
reading on un-closed file. (szetszwo)
HADOOP-6467. Improve the performance on HarFileSystem.listStatus(..).
(mahadev via szetszwo)
HADOOP-6569. FsShell#cat should avoid calling unnecessary getFileStatus
before opening a file to read. (hairong)
HADOOP-6689. Add directory renaming test to existing FileContext tests.
(Eli Collins via suresh)
HADOOP-6713. The RPC server Listener thread is a scalability bottleneck.
(Dmytro Molkov via hairong)
BUG FIXES
HADOOP-6748. Removes hadoop.cluster.administrators, cluster administrators
acl is passed as parameter in constructor. (amareshwari)
HADOOP-6828. Herriot uses old way of accessing logs directories (Sreekanth
Ramakrishnan via cos)
HADOOP-6788. [Herriot] Exception exclusion functionality is not working
correctly. (Vinay Thota via cos)
HADOOP-6773. Ivy folder contains redundant files (cos)
HADOOP-5379. CBZip2InputStream to throw IOException on data crc error.
(Rodrigo Schmidt via zshao)
HADOOP-5326. Fixes CBZip2OutputStream data corruption problem.
(Rodrigo Schmidt via zshao)
HADOOP-4963. Fixes a logging problem to do with getting the location of
map output file. (Amareshwari Sriramadasu via ddas)
HADOOP-2337. Trash should close FileSystem on exit and should not start
emptying thread if disabled. (shv)
HADOOP-5072. Fix failure in TestCodec because testSequenceFileGzipCodec
won't pass without native gzip codec. (Zheng Shao via dhruba)
HADOOP-5050. TestDFSShell.testFilePermissions should not assume umask
setting. (Jakob Homan via szetszwo)
HADOOP-4975. Set classloader for nested mapred.join configs. (Jingkei Ly
via cdouglas)
HADOOP-5078. Remove invalid AMI kernel in EC2 scripts. (tomwhite)
HADOOP-5045. FileSystem.isDirectory() should not be deprecated. (Suresh
Srinivas via szetszwo)
HADOOP-4960. Use datasource time, rather than system time, during metrics
demux. (Eric Yang via cdouglas)
HADOOP-5032. Export conf dir set in config script. (Eric Yang via cdouglas)
HADOOP-5176. Fix a typo in TestDFSIO. (Ravi Phulari via szetszwo)
HADOOP-4859. Distinguish daily rolling output dir by adding a timestamp.
(Jerome Boulon via cdouglas)
HADOOP-4959. Correct system metric collection from top on Redhat 5.1. (Eric
Yang via cdouglas)
HADOOP-5039. Fix log rolling regex to process only the relevant
subdirectories. (Jerome Boulon via cdouglas)
HADOOP-5095. Update Chukwa watchdog to accept config parameter. (Jerome
Boulon via cdouglas)
HADOOP-5147. Correct reference to agent list in Chukwa bin scripts. (Ari
Rabkin via cdouglas)
HADOOP-5148. Fix logic disabling watchdog timer in Chukwa daemon scripts.
(Ari Rabkin via cdouglas)
HADOOP-5100. Append, rather than truncate, when creating log4j metrics in
Chukwa. (Jerome Boulon via cdouglas)
HADOOP-5204. Fix broken trunk compilation on Hudson by letting
task-controller be an independent target in build.xml.
(Sreekanth Ramakrishnan via yhemanth)
HADOOP-5212. Fix the path translation problem introduced by HADOOP-4868
running on cygwin. (Sharad Agarwal via omalley)
HADOOP-5226. Add license headers to html and jsp files. (szetszwo)
HADOOP-5172. Disable misbehaving Chukwa unit test until it can be fixed.
(Jerome Boulon via nigel)
HADOOP-4933. Fixes a ConcurrentModificationException problem that shows up
when the history viewer is accessed concurrently.
(Amar Kamat via ddas)
HADOOP-5253. Remove duplicate call to cn-docs target.
(Giri Kesavan via nigel)
HADOOP-5251. Fix classpath for contrib unit tests to include clover jar.
(nigel)
HADOOP-5206. Synchronize "unprotected*" methods of FSDirectory on the root.
(Jakob Homan via shv)
HADOOP-5292. Fix NPE in KFS::getBlockLocations. (Sriram Rao via lohit)
HADOOP-5219. Adds a new property io.seqfile.local.dir for use by
SequenceFile, which earlier used mapred.local.dir. (Sharad Agarwal
via ddas)
HADOOP-5300. Fix ant javadoc-dev target and the typo in the class name
NameNodeActivtyMBean. (szetszwo)
HADOOP-5218. libhdfs unit test failed because it was unable to
start namenode/datanode. Fixed. (dhruba)
HADOOP-5273. Add license header to TestJobInProgress.java. (Jakob Homan
via szetszwo)
HADOOP-5229. Remove duplicate version variables in build files
(Stefan Groschupf via johan)
HADOOP-5383. Avoid building an unused string in NameNode's
verifyReplication(). (Raghu Angadi)
HADOOP-5347. Create a job output directory for the bbp examples. (szetszwo)
HADOOP-5341. Make hadoop-daemon scripts backwards compatible with the
changes in HADOOP-4868. (Sharad Agarwal via yhemanth)
HADOOP-5456. Fix javadoc links to ClientProtocol#restoreFailedStorage(..).
(Boris Shkolnik via szetszwo)
HADOOP-5458. Remove leftover Chukwa entries from build, etc. (cdouglas)
HADOOP-5386. Modify hdfsproxy unit test to start on a random port,
implement clover instrumentation. (Zhiyong Zhang via cdouglas)
HADOOP-5511. Add Apache License to EditLogBackupOutputStream. (shv)
HADOOP-5507. Fix JMXGet javadoc warnings. (Boris Shkolnik via szetszwo)
HADOOP-5191. Accessing HDFS with any ip or hostname should work as long
as it points to the interface NameNode is listening on. (Raghu Angadi)
HADOOP-5561. Add javadoc.maxmemory parameter to build, preventing OOM
exceptions from javadoc-dev. (Jakob Homan via cdouglas)
HADOOP-5149. Modify HistoryViewer to ignore unfamiliar files in the log
directory. (Hong Tang via cdouglas)
HADOOP-5477. Fix rare failure in TestCLI for hosts returning variations of
'localhost'. (Jakob Homan via cdouglas)
HADOOP-5194. Disables setsid for tasks run on cygwin.
(Ravi Gummadi via ddas)
HADOOP-5322. Fix misleading/outdated comments in JobInProgress.
(Amareshwari Sriramadasu via cdouglas)
HADOOP-5198. Fixes a problem to do with the task PID file being absent and
the JvmManager trying to look for it. (Amareshwari Sriramadasu via ddas)
HADOOP-5464. DFSClient did not treat write timeout of 0 properly.
(Raghu Angadi)
HADOOP-4045. Fix processing of IO errors in EditsLog.
(Boris Shkolnik via shv)
HADOOP-5462. Fixed a double free bug in the task-controller
executable. (Sreekanth Ramakrishnan via yhemanth)
HADOOP-5652. Fix a bug where in-memory segments are incorrectly retained in
memory. (cdouglas)
HADOOP-5533. Recovery duration shown on the jobtracker webpage is
inaccurate. (Amar Kamat via sharad)
HADOOP-5647. Fix TestJobHistory to not depend on /tmp. (Ravi Gummadi
via sharad)
HADOOP-5661. Fixes some findbugs warnings in o.a.h.mapred* packages and
suppresses a bunch of them. (Jothi Padmanabhan via ddas)
HADOOP-5704. Fix compilation problems in TestFairScheduler and
TestCapacityScheduler. (Chris Douglas via szetszwo)
HADOOP-5650. Fix safemode messages in the Namenode log. (Suresh Srinivas
via szetszwo)
HADOOP-5488. Removes the pidfile management for the Task JVM from the
framework and instead passes the PID back and forth between the
TaskTracker and the Task processes. (Ravi Gummadi via ddas)
HADOOP-5658. Fix Eclipse templates. (Philip Zeyliger via shv)
HADOOP-5709. Remove redundant synchronization added in HADOOP-5661. (Jothi
Padmanabhan via cdouglas)
HADOOP-5715. Add conf/mapred-queue-acls.xml to the ignore lists.
(szetszwo)
HADOOP-5592. Fix typo in Streaming doc in reference to GzipCodec.
(Corinne Chandel via tomwhite)
HADOOP-5656. Counter for S3N Read Bytes does not work. (Ian Nowland
via tomwhite)
HADOOP-5406. Fix JNI binding for ZlibCompressor::setDictionary. (Lars
Francke via cdouglas)
HADOOP-3426. Fix/provide handling when DNS lookup fails on the loopback
address. Also cache the result of the lookup. (Steve Loughran via cdouglas)
HADOOP-5476. Close the underlying InputStream in SequenceFile::Reader when
the constructor throws an exception. (Michael Tamm via cdouglas)
HADOOP-5675. Do not launch a job if DistCp has no work to do. (Tsz Wo
(Nicholas), SZE via cdouglas)
HADOOP-5737. Fixes a problem in the way the JobTracker used to talk to
other daemons like the NameNode to get the job's files. Also adds APIs
in the JobTracker to get the FileSystem objects as per the JobTracker's
configuration. (Amar Kamat via ddas)
HADOOP-5648. Not able to generate gridmix.jar on the already compiled
version of hadoop. (gkesavan)
HADOOP-5808. Fix import never used javac warnings in hdfs. (szetszwo)
HADOOP-5203. TT's version build is too restrictive. (Rick Cox via sharad)
HADOOP-5818. Revert the renaming from FSNamesystem.checkSuperuserPrivilege
to checkAccess by HADOOP-5643. (Amar Kamat via szetszwo)
HADOOP-5820. Fix findbugs warnings for http related codes in hdfs.
(szetszwo)
HADOOP-5822. Fix javac warnings in several dfs tests related to unnecessary
casts. (Jakob Homan via szetszwo)
HADOOP-5842. Fix a few javac warnings under packages fs and util.
(Hairong Kuang via szetszwo)
HADOOP-5845. Build successful despite test failure on test-core target.
(sharad)
HADOOP-5314. Prevent unnecessary saving of the file system image during
name-node startup. (Jakob Homan via shv)
HADOOP-5855. Fix javac warnings for DisallowedDatanodeException and
UnsupportedActionException. (szetszwo)
HADOOP-5582. Fixes a problem in Hadoop Vaidya to do with reading
counters from job history files. (Suhas Gogate via ddas)
HADOOP-5829. Fix javac warnings found in ReplicationTargetChooser,
FSImage, Checkpointer, SecondaryNameNode and a few other hdfs classes.
(Suresh Srinivas via szetszwo)
HADOOP-5835. Fix findbugs warnings found in Block, DataNode, NameNode and
a few other hdfs classes. (Suresh Srinivas via szetszwo)
HADOOP-5853. Undeprecate HttpServer.addInternalServlet method. (Suresh
Srinivas via szetszwo)
HADOOP-5801. Fixes the problem: If the hosts file is changed across restart
then it should be refreshed upon recovery so that the excluded hosts are
lost and the maps are re-executed. (Amar Kamat via ddas)
HADOOP-5841. Resolve findbugs warnings in DistributedFileSystem,
DatanodeInfo, BlocksMap, DataNodeDescriptor. (Jakob Homan via szetszwo)
HADOOP-5878. Fix import and Serializable javac warnings found in hdfs jsp.
(szetszwo)
HADOOP-5782. Revert a few formatting changes introduced in HADOOP-5015.
(Suresh Srinivas via rangadi)
HADOOP-5687. NameNode throws NPE if fs.default.name is the default value.
(Philip Zeyliger via shv)
HADOOP-5867. Fix javac warnings found in NNBench and NNBenchWithoutMR.
(Konstantin Boudnik via szetszwo)
HADOOP-5728. Fixed FSEditLog.printStatistics IndexOutOfBoundsException.
(Wang Xu via johan)
HADOOP-5847. Fixed failing Streaming unit tests (gkesavan)
HADOOP-5252. Streaming overrides -inputformat option (Klaas Bosteels
via sharad)
HADOOP-5710. Counter MAP_INPUT_BYTES missing from new mapreduce api.
(Amareshwari Sriramadasu via sharad)
HADOOP-5809. Fix job submission, broken by errant directory creation.
(Sreekanth Ramakrishnan and Jothi Padmanabhan via cdouglas)
HADOOP-5635. Change distributed cache to work with other distributed file
systems. (Andrew Hitchcock via tomwhite)
HADOOP-5856. Fix "unsafe multithreaded use of DateFormat" findbugs warning
in DataBlockScanner. (Kan Zhang via szetszwo)
HADOOP-4864. Fixes a problem to do with -libjars with multiple jars when
client and cluster reside on different OSs. (Amareshwari Sriramadasu via
ddas)
HADOOP-5623. Fixes a problem to do with status messages getting overwritten
in streaming jobs. (Rick Cox and Jothi Padmanabhan via ddas)
HADOOP-5895. Fixes computation of count of merged bytes for logging.
(Ravi Gummadi via ddas)
HADOOP-5805. problem using top level s3 buckets as input/output
directories. (Ian Nowland via tomwhite)
HADOOP-5940. trunk eclipse-plugin build fails while trying to copy
commons-cli jar from the lib dir (Giridharan Kesavan via gkesavan)
HADOOP-5864. Fix DMI and OBL findbugs in packages hdfs and metrics.
(hairong)
HADOOP-5935. Fix Hudson's broken release audit warnings link.
(Giridharan Kesavan via gkesavan)
HADOOP-5947. Delete empty TestCombineFileInputFormat.java
HADOOP-5899. Move a log message in FSEditLog to the right place for
avoiding unnecessary log. (Suresh Srinivas via szetszwo)
HADOOP-5944. Add Apache license header to BlockManager.java. (Suresh
Srinivas via szetszwo)
HADOOP-5891. SecondaryNamenode is able to converse with the NameNode
even when the default value of dfs.http.address is not overridden.
(Todd Lipcon via dhruba)
HADOOP-5953. The isDirectory(..) and isFile(..) methods in KosmosFileSystem
should not be deprecated. (szetszwo)
HADOOP-5954. Fix javac warnings in TestFileCreation, TestSmallBlock,
TestFileStatus, TestDFSShellGenericOptions, TestSeekBug and
TestDFSStartupVersions. (szetszwo)
HADOOP-5956. Fix ivy dependency in hdfsproxy and capacity-scheduler.
(Giridharan Kesavan via szetszwo)
HADOOP-5836. Bug in S3N handling of directory markers using an object with
a trailing "/" causes jobs to fail. (Ian Nowland via tomwhite)
HADOOP-5861. s3n files are not getting split by default. (tomwhite)
HADOOP-5762. Fix a problem that DistCp does not copy empty directory.
(Rodrigo Schmidt via szetszwo)
HADOOP-5859. Fix "wait() or sleep() with locks held" findbugs warnings in
DFSClient. (Kan Zhang via szetszwo)
HADOOP-5457. Fix to continue to run builds even if contrib test fails
(Giridharan Kesavan via gkesavan)
HADOOP-5963. Remove an unnecessary exception catch in NNBench. (Boris
Shkolnik via szetszwo)
HADOOP-5989. Fix streaming test failure. (gkesavan)
HADOOP-5981. Fix a bug in HADOOP-2838 in parsing mapred.child.env.
(Amar Kamat via sharad)
HADOOP-5420. Fix LinuxTaskController to kill tasks using the process
groups they are launched with.
(Sreekanth Ramakrishnan via yhemanth)
HADOOP-6031. Remove @author tags from Java source files. (Ravi Phulari
via szetszwo)
HADOOP-5980. Fix LinuxTaskController so tasks get passed
LD_LIBRARY_PATH and other environment variables.
(Sreekanth Ramakrishnan via yhemanth)
HADOOP-4041. IsolationRunner does not work as documented.
(Philip Zeyliger via tomwhite)
HADOOP-6004. Fixes BlockLocation deserialization. (Jakob Homan via
szetszwo)
HADOOP-6079. Serialize proxySource as DatanodeInfo in DataTransferProtocol.
(szetszwo)
HADOOP-6096. Fix Eclipse project and classpath files following project
split. (tomwhite)
HADOOP-6122. The greater-than operator in test-patch.sh should be "-gt" but
not ">". (szetszwo)
HADOOP-6114. Fix javadoc documentation for FileStatus.getLen.
(Dmitry Rzhevskiy via dhruba)
HADOOP-6131. A sysproperty should not be set unless the property
is set on the ant command line in build.xml (hong tang via mahadev)
HADOOP-6137. Fix project specific test-patch requirements
(Giridharan Kesavan)
HADOOP-6138. Eliminate the deprecated warnings introduced by H-5438.
(He Yongqiang via szetszwo)
HADOOP-6132. RPC client creates an extra connection because of incorrect
key for connection cache. (Kan Zhang via rangadi)
HADOOP-6123. Add missing classpaths in hadoop-config.sh. (Sharad Agarwal
via szetszwo)
HADOOP-6172. Fix jar file names in hadoop-config.sh and include
${build.src} as a part of the source list in build.xml. (Hong Tang via
szetszwo)
HADOOP-6124. Fix javac warning detection in test-patch.sh. (Giridharan
Kesavan via szetszwo)
HADOOP-6177. FSInputChecker.getPos() would return position greater
than the file size. (Hong Tang via hairong)
HADOOP-6188. TestTrash uses java.io.File api but not hadoop FileSystem api.
(Boris Shkolnik via szetszwo)
HADOOP-6192. Fix Shell.getUlimitMemoryCommand to not rely on Map-Reduce
specific configs. (acmurthy)
HADOOP-6103. Clones the classloader as part of Configuration clone.
(Amareshwari Sriramadasu via ddas)
HADOOP-6152. Fix classpath variables in bin/hadoop-config.sh and some
other scripts. (Aaron Kimball via szetszwo)
HADOOP-6215. Fix GenericOptionsParser to deal with -D with '=' in the
value. (Amar Kamat via sharad)
HADOOP-6227. Fix Configuration to allow final parameters to be set to null
and prevent them from being overridden.
(Amareshwari Sriramadasu via yhemanth)
HADOOP-6199. Move io.map.skip.index property to core-default from mapred.
(Amareshwari Sriramadasu via cdouglas)
HADOOP-6229. Attempt to make a directory under an existing file on
LocalFileSystem should throw an Exception. (Boris Shkolnik via tomwhite)
HADOOP-6243. Fix a NullPointerException in processing deprecated keys.
(Sreekanth Ramakrishnan via yhemanth)
HADOOP-6009. S3N listStatus incorrectly returns null instead of empty
array when called on empty root. (Ian Nowland via tomwhite)
HADOOP-6181. Fix .eclipse.templates/.classpath for avro and jets3t jar
files. (Carlos Valiente via szetszwo)
HADOOP-6196. Fix a bug in SequenceFile.Reader where syncing within the
header would cause the reader to read the sync marker as a record. (Jay
Booth via cdouglas)
HADOOP-6250. Modify test-patch to delete copied XML files before running
patch build. (Rahul Kumar Singh via yhemanth)
HADOOP-6257. Two TestFileSystem classes are confusing
hadoop-hdfs-hdfwithmr. (Philip Zeyliger via tomwhite)
HADOOP-6151. Added an input filter to all of the http servlets that quotes
html characters in the parameters, to prevent cross site scripting
attacks. (omalley)
HADOOP-6274. Fix TestLocalFSFileContextMainOperations test failure.
(Gary Murry via suresh).
HADOOP-6281. Avoid null pointer exceptions when the jsps don't have
parameters (omalley)
HADOOP-6285. Fix the result type of the getParameterMap method in the
HttpServer.QuotingInputFilter. (omalley)
HADOOP-6286. Fix bugs related to URI handling in glob methods in
FileContext. (Boris Shkolnik via suresh)
HADOOP-6292. Update native libraries guide. (Corinne Chandel via cdouglas)
HADOOP-6327. FileContext tests should not use /tmp and should clean up
files. (Sanjay Radia via szetszwo)
HADOOP-6318. Upgrade to Avro 1.2.0. (cutting)
HADOOP-6334. Fix GenericOptionsParser to understand URI for -files,
-libjars and -archives options and fix Path to support URI with fragment.
(Amareshwari Sriramadasu via szetszwo)
HADOOP-6344. Fix rm and rmr to immediately delete files rather than sending
them to trash if a user is over quota. (Jakob Homan via suresh)
HADOOP-6347. run-test-core-fault-inject runs a test case twice if
-Dtestcase is set (cos)
HADOOP-6375. Sync documentation for FsShell du with its implementation.
(Todd Lipcon via cdouglas)
HADOOP-6441. Protect web ui from cross site scripting attacks (XSS) on
the host http header and using encoded utf-7. (omalley)
HADOOP-6451. Fix build to run contrib unit tests. (Tom White via cdouglas)
HADOOP-6374. JUnit tests should never depend on anything in conf.
(Anatoli Fomenko via cos)
HADOOP-6290. Prevent duplicate slf4j-simple jar via Avro's classpath.
(Owen O'Malley via cdouglas)
HADOOP-6293. Fix FsShell -text to work on filesystems other than the
default. (cdouglas)
HADOOP-6341. Fix test-patch.sh for checkTests function. (gkesavan)
HADOOP-6314. Fix "fs -help" for the "-count" command. (Ravi Phulari via
szetszwo)
HADOOP-6405. Update Eclipse configuration to match changes to Ivy
configuration (Edwin Chan via cos)
HADOOP-6411. Remove deprecated file src/test/hadoop-site.xml. (cos)
HADOOP-6386. NameNode's HttpServer can't instantiate InetSocketAddress:
IllegalArgumentException is thrown (cos)
HADOOP-6254. Slow reads cause s3n to fail with SocketTimeoutException.
(Andrew Hitchcock via tomwhite)
HADOOP-6428. HttpServer sleeps with negative values. (cos)
HADOOP-6414. Add command line help for -expunge command.
(Ravi Phulari via tomwhite)
HADOOP-6391. Classpath should not be part of command line arguments.
(Cristian Ivascu via tomwhite)
HADOOP-6462. Target "compile" does not exist in contrib/cloud. (tomwhite)
HADOOP-6402. testConf.xsl is not well-formed XML. (Steve Loughran
via tomwhite)
HADOOP-6489. Fix 3 findbugs warnings. (Erik Steffl via suresh)
HADOOP-6517. Fix UserGroupInformation so that tokens are saved/retrieved
to/from the embedded Subject (Owen O'Malley & Kan Zhang via ddas)
HADOOP-6538. Sets hadoop.security.authentication to simple by default.
(ddas)
HADOOP-6540. Contrib unit tests have invalid XML for core-site, etc.
(Aaron Kimball via tomwhite)
HADOOP-6521. User specified umask using deprecated dfs.umask must override
server configured using new dfs.umaskmode for backward compatibility.
(suresh)
HADOOP-6522. Fix decoding of codepoint zero in UTF8. (cutting)
HADOOP-6505. Use tr rather than sed to effect literal substitution in the
build script. (Allen Wittenauer via cdouglas)
HADOOP-6548. Replace mortbay imports with commons logging. (cdouglas)
HADOOP-6560. Handle invalid har:// uri in HarFileSystem. (szetszwo)
HADOOP-6549. TestDoAsEffectiveUser should use ip address of the host
for superuser ip check (jnp via boryas)
HADOOP-6570. RPC#stopProxy throws NPE if getProxyEngine(proxy) returns
null. (hairong)
HADOOP-6558. Return null in HarFileSystem.getFileChecksum(..) since no
checksum algorithm is implemented. (szetszwo)
HADOOP-6572. Makes sure that SASL encryption and push to responder
queue for the RPC response happens atomically. (Kan Zhang via ddas)
HADOOP-6545. Changes the Key for the FileSystem cache to be UGI (ddas)
HADOOP-6609. Fixed deadlock in RPC by replacing shared static
DataOutputBuffer in the UTF8 class with a thread local variable. (omalley)
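The HADOOP-6609 entry above names the general technique: give each thread its own scratch buffer instead of sharing one static buffer. A minimal sketch of that pattern follows, assuming a plain ByteArrayOutputStream in place of Hadoop's DataOutputBuffer; the class and method names are hypothetical and this is not the actual UTF8 code.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only; not the org.apache.hadoop.io.UTF8 implementation.
final class ThreadLocalBufferSketch {
    // Each thread lazily gets its own buffer, so no lock is needed to use it.
    private static final ThreadLocal<ByteArrayOutputStream> BUFFER =
        ThreadLocal.withInitial(ByteArrayOutputStream::new);

    // Encodes a string using the calling thread's private scratch buffer.
    static byte[] encode(String s) {
        ByteArrayOutputStream out = BUFFER.get();
        out.reset(); // reuse the buffer's backing array between calls
        byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
        out.write(bytes, 0, bytes.length);
        return out.toByteArray(); // copy, so callers never see the shared array
    }
}
```

Because no two threads ever touch the same buffer, no synchronization on the buffer is required, which removes the lock that a shared static buffer would need.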
HADOOP-6504. Invalid example in the documentation of
org.apache.hadoop.util.Tool. (Benoit Sigoure via tomwhite)
HADOOP-6546. BloomMapFile can return false negatives. (Clark Jefcoat
via tomwhite)
HADOOP-6593. TextRecordInputStream doesn't close SequenceFile.Reader.
(Chase Bradford via tomwhite)
HADOOP-6175. Incorrect version compilation with es_ES.ISO8859-15 locale
on Solaris 10. (Urko Benito via tomwhite)
HADOOP-6645. Bugs on listStatus for HarFileSystem (rodrigo via mahadev)
HADOOP-6645. Re: Bugs on listStatus for HarFileSystem (rodrigo via
mahadev)
HADOOP-6654. Fix code example in WritableComparable javadoc. (Tom White
via szetszwo)
HADOOP-6640. FileSystem.get() does RPC retries within a static
synchronized block. (hairong)
HADOOP-6691. TestFileSystemCaching sometimes hangs. (hairong)
HADOOP-6507. Hadoop Common Docs - delete 3 doc files that do not belong
under Common. (Corinne Chandel via tomwhite)
HADOOP-6439. Fixes handling of deprecated keys to follow order in which
keys are defined. (V.V.Chaitanya Krishna via yhemanth)
HADOOP-6690. FilterFileSystem correctly handles setTimes call.
(Rodrigo Schmidt via dhruba)
HADOOP-6703. Prevent renaming a file, directory or symbolic link to
itself. (Eli Collins via suresh)
HADOOP-6710. Symbolic umask for file creation is not conformant with posix.
(suresh)
HADOOP-6719. Insert all missing methods in FilterFs.
(Rodrigo Schmidt via dhruba)
HADOOP-6724. IPC doesn't properly handle IOEs thrown by socket factory.
(Todd Lipcon via tomwhite)
HADOOP-6722. NetUtils.connect should check that it hasn't connected a socket
to itself. (Todd Lipcon via tomwhite)
HADOOP-6634. Fix AccessControlList to use short names to verify access
control. (Vinod Kumar Vavilapalli via sharad)
HADOOP-6709. Re-instate deprecated FileSystem methods that were removed
after 0.20. (tomwhite)
HADOOP-6630. hadoop-config.sh fails to get executed if hadoop wrapper
scripts are in path. (Allen Wittenauer via tomwhite)
HADOOP-6742. Add methods from HADOOP-6709 to TestFilterFileSystem.
(Eli Collins via tomwhite)
HADOOP-6727. Remove UnresolvedLinkException from public FileContext APIs.
(Eli Collins via tomwhite)
HADOOP-6631. Fix FileUtil.fullyDelete() to continue deleting other files
despite failure at any level. (Contributed by Ravi Gummadi and
Vinod Kumar Vavilapalli)
HADOOP-6723. Unchecked exceptions thrown in IPC Connection should not
orphan clients. (Todd Lipcon via tomwhite)
HADOOP-6404. Rename the generated artifacts to common instead of core.
(tomwhite)
HADOOP-6461. Webapps aren't located correctly post-split.
(Todd Lipcon and Steve Loughran via tomwhite)
HADOOP-6826. Revert FileSystem create method that takes CreateFlags.
(tomwhite)
HADOOP-6800. Harmonize JAR library versions. (tomwhite)
HADOOP-6847. Problem staging 0.21.0 artifacts to Apache Nexus Maven
Repository (Giridharan Kesavan via cos)
HADOOP-6819. [Herriot] Shell command for getting the new exceptions in
the logs returning exitcode 1 after executing successfully. (Vinay Thota
via cos)
HADOOP-6839. [Herriot] Implement a functionality for getting the user list
for creating proxy users. (Vinay Thota via cos)
HADOOP-6836. [Herriot]: Generic method for adding/modifying the attributes
for new configuration. (Vinay Thota via cos)
HADOOP-6860. 'compile-fault-inject' should never be called directly.
(Konstantin Boudnik)
HADOOP-6790. Instrumented (Herriot) build uses too wide mask to include
aspect files. (Konstantin Boudnik)
HADOOP-6875. [Herriot] Cleanup of temp. configurations is needed upon
restart of a cluster (Vinay Thota via cos)
Release 0.20.3 - Unreleased
NEW FEATURES
HADOOP-6637. Benchmark for establishing RPC session. (shv)
BUG FIXES
HADOOP-6760. WebServer shouldn't increase port number in case of negative
port setting caused by Jetty's race (cos)
HADOOP-6881. Make WritableComparator initialize classes when
looking for their raw comparator, as classes often register raw
comparators in initializers, which are no longer automatically run
in Java 6 when a class is referenced. (cutting via omalley)
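The behaviour described in HADOOP-6881 can be illustrated with a small sketch: a class that registers something in its static initializer is not initialized merely by being referenced through a class literal, so the lookup forces initialization with Class.forName(name, true, loader). The registry, the key class, and all names below are hypothetical stand-ins, not the WritableComparator code.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical illustration only; not the WritableComparator implementation.
final class ComparatorRegistrySketch {
    // Maps a key class to the name of its registered raw comparator.
    static final Map<Class<?>, String> COMPARATORS = new ConcurrentHashMap<>();

    // A key class that registers its comparator in a static initializer.
    static final class MyKey {
        static {
            COMPARATORS.put(MyKey.class, "MyKey.RawComparator");
        }
    }

    // Looks up a comparator, forcing class initialization on a miss.
    static String lookup(Class<?> c) {
        String found = COMPARATORS.get(c);
        if (found == null) {
            forceInit(c);
            found = COMPARATORS.get(c);
        }
        return found;
    }

    // Class.forName with initialize=true runs the static initializer if needed.
    static void forceInit(Class<?> c) {
        try {
            Class.forName(c.getName(), true, c.getClassLoader());
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException("Cannot initialize " + c, e);
        }
    }
}
```

Calling ComparatorRegistrySketch.lookup(ComparatorRegistrySketch.MyKey.class) first misses, then initializes MyKey and finds the registered entry.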
Release 0.20.2 - 2010-2-16
NEW FEATURES
HADOOP-6218. Adds a feature where TFile can be split by Record
Sequence number. (Hong Tang and Raghu Angadi via ddas)
BUG FIXES
HADOOP-6231. Allow caching of filesystem instances to be disabled on a
per-instance basis. (tomwhite)
HADOOP-5759. Fix for IllegalArgumentException when CombineFileInputFormat
is used as job InputFormat. (Amareshwari Sriramadasu via dhruba)
HADOOP-6097. Fix Path conversion in makeQualified and reset LineReader byte
count at the start of each block in Hadoop archives. (Ben Slusky, Tom
White, and Mahadev Konar via cdouglas)
HADOOP-6269. Fix threading issue with defaultResource in Configuration.
(Sreekanth Ramakrishnan via cdouglas)
HADOOP-6460. Reinitializes buffers used for serializing responses in ipc
server on exceeding maximum response size to free up Java heap. (suresh)
HADOOP-6315. Avoid incorrect use of BuiltInflater/BuiltInDeflater in
GzipCodec. (Aaron Kimball via cdouglas)
HADOOP-6498. IPC client bug may cause rpc call hang. (Ruyue Ma and
hairong via hairong)
IMPROVEMENTS
HADOOP-5611. Fix C++ libraries to build on Debian Lenny. (Todd Lipcon
via tomwhite)
HADOOP-5612. Some c++ scripts are not chmodded before ant execution.
(Todd Lipcon via tomwhite)
HADOOP-1849. Add undocumented configuration parameter for per handler
call queue size in IPC Server. (shv)
Release 0.20.1 - 2009-09-01
INCOMPATIBLE CHANGES
HADOOP-5726. Remove pre-emption from capacity scheduler code base.
(Rahul Kumar Singh via yhemanth)
HADOOP-5881. Simplify memory monitoring and scheduling related
configuration. (Vinod Kumar Vavilapalli via yhemanth)
NEW FEATURES
HADOOP-6080. Introduce -skipTrash option to rm and rmr.
(Jakob Homan via shv)
HADOOP-3315. Add a new, binary file format, TFile. (Hong Tang via cdouglas)
IMPROVEMENTS
HADOOP-5711. Change Namenode file close log to info. (szetszwo)
HADOOP-5736. Update the capacity scheduler documentation for features
like memory based scheduling, job initialization and removal of pre-emption.
(Sreekanth Ramakrishnan via yhemanth)
HADOOP-5714. Add a metric for NameNode getFileInfo operation. (Jakob Homan
via szetszwo)
HADOOP-4372. Improves the way history filenames are obtained and manipulated.
(Amar Kamat via ddas)
HADOOP-5897. Add name-node metrics to capture java heap usage.
(Suresh Srinivas via shv)
OPTIMIZATIONS
BUG FIXES
HADOOP-5691. Makes org.apache.hadoop.mapreduce.Reducer concrete class
instead of abstract. (Amareshwari Sriramadasu via sharad)
HADOOP-5646. Fixes a problem in TestQueueCapacities.
(Vinod Kumar Vavilapalli via ddas)
HADOOP-5655. TestMRServerPorts fails on java.net.BindException. (Devaraj
Das via hairong)
HADOOP-5654. TestReplicationPolicy.<init> fails on java.net.BindException.
(hairong)
HADOOP-5688. Fix HftpFileSystem checksum path construction. (Tsz Wo
(Nicholas) Sze via cdouglas)
HADOOP-4674. Fix fs help messages for -test, -text, -tail, -stat
and -touchz options. (Ravi Phulari via szetszwo)
HADOOP-5718. Remove the check for the default queue in capacity scheduler.
(Sreekanth Ramakrishnan via yhemanth)
HADOOP-5719. Remove jobs that failed initialization from the waiting queue
in the capacity scheduler. (Sreekanth Ramakrishnan via yhemanth)
HADOOP-4744. Attaching another fix to the jetty port issue. The TaskTracker
kills itself if it ever discovers that the port to which jetty is actually
bound is invalid (-1). (ddas)
HADOOP-5349. Fixes a problem in LocalDirAllocator to check for the return
path value that is returned for the case where the file we want to write
is of an unknown size. (Vinod Kumar Vavilapalli via ddas)
HADOOP-5636. Prevents a job from going to RUNNING state after it has been
KILLED (this used to happen when the SetupTask would come back with a
success after the job has been killed). (Amar Kamat via ddas)
HADOOP-5641. Fix a NullPointerException in capacity scheduler's memory
based scheduling code when jobs get retired. (yhemanth)
HADOOP-5828. Use absolute path for mapred.local.dir of JobTracker in
MiniMRCluster. (yhemanth)
HADOOP-4981. Fix capacity scheduler to schedule speculative tasks
correctly in the presence of High RAM jobs.
(Sreekanth Ramakrishnan via yhemanth)
HADOOP-5210. Solves a problem in the progress report of the reduce task.
(Ravi Gummadi via ddas)
HADOOP-5850. Fixes a problem to do with not being able to run jobs with
0 maps/reduces. (Vinod K V via ddas)
HADOOP-4626. Correct the API links in hdfs forrest doc so that they
point to the same version of hadoop. (szetszwo)
HADOOP-5883. Fixed tasktracker memory monitoring to account for
momentary spurts in memory usage due to java's fork() model.
(yhemanth)
HADOOP-5539. Fixes a problem to do with not preserving intermediate
output compression for merged data.
(Jothi Padmanabhan and Billy Pearson via ddas)
HADOOP-5932. Fixes a problem in capacity scheduler in computing
available memory on a tasktracker.
(Vinod Kumar Vavilapalli via yhemanth)
HADOOP-5908. Fixes a problem to do with ArithmeticException in the
JobTracker when there are jobs with 0 maps. (Amar Kamat via ddas)
HADOOP-5924. Fixes a corner case problem to do with job recovery with
empty history files. Also, after a JT restart, sends KillTaskAction to
tasks that report back but the corresponding job hasn't been initialized
yet. (Amar Kamat via ddas)
HADOOP-5882. Fixes a reducer progress update problem for new mapreduce
api. (Amareshwari Sriramadasu via sharad)
HADOOP-5746. Fixes a corner case problem in Streaming, where if an exception
happens in MROutputThread after the last call to the map/reduce method, the
exception goes undetected. (Amar Kamat via ddas)
HADOOP-5884. Fixes accounting in capacity scheduler so that high RAM jobs
take more slots. (Vinod Kumar Vavilapalli via yhemanth)
HADOOP-5937. Correct a safemode message in FSNamesystem. (Ravi Phulari
via szetszwo)
HADOOP-5869. Fix bug in assignment of setup / cleanup task that was
causing TestQueueCapacities to fail.
(Sreekanth Ramakrishnan via yhemanth)
HADOOP-5921. Fixes a problem in the JobTracker where it sometimes never used
to come up due to a system file creation on JobTracker's system-dir failing.
This problem would sometimes show up only when the FS for the system-dir
(usually HDFS) is started at nearly the same time as the JobTracker.
(Amar Kamat via ddas)
HADOOP-5920. Fixes a testcase failure for TestJobHistory.
(Amar Kamat via ddas)
HADOOP-6139. Fix the FsShell help messages for rm and rmr. (Jakob Homan
via szetszwo)
HADOOP-6145. Fix FsShell rm/rmr error messages when there is a FNFE.
(Jakob Homan via szetszwo)
HADOOP-6150. Users should be able to instantiate comparator using TFile
API. (Hong Tang via rangadi)
Release 0.20.0 - 2009-04-15
INCOMPATIBLE CHANGES
HADOOP-4210. Fix findbugs warnings for equals implementations of mapred ID
classes. Removed public, static ID::read and ID::forName; made ID an
abstract class. (Suresh Srinivas via cdouglas)
HADOOP-4253. Fix various warnings generated by findbugs.
Following deprecated methods in RawLocalFileSystem are removed:
public String getName()
public void lock(Path p, boolean shared)
public void release(Path p)
(Suresh Srinivas via johan)
HADOOP-4618. Move http server from FSNamesystem into NameNode.
FSNamesystem.getNameNodeInfoPort() is removed.
FSNamesystem.getDFSNameNodeMachine() and FSNamesystem.getDFSNameNodePort()
replaced by FSNamesystem.getDFSNameNodeAddress().
NameNode(bindAddress, conf) is removed.
(shv)
HADOOP-4567. GetFileBlockLocations returns the NetworkTopology
information of the machines where the blocks reside. (dhruba)
HADOOP-4435. The JobTracker WebUI displays the amount of heap memory
in use. (dhruba)
HADOOP-4628. Move Hive into a standalone subproject. (omalley)
HADOOP-4188. Removes task's dependency on concrete filesystems.
(Sharad Agarwal via ddas)
HADOOP-1650. Upgrade to Jetty 6. (cdouglas)
HADOOP-3986. Remove static Configuration from JobClient. (Amareshwari
Sriramadasu via cdouglas)
JobClient::setCommandLineConfig is removed
JobClient::getCommandLineConfig is removed
JobShell, TestJobShell classes are removed
HADOOP-4422. S3 file systems should not create bucket.
(David Phillips via tomwhite)
HADOOP-4035. Support memory based scheduling in capacity scheduler.
(Vinod Kumar Vavilapalli via yhemanth)
HADOOP-3497. Fix bug in overly restrictive file globbing with a
PathFilter. (tomwhite)
HADOOP-4445. Replace running task counts with running task
percentage in capacity scheduler UI. (Sreekanth Ramakrishnan via
yhemanth)
HADOOP-4631. Splits the configuration into three parts - one for core,
one for mapred and the last one for HDFS. (Sharad Agarwal via cdouglas)
HADOOP-3344. Fix libhdfs build to use autoconf and build the same
architecture (32 vs 64 bit) of the JVM running Ant. The libraries for
pipes, utils, and libhdfs are now all in c++/<os_osarch_jvmdatamodel>/lib.
(Giridharan Kesavan via nigel)
HADOOP-4874. Remove LZO codec because of licensing issues. (omalley)
HADOOP-4970. The full path name of a file is preserved inside Trash.
(Prasad Chakka via dhruba)
HADOOP-4103. NameNode keeps a count of missing blocks. It warns on
WebUI if there are such blocks. '-report' and '-metaSave' have extra
info to track such blocks. (Raghu Angadi)
HADOOP-4783. Change permissions on history files on the jobtracker
to be only group readable instead of world readable.
(Amareshwari Sriramadasu via yhemanth)
NEW FEATURES
HADOOP-4575. Add a proxy service for relaying HsftpFileSystem requests.
Includes client authentication via user certificates and config-based
access control. (Kan Zhang via cdouglas)
HADOOP-4661. Add DistCh, a new tool for distributed ch{mod,own,grp}.
(szetszwo)
HADOOP-4709. Add several new features and bug fixes to Chukwa.
Added Hadoop Infrastructure Care Center (UI for visualize data collected
by Chukwa)
Added FileAdaptor for streaming small file in one chunk
Added compression to archive and demux output
Added unit tests and validation for agent, collector, and demux map
reduce job
Added database loader for loading demux output (sequence file) to jdbc
connected database
Added algorithm to distribute collector load more evenly
(Jerome Boulon, Eric Yang, Andy Konwinski, Ariel Rabkin via cdouglas)
HADOOP-4179. Add Vaidya tool to analyze map/reduce job logs for performance
problems. (Suhas Gogate via omalley)
HADOOP-4029. Add NameNode storage information to the dfshealth page and
move DataNode information to a separated page. (Boris Shkolnik via
szetszwo)
HADOOP-4348. Add service-level authorization for Hadoop. (acmurthy)
HADOOP-4826. Introduce admin command saveNamespace. (shv)
HADOOP-3063. BloomMapFile - fail-fast version of MapFile for sparsely
populated key space (Andrzej Bialecki via stack)
HADOOP-1230. Add new map/reduce API and deprecate the old one. Generally,
the old code should work without problem. The new api is in
org.apache.hadoop.mapreduce and the old classes in org.apache.hadoop.mapred
are deprecated. Differences in the new API:
1. All of the methods take Context objects that allow us to add new
methods without breaking compatibility.
2. Mapper and Reducer now have a "run" method that is called once and
contains the control loop for the task, which lets applications
replace it.
3. Mapper and Reducer by default are Identity Mapper and Reducer.
4. The FileOutputFormats use part-r-00000 for the output of reduce 0 and
part-m-00000 for the output of map 0.
5. The reduce grouping comparator now uses the raw compare instead of
object compare.
6. The number of maps in FileInputFormat is controlled by min and max
split size rather than min size and the desired number of maps.
(omalley)
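Items 1 to 3 of the HADOOP-1230 entry above can be sketched in a few lines: methods take a Context object, the base Mapper is the identity, and a "run" method holds the control loop so that applications can replace it. The sketch uses simplified, hypothetical stand-ins (string records, a toy Context) and is not the org.apache.hadoop.mapreduce API.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical illustration only; not the org.apache.hadoop.mapreduce classes.
final class NewApiSketch {
    // Stand-in for the Context object: new methods can be added here
    // without changing the signatures of map() or run().
    static final class Context {
        private final Iterator<String> input;
        private String current;
        final List<String> output = new ArrayList<>();

        Context(List<String> records) {
            this.input = records.iterator();
        }

        boolean nextRecord() {
            if (!input.hasNext()) {
                return false;
            }
            current = input.next();
            return true;
        }

        String currentRecord() {
            return current;
        }

        void write(String value) {
            output.add(value);
        }
    }

    // The base class is an identity mapper by default.
    static class Mapper {
        protected void map(String record, Context context) {
            context.write(record);
        }

        // Called once per task; subclasses may override the whole loop.
        public void run(Context context) {
            while (context.nextRecord()) {
                map(context.currentRecord(), context);
            }
        }
    }

    // An application mapper overriding only map().
    static final class UpperCaseMapper extends Mapper {
        @Override
        protected void map(String record, Context context) {
            context.write(record.toUpperCase());
        }
    }
}
```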
HADOOP-3305. Use Ivy to manage dependencies. (Giridharan Kesavan
and Steve Loughran via cutting)
IMPROVEMENTS
HADOOP-4749. Added a new counter REDUCE_INPUT_BYTES. (Yongqiang He via
zshao)
HADOOP-4234. Fix KFS "glue" layer to allow applications to interface
with multiple KFS metaservers. (Sriram Rao via lohit)
HADOOP-4245. Update to latest version of KFS "glue" library jar.
(Sriram Rao via lohit)
HADOOP-4244. Change test-patch.sh to check Eclipse classpath regardless of
whether it is run by Hudson. (szetszwo)
HADOOP-3180. Add name of missing class to WritableName.getClass
IOException. (Pete Wyckoff via omalley)
HADOOP-4178. Make the capacity scheduler's default values configurable.
(Sreekanth Ramakrishnan via omalley)
HADOOP-4262. Generate better error message when client exception has null
message. (stevel via omalley)
HADOOP-4226. Refactor and document LineReader to make it more readily
understandable. (Yuri Pradkin via cdouglas)
HADOOP-4238. When listing jobs, if scheduling information isn't available
print NA instead of empty output. (Sreekanth Ramakrishnan via johan)
HADOOP-4284. Support filters that apply to all requests, or global filters,
to HttpServer. (Kan Zhang via cdouglas)
HADOOP-4276. Improve the hashing functions and deserialization of the
mapred ID classes. (omalley)
HADOOP-4485. Add a compile-native ant task, as a shorthand. (enis)
HADOOP-4454. Allow # comments in slaves file. (Rama Ramasamy via omalley)
HADOOP-3461. Remove hdfs.StringBytesWritable. (szetszwo)
HADOOP-4437. Use Halton sequence instead of java.util.Random in
PiEstimator. (szetszwo)
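For readers unfamiliar with the Halton sequence named in HADOOP-4437, the following is a minimal sketch of the standard radical-inverse construction and a quasi-Monte Carlo estimate of pi built on it. The bases 2 and 3, the class name, and the method names are assumptions for illustration; this is not the PiEstimator source.

```java
// Hypothetical illustration only; not the PiEstimator implementation.
final class HaltonSketch {
    // Returns the index-th element (1-based) of the Halton sequence in the
    // given base by mirroring the base-b digits of index around the radix point.
    static double halton(long index, int base) {
        double result = 0.0;
        double fraction = 1.0 / base;
        for (long i = index; i > 0; i /= base) {
            result += fraction * (i % base);
            fraction /= base;
        }
        return result;
    }

    // Estimates pi from the share of 2-D Halton points (bases 2 and 3)
    // that fall inside the circle of radius 0.5 centred in the unit square.
    static double estimatePi(int samples) {
        long inside = 0;
        for (int i = 1; i <= samples; i++) {
            double x = halton(i, 2) - 0.5;
            double y = halton(i, 3) - 0.5;
            if (x * x + y * y <= 0.25) {
                inside++;
            }
        }
        return 4.0 * inside / samples;
    }
}
```

Unlike java.util.Random, the sequence is deterministic and fills the unit square evenly, so the estimate is reproducible from run to run.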
HADOOP-4572. Change INode and its sub-classes to package private.
(szetszwo)
HADOOP-4187. Does a runtime lookup for JobConf/JobConfigurable, and if
found, invokes the appropriate configure method. (Sharad Agarwal via ddas)
HADOOP-4453. Improve ssl configuration and handling in HsftpFileSystem,
particularly when used with DistCp. (Kan Zhang via cdouglas)
HADOOP-4583. Several code optimizations in HDFS. (Suresh Srinivas via
szetszwo)
HADOOP-3923. Remove org.apache.hadoop.mapred.StatusHttpServer. (szetszwo)
HADOOP-4622. Explicitly specify interpreter for non-native
pipes binaries. (Fredrik Hedberg via johan)
HADOOP-4505. Add a unit test to test faulty setup task and cleanup
task killing the job. (Amareshwari Sriramadasu via johan)
HADOOP-4608. Don't print a stack trace when the example driver gets an
unknown program to run. (Edward Yoon via omalley)
HADOOP-4645. Package HdfsProxy contrib project without the extra level
of directories. (Kan Zhang via omalley)
HADOOP-4126. Allow access to HDFS web UI on EC2 (tomwhite via omalley)
HADOOP-4612. Removes RunJar's dependency on JobClient.
(Sharad Agarwal via ddas)
HADOOP-4185. Adds setVerifyChecksum() method to FileSystem.
(Sharad Agarwal via ddas)
HADOOP-4523. Prevent too many tasks scheduled on a node from bringing
it down by monitoring for cumulative memory usage across tasks.
(Vinod Kumar Vavilapalli via yhemanth)
HADOOP-4640. Adds an input format that can split lzo compressed
text files. (johan)
HADOOP-4666. Launch reduces only after a few maps have run in the
Fair Scheduler. (Matei Zaharia via johan)
HADOOP-4339. Remove redundant calls from FileSystem/FsShell when
generating/processing ContentSummary. (David Phillips via cdouglas)
HADOOP-2774. Add counters tracking records spilled to disk in MapTask and
ReduceTask. (Ravi Gummadi via cdouglas)
HADOOP-4513. Initialize jobs asynchronously in the capacity scheduler.
(Sreekanth Ramakrishnan via yhemanth)
HADOOP-4649. Improve abstraction for spill indices. (cdouglas)
HADOOP-3770. Add gridmix2, an iteration on the gridmix benchmark. (Runping
Qi via cdouglas)
HADOOP-4708. Add support for dfsadmin commands in TestCLI. (Boris Shkolnik
via cdouglas)
HADOOP-4758. Add a splitter for metrics contexts to support more than one
type of collector. (cdouglas)
HADOOP-4722. Add tests for dfsadmin quota error messages. (Boris Shkolnik
via cdouglas)
HADOOP-4690. fuse-dfs - create source file/function + utils + config +
main source files. (pete wyckoff via mahadev)
HADOOP-3750. Fix and enforce module dependencies. (Sharad Agarwal via
tomwhite)
HADOOP-4747. Speed up FsShell::ls by removing redundant calls to the
filesystem. (David Phillips via cdouglas)
HADOOP-4305. Improves the blacklisting strategy, whereby, tasktrackers
that are blacklisted are not given tasks to run from other jobs, subject
to the following conditions (all must be met):
1) The TaskTracker has been blacklisted by at least 4 jobs (configurable)
2) The TaskTracker has been blacklisted 50% more times than
the average (configurable)
3) The cluster has less than 50% trackers blacklisted
Once in 24 hours, a TaskTracker blacklisted for all jobs is given a chance.
Restarting the TaskTracker moves it out of the blacklist.
(Amareshwari Sriramadasu via ddas)
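The three conditions listed for HADOOP-4305 can be read as a single predicate. The sketch below is one possible reading, assuming that "50% more than the average" means strictly greater than 1.5 times the average and that the cluster check is a strict inequality; the constants mirror the defaults stated in the entry, and all names are hypothetical rather than taken from the JobTracker.

```java
// Hypothetical illustration only; not the JobTracker blacklisting code.
final class BlacklistPolicySketch {
    static final int MIN_BLACKLISTING_JOBS = 4;        // condition 1 (configurable)
    static final double ABOVE_AVERAGE_FACTOR = 1.5;    // condition 2 (configurable)
    static final double MAX_CLUSTER_FRACTION = 0.5;    // condition 3

    // Returns true only when all three conditions from the entry hold.
    static boolean shouldBlacklistAcrossJobs(int trackerBlacklistCount,
                                             double averageBlacklistCount,
                                             int blacklistedTrackers,
                                             int totalTrackers) {
        boolean enoughJobs = trackerBlacklistCount >= MIN_BLACKLISTING_JOBS;
        boolean aboveAverage =
            trackerBlacklistCount > ABOVE_AVERAGE_FACTOR * averageBlacklistCount;
        boolean clusterHealthy =
            blacklistedTrackers < MAX_CLUSTER_FRACTION * totalTrackers;
        return enoughJobs && aboveAverage && clusterHealthy;
    }
}
```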
HADOOP-4688. Modify the MiniMRDFSSort unit test to spill multiple times,
exercising the map-side merge code. (cdouglas)
HADOOP-4737. Adds the KILLED notification when jobs get killed.
(Amareshwari Sriramadasu via ddas)
HADOOP-4728. Add a test exercising different namenode configurations.
(Boris Shkolnik via cdouglas)
HADOOP-4807. Adds JobClient commands to get the active/blacklisted tracker
names. Also adds commands to display running/completed task attempt IDs.
(ddas)
HADOOP-4699. Remove checksum validation from map output servlet. (cdouglas)
HADOOP-4838. Added a registry to automate metrics and mbeans management.
(Sanjay Radia via acmurthy)
HADOOP-3136. Fixed the default scheduler to assign multiple tasks to each
tasktracker per heartbeat, when feasible. To ensure locality isn't hurt
too badly, the scheduler will not assign more than one off-switch task per
heartbeat. The heartbeat interval is also halved since the task-tracker is
fixed to no longer send out heartbeats on each task completion. A
slow-start for scheduling reduces is introduced to ensure that reduces
aren't started until a sufficient number of maps are done, else reduces of
jobs whose maps aren't scheduled might swamp the cluster.
Configuration changes to mapred-default.xml:
add mapred.reduce.slowstart.completed.maps
(acmurthy)
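The slow-start rule in HADOOP-3136 amounts to a threshold check before reduces are scheduled. The sketch below assumes that mapred.reduce.slowstart.completed.maps is read as a fraction of the job's maps between 0 and 1; the class and method names are hypothetical, and the scheduler's real computation may differ.

```java
// Hypothetical illustration only; not the scheduler implementation.
final class SlowStartSketch {
    // Returns true once enough maps have finished for reduces to be scheduled.
    // completedMapsFraction stands in for the value of
    // mapred.reduce.slowstart.completed.maps, assumed here to lie in [0, 1].
    static boolean canScheduleReduces(int completedMaps,
                                      int totalMaps,
                                      double completedMapsFraction) {
        return completedMaps >= completedMapsFraction * totalMaps;
    }
}
```

With a fraction of 0.5 and 100 maps, reduces become eligible once 50 maps have completed; a fraction of 0 makes them eligible immediately.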
HADOOP-4545. Add example and test case of secondary sort for the reduce.
(omalley)
HADOOP-4753. Refactor gridmix2 to reduce code duplication. (cdouglas)
HADOOP-4909. Fix Javadoc and make some of the API more consistent in their
use of the JobContext instead of Configuration. (omalley)
HADOOP-4920. Stop storing Forrest output in Subversion. (cutting)
HADOOP-4948. Add parameters java5.home and forrest.home to the ant commands
in test-patch.sh. (Giridharan Kesavan via szetszwo)
HADOOP-4830. Add end-to-end test cases for testing queue capacities.
(Vinod Kumar Vavilapalli via yhemanth)
HADOOP-4980. Improve code layout of capacity scheduler to make it
easier to fix some blocker bugs. (Vivek Ratan via yhemanth)
HADOOP-4916. Make user/location of Chukwa installation configurable by an
external properties file. (Eric Yang via cdouglas)
HADOOP-4950. Make the CompressorStream, DecompressorStream,
BlockCompressorStream, and BlockDecompressorStream public to facilitate
non-Hadoop codecs. (omalley)
HADOOP-4843. Collect job history and configuration in Chukwa. (Eric Yang
via cdouglas)
HADOOP-5030. Build Chukwa RPM to install into configured directory. (Eric
Yang via cdouglas)
HADOOP-4828. Updates documents to do with configuration (HADOOP-4631).
(Sharad Agarwal via ddas)
HADOOP-4939. Adds a test that would inject random failures for tasks in
large jobs and would also inject TaskTracker failures. (ddas)
HADOOP-4944. A configuration file can include other configuration
files. (Rama Ramasamy via dhruba)
HADOOP-4804. Provide Forrest documentation for the Fair Scheduler.
(Sreekanth Ramakrishnan via yhemanth)
HADOOP-5248. A testcase that checks for the existence of job directory
after the job completes. Fails if it exists. (ddas)
HADOOP-4664. Introduces multiple job initialization threads, where the
number of threads is configurable via mapred.jobinit.threads.
(Matei Zaharia and Jothi Padmanabhan via ddas)
HADOOP-4191. Adds a testcase for JobHistory. (Ravi Gummadi via ddas)
HADOOP-5466. Change documentation CSS style for headers and code. (Corinne
Chandel via szetszwo)
HADOOP-5275. Add ivy directory and files to built tar.
(Giridharan Kesavan via nigel)
HADOOP-5468. Add sub-menus to forrest documentation and make some minor
edits. (Corinne Chandel via szetszwo)
HADOOP-5437. Fix TestMiniMRDFSSort to properly test jvm-reuse. (omalley)
HADOOP-5521. Removes dependency of TestJobInProgress on RESTART_COUNT
JobHistory tag. (Ravi Gummadi via ddas)
OPTIMIZATIONS
HADOOP-3293. Fixes FileInputFormat to provide locations for splits
based on the rack/host that has the most number of bytes.
(Jothi Padmanabhan via ddas)
HADOOP-4683. Fixes Reduce shuffle scheduler to invoke
getMapCompletionEvents in a separate thread. (Jothi Padmanabhan
via ddas)
BUG FIXES
HADOOP-4204. Fix findbugs warnings related to unused variables, naive
Number subclass instantiation, Map iteration, and badly scoped inner
classes. (Suresh Srinivas via cdouglas)
HADOOP-4207. Update derby jar file to release 10.4.2.
(Prasad Chakka via dhruba)
HADOOP-4325. SocketInputStream.read() should return -1 in case of EOF.
(Raghu Angadi)
HADOOP-4408. FsAction functions need not create new objects. (cdouglas)
HADOOP-4440. TestJobInProgressListener tests for jobs killed in queued
state (Amar Kamat via ddas)
HADOOP-4346. Implement blocking connect so that Hadoop is not affected
by selector problem with JDK default implementation. (Raghu Angadi)
HADOOP-4388. If there are invalid blocks in the transfer list, Datanode
should handle them and keep transferring the remaining blocks. (Suresh
Srinivas via szetszwo)
HADOOP-4587. Fix a typo in Mapper javadoc. (Koji Noguchi via szetszwo)
HADOOP-4530. In fsck, HttpServletResponse sendError fails with
IllegalStateException. (hairong)
HADOOP-4377. Fix a race condition in directory creation in
NativeS3FileSystem. (David Phillips via cdouglas)
HADOOP-4621. Fix javadoc warnings caused by duplicate jars. (Kan Zhang via
cdouglas)
HADOOP-4566. Deploy new hive code to support more types.
(Zheng Shao via dhruba)
HADOOP-4571. Add chukwa conf files to svn:ignore list. (Eric Yang via
szetszwo)
HADOOP-4589. Correct PiEstimator output messages and improve the code
readability. (szetszwo)
HADOOP-4650. Correct a mismatch between the default value of
local.cache.size in the config and the source. (Jeff Hammerbacher via
cdouglas)
HADOOP-4606. Fix cygpath error if the log directory does not exist.
(szetszwo via omalley)
HADOOP-4141. Fix bug in ScriptBasedMapping causing potential infinite
loop on misconfigured hadoop-site. (Aaron Kimball via tomwhite)
HADOOP-4691. Correct a link in the javadoc of IndexedSortable. (szetszwo)
HADOOP-4598. '-setrep' command skips under-replicated blocks. (hairong)
HADOOP-4429. Set defaults for user, group in UnixUserGroupInformation so
login fails more predictably when misconfigured. (Alex Loddengaard via
cdouglas)
HADOOP-4676. Fix broken URL in blacklisted tasktrackers page. (Amareshwari
Sriramadasu via cdouglas)
HADOOP-3422. Ganglia counter metrics are all reported with the metric
name "value", so the counter values can not be seen. (Jason Attributor
and Brian Bockelman via stack)
HADOOP-4704. Fix javadoc typos "the the". (szetszwo)
HADOOP-4677. Fix semantics of FileSystem::getBlockLocations to return
meaningful values. (Hong Tang via cdouglas)
HADOOP-4669. Use correct operator when evaluating whether access time is
enabled (Dhruba Borthakur via cdouglas)
HADOOP-4732. Pass connection and read timeouts in the correct order when
setting up fetch in reduce. (Amareshwari Sriramadasu via cdouglas)
HADOOP-4558. Fix capacity reclamation in capacity scheduler.
(Amar Kamat via yhemanth)
HADOOP-4770. Fix rungridmix_2 script to work with RunJar. (cdouglas)
HADOOP-4738. When using git, the saveVersion script will use only the
commit hash for the version and not the message, which requires escaping.
(cdouglas)
HADOOP-4576. Show pending job count instead of task count in the UI per
queue in capacity scheduler. (Sreekanth Ramakrishnan via yhemanth)
HADOOP-4623. Maintain running tasks even if speculative execution is off.
(Amar Kamat via yhemanth)
HADOOP-4786. Fix broken compilation error in
TestTrackerBlacklistAcrossJobs. (yhemanth)
HADOOP-4785. Fixes the JobTracker heartbeat to not make two calls to
System.currentTimeMillis(). (Amareshwari Sriramadasu via ddas)
HADOOP-4792. Add generated Chukwa configuration files to version control
ignore lists. (cdouglas)
HADOOP-4796. Fix Chukwa test configuration, remove unused components. (Eric
Yang via cdouglas)
HADOOP-4708. Add binaries missed in the initial checkin for Chukwa. (Eric
Yang via cdouglas)
HADOOP-4805. Remove black list collector from Chukwa Agent HTTP Sender.
(Eric Yang via cdouglas)
HADOOP-4837. Move HADOOP_CONF_DIR configuration to chukwa-env.sh (Jerome
Boulon via cdouglas)
HADOOP-4825. Use ps instead of jps for querying process status in Chukwa.
(Eric Yang via cdouglas)
HADOOP-4844. Fixed javadoc for
org.apache.hadoop.fs.permission.AccessControlException to document that
it's deprecated in favour of
org.apache.hadoop.security.AccessControlException. (acmurthy)
HADOOP-4706. Close the underlying output stream in
IFileOutputStream::close. (Jothi Padmanabhan via cdouglas)
HADOOP-4855. Fixed command-specific help messages for refreshServiceAcl in
DFSAdmin and MRAdmin. (acmurthy)
HADOOP-4820. Remove unused method FSNamesystem::deleteInSafeMode. (Suresh
Srinivas via cdouglas)
HADOOP-4698. Lower io.sort.mb to 10 in the tests and raise the junit memory
limit to 512m from 256m. (Nigel Daley via cdouglas)
HADOOP-4860. Split TestFileTailingAdapters into three separate tests to
avoid contention. (Eric Yang via cdouglas)
HADOOP-3921. Fixed clover (code coverage) target to work with JDK 6.
(tomwhite via nigel)
HADOOP-4845. Modify the reduce input byte counter to record only the
compressed size and add a human-readable label. (Yongqiang He via cdouglas)
HADOOP-4458. Add a test creating symlinks in the working directory.
(Amareshwari Sriramadasu via cdouglas)
HADOOP-4879. Fix org.apache.hadoop.mapred.Counters to correctly define
Object.equals rather than depend on contentEquals api. (omalley via
acmurthy)
HADOOP-4791. Fix rpm build process for Chukwa. (Eric Yang via cdouglas)
HADOOP-4771. Correct initialization of the file count for directories
with quotas. (Ruyue Ma via shv)
HADOOP-4878. Fix eclipse plugin classpath file to point to ivy's resolved
lib directory and added the same to test-patch.sh. (Giridharan Kesavan via
acmurthy)
HADOOP-4774. Fix default values of some capacity scheduler configuration
items which would otherwise not work on a fresh checkout.
(Sreekanth Ramakrishnan via yhemanth)
HADOOP-4876. Fix capacity scheduler reclamation by updating count of
pending tasks correctly. (Sreekanth Ramakrishnan via yhemanth)
HADOOP-4849. Documentation for Service Level Authorization implemented in
HADOOP-4348. (acmurthy)
HADOOP-4827. Replace Consolidator with Aggregator macros in Chukwa (Eric
Yang via cdouglas)
HADOOP-4894. Correctly parse ps output in Chukwa jettyCollector.sh. (Ari
Rabkin via cdouglas)
HADOOP-4892. Close fds out of Chukwa ExecPlugin. (Ari Rabkin via cdouglas)
HADOOP-4889. Fix permissions in RPM packaging. (Eric Yang via cdouglas)
HADOOP-4869. Fixes the TT-JT heartbeat to have an explicit flag for
restart apart from the initialContact flag that there was earlier.
(Amareshwari Sriramadasu via ddas)
HADOOP-4716. Fixes ReduceTask.java to clear out the mapping between
hosts and MapOutputLocation upon a JT restart (Amar Kamat via ddas)
HADOOP-4880. Removes an unnecessary testcase from TestJobTrackerRestart.
(Amar Kamat via ddas)
HADOOP-4924. Fixes a race condition in TaskTracker re-init. (ddas)
HADOOP-4854. Read reclaim capacity interval from capacity scheduler
configuration. (Sreekanth Ramakrishnan via yhemanth)
HADOOP-4896. HDFS Fsck does not load HDFS configuration. (Raghu Angadi)
HADOOP-4956. Creates TaskStatus for failed tasks with an empty Counters
object instead of null. (ddas)
HADOOP-4979. Fix capacity scheduler to block cluster for failed high
RAM requirements across task types. (Vivek Ratan via yhemanth)
HADOOP-4949. Fix native compilation. (Chris Douglas via acmurthy)
HADOOP-4787. Fixes the testcase TestTrackerBlacklistAcrossJobs which was
earlier failing randomly. (Amareshwari Sriramadasu via ddas)
HADOOP-4914. Add description fields to Chukwa init.d scripts (Eric Yang via
cdouglas)
HADOOP-4884. Make tool tip date format match standard HICC format. (Eric
Yang via cdouglas)
HADOOP-4925. Make Chukwa sender properties configurable. (Ari Rabkin via
cdouglas)
HADOOP-4947. Make Chukwa command parsing more forgiving of whitespace. (Ari
Rabkin via cdouglas)
HADOOP-5026. Make chukwa/bin scripts executable in repository. (Andy
Konwinski via cdouglas)
HADOOP-4977. Fix a deadlock between the reclaimCapacity and assignTasks
in capacity scheduler. (Vivek Ratan via yhemanth)
HADOOP-4988. Fix reclaim capacity to work even when there are queues with
no capacity. (Vivek Ratan via yhemanth)
HADOOP-5065. Remove generic parameters from argument to
setIn/OutputFormatClass so that it works with SequenceIn/OutputFormat.
(cdouglas via omalley)
HADOOP-4818. Pass user config to instrumentation API. (Eric Yang via
cdouglas)
HADOOP-4993. Fix Chukwa agent configuration and startup to make it both
more modular and testable. (Ari Rabkin via cdouglas)
HADOOP-5048. Fix capacity scheduler to correctly cleanup jobs that are
killed after initialization, but before running.
(Sreekanth Ramakrishnan via yhemanth)
HADOOP-4671. Mark loop control variables shared between threads as
volatile. (cdouglas)
HADOOP-5079. HashFunction inadvertently destroys some randomness
(Jonathan Ellis via stack)
HADOOP-4999. A failure to write to FsEditsLog results in
IndexOutOfBounds exception. (Boris Shkolnik via rangadi)
HADOOP-5139. Catch IllegalArgumentException during metrics registration
in RPC. (Hairong Kuang via szetszwo)
HADOOP-5085. Copying a file to local with Crc throws an exception.
(hairong)
HADOOP-5211. Fix check for job completion in TestSetupAndCleanupFailure.
(enis)
HADOOP-5254. The Configuration class should be able to work with XML
parsers that do not support xmlinclude. (Steve Loughran via dhruba)
HADOOP-4692. Namenode in infinite loop for replicating/deleting corrupt
blocks. (hairong)
HADOOP-5255. Fix use of Math.abs to avoid overflow. (Jonathan Ellis via
cdouglas)
HADOOP-5269. Fixes a problem to do with tasktracker holding on to
FAILED_UNCLEAN or KILLED_UNCLEAN tasks forever. (Amareshwari Sriramadasu
via ddas)
HADOOP-5214. Fixes a ConcurrentModificationException while the Fairshare
Scheduler accesses the tasktrackers stored by the JobTracker.
(Rahul Kumar Singh via yhemanth)
HADOOP-5233. Addresses the three issues - Race condition in updating
status, NPE in TaskTracker task localization when the conf file is missing
(HADOOP-5234) and NPE in handling KillTaskAction of a cleanup task
(HADOOP-5235). (Amareshwari Sriramadasu via ddas)
HADOOP-5247. Introduces a broadcast of KillJobAction to all trackers when
a job finishes. This fixes a bunch of problems to do with NPE when a
completed job is not in memory and a tasktracker comes to the jobtracker
with a status report of a task belonging to that job. (Amar Kamat via ddas)
HADOOP-5282. Fixed job history logs for task attempts that are
failed by the JobTracker, say due to lost task trackers. (Amar
Kamat via yhemanth)
HADOOP-5241. Fixes a bug in disk-space resource estimation. Makes
the estimation formula linear where blowUp =
Total-Output/Total-Input. (Sharad Agarwal via ddas)
HADOOP-5142. Fix MapWritable#putAll to store key/value classes.
(Doğacan Güney via enis)
HADOOP-4744. Workaround for jetty6 returning -1 when getLocalPort
is invoked on the connector. The workaround patch retries a few
times before failing. (Jothi Padmanabhan via yhemanth)
HADOOP-5280. Adds a check to prevent a task state transition from
FAILED to any of UNASSIGNED, RUNNING, COMMIT_PENDING or
SUCCEEDED. (ddas)
HADOOP-5272. Fixes a problem to do with detecting whether an
attempt is the first attempt of a Task. This affects JobTracker
restart. (Amar Kamat via ddas)
HADOOP-5306. Fixes a problem to do with logging/parsing the http port of a
lost tracker. Affects JobTracker restart. (Amar Kamat via ddas)
HADOOP-5111. Fix Job::set* methods to work with generics. (cdouglas)
HADOOP-5274. Fix gridmix2 dependency on wordcount example. (cdouglas)
HADOOP-5145. Balancer sometimes runs out of memory after running
days or weeks. (hairong)
HADOOP-5338. Fix jobtracker restart to clear task completion
events cached by tasktrackers forcing them to fetch all events
afresh, thus avoiding missed task completion events on the
tasktrackers. (Amar Kamat via yhemanth)
HADOOP-4695. Change TestGlobalFilter so that it allows a web page to be
filtered more than once for a single access. (Kan Zhang via szetszwo)
HADOOP-5298. Change TestServletFilter so that it allows a web page to be
filtered more than once for a single access. (szetszwo)
HADOOP-5432. Disable ssl during unit tests in hdfsproxy, as it is unused
and causes failures. (cdouglas)
HADOOP-5416. Correct the shell command "fs -test" forrest doc description.
(Ravi Phulari via szetszwo)
HADOOP-5327. Fixed job tracker to remove files from system directory on
ACL check failures and also check ACLs on restart.
(Amar Kamat via yhemanth)
HADOOP-5395. Change the exception message when a job is submitted to an
invalid queue. (Rahul Kumar Singh via yhemanth)
HADOOP-5276. Fixes a problem to do with updating the start time of
a task when the tracker that ran the task is lost. (Amar Kamat via
ddas)
HADOOP-5278. Fixes a problem to do with logging the finish time of
a task during recovery (after a JobTracker restart). (Amar Kamat
via ddas)
HADOOP-5490. Fixes a synchronization problem in the
EagerTaskInitializationListener class. (Jothi Padmanabhan via
ddas)
HADOOP-5493. The shuffle copier threads return the codecs back to
the pool when the shuffle completes. (Jothi Padmanabhan via ddas)
HADOOP-5414. Fixes IO exception while executing hadoop fs -touchz
fileName by making sure that lease renewal thread exits before dfs
client exits. (hairong)
HADOOP-5103. FileInputFormat now reuses the clusterMap network
topology object and that brings down the log messages in the
JobClient to do with NetworkTopology.add significantly. (Jothi
Padmanabhan via ddas)
HADOOP-5483. Fixes a problem in the Directory Cleanup Thread due to which
TestMiniMRWithDFS sometimes used to fail. (ddas)
HADOOP-5281. Prevent sharing incompatible ZlibCompressor instances between
GzipCodec and DefaultCodec. (cdouglas)
HADOOP-5463. Balancer throws "Not a host:port pair" unless port is
specified in fs.default.name. (Stuart White via hairong)
HADOOP-5514. Fix JobTracker metrics and add metrics for waiting, failed
tasks. (cdouglas)
HADOOP-5516. Fix NullPointerException in TaskMemoryManagerThread
that comes when monitored processes disappear when the thread is
running. (Vinod Kumar Vavilapalli via yhemanth)
HADOOP-5382. Support combiners in the new context object API. (omalley)
HADOOP-5471. Fixes a problem to do with updating the log.index file in the
case where a cleanup task is run. (Amareshwari Sriramadasu via ddas)
HADOOP-5534. Fixed a deadlock in Fair scheduler's servlet.
(Rahul Kumar Singh via yhemanth)
HADOOP-5328. Fixes a problem in the renaming of job history files during
job recovery. (Amar Kamat via ddas)
HADOOP-5417. Don't ignore InterruptedExceptions that happen when calling
into rpc. (omalley)
HADOOP-5320. Add a close() in TestMapReduceLocal. (Jothi Padmanabhan
via szetszwo)
HADOOP-5520. Fix a typo in disk quota help message. (Ravi Phulari
via szetszwo)
HADOOP-5519. Remove claims from mapred-default.xml that prime numbers
of tasks are helpful. (Owen O'Malley via szetszwo)
HADOOP-5484. TestRecoveryManager fails with FileAlreadyExistsException.
(Amar Kamat via hairong)
HADOOP-5564. Limit the JVM heap size in the java command for initializing
JAVA_PLATFORM. (Suresh Srinivas via szetszwo)
HADOOP-5565. Add API for failing/finalized jobs to the JT metrics
instrumentation. (Jerome Boulon via cdouglas)
HADOOP-5390. Remove duplicate jars from tarball, src from binary tarball
added by hdfsproxy. (Zhiyong Zhang via cdouglas)
HADOOP-5066. Building binary tarball should not build docs/javadocs, copy
src, or run jdiff. (Giridharan Kesavan via cdouglas)
HADOOP-5459. Fix undetected CRC errors where intermediate output is closed
before it has been completely consumed. (cdouglas)
HADOOP-5571. Remove widening primitive conversion in TupleWritable mask
manipulation. (Jingkei Ly via cdouglas)
HADOOP-5588. Remove an unnecessary call to listStatus(..) in
FileSystem.globStatusInternal(..). (Hairong Kuang via szetszwo)
HADOOP-5473. Solves a race condition in killing a task - the state is KILLED
if there is a user request pending to kill the task and the TT reported
the state as SUCCESS. (Amareshwari Sriramadasu via ddas)
HADOOP-5576. Fix LocalRunner to work with the new context object API in
mapreduce. (Tom White via omalley)
HADOOP-4374. Installs a shutdown hook in the Task JVM so that log.index is
updated before the JVM exits. Also makes the update to log.index atomic.
(Ravi Gummadi via ddas)
HADOOP-5577. Add a verbose flag to mapreduce.Job.waitForCompletion to get
the running job's information printed to the user's stdout as it runs.
(omalley)
HADOOP-5607. Fix NPE in TestCapacityScheduler. (cdouglas)
HADOOP-5605. All the replicas incorrectly got marked as corrupt. (hairong)
HADOOP-5337. JobTracker, upon restart, now waits for the TaskTrackers to
join back before scheduling new tasks. This fixes race conditions associated
with greedy scheduling as was the case earlier. (Amar Kamat via ddas)
HADOOP-5227. Fix distcp so -update and -delete can be meaningfully
combined. (Tsz Wo (Nicholas), SZE via cdouglas)
HADOOP-5305. Increase number of files and print debug messages in
TestCopyFiles. (szetszwo)
HADOOP-5548. Add synchronization for JobTracker methods in RecoveryManager.
(Amareshwari Sriramadasu via sharad)
HADOOP-3810. NameNode seems unstable on a cluster with little space left.
(hairong)
HADOOP-5068. Fix NPE in TestCapacityScheduler. (Vinod Kumar Vavilapalli
via szetszwo)
HADOOP-5585. Clear FileSystem statistics between tasks when jvm-reuse
is enabled. (omalley)
HADOOP-5394. JobTracker might schedule 2 attempts of the same task
with the same attempt id across restarts. (Amar Kamat via sharad)
HADOOP-5645. After HADOOP-4920 we need a place to checkin
releasenotes.html. (nigel)
Release 0.19.2 - 2009-06-30
BUG FIXES
HADOOP-5154. Fixes a deadlock in the fairshare scheduler.
(Matei Zaharia via yhemanth)
HADOOP-5146. Fixes a race condition that causes LocalDirAllocator to miss
files. (Devaraj Das via yhemanth)
HADOOP-4638. Fixes job recovery to not crash the job tracker for problems
with a single job file. (Amar Kamat via yhemanth)
HADOOP-5384. Fix a problem that DataNodeCluster creates blocks with
generationStamp == 1. (szetszwo)
HADOOP-5376. Fixes the code handling lost tasktrackers to set the task state
to KILLED_UNCLEAN only for relevant type of tasks.
(Amareshwari Sriramadasu via yhemanth)
HADOOP-5285. Fixes the issues - (1) obtainTaskCleanupTask checks whether job is
inited before trying to lock the JobInProgress (2) Moves the CleanupQueue class
outside the TaskTracker and makes it a generic class that is used by the
JobTracker also for deleting the paths on the job's output fs. (3) Moves the
references to completedJobStore outside the block where the JobTracker is locked.
(ddas)
HADOOP-5392. Fixes a problem to do with JT crashing during recovery when
the job files are garbled. (Amar Kamat via ddas)
HADOOP-5332. Appending to files is not allowed (by default) unless
dfs.support.append is set to true. (dhruba)
HADOOP-5333. libhdfs supports appending to files. (dhruba)
HADOOP-3998. Fix dfsclient exception when JVM is shutdown. (dhruba)
HADOOP-5440. Fixes a problem to do with removing a taskId from the list
of taskIds that the TaskTracker's TaskMemoryManager manages.
(Amareshwari Sriramadasu via ddas)
HADOOP-5446. Restore TaskTracker metrics. (cdouglas)
HADOOP-5449. Fixes the history cleaner thread.
(Amareshwari Sriramadasu via ddas)
HADOOP-5479. NameNode should not send empty block replication request to
DataNode. (hairong)
HADOOP-5259. Job with output hdfs:/user/<username>/outputpath (no
authority) fails with Wrong FS. (Doug Cutting via hairong)
HADOOP-5522. Documents the setup/cleanup tasks in the mapred tutorial.
(Amareshwari Sriramadasu via ddas)
HADOOP-5549. ReplicationMonitor should schedule both replication and
deletion work in one iteration. (hairong)
HADOOP-5554. DataNodeCluster and CreateEditsLog should create blocks with
the same generation stamp value. (hairong via szetszwo)
HADOOP-5231. Clones the TaskStatus before passing it to the JobInProgress.
(Amareshwari Sriramadasu via ddas)
HADOOP-4719. Fix documentation of 'ls' format for FsShell. (Ravi Phulari
via cdouglas)
HADOOP-5374. Fixes a NPE problem in getTasksToSave method.
(Amareshwari Sriramadasu via ddas)
HADOOP-4780. Cache the size of directories in DistributedCache, avoiding
long delays in recalculating it. (He Yongqiang via cdouglas)
HADOOP-5551. Prevent directory destruction on file create.
(Brian Bockelman via shv)
HADOOP-5671. Fix FNF exceptions when copying from old versions of
HftpFileSystem. (Tsz Wo (Nicholas), SZE via cdouglas)
HADOOP-5213. Fix Null pointer exception caused when bzip2 compression
was used and user closed an output stream without writing any data.
(Zheng Shao via dhruba)
HADOOP-5579. Set errno correctly in libhdfs for permission, quota, and FNF
conditions. (Brian Bockelman via cdouglas)
HADOOP-5816. Fixes a problem in the KeyFieldBasedComparator to do with
ArrayIndexOutOfBounds exception. (He Yongqiang via ddas)
HADOOP-5951. Add Apache license header to StorageInfo.java. (Suresh
Srinivas via szetszwo)
Release 0.19.1 - 2009-02-23
IMPROVEMENTS
HADOOP-4739. Fix spelling and grammar, improve phrasing of some sections in
mapred tutorial. (Vivek Ratan via cdouglas)
HADOOP-3894. DFSClient logging improvements. (Steve Loughran via shv)
HADOOP-5126. Remove empty file BlocksWithLocations.java (shv)
HADOOP-5127. Remove public methods in FSDirectory. (Jakob Homan via shv)
BUG FIXES
HADOOP-4697. Fix getBlockLocations in KosmosFileSystem to handle multiple
blocks correctly. (Sriram Rao via cdouglas)
HADOOP-4420. Add null checks for job, caused by invalid job IDs.
(Aaron Kimball via tomwhite)
HADOOP-4632. Fix TestJobHistoryVersion to use test.build.dir instead of the
current working directory for scratch space. (Amar Kamat via cdouglas)
HADOOP-4508. Fix FSDataOutputStream.getPos() for append. (dhruba via
szetszwo)
HADOOP-4727. Fix a group checking bug in fill_stat_structure(...) in
fuse-dfs. (Brian Bockelman via szetszwo)
HADOOP-4836. Correct typos in mapred related documentation. (Jordà Polo
via szetszwo)
HADOOP-4821. Usage descriptions in the Quotas guide documentation are
incorrect. (Boris Shkolnik via hairong)
HADOOP-4847. Moves the loading of OutputCommitter to the Task.
(Amareshwari Sriramadasu via ddas)
HADOOP-4966. Marks completed setup tasks for removal.
(Amareshwari Sriramadasu via ddas)
HADOOP-4982. TestFsck should run in Eclipse. (shv)
HADOOP-5008. TestReplication#testPendingReplicationRetry leaves an opened
fd unclosed. (hairong)
HADOOP-4906. Fix TaskTracker OOM by keeping a shallow copy of JobConf in
TaskTracker.TaskInProgress. (Sharad Agarwal via acmurthy)
HADOOP-4918. Fix bzip2 compression to work with Sequence Files.
(Zheng Shao via dhruba)
HADOOP-4965. TestFileAppend3 should close FileSystem. (shv)
HADOOP-4967. Fixes a race condition in the JvmManager to do with killing
tasks. (ddas)
HADOOP-5009. DataNode#shutdown sometimes leaves data block scanner
verification log unclosed. (hairong)
HADOOP-5086. Use the appropriate FileSystem for trash URIs. (cdouglas)
HADOOP-4955. Make DBOutputFormat use column names from setOutput().
(Kevin Peterson via enis)
HADOOP-4862. Minor : HADOOP-3678 did not remove all the cases of
spurious IOExceptions logged by DataNode. (Raghu Angadi)
HADOOP-5034. NameNode should send both replication and deletion requests
to DataNode in one reply to a heartbeat. (hairong)
HADOOP-4759. Removes temporary output directory for failed and killed
tasks by launching special CLEANUP tasks for the same.
(Amareshwari Sriramadasu via ddas)
HADOOP-5161. Accepted sockets do not get placed in
DataXceiverServer#childSockets. (hairong)
HADOOP-5193. Correct calculation of edits modification time. (shv)
HADOOP-4494. Allow libhdfs to append to files.
(Pete Wyckoff via dhruba)
HADOOP-5166. Fix JobTracker restart to work when ACLs are configured
for the JobTracker. (Amar Kamat via yhemanth)
HADOOP-5067. Fixes TaskInProgress.java to keep track of count of failed and
killed tasks correctly. (Amareshwari Sriramadasu via ddas)
HADOOP-4760. HDFS streams should not throw exceptions when closed twice.
(enis)
Release 0.19.0 - 2008-11-18
INCOMPATIBLE CHANGES
HADOOP-3595. Remove deprecated methods for mapred.combine.once
functionality, which was necessary for providing backwards
compatible combiner semantics for 0.18. (cdouglas via omalley)
HADOOP-3667. Remove the following deprecated methods from JobConf:
addInputPath(Path)
getInputPaths()
getMapOutputCompressionType()
getOutputPath()
getSystemDir()
setInputPath(Path)
setMapOutputCompressionType(CompressionType style)
setOutputPath(Path)
(Amareshwari Sriramadasu via omalley)
HADOOP-3652. Remove deprecated class OutputFormatBase.
(Amareshwari Sriramadasu via cdouglas)
HADOOP-2885. Break the hadoop.dfs package into separate packages under
hadoop.hdfs that reflect whether they are client, server, protocol,
etc. DistributedFileSystem and DFSClient have moved and are now
considered package private. (Sanjay Radia via omalley)
HADOOP-2325. Require Java 6. (cutting)
HADOOP-372. Add support for multiple input paths with a different
InputFormat and Mapper for each path. (Chris Smith via tomwhite)
HADOOP-1700. Support appending to file in HDFS. (dhruba)
HADOOP-3792. Make FsShell -test consistent with unix semantics, returning
zero for true and non-zero for false. (Ben Slusky via cdouglas)
HADOOP-3664. Remove the deprecated method InputFormat.validateInput,
which is no longer needed. (tomwhite via omalley)
HADOOP-3549. Give more meaningful errno's in libhdfs. In particular,
EACCES is returned for permission problems. (Ben Slusky via omalley)
HADOOP-4036. ResourceStatus was added to TaskTrackerStatus by HADOOP-3759,
so increment the InterTrackerProtocol version. (Hemanth Yamijala via
omalley)
HADOOP-3150. Moves task promotion to tasks. Defines a new interface for
committing output files. Moves job setup to jobclient, and moves jobcleanup
to a separate task. (Amareshwari Sriramadasu via ddas)
HADOOP-3446. Keep map outputs in memory during the reduce. Remove
fs.inmemory.size.mb and replace with properties defining in memory map
output retention during the shuffle and reduce relative to maximum heap
usage. (cdouglas)
HADOOP-3245. Adds the feature for supporting JobTracker restart. Running
jobs can be recovered from the history file. The history file format has
been modified to support recovery. The task attempt ID now has the
JobTracker start time to distinguish attempts of the same TIP across
restarts. (Amar Ramesh Kamat via ddas)
HADOOP-4007. Remove DFSFileInfo - FileStatus is sufficient.
(Sanjay Radia via hairong)
HADOOP-3722. Fixed Hadoop Streaming and Hadoop Pipes to use the Tool
interface and GenericOptionsParser. (Enis Soztutar via acmurthy)
HADOOP-2816. Cluster summary at name node web reports the space
utilization as:
Configured Capacity: capacity of all the data directories - Reserved space
Present Capacity: Space available for dfs, i.e. remaining+used space
DFS Used%: DFS used space/Present Capacity
(Suresh Srinivas via hairong)
HADOOP-3938. Disk space quotas for HDFS. This is similar to namespace
quotas in 0.18. (rangadi)
HADOOP-4293. Make Configuration Writable and remove unreleased
WritableJobConf. Configuration.write is renamed to writeXml. (omalley)
HADOOP-4281. Change dfsadmin to report available disk space in a format
consistent with the web interface as defined in HADOOP-2816. (Suresh
Srinivas via cdouglas)
HADOOP-4430. Further change the cluster summary at name node web that was
changed in HADOOP-2816:
Non DFS Used - This indicates the disk space taken by non DFS files from
the Configured capacity
DFS Used % - DFS Used % of Configured Capacity
DFS Remaining % - Remaining % of Configured Capacity available for DFS use
DFS command line report reflects the same change. Config parameter
dfs.datanode.du.pct is no longer used and is removed from the
hadoop-default.xml. (Suresh Srinivas via hairong)
HADOOP-4116. Balancer should provide better resource management. (hairong)
HADOOP-4599. BlocksMap and BlockInfo made package private. (shv)
NEW FEATURES
HADOOP-3341. Allow streaming jobs to specify the field separator for map
and reduce input and output. The new configuration values are:
stream.map.input.field.separator
stream.map.output.field.separator
stream.reduce.input.field.separator
stream.reduce.output.field.separator
All of them default to "\t". (Zheng Shao via omalley)
HADOOP-3479. Defines the configuration file for the resource manager in
Hadoop. You can configure various parameters related to scheduling, such
as queues and queue properties here. The properties for a queue follow a
naming convention, such as hadoop.rm.queue.queue-name.property-name.
(Hemanth Yamijala via ddas)
HADOOP-3149. Adds a way in which map/reduce tasks can create multiple
outputs. (Alejandro Abdelnur via ddas)
HADOOP-3714. Add a new contrib, bash-tab-completion, which enables
bash tab completion for the bin/hadoop script. See the README file
in the contrib directory for the installation. (Chris Smith via enis)
HADOOP-3730. Adds a new JobConf constructor that disables loading
default configurations. (Alejandro Abdelnur via ddas)
HADOOP-3772. Add a new Hadoop Instrumentation api for the JobTracker and
the TaskTracker, refactor Hadoop Metrics as an implementation of the api.
(Ari Rabkin via acmurthy)
HADOOP-2302. Provides a comparator for numerical sorting of key fields.
(ddas)
HADOOP-153. Provides a way to skip bad records. (Sharad Agarwal via ddas)
HADOOP-657. Free disk space should be modelled and used by the scheduler
to make scheduling decisions. (Ari Rabkin via omalley)
HADOOP-3719. Initial checkin of Chukwa, which is a data collection and
analysis framework. (Jerome Boulon, Andy Konwinski, Ari Rabkin,
and Eric Yang)
HADOOP-3873. Add -filelimit and -sizelimit options to distcp to cap the
number of files/bytes copied in a particular run to support incremental
updates and mirroring. (TszWo (Nicholas), SZE via cdouglas)
HADOOP-3585. FailMon package for hardware failure monitoring and
analysis of anomalies. (Ioannis Koltsidas via dhruba)
HADOOP-1480. Add counters to the C++ Pipes API. (acmurthy via omalley)
HADOOP-3854. Add support for pluggable servlet filters in the HttpServers.
(Tsz Wo (Nicholas) Sze via omalley)
HADOOP-3759. Provides ability to run memory intensive jobs without
affecting other running tasks on the nodes. (Hemanth Yamijala via ddas)
HADOOP-3746. Add a fair share scheduler. (Matei Zaharia via omalley)
HADOOP-3754. Add a thrift interface to access HDFS. (dhruba via omalley)
HADOOP-3828. Provides a way to write skipped records to DFS.
(Sharad Agarwal via ddas)
HADOOP-3948. Separate name-node edits and fsimage directories.
(Lohit Vijayarenu via shv)
HADOOP-3939. Add an option to DistCp to delete files at the destination
not present at the source. (Tsz Wo (Nicholas) Sze via cdouglas)
HADOOP-3601. Add a new contrib module for Hive, which is a sql-like
query processing tool that uses map/reduce. (Ashish Thusoo via omalley)
HADOOP-3866. Added sort and multi-job updates in the JobTracker web ui.
(Craig Weisenfluh via omalley)
HADOOP-3698. Add access control to control who is allowed to submit or
modify jobs in the JobTracker. (Hemanth Yamijala via omalley)
HADOOP-1869. Support access times for HDFS files. (dhruba)
HADOOP-3941. Extend FileSystem API to return file-checksums.
(szetszwo)
HADOOP-3581. Prevents memory intensive user tasks from taking down
nodes. (Vinod K V via ddas)
HADOOP-3970. Provides a way to recover counters written to JobHistory.
(Amar Kamat via ddas)
HADOOP-3702. Adds ChainMapper and ChainReducer classes that allow composing
chains of Maps and Reduces in a single Map/Reduce job, something like
MAP+ / REDUCE MAP*. (Alejandro Abdelnur via ddas)
HADOOP-3445. Add capacity scheduler that provides guaranteed capacities to
queues as a percentage of the cluster. (Vivek Ratan via omalley)
HADOOP-3992. Add a synthetic load generation facility to the test
directory. (hairong via szetszwo)
HADOOP-3981. Implement a distributed file checksum algorithm in HDFS
and change DistCp to use file checksum for comparing src and dst files
(szetszwo)
HADOOP-3829. Narrow down skipped records based on user acceptable value.
(Sharad Agarwal via ddas)
HADOOP-3930. Add common interfaces for the pluggable schedulers and the
cli & gui clients. (Sreekanth Ramakrishnan via omalley)
HADOOP-4176. Implement getFileChecksum(Path) in HftpFileSystem. (szetszwo)
HADOOP-249. Reuse JVMs across Map-Reduce Tasks.
Configuration changes to hadoop-default.xml:
add mapred.job.reuse.jvm.num.tasks
(Devaraj Das via acmurthy)
HADOOP-4070. Provide a mechanism in Hive for registering UDFs from the
query language. (tomwhite)
HADOOP-2536. Implement JDBC-based database input and output formats to
allow Map-Reduce applications to work with databases. (Fredrik Hedberg and
Enis Soztutar via acmurthy)
HADOOP-3019. A new library to support total order partitions.
(cdouglas via omalley)
HADOOP-3924. Added a 'KILLED' job status. (Subramaniam Krishnan via
acmurthy)
IMPROVEMENTS
HADOOP-4205. hive: metastore and ql to use the refactored SerDe library.
(zshao)
HADOOP-4106. libhdfs: add time, permission and user attribute support
(part 2). (Pete Wyckoff through zshao)
HADOOP-4104. libhdfs: add time, permission and user attribute support.
(Pete Wyckoff through zshao)
HADOOP-3908. libhdfs: better error message if libhdfs.so doesn't exist.
(Pete Wyckoff through zshao)
HADOOP-3732. Delay initialization of datanode block verification till
the verification thread is started. (rangadi)
HADOOP-1627. Various small improvements to 'dfsadmin -report' output.
(rangadi)
HADOOP-3577. Tools to inject blocks into name node and simulated
data nodes for testing. (Sanjay Radia via hairong)
HADOOP-2664. Add a lzop compatible codec, so that files compressed by lzop
may be processed by map/reduce. (cdouglas via omalley)
HADOOP-3655. Add additional ant properties to control junit. (Steve
Loughran via omalley)
HADOOP-3543. Update the copyright year to 2008. (cdouglas via omalley)
HADOOP-3587. Add a unit test for the contrib/data_join framework.
(cdouglas)
HADOOP-3402. Add terasort example program (omalley)
HADOOP-3660. Add replication factor for injecting blocks in simulated
datanodes. (Sanjay Radia via cdouglas)
HADOOP-3684. Add a cloning function to the contrib/data_join framework
permitting users to define a more efficient method for cloning values from
the reduce than serialization/deserialization. (Runping Qi via cdouglas)
HADOOP-3478. Improves the handling of map output fetching. Now the
randomization is by the hosts (and not the map outputs themselves).
(Jothi Padmanabhan via ddas)
HADOOP-3617. Removed redundant checks of accounting space in MapTask and
makes the spill thread persistent so as to avoid creating a new one for
each spill. (Chris Douglas via acmurthy)
HADOOP-3412. Factor the scheduler out of the JobTracker and make
it pluggable. (Tom White and Brice Arnould via omalley)
HADOOP-3756. Minor. Remove unused dfs.client.buffer.dir from
hadoop-default.xml. (rangadi)
HADOOP-3747. Adds counter support for MultipleOutputs.
(Alejandro Abdelnur via ddas)
HADOOP-3169. LeaseChecker daemon should not be started in DFSClient
constructor. (TszWo (Nicholas), SZE via hairong)
HADOOP-3824. Move base functionality of StatusHttpServer to a core
package. (TszWo (Nicholas), SZE via cdouglas)
HADOOP-3646. Add a bzip2 compatible codec, so bzip compressed data
may be processed by map/reduce. (Abdul Qadeer via cdouglas)
HADOOP-3861. MapFile.Reader and Writer should implement Closeable.
(tomwhite via omalley)
HADOOP-3791. Introduce generics into ReflectionUtils. (Chris Smith via
cdouglas)
HADOOP-3694. Improve unit test performance by changing
MiniDFSCluster to listen only on 127.0.0.1. (cutting)
HADOOP-3620. Namenode should synchronously resolve a datanode's network
location when the datanode registers. (hairong)
HADOOP-3860. NNThroughputBenchmark is extended with rename and delete
benchmarks. (shv)
HADOOP-3892. Include unix group name in JobConf. (Matei Zaharia via johan)
HADOOP-3875. Change the time period between heartbeats to be relative to
the end of the heartbeat rpc, rather than the start. This causes better
behavior if the JobTracker is overloaded. (acmurthy via omalley)
HADOOP-3853. Move multiple input format (HADOOP-372) extension to
library package. (tomwhite via johan)
HADOOP-9. Use roulette scheduling for temporary space when the size
is not known. (Ari Rabkin via omalley)
HADOOP-3202. Use recursive delete rather than FileUtil.fullyDelete.
(Amareshwari Sriramadasu via omalley)
HADOOP-3368. Remove common-logging.properties from conf. (Steve Loughran
via omalley)
HADOOP-3851. Fix spelling mistake in FSNamesystemMetrics. (Steve Loughran
via omalley)
HADOOP-3780. Remove asynchronous resolution of network topology in the
JobTracker (Amar Kamat via omalley)
HADOOP-3852. Add ShellCommandExecutor.toString method to make nicer
error messages. (Steve Loughran via omalley)
HADOOP-3844. Include message of local exception in RPC client failures.
(Steve Loughran via omalley)
HADOOP-3935. Split out inner classes from DataNode.java. (johan)
HADOOP-3905. Create generic interfaces for edit log streams. (shv)
HADOOP-3062. Add metrics to DataNode and TaskTracker to record network
traffic for HDFS reads/writes and MR shuffling. (cdouglas)
HADOOP-3742. Remove HDFS from public java doc and add javadoc-dev for
generative javadoc for developers. (Sanjay Radia via omalley)
HADOOP-3944. Improve documentation for public TupleWritable class in
join package. (Chris Douglas via enis)
HADOOP-2330. Preallocate HDFS transaction log to improve performance.
(dhruba and hairong)
HADOOP-3965. Convert DataBlockScanner into a package private class. (shv)
HADOOP-3488. Prevent hadoop-daemon from rsync'ing log files (Stefan
Groshupf and Craig Macdonald via omalley)
HADOOP-3342. Change the kill task actions to require HTTP POST instead of
GET to prevent accidental crawls from triggering them. (enis via omalley)
HADOOP-3937. Limit the job name in the job history filename to 50
characters. (Matei Zaharia via omalley)
HADOOP-3943. Remove unnecessary synchronization in
NetworkTopology.pseudoSortByDistance. (hairong via omalley)
HADOOP-3498. File globbing alternation should be able to span path
components. (tomwhite)
HADOOP-3361. Implement renames for NativeS3FileSystem.
(Albert Chern via tomwhite)
HADOOP-3605. Make EC2 scripts show an error message if AWS_ACCOUNT_ID is
unset. (Al Hoang via tomwhite)
HADOOP-4147. Remove unused class JobWithTaskContext from class
JobInProgress. (Amareshwari Sriramadasu via johan)
HADOOP-4151. Add a byte-comparable interface that both Text and
BytesWritable implement. (cdouglas via omalley)
HADOOP-4174. Move fs image/edit log methods from ClientProtocol to
NamenodeProtocol. (shv via szetszwo)
HADOOP-4181. Include a .gitignore and saveVersion.sh change to support
developing under git. (omalley)
HADOOP-4186. Factor LineReader out of LineRecordReader. (tomwhite via
omalley)
HADOOP-4184. Break the module dependencies between core, hdfs, and
mapred. (tomwhite via omalley)
HADOOP-4075. test-patch.sh now spits out ant commands that it runs.
(Ramya R via nigel)
HADOOP-4117. Improve configurability of Hadoop EC2 instances.
(tomwhite)
HADOOP-2411. Add support for larger CPU EC2 instance types.
(Chris K Wensel via tomwhite)
HADOOP-4083. Changed the configuration attribute queue.name to
mapred.job.queue.name. (Hemanth Yamijala via acmurthy)
HADOOP-4194. Added the JobConf and JobID to job-related methods in
JobTrackerInstrumentation for better metrics. (Mac Yang via acmurthy)
HADOOP-3975. Change test-patch script to report the working dir
modifications preventing the suite from being run. (Ramya R via cdouglas)
HADOOP-4124. Added a command-line switch to allow users to set job
priorities, and allow them to be changed via the web UI. (Hemanth
Yamijala via acmurthy)
HADOOP-2165. Augmented JobHistory to include the URIs to the tasks'
userlogs. (Vinod Kumar Vavilapalli via acmurthy)
HADOOP-4062. Remove the synchronization on the output stream when a
connection is closed and also remove an undesirable exception when
a client is stopped while there is no pending RPC request. (hairong)
HADOOP-4227. Remove the deprecated class org.apache.hadoop.fs.ShellCommand.
(szetszwo)
HADOOP-4006. Clean up FSConstants and move some of the constants to
better places. (Sanjay Radia via rangadi)
HADOOP-4279. Trace the seeds of random sequences in append unit tests to
make intermittent failures reproducible. (szetszwo via cdouglas)
HADOOP-4209. Remove the change to the format of task attempt id by
incrementing the task attempt numbers by 1000 when the job restarts.
(Amar Kamat via omalley)
HADOOP-4301. Adds forrest doc for the skip bad records feature.
(Sharad Agarwal via ddas)
HADOOP-4354. Separate TestDatanodeDeath.testDatanodeDeath() into 4 tests.
(szetszwo)
HADOOP-3790. Add more unit tests for testing HDFS file append. (szetszwo)
HADOOP-4321. Include documentation for the capacity scheduler. (Hemanth
Yamijala via omalley)
HADOOP-4424. Change menu layout for Hadoop documentation (Boris Shkolnik
via cdouglas).
HADOOP-4438. Update forrest documentation to include missing FsShell
commands. (Suresh Srinivas via cdouglas)
HADOOP-4105. Add forrest documentation for libhdfs.
(Pete Wyckoff via cutting)
HADOOP-4510. Make getTaskOutputPath public. (Chris Wensel via omalley)
OPTIMIZATIONS
HADOOP-3556. Removed lock contention in MD5Hash by replacing the
singleton MessageDigester with an instance per Thread using
ThreadLocal. (Iván de Prado via omalley)
HADOOP-3328. When client is writing data to DFS, only the last
datanode in the pipeline needs to verify the checksum. Saves around
30% CPU on intermediate datanodes. (rangadi)
HADOOP-3863. Use a thread-local string encoder rather than a static one
that is protected by a lock. (acmurthy via omalley)
HADOOP-3864. Prevent the JobTracker from locking up when a job is being
initialized. (acmurthy via omalley)
HADOOP-3816. Faster directory listing in KFS. (Sriram Rao via omalley)
HADOOP-2130. Pipes submit job should have both blocking and non-blocking
versions. (acmurthy via omalley)
HADOOP-3769. Make the SampleMapper and SampleReducer from
GenericMRLoadGenerator public, so they can be used in other contexts.
(Lingyun Yang via omalley)
HADOOP-3514. Inline the CRCs in intermediate files as opposed to reading
them from a separate .crc file. (Jothi Padmanabhan via ddas)
HADOOP-3638. Caches the iFile index files in memory to reduce seeks.
(Jothi Padmanabhan via ddas)