Hadoop Change Log
Release 0.20.203.0 - unreleased
MAPREDUCE-1280. Update Eclipse plugin to the new eclipse.jdt API.
(Alex Kozlov via szetszwo)
HADOOP-7259. Contrib modules should include the build.properties from
the enclosing hadoop directory. (omalley)
HADOOP-7253. Update the default configuration to fix security audit log
and metrics2 property configuration warnings. (omalley)
HADOOP-7247. Update documentation to match current jar names. (omalley)
HADOOP-7246. Update the log4j configuration to match the EventCounter
package. (Luke Lu via omalley)
HADOOP-7143. Restore HadoopArchives. (Joep Rottinghuis via omalley)
MAPREDUCE-2316. Updated CapacityScheduler documentation. (acmurthy)
HADOOP-7243. Fix contrib unit tests missing dependencies. (omalley)
HADOOP-7190. Add metrics v1 back for backwards compatibility. (omalley)
MAPREDUCE-2360. Remove stripping of scheme, authority from submit dir in
support of viewfs. (cdouglas)
MAPREDUCE-2359. Use correct file system to access distributed cache objects.
(Krishna Ramachandran)
MAPREDUCE-2361. "Fix Distributed Cache is not adding files to class paths
correctly" - Drop the host/scheme/fragment from URI (cdouglas)
MAPREDUCE-2362. Fix unit-test failures: TestBadRecords (NPE due to
rearranged MapTask code) and TestTaskTrackerMemoryManager
(need hostname in output-string pattern). (Greg Roelofs, Krishna
Ramachandran)
HDFS-1729. Add statistics logging for better visibility into
startup time costs. (Matt Foley)
MAPREDUCE-2363. When a queue is built without any access rights we
explain the problem. (Richard King)
MAPREDUCE-1563. TaskDiagnosticInfo may be missed sometimes. (Krishna
Ramachandran)
MAPREDUCE-2364. Don't hold the rjob lock while localizing resources. (ddas
via omalley)
MAPREDUCE-2365. New counters for FileInputFormat (BYTES_READ) and
FileOutputFormat (BYTES_WRITTEN).
New counter MAP_OUTPUT_MATERIALIZED_BYTES for compressed MapOutputSize.
(Siddharth Seth)
HADOOP-7040. Change DiskErrorException to IOException (boryas)
HADOOP-7104. Remove unnecessary DNS reverse lookups from RPC layer
(kzhang)
MAPREDUCE-2366. Fix a problem where the task browser UI can't retrieve the
stdxxx printouts of streaming jobs that abend in the unix code, in
the common case where the containing job doesn't reuse JVMs.
(Richard King)
HADOOP-6977. Herriot daemon clients should vend statistics (cos)
HADOOP-6971. Clover build doesn't generate per-test coverage (cos)
HADOOP-6879. Provide SSH based (Jsch) remote execution API for system
tests. (cos)
HADOOP-7215. RPC clients must use network interface corresponding to
the host in the client's kerberos principal key. (suresh)
HDFS-1842. Change the layout version to -31 to disallow upgrade from
and to 0.21 release. (suresh)
HADOOP-7232. Fix Javadoc warnings. (omalley)
HADOOP-7258. The Gzip codec should not return null decompressors. (omalley)
Release 0.20.202.0 - unreleased
MAPREDUCE-2355. Add a configuration knob
mapreduce.tasktracker.outofband.heartbeat.damper that limits out of band
heartbeats (acmurthy)
MAPREDUCE-2356. Fix a race-condition that corrupted a task's state on the
JobTracker. (Luke Lu)
MAPREDUCE-2357. Always propagate IOExceptions that are thrown by
non-FileInputFormat. (Luke Lu)
HADOOP-7163. RPC handles SocketTimeOutException during SASL negotiation.
(ddas)
MAPREDUCE-2358. MapReduce assumes the default FileSystem is HDFS.
(Krishna Ramachandran)
MAPREDUCE-1904. Reducing locking contention in TaskTracker's
MapOutputServlet LocalDirAllocator. (Rajesh Balamohan via acmurthy)
HDFS-1626. Make BLOCK_INVALIDATE_LIMIT configurable. (szetszwo)
HDFS-1584. Adds a check for whether relogin is needed to
getDelegationToken in HftpFileSystem. (Kan Zhang via ddas)
HADOOP-7115. Reduces the number of calls to getpwuid_r and
getpwgid_r, by implementing a cache in NativeIO. (ddas)
HADOOP-6882. An XSS security exploit in jetty-6.1.14. jetty upgraded to
6.1.26. (ddas)
MAPREDUCE-2278. Fixes a memory leak in the TaskTracker. (cdouglas)
HDFS-1353 redux. Modulate original 1353 to not bump RPC version.
(jhoman)
MAPREDUCE-2082 Race condition in writing the jobtoken password file when
launching pipes jobs (jitendra and ddas)
HADOOP-6978. Fixes task log servlet vulnerabilities via symlinks.
(Todd Lipcon and Devaraj Das)
MAPREDUCE-2178. Write task initialization to avoid race
conditions leading to privilege escalation and resource leakage by
performing more actions as the user. (Owen O'Malley, Devaraj Das,
Chris Douglas via cdouglas)
HDFS-1364. HFTP client should support relogin from keytab
HADOOP-6907. Make RPC client to use per-proxy configuration.
(Kan Zhang via ddas)
MAPREDUCE-2055. Fix JobTracker to decouple job retirement from copy of
job-history file to HDFS and enhance RetiredJobInfo to carry aggregated
job-counters to prevent a disk roundtrip on job-completion to fetch
counters for the JobClient. (Krishna Ramachandran via acmurthy)
HDFS-1353. Remove most of getBlockLocation optimization (jghoman)
MAPREDUCE-2023. TestDFSIO read test may not read specified bytes. (htang)
HDFS-1340. A null delegation token is appended to the url if security is
disabled when browsing filesystem. (boryas)
HDFS-1352. Fix jsvc.location. (jghoman)
HADOOP-6860. 'compile-fault-inject' should never be called directly. (cos)
MAPREDUCE-2005. TestDelegationTokenRenewal fails (boryas)
MAPREDUCE-2000. Rumen is not able to extract counters for Job history logs
from Hadoop 0.20. (htang)
MAPREDUCE-1961. ConcurrentModificationException when shutting down Gridmix.
(htang)
HADOOP-6899. RawLocalFileSystem set working directory does
not work for relative names. (suresh)
HDFS-495. New clients should be able to take over files lease if the old
client died. (shv)
HADOOP-6728. Re-design and overhaul of the Metrics framework. (Luke Lu via
acmurthy)
MAPREDUCE-1966. Change blacklisting of tasktrackers on task failures to be
a simple graylist to fingerpoint bad tasktrackers. (Greg Roelofs via
acmurthy)
HADOOP-6864. Add ability to get netgroups (as returned by getent
netgroup command) using native code (JNI) instead of forking. (Erik Steffl)
HDFS-1318. HDFS Namenode and Datanode WebUI information needs to be
accessible programmatically for scripts. (Tanping Wang via suresh)
HDFS-1315. Add fsck event to audit log and remove other audit log events
corresponding to FSCK listStatus and open calls. (suresh)
MAPREDUCE-1941. Provides access to JobHistory file (raw) with job user/acl
permission. (Srikanth Sundarrajan via ddas)
MAPREDUCE-291. Optionally a separate daemon should serve JobHistory.
(Srikanth Sundarrajan via ddas)
MAPREDUCE-1936. Make Gridmix3 more customizable (sync changes from trunk).
(htang)
HADOOP-5981. Fix variable substitution during parsing of child environment
variables. (Krishna Ramachandran via acmurthy)
MAPREDUCE-339. Greedily schedule failed tasks to cause early job failure.
(cdouglas)
MAPREDUCE-1872. Hardened CapacityScheduler to have comprehensive, coherent
limits on tasks/jobs for jobs/users/queues. Also, added the ability to
refresh queue definitions without the need to restart the JobTracker.
(acmurthy)
HDFS-1161. Make DN minimum valid volumes configurable. (shv)
HDFS-457. Reintroduce volume failure tolerance for DataNodes. (shv)
HDFS-1307 Add start time, end time and total time taken for FSCK
to FSCK report. (suresh)
MAPREDUCE-1207. Sanitize user environment of map/reduce tasks and allow
admins to set environment and java options. (Krishna Ramachandran via
acmurthy)
HDFS-1298 - Add support in HDFS for new statistics added in FileSystem
to track the file system operations (suresh)
HDFS-1301. TestHDFSProxy needs to use server side conf for ProxyUser
stuff. (boryas)
HADOOP-6859 - Introduce additional statistics to FileSystem to track
file system operations (suresh)
HADOOP-6818. Provides a JNI implementation of Unix Group resolution. The
config hadoop.security.group.mapping should be set to
org.apache.hadoop.security.JniBasedUnixGroupsMapping to enable this
implementation. (ddas)
MAPREDUCE-1938. Introduces a configuration for putting user classes before
the system classes during job submission and in task launches. Two things
need to be done in order to use this feature -
(1) mapreduce.user.classpath.first : this should be set to true in the
jobconf, and, (2) HADOOP_USER_CLASSPATH_FIRST : this is relevant for job
submissions done using bin/hadoop shell script. HADOOP_USER_CLASSPATH_FIRST
should be defined in the environment with some non-empty value
(like "true"), and then bin/hadoop should be executed. (ddas)
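As a rough illustration of the ordering described in MAPREDUCE-1938, the Python sketch below builds a classpath string with user entries placed either before or after the system entries. The function name `build_classpath` and the jar names are hypothetical; the real logic lives in bin/hadoop and the task launcher. The sketch only assumes, as the entry states, that any non-empty value of HADOOP_USER_CLASSPATH_FIRST enables the feature.

```python
import os

def build_classpath(system_entries, user_entries, env=None):
    """Join classpath entries, putting user entries first when the flag is set."""
    env = os.environ if env is None else env
    # Any non-empty value (such as "true") counts as enabled.
    user_first = bool(env.get("HADOOP_USER_CLASSPATH_FIRST", ""))
    if user_first:
        ordered = list(user_entries) + list(system_entries)
    else:
        ordered = list(system_entries) + list(user_entries)
    return os.pathsep.join(ordered)

# Hypothetical jar names, used only to show the two orderings.
default_cp = build_classpath(["hadoop-core.jar"], ["user.jar"], env={})
user_first_cp = build_classpath(
    ["hadoop-core.jar"], ["user.jar"],
    env={"HADOOP_USER_CLASSPATH_FIRST": "true"},
)
```

With the flag unset the user jar trails the system jar; with it set the user jar comes first, so classes in it take precedence over same-named system classes.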
HADOOP-6669. Respect compression configuration when creating DefaultCodec
compressors. (Koji Noguchi via cdouglas)
HADOOP-6855. Add support for netgroups, as returned by command
getent netgroup. (Erik Steffl)
HDFS-599. Allow NameNode to have a separate port for service requests from
client requests. (Dmytro Molkov via hairong)
HDFS-132. Fix namenode to not report files deleted metrics for deletions
done while replaying edits during startup. (shv)
MAPREDUCE-1521. Protection against incorrectly configured reduces
(mahadev)
MAPREDUCE-1936. Make Gridmix3 more customizable. (htang)
MAPREDUCE-517. Enhance the CapacityScheduler to assign multiple tasks
per-heartbeat. (acmurthy)
MAPREDUCE-323. Re-factor layout of JobHistory files on HDFS to improve
operability. (Dick King via acmurthy)
MAPREDUCE-1921. Ensure exceptions during reading of input data in map
tasks are augmented by information about actual input file which caused
the exception. (Krishna Ramachandran via acmurthy)
MAPREDUCE-1118. Enhance the JobTracker web-ui to ensure tabular columns
are sortable, also added a /scheduler servlet to CapacityScheduler for
enhanced UI for queue information. (Krishna Ramachandran via acmurthy)
HADOOP-5913. Add support for starting/stopping queues. (cdouglas)
HADOOP-6835. Add decode support for concatenated gzip files. (Greg Roelofs)
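To show what "concatenated gzip files" means in HADOOP-6835, here is a small Python sketch that joins two independently compressed gzip members and decodes them as one stream. It uses Python's standard `gzip` module as a stand-in and says nothing about how the Hadoop codec itself is implemented.

```python
import gzip

def read_concatenated_gzip(data: bytes) -> bytes:
    """Decompress every gzip member in data and return the joined output."""
    # gzip.decompress is documented to handle multi-member input.
    return gzip.decompress(data)

part_one = gzip.compress(b"first member\n")
part_two = gzip.compress(b"second member\n")
# Byte-wise concatenation, the same result as `cat a.gz b.gz`.
combined = part_one + part_two
decoded = read_concatenated_gzip(combined)
```

A decoder without this support would stop after the first member and silently drop the rest of the input.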
HDFS-1158. Revert HDFS-457. (shv)
MAPREDUCE-1699. Ensure JobHistory isn't disabled for any reason. (Krishna
Ramachandran via acmurthy)
MAPREDUCE-1682. Fix speculative execution to ensure tasks are not
scheduled after job failure. (acmurthy)
MAPREDUCE-1914. Ensure unique sub-directories for artifacts in the
DistributedCache are cleaned up. (Dick King via acmurthy)
HADOOP-6713. Multiple RPC Reader Threads (Bharathm)
HDFS-1250. Namenode should reject block reports and block received
requests from dead datanodes (suresh)
MAPREDUCE-1863. [Rumen] Null failedMapAttemptCDFs in job traces generated
by Rumen. (htang)
MAPREDUCE-1309. Rumen refactoring. (htang)
HDFS-1114. Implement LightWeightGSet for BlocksMap in order to reduce
NameNode memory footprint. (szetszwo)
MAPREDUCE-572. Fixes DistributedCache.checkURIs to throw error if link is
missing for uri in cache archives. (amareshwari)
MAPREDUCE-787. Fix JobSubmitter to honor user given symlink in the path.
(amareshwari)
HADOOP-6815. refreshSuperUserGroupsConfiguration should use
server side configuration for the refresh. (boryas)
MAPREDUCE-1868. Add a read and connection timeout to JobClient while
pulling tasklogs. (Krishna Ramachandran via acmurthy)
HDFS-1119. Introduce a GSet interface to BlocksMap. (szetszwo)
MAPREDUCE-1778. Ensure failure to setup CompletedJobStatusStore is not
silently ignored by the JobTracker. (Krishna Ramachandran via acmurthy)
MAPREDUCE-1538. Add a limit on the number of artifacts in the
DistributedCache to ensure we cleanup aggressively. (Dick King via
acmurthy)
MAPREDUCE-1850. Add information about the host from which a job is
submitted. (Krishna Ramachandran via acmurthy)
HDFS-1110. Reuses objects for commonly used file names in namenode to
reduce the heap usage. (suresh)
HADOOP-6810. Extract a subset of tests for smoke (DOA) validation. (cos)
HADOOP-6642. Remove debug stmt left from original patch. (cdouglas)
HADOOP-6808. Add comments on how to setup File/Ganglia Context for
kerberos metrics (Erik Steffl)
HDFS-1061. INodeFile memory optimization. (bharathm)
HDFS-1109. HFTP supports filenames that contains the character "+".
(Dmytro Molkov via dhruba, backported by szetszwo)
HDFS-1085. Check file length and bytes read when reading a file through
hftp in order to detect failure. (szetszwo)
HDFS-1311. Running tests with 'testcase' cause triple execution of the
same test case (cos)
HDFS-1150.FIX. Verify datanodes' identities to clients in secure clusters.
Update to patch to improve handling of jsvc source in build.xml (jghoman)
HADOOP-6752. Remote cluster control functionality needs JavaDocs
improvement. (Balaji Rajagopalan via cos)
MAPREDUCE-1288. Fixes TrackerDistributedCacheManager to take into account
the owner of the localized file in the mapping from cache URIs to
CacheStatus objects. (ddas)
MAPREDUCE-1900. Fixes a FS leak that I missed in the earlier patch.
(ddas)
MAPREDUCE-1900. Makes JobTracker/TaskTracker close filesystems, created
on behalf of users, when they are no longer needed. (ddas)
HADOOP-6832. Add a static user plugin for web auth for external users.
(omalley)
HDFS-1007. Fixes a bug in SecurityUtil.buildDTServiceName to do
with handling of null hostname. (omalley)
HDFS-1007. Makes long running servers using hftp work. Also has some
refactoring in the MR code to do with handling of delegation tokens.
(omalley & ddas)
HDFS-1178. The NameNode servlets should not use RPC to connect to the
NameNode. (omalley)
MAPREDUCE-1807. Re-factor TestQueueManager. (Richard King via acmurthy)
HDFS-1150. Fixes the earlier patch to do logging in the right directory
and also adds facility for monitoring processes (via -Dprocname in the
command line). (Jakob Homan via ddas)
HADOOP-6781. security audit log shouldn't have exception in it. (boryas)
HADOOP-6776. Fixes the javadoc in UGI.createProxyUser. (ddas)
HDFS-1150. building jsvc from source tar. source tar is also checked in.
(jitendra)
HDFS-1150. Bugfix in the hadoop shell script. (ddas)
HDFS-1153. The navigation to /dfsnodelist.jsp with invalid input
parameters produces NPE and HTTP 500 error (rphulari)
MAPREDUCE-1664. Bugfix to enable queue administrators of a queue to
view job details of jobs submitted to that queue even though they
are not part of acl-view-job.
HDFS-1150. Bugfix to add more knobs to secure datanode starter.
HDFS-1157. Modifications introduced by HDFS-1150 are breaking aspect's
bindings (cos)
HDFS-1130. Adds a configuration dfs.cluster.administrators for
controlling access to the default servlets in hdfs. (ddas)
HADOOP-6706.FIX. Relogin behavior for RPC clients could be improved
(boryas)
HDFS-1150. Verify datanodes' identities to clients in secure clusters.
(jghoman)
MAPREDUCE-1442. Fixed regex in job-history related to parsing Counter
values. (Luke Lu via acmurthy)
HADOOP-6760. WebServer shouldn't increase port number in case of negative
port setting caused by Jetty's race. (cos)
HDFS-1146. Javadoc for getDelegationTokenSecretManager in FSNamesystem.
(jitendra)
HADOOP-6706. Fix on top of the earlier patch. Closes the connection
on a SASL connection failure, and retries again with a new
connection. (ddas)
MAPREDUCE-1716. Fix on top of earlier patch for logs truncation a.k.a
MAPREDUCE-1100. Addresses log truncation issues when binary data is
written to log files and adds a header to a truncated log file to
inform users of the truncation.
HDFS-1383. Improve the error messages when using hftp://.
MAPREDUCE-1744. Fixed DistributedCache apis to take a user-supplied
FileSystem to allow for better proxy behaviour for Oozie. (Richard King)
MAPREDUCE-1733. Authentication between pipes processes and java
counterparts. (jitendra)
MAPREDUCE-1664. Bugfix on top of the previous patch. (ddas)
HDFS-1136. FileChecksumServlets.RedirectServlet doesn't carry forward
the delegation token (boryas)
HADOOP-6756. Change value of FS_DEFAULT_NAME_KEY from fs.defaultFS
to fs.default.name which is a correct name for 0.20 (steffl)
HADOOP-6756. Document (javadoc comments) and cleanup configuration
keys in CommonConfigurationKeys.java (steffl)
MAPREDUCE-1759. Exception message for unauthorized user doing killJob,
killTask, setJobPriority needs to be improved. (gravi via vinodkv)
HADOOP-6715. AccessControlList.toString() returns empty string when
we set acl to "*". (gravi via vinodkv)
HADOOP-6757. NullPointerException for hadoop clients launched from
streaming tasks. (amarrk via vinodkv)
HADOOP-6631. FileUtil.fullyDelete() should continue to delete other files
despite failure at any level. (vinodkv)
MAPREDUCE-1317. NPE in setHostName in Rumen. (rksingh)
MAPREDUCE-1754. Replace mapred.permissions.supergroup with an acl :
mapreduce.cluster.administrators and HADOOP-6748.: Remove
hadoop.cluster.administrators. Contributed by Amareshwari Sriramadasu.
HADOOP-6701. Incorrect exit codes for "dfs -chown", "dfs -chgrp"
(rphulari)
HADOOP-6640. FileSystem.get() does RPC retries within a static
synchronized block. (hairong)
HDFS-1006. Removes unnecessary logins from the previous patch. (ddas)
HADOOP-6745. adding some java doc to Server.RpcMetrics, UGI (boryas)
MAPREDUCE-1707. TaskRunner can get NPE in getting ugi from TaskTracker.
(vinodkv)
HDFS-1104. Fsck triggers full GC on NameNode. (hairong)
HADOOP-6332. Large-scale Automated Test Framework (sharad, Sreekanth
Ramakrishnan, et al. via cos)
HADOOP-6526. Additional fix for test context on top of existing one. (cos)
HADOOP-6710. Symbolic umask for file creation is not conformant with posix.
(suresh)
HADOOP-6693. Added metrics to track kerberos login success and failure.
(suresh)
MAPREDUCE-1711. Gridmix should provide an option to submit jobs to the same
queues as specified in the trace. (rksingh via htang)
MAPREDUCE-1687. Stress submission policy does not always stress the
cluster. (htang)
MAPREDUCE-1641. Bug-fix to ensure command line options such as
-files/-archives are checked for duplicate artifacts in the
DistributedCache. (Amareshwari Sreeramadasu via acmurthy)
MAPREDUCE-1641. Fix DistributedCache to ensure same files cannot be put in
both the archives and files sections. (Richard King via acmurthy)
HADOOP-6670. Fixes a testcase issue introduced by the earlier commit
of the HADOOP-6670 patch. (ddas)
MAPREDUCE-1718. Fixes a problem to do with correctly constructing
service name for the delegation token lookup in HftpFileSystem
(borya via ddas)
HADOOP-6674. Fixes the earlier patch to handle pings correctly (ddas).
MAPREDUCE-1664. Job Acls affect when Queue Acls are set.
(Ravi Gummadi via vinodkv)
HADOOP-6718. Fixes a problem to do with clients not closing RPC
connections on a SASL failure. (ddas)
MAPREDUCE-1397. NullPointerException observed during task failures.
(Amareshwari Sriramadasu via vinodkv)
HADOOP-6670. Use the UserGroupInformation's Subject as the criteria for
equals and hashCode. (omalley)
HADOOP-6716. System won't start in non-secure mode when krb5.conf
(edu.mit.kerberos on Mac) is not present. (boryas)
MAPREDUCE-1607. Task controller may not set permissions for a
task cleanup attempt's log directory. (Amareshwari Sreeramadasu via
vinodkv)
MAPREDUCE-1533. JobTracker performance enhancements. (Amar Kamat via
vinodkv)
MAPREDUCE-1701. AccessControlException while renewing a delegation token
is not correctly handled in the JobTracker. (boryas)
HDFS-481. Incremental patch to fix broken unit test in contrib/hdfsproxy
HADOOP-6706. Fixes a bug in the earlier version of the same patch (ddas)
HDFS-1096. allow dfsadmin/mradmin refresh of superuser proxy group
mappings. (boryas)
HDFS-1012. Support for cluster specific path entries in ldap for hdfsproxy
(Srikanth Sundarrajan via Nicholas)
HDFS-1011. Improve Logging in HDFSProxy to include cluster name associated
with the request (Srikanth Sundarrajan via Nicholas)
HDFS-1010. Retrieve group information from UnixUserGroupInformation
instead of LdapEntry (Srikanth Sundarrajan via Nicholas)
HDFS-481. Bug fix - hdfsproxy: Stack overflow + Race conditions
(Srikanth Sundarrajan via Nicholas)
MAPREDUCE-1657. After task logs directory is deleted, tasklog servlet
displays wrong error message about job ACLs. (Ravi Gummadi via vinodkv)
MAPREDUCE-1692. Remove TestStreamedMerge from the streaming tests.
(Amareshwari Sriramadasu and Sreekanth Ramakrishnan via vinodkv)
HDFS-1081. Performance regression in
DistributedFileSystem::getFileBlockLocations in secure systems (jhoman)
MAPREDUCE-1656. JobStory should provide queue info. (htang)
MAPREDUCE-1317. Reducing memory consumption of rumen objects. (htang)
MAPREDUCE-1317. Reverting the patch since it caused build failures. (htang)
MAPREDUCE-1683. Fixed jobtracker web-ui to correctly display heap-usage.
(acmurthy)
HADOOP-6706. Fixes exception handling for saslConnect. The ideal
solution is to use the Refreshable interface but as Owen noted in
HADOOP-6656, it doesn't seem to work as expected. (ddas)
MAPREDUCE-1617. TestBadRecords failed once in our test runs. (Amar
Kamat via vinodkv).
MAPREDUCE-587. Stream test TestStreamingExitStatus fails with Out of
Memory. (Amar Kamat via vinodkv).
HDFS-1096. Reverting the patch since it caused build failures. (ddas)
MAPREDUCE-1317. Reducing memory consumption of rumen objects. (htang)
MAPREDUCE-1680. Add a metric to track number of heartbeats processed by the
JobTracker. (Richard King via acmurthy)
MAPREDUCE-1683. Removes JNI calls to get jvm current/max heap usage in
ClusterStatus by default. (acmurthy)
HADOOP-6687. user object in the subject in UGI should be reused in case
of a relogin. (jitendra)
HADOOP-5647. TestJobHistory fails if /tmp/_logs is not writable.
Testcase should not depend on /tmp. (Ravi Gummadi via vinodkv)
MAPREDUCE-181. Bug fix for Secure job submission. (Ravi Gummadi via
vinodkv)
MAPREDUCE-1635. ResourceEstimator does not work after MAPREDUCE-842.
(Amareshwari Sriramadasu via vinodkv)
MAPREDUCE-1526. Cache the job related information while submitting the
job. (rksingh)
HADOOP-6674. Turn off SASL checksums for RPCs. (jitendra via omalley)
HADOOP-5958. Replace fork of DF with library call. (cdouglas via omalley)
HDFS-999. Secondary namenode should login using kerberos if security
is configured. Bugfix to original patch. (jhoman)
MAPREDUCE-1594. Support for SleepJobs in Gridmix (rksingh)
HDFS-1007. Fix. ServiceName for delegation token for Hftp has hftp
port and not RPC port.
MAPREDUCE-1376. Support for varied user submissions in Gridmix (rksingh)
HDFS-1080. SecondaryNameNode image transfer should use the defined
http address rather than local ip address (jhoman)
HADOOP-6661. User document for UserGroupInformation.doAs for secure
impersonation. (jitendra)
MAPREDUCE-1624. Documents the job credentials and associated details
to do with delegation tokens (ddas)
HDFS-1036. Documentation for fetchdt for forrest (boryas)
HDFS-1039. New patch on top of previous patch. Gets namenode address
from conf. (jitendra)
HADOOP-6656. Renew Kerberos TGT when 80% of the renew lifetime has been
used up. (omalley)
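The 80% threshold in HADOOP-6656 comes down to a short calculation. The Python sketch below is an illustration only: the function name `refresh_time_ms` is hypothetical, and it assumes the lifetime is measured between the ticket's start and end times in milliseconds.

```python
def refresh_time_ms(start_ms: int, end_ms: int, window_percent: int = 80) -> int:
    """Return the time at which window_percent of the ticket lifetime has elapsed."""
    lifetime = end_ms - start_ms
    # Integer arithmetic avoids floating-point rounding surprises.
    return start_ms + lifetime * window_percent // 100

# A ticket valid for ten hours, starting at time zero.
ticket_start = 0
ticket_end = 10 * 60 * 60 * 1000
renew_at = refresh_time_ms(ticket_start, ticket_end)
```

For the ten-hour ticket above, `renew_at` falls at the eight-hour mark, leaving a two-hour margin before expiry.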
HADOOP-6653. Protect against NPE in setupSaslConnection when real user is
null. (omalley)
HADOOP-6649. An error in the previous committed patch. (jitendra)
HADOOP-6652. ShellBasedUnixGroupsMapping shouldn't have a cache.
(ddas)
HADOOP-6649. login object in UGI should be inside the subject
(jitendra)
HADOOP-6637. Benchmark overhead of RPC session establishment
(shv via jitendra)
HADOOP-6648. Credentials must ignore null tokens that can be generated
when using HFTP to talk to insecure clusters. (omalley)
HADOOP-6632. Fix on JobTracker to reuse filesystem handles if possible.
(ddas)
HADOOP-6647. balancer fails with "is not authorized for protocol
interface NamenodeProtocol" in secure environment (boryas)
MAPREDUCE-1612. job conf file is not accessible from job history
web page. (Ravi Gummadi via vinodkv)
MAPREDUCE-1611. Refresh nodes and refresh queues don't work with
service authorization enabled. (Amar Kamat via vinodkv)
HADOOP-6644. util.Shell getGROUPS_FOR_USER_COMMAND method
name - should use common naming convention (boryas)
MAPREDUCE-1609. Fixes a problem with localization of job log
directories when tasktracker is re-initialized that can result
in failed tasks. (Amareshwari Sriramadasu via yhemanth)
MAPREDUCE-1610. Update forrest documentation for directory
structure of localized files. (Ravi Gummadi via yhemanth)
MAPREDUCE-1532. Fixes a javadoc and an exception message in JobInProgress
when the authenticated user is different from the user in conf. (ddas)
MAPREDUCE-1417. Update forrest documentation for private
and public distributed cache files. (Ravi Gummadi via yhemanth)
HADOOP-6634. AccessControlList uses full-principal names to verify acls
causing queue-acls to fail (vinodkv)
HADOOP-6642. Fix javac, javadoc, findbugs warnings. (chrisdo via acmurthy)
HDFS-1044. Cannot submit mapreduce job from secure client to
unsecure server. (boryas)
HADOOP-6638. try to relogin in a case of failed RPC connection
(expired tgt) only in case the subject is loginUser or
proxyUgi.realUser. (boryas)
HADOOP-6632. Support for using different Kerberos keys for different
instances of Hadoop services. (jitendra)
HADOOP-6526. Need mapping from long principal names to local OS
user names. (jitendra)
MAPREDUCE-1604. Update Forrest documentation for job authorization
ACLs. (Amareshwari Sriramadasu via yhemanth)
HDFS-1045. In secure clusters, re-login is necessary for https
clients before opening connections (jhoman)
HADOOP-6603. Addition to original patch to be explicit
about new method not being for general use. (jhoman)
MAPREDUCE-1543. Add audit log messages for job and queue
access control checks. (Amar Kamat via yhemanth)
MAPREDUCE-1606. Fixed occasional timeout in TestJobACL. (Ravi Gummadi via
acmurthy)
HADOOP-6633. normalize property names for JT/NN kerberos principal
names in configuration. (boryas)
HADOOP-6613. Changes the RPC server so that version is checked first
on an incoming connection. (Kan Zhang via ddas)
HADOOP-5592. Fix typo in Streaming doc in reference to GzipCodec.
(Corinne Chandel via tomwhite)
MAPREDUCE-813. Updates Streaming and M/R tutorial documents.
(Corinne Chandel via ddas)
MAPREDUCE-927. Cleanup of task-logs should happen in TaskTracker instead
of the Child. (Amareshwari Sriramadasu via vinodkv)
HDFS-1039. Service should be set in the token in JspHelper.getUGI.
(jitendra)
MAPREDUCE-1599. MRBench reuses jobConf and credentials therein.
(jitendra)
MAPREDUCE-1522. FileInputFormat may use the default FileSystem for the
input path. (Tsz Wo (Nicholas), SZE via cdouglas)
HDFS-1036. In DelegationTokenFetch pass Configuration object so
getDefaultUri will work correctly.
HDFS-1038. In nn_browsedfscontent.jsp fetch delegation token only if
security is enabled. (jitendra)
HDFS-1036. in DelegationTokenFetch dfs.getURI returns no port (boryas)
HADOOP-6598. Verbose logging from the Group class (one more case)
(boryas)
HADOOP-6627. "Bad Connection to FS" message in FSShell should print
message from the exception (boryas)
HDFS-1033. In secure clusters, NN and SNN should verify the remote
principal during image and edits transfer. (jhoman)
HDFS-1005. Fixes a bug to do with calling the cross-realm API in Fsck
client. (ddas)
MAPREDUCE-1422. Fix cleanup of localized job directory to work if files
with non-deletable permissions are created within it.
(Amar Kamat via yhemanth)
HDFS-1007. Fixes bugs to do with 20S cluster talking to 20 over
hftp (borya)
MAPREDUCE-1566. Fixes bugs in the earlier patch. (ddas)
HDFS-992. A bug in backport for HDFS-992. (jitendra)
HADOOP-6598. Remove verbose logging from the Groups class. (borya)
HADOOP-6620. NPE if renewer is passed as null in getDelegationToken.
(jitendra)
HDFS-1023. Second Update to original patch to fix username (jhoman)
MAPREDUCE-1435. Add test cases to already committed patch for this
jira, synchronizing changes with trunk. (yhemanth)
HADOOP-6612. Protocols RefreshUserToGroupMappingsProtocol and
RefreshAuthorizationPolicyProtocol authorization settings thru
KerberosInfo (boryas)
MAPREDUCE-1566. Bugfix for tests on top of the earlier patch. (ddas)
MAPREDUCE-1566. Mechanism to import tokens and secrets from a file in to
the submitted job. (omalley)
HADOOP-6603. Provide workaround for issue with Kerberos not
resolving cross-realm principal. (kan via jhoman)
HDFS-1023. Update to original patch to fix username (jhoman)
HDFS-814. Add an api to get the visible length of a
DFSDataInputStream. (hairong)
HDFS-1023. Allow http server to start as regular user if https
principal is not defined. (jhoman)
HDFS-1022. Merge all three test specs files (common, hdfs, mapred)
into one. (steffl)
HDFS-101. DFS write pipeline: DFSClient sometimes does not detect
second datanode failure. (hairong)
HDFS-1015. Intermittent failure in TestSecurityTokenEditLog. (jitendra)
MAPREDUCE-1550. A bugfix on top of what was committed earlier (ddas).
MAPREDUCE-1155. DISABLING THE TestStreamingExitStatus temporarily. (ddas)
HDFS-1020. Changes the check for renewer from short name to long name
in the cancel/renew delegation token methods. (jitendra via ddas)
HDFS-1019. Fixes values of delegation token parameters in
hdfs-default.xml. (jitendra via ddas)
MAPREDUCE-1430. Fixes a backport issue with the earlier patch. (ddas)
MAPREDUCE-1559. Fixes a problem in DelegationTokenRenewal class to
do with using the right credentials when talking to the NameNode. (ddas)
MAPREDUCE-1550. Fixes a problem to do with creating a filesystem using
the user's UGI in the JobHistory browsing. (ddas)
HADOOP-6609. Fix UTF8 to use a thread local DataOutputBuffer instead of
a static that was causing a deadlock in RPC. (omalley)
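The idea behind HADOOP-6609, giving each thread its own scratch buffer instead of sharing one static buffer, can be sketched in Python with `threading.local`. This is an analogy rather than the Hadoop code: the names `_scratch_buffer` and `encode_utf8` and the two-byte length prefix are illustrative assumptions.

```python
import io
import threading

_local = threading.local()

def _scratch_buffer() -> io.BytesIO:
    """Return this thread's private buffer, creating it on first use."""
    buf = getattr(_local, "buf", None)
    if buf is None:
        buf = _local.buf = io.BytesIO()
    return buf

def encode_utf8(text: str) -> bytes:
    """Encode text as a 2-byte big-endian length followed by UTF-8 bytes."""
    buf = _scratch_buffer()
    # Reset the reused buffer before writing; no lock is needed because
    # no other thread can see this buffer.
    buf.seek(0)
    buf.truncate()
    data = text.encode("utf-8")
    buf.write(len(data).to_bytes(2, "big"))
    buf.write(data)
    return buf.getvalue()

# Collect the buffer object each worker thread sees.
seen_buffers = []

def _worker():
    seen_buffers.append(_scratch_buffer())

workers = [threading.Thread(target=_worker) for _ in range(2)]
for w in workers:
    w.start()
for w in workers:
    w.join()
```

Because every thread gets a distinct buffer, threads never wait on each other for it, which removes the shared lock that made a deadlock possible.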
HADOOP-6584. Fix javadoc warnings introduced by original HADOOP-6584
patch (jhoman)
HDFS-1017. browsedfs jsp should call JspHelper.getUGI rather than using
createRemoteUser(). (jhoman)
MAPREDUCE-899. Modified LinuxTaskController to check that task-controller
has right permissions and ownership before performing any actions.
(Amareshwari Sriramadasu via yhemanth)
HDFS-204. Revive number of files listed metrics. (hairong)
HADOOP-6569. FsShell#cat should avoid calling unnecessary getFileStatus
before opening a file to read. (hairong)
HDFS-1014. Error in reading delegation tokens from edit logs. (jitendra)
HDFS-458. Add under-10-min tests from 0.22 to 0.20.1xx, only the tests
that already exist in 0.20.1xx (steffl)
MAPREDUCE-1155. Just pulls out the TestStreamingExitStatus part of the
patch from jira (that went to 0.22). (ddas)
HADOOP-6600. Fix for branch backport only. Comparing of user should use
equals. (boryas).
HDFS-1006. Fixes NameNode and SecondaryNameNode to use kerberizedSSL for
the http communication. (Jakob Homan via ddas)
HDFS-1007. Fixes a bug on top of the earlier patch. (ddas)
HDFS-1005. Fsck security. Makes it work over kerberized SSL (boryas and
jhoman)
HDFS-1007. Makes HFTP and Distcp use kerberized SSL. (ddas)
MAPREDUCE-1455. Fixes a testcase in the earlier patch.
(Ravi Gummadi via ddas)
HDFS-992. Refactors block access token implementation to conform to the
generic Token interface. (Kan Zhang via ddas)
HADOOP-6584. Adds KrbSSL connector for jetty. (Jakob Homan via ddas)
HADOOP-6589. Add a framework for better error messages when rpc connections
fail to authenticate. (Kan Zhang via omalley)
HADOOP-6600,HDFS-1003,MAPREDUCE-1539. mechanism for authorization check
for inter-server protocols. (boryas)
HADOOP-6580,HDFS-993,MAPREDUCE-1516. UGI should contain authentication
method.
Namenode and JT should issue a delegation token only for kerberos
authenticated clients. (jitendra)
HDFS-984,HADOOP-6573,MAPREDUCE-1537. Delegation Tokens should be persisted
in Namenode, and corresponding changes in common and mr. (jitendra)
HDFS-994. Provide methods for obtaining delegation token from Namenode for
hftp and other uses. Incorporates HADOOP-6594: Update hdfs script to
provide fetchdt tool. (jitendra)
HADOOP-6586. Log authentication and authorization failures and successes
(boryas)
HDFS-991. Allow use of delegation tokens to authenticate to the
HDFS servlets. (omalley)
HADOOP-1849. Add undocumented configuration parameter for per handler
call queue size in IPC Server. (shv)
HADOOP-6599. Split existing RpcMetrics with summary in RpcMetrics and
details information in RpcDetailedMetrics. (suresh)
HDFS-985. HDFS should issue multiple RPCs for listing a large directory.
(hairong)
HDFS-1000. Updates libhdfs to use the new UGI. (ddas)
MAPREDUCE-1532. Ensures all filesystem operations at the client are done
as the job submitter. Also, changes the renewal to maintain list of tokens
to renew. (ddas)
HADOOP-6596. Add a version field to the serialization of the
AbstractDelegationTokenIdentifier. (omalley)
HADOOP-5561. Add javadoc.maxmemory to build.xml to allow larger memory.
(jkhoman via omalley)
HADOOP-6579. Add a mechanism for encoding and decoding Tokens into
url-safe strings. (omalley)
MAPREDUCE-1354. Make incremental changes in jobtracker for
improving scalability (acmurthy)
HDFS-999. Secondary namenode should login using kerberos if security
is configured. (boryas)
MAPREDUCE-1466. Added a private configuration variable
mapreduce.input.num.files, to store number of input files
being processed by M/R job. (Arun Murthy via yhemanth)
MAPREDUCE-1403. Save file-sizes of each of the artifacts in
DistributedCache in the JobConf (Arun Murthy via yhemanth)
HADOOP-6543. Fixes a compilation problem in the original commit. (ddas)
MAPREDUCE-1520. Moves a call to setWorkingDirectory in Child to within
a doAs block. (Amareshwari Sriramadasu via ddas)
HADOOP-6543. Allows secure clients to talk to unsecure clusters.
(Kan Zhang via ddas)
MAPREDUCE-1505. Delays construction of the job client until it is really
required. (Arun C Murthy via ddas)
HADOOP-6549. TestDoAsEffectiveUser should use ip address of the host
for superuser ip check. (jitendra)
HDFS-464. Fix memory leaks in libhdfs. (Christian Kunz via suresh)
HDFS-946. NameNode should not return full path name when listing a
directory or getting the status of a file. (hairong)
MAPREDUCE-1398. Fix TaskLauncher to stop waiting for slots on a TIP
that is killed / failed. (Amareshwari Sriramadasu via yhemanth)
MAPREDUCE-1476. Fix the M/R framework to not call commit for special
tasks like job setup/cleanup and task cleanup.
(Amareshwari Sriramadasu via yhemanth)
HADOOP-6467. Performance improvement for liststatus on directories in
hadoop archives. (mahadev)
HADOOP-6558. archive does not work with distcp -update. (nicholas via
mahadev)
HADOOP-6583. Captures authentication and authorization metrics. (ddas)
MAPREDUCE-1316. Fixes a memory leak of TaskInProgress instances in
the jobtracker. (Amar Kamat via yhemanth)
MAPREDUCE-670. Creates ant target for 10 mins patch test build.
(Jothi Padmanabhan via gkesavan)
MAPREDUCE-1430. JobTracker should be able to renew delegation tokens
for the jobs. (boryas)
HADOOP-6551, HDFS-986, MAPREDUCE-1503. Change API for tokens to throw
exceptions instead of returning booleans. (omalley)
HADOOP-6545. Changes the Key for the FileSystem to be UGI. (ddas)
HADOOP-6572. Makes sure that SASL encryption and push to responder queue
for the RPC response happens atomically. (Kan Zhang via ddas)
HDFS-965. Split the HDFS TestDelegationToken into two tests, one of which
covers proxy users and the other normal users. (jitendra via omalley)
HADOOP-6560. HarFileSystem throws NPE for har://hdfs-/foo (nicholas via
mahadev)
MAPREDUCE-686. Move TestSpeculativeExecution.Fake* into a separate class
so that it can be used by other tests. (Jothi Padmanabhan via sharad)
MAPREDUCE-181. Fixes an issue in the use of the right config. (ddas)
MAPREDUCE-1026. Fixes a bug in the backport. (ddas)
HADOOP-6559. Makes the RPC client automatically re-login when the SASL
connection setup fails. This is applicable to only keytab based logins.
(ddas)
HADOOP-2141. Backport changes made in the original JIRA to aid
fast unit tests in Map/Reduce. (Amar Kamat via yhemanth)
HADOOP-6382. Import the mavenizable pom file structure and adjust
the build targets and bin scripts. (gkesavan via ltucker)
MAPREDUCE-1425. archive throws OutOfMemoryError (mahadev)
MAPREDUCE-1399. The archive command shows a null error message. (nicholas)
HADOOP-6552. Puts renewTGT=true and useTicketCache=true for the keytab
kerberos options. (ddas)
MAPREDUCE-1433. Adds delegation token for MapReduce (ddas)
HADOOP-4359. Fixes a bug in the earlier backport. (ddas)
HADOOP-6547, HDFS-949, MAPREDUCE-1470. Move Delegation token into Common
so that we can use it for MapReduce also. It is a combined patch for
common, hdfs and mr. (jitendra)
HADOOP-6510,HDFS-935,MAPREDUCE-1464. Support for doAs to allow
authenticated superuser to impersonate proxy users. It is a combined
patch with compatible fixes in HDFS and MR. (jitendra)
MAPREDUCE-1435. Fixes the way symlinks are handled when cleaning up
work directory files. (Ravi Gummadi via yhemanth)
MAPREDUCE-6419. Fixes a bug in the backported patch. (ddas)
MAPREDUCE-1457. Fixes JobTracker to get the FileSystem object within
getStagingAreaDir within a privileged block. Fixes Child.java to use the
appropriate UGIs while getting the TaskUmbilicalProtocol proxy and while
executing the task. Contributed by Jakob Homan. (ddas)
MAPREDUCE-1440. Replace the long user name in MapReduce with the local
name. (ddas)
HADOOP-6419. Adds SASL based authentication to RPC. Also includes the
MAPREDUCE-1335 and HDFS-933 patches. Contributed by Kan Zhang.
(ddas)
HADOOP-6538. Sets hadoop.security.authentication to simple by default.
(ddas)
HDFS-938. Replace calls to UGI.getUserName() with
UGI.getShortUserName(). (boryas)
HADOOP-6544. fix ivy settings to include JSON jackson.codehaus.org
libs for .20 (boryas)
HDFS-907. Add tests for getBlockLocations and totalLoad metrics. (rphulari)
HADOOP-6204. Implementing aspects development and fault injection
framework for Hadoop (cos)
MAPREDUCE-1432. Adds hooks in the jobtracker and tasktracker
for loading the tokens in the user's ugi. This is required for
the copying of files from the hdfs. (Devaraj Das via boryas)
MAPREDUCE-1383. Automates fetching of delegation tokens in File*Formats
Distributed Cache and Distcp. Also, provides a config
mapreduce.job.hdfs-servers that the jobs can populate with a comma
separated list of namenodes. The job client automatically fetches
delegation tokens from those namenodes.
HADOOP-6337. Update FilterInitializer class to be more visible
and take a conf for further development. (jhoman)
HADOOP-6520. UGI should load tokens from the environment. (jitendra)
HADOOP-6517, HADOOP-6518. Ability to add/get tokens from
UserGroupInformation & Kerberos login in UGI should honor KRB5CCNAME
(jitendra)
HADOOP-6299. Reimplement the UserGroupInformation to use the OS
specific and Kerberos JAAS login. (jhoman, ddas, oom)
HADOOP-6524. Contrib tests are failing Clover'ed build. (cos)
MAPREDUCE-842. Fixing a bug in the earlier version of the patch
related to improper localization of the job token file.
(Ravi Gummadi via yhemanth)
HDFS-919. Create test to validate the BlocksVerified metric (Gary Murry
via cos)
MAPREDUCE-1186. Modified code in distributed cache to set
permissions only on required set of localized paths.
(Amareshwari Sriramadasu via yhemanth)
HDFS-899. Delegation Token Implementation. (Jitendra Nath Pandey)
MAPREDUCE-896. Enhance tasktracker to cleanup files that might have
been created by user tasks with non-writable permissions.
(Ravi Gummadi via yhemanth)
HADOOP-5879. Read compression level and strategy from Configuration for
gzip compression. (He Yongqiang via cdouglas)
HADOOP-6161. Add get/setEnum methods to Configuration. (cdouglas)
HADOOP-6382 Mavenize the build.xml targets and update the bin scripts
in preparation for publishing POM files (giri kesavan via ltucker)
HDFS-737. Add full path name of the file to the block information and
summary of total number of files, blocks, live and deadnodes to
metasave output. (Jitendra Nath Pandey via suresh)
HADOOP-6577. Add hidden configuration option "ipc.server.max.response.size"
to change the default 1 MB, the maximum size when large IPC handler
response buffer is reset. (suresh)
HADOOP-6521. Fix backward compatibility issue with umask when applications
use deprecated param dfs.umask in configuration or use
FsPermission.setUMask(). (suresh)
MAPREDUCE-433. Use more reliable counters in TestReduceFetch.
(Christopher Douglas via ddas)
MAPREDUCE-744. Introduces the notion of a public distributed cache.
(ddas)
MAPREDUCE-1140. Fix DistributedCache to not decrement reference counts
for unreferenced files in error conditions.
(Amareshwari Sriramadasu via yhemanth)
MAPREDUCE-1284. Fix fts_open() call in task-controller that was failing
LinuxTaskController unit tests. (Ravi Gummadi via yhemanth)
MAPREDUCE-1098. Fixed the distributed-cache to not do i/o while
holding a global lock.
(Amareshwari Sriramadasu via acmurthy)
MAPREDUCE-1338. Introduces the notion of token cache using which
tokens and secrets can be sent by the Job client to the JobTracker.
(Boris Shkolnik)
HADOOP-6495. Identifier should be serialized after the password is created
in the Token constructor. (Jitendra Nath Pandey)
HADOOP-6506. Failing tests prevent the rest of test targets from
execution. (cos)
HADOOP-5457. Fix to continue to run builds even if contrib test fails.
(gkesavan)
MAPREDUCE-856. Setup secure permissions for distributed cache files.
(Vinod Kumar Vavilapalli via yhemanth)
MAPREDUCE-871. Fix ownership of Job/Task local files to have correct
group ownership according to the egid of the tasktracker.
(Vinod Kumar Vavilapalli via yhemanth)
MAPREDUCE-476. Extend DistributedCache to work locally (LocalJobRunner).
(Philip Zeyliger via tomwhite)
MAPREDUCE-711. Removed Distributed Cache from Common, to move it under
Map/Reduce. (Vinod Kumar Vavilapalli via yhemanth)
MAPREDUCE-478. Allow map and reduce jvm parameters, environment
variables and ulimit to be set separately. (acmurthy)
MAPREDUCE-842. Setup secure permissions for localized job files,
intermediate outputs and log files on tasktrackers.
(Vinod Kumar Vavilapalli via yhemanth)
MAPREDUCE-408. Fixes an assertion problem in TestKillSubProcesses.
(Ravi Gummadi via ddas)
HADOOP-4041. IsolationRunner does not work as documented.
(Philip Zeyliger via tomwhite)
MAPREDUCE-181. Changes the job submission process to be secure.
(Devaraj Das)
HADOOP-5737. Fixes a problem in the way the JobTracker used to talk to
other daemons like the NameNode to get the job's files. Also adds APIs
in the JobTracker to get the FileSystem objects as per the JobTracker's
configuration. (Amar Kamat via ddas)
HADOOP-5771. Implements unit tests for LinuxTaskController.
(Sreekanth Ramakrishnan and Vinod Kumar Vavilapalli via yhemanth)
HADOOP-4656, HDFS-685, MAPREDUCE-1083. Use the user-to-groups mapping
service in the NameNode and JobTracker. Combined patch for these 3 jiras
otherwise tests fail. (Jitendra Nath Pandey)
MAPREDUCE-1250. Refactor job token to use a common token interface.
(Jitendra Nath Pandey)
MAPREDUCE-1026. Shuffle should be secure. (Jitendra Nath Pandey)
HADOOP-4268. Permission checking in fsck. (Jitendra Nath Pandey)
HADOOP-6415. Adding a common token interface for both job token and
delegation token. (Jitendra Nath Pandey)
HADOOP-6367, HDFS-764. Moving Access Token implementation from Common to
HDFS. These two jiras must be committed together otherwise build will
fail. (Jitendra Nath Pandey)
HDFS-409. Add more access token tests
(Jitendra Nath Pandey)
HADOOP-6132. RPC client opens an extra connection for VersionedProtocol.
(Jitendra Nath Pandey)
HDFS-445. pread() fails when cached block locations are no longer valid.
(Jitendra Nath Pandey)
HDFS-195. Need to handle access token expiration when re-establishing the
pipeline for dfs write. (Jitendra Nath Pandey)
HADOOP-6176. Adding a couple private methods to AccessTokenHandler
for testing purposes. (Jitendra Nath Pandey)
HADOOP-5824. remove OP_READ_METADATA functionality from Datanode.
(Jitendra Nath Pandey)
HADOOP-4359. Access Token: Support for data access authorization
checking on DataNodes. (Jitendra Nath Pandey)
MAPREDUCE-1372. Fixed a ConcurrentModificationException in jobtracker.
(Arun C Murthy via yhemanth)
MAPREDUCE-1316. Fix jobs' retirement from the JobTracker to prevent memory
leaks via stale references. (Amar Kamat via acmurthy)
MAPREDUCE-1342. Fixed deadlock in global blacklisting of tasktrackers.
(Amareshwari Sriramadasu via acmurthy)
HADOOP-6460. Reinitializes buffers used for serializing responses in ipc
server on exceeding maximum response size to free up Java heap. (suresh)
MAPREDUCE-1100. Truncate user logs to prevent TaskTrackers' disks from
filling up. (Vinod Kumar Vavilapalli via acmurthy)
MAPREDUCE-1143. Fix running task counters to be updated correctly
when speculative attempts are running for a TIP.
(Rahul Kumar Singh via yhemanth)
HADOOP-6151, 6281, 6285, 6441. Add HTML quoting of the parameters to all
of the servlets to prevent XSS attacks. (omalley)
MAPREDUCE-896. Fix bug in earlier implementation to prevent
spurious logging in tasktracker logs for absent file paths.
(Ravi Gummadi via yhemanth)
MAPREDUCE-676. Fix Hadoop Vaidya to ensure it works for map-only jobs.
(Suhas Gogate via acmurthy)
HADOOP-5582. Fix Hadoop Vaidya to use new Counters in
org.apache.hadoop.mapreduce package. (Suhas Gogate via acmurthy)
HDFS-595. umask settings in configuration may now use octal or
symbolic instead of decimal. Update HDFS tests as such. (jghoman)
MAPREDUCE-1068. Added a verbose error message when user specifies an
incorrect -file parameter. (Amareshwari Sriramadasu via acmurthy)
MAPREDUCE-1171. Allow the read-error notification in shuffle to be
configurable. (Amareshwari Sriramadasu via acmurthy)
MAPREDUCE-353. Allow shuffle read and connection timeouts to be
configurable. (Amareshwari Sriramadasu via acmurthy)
HDFS-781. Namenode metrics PendingDeletionBlocks is not decremented.
(suresh)
MAPREDUCE-1185. Redirect running job url to history url if job is already
retired. (Amareshwari Sriramadasu and Sharad Agarwal via sharad)
MAPREDUCE-754. Fix NPE in expiry thread when a TT is lost. (Amar Kamat
via sharad)
MAPREDUCE-896. Modify permissions for local files on tasktracker before
deletion so they can be deleted cleanly. (Ravi Gummadi via yhemanth)
MAPREDUCE-1124. Import Gridmix3 and Rumen. (cdouglas)
MAPREDUCE-1063. Document gridmix benchmark. (cdouglas)
HDFS-758. Changes to report status of decommissioning on the namenode web
UI. (jitendra)
HADOOP-6234. Add new option dfs.umaskmode to set umask in configuration
to use octal or symbolic instead of decimal. (Jakob Homan via suresh)
MAPREDUCE-1147. Add map output counters to new API. (Amar Kamat via
cdouglas)
MAPREDUCE-1182. Fix overflow in reduce causing allocations to exceed the
configured threshold. (cdouglas)
HADOOP-4933. Fixes a ConcurrentModificationException problem that shows up
when the history viewer is accessed concurrently.
(Amar Kamat via ddas)
MAPREDUCE-1140. Fix DistributedCache to not decrement reference counts for
unreferenced files in error conditions.
(Amareshwari Sriramadasu via yhemanth)
HADOOP-6203. FsShell rm/rmr error message indicates exceeding Trash quota
and suggests using -skipTrash, when moving to trash fails.
(Boris Shkolnik via suresh)
HADOOP-5675. Do not launch a job if DistCp has no work to do. (Tsz Wo
(Nicholas), SZE via cdouglas)
HDFS-457. Better handling of volume failure in Data Node storage,
This fix is a port from hdfs-0.22 to common-0.20 by Boris Shkolnik.
Contributed by Erik Steffl
HDFS-625. Fix NullPointerException thrown from ListPathServlet.
Contributed by Suresh Srinivas.
HADOOP-6343. Log unexpected throwable object caught in RPC.
Contributed by Jitendra Nath Pandey
MAPREDUCE-1186. Fixed DistributedCache to do a recursive chmod on just the
per-cache directory, not all of mapred.local.dir.
(Amareshwari Sriramadasu via acmurthy)
MAPREDUCE-1231. Add an option to distcp to ignore checksums when used with
the upgrade option.
(Jothi Padmanabhan via yhemanth)
MAPREDUCE-1219. Fixed JobTracker to not collect per-job metrics, thus
easing load on it. (Amareshwari Sriramadasu via acmurthy)
HDFS-761. Fix failure to process rename operation from edits log due to
quota verification. (suresh)
MAPREDUCE-1196. Fix FileOutputCommitter to use the deprecated cleanupJob
api correctly. (acmurthy)
HADOOP-6344. rm and rmr immediately delete files rather than sending
to trash, despite trash being enabled, if a user is over-quota. (jhoman)
MAPREDUCE-1160. Reduce verbosity of log lines in some Map/Reduce classes
to avoid filling up jobtracker logs on a busy cluster.
(Ravi Gummadi and Hong Tang via yhemanth)
HDFS-587. Add ability to run HDFS with MR test on non-default queue,
also updated junit dependency from junit-3.8.1 to junit-4.5 (to make
it possible to use Configured and Tool to process command line to
be able to specify a queue). Contributed by Erik Steffl.
MAPREDUCE-1158. Fix JT running maps and running reduces metrics.
(sharad)
MAPREDUCE-947. Fix bug in earlier implementation that was
causing unit tests to fail.
(Ravi Gummadi via yhemanth)
MAPREDUCE-1062. Fix MRReliabilityTest to work with retired jobs
(Contributed by Sreekanth Ramakrishnan)
MAPREDUCE-1090. Modified log statement in TaskMemoryManagerThread to
include task attempt id. (yhemanth)
MAPREDUCE-1098. Fixed the distributed-cache to not do i/o while
holding a global lock. (Amareshwari Sriramadasu via acmurthy)
MAPREDUCE-1048. Add occupied/reserved slot usage summary on
jobtracker UI. (Amareshwari Sriramadasu via sharad)
MAPREDUCE-1103. Added more metrics to Jobtracker. (sharad)
MAPREDUCE-947. Added commitJob and abortJob apis to OutputCommitter.
Enhanced FileOutputCommitter to create a _SUCCESS file for successful
jobs. (Amar Kamat & Jothi Padmanabhan via acmurthy)
MAPREDUCE-1105. Remove max limit configuration in capacity scheduler in
favor of max capacity percentage thus allowing the limit to go over
queue capacity. (Rahul Kumar Singh via yhemanth)
MAPREDUCE-1086. Setup Hadoop logging environment for tasks to point to
task related parameters. (Ravi Gummadi via yhemanth)
MAPREDUCE-739. Allow relative paths to be created inside archives.
(mahadev)
HADOOP-6097. Multiple bugs w/ Hadoop archives (mahadev)
HADOOP-6231. Allow caching of filesystem instances to be disabled on a
per-instance basis (ben slusky via mahadev)
MAPREDUCE-826. harchive doesn't use ToolRunner / harchive returns 0 even
if the job fails with exception (koji via mahadev)
HDFS-686. NullPointerException is thrown while merging edit log and
image. (hairong)
HDFS-709. Fix TestDFSShell failure due to rename bug introduced by
HDFS-677. (suresh)
HDFS-677. Rename failure when both source and destination quota exceeds
results in deletion of source. (suresh)
HADOOP-6284. Add a new parameter, HADOOP_JAVA_PLATFORM_OPTS, to
hadoop-config.sh so that it allows setting java command options for
JAVA_PLATFORM. (Koji Noguchi via szetszwo)
MAPREDUCE-732. Removed spurious log statements in the node
blacklisting logic. (Sreekanth Ramakrishnan via yhemanth)
MAPREDUCE-144. Includes dump of the process tree in task diagnostics when
a task is killed due to exceeding memory limits.
(Vinod Kumar Vavilapalli via yhemanth)
MAPREDUCE-979. Fixed JobConf APIs related to memory parameters to
return values of new configuration variables when deprecated
variables are disabled. (Sreekanth Ramakrishnan via yhemanth)
MAPREDUCE-277. Makes job history counters available on the job history
viewers. (Jothi Padmanabhan via ddas)
HADOOP-5625. Add operation duration to clienttrace. (Lei Xu
via cdouglas)
HADOOP-5222. Add offset to datanode clienttrace. (Lei Xu via cdouglas)
HADOOP-6218. Adds a feature where TFile can be split by Record
Sequence number. Contributed by Hong Tang and Raghu Angadi.
MAPREDUCE-1088. Changed permissions on JobHistory files on local disk to
0744. Contributed by Arun C. Murthy.
HADOOP-6304. Use java.io.File.set{Readable|Writable|Executable} where
possible in RawLocalFileSystem. Contributed by Arun C. Murthy.
MAPREDUCE-270. Fix the tasktracker to optionally send an out-of-band
heartbeat on task-completion for better job-latency. Contributed by
Arun C. Murthy
Configuration changes:
add mapreduce.tasktracker.outofband.heartbeat
MAPREDUCE-1030. Fix capacity-scheduler to assign a map and a reduce task
per-heartbeat. Contributed by Rahul K Singh.
MAPREDUCE-1028. Fixed number of slots occupied by cleanup tasks to one
irrespective of slot size for the job. Contributed by Ravi Gummadi.
MAPREDUCE-964. Fixed start and finish times of TaskStatus to be
consistent, thereby fixing inconsistencies in metering tasks.
Contributed by Sreekanth Ramakrishnan.
HADOOP-5976. Add a new command, classpath, to the hadoop
script. Contributed by Owen O'Malley and Gary Murry
HADOOP-5784. Makes the number of heartbeats that should arrive
a second at the JobTracker configurable. Contributed by
Amareshwari Sriramadasu.
MAPREDUCE-945. Modifies MRBench and TestMapRed to use
ToolRunner so that options such as queue name can be
passed via command line. Contributed by Sreekanth Ramakrishnan.
HADOOP-5420 Correct bug in earlier implementation
by Arun C. Murthy
HADOOP-5363 Add support for proxying connections to multiple
clusters with different versions to hdfsproxy. Contributed
by Zhiyong Zhang
HADOOP-5780. Improve per block message printed by -metaSave
in HDFS. (Raghu Angadi)
HADOOP-6227. Fix Configuration to allow final parameters to be set
to null and prevent them from being overridden. Contributed by
Amareshwari Sriramadasu.
MAPREDUCE-430 Added patch supplied by Amar Kamat to allow roll forward
on branch to include externally committed patch.
MAPREDUCE-768. Provide an option to dump jobtracker configuration in
JSON format to standard output. Contributed by V.V.Chaitanya
MAPREDUCE-834 Correct an issue created by merging this issue with
patch attached to external Jira.
HADOOP-6184 Provide an API to dump Configuration in a JSON format.
Contributed by V.V.Chaitanya Krishna.
MAPREDUCE-745 Patch added for this issue to allow branch-0.20 to
merge cleanly.
MAPREDUCE-478 Allow map and reduce jvm parameters, environment
variables and ulimit to be set separately.
MAPREDUCE-682 Removes reservations on tasktrackers which are blacklisted.
Contributed by Sreekanth Ramakrishnan.
HADOOP-5420 Support killing of process groups in LinuxTaskController
binary
HADOOP-5488 Removes the pidfile management for the Task JVM from the
framework and instead passes the PID back and forth between the
TaskTracker and the Task processes. Contributed by Ravi Gummadi.
MAPREDUCE-467 Provide ability to collect statistics about total tasks and
succeeded tasks in different time windows.
MAPREDUCE-817. Add a cache for retired jobs with minimal job
info and provide a way to access history file url
MAPREDUCE-814. Provide a way to configure completed job history
files to be on HDFS.
MAPREDUCE-838 Fixes a problem in the way commit of task outputs
happens. The bug was that even if commit failed, the task would be
declared as successful. Contributed by Amareshwari Sriramadasu.
MAPREDUCE-809 Fix job-summary logs to correctly record final status of
FAILED and KILLED jobs.
MAPREDUCE-740 Log a job-summary at the end of a job, while
allowing it to be configured to use a custom appender if desired.
MAPREDUCE-771 Fixes a bug which delays normal jobs in favor of
high-ram jobs.
HADOOP-5420 Support setsid based kill in LinuxTaskController.
MAPREDUCE-733 Fixes a bug that when a task tracker is killed,
it throws exception. Instead it should catch it and process it and
allow the rest of the flow to go through
MAPREDUCE-734 Fixes a bug which prevented hi ram jobs from being
removed from the scheduler queue.
MAPREDUCE-693 Fixes a bug that when a job is submitted and the
JT is restarted (before job files have been written) and the job
is killed after recovery, the conf files fail to be moved to the
"done" subdirectory.
MAPREDUCE-722 Fixes a bug where more slots are getting reserved
for HiRAM job tasks than required.
MAPREDUCE-683 TestJobTrackerRestart failed because of stale
filemanager cache (which was created once per jvm). This patch makes
sure that the filemanager is inited upon every JobHistory.init()
and hence upon every restart. Note that this won't happen in production
as upon a restart the new jobtracker will start in a new jvm and
hence a new cache will be created.
MAPREDUCE-709 Fixes a bug where node health check script does
not display the correct message on timeout.
MAPREDUCE-708 Fixes a bug where node health check script does
not refresh the "reason for blacklisting".
MAPREDUCE-522 Rewrote TestQueueCapacities to make it simpler
and avoid timeout errors.
MAPREDUCE-532 Provided ability in the capacity scheduler to
limit the number of slots that can be concurrently used per queue
at any given time.
MAPREDUCE-211 Provides ability to run a health check script on
the tasktracker nodes and blacklist nodes if they are unhealthy.
Contributed by Sreekanth Ramakrishnan.
MAPREDUCE-516 Remove .orig file included by mistake.
MAPREDUCE-416 Moves the history file to a "done" folder whenever
a job completes.
HADOOP-5980 Previously, tasks spawned off by LinuxTaskController
didn't get LD_LIBRARY_PATH in their environment. The tasks will now
get same LD_LIBRARY_PATH value as when spawned off by
DefaultTaskController.
HADOOP-5981 This issue completes the feature mentioned in
HADOOP-2838. HADOOP-2838 provided a way to set env variables in
child process. This issue provides a way to inherit tt's env variables
and append or reset it. So now X=$X:y will inherit X (if there) and
append y to it.
HADOOP-5419 This issue is to provide an improvement on the
existing M/R framework to let users know which queues they have
access to, and for what operations. One use case for this would be
that currently there is no easy way to know if the user has access
to submit jobs to a queue, until it fails with an access control
exception.
HADOOP-5420 Support setsid based kill in LinuxTaskController.
HADOOP-5643 Added the functionality to refresh jobtracker's node
list via command line (bin/hadoop mradmin -refreshNodes). The command
should be run as the jobtracker owner (jobtracker process owner)
or from a super group (mapred.permissions.supergroup).
HADOOP-2838 Now the users can set environment variables using
mapred.child.env. They can do the following:
X=Y : set X to Y
X=$X:Y : append Y to X (which should be taken from the tasktracker)
HADOOP-5818. Revert the renaming from FSNamesystem.checkSuperuserPrivilege
to checkAccess by HADOOP-5643. (Amar Kamat via szetszwo)
HADOOP-5801. Fixes the problem: If the hosts file is changed across restart
then it should be refreshed upon recovery so that the excluded hosts are
lost and the maps are re-executed. (Amar Kamat via ddas)
HADOOP-5643. Adds a way to decommission TaskTrackers
while the JobTracker is running. (Amar Kamat via ddas)
HADOOP-5419. Provide a facility to query the Queue ACLs for the
current user. (Rahul Kumar Singh via yhemanth)
HADOOP-5733. Add map/reduce slot capacity and blacklisted capacity to
JobTracker metrics. (Sreekanth Ramakrishnan via cdouglas)
HADOOP-5738. Split "waiting_tasks" JobTracker metric into waiting maps and
waiting reduces. (Sreekanth Ramakrishnan via cdouglas)
HADOOP-4842. Streaming now allows specifying a command for the combiner.
(Amareshwari Sriramadasu via ddas)
HADOOP-4490. Provide ability to run tasks as job owners.
(Sreekanth Ramakrishnan via yhemanth)
HADOOP-5442. Paginate jobhistory display and added some search
capabilities. (Amar Kamat via acmurthy)
HADOOP-3327. Improves handling of READ_TIMEOUT during map output copying.
(Amareshwari Sriramadasu via ddas)
HADOOP-5113. Fixed logcondense to remove files for usernames
beginning with characters specified in the -l option.
(Peeyush Bishnoi via yhemanth)
HADOOP-2898. Provide an option to specify a port range for
Hadoop services provisioned by HOD.
(Peeyush Bishnoi via yhemanth)
HADOOP-4930. Implement a Linux native executable that can be used to
launch tasks as users. (Sreekanth Ramakrishnan via yhemanth)
Release 0.20.3 - Unreleased
IMPROVEMENTS
BUG FIXES
HDFS-955. New implementation of saveNamespace() to avoid loss of edits
when name-node fails during saving. (shv)
Release 0.20.2 - Unreleased
BUG FIXES
MAPREDUCE-112. Add counters for reduce input, output records to the new API.
(Jothi Padmanabhan via cdouglas)
HADOOP-6498. IPC client bug may cause rpc call hang. (Ruyue Ma and hairong
via hairong)
HDFS-927. DFSInputStream retries too many times for new block locations
(Todd Lipcon via Stack)
HDFS-793. DataNode should first receive the whole packet ack message
before it constructs and sends its own ack message for the packet.
(hairong)
HDFS-723. Fix deadlock in DFSClient#DFSOutputStream. (hairong)
HDFS-732. DFSClient.DFSOutputStream.close() should throw an exception if
the stream cannot be closed successfully. (szetszwo)
IMPROVEMENTS
HDFS-187. Initialize secondary namenode http address in TestStartup.
(Todd Lipcon via szetszwo)
HDFS-185. Disallow chown, chgrp, chmod, setQuota, and setSpaceQuota when
name-node is in safemode. (Ravi Phulari via shv)
HADOOP-5611. Fix C++ libraries to build on Debian Lenny. (Todd Lipcon
via tomwhite)
HADOOP-5612. Some c++ scripts are not chmodded before ant execution.
(Todd Lipcon via tomwhite)
HDFS-579. Fix DfsTask to follow the semantics of 0.19, regarding non-zero
return values as failures. (Christian Kunz via cdouglas)
HDFS-596. Fix memory leak in hdfsFreeFileInfo() for libhdfs.
(Zhang Bingjun via dhruba)
MAPREDUCE-1070. Prevent a deadlock in the fair scheduler servlet.
(Todd Lipcon via cdouglas)
HADOOP-5623. Ensure streaming status messages aren't overwritten. (Rick
Cox & Ravi Gummadi via tomwhite)
MAPREDUCE-1163. Remove unused, hard-coded paths from libhdfs. (Allen
Wittenauer via cdouglas)
HADOOP-6315. Avoid incorrect use of BuiltInflater/BuiltInDeflater in
GzipCodec. (Aaron Kimball via cdouglas)
HADOOP-6269. Fix threading issue with defaultResource in Configuration.
(Sreekanth Ramakrishnan via cdouglas)
HADOOP-5759. Fix for IllegalArgumentException when CombineFileInputFormat
is used as job InputFormat. (Amareshwari Sriramadasu via zshao)
Release 0.20.1 - 2009-09-01
INCOMPATIBLE CHANGES
HADOOP-5726. Remove pre-emption from capacity scheduler code base.
(Rahul Kumar Singh via yhemanth)
HADOOP-5881. Simplify memory monitoring and scheduling related
configuration. (Vinod Kumar Vavilapalli via yhemanth)
NEW FEATURES
HADOOP-6080. Introduce -skipTrash option to rm and rmr.
(Jakob Homan via shv)
HADOOP-3315. Add a new, binary file format, TFile. (Hong Tang via cdouglas)
IMPROVEMENTS
HADOOP-5711. Change Namenode file close log to info. (szetszwo)
HADOOP-5736. Update the capacity scheduler documentation for features
like memory based scheduling, job initialization and removal of pre-emption.
(Sreekanth Ramakrishnan via yhemanth)
HADOOP-4674. Fix fs help messages for -test, -text, -tail, -stat
and -touchz options. (Ravi Phulari via szetszwo)
HADOOP-4372. Improves the way history filenames are obtained and manipulated.
(Amar Kamat via ddas)
HADOOP-5897. Add name-node metrics to capture java heap usage.
(Suresh Srinivas via shv)
HDFS-438. Improve help message for space quota command. (Raghu Angadi)
MAPREDUCE-767. Remove the dependence on the CLI 2.0 snapshot.
(Amar Kamat via ddas)
OPTIMIZATIONS
BUG FIXES
HADOOP-5691. Makes org.apache.hadoop.mapreduce.Reducer concrete class
instead of abstract. (Amareshwari Sriramadasu via sharad)
HADOOP-5646. Fixes a problem in TestQueueCapacities.
(Vinod Kumar Vavilapalli via ddas)
HADOOP-5655. TestMRServerPorts fails on java.net.BindException. (Devaraj
Das via hairong)
HADOOP-5654. TestReplicationPolicy.<init> fails on java.net.BindException.
(hairong)
HADOOP-5688. Fix HftpFileSystem checksum path construction. (Tsz Wo
(Nicholas) Sze via cdouglas)
HADOOP-5213. Fix null pointer exception caused when bzip2 compression
was used and the user closed an output stream without writing any data.
(Zheng Shao via dhruba)
HADOOP-5718. Remove the check for the default queue in capacity scheduler.
(Sreekanth Ramakrishnan via yhemanth)
HADOOP-5719. Remove jobs that failed initialization from the waiting queue
in the capacity scheduler. (Sreekanth Ramakrishnan via yhemanth)
HADOOP-4744. Attaching another fix to the jetty port issue. The TaskTracker
kills itself if it ever discovers that the port to which jetty is actually
bound is invalid (-1). (ddas)
HADOOP-5349. Fixes a problem in LocalDirAllocator to check for the return
path value that is returned for the case where the file we want to write
is of an unknown size. (Vinod Kumar Vavilapalli via ddas)
HADOOP-5636. Prevents a job from going to RUNNING state after it has been
KILLED (this used to happen when the SetupTask would come back with a
success after the job has been killed). (Amar Kamat via ddas)
HADOOP-5641. Fix a NullPointerException in capacity scheduler's memory
based scheduling code when jobs get retired. (yhemanth)
HADOOP-5828. Use absolute path for mapred.local.dir of JobTracker in
MiniMRCluster. (yhemanth)
HADOOP-4981. Fix capacity scheduler to schedule speculative tasks
correctly in the presence of High RAM jobs.
(Sreekanth Ramakrishnan via yhemanth)
HADOOP-5210. Solves a problem in the progress report of the reduce task.
(Ravi Gummadi via ddas)
HADOOP-5850. Fixes a problem to do with not being able to run jobs with
0 maps/reduces. (Vinod K V via ddas)
HADOOP-5728. Fixed FSEditLog.printStatistics IndexOutOfBoundsException.
(Wang Xu via johan)
HADOOP-4626. Correct the API links in hdfs forrest doc so that they
point to the same version of hadoop. (szetszwo)
HADOOP-5883. Fixed tasktracker memory monitoring to account for
momentary spurts in memory usage due to Java's fork() model.
(yhemanth)
HADOOP-5539. Fixes a problem to do with not preserving intermediate
output compression for merged data.
(Jothi Padmanabhan and Billy Pearson via ddas)
HADOOP-5932. Fixes a problem in capacity scheduler in computing
available memory on a tasktracker.
(Vinod Kumar Vavilapalli via yhemanth)
HADOOP-5648. Fixes a build issue in not being able to generate gridmix.jar
in hadoop binary tarball. (Giridharan Kesavan via gkesavan)
HADOOP-5908. Fixes a problem to do with ArithmeticException in the
JobTracker when there are jobs with 0 maps. (Amar Kamat via ddas)
HADOOP-5924. Fixes a corner case problem to do with job recovery with
empty history files. Also, after a JT restart, sends KillTaskAction to
tasks that report back but the corresponding job hasn't been initialized
yet. (Amar Kamat via ddas)
HADOOP-5882. Fixes a reducer progress update problem for new mapreduce
api. (Amareshwari Sriramadasu via sharad)
HADOOP-5746. Fixes a corner case problem in Streaming, where if an
exception happens in MROutputThread after the last call to the map/reduce
method, the exception goes undetected. (Amar Kamat via ddas)
HADOOP-5884. Fixes accounting in capacity scheduler so that high RAM jobs
take more slots. (Vinod Kumar Vavilapalli via yhemanth)
HADOOP-5937. Correct a safemode message in FSNamesystem. (Ravi Phulari
via szetszwo)
HADOOP-5869. Fix bug in assignment of setup / cleanup task that was
causing TestQueueCapacities to fail.
(Sreekanth Ramakrishnan via yhemanth)
HADOOP-5921. Fixes a problem in the JobTracker where it sometimes never
used to come up due to a system file creation on JobTracker's system-dir
failing. This problem would sometimes show up only when the FS for the
system-dir (usually HDFS) is started at nearly the same time as the
JobTracker. (Amar Kamat via ddas)
HADOOP-5920. Fixes a testcase failure for TestJobHistory.
(Amar Kamat via ddas)
HDFS-26. Better error message to users when commands fail because of
lack of quota. Allow quota to be set even if the limit is lower than
current consumption. (Boris Shkolnik via rangadi)
MAPREDUCE-2. Fixes a bug in KeyFieldBasedPartitioner in handling empty
keys. (Amar Kamat via sharad)
MAPREDUCE-130. Delete the jobconf copy from the log directory of the
JobTracker when the job is retired. (Amar Kamat via sharad)
MAPREDUCE-657. Fix hardcoded filesystem problem in CompletedJobStatusStore.
(Amar Kamat via sharad)
MAPREDUCE-179. Update progress in new RecordReaders. (cdouglas)
MAPREDUCE-124. Fix a bug in failure handling of abort task of
OutputCommitter. (Amareshwari Sriramadasu via sharad)
HADOOP-6139. Fix the FsShell help messages for rm and rmr. (Jakob Homan
via szetszwo)
HADOOP-6141. Fix a few bugs in 0.20 test-patch.sh. (Hong Tang via
szetszwo)
HADOOP-6145. Fix FsShell rm/rmr error messages when there is a FNFE.
(Jakob Homan via szetszwo)
MAPREDUCE-565. Fix partitioner to work with new API. (Owen O'Malley via
cdouglas)
MAPREDUCE-465. Fix a bug in MultithreadedMapRunner. (Amareshwari
Sriramadasu via sharad)
MAPREDUCE-18. Puts some checks to detect cases where jetty serves up
incorrect output during shuffle. (Ravi Gummadi via ddas)
MAPREDUCE-735. Fixes a problem in the KeyFieldHelper to do with
the end index for some inputs (Amar Kamat via ddas)
HADOOP-6150. Users should be able to instantiate comparator using TFile
API. (Hong Tang via rangadi)
MAPREDUCE-383. Fix a bug in Pipes combiner due to bytes count not
getting reset after the spill. (Christian Kunz via sharad)
MAPREDUCE-40. Keep memory management backwards compatible for job
configuration parameters and limits. (Rahul Kumar Singh via yhemanth)
MAPREDUCE-796. Fixes a ClassCastException in an exception log in
MultithreadedMapRunner. (Amar Kamat via ddas)
MAPREDUCE-838. Fixes a problem in the way commit of task outputs
happens. The bug was that even if commit failed, the task would
be declared as successful. (Amareshwari Sriramadasu via ddas)
MAPREDUCE-805. Fixes some deadlocks in the JobTracker due to the fact
the JobTracker lock hierarchy wasn't maintained in some JobInProgress
method calls. (Amar Kamat via ddas)
HDFS-167. Fix a bug in DFSClient that caused infinite retries on write.
(Bill Zeller via szetszwo)
HDFS-527. Remove unnecessary DFSClient constructors. (szetszwo)
MAPREDUCE-832. Reduce number of warning messages printed when
deprecated memory variables are used. (Rahul Kumar Singh via yhemanth)
MAPREDUCE-745. Fixes a testcase problem to do with generation of JobTracker
IDs. (Amar Kamat via ddas)
MAPREDUCE-834. Enables memory management on tasktrackers when old
memory management parameters are used in configuration.
(Sreekanth Ramakrishnan via yhemanth)
MAPREDUCE-818. Fixes Counters#getGroup API. (Amareshwari Sriramadasu
via sharad)
MAPREDUCE-807. Handles the AccessControlException during the deletion of
mapred.system.dir in the JobTracker. The JobTracker will bail out if it
encounters such an exception. (Amar Kamat via ddas)
HADOOP-6213. Remove commons dependency on commons-cli2. (Amar Kamat via
sharad)
MAPREDUCE-430. Fix a bug related to task getting stuck in case of
OOM error. (Amar Kamat via ddas)
HADOOP-6215. Fix GenericOptionsParser to deal with -D with '=' in the
value. (Amar Kamat via sharad)
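To illustrate the -D case in HADOOP-6215, here is a minimal sketch that splits a property argument on the first '=' only, so the value may itself contain '='. It is not the GenericOptionsParser source; the class and method names are hypothetical.

```java
// Hypothetical helper for illustration; not the GenericOptionsParser code.
class PropertyArg {
  /** Splits "key=value" on the first '=' only, so the value may contain '='. */
  static String[] parse(String arg) {
    String[] kv = arg.split("=", 2);  // limit 2: split at most once
    if (kv.length != 2 || kv[0].isEmpty()) {
      throw new IllegalArgumentException("Expected key=value but got: " + arg);
    }
    return kv;
  }

  public static void main(String[] args) {
    String[] kv = parse("mapred.child.java.opts=-Dfoo=bar");
    System.out.println(kv[0] + " -> " + kv[1]);
  }
}
```

Splitting with a limit keeps everything after the first '=' as the value, which is usually what a user passing a JVM option through -D intends.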
MAPREDUCE-421. Fix Pipes to use returned system exit code.
(Christian Kunz via omalley)
HDFS-525. The SimpleDateFormat object in ListPathsServlet is not thread
safe. (Suresh Srinivas and cdouglas)
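SimpleDateFormat keeps mutable state, so sharing one instance between threads can corrupt output. One common remedy, sketched below, is a per-thread instance. The entry does not say which fix was applied in ListPathsServlet; the class name, pattern, and time zone here are assumptions.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

// Hypothetical sketch; not the ListPathsServlet code.
class SafeDateFormat {
  // Each thread lazily gets its own SimpleDateFormat instance.
  private static final ThreadLocal<SimpleDateFormat> FORMAT =
      new ThreadLocal<SimpleDateFormat>() {
        @Override
        protected SimpleDateFormat initialValue() {
          SimpleDateFormat df =
              new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ssZ", Locale.US);
          df.setTimeZone(TimeZone.getTimeZone("UTC"));
          return df;
        }
      };

  /** Formats a timestamp using the calling thread's private formatter. */
  static String format(long millis) {
    return FORMAT.get().format(new Date(millis));
  }
}
```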
MAPREDUCE-911. Fix a bug in TestTaskFail related to speculative
execution. (Amareshwari Sriramadasu via sharad)
MAPREDUCE-687. Fix an assertion in TestMiniMRMapRedDebugScript.
(Amareshwari Sriramadasu via sharad)
MAPREDUCE-924. Fixes the TestPipes testcase to use Tool.
(Amareshwari Sriramadasu via sharad)
Release 0.20.0 - 2009-04-15
INCOMPATIBLE CHANGES
HADOOP-4210. Fix findbugs warnings for equals implementations of mapred ID
classes. Removed public, static ID::read and ID::forName; made ID an
abstract class. (Suresh Srinivas via cdouglas)
HADOOP-4253. Fix various warnings generated by findbugs.
Following deprecated methods in RawLocalFileSystem are removed:
public String getName()
public void lock(Path p, boolean shared)
public void release(Path p)
(Suresh Srinivas via johan)
HADOOP-4618. Move http server from FSNamesystem into NameNode.
FSNamesystem.getNameNodeInfoPort() is removed.
FSNamesystem.getDFSNameNodeMachine() and FSNamesystem.getDFSNameNodePort()
replaced by FSNamesystem.getDFSNameNodeAddress().
NameNode(bindAddress, conf) is removed.
(shv)
HADOOP-4567. GetFileBlockLocations returns the NetworkTopology
information of the machines where the blocks reside. (dhruba)
HADOOP-4435. The JobTracker WebUI displays the amount of heap memory
in use. (dhruba)
HADOOP-4628. Move Hive into a standalone subproject. (omalley)
HADOOP-4188. Removes task's dependency on concrete filesystems.
(Sharad Agarwal via ddas)
HADOOP-1650. Upgrade to Jetty 6. (cdouglas)
HADOOP-3986. Remove static Configuration from JobClient. (Amareshwari
Sriramadasu via cdouglas)
JobClient::setCommandLineConfig is removed
JobClient::getCommandLineConfig is removed
JobShell, TestJobShell classes are removed
HADOOP-4422. S3 file systems should not create bucket.
(David Phillips via tomwhite)
HADOOP-4035. Support memory based scheduling in capacity scheduler.
(Vinod Kumar Vavilapalli via yhemanth)
HADOOP-3497. Fix bug in overly restrictive file globbing with a
PathFilter. (tomwhite)
HADOOP-4445. Replace running task counts with running task
percentage in capacity scheduler UI. (Sreekanth Ramakrishnan via
yhemanth)
HADOOP-4631. Splits the configuration into three parts - one for core,
one for mapred and the last one for HDFS. (Sharad Agarwal via cdouglas)
HADOOP-3344. Fix libhdfs build to use autoconf and build the same
architecture (32 vs 64 bit) of the JVM running Ant. The libraries for
pipes, utils, and libhdfs are now all in c++/<os_osarch_jvmdatamodel>/lib.
(Giridharan Kesavan via nigel)
HADOOP-4874. Remove LZO codec because of licensing issues. (omalley)
HADOOP-4970. The full path name of a file is preserved inside Trash.
(Prasad Chakka via dhruba)
HADOOP-4103. NameNode keeps a count of missing blocks. It warns on
WebUI if there are such blocks. '-report' and '-metaSave' have extra
info to track such blocks. (Raghu Angadi)
HADOOP-4783. Change permissions on history files on the jobtracker
to be only group readable instead of world readable.
(Amareshwari Sriramadasu via yhemanth)
HADOOP-5531. Removed Chukwa from Hadoop 0.20.0. (nigel)
NEW FEATURES
HADOOP-4575. Add a proxy service for relaying HsftpFileSystem requests.
Includes client authentication via user certificates and config-based
access control. (Kan Zhang via cdouglas)
HADOOP-4661. Add DistCh, a new tool for distributed ch{mod,own,grp}.
(szetszwo)
HADOOP-4709. Add several new features and bug fixes to Chukwa.
Added Hadoop Infrastructure Care Center (UI for visualizing data collected
by Chukwa)
Added FileAdaptor for streaming small files in one chunk
Added compression to archive and demux output
Added unit tests and validation for agent, collector, and demux map
reduce job
Added database loader for loading demux output (sequence file) to jdbc
connected database
Added algorithm to distribute collector load more evenly
(Jerome Boulon, Eric Yang, Andy Konwinski, Ariel Rabkin via cdouglas)
HADOOP-4179. Add Vaidya tool to analyze map/reduce job logs for performance
problems. (Suhas Gogate via omalley)
HADOOP-4029. Add NameNode storage information to the dfshealth page and
move DataNode information to a separate page. (Boris Shkolnik via
szetszwo)
HADOOP-4348. Add service-level authorization for Hadoop. (acmurthy)
HADOOP-4826. Introduce admin command saveNamespace. (shv)
HADOOP-3063. BloomMapFile - fail-fast version of MapFile for sparsely
populated key space (Andrzej Bialecki via stack)
HADOOP-1230. Add new map/reduce API and deprecate the old one. Generally,
the old code should work without problem. The new api is in
org.apache.hadoop.mapreduce and the old classes in org.apache.hadoop.mapred
are deprecated. Differences in the new API:
1. All of the methods take Context objects that allow us to add new
methods without breaking compatibility.
2. Mapper and Reducer now have a "run" method that is called once and
contains the control loop for the task, which lets applications
replace it.
3. Mapper and Reducer by default are Identity Mapper and Reducer.
4. The FileOutputFormats use part-r-00000 for the output of reduce 0 and
part-m-00000 for the output of map 0.
5. The reduce grouping comparator now uses the raw compare instead of
object compare.
6. The number of maps in FileInputFormat is controlled by min and max
split size rather than min size and the desired number of maps.
(omalley)
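Items 2 and 3 of HADOOP-1230 can be pictured with a simplified model: a run method that owns the control loop and a map method that is the identity unless overridden. The Mini* classes below are hypothetical stand-ins, not the org.apache.hadoop.mapreduce classes, whose signatures are richer.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical, simplified stand-in for a task context.
class MiniContext {
  private final Iterator<String> input;
  private String current;
  final List<String> output = new ArrayList<String>();

  MiniContext(List<String> records) { this.input = records.iterator(); }

  /** Advances to the next record; returns false when the input is exhausted. */
  boolean nextValue() {
    if (!input.hasNext()) return false;
    current = input.next();
    return true;
  }

  String getCurrentValue() { return current; }

  void write(String value) { output.add(value); }
}

// Hypothetical, simplified stand-in for a mapper.
class MiniMapper {
  /** Identity by default (item 3); subclasses override to transform records. */
  protected void map(String value, MiniContext context) {
    context.write(value);
  }

  /** Called once per task and holds the control loop (item 2). */
  public void run(MiniContext context) {
    while (context.nextValue()) {
      map(context.getCurrentValue(), context);
    }
  }
}

// Overrides only map; the inherited run loop is reused unchanged.
class UpperCaseMapper extends MiniMapper {
  @Override
  protected void map(String value, MiniContext context) {
    context.write(value.toUpperCase(java.util.Locale.ROOT));
  }
}
```

Because run is an ordinary overridable method, an application that needs a different loop (for example, one that batches records) could replace it rather than map.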
HADOOP-3305. Use Ivy to manage dependencies. (Giridharan Kesavan
and Steve Loughran via cutting)
IMPROVEMENTS
HADOOP-4565. Added CombineFileInputFormat to use data locality information
to create splits. (dhruba via zshao)
HADOOP-4749. Added a new counter REDUCE_INPUT_BYTES. (Yongqiang He via
zshao)
HADOOP-4234. Fix KFS "glue" layer to allow applications to interface
with multiple KFS metaservers. (Sriram Rao via lohit)
HADOOP-4245. Update to latest version of KFS "glue" library jar.
(Sriram Rao via lohit)
HADOOP-4244. Change test-patch.sh to check Eclipse classpath whether or
not it is run by Hudson. (szetszwo)
HADOOP-3180. Add name of missing class to WritableName.getClass
IOException. (Pete Wyckoff via omalley)
HADOOP-4178. Make the capacity scheduler's default values configurable.
(Sreekanth Ramakrishnan via omalley)
HADOOP-4262. Generate better error message when client exception has null
message. (stevel via omalley)
HADOOP-4226. Refactor and document LineReader to make it more readily
understandable. (Yuri Pradkin via cdouglas)
HADOOP-4238. When listing jobs, if scheduling information isn't available
print NA instead of empty output. (Sreekanth Ramakrishnan via johan)
HADOOP-4284. Support filters that apply to all requests, or global filters,
to HttpServer. (Kan Zhang via cdouglas)
HADOOP-4276. Improve the hashing functions and deserialization of the
mapred ID classes. (omalley)
HADOOP-4485. Add a compile-native ant task, as a shorthand. (enis)
HADOOP-4454. Allow # comments in slaves file. (Rama Ramasamy via omalley)
HADOOP-3461. Remove hdfs.StringBytesWritable. (szetszwo)
HADOOP-4437. Use Halton sequence instead of java.util.Random in
PiEstimator. (szetszwo)
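For readers unfamiliar with the Halton sequence named in HADOOP-4437, the sketch below computes it with the radical-inverse function and uses it to estimate pi. It is not the PiEstimator source; the class name and the choice of bases 2 and 3 are assumptions.

```java
// Hypothetical sketch; not the PiEstimator code.
class Halton {
  /**
   * Returns element number index (index >= 1) of the radical-inverse
   * sequence in the given base: the digits of index mirrored about
   * the radix point.
   */
  static double radicalInverse(int index, int base) {
    double result = 0.0;
    double fraction = 1.0 / base;
    for (int i = index; i > 0; i /= base) {
      result += fraction * (i % base);
      fraction /= base;
    }
    return result;
  }

  /** Estimates pi from n Halton points (bases 2 and 3) in the unit square. */
  static double estimatePi(int n) {
    int inside = 0;
    for (int i = 1; i <= n; i++) {
      double x = radicalInverse(i, 2) - 0.5;
      double y = radicalInverse(i, 3) - 0.5;
      if (x * x + y * y <= 0.25) {
        inside++;  // point falls inside the inscribed circle of radius 0.5
      }
    }
    return 4.0 * inside / n;
  }
}
```

Unlike java.util.Random, the sequence is deterministic and fills the square evenly, so the estimate tends to converge faster for the same number of points.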
HADOOP-4572. Change INode and its sub-classes to package private.
(szetszwo)
HADOOP-4187. Does a runtime lookup for JobConf/JobConfigurable, and if
found, invokes the appropriate configure method. (Sharad Agarwal via ddas)
HADOOP-4453. Improve ssl configuration and handling in HsftpFileSystem,
particularly when used with DistCp. (Kan Zhang via cdouglas)
HADOOP-4583. Several code optimizations in HDFS. (Suresh Srinivas via
szetszwo)
HADOOP-3923. Remove org.apache.hadoop.mapred.StatusHttpServer. (szetszwo)
HADOOP-4622. Explicitly specify interpreter for non-native
pipes binaries. (Fredrik Hedberg via johan)
HADOOP-4505. Add a unit test to test faulty setup task and cleanup
task killing the job. (Amareshwari Sriramadasu via johan)
HADOOP-4608. Don't print a stack trace when the example driver gets an
unknown program to run. (Edward Yoon via omalley)
HADOOP-4645. Package HdfsProxy contrib project without the extra level
of directories. (Kan Zhang via omalley)
HADOOP-4126. Allow access to HDFS web UI on EC2 (tomwhite via omalley)
HADOOP-4612. Removes RunJar's dependency on JobClient.
(Sharad Agarwal via ddas)
HADOOP-4185. Adds setVerifyChecksum() method to FileSystem.
(Sharad Agarwal via ddas)
HADOOP-4523. Prevent too many tasks scheduled on a node from bringing
it down by monitoring for cumulative memory usage across tasks.
(Vinod Kumar Vavilapalli via yhemanth)
HADOOP-4640. Adds an input format that can split lzo compressed
text files. (johan)
HADOOP-4666. Launch reduces only after a few maps have run in the
Fair Scheduler. (Matei Zaharia via johan)
HADOOP-4339. Remove redundant calls from FileSystem/FsShell when
generating/processing ContentSummary. (David Phillips via cdouglas)
HADOOP-2774. Add counters tracking records spilled to disk in MapTask and
ReduceTask. (Ravi Gummadi via cdouglas)
HADOOP-4513. Initialize jobs asynchronously in the capacity scheduler.
(Sreekanth Ramakrishnan via yhemanth)
HADOOP-4649. Improve abstraction for spill indices. (cdouglas)
HADOOP-3770. Add gridmix2, an iteration on the gridmix benchmark. (Runping
Qi via cdouglas)
HADOOP-4708. Add support for dfsadmin commands in TestCLI. (Boris Shkolnik
via cdouglas)
HADOOP-4758. Add a splitter for metrics contexts to support more than one
type of collector. (cdouglas)
HADOOP-4722. Add tests for dfsadmin quota error messages. (Boris Shkolnik
via cdouglas)
HADOOP-4690. fuse-dfs - create source file/function + utils + config +
main source files. (pete wyckoff via mahadev)
HADOOP-3750. Fix and enforce module dependencies. (Sharad Agarwal via
tomwhite)
HADOOP-4747. Speed up FsShell::ls by removing redundant calls to the
filesystem. (David Phillips via cdouglas)
HADOOP-4305. Improves the blacklisting strategy, whereby, tasktrackers
that are blacklisted are not given tasks to run from other jobs, subject
to the following conditions (all must be met):
1) The TaskTracker has been blacklisted by at least 4 jobs (configurable)
2) The TaskTracker has been blacklisted 50% more times than
the average (configurable)
3) The cluster has less than 50% trackers blacklisted
Once in 24 hours, a TaskTracker blacklisted for all jobs is given a chance.
Restarting the TaskTracker moves it out of the blacklist.
(Amareshwari Sriramadasu via ddas)
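The three conditions in HADOOP-4305 can be read as a single predicate. The sketch below is illustrative only: the class, method, and constant names are hypothetical, and it assumes one per-tracker count serves both condition 1 and condition 2.

```java
// Hypothetical sketch of the three conditions; not the JobTracker code.
class BlacklistPolicy {
  static final int MIN_BLACKLISTS = 4;          // condition 1 (configurable)
  static final double ABOVE_AVERAGE = 0.5;      // condition 2 (configurable)
  static final double MAX_CLUSTER_FRACTION = 0.5;  // condition 3

  /** Returns true only when all three conditions hold. */
  static boolean blacklistAcrossJobs(int trackerBlacklists,
                                     double averageBlacklists,
                                     int blacklistedTrackers,
                                     int totalTrackers) {
    boolean enough = trackerBlacklists >= MIN_BLACKLISTS;
    boolean aboveAverage =
        trackerBlacklists > averageBlacklists * (1.0 + ABOVE_AVERAGE);
    boolean clusterMostlyHealthy =
        blacklistedTrackers < totalTrackers * MAX_CLUSTER_FRACTION;
    return enough && aboveAverage && clusterMostlyHealthy;
  }
}
```

The third condition acts as a safety valve: if half the cluster is already blacklisted, the fault is more likely in the jobs than in the trackers.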
HADOOP-4688. Modify the MiniMRDFSSort unit test to spill multiple times,
exercising the map-side merge code. (cdouglas)
HADOOP-4737. Adds the KILLED notification when jobs get killed.
(Amareshwari Sriramadasu via ddas)
HADOOP-4728. Add a test exercising different namenode configurations.
(Boris Shkolnik via cdouglas)
HADOOP-4807. Adds JobClient commands to get the active/blacklisted tracker
names. Also adds commands to display running/completed task attempt IDs.
(ddas)
HADOOP-4699. Remove checksum validation from map output servlet. (cdouglas)
HADOOP-4838. Added a registry to automate metrics and mbeans management.
(Sanjay Radia via acmurthy)
HADOOP-3136. Fixed the default scheduler to assign multiple tasks to each
tasktracker per heartbeat, when feasible. To ensure locality isn't hurt
too badly, the scheduler will not assign more than one off-switch task per
heartbeat. The heartbeat interval is also halved since the task-tracker is
fixed to no longer send out heartbeats on each task completion. A
slow-start for scheduling reduces is introduced to ensure that reduces
aren't started till sufficient number of maps are done, else reduces of
jobs whose maps aren't scheduled might swamp the cluster.
Configuration changes to mapred-default.xml:
add mapred.reduce.slowstart.completed.maps
(acmurthy)
HADOOP-4545. Add example and test case of secondary sort for the reduce.
(omalley)
HADOOP-4753. Refactor gridmix2 to reduce code duplication. (cdouglas)
HADOOP-4909. Fix Javadoc and make some of the API more consistent in their
use of the JobContext instead of Configuration. (omalley)
HADOOP-4830. Add end-to-end test cases for testing queue capacities.
(Vinod Kumar Vavilapalli via yhemanth)
HADOOP-4980. Improve code layout of capacity scheduler to make it
easier to fix some blocker bugs. (Vivek Ratan via yhemanth)
HADOOP-4916. Make user/location of Chukwa installation configurable by an
external properties file. (Eric Yang via cdouglas)
HADOOP-4950. Make the CompressorStream, DecompressorStream,
BlockCompressorStream, and BlockDecompressorStream public to facilitate
non-Hadoop codecs. (omalley)
HADOOP-4843. Collect job history and configuration in Chukwa. (Eric Yang
via cdouglas)
HADOOP-5030. Build Chukwa RPM to install into configured directory. (Eric
Yang via cdouglas)
HADOOP-4828. Updates documents to do with configuration (HADOOP-4631).
(Sharad Agarwal via ddas)
HADOOP-4939. Adds a test that would inject random failures for tasks in
large jobs and would also inject TaskTracker failures. (ddas)
HADOOP-4920. Stop storing Forrest output in Subversion. (cutting)
HADOOP-4944. A configuration file can include other configuration
files. (Rama Ramasamy via dhruba)
HADOOP-4804. Provide Forrest documentation for the Fair Scheduler.
(Sreekanth Ramakrishnan via yhemanth)
HADOOP-5248. A testcase that checks for the existence of job directory
after the job completes. Fails if it exists. (ddas)
HADOOP-4664. Introduces multiple job initialization threads, where the
number of threads is configurable via mapred.jobinit.threads.
(Matei Zaharia and Jothi Padmanabhan via ddas)
HADOOP-4191. Adds a testcase for JobHistory. (Ravi Gummadi via ddas)
HADOOP-5466. Change documentation CSS style for headers and code. (Corinne
Chandel via szetszwo)
HADOOP-5275. Add ivy directory and files to built tar.
(Giridharan Kesavan via nigel)
HADOOP-5468. Add sub-menus to forrest documentation and make some minor
edits. (Corinne Chandel via szetszwo)
HADOOP-5437. Fix TestMiniMRDFSSort to properly test jvm-reuse. (omalley)
HADOOP-5521. Removes dependency of TestJobInProgress on RESTART_COUNT
JobHistory tag. (Ravi Gummadi via ddas)
HADOOP-5714. Add a metric for NameNode getFileInfo operation. (Jakob Homan
via szetszwo)
OPTIMIZATIONS
HADOOP-3293. Fixes FileInputFormat to provide locations for splits
based on the rack/host that has the most bytes.
(Jothi Padmanabhan via ddas)
HADOOP-4683. Fixes Reduce shuffle scheduler to invoke
getMapCompletionEvents in a separate thread. (Jothi Padmanabhan
via ddas)
BUG FIXES
HADOOP-5379. CBZip2InputStream to throw IOException on data crc error.
(Rodrigo Schmidt via zshao)
HADOOP-5326. Fixes CBZip2OutputStream data corruption problem.
(Rodrigo Schmidt via zshao)
HADOOP-4204. Fix findbugs warnings related to unused variables, naive
Number subclass instantiation, Map iteration, and badly scoped inner
classes. (Suresh Srinivas via cdouglas)
HADOOP-4207. Update derby jar file to release 10.4.2.
(Prasad Chakka via dhruba)
HADOOP-4325. SocketInputStream.read() should return -1 in case of EOF.
(Raghu Angadi)
HADOOP-4408. FsAction functions need not create new objects. (cdouglas)
HADOOP-4440. TestJobInProgressListener tests for jobs killed in queued
state (Amar Kamat via ddas)
HADOOP-4346. Implement blocking connect so that Hadoop is not affected
by selector problem with JDK default implementation. (Raghu Angadi)
HADOOP-4388. If there are invalid blocks in the transfer list, Datanode
should handle them and keep transferring the remaining blocks. (Suresh
Srinivas via szetszwo)
HADOOP-4587. Fix a typo in Mapper javadoc. (Koji Noguchi via szetszwo)
HADOOP-4530. In fsck, HttpServletResponse sendError fails with
IllegalStateException. (hairong)
HADOOP-4377. Fix a race condition in directory creation in
NativeS3FileSystem. (David Phillips via cdouglas)
HADOOP-4621. Fix javadoc warnings caused by duplicate jars. (Kan Zhang via
cdouglas)
HADOOP-4566. Deploy new hive code to support more types.
(Zheng Shao via dhruba)
HADOOP-4571. Add chukwa conf files to svn:ignore list. (Eric Yang via
szetszwo)
HADOOP-4589. Correct PiEstimator output messages and improve the code
readability. (szetszwo)
HADOOP-4650. Correct a mismatch between the default value of
local.cache.size in the config and the source. (Jeff Hammerbacher via
cdouglas)
HADOOP-4606. Fix cygpath error if the log directory does not exist.
(szetszwo via omalley)
HADOOP-4141. Fix bug in ScriptBasedMapping causing potential infinite
loop on misconfigured hadoop-site. (Aaron Kimball via tomwhite)
HADOOP-4691. Correct a link in the javadoc of IndexedSortable. (szetszwo)
HADOOP-4598. '-setrep' command skips under-replicated blocks. (hairong)
HADOOP-4429. Set defaults for user, group in UnixUserGroupInformation so
login fails more predictably when misconfigured. (Alex Loddengaard via
cdouglas)
HADOOP-4676. Fix broken URL in blacklisted tasktrackers page. (Amareshwari
Sriramadasu via cdouglas)
HADOOP-3422. Ganglia counter metrics are all reported with the metric
name "value", so the counter values can not be seen. (Jason Attributor
and Brian Bockelman via stack)
HADOOP-4704. Fix javadoc typos "the the". (szetszwo)
HADOOP-4677. Fix semantics of FileSystem::getBlockLocations to return
meaningful values. (Hong Tang via cdouglas)
HADOOP-4669. Use correct operator when evaluating whether access time is
enabled (Dhruba Borthakur via cdouglas)
HADOOP-4732. Pass connection and read timeouts in the correct order when
setting up fetch in reduce. (Amareshwari Sriramadasu via cdouglas)
HADOOP-4558. Fix capacity reclamation in capacity scheduler.
(Amar Kamat via yhemanth)
HADOOP-4770. Fix rungridmix_2 script to work with RunJar. (cdouglas)
HADOOP-4738. When using git, the saveVersion script will use only the
commit hash for the version and not the message, which requires escaping.
(cdouglas)
HADOOP-4576. Show pending job count instead of task count in the UI per
queue in capacity scheduler. (Sreekanth Ramakrishnan via yhemanth)
HADOOP-4623. Maintain running tasks even if speculative execution is off.
(Amar Kamat via yhemanth)
HADOOP-4786. Fix compilation error in
TestTrackerBlacklistAcrossJobs. (yhemanth)
HADOOP-4785. Fixes the JobTracker heartbeat to not make two calls to
System.currentTimeMillis(). (Amareshwari Sriramadasu via ddas)
HADOOP-4792. Add generated Chukwa configuration files to version control
ignore lists. (cdouglas)
HADOOP-4796. Fix Chukwa test configuration, remove unused components. (Eric
Yang via cdouglas)
HADOOP-4708. Add binaries missed in the initial checkin for Chukwa. (Eric
Yang via cdouglas)
HADOOP-4805. Remove black list collector from Chukwa Agent HTTP Sender.
(Eric Yang via cdouglas)
HADOOP-4837. Move HADOOP_CONF_DIR configuration to chukwa-env.sh (Jerome
Boulon via cdouglas)
HADOOP-4825. Use ps instead of jps for querying process status in Chukwa.
(Eric Yang via cdouglas)
HADOOP-4844. Fixed javadoc for
org.apache.hadoop.fs.permission.AccessControlException to document that
it's deprecated in favour of
org.apache.hadoop.security.AccessControlException. (acmurthy)
HADOOP-4706. Close the underlying output stream in
IFileOutputStream::close. (Jothi Padmanabhan via cdouglas)
HADOOP-4855. Fixed command-specific help messages for refreshServiceAcl in
DFSAdmin and MRAdmin. (acmurthy)
HADOOP-4820. Remove unused method FSNamesystem::deleteInSafeMode. (Suresh
Srinivas via cdouglas)
HADOOP-4698. Lower io.sort.mb to 10 in the tests and raise the junit memory
limit to 512m from 256m. (Nigel Daley via cdouglas)
HADOOP-4860. Split TestFileTailingAdapters into three separate tests to
avoid contention. (Eric Yang via cdouglas)
HADOOP-3921. Fixed clover (code coverage) target to work with JDK 6.
(tomwhite via nigel)
HADOOP-4845. Modify the reduce input byte counter to record only the
compressed size and add a human-readable label. (Yongqiang He via cdouglas)
HADOOP-4458. Add a test creating symlinks in the working directory.
(Amareshwari Sriramadasu via cdouglas)
HADOOP-4879. Fix org.apache.hadoop.mapred.Counters to correctly define
Object.equals rather than depend on contentEquals api. (omalley via
acmurthy)
HADOOP-4791. Fix rpm build process for Chukwa. (Eric Yang via cdouglas)
HADOOP-4771. Correct initialization of the file count for directories
with quotas. (Ruyue Ma via shv)
HADOOP-4878. Fix eclipse plugin classpath file to point to ivy's resolved
lib directory and added the same to test-patch.sh. (Giridharan Kesavan via
acmurthy)
HADOOP-4774. Fix default values of some capacity scheduler configuration
items which would otherwise not work on a fresh checkout.
(Sreekanth Ramakrishnan via yhemanth)
HADOOP-4876. Fix capacity scheduler reclamation by updating count of
pending tasks correctly. (Sreekanth Ramakrishnan via yhemanth)
HADOOP-4849. Documentation for Service Level Authorization implemented in
HADOOP-4348. (acmurthy)
HADOOP-4827. Replace Consolidator with Aggregator macros in Chukwa (Eric
Yang via cdouglas)
HADOOP-4894. Correctly parse ps output in Chukwa jettyCollector.sh. (Ari
Rabkin via cdouglas)
HADOOP-4892. Close fds out of Chukwa ExecPlugin. (Ari Rabkin via cdouglas)
HADOOP-4889. Fix permissions in RPM packaging. (Eric Yang via cdouglas)
HADOOP-4869. Fixes the TT-JT heartbeat to have an explicit flag for
restart apart from the initialContact flag that there was earlier.
(Amareshwari Sriramadasu via ddas)
HADOOP-4716. Fixes ReduceTask.java to clear out the mapping between
hosts and MapOutputLocation upon a JT restart (Amar Kamat via ddas)
HADOOP-4880. Removes an unnecessary testcase from TestJobTrackerRestart.
(Amar Kamat via ddas)
HADOOP-4924. Fixes a race condition in TaskTracker re-init. (ddas)
HADOOP-4854. Read reclaim capacity interval from capacity scheduler
configuration. (Sreekanth Ramakrishnan via yhemanth)
HADOOP-4896. HDFS Fsck does not load HDFS configuration. (Raghu Angadi)
HADOOP-4956. Creates TaskStatus for failed tasks with an empty Counters
object instead of null. (ddas)
HADOOP-4979. Fix capacity scheduler to block cluster for failed high
RAM requirements across task types. (Vivek Ratan via yhemanth)
HADOOP-4949. Fix native compilation. (Chris Douglas via acmurthy)
HADOOP-4787. Fixes the testcase TestTrackerBlacklistAcrossJobs which was
earlier failing randomly. (Amareshwari Sriramadasu via ddas)
HADOOP-4914. Add description fields to Chukwa init.d scripts (Eric Yang via
cdouglas)
HADOOP-4884. Make tool tip date format match standard HICC format. (Eric
Yang via cdouglas)
HADOOP-4925. Make Chukwa sender properties configurable. (Ari Rabkin via
cdouglas)
HADOOP-4947. Make Chukwa command parsing more forgiving of whitespace. (Ari
Rabkin via cdouglas)
HADOOP-5026. Make chukwa/bin scripts executable in repository. (Andy
Konwinski via cdouglas)
HADOOP-4977. Fix a deadlock between the reclaimCapacity and assignTasks
in capacity scheduler. (Vivek Ratan via yhemanth)
HADOOP-4988. Fix reclaim capacity to work even when there are queues with
no capacity. (Vivek Ratan via yhemanth)
HADOOP-5065. Remove generic parameters from argument to
setIn/OutputFormatClass so that it works with SequenceIn/OutputFormat.
(cdouglas via omalley)
HADOOP-4818. Pass user config to instrumentation API. (Eric Yang via
cdouglas)
HADOOP-4993. Fix Chukwa agent configuration and startup to make it both
more modular and testable. (Ari Rabkin via cdouglas)
HADOOP-5048. Fix capacity scheduler to correctly cleanup jobs that are
killed after initialization, but before running.
(Sreekanth Ramakrishnan via yhemanth)
HADOOP-4671. Mark loop control variables shared between threads as
volatile. (cdouglas)
HADOOP-5079. HashFunction inadvertently destroys some randomness
(Jonathan Ellis via stack)
HADOOP-4999. A failure to write to FsEditsLog results in
IndexOutOfBounds exception. (Boris Shkolnik via rangadi)
HADOOP-5139. Catch IllegalArgumentException during metrics registration
in RPC. (Hairong Kuang via szetszwo)
HADOOP-5085. Copying a file to local with Crc throws an exception.
(hairong)
HADOOP-4759. Removes temporary output directory for failed and
killed tasks by launching special CLEANUP tasks for the same.
(Amareshwari Sriramadasu via ddas)
HADOOP-5211. Fix check for job completion in TestSetupAndCleanupFailure.
(enis)
HADOOP-5254. The Configuration class should be able to work with XML
parsers that do not support xmlinclude. (Steve Loughran via dhruba)
HADOOP-4692. Namenode in infinite loop for replicating/deleting corrupt
blocks. (hairong)
HADOOP-5255. Fix use of Math.abs to avoid overflow. (Jonathan Ellis via
cdouglas)
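The overflow behind HADOOP-5255 is that Math.abs(Integer.MIN_VALUE) is still negative, so an index computed as abs(hash) % n can be negative. One common way to avoid this, sketched below, is to mask off the sign bit. The entry does not say which fix was applied; the class and method names are hypothetical.

```java
// Hypothetical sketch; one common remedy, not necessarily the committed fix.
class NonNegativeHash {
  /** Maps any int hash to a bucket in [0, numBuckets). */
  static int bucket(int hash, int numBuckets) {
    // Clearing the sign bit yields a non-negative value for every int,
    // including Integer.MIN_VALUE, where Math.abs overflows.
    return (hash & Integer.MAX_VALUE) % numBuckets;
  }
}
```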
HADOOP-5269. Fixes a problem to do with tasktracker holding on to
FAILED_UNCLEAN or KILLED_UNCLEAN tasks forever. (Amareshwari Sriramadasu
via ddas)
HADOOP-5214. Fixes a ConcurrentModificationException while the Fairshare
Scheduler accesses the tasktrackers stored by the JobTracker.
(Rahul Kumar Singh via yhemanth)
HADOOP-5233. Addresses the three issues - Race condition in updating
status, NPE in TaskTracker task localization when the conf file is missing
(HADOOP-5234) and NPE in handling KillTaskAction of a cleanup task
(HADOOP-5235). (Amareshwari Sriramadasu via ddas)
HADOOP-5247. Introduces a broadcast of KillJobAction to all trackers when
a job finishes. This fixes a bunch of problems to do with NPE when a
completed job is not in memory and a tasktracker comes to the jobtracker
with a status report of a task belonging to that job. (Amar Kamat via ddas)
HADOOP-5282. Fixed job history logs for task attempts that are
failed by the JobTracker, say due to lost task trackers. (Amar
Kamat via yhemanth)
HADOOP-4963. Fixes a logging problem to do with getting the location of
map output file. (Amareshwari Sriramadasu via ddas)
HADOOP-5292. Fix NPE in KFS::getBlockLocations. (Sriram Rao via lohit)
HADOOP-5241. Fixes a bug in disk-space resource estimation. Makes
the estimation formula linear where blowUp =
Total-Output/Total-Input. (Sharad Agarwal via ddas)
HADOOP-5142. Fix MapWritable#putAll to store key/value classes.
(Doğacan Güney via enis)
HADOOP-4744. Workaround for jetty6 returning -1 when getLocalPort
is invoked on the connector. The workaround patch retries a few
times before failing. (Jothi Padmanabhan via yhemanth)
HADOOP-5280. Adds a check to prevent a task state transition from
FAILED to any of UNASSIGNED, RUNNING, COMMIT_PENDING or
SUCCEEDED. (ddas)
HADOOP-5272. Fixes a problem to do with detecting whether an
attempt is the first attempt of a Task. This affects JobTracker
restart. (Amar Kamat via ddas)
HADOOP-5306. Fixes a problem to do with logging/parsing the http port of a
lost tracker. Affects JobTracker restart. (Amar Kamat via ddas)
HADOOP-5111. Fix Job::set* methods to work with generics. (cdouglas)
HADOOP-5274. Fix gridmix2 dependency on wordcount example. (cdouglas)
HADOOP-5145. Balancer sometimes runs out of memory after running
days or weeks. (hairong)
HADOOP-5338. Fix jobtracker restart to clear task completion
events cached by tasktrackers forcing them to fetch all events
afresh, thus avoiding missed task completion events on the
tasktrackers. (Amar Kamat via yhemanth)
HADOOP-4695. Change TestGlobalFilter so that it allows a web page to be
filtered more than once for a single access. (Kan Zhang via szetszwo)
HADOOP-5298. Change TestServletFilter so that it allows a web page to be
filtered more than once for a single access. (szetszwo)
HADOOP-5432. Disable ssl during unit tests in hdfsproxy, as it is unused
and causes failures. (cdouglas)
HADOOP-5416. Correct the shell command "fs -test" forrest doc description.
(Ravi Phulari via szetszwo)
HADOOP-5327. Fixed job tracker to remove files from system directory on
ACL check failures and also check ACLs on restart.
(Amar Kamat via yhemanth)
HADOOP-5395. Change the exception message when a job is submitted to an
invalid queue. (Rahul Kumar Singh via yhemanth)
HADOOP-5276. Fixes a problem to do with updating the start time of
a task when the tracker that ran the task is lost. (Amar Kamat via
ddas)
HADOOP-5278. Fixes a problem to do with logging the finish time of
a task during recovery (after a JobTracker restart). (Amar Kamat
via ddas)
HADOOP-5490. Fixes a synchronization problem in the
EagerTaskInitializationListener class. (Jothi Padmanabhan via
ddas)
HADOOP-5493. The shuffle copier threads return the codecs back to
the pool when the shuffle completes. (Jothi Padmanabhan via ddas)
HADOOP-5505. Fix JspHelper initialization in the context of
MiniDFSCluster. (Raghu Angadi)
HADOOP-5414. Fixes IO exception while executing hadoop fs -touchz
fileName by making sure that lease renewal thread exits before dfs
client exits. (hairong)
HADOOP-5103. FileInputFormat now reuses the clusterMap network
topology object and that brings down the log messages in the
JobClient to do with NetworkTopology.add significantly. (Jothi
Padmanabhan via ddas)
HADOOP-5483. Fixes a problem in the Directory Cleanup Thread due to which
TestMiniMRWithDFS sometimes used to fail. (ddas)
HADOOP-5281. Prevent sharing incompatible ZlibCompressor instances between
GzipCodec and DefaultCodec. (cdouglas)
HADOOP-5463. Balancer throws "Not a host:port pair" unless port is
specified in fs.default.name. (Stuart White via hairong)
HADOOP-5514. Fix JobTracker metrics and add metrics for waiting, failed
tasks. (cdouglas)
HADOOP-5516. Fix NullPointerException in TaskMemoryManagerThread
that comes when monitored processes disappear when the thread is
running. (Vinod Kumar Vavilapalli via yhemanth)
HADOOP-5382. Support combiners in the new context object API. (omalley)
HADOOP-5471. Fixes a problem to do with updating the log.index file in the
case where a cleanup task is run. (Amareshwari Sriramadasu via ddas)
HADOOP-5534. Fixed a deadlock in Fair scheduler's servlet.
(Rahul Kumar Singh via yhemanth)
HADOOP-5328. Fixes a problem in the renaming of job history files during
job recovery. (Amar Kamat via ddas)
HADOOP-5417. Don't ignore InterruptedExceptions that happen when calling
into rpc. (omalley)
HADOOP-5320. Add a close() in TestMapReduceLocal. (Jothi Padmanabhan
via szetszwo)
HADOOP-5520. Fix a typo in disk quota help message. (Ravi Phulari
via szetszwo)
HADOOP-5519. Remove claims from mapred-default.xml that prime numbers
of tasks are helpful. (Owen O'Malley via szetszwo)
HADOOP-5484. TestRecoveryManager fails with FileAlreadyExistsException.
(Amar Kamat via hairong)
HADOOP-5564. Limit the JVM heap size in the java command for initializing
JAVA_PLATFORM. (Suresh Srinivas via szetszwo)
HADOOP-5565. Add API for failing/finalized jobs to the JT metrics
instrumentation. (Jerome Boulon via cdouglas)
HADOOP-5390. Remove duplicate jars from tarball, src from binary tarball
added by hdfsproxy. (Zhiyong Zhang via cdouglas)
HADOOP-5066. Building binary tarball should not build docs/javadocs, copy
src, or run jdiff. (Giridharan Kesavan via cdouglas)
HADOOP-5459. Fix undetected CRC errors where intermediate output is closed
before it has been completely consumed. (cdouglas)
HADOOP-5571. Remove widening primitive conversion in TupleWritable mask
manipulation. (Jingkei Ly via cdouglas)
HADOOP-5588. Remove an unnecessary call to listStatus(..) in
FileSystem.globStatusInternal(..). (Hairong Kuang via szetszwo)
HADOOP-5473. Solves a race condition in killing a task - the state is KILLED
if there is a user request pending to kill the task and the TT reported
the state as SUCCESS. (Amareshwari Sriramadasu via ddas)
HADOOP-5576. Fix LocalRunner to work with the new context object API in
mapreduce. (Tom White via omalley)
HADOOP-4374. Installs a shutdown hook in the Task JVM so that log.index is
updated before the JVM exits. Also makes the update to log.index atomic.
(Ravi Gummadi via ddas)
HADOOP-5577. Add a verbose flag to mapreduce.Job.waitForCompletion to get
the running job's information printed to the user's stdout as it runs.
(omalley)
HADOOP-5607. Fix NPE in TestCapacityScheduler. (cdouglas)
HADOOP-5605. All the replicas incorrectly got marked as corrupt. (hairong)
HADOOP-5337. JobTracker, upon restart, now waits for the TaskTrackers to
join back before scheduling new tasks. This fixes race conditions associated
with greedy scheduling as was the case earlier. (Amar Kamat via ddas)
HADOOP-5227. Fix distcp so -update and -delete can be meaningfully
combined. (Tsz Wo (Nicholas), SZE via cdouglas)
HADOOP-5305. Increase number of files and print debug messages in
TestCopyFiles. (szetszwo)
HADOOP-5548. Add synchronization for JobTracker methods in RecoveryManager.
(Amareshwari Sriramadasu via sharad)
HADOOP-3810. NameNode seems unstable on a cluster with little space left.
(hairong)
HADOOP-5068. Fix NPE in TestCapacityScheduler. (Vinod Kumar Vavilapalli
via szetszwo)
HADOOP-5585. Clear FileSystem statistics between tasks when jvm-reuse
is enabled. (omalley)
HADOOP-5394. JobTracker might schedule 2 attempts of the same task
with the same attempt id across restarts. (Amar Kamat via sharad)
HADOOP-5645. After HADOOP-4920 we need a place to check in
releasenotes.html. (nigel)
Release 0.19.2 - Unreleased
BUG FIXES
HADOOP-5154. Fixes a deadlock in the fairshare scheduler.
(Matei Zaharia via yhemanth)
HADOOP-5146. Fixes a race condition that causes LocalDirAllocator to miss
files. (Devaraj Das via yhemanth)
HADOOP-4638. Fixes job recovery to not crash the job tracker for problems
with a single job file. (Amar Kamat via yhemanth)
HADOOP-5384. Fix a problem that DataNodeCluster creates blocks with
generationStamp == 1. (szetszwo)
HADOOP-5376. Fixes the code handling lost tasktrackers to set the task state
to KILLED_UNCLEAN only for relevant type of tasks.
(Amareshwari Sriramadasu via yhemanth)
HADOOP-5285. Fixes the issues - (1) obtainTaskCleanupTask checks whether job is
inited before trying to lock the JobInProgress (2) Moves the CleanupQueue class
outside the TaskTracker and makes it a generic class that is used by the
JobTracker also for deleting the paths on the job's output fs. (3) Moves the
references to completedJobStore outside the block where the JobTracker is locked.
(ddas)
HADOOP-5392. Fixes a problem to do with JT crashing during recovery when
the job files are garbled. (Amar Kamat via ddas)
HADOOP-5332. Appending to files is not allowed (by default) unless
dfs.support.append is set to true. (dhruba)
HADOOP-5333. libhdfs supports appending to files. (dhruba)
HADOOP-3998. Fix dfsclient exception when JVM is shutdown. (dhruba)
HADOOP-5440. Fixes a problem to do with removing a taskId from the list
of taskIds that the TaskTracker's TaskMemoryManager manages.
(Amareshwari Sriramadasu via ddas)
HADOOP-5446. Restore TaskTracker metrics. (cdouglas)
HADOOP-5449. Fixes the history cleaner thread.
(Amareshwari Sriramadasu via ddas)
HADOOP-5479. NameNode should not send empty block replication request to
DataNode. (hairong)
HADOOP-5259. Job with output hdfs:/user/<username>/outputpath (no
authority) fails with Wrong FS. (Doug Cutting via hairong)
HADOOP-5522. Documents the setup/cleanup tasks in the mapred tutorial.
(Amareshwari Sriramadasu via ddas)
HADOOP-5549. ReplicationMonitor should schedule both replication and
deletion work in one iteration. (hairong)
HADOOP-5554. DataNodeCluster and CreateEditsLog should create blocks with
the same generation stamp value. (hairong via szetszwo)
HADOOP-5231. Clones the TaskStatus before passing it to the JobInProgress.
(Amareshwari Sriramadasu via ddas)
HADOOP-4719. Fix documentation of 'ls' format for FsShell. (Ravi Phulari
via cdouglas)
HADOOP-5374. Fixes a NPE problem in getTasksToSave method.
(Amareshwari Sriramadasu via ddas)
HADOOP-4780. Cache the size of directories in DistributedCache, avoiding
long delays in recalculating it. (He Yongqiang via cdouglas)
HADOOP-5551. Prevent directory destruction on file create.
(Brian Bockelman via shv)
HADOOP-5671. Fix FNF exceptions when copying from old versions of
HftpFileSystem. (Tsz Wo (Nicholas), SZE via cdouglas)
HADOOP-5579. Set errno correctly in libhdfs for permission, quota, and FNF
conditions. (Brian Bockelman via cdouglas)
HADOOP-5816. Fixes a problem in the KeyFieldBasedComparator to do with
ArrayIndexOutOfBounds exception. (He Yongqiang via ddas)
HADOOP-5951. Add Apache license header to StorageInfo.java. (Suresh
Srinivas via szetszwo)
Release 0.19.1 - 2009-02-23
IMPROVEMENTS
HADOOP-4739. Fix spelling and grammar, improve phrasing of some sections in
mapred tutorial. (Vivek Ratan via cdouglas)
HADOOP-3894. DFSClient logging improvements. (Steve Loughran via shv)
HADOOP-5126. Remove empty file BlocksWithLocations.java (shv)
HADOOP-5127. Remove public methods in FSDirectory. (Jakob Homan via shv)
BUG FIXES
HADOOP-4697. Fix getBlockLocations in KosmosFileSystem to handle multiple
blocks correctly. (Sriram Rao via cdouglas)
HADOOP-4420. Add null checks for job, caused by invalid job IDs.
(Aaron Kimball via tomwhite)
HADOOP-4632. Fix TestJobHistoryVersion to use test.build.dir instead of the
current working directory for scratch space. (Amar Kamat via cdouglas)
HADOOP-4508. Fix FSDataOutputStream.getPos() for append. (dhruba via
szetszwo)
HADOOP-4727. Fix a group checking bug in fill_stat_structure(...) in
fuse-dfs. (Brian Bockelman via szetszwo)
HADOOP-4836. Correct typos in mapred related documentation. (Jordà Polo
via szetszwo)
HADOOP-4821. Usage descriptions in the Quotas guide documentation are
incorrect. (Boris Shkolnik via hairong)
HADOOP-4847. Moves the loading of OutputCommitter to the Task.
(Amareshwari Sriramadasu via ddas)
HADOOP-4966. Marks completed setup tasks for removal.
(Amareshwari Sriramadasu via ddas)
HADOOP-4982. TestFsck should run in Eclipse. (shv)
HADOOP-5008. TestReplication#testPendingReplicationRetry leaves an opened
fd unclosed. (hairong)
HADOOP-4906. Fix TaskTracker OOM by keeping a shallow copy of JobConf in
TaskTracker.TaskInProgress. (Sharad Agarwal via acmurthy)
HADOOP-4918. Fix bzip2 compression to work with Sequence Files.
(Zheng Shao via dhruba).
HADOOP-4965. TestFileAppend3 should close FileSystem. (shv)
HADOOP-4967. Fixes a race condition in the JvmManager to do with killing
tasks. (ddas)
HADOOP-5009. DataNode#shutdown sometimes leaves data block scanner
verification log unclosed. (hairong)
HADOOP-5086. Use the appropriate FileSystem for trash URIs. (cdouglas)
HADOOP-4955. Make DBOutputFormat use column names from setOutput().
(Kevin Peterson via enis)
HADOOP-4862. Minor : HADOOP-3678 did not remove all the cases of
spurious IOExceptions logged by DataNode. (Raghu Angadi)
HADOOP-5034. NameNode should send both replication and deletion requests
to DataNode in one reply to a heartbeat. (hairong)
HADOOP-5156. TestHeartbeatHandling uses MiniDFSCluster.getNamesystem()
which does not exist in branch 0.19 and 0.20. (hairong)
HADOOP-5161. Accepted sockets do not get placed in
DataXceiverServer#childSockets. (hairong)
HADOOP-5193. Correct calculation of edits modification time. (shv)
HADOOP-4494. Allow libhdfs to append to files.
(Pete Wyckoff via dhruba)
HADOOP-5166. Fix JobTracker restart to work when ACLs are configured
for the JobTracker. (Amar Kamat via yhemanth).
HADOOP-5067. Fixes TaskInProgress.java to keep track of count of failed and
killed tasks correctly. (Amareshwari Sriramadasu via ddas)
HADOOP-4760. HDFS streams should not throw exceptions when closed twice.
(enis)
Release 0.19.0 - 2008-11-18
INCOMPATIBLE CHANGES
HADOOP-3595. Remove deprecated methods for mapred.combine.once
functionality, which was necessary to provide backwards
compatible combiner semantics for 0.18. (cdouglas via omalley)
HADOOP-3667. Remove the following deprecated methods from JobConf:
addInputPath(Path)
getInputPaths()
getMapOutputCompressionType()
getOutputPath()
getSystemDir()
setInputPath(Path)
setMapOutputCompressionType(CompressionType style)
setOutputPath(Path)
(Amareshwari Sriramadasu via omalley)
HADOOP-3652. Remove deprecated class OutputFormatBase.
(Amareshwari Sriramadasu via cdouglas)
HADOOP-2885. Break the hadoop.dfs package into separate packages under
hadoop.hdfs that reflect whether they are client, server, protocol,
etc. DistributedFileSystem and DFSClient have moved and are now
considered package private. (Sanjay Radia via omalley)
HADOOP-2325. Require Java 6. (cutting)
HADOOP-372. Add support for multiple input paths with a different
InputFormat and Mapper for each path. (Chris Smith via tomwhite)
HADOOP-1700. Support appending to file in HDFS. (dhruba)
HADOOP-3792. Make FsShell -test consistent with unix semantics, returning
zero for true and non-zero for false. (Ben Slusky via cdouglas)
HADOOP-3664. Remove the deprecated method InputFormat.validateInput,
which is no longer needed. (tomwhite via omalley)
HADOOP-3549. Give more meaningful errno's in libhdfs. In particular,
EACCES is returned for permission problems. (Ben Slusky via omalley)
HADOOP-4036. ResourceStatus was added to TaskTrackerStatus by HADOOP-3759,
so increment the InterTrackerProtocol version. (Hemanth Yamijala via
omalley)
HADOOP-3150. Moves task promotion to tasks. Defines a new interface for
committing output files. Moves job setup to jobclient, and moves jobcleanup
to a separate task. (Amareshwari Sriramadasu via ddas)
HADOOP-3446. Keep map outputs in memory during the reduce. Remove
fs.inmemory.size.mb and replace with properties defining in memory map
output retention during the shuffle and reduce relative to maximum heap
usage. (cdouglas)
HADOOP-3245. Adds the feature for supporting JobTracker restart. Running
jobs can be recovered from the history file. The history file format has
been modified to support recovery. The task attempt ID now has the
JobTracker start time to distinguish attempts of the same TIP across
restarts. (Amar Ramesh Kamat via ddas)
HADOOP-4007. Remove DFSFileInfo - FileStatus is sufficient.
(Sanjay Radia via hairong)
HADOOP-3722. Fixed Hadoop Streaming and Hadoop Pipes to use the Tool
interface and GenericOptionsParser. (Enis Soztutar via acmurthy)
HADOOP-2816. Cluster summary at name node web reports the space
utilization as:
Configured Capacity: capacity of all the data directories - Reserved space
Present Capacity: Space available for dfs, i.e. remaining+used space
DFS Used%: DFS used space/Present Capacity
(Suresh Srinivas via hairong)
HADOOP-3938. Disk space quotas for HDFS. This is similar to namespace
quotas in 0.18. (rangadi)
HADOOP-4293. Make Configuration Writable and remove unreleased
WritableJobConf. Configuration.write is renamed to writeXml. (omalley)
HADOOP-4281. Change dfsadmin to report available disk space in a format
consistent with the web interface as defined in HADOOP-2816. (Suresh
Srinivas via cdouglas)
HADOOP-4430. Further change the cluster summary at name node web that was
changed in HADOOP-2816:
Non DFS Used - This indicates the disk space taken by non DFS file from
the Configured capacity
DFS Used % - DFS Used % of Configured Capacity
DFS Remaining % - Remaining % Configured Capacity available for DFS use
DFS command line report reflects the same change. Config parameter
dfs.datanode.du.pct is no longer used and is removed from the
hadoop-default.xml. (Suresh Srinivas via hairong)
HADOOP-4116. Balancer should provide better resource management. (hairong)
HADOOP-4599. BlocksMap and BlockInfo made package private. (shv)
NEW FEATURES
HADOOP-3341. Allow streaming jobs to specify the field separator for map
and reduce input and output. The new configuration values are:
stream.map.input.field.separator
stream.map.output.field.separator
stream.reduce.input.field.separator
stream.reduce.output.field.separator
All of them default to "\t". (Zheng Shao via omalley)
HADOOP-3479. Defines the configuration file for the resource manager in
Hadoop. You can configure various parameters related to scheduling, such
as queues and queue properties here. The properties for a queue follow a
naming convention, such as hadoop.rm.queue.queue-name.property-name.
(Hemanth Yamijala via ddas)
HADOOP-3149. Adds a way in which map/reduce tasks can create multiple
outputs. (Alejandro Abdelnur via ddas)
HADOOP-3714. Add a new contrib, bash-tab-completion, which enables
bash tab completion for the bin/hadoop script. See the README file
in the contrib directory for the installation. (Chris Smith via enis)
HADOOP-3730. Adds a new JobConf constructor that disables loading
default configurations. (Alejandro Abdelnur via ddas)
HADOOP-3772. Add a new Hadoop Instrumentation api for the JobTracker and
the TaskTracker, refactor Hadoop Metrics as an implementation of the api.
(Ari Rabkin via acmurthy)
HADOOP-2302. Provides a comparator for numerical sorting of key fields.
(ddas)
HADOOP-153. Provides a way to skip bad records. (Sharad Agarwal via ddas)
HADOOP-657. Free disk space should be modelled and used by the scheduler
to make scheduling decisions. (Ari Rabkin via omalley)
HADOOP-3719. Initial checkin of Chukwa, which is a data collection and
analysis framework. (Jerome Boulon, Andy Konwinski, Ari Rabkin,
and Eric Yang)
HADOOP-3873. Add -filelimit and -sizelimit options to distcp to cap the
number of files/bytes copied in a particular run to support incremental
updates and mirroring. (TszWo (Nicholas), SZE via cdouglas)
HADOOP-3585. FailMon package for hardware failure monitoring and
analysis of anomalies. (Ioannis Koltsidas via dhruba)
HADOOP-1480. Add counters to the C++ Pipes API. (acmurthy via omalley)
HADOOP-3854. Add support for pluggable servlet filters in the HttpServers.
(Tsz Wo (Nicholas) Sze via omalley)
HADOOP-3759. Provides ability to run memory intensive jobs without
affecting other running tasks on the nodes. (Hemanth Yamijala via ddas)
HADOOP-3746. Add a fair share scheduler. (Matei Zaharia via omalley)
HADOOP-3754. Add a thrift interface to access HDFS. (dhruba via omalley)
HADOOP-3828. Provides a way to write skipped records to DFS.
(Sharad Agarwal via ddas)
HADOOP-3948. Separate name-node edits and fsimage directories.
(Lohit Vijayarenu via shv)
HADOOP-3939. Add an option to DistCp to delete files at the destination
not present at the source. (Tsz Wo (Nicholas) Sze via cdouglas)
HADOOP-3601. Add a new contrib module for Hive, which is a sql-like
query processing tool that uses map/reduce. (Ashish Thusoo via omalley)
HADOOP-3866. Added sort and multi-job updates in the JobTracker web ui.
(Craig Weisenfluh via omalley)
HADOOP-3698. Add access control to control who is allowed to submit or
modify jobs in the JobTracker. (Hemanth Yamijala via omalley)
HADOOP-1869. Support access times for HDFS files. (dhruba)
HADOOP-3941. Extend FileSystem API to return file-checksums.
(szetszwo)
HADOOP-3581. Prevents memory intensive user tasks from taking down
nodes. (Vinod K V via ddas)
HADOOP-3970. Provides a way to recover counters written to JobHistory.
(Amar Kamat via ddas)
HADOOP-3702. Adds ChainMapper and ChainReducer classes that allow composing
chains of Maps and Reduces in a single Map/Reduce job, something like
MAP+ / REDUCE MAP*. (Alejandro Abdelnur via ddas)
HADOOP-3445. Add capacity scheduler that provides guaranteed capacities to
queues as a percentage of the cluster. (Vivek Ratan via omalley)
HADOOP-3992. Add a synthetic load generation facility to the test
directory. (hairong via szetszwo)
HADOOP-3981. Implement a distributed file checksum algorithm in HDFS
and change DistCp to use file checksum for comparing src and dst files
(szetszwo)
HADOOP-3829. Narrow down skipped records based on user acceptable value.
(Sharad Agarwal via ddas)
HADOOP-3930. Add common interfaces for the pluggable schedulers and the
cli & gui clients. (Sreekanth Ramakrishnan via omalley)
HADOOP-4176. Implement getFileChecksum(Path) in HftpFileSystem. (szetszwo)
HADOOP-249. Reuse JVMs across Map-Reduce Tasks.
Configuration changes to hadoop-default.xml:
add mapred.job.reuse.jvm.num.tasks
(Devaraj Das via acmurthy)
HADOOP-4070. Provide a mechanism in Hive for registering UDFs from the
query language. (tomwhite)
HADOOP-2536. Implement a JDBC based database input and output formats to
allow Map-Reduce applications to work with databases. (Fredrik Hedberg and
Enis Soztutar via acmurthy)
HADOOP-3019. A new library to support total order partitions.
(cdouglas via omalley)
HADOOP-3924. Added a 'KILLED' job status. (Subramaniam Krishnan via
acmurthy)
IMPROVEMENTS
HADOOP-4205. hive: metastore and ql to use the refactored SerDe library.
(zshao)
HADOOP-4106. libhdfs: add time, permission and user attribute support
(part 2). (Pete Wyckoff through zshao)
HADOOP-4104. libhdfs: add time, permission and user attribute support.
(Pete Wyckoff through zshao)
HADOOP-3908. libhdfs: better error message if libhdfs.so doesn't exist.
(Pete Wyckoff through zshao)
HADOOP-3732. Delay initialization of datanode block verification till
the verification thread is started. (rangadi)
HADOOP-1627. Various small improvements to 'dfsadmin -report' output.
(rangadi)
HADOOP-3577. Tools to inject blocks into name node and simulated
data nodes for testing. (Sanjay Radia via hairong)
HADOOP-2664. Add a lzop compatible codec, so that files compressed by lzop
may be processed by map/reduce. (cdouglas via omalley)
HADOOP-3655. Add additional ant properties to control junit. (Steve
Loughran via omalley)
HADOOP-3543. Update the copyright year to 2008. (cdouglas via omalley)
HADOOP-3587. Add a unit test for the contrib/data_join framework.
(cdouglas)
HADOOP-3402. Add terasort example program (omalley)
HADOOP-3660. Add replication factor for injecting blocks in simulated
datanodes. (Sanjay Radia via cdouglas)
HADOOP-3684. Add a cloning function to the contrib/data_join framework
permitting users to define a more efficient method for cloning values from
the reduce than serialization/deserialization. (Runping Qi via cdouglas)
HADOOP-3478. Improves the handling of map output fetching. Now the
randomization is by the hosts (and not the map outputs themselves).
(Jothi Padmanabhan via ddas)
HADOOP-3617. Removes redundant checks of accounting space in MapTask and
makes the spill thread persistent so as to avoid creating a new one for
each spill. (Chris Douglas via acmurthy)
HADOOP-3412. Factor the scheduler out of the JobTracker and make
it pluggable. (Tom White and Brice Arnould via omalley)
HADOOP-3756. Minor. Remove unused dfs.client.buffer.dir from
hadoop-default.xml. (rangadi)
HADOOP-3747. Adds counter support for MultipleOutputs.
(Alejandro Abdelnur via ddas)
HADOOP-3169. LeaseChecker daemon should not be started in DFSClient
constructor. (TszWo (Nicholas), SZE via hairong)
HADOOP-3824. Move base functionality of StatusHttpServer to a core
package. (TszWo (Nicholas), SZE via cdouglas)
HADOOP-3646. Add a bzip2 compatible codec, so bzip compressed data
may be processed by map/reduce. (Abdul Qadeer via cdouglas)
HADOOP-3861. MapFile.Reader and Writer should implement Closeable.
(tomwhite via omalley)
HADOOP-3791. Introduce generics into ReflectionUtils. (Chris Smith via
cdouglas)
HADOOP-3694. Improve unit test performance by changing
MiniDFSCluster to listen only on 127.0.0.1. (cutting)
HADOOP-3620. Namenode should synchronously resolve a datanode's network
location when the datanode registers. (hairong)
HADOOP-3860. NNThroughputBenchmark is extended with rename and delete
benchmarks. (shv)
HADOOP-3892. Include unix group name in JobConf. (Matei Zaharia via johan)
HADOOP-3875. Change the time period between heartbeats to be relative to
the end of the heartbeat rpc, rather than the start. This causes better
behavior if the JobTracker is overloaded. (acmurthy via omalley)
HADOOP-3853. Move multiple input format (HADOOP-372) extension to
library package. (tomwhite via johan)
HADOOP-9. Use roulette scheduling for temporary space when the size
is not known. (Ari Rabkin via omalley)
HADOOP-3202. Use recursive delete rather than FileUtil.fullyDelete.
(Amareshwari Sriramadasu via omalley)
HADOOP-3368. Remove common-logging.properties from conf. (Steve Loughran
via omalley)
HADOOP-3851. Fix spelling mistake in FSNamesystemMetrics. (Steve Loughran
via omalley)
HADOOP-3780. Remove asynchronous resolution of network topology in the
JobTracker (Amar Kamat via omalley)
HADOOP-3852. Add ShellCommandExecutor.toString method to make nicer
error messages. (Steve Loughran via omalley)
HADOOP-3844. Include message of local exception in RPC client failures.
(Steve Loughran via omalley)
HADOOP-3935. Split out inner classes from DataNode.java. (johan)
HADOOP-3905. Create generic interfaces for edit log streams. (shv)
HADOOP-3062. Add metrics to DataNode and TaskTracker to record network
traffic for HDFS reads/writes and MR shuffling. (cdouglas)
HADOOP-3742. Remove HDFS from public java doc and add javadoc-dev for
generative javadoc for developers. (Sanjay Radia via omalley)
HADOOP-3944. Improve documentation for public TupleWritable class in
join package. (Chris Douglas via enis)
HADOOP-2330. Preallocate HDFS transaction log to improve performance.
(dhruba and hairong)
HADOOP-3965. Convert DataBlockScanner into a package private class. (shv)
HADOOP-3488. Prevent hadoop-daemon from rsync'ing log files (Stefan
Groshupf and Craig Macdonald via omalley)
HADOOP-3342. Change the kill task actions to require http post instead of
get to prevent accidental crawls from triggering it. (enis via omalley)
HADOOP-3937. Limit the job name in the job history filename to 50
characters. (Matei Zaharia via omalley)
HADOOP-3943. Remove unnecessary synchronization in
NetworkTopology.pseudoSortByDistance. (hairong via omalley)
HADOOP-3498. File globbing alternation should be able to span path
components. (tomwhite)
HADOOP-3361. Implement renames for NativeS3FileSystem.
(Albert Chern via tomwhite)
HADOOP-3605. Make EC2 scripts show an error message if AWS_ACCOUNT_ID is
unset. (Al Hoang via tomwhite)
HADOOP-4147. Remove unused class JobWithTaskContext from class
JobInProgress. (Amareshwari Sriramadasu via johan)
HADOOP-4151. Add a byte-comparable interface that both Text and
BytesWritable implement. (cdouglas via omalley)
HADOOP-4174. Move fs image/edit log methods from ClientProtocol to
NamenodeProtocol. (shv via szetszwo)
HADOOP-4181. Include a .gitignore and saveVersion.sh change to support
developing under git. (omalley)
HADOOP-4186. Factor LineReader out of LineRecordReader. (tomwhite via
omalley)
HADOOP-4184. Break the module dependencies between core, hdfs, and
mapred. (tomwhite via omalley)
HADOOP-4075. test-patch.sh now spits out ant commands that it runs.
(Ramya R via nigel)
HADOOP-4117. Improve configurability of Hadoop EC2 instances.
(tomwhite)
HADOOP-2411. Add support for larger CPU EC2 instance types.
(Chris K Wensel via tomwhite)
HADOOP-4083. Changed the configuration attribute queue.name to
mapred.job.queue.name. (Hemanth Yamijala via acmurthy)
HADOOP-4194. Added the JobConf and JobID to job-related methods in
JobTrackerInstrumentation for better metrics. (Mac Yang via acmurthy)
HADOOP-3975. Change test-patch script to report the working dir
modifications preventing the suite from being run. (Ramya R via cdouglas)
HADOOP-4124. Added a command-line switch to allow users to set job
priorities, also allow it to be manipulated via the web-ui. (Hemanth
Yamijala via acmurthy)
HADOOP-2165. Augmented JobHistory to include the URIs to the tasks'
userlogs. (Vinod Kumar Vavilapalli via acmurthy)
HADOOP-4062. Remove the synchronization on the output stream when a
connection is closed and also remove an undesirable exception when
a client is stopped while there is no pending RPC request. (hairong)
HADOOP-4227. Remove the deprecated class org.apache.hadoop.fs.ShellCommand.
(szetszwo)
HADOOP-4006. Clean up FSConstants and move some of the constants to
better places. (Sanjay Radia via rangadi)
HADOOP-4279. Trace the seeds of random sequences in append unit tests to
make intermittent failures reproducible. (szetszwo via cdouglas)
HADOOP-4209. Remove the change to the format of task attempt id by
incrementing the task attempt numbers by 1000 when the job restarts.
(Amar Kamat via omalley)
HADOOP-4301. Adds forrest doc for the skip bad records feature.
(Sharad Agarwal via ddas)
HADOOP-4354. Separate TestDatanodeDeath.testDatanodeDeath() into 4 tests.
(szetszwo)
HADOOP-3790. Add more unit tests for testing HDFS file append. (szetszwo)
HADOOP-4321. Include documentation for the capacity scheduler. (Hemanth
Yamijala via omalley)
HADOOP-4424. Change menu layout for Hadoop documentation (Boris Shkolnik
via cdouglas).
HADOOP-4438. Update forrest documentation to include missing FsShell
commands. (Suresh Srinivas via cdouglas)
HADOOP-4105. Add forrest documentation for libhdfs.
(Pete Wyckoff via cutting)
HADOOP-4510. Make getTaskOutputPath public. (Chris Wensel via omalley)
OPTIMIZATIONS
HADOOP-3556. Removed lock contention in MD5Hash by replacing the
singleton MessageDigester with an instance per Thread using
ThreadLocal. (Iván de Prado via omalley)
HADOOP-3328. When client is writing data to DFS, only the last
datanode in the pipeline needs to verify the checksum. Saves around
30% CPU on intermediate datanodes. (rangadi)
HADOOP-3863. Use a thread-local string encoder rather than a static one
that is protected by a lock. (acmurthy via omalley)
HADOOP-3864. Prevent the JobTracker from locking up when a job is being
initialized. (acmurthy via omalley)
HADOOP-3816. Faster directory listing in KFS. (Sriram Rao via omalley)
HADOOP-2130. Pipes submit job should have both blocking and non-blocking
versions. (acmurthy via omalley)
HADOOP-3769. Make the SampleMapper and SampleReducer from
GenericMRLoadGenerator public, so they can be used in other contexts.
(Lingyun Yang via omalley)
HADOOP-3514. Inline the CRCs in intermediate files as opposed to reading
them from a different .crc file. (Jothi Padmanabhan via ddas)
HADOOP-3638. Caches the iFile index files in memory to reduce seeks.
(Jothi Padmanabhan via ddas)
HADOOP-4225. FSEditLog.logOpenFile() should persist accessTime
rather than modificationTime. (shv)
HADOOP-4380. Made several new classes (Child, JVMId,
JobTrackerInstrumentation, QueueManager, ResourceEstimator,
TaskTrackerInstrumentation, and TaskTrackerMetricsInst) in
org.apache.hadoop.mapred package private instead of public. (omalley)
BUG FIXES
HADOOP-3563. Refactor the distributed upgrade code so that it is
easier to identify datanode and namenode related code. (dhruba)
HADOOP-3640. Fix the read method in the NativeS3InputStream. (tomwhite via
omalley)
HADOOP-3711. Fixes the Streaming input parsing to properly find the
separator. (Amareshwari Sriramadasu via ddas)
HADOOP-3725. Prevent TestMiniMRMapDebugScript from swallowing exceptions.
(Steve Loughran via cdouglas)
HADOOP-3726. Throw exceptions from TestCLI setup and teardown instead of
swallowing them. (Steve Loughran via cdouglas)
HADOOP-3721. Refactor CompositeRecordReader and related mapred.join classes
to make them clearer. (cdouglas)
HADOOP-3720. Re-read the config file when dfsadmin -refreshNodes is invoked
so dfs.hosts and dfs.hosts.exclude are observed. (lohit vijayarenu via
cdouglas)
HADOOP-3485. Allow writing to files over fuse.
(Pete Wyckoff via dhruba)
HADOOP-3723. The flags to the libhdfs.create call can be treated as
a bitmask. (Pete Wyckoff via dhruba)
HADOOP-3643. Filter out completed tasks when asking for running tasks in
the JobTracker web/ui. (Amar Kamat via omalley)
HADOOP-3777. Ensure that Lzo compressors/decompressors correctly handle the
case where native libraries aren't available. (Chris Douglas via acmurthy)
HADOOP-3728. Fix SleepJob so that it doesn't depend on temporary files,
this ensures we can now run more than one instance of SleepJob
simultaneously. (Chris Douglas via acmurthy)
HADOOP-3795. Fix saving image files on Namenode with different checkpoint
stamps. (Lohit Vijayarenu via mahadev)
HADOOP-3624. Improving createeditslog to create tree directory structure.
(Lohit Vijayarenu via mahadev)
HADOOP-3778. DFSInputStream.seek() did not retry in case of some errors.
(LN via rangadi)
HADOOP-3661. The handling of moving files deleted through fuse-dfs to
Trash is made similar to the behaviour of the dfs shell.
(Pete Wyckoff via dhruba)
HADOOP-3819. Unset LANG and LC_CTYPE in saveVersion.sh to make it
compatible with non-English locales. (Rong-En Fan via cdouglas)
HADOOP-3848. Cache calls to getSystemDir in the TaskTracker instead of
calling it for each task start. (acmurthy via omalley)
HADOOP-3131. Fix reduce progress reporting for compressed intermediate
data. (Matei Zaharia via acmurthy)
HADOOP-3796. fuse-dfs configuration is implemented as file system
mount options. (Pete Wyckoff via dhruba)
HADOOP-3836. Fix TestMultipleOutputs to correctly clean up. (Alejandro
Abdelnur via acmurthy)
HADOOP-3805. Improve fuse-dfs write performance.
(Pete Wyckoff via zshao)
HADOOP-3846. Fix unit test CreateEditsLog to generate paths correctly.
(Lohit Vijayarenu via cdouglas)
HADOOP-3904. Fix unit tests using the old dfs package name.
(Tsz Wo (Nicholas), SZE via johan)
HADOOP-3319. Fix some HOD error messages to go to stderr instead of
stdout. (Vinod Kumar Vavilapalli via omalley)
HADOOP-3907. Move INodeDirectoryWithQuota to its own .java file.
(Tsz Wo (Nicholas), SZE via hairong)
HADOOP-3919. Fix attribute name in hadoop-default for
mapred.jobtracker.instrumentation. (Ari Rabkin via omalley)
HADOOP-3903. Change the package name for the servlets to be hdfs instead of
dfs. (Tsz Wo (Nicholas), SZE via omalley)
HADOOP-3773. Change Pipes to set the default map output key and value
types correctly. (Koji Noguchi via omalley)
HADOOP-3952. Fix compilation error in TestDataJoin referencing dfs package.
(omalley)
HADOOP-3951. Fix package name for FSNamesystem logs and modify other
hard-coded Logs to use the class name. (cdouglas)
HADOOP-3889. Improve error reporting from HftpFileSystem, handling in
DistCp. (Tsz Wo (Nicholas), SZE via cdouglas)
HADOOP-3946. Fix TestMapRed after HADOOP-3664. (tomwhite via omalley)
HADOOP-3949. Remove duplicate jars from Chukwa. (Jerome Boulon via omalley)
HADOOP-3933. DataNode sometimes sends up to io.bytes.per.checksum bytes
more than required to client. (Ning Li via rangadi)
HADOOP-3962. Shell command "fs -count" should support paths with different
file systems. (Tsz Wo (Nicholas), SZE via mahadev)
HADOOP-3957. Fix javac warnings in DistCp and TestCopyFiles. (Tsz Wo
(Nicholas), SZE via cdouglas)
HADOOP-3958. Fix TestMapRed to check the success of test-job. (omalley via
acmurthy)
HADOOP-3985. Fix TestHDFSServerPorts to use random ports. (Hairong Kuang
via omalley)
HADOOP-3964. Fix javadoc warnings introduced by FailMon. (dhruba)
HADOOP-3785. Fix FileSystem cache to be case-insensitive for scheme and
authority. (Bill de hOra via cdouglas)
HADOOP-3506. Fix a rare NPE caused by error handling in S3. (Tom White via
cdouglas)
HADOOP-3705. Fix mapred.join parser to accept InputFormats named with
underscore and static, inner classes. (cdouglas)
HADOOP-4023. Fix javadoc warnings introduced when the HDFS javadoc was
made private. (omalley)
HADOOP-4030. Remove lzop from the default list of codecs. (Arun Murthy via
cdouglas)
HADOOP-3961. Fix task disk space requirement estimates for virtual
input jobs. Delays limiting task placement until after 10% of the maps
have finished. (Ari Rabkin via omalley)
HADOOP-2168. Fix problem with C++ record reader's progress not being
reported to framework. (acmurthy via omalley)
HADOOP-3966. Copy findbugs generated output files to PATCH_DIR while
running test-patch. (Ramya R via lohit)
HADOOP-4037. Fix the eclipse plugin for versions of kfs and log4j. (nigel
via omalley)
HADOOP-3950. Cause the Mini MR cluster to wait for task trackers to
register before continuing. (enis via omalley)
HADOOP-3910. Remove unused ClusterTestDFSNamespaceLogging and
ClusterTestDFS. (Tsz Wo (Nicholas), SZE