Hadoop Change Log
Release 0.20.3 - Unreleased
Release 0.20.2 - 2010-2-18
NEW FEATURES
HADOOP-6218. Adds a feature where TFile can be split by Record
Sequence number. (Hong Tang and Raghu Angadi via ddas)
BUG FIXES
MAPREDUCE-112. Add counters for reduce input, output records to the new API.
(Jothi Padmanabhan via cdouglas)
HADOOP-6231. Allow caching of filesystem instances to be disabled on a
per-instance basis (Tom White and Ben Slusky via mahadev)
MAPREDUCE-826. harchive doesn't use ToolRunner / harchive returns 0 even
if the job fails with exception (koji via mahadev)
MAPREDUCE-979. Fixed JobConf APIs related to memory parameters to return
values of new configuration variables when deprecated variables are
disabled. (Sreekanth Ramakrishnan via yhemanth)
HDFS-686. NullPointerException is thrown while merging edit log and image.
(hairong)
HDFS-677. Rename failure when both source and destination quota exceeds
results in deletion of source. (suresh)
HDFS-709. Fix TestDFSShell failure due to rename bug introduced by
HDFS-677. (suresh)
HDFS-579. Fix DfsTask to follow the semantics of 0.19, regarding non-zero
return values as failures. (Christian Kunz via cdouglas)
MAPREDUCE-1070. Prevent a deadlock in the fair scheduler servlet.
(Todd Lipcon via cdouglas)
HADOOP-5759. Fix for IllegalArgumentException when CombineFileInputFormat
is used as job InputFormat. (Amareshwari Sriramadasu via zshao)
HADOOP-6097. Fix Path conversion in makeQualified and reset LineReader byte
count at the start of each block in Hadoop archives. (Ben Slusky, Tom
White, and Mahadev Konar via cdouglas)
HDFS-723. Fix deadlock in DFSClient#DFSOutputStream. (hairong)
HDFS-732. DFSClient.DFSOutputStream.close() should throw an exception if
the stream cannot be closed successfully. (szetszwo)
MAPREDUCE-1163. Remove unused, hard-coded paths from libhdfs. (Allen
Wittenauer via cdouglas)
HDFS-761. Fix failure to process rename operation from edits log due to
quota verification. (suresh)
MAPREDUCE-623. Resolve javac warnings in mapreduce. (Jothi Padmanabhan
via sharad)
HADOOP-6575. Remove call to fault injection tests not present in 0.20.
(cdouglas)
IMPROVEMENTS
HADOOP-5611. Fix C++ libraries to build on Debian Lenny. (Todd Lipcon
via tomwhite)
MAPREDUCE-1068. Fix streaming job to show proper message if file is
not present. (Amareshwari Sriramadasu via sharad)
HDFS-596. Fix memory leak in hdfsFreeFileInfo() for libhdfs.
(Zhang Bingjun via dhruba)
MAPREDUCE-1147. Add map output counters to new API. (Amar Kamat via
cdouglas)
HADOOP-6269. Fix threading issue with defaultResource in Configuration.
(Sreekanth Ramakrishnan via cdouglas)
MAPREDUCE-1182. Fix overflow in reduce causing allocations to exceed the
configured threshold. (cdouglas)
HADOOP-6386. NameNode's HttpServer can't instantiate InetSocketAddress:
IllegalArgumentException is thrown. (cos)
HDFS-185. Disallow chown, chgrp, chmod, setQuota, and setSpaceQuota when
name-node is in safemode. (Ravi Phulari via shv)
HADOOP-6428. HttpServer sleeps with negative values (cos)
HADOOP-5623. Fixes a problem to do with status messages getting overwritten
in streaming jobs. (Rick Cox and Jothi Padmanabhan via tomwhite)
HADOOP-6315. Avoid incorrect use of BuiltInflater/BuiltInDeflater in
GzipCodec. (Aaron Kimball via cdouglas)
HDFS-187. Initialize secondary namenode http address in TestStartup.
(Todd Lipcon via szetszwo)
MAPREDUCE-433. Use more reliable counters in TestReduceFetch. (cdouglas)
HDFS-792. DFSClient 0.20.1 is incompatible with HDFS 0.20.2.
(Todd Lipcon via hairong)
HADOOP-6498. IPC client bug may cause rpc call hang. (Ruyue Ma and
hairong via hairong)
HADOOP-6596. Failing tests prevent the rest of test targets from
execution. (cos)
HADOOP-6524. Contrib tests are failing Clover'ed build. (cos)
HDFS-919. Create test to validate the BlocksVerified metric (Gary Murry
via cos)
HDFS-907. Add tests for getBlockLocations and totalLoad metrics.
(Ravi Phulari via cos)
MAPREDUCE-1251. c++ utils doesn't compile. (Eli Collins via tomwhite)
HADOOP-5612. Some c++ scripts are not chmodded before ant execution.
(Todd Lipcon via tomwhite)
Release 0.20.1 - 2009-09-01
INCOMPATIBLE CHANGES
HADOOP-5726. Remove pre-emption from capacity scheduler code base.
(Rahul Kumar Singh via yhemanth)
HADOOP-5881. Simplify memory monitoring and scheduling related
configuration. (Vinod Kumar Vavilapalli via yhemanth)
NEW FEATURES
HADOOP-6080. Introduce -skipTrash option to rm and rmr.
(Jakob Homan via shv)
HADOOP-3315. Add a new, binary file format, TFile. (Hong Tang via cdouglas)
IMPROVEMENTS
HADOOP-5711. Change Namenode file close log to info. (szetszwo)
HADOOP-5736. Update the capacity scheduler documentation for features
like memory based scheduling, job initialization and removal of pre-emption.
(Sreekanth Ramakrishnan via yhemanth)
HADOOP-4674. Fix fs help messages for -test, -text, -tail, -stat
and -touchz options. (Ravi Phulari via szetszwo)
HADOOP-4372. Improves the way history filenames are obtained and manipulated.
(Amar Kamat via ddas)
HADOOP-5897. Add name-node metrics to capture java heap usage.
(Suresh Srinivas via shv)
HDFS-438. Improve help message for space quota command. (Raghu Angadi)
MAPREDUCE-767. Remove the dependence on the CLI 2.0 snapshot.
(Amar Kamat via ddas)
OPTIMIZATIONS
BUG FIXES
HADOOP-5691. Makes org.apache.hadoop.mapreduce.Reducer concrete class
instead of abstract. (Amareshwari Sriramadasu via sharad)
HADOOP-5646. Fixes a problem in TestQueueCapacities.
(Vinod Kumar Vavilapalli via ddas)
HADOOP-5655. TestMRServerPorts fails on java.net.BindException. (Devaraj
Das via hairong)
HADOOP-5654. TestReplicationPolicy.<init> fails on java.net.BindException.
(hairong)
HADOOP-5688. Fix HftpFileSystem checksum path construction. (Tsz Wo
(Nicholas) Sze via cdouglas)
HADOOP-5213. Fix NullPointerException caused when bzip2 compression
was used and the user closed an output stream without writing any data.
(Zheng Shao via dhruba)
HADOOP-5718. Remove the check for the default queue in capacity scheduler.
(Sreekanth Ramakrishnan via yhemanth)
HADOOP-5719. Remove jobs that failed initialization from the waiting queue
in the capacity scheduler. (Sreekanth Ramakrishnan via yhemanth)
HADOOP-4744. Attaching another fix to the jetty port issue. The TaskTracker
kills itself if it ever discovers that the port to which jetty is actually
bound is invalid (-1). (ddas)
HADOOP-5349. Fixes a problem in LocalDirAllocator to check for the return
path value that is returned for the case where the file we want to write
is of an unknown size. (Vinod Kumar Vavilapalli via ddas)
HADOOP-5636. Prevents a job from going to RUNNING state after it has been
KILLED (this used to happen when the SetupTask would come back with a
success after the job has been killed). (Amar Kamat via ddas)
HADOOP-5641. Fix a NullPointerException in capacity scheduler's memory
based scheduling code when jobs get retired. (yhemanth)
HADOOP-5828. Use absolute path for mapred.local.dir of JobTracker in
MiniMRCluster. (yhemanth)
HADOOP-4981. Fix capacity scheduler to schedule speculative tasks
correctly in the presence of High RAM jobs.
(Sreekanth Ramakrishnan via yhemanth)
HADOOP-5210. Solves a problem in the progress report of the reduce task.
(Ravi Gummadi via ddas)
HADOOP-5850. Fixes a problem to do with not being able to run jobs with
0 maps/reduces. (Vinod K V via ddas)
HADOOP-5728. Fixed FSEditLog.printStatistics IndexOutOfBoundsException.
(Wang Xu via johan)
HADOOP-4626. Correct the API links in hdfs forrest doc so that they
point to the same version of hadoop. (szetszwo)
HADOOP-5883. Fixed tasktracker memory monitoring to account for
momentary spurts in memory usage due to java's fork() model.
(yhemanth)
HADOOP-5539. Fixes a problem to do with not preserving intermediate
output compression for merged data.
(Jothi Padmanabhan and Billy Pearson via ddas)
HADOOP-5932. Fixes a problem in capacity scheduler in computing
available memory on a tasktracker.
(Vinod Kumar Vavilapalli via yhemanth)
HADOOP-5648. Fixes a build issue in not being able to generate gridmix.jar
in hadoop binary tarball. (Giridharan Kesavan via gkesavan)
HADOOP-5908. Fixes a problem to do with ArithmeticException in the
JobTracker when there are jobs with 0 maps. (Amar Kamat via ddas)
HADOOP-5924. Fixes a corner case problem to do with job recovery with
empty history files. Also, after a JT restart, sends KillTaskAction to
tasks that report back but the corresponding job hasn't been initialized
yet. (Amar Kamat via ddas)
HADOOP-5882. Fixes a reducer progress update problem for new mapreduce
api. (Amareshwari Sriramadasu via sharad)
HADOOP-5746. Fixes a corner case problem in Streaming, where if an
exception happens in MROutputThread after the last call to the map/reduce
method, the exception goes undetected. (Amar Kamat via ddas)
HADOOP-5884. Fixes accounting in capacity scheduler so that high RAM jobs
take more slots. (Vinod Kumar Vavilapalli via yhemanth)
HADOOP-5937. Correct a safemode message in FSNamesystem. (Ravi Phulari
via szetszwo)
HADOOP-5869. Fix bug in assignment of setup / cleanup task that was
causing TestQueueCapacities to fail.
(Sreekanth Ramakrishnan via yhemanth)
HADOOP-5921. Fixes a problem in the JobTracker where it sometimes failed
to come up because a system file creation on JobTracker's system-dir
failed. This problem would sometimes show up only when the FS for the
system-dir (usually HDFS) is started at nearly the same time as the
JobTracker. (Amar Kamat via ddas)
HADOOP-5920. Fixes a testcase failure for TestJobHistory.
(Amar Kamat via ddas)
HDFS-26. Better error message to users when commands fail because of
lack of quota. Allow quota to be set even if the limit is lower than
current consumption. (Boris Shkolnik via rangadi)
MAPREDUCE-2. Fixes a bug in KeyFieldBasedPartitioner in handling empty
keys. (Amar Kamat via sharad)
MAPREDUCE-130. Delete the jobconf copy from the log directory of the
JobTracker when the job is retired. (Amar Kamat via sharad)
MAPREDUCE-657. Fix hardcoded filesystem problem in CompletedJobStatusStore.
(Amar Kamat via sharad)
MAPREDUCE-179. Update progress in new RecordReaders. (cdouglas)
MAPREDUCE-124. Fix a bug in failure handling of abort task of
OutputCommitter. (Amareshwari Sriramadasu via sharad)
HADOOP-6139. Fix the FsShell help messages for rm and rmr. (Jakob Homan
via szetszwo)
HADOOP-6141. Fix a few bugs in 0.20 test-patch.sh. (Hong Tang via
szetszwo)
HADOOP-6145. Fix FsShell rm/rmr error messages when there is a FNFE.
(Jakob Homan via szetszwo)
MAPREDUCE-565. Fix partitioner to work with new API. (Owen O'Malley via
cdouglas)
MAPREDUCE-465. Fix a bug in MultithreadedMapRunner. (Amareshwari
Sriramadasu via sharad)
MAPREDUCE-18. Puts some checks to detect cases where jetty serves up
incorrect output during shuffle. (Ravi Gummadi via ddas)
MAPREDUCE-735. Fixes a problem in the KeyFieldHelper to do with
the end index for some inputs (Amar Kamat via ddas)
HADOOP-6150. Users should be able to instantiate comparator using TFile
API. (Hong Tang via rangadi)
MAPREDUCE-383. Fix a bug in Pipes combiner due to bytes count not
getting reset after the spill. (Christian Kunz via sharad)
MAPREDUCE-40. Keep memory management backwards compatible for job
configuration parameters and limits. (Rahul Kumar Singh via yhemanth)
MAPREDUCE-796. Fixes a ClassCastException in an exception log in
MultithreadedMapRunner. (Amar Kamat via ddas)
MAPREDUCE-838. Fixes a problem in the way commit of task outputs
happens. The bug was that even if commit failed, the task would
be declared as successful. (Amareshwari Sriramadasu via ddas)
MAPREDUCE-805. Fixes some deadlocks in the JobTracker due to the fact
the JobTracker lock hierarchy wasn't maintained in some JobInProgress
method calls. (Amar Kamat via ddas)
HDFS-167. Fix a bug in DFSClient that caused infinite retries on write.
(Bill Zeller via szetszwo)
HDFS-527. Remove unnecessary DFSClient constructors. (szetszwo)
MAPREDUCE-832. Reduce number of warning messages printed when
deprecated memory variables are used. (Rahul Kumar Singh via yhemanth)
MAPREDUCE-745. Fixes a testcase problem to do with generation of JobTracker
IDs. (Amar Kamat via ddas)
MAPREDUCE-834. Enables memory management on tasktrackers when old
memory management parameters are used in configuration.
(Sreekanth Ramakrishnan via yhemanth)
MAPREDUCE-818. Fixes Counters#getGroup API. (Amareshwari Sriramadasu
via sharad)
MAPREDUCE-807. Handles the AccessControlException during the deletion of
mapred.system.dir in the JobTracker. The JobTracker will bail out if it
encounters such an exception. (Amar Kamat via ddas)
HADOOP-6213. Remove commons dependency on commons-cli2. (Amar Kamat via
sharad)
MAPREDUCE-430. Fix a bug related to task getting stuck in case of
OOM error. (Amar Kamat via ddas)
HADOOP-6215. Fix GenericOptionsParser to deal with -D with '=' in the
value. (Amar Kamat via sharad)
MAPREDUCE-421. Fix Pipes to use returned system exit code.
(Christian Kunz via omalley)
HDFS-525. The SimpleDateFormat object in ListPathsServlet is not thread
safe. (Suresh Srinivas and cdouglas)
MAPREDUCE-911. Fix a bug in TestTaskFail related to speculative
execution. (Amareshwari Sriramadasu via sharad)
MAPREDUCE-687. Fix an assertion in TestMiniMRMapRedDebugScript.
(Amareshwari Sriramadasu via sharad)
MAPREDUCE-924. Fixes the TestPipes testcase to use Tool.
(Amareshwari Sriramadasu via sharad)
Release 0.20.0 - 2009-04-15
INCOMPATIBLE CHANGES
HADOOP-4210. Fix findbugs warnings for equals implementations of mapred ID
classes. Removed public, static ID::read and ID::forName; made ID an
abstract class. (Suresh Srinivas via cdouglas)
HADOOP-4253. Fix various warnings generated by findbugs.
Following deprecated methods in RawLocalFileSystem are removed:
public String getName()
public void lock(Path p, boolean shared)
public void release(Path p)
(Suresh Srinivas via johan)
HADOOP-4618. Move http server from FSNamesystem into NameNode.
FSNamesystem.getNameNodeInfoPort() is removed.
FSNamesystem.getDFSNameNodeMachine() and FSNamesystem.getDFSNameNodePort()
replaced by FSNamesystem.getDFSNameNodeAddress().
NameNode(bindAddress, conf) is removed.
(shv)
HADOOP-4567. GetFileBlockLocations returns the NetworkTopology
information of the machines where the blocks reside. (dhruba)
HADOOP-4435. The JobTracker WebUI displays the amount of heap memory
in use. (dhruba)
HADOOP-4628. Move Hive into a standalone subproject. (omalley)
HADOOP-4188. Removes task's dependency on concrete filesystems.
(Sharad Agarwal via ddas)
HADOOP-1650. Upgrade to Jetty 6. (cdouglas)
HADOOP-3986. Remove static Configuration from JobClient. (Amareshwari
Sriramadasu via cdouglas)
JobClient::setCommandLineConfig is removed
JobClient::getCommandLineConfig is removed
JobShell, TestJobShell classes are removed
HADOOP-4422. S3 file systems should not create bucket.
(David Phillips via tomwhite)
HADOOP-4035. Support memory based scheduling in capacity scheduler.
(Vinod Kumar Vavilapalli via yhemanth)
HADOOP-3497. Fix bug in overly restrictive file globbing with a
PathFilter. (tomwhite)
HADOOP-4445. Replace running task counts with running task
percentage in capacity scheduler UI. (Sreekanth Ramakrishnan via
yhemanth)
HADOOP-4631. Splits the configuration into three parts - one for core,
one for mapred and the last one for HDFS. (Sharad Agarwal via cdouglas)
HADOOP-3344. Fix libhdfs build to use autoconf and build the same
architecture (32 vs 64 bit) of the JVM running Ant. The libraries for
pipes, utils, and libhdfs are now all in c++/<os_osarch_jvmdatamodel>/lib.
(Giridharan Kesavan via nigel)
HADOOP-4874. Remove LZO codec because of licensing issues. (omalley)
HADOOP-4970. The full path name of a file is preserved inside Trash.
(Prasad Chakka via dhruba)
HADOOP-4103. NameNode keeps a count of missing blocks. It warns on
WebUI if there are such blocks. '-report' and '-metaSave' have extra
info to track such blocks. (Raghu Angadi)
HADOOP-4783. Change permissions on history files on the jobtracker
to be only group readable instead of world readable.
(Amareshwari Sriramadasu via yhemanth)
HADOOP-5531. Removed Chukwa from Hadoop 0.20.0. (nigel)
NEW FEATURES
HADOOP-4575. Add a proxy service for relaying HsftpFileSystem requests.
Includes client authentication via user certificates and config-based
access control. (Kan Zhang via cdouglas)
HADOOP-4661. Add DistCh, a new tool for distributed ch{mod,own,grp}.
(szetszwo)
HADOOP-4709. Add several new features and bug fixes to Chukwa.
Added Hadoop Infrastructure Care Center (UI for visualizing data collected
by Chukwa)
Added FileAdaptor for streaming a small file in one chunk
Added compression to archive and demux output
Added unit tests and validation for agent, collector, and demux map
reduce job
Added database loader for loading demux output (sequence file) to jdbc
connected database
Added algorithm to distribute collector load more evenly
(Jerome Boulon, Eric Yang, Andy Konwinski, Ariel Rabkin via cdouglas)
HADOOP-4179. Add Vaidya tool to analyze map/reduce job logs for performance
problems. (Suhas Gogate via omalley)
HADOOP-4029. Add NameNode storage information to the dfshealth page and
move DataNode information to a separated page. (Boris Shkolnik via
szetszwo)
HADOOP-4348. Add service-level authorization for Hadoop. (acmurthy)
HADOOP-4826. Introduce admin command saveNamespace. (shv)
HADOOP-3063. BloomMapFile - fail-fast version of MapFile for sparsely
populated key space (Andrzej Bialecki via stack)
HADOOP-1230. Add new map/reduce API and deprecate the old one. Generally,
the old code should work without problem. The new api is in
org.apache.hadoop.mapreduce and the old classes in org.apache.hadoop.mapred
are deprecated. Differences in the new API:
1. All of the methods take Context objects that allow us to add new
methods without breaking compatibility.
2. Mapper and Reducer now have a "run" method that is called once and
contains the control loop for the task, which lets applications
replace it.
3. Mapper and Reducer by default are Identity Mapper and Reducer.
4. The FileOutputFormats use part-r-00000 for the output of reduce 0 and
part-m-00000 for the output of map 0.
5. The reduce grouping comparator now uses the raw compare instead of
object compare.
6. The number of maps in FileInputFormat is controlled by min and max
split size rather than min size and the desired number of maps.
(omalley)
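A minimal sketch of items 4 and 6 above, for illustration only. The class and
method names are hypothetical and are not part of Hadoop. Clamping the block
size between the minimum and maximum split size is an assumption about how the
two limits combine; the entry itself only says that they control the split
size.

```java
import java.util.Locale;

// Illustrative only: NewApiSketch and its methods are hypothetical names,
// not classes or methods taken from Hadoop.
final class NewApiSketch {

    // Item 4: "r" marks reduce output, "m" marks map output, followed by a
    // zero-padded five-digit task index.
    static String partFileName(boolean isReduce, int taskIndex) {
        return String.format(Locale.ROOT, "part-%s-%05d",
                isReduce ? "r" : "m", taskIndex);
    }

    // Item 6 (assumed rule): the split size is the block size clamped to the
    // range [minSize, maxSize].
    static long splitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        System.out.println(partFileName(true, 0));
        System.out.println(partFileName(false, 0));
        System.out.println(splitSize(64 * mb, 1L, 32 * mb));
    }
}
```

Under this assumed rule, lowering the maximum split size below the block size
yields more, smaller splits, and raising the minimum above it yields fewer,
larger ones.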
HADOOP-3305. Use Ivy to manage dependencies. (Giridharan Kesavan
and Steve Loughran via cutting)
IMPROVEMENTS
HADOOP-4565. Added CombineFileInputFormat to use data locality information
to create splits. (dhruba via zshao)
HADOOP-4749. Added a new counter REDUCE_INPUT_BYTES. (Yongqiang He via
zshao)
HADOOP-4234. Fix KFS "glue" layer to allow applications to interface
with multiple KFS metaservers. (Sriram Rao via lohit)
HADOOP-4245. Update to latest version of KFS "glue" library jar.
(Sriram Rao via lohit)
HADOOP-4244. Change test-patch.sh to check Eclipse classpath whether or
not it is run by Hudson. (szetszwo)
HADOOP-3180. Add name of missing class to WritableName.getClass
IOException. (Pete Wyckoff via omalley)
HADOOP-4178. Make the capacity scheduler's default values configurable.
(Sreekanth Ramakrishnan via omalley)
HADOOP-4262. Generate better error message when client exception has null
message. (stevel via omalley)
HADOOP-4226. Refactor and document LineReader to make it more readily
understandable. (Yuri Pradkin via cdouglas)
HADOOP-4238. When listing jobs, if scheduling information isn't available
print NA instead of empty output. (Sreekanth Ramakrishnan via johan)
HADOOP-4284. Support filters that apply to all requests, or global filters,
to HttpServer. (Kan Zhang via cdouglas)
HADOOP-4276. Improve the hashing functions and deserialization of the
mapred ID classes. (omalley)
HADOOP-4485. Add a compile-native ant task, as a shorthand. (enis)
HADOOP-4454. Allow # comments in slaves file. (Rama Ramasamy via omalley)
HADOOP-3461. Remove hdfs.StringBytesWritable. (szetszwo)
HADOOP-4437. Use Halton sequence instead of java.util.Random in
PiEstimator. (szetszwo)
HADOOP-4572. Change INode and its sub-classes to package private.
(szetszwo)
HADOOP-4187. Does a runtime lookup for JobConf/JobConfigurable, and if
found, invokes the appropriate configure method. (Sharad Agarwal via ddas)
HADOOP-4453. Improve ssl configuration and handling in HsftpFileSystem,
particularly when used with DistCp. (Kan Zhang via cdouglas)
HADOOP-4583. Several code optimizations in HDFS. (Suresh Srinivas via
szetszwo)
HADOOP-3923. Remove org.apache.hadoop.mapred.StatusHttpServer. (szetszwo)
HADOOP-4622. Explicitly specify interpreter for non-native
pipes binaries. (Fredrik Hedberg via johan)
HADOOP-4505. Add a unit test to test faulty setup task and cleanup
task killing the job. (Amareshwari Sriramadasu via johan)
HADOOP-4608. Don't print a stack trace when the example driver gets an
unknown program to run. (Edward Yoon via omalley)
HADOOP-4645. Package HdfsProxy contrib project without the extra level
of directories. (Kan Zhang via omalley)
HADOOP-4126. Allow access to HDFS web UI on EC2 (tomwhite via omalley)
HADOOP-4612. Removes RunJar's dependency on JobClient.
(Sharad Agarwal via ddas)
HADOOP-4185. Adds setVerifyChecksum() method to FileSystem.
(Sharad Agarwal via ddas)
HADOOP-4523. Prevent too many tasks scheduled on a node from bringing
it down by monitoring for cumulative memory usage across tasks.
(Vinod Kumar Vavilapalli via yhemanth)
HADOOP-4640. Adds an input format that can split lzo compressed
text files. (johan)
HADOOP-4666. Launch reduces only after a few maps have run in the
Fair Scheduler. (Matei Zaharia via johan)
HADOOP-4339. Remove redundant calls from FileSystem/FsShell when
generating/processing ContentSummary. (David Phillips via cdouglas)
HADOOP-2774. Add counters tracking records spilled to disk in MapTask and
ReduceTask. (Ravi Gummadi via cdouglas)
HADOOP-4513. Initialize jobs asynchronously in the capacity scheduler.
(Sreekanth Ramakrishnan via yhemanth)
HADOOP-4649. Improve abstraction for spill indices. (cdouglas)
HADOOP-3770. Add gridmix2, an iteration on the gridmix benchmark. (Runping
Qi via cdouglas)
HADOOP-4708. Add support for dfsadmin commands in TestCLI. (Boris Shkolnik
via cdouglas)
HADOOP-4758. Add a splitter for metrics contexts to support more than one
type of collector. (cdouglas)
HADOOP-4722. Add tests for dfsadmin quota error messages. (Boris Shkolnik
via cdouglas)
HADOOP-4690. fuse-dfs - create source file/function + utils + config +
main source files. (Pete Wyckoff via mahadev)
HADOOP-3750. Fix and enforce module dependencies. (Sharad Agarwal via
tomwhite)
HADOOP-4747. Speed up FsShell::ls by removing redundant calls to the
filesystem. (David Phillips via cdouglas)
HADOOP-4305. Improves the blacklisting strategy, whereby, tasktrackers
that are blacklisted are not given tasks to run from other jobs, subject
to the following conditions (all must be met):
1) The TaskTracker has been blacklisted by at least 4 jobs (configurable)
2) The TaskTracker has been blacklisted 50% more times than
the average (configurable)
3) The cluster has less than 50% of its trackers blacklisted
Once in 24 hours, a TaskTracker blacklisted for all jobs is given a chance.
Restarting the TaskTracker moves it out of the blacklist.
(Amareshwari Sriramadasu via ddas)
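The three conditions above can be sketched as a single predicate. This is an
illustration under stated assumptions, not the JobTracker's code: the class,
method, and parameter names are hypothetical, the thresholds are hard-coded to
the defaults quoted in the entry, and the choice of >= versus > at each
boundary is a guess.

```java
// Illustrative only: BlacklistSketch is a hypothetical name, and the
// comparison operators at the boundaries are assumptions.
final class BlacklistSketch {

    static final int MIN_BLACKLISTING_JOBS = 4;        // condition 1 default
    static final double ABOVE_AVERAGE_FACTOR = 1.5;    // condition 2: 50% more
    static final double MAX_CLUSTER_FRACTION = 0.5;    // condition 3: under 50%

    // Returns true only when all three conditions hold for one tracker.
    static boolean blacklistAcrossJobs(int trackerBlacklistCount,
                                       double averageBlacklistCount,
                                       int blacklistedTrackers,
                                       int totalTrackers) {
        boolean enoughJobs = trackerBlacklistCount >= MIN_BLACKLISTING_JOBS;
        boolean aboveAverage =
                trackerBlacklistCount >= averageBlacklistCount * ABOVE_AVERAGE_FACTOR;
        boolean clusterHealthy =
                blacklistedTrackers < totalTrackers * MAX_CLUSTER_FRACTION;
        return enoughJobs && aboveAverage && clusterHealthy;
    }

    public static void main(String[] args) {
        // Blacklisted by 4 jobs, cluster average 2.0, 1 of 10 trackers blacklisted.
        System.out.println(blacklistAcrossJobs(4, 2.0, 1, 10));
    }
}
```

Condition 3 acts as a safety valve: when half of the cluster is already
blacklisted, the fault is more likely in the jobs than in the trackers.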
HADOOP-4688. Modify the MiniMRDFSSort unit test to spill multiple times,
exercising the map-side merge code. (cdouglas)
HADOOP-4737. Adds the KILLED notification when jobs get killed.
(Amareshwari Sriramadasu via ddas)
HADOOP-4728. Add a test exercising different namenode configurations.
(Boris Shkolnik via cdouglas)
HADOOP-4807. Adds JobClient commands to get the active/blacklisted tracker
names. Also adds commands to display running/completed task attempt IDs.
(ddas)
HADOOP-4699. Remove checksum validation from map output servlet. (cdouglas)
HADOOP-4838. Added a registry to automate metrics and mbeans management.
(Sanjay Radia via acmurthy)
HADOOP-3136. Fixed the default scheduler to assign multiple tasks to each
tasktracker per heartbeat, when feasible. To ensure locality isn't hurt
too badly, the scheduler will not assign more than one off-switch task per
heartbeat. The heartbeat interval is also halved since the task-tracker is
fixed to no longer send out heartbeats on each task completion. A
slow-start for scheduling reduces is introduced to ensure that reduces
aren't started until a sufficient number of maps are done, else reduces of
jobs whose maps aren't scheduled might swamp the cluster.
Configuration changes to mapred-default.xml:
add mapred.reduce.slowstart.completed.maps
(acmurthy)
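A rough sketch of the slow-start rule described above, assuming that
mapred.reduce.slowstart.completed.maps is read as a fraction of the job's maps
that must finish before any reduce is scheduled. The class and method names
are hypothetical, and rounding the threshold up is an assumption.

```java
// Illustrative only: SlowStartSketch is a hypothetical name, not a Hadoop class.
final class SlowStartSketch {

    // completedMapsFraction stands in for the value of
    // mapred.reduce.slowstart.completed.maps (assumed to lie in [0, 1]).
    static boolean canScheduleReduces(int finishedMaps, int totalMaps,
                                      double completedMapsFraction) {
        if (totalMaps == 0) {
            return true; // nothing to wait for
        }
        // Assumed rounding: a partial map counts as one more map to wait for.
        long requiredMaps = (long) Math.ceil(completedMapsFraction * totalMaps);
        return finishedMaps >= requiredMaps;
    }

    public static void main(String[] args) {
        // With a fraction of 0.5 and 10 maps, reduces wait for 5 finished maps.
        System.out.println(canScheduleReduces(4, 10, 0.5));
        System.out.println(canScheduleReduces(5, 10, 0.5));
    }
}
```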
HADOOP-4545. Add example and test case of secondary sort for the reduce.
(omalley)
HADOOP-4753. Refactor gridmix2 to reduce code duplication. (cdouglas)
HADOOP-4909. Fix Javadoc and make some of the API more consistent in their
use of the JobContext instead of Configuration. (omalley)
HADOOP-4830. Add end-to-end test cases for testing queue capacities.
(Vinod Kumar Vavilapalli via yhemanth)
HADOOP-4980. Improve code layout of capacity scheduler to make it
easier to fix some blocker bugs. (Vivek Ratan via yhemanth)
HADOOP-4916. Make user/location of Chukwa installation configurable by an
external properties file. (Eric Yang via cdouglas)
HADOOP-4950. Make the CompressorStream, DecompressorStream,
BlockCompressorStream, and BlockDecompressorStream public to facilitate
non-Hadoop codecs. (omalley)
HADOOP-4843. Collect job history and configuration in Chukwa. (Eric Yang
via cdouglas)
HADOOP-5030. Build Chukwa RPM to install into configured directory. (Eric
Yang via cdouglas)
HADOOP-4828. Updates documents to do with configuration (HADOOP-4631).
(Sharad Agarwal via ddas)
HADOOP-4939. Adds a test that would inject random failures for tasks in
large jobs and would also inject TaskTracker failures. (ddas)
HADOOP-4920. Stop storing Forrest output in Subversion. (cutting)
HADOOP-4944. A configuration file can include other configuration
files. (Rama Ramasamy via dhruba)
HADOOP-4804. Provide Forrest documentation for the Fair Scheduler.
(Sreekanth Ramakrishnan via yhemanth)
HADOOP-5248. A testcase that checks for the existence of job directory
after the job completes. Fails if it exists. (ddas)
HADOOP-4664. Introduces multiple job initialization threads, where the
number of threads is configurable via mapred.jobinit.threads.
(Matei Zaharia and Jothi Padmanabhan via ddas)
HADOOP-4191. Adds a testcase for JobHistory. (Ravi Gummadi via ddas)
HADOOP-5466. Change documentation CSS style for headers and code. (Corinne
Chandel via szetszwo)
HADOOP-5275. Add ivy directory and files to built tar.
(Giridharan Kesavan via nigel)
HADOOP-5468. Add sub-menus to forrest documentation and make some minor
edits. (Corinne Chandel via szetszwo)
HADOOP-5437. Fix TestMiniMRDFSSort to properly test jvm-reuse. (omalley)
HADOOP-5521. Removes dependency of TestJobInProgress on RESTART_COUNT
JobHistory tag. (Ravi Gummadi via ddas)
HADOOP-5714. Add a metric for NameNode getFileInfo operation. (Jakob Homan
via szetszwo)
OPTIMIZATIONS
HADOOP-3293. Fixes FileInputFormat to provide locations for splits
based on the rack/host that has the most bytes.
(Jothi Padmanabhan via ddas)
HADOOP-4683. Fixes Reduce shuffle scheduler to invoke
getMapCompletionEvents in a separate thread. (Jothi Padmanabhan
via ddas)
BUG FIXES
HADOOP-5379. CBZip2InputStream to throw IOException on data crc error.
(Rodrigo Schmidt via zshao)
HADOOP-5326. Fixes CBZip2OutputStream data corruption problem.
(Rodrigo Schmidt via zshao)
HADOOP-4204. Fix findbugs warnings related to unused variables, naive
Number subclass instantiation, Map iteration, and badly scoped inner
classes. (Suresh Srinivas via cdouglas)
HADOOP-4207. Update derby jar file to release 10.4.2.
(Prasad Chakka via dhruba)
HADOOP-4325. SocketInputStream.read() should return -1 in case EOF.
(Raghu Angadi)
HADOOP-4408. FsAction functions need not create new objects. (cdouglas)
HADOOP-4440. TestJobInProgressListener tests for jobs killed in queued
state (Amar Kamat via ddas)
HADOOP-4346. Implement blocking connect so that Hadoop is not affected
by selector problem with JDK default implementation. (Raghu Angadi)
HADOOP-4388. If there are invalid blocks in the transfer list, Datanode
should handle them and keep transferring the remaining blocks. (Suresh
Srinivas via szetszwo)
HADOOP-4587. Fix a typo in Mapper javadoc. (Koji Noguchi via szetszwo)
HADOOP-4530. In fsck, HttpServletResponse sendError fails with
IllegalStateException. (hairong)
HADOOP-4377. Fix a race condition in directory creation in
NativeS3FileSystem. (David Phillips via cdouglas)
HADOOP-4621. Fix javadoc warnings caused by duplicate jars. (Kan Zhang via
cdouglas)
HADOOP-4566. Deploy new hive code to support more types.
(Zheng Shao via dhruba)
HADOOP-4571. Add chukwa conf files to svn:ignore list. (Eric Yang via
szetszwo)
HADOOP-4589. Correct PiEstimator output messages and improve the code
readability. (szetszwo)
HADOOP-4650. Correct a mismatch between the default value of
local.cache.size in the config and the source. (Jeff Hammerbacher via
cdouglas)
HADOOP-4606. Fix cygpath error if the log directory does not exist.
(szetszwo via omalley)
HADOOP-4141. Fix bug in ScriptBasedMapping causing potential infinite
loop on misconfigured hadoop-site. (Aaron Kimball via tomwhite)
HADOOP-4691. Correct a link in the javadoc of IndexedSortable. (szetszwo)
HADOOP-4598. '-setrep' command skips under-replicated blocks. (hairong)
HADOOP-4429. Set defaults for user, group in UnixUserGroupInformation so
login fails more predictably when misconfigured. (Alex Loddengaard via
cdouglas)
HADOOP-4676. Fix broken URL in blacklisted tasktrackers page. (Amareshwari
Sriramadasu via cdouglas)
HADOOP-3422. Ganglia counter metrics are all reported with the metric
name "value", so the counter values can not be seen. (Jason Attributor
and Brian Bockelman via stack)
HADOOP-4704. Fix javadoc typos "the the". (szetszwo)
HADOOP-4677. Fix semantics of FileSystem::getBlockLocations to return
meaningful values. (Hong Tang via cdouglas)
HADOOP-4669. Use correct operator when evaluating whether access time is
enabled (Dhruba Borthakur via cdouglas)
HADOOP-4732. Pass connection and read timeouts in the correct order when
setting up fetch in reduce. (Amareshwari Sriramadasu via cdouglas)
HADOOP-4558. Fix capacity reclamation in capacity scheduler.
(Amar Kamat via yhemanth)
HADOOP-4770. Fix rungridmix_2 script to work with RunJar. (cdouglas)
HADOOP-4738. When using git, the saveVersion script will use only the
commit hash for the version and not the message, which requires escaping.
(cdouglas)
HADOOP-4576. Show pending job count instead of task count in the UI per
queue in capacity scheduler. (Sreekanth Ramakrishnan via yhemanth)
HADOOP-4623. Maintain running tasks even if speculative execution is off.
(Amar Kamat via yhemanth)
HADOOP-4786. Fix broken compilation error in
TestTrackerBlacklistAcrossJobs. (yhemanth)
HADOOP-4785. Fixes the JobTracker heartbeat to not make two calls to
System.currentTimeMillis(). (Amareshwari Sriramadasu via ddas)
HADOOP-4792. Add generated Chukwa configuration files to version control
ignore lists. (cdouglas)
HADOOP-4796. Fix Chukwa test configuration, remove unused components. (Eric
Yang via cdouglas)
HADOOP-4708. Add binaries missed in the initial checkin for Chukwa. (Eric
Yang via cdouglas)
HADOOP-4805. Remove black list collector from Chukwa Agent HTTP Sender.
(Eric Yang via cdouglas)
HADOOP-4837. Move HADOOP_CONF_DIR configuration to chukwa-env.sh (Jerome
Boulon via cdouglas)
HADOOP-4825. Use ps instead of jps for querying process status in Chukwa.
(Eric Yang via cdouglas)
HADOOP-4844. Fixed javadoc for
org.apache.hadoop.fs.permission.AccessControlException to document that
it's deprecated in favour of
org.apache.hadoop.security.AccessControlException. (acmurthy)
HADOOP-4706. Close the underlying output stream in
IFileOutputStream::close. (Jothi Padmanabhan via cdouglas)
HADOOP-4855. Fixed command-specific help messages for refreshServiceAcl in
DFSAdmin and MRAdmin. (acmurthy)
HADOOP-4820. Remove unused method FSNamesystem::deleteInSafeMode. (Suresh
Srinivas via cdouglas)
HADOOP-4698. Lower io.sort.mb to 10 in the tests and raise the junit memory
limit to 512m from 256m. (Nigel Daley via cdouglas)
HADOOP-4860. Split TestFileTailingAdapters into three separate tests to
avoid contention. (Eric Yang via cdouglas)
HADOOP-3921. Fixed clover (code coverage) target to work with JDK 6.
(tomwhite via nigel)
HADOOP-4845. Modify the reduce input byte counter to record only the
compressed size and add a human-readable label. (Yongqiang He via cdouglas)
HADOOP-4458. Add a test creating symlinks in the working directory.
(Amareshwari Sriramadasu via cdouglas)
HADOOP-4879. Fix org.apache.hadoop.mapred.Counters to correctly define
Object.equals rather than depend on contentEquals api. (omalley via
acmurthy)
HADOOP-4791. Fix rpm build process for Chukwa. (Eric Yang via cdouglas)
HADOOP-4771. Correct initialization of the file count for directories
with quotas. (Ruyue Ma via shv)
HADOOP-4878. Fix eclipse plugin classpath file to point to ivy's resolved
lib directory and added the same to test-patch.sh. (Giridharan Kesavan via
acmurthy)
HADOOP-4774. Fix default values of some capacity scheduler configuration
items which would otherwise not work on a fresh checkout.
(Sreekanth Ramakrishnan via yhemanth)
HADOOP-4876. Fix capacity scheduler reclamation by updating count of
pending tasks correctly. (Sreekanth Ramakrishnan via yhemanth)
HADOOP-4849. Documentation for Service Level Authorization implemented in
HADOOP-4348. (acmurthy)
HADOOP-4827. Replace Consolidator with Aggregator macros in Chukwa (Eric
Yang via cdouglas)
HADOOP-4894. Correctly parse ps output in Chukwa jettyCollector.sh. (Ari
Rabkin via cdouglas)
HADOOP-4892. Close fds out of Chukwa ExecPlugin. (Ari Rabkin via cdouglas)
HADOOP-4889. Fix permissions in RPM packaging. (Eric Yang via cdouglas)
HADOOP-4869. Fixes the TT-JT heartbeat to have an explicit flag for
restart apart from the initialContact flag that there was earlier.
(Amareshwari Sriramadasu via ddas)
HADOOP-4716. Fixes ReduceTask.java to clear out the mapping between
hosts and MapOutputLocation upon a JT restart (Amar Kamat via ddas)
HADOOP-4880. Removes an unnecessary testcase from TestJobTrackerRestart.
(Amar Kamat via ddas)
HADOOP-4924. Fixes a race condition in TaskTracker re-init. (ddas)
HADOOP-4854. Read reclaim capacity interval from capacity scheduler
configuration. (Sreekanth Ramakrishnan via yhemanth)
HADOOP-4896. HDFS Fsck does not load HDFS configuration. (Raghu Angadi)
HADOOP-4956. Creates TaskStatus for failed tasks with an empty Counters
object instead of null. (ddas)
HADOOP-4979. Fix capacity scheduler to block cluster for failed high
RAM requirements across task types. (Vivek Ratan via yhemanth)
HADOOP-4949. Fix native compilation. (Chris Douglas via acmurthy)
HADOOP-4787. Fixes the testcase TestTrackerBlacklistAcrossJobs which was
earlier failing randomly. (Amareshwari Sriramadasu via ddas)
HADOOP-4914. Add description fields to Chukwa init.d scripts (Eric Yang via
cdouglas)
HADOOP-4884. Make tool tip date format match standard HICC format. (Eric
Yang via cdouglas)
HADOOP-4925. Make Chukwa sender properties configurable. (Ari Rabkin via
cdouglas)
HADOOP-4947. Make Chukwa command parsing more forgiving of whitespace. (Ari
Rabkin via cdouglas)
HADOOP-5026. Make chukwa/bin scripts executable in repository. (Andy
Konwinski via cdouglas)
HADOOP-4977. Fix a deadlock between the reclaimCapacity and assignTasks
in capacity scheduler. (Vivek Ratan via yhemanth)
HADOOP-4988. Fix reclaim capacity to work even when there are queues with
no capacity. (Vivek Ratan via yhemanth)
HADOOP-5065. Remove generic parameters from argument to
setIn/OutputFormatClass so that it works with SequenceIn/OutputFormat.
(cdouglas via omalley)
HADOOP-4818. Pass user config to instrumentation API. (Eric Yang via
cdouglas)
HADOOP-4993. Fix Chukwa agent configuration and startup to make it both
more modular and testable. (Ari Rabkin via cdouglas)
HADOOP-5048. Fix capacity scheduler to correctly cleanup jobs that are
killed after initialization, but before running.
(Sreekanth Ramakrishnan via yhemanth)
HADOOP-4671. Mark loop control variables shared between threads as
volatile. (cdouglas)
HADOOP-5079. HashFunction inadvertently destroys some randomness
(Jonathan Ellis via stack)
HADOOP-4999. A failure to write to FsEditsLog results in
IndexOutOfBounds exception. (Boris Shkolnik via rangadi)
HADOOP-5139. Catch IllegalArgumentException during metrics registration
in RPC. (Hairong Kuang via szetszwo)
HADOOP-5085. Copying a file to local with Crc throws an exception.
(hairong)
HADOOP-4759. Removes temporary output directory for failed and
killed tasks by launching special CLEANUP tasks for the same.
(Amareshwari Sriramadasu via ddas)
HADOOP-5211. Fix check for job completion in TestSetupAndCleanupFailure.
(enis)
HADOOP-5254. The Configuration class should be able to work with XML
parsers that do not support xmlinclude. (Steve Loughran via dhruba)
HADOOP-4692. Namenode in infinite loop for replicating/deleting corrupt
blocks. (hairong)
HADOOP-5255. Fix use of Math.abs to avoid overflow. (Jonathan Ellis via
cdouglas)
HADOOP-5269. Fixes a problem to do with tasktracker holding on to
FAILED_UNCLEAN or KILLED_UNCLEAN tasks forever. (Amareshwari Sriramadasu
via ddas)
HADOOP-5214. Fixes a ConcurrentModificationException while the Fairshare
Scheduler accesses the tasktrackers stored by the JobTracker.
(Rahul Kumar Singh via yhemanth)
HADOOP-5233. Addresses the three issues - Race condition in updating
status, NPE in TaskTracker task localization when the conf file is missing
(HADOOP-5234) and NPE in handling KillTaskAction of a cleanup task
(HADOOP-5235). (Amareshwari Sriramadasu via ddas)
HADOOP-5247. Introduces a broadcast of KillJobAction to all trackers when
a job finishes. This fixes a bunch of problems to do with NPE when a
completed job is not in memory and a tasktracker comes to the jobtracker
with a status report of a task belonging to that job. (Amar Kamat via ddas)
HADOOP-5282. Fixed job history logs for task attempts that are
failed by the JobTracker, say due to lost task trackers. (Amar
Kamat via yhemanth)
HADOOP-4963. Fixes a logging problem to do with getting the location of
map output file. (Amareshwari Sriramadasu via ddas)
HADOOP-5292. Fix NPE in KFS::getBlockLocations. (Sriram Rao via lohit)
HADOOP-5241. Fixes a bug in disk-space resource estimation. Makes
the estimation formula linear where blowUp =
Total-Output/Total-Input. (Sharad Agarwal via ddas)
HADOOP-5142. Fix MapWritable#putAll to store key/value classes.
(Doğacan Güney via enis)
HADOOP-4744. Workaround for jetty6 returning -1 when getLocalPort
is invoked on the connector. The workaround patch retries a few
times before failing. (Jothi Padmanabhan via yhemanth)
HADOOP-5280. Adds a check to prevent a task state transition from
FAILED to any of UNASSIGNED, RUNNING, COMMIT_PENDING or
SUCCEEDED. (ddas)
HADOOP-5272. Fixes a problem to do with detecting whether an
attempt is the first attempt of a Task. This affects JobTracker
restart. (Amar Kamat via ddas)
HADOOP-5306. Fixes a problem to do with logging/parsing the http port of a
lost tracker. Affects JobTracker restart. (Amar Kamat via ddas)
HADOOP-5111. Fix Job::set* methods to work with generics. (cdouglas)
HADOOP-5274. Fix gridmix2 dependency on wordcount example. (cdouglas)
HADOOP-5145. Balancer sometimes runs out of memory after running
days or weeks. (hairong)
HADOOP-5338. Fix jobtracker restart to clear task completion
events cached by tasktrackers forcing them to fetch all events
afresh, thus avoiding missed task completion events on the
tasktrackers. (Amar Kamat via yhemanth)
HADOOP-4695. Change TestGlobalFilter so that it allows a web page to be
filtered more than once for a single access. (Kan Zhang via szetszwo)
HADOOP-5298. Change TestServletFilter so that it allows a web page to be
filtered more than once for a single access. (szetszwo)
HADOOP-5432. Disable ssl during unit tests in hdfsproxy, as it is unused
and causes failures. (cdouglas)
HADOOP-5416. Correct the shell command "fs -test" forrest doc description.
(Ravi Phulari via szetszwo)
HADOOP-5327. Fixed job tracker to remove files from system directory on
ACL check failures and also check ACLs on restart.
(Amar Kamat via yhemanth)
HADOOP-5395. Change the exception message when a job is submitted to an
invalid queue. (Rahul Kumar Singh via yhemanth)
HADOOP-5276. Fixes a problem to do with updating the start time of
a task when the tracker that ran the task is lost. (Amar Kamat via
ddas)
HADOOP-5278. Fixes a problem to do with logging the finish time of
a task during recovery (after a JobTracker restart). (Amar Kamat
via ddas)
HADOOP-5490. Fixes a synchronization problem in the
EagerTaskInitializationListener class. (Jothi Padmanabhan via
ddas)
HADOOP-5493. The shuffle copier threads return the codecs back to
the pool when the shuffle completes. (Jothi Padmanabhan via ddas)
HADOOP-5505. Fix JspHelper initialization in the context of
MiniDFSCluster. (Raghu Angadi)
HADOOP-5414. Fixes IO exception while executing hadoop fs -touchz
fileName by making sure that lease renewal thread exits before dfs
client exits. (hairong)
HADOOP-5103. FileInputFormat now reuses the clusterMap network
topology object and that brings down the log messages in the
JobClient to do with NetworkTopology.add significantly. (Jothi
Padmanabhan via ddas)
HADOOP-5483. Fixes a problem in the Directory Cleanup Thread due to which
TestMiniMRWithDFS sometimes used to fail. (ddas)
HADOOP-5281. Prevent sharing incompatible ZlibCompressor instances between
GzipCodec and DefaultCodec. (cdouglas)
HADOOP-5463. Balancer throws "Not a host:port pair" unless port is
specified in fs.default.name. (Stuart White via hairong)
HADOOP-5514. Fix JobTracker metrics and add metrics for waiting, failed
tasks. (cdouglas)
HADOOP-5516. Fix NullPointerException in TaskMemoryManagerThread
that comes when monitored processes disappear when the thread is
running. (Vinod Kumar Vavilapalli via yhemanth)
HADOOP-5382. Support combiners in the new context object API. (omalley)
HADOOP-5471. Fixes a problem to do with updating the log.index file in the
case where a cleanup task is run. (Amareshwari Sriramadasu via ddas)
HADOOP-5534. Fixed a deadlock in Fair scheduler's servlet.
(Rahul Kumar Singh via yhemanth)
HADOOP-5328. Fixes a problem in the renaming of job history files during
job recovery. (Amar Kamat via ddas)
HADOOP-5417. Don't ignore InterruptedExceptions that happen when calling
into rpc. (omalley)
HADOOP-5320. Add a close() in TestMapReduceLocal. (Jothi Padmanabhan
via szetszwo)
HADOOP-5520. Fix a typo in disk quota help message. (Ravi Phulari
via szetszwo)
HADOOP-5519. Remove claims from mapred-default.xml that prime numbers
of tasks are helpful. (Owen O'Malley via szetszwo)
HADOOP-5484. TestRecoveryManager fails with FileAlreadyExistsException.
(Amar Kamat via hairong)
HADOOP-5564. Limit the JVM heap size in the java command for initializing
JAVA_PLATFORM. (Suresh Srinivas via szetszwo)
HADOOP-5565. Add API for failing/finalized jobs to the JT metrics
instrumentation. (Jerome Boulon via cdouglas)
HADOOP-5390. Remove duplicate jars from tarball, src from binary tarball
added by hdfsproxy. (Zhiyong Zhang via cdouglas)
HADOOP-5066. Building binary tarball should not build docs/javadocs, copy
src, or run jdiff. (Giridharan Kesavan via cdouglas)
HADOOP-5459. Fix undetected CRC errors where intermediate output is closed
before it has been completely consumed. (cdouglas)
HADOOP-5571. Remove widening primitive conversion in TupleWritable mask
manipulation. (Jingkei Ly via cdouglas)
HADOOP-5588. Remove an unnecessary call to listStatus(..) in
FileSystem.globStatusInternal(..). (Hairong Kuang via szetszwo)
HADOOP-5473. Solves a race condition in killing a task - the state is KILLED
if there is a user request pending to kill the task and the TT reported
the state as SUCCESS. (Amareshwari Sriramadasu via ddas)
HADOOP-5576. Fix LocalRunner to work with the new context object API in
mapreduce. (Tom White via omalley)
HADOOP-4374. Installs a shutdown hook in the Task JVM so that log.index is
updated before the JVM exits. Also makes the update to log.index atomic.
(Ravi Gummadi via ddas)
HADOOP-5577. Add a verbose flag to mapreduce.Job.waitForCompletion to get
the running job's information printed to the user's stdout as it runs.
(omalley)
HADOOP-5607. Fix NPE in TestCapacityScheduler. (cdouglas)
HADOOP-5605. All the replicas incorrectly got marked as corrupt. (hairong)
HADOOP-5337. JobTracker, upon restart, now waits for the TaskTrackers to
join back before scheduling new tasks. This fixes race conditions associated
with greedy scheduling as was the case earlier. (Amar Kamat via ddas)
HADOOP-5227. Fix distcp so -update and -delete can be meaningfully
combined. (Tsz Wo (Nicholas), SZE via cdouglas)
HADOOP-5305. Increase number of files and print debug messages in
TestCopyFiles. (szetszwo)
HADOOP-5548. Add synchronization for JobTracker methods in RecoveryManager.
(Amareshwari Sriramadasu via sharad)
HADOOP-3810. NameNode seems unstable on a cluster with little space left.
(hairong)
HADOOP-5068. Fix NPE in TestCapacityScheduler. (Vinod Kumar Vavilapalli
via szetszwo)
HADOOP-5585. Clear FileSystem statistics between tasks when jvm-reuse
is enabled. (omalley)
HADOOP-5394. JobTracker might schedule 2 attempts of the same task
with the same attempt id across restarts. (Amar Kamat via sharad)
HADOOP-5645. After HADOOP-4920 we need a place to checkin
releasenotes.html. (nigel)
Release 0.19.2 - Unreleased
BUG FIXES
HADOOP-5154. Fixes a deadlock in the fairshare scheduler.
(Matei Zaharia via yhemanth)
HADOOP-5146. Fixes a race condition that causes LocalDirAllocator to miss
files. (Devaraj Das via yhemanth)
HADOOP-4638. Fixes job recovery to not crash the job tracker for problems
with a single job file. (Amar Kamat via yhemanth)
HADOOP-5384. Fix a problem that DataNodeCluster creates blocks with
generationStamp == 1. (szetszwo)
HADOOP-5376. Fixes the code handling lost tasktrackers to set the task state
to KILLED_UNCLEAN only for relevant type of tasks.
(Amareshwari Sriramadasu via yhemanth)
HADOOP-5285. Fixes the issues - (1) obtainTaskCleanupTask checks whether job is
inited before trying to lock the JobInProgress (2) Moves the CleanupQueue class
outside the TaskTracker and makes it a generic class that is used by the
JobTracker also for deleting the paths on the job's output fs. (3) Moves the
references to completedJobStore outside the block where the JobTracker is locked.
(ddas)
HADOOP-5392. Fixes a problem to do with JT crashing during recovery when
the job files are garbled. (Amar Kamat via ddas)
HADOOP-5332. Appending to files is not allowed (by default) unless
dfs.support.append is set to true. (dhruba)
HADOOP-5333. libhdfs supports appending to files. (dhruba)
HADOOP-3998. Fix dfsclient exception when JVM is shutdown. (dhruba)
HADOOP-5440. Fixes a problem to do with removing a taskId from the list
of taskIds that the TaskTracker's TaskMemoryManager manages.
(Amareshwari Sriramadasu via ddas)
HADOOP-5446. Restore TaskTracker metrics. (cdouglas)
HADOOP-5449. Fixes the history cleaner thread.
(Amareshwari Sriramadasu via ddas)
HADOOP-5479. NameNode should not send empty block replication request to
DataNode. (hairong)
HADOOP-5259. Job with output hdfs:/user/<username>/outputpath (no
authority) fails with Wrong FS. (Doug Cutting via hairong)
HADOOP-5522. Documents the setup/cleanup tasks in the mapred tutorial.
(Amareshwari Sriramadasu via ddas)
HADOOP-5549. ReplicationMonitor should schedule both replication and
deletion work in one iteration. (hairong)
HADOOP-5554. DataNodeCluster and CreateEditsLog should create blocks with
the same generation stamp value. (hairong via szetszwo)
HADOOP-5231. Clones the TaskStatus before passing it to the JobInProgress.
(Amareshwari Sriramadasu via ddas)
HADOOP-4719. Fix documentation of 'ls' format for FsShell. (Ravi Phulari
via cdouglas)
HADOOP-5374. Fixes a NPE problem in getTasksToSave method.
(Amareshwari Sriramadasu via ddas)
HADOOP-4780. Cache the size of directories in DistributedCache, avoiding
long delays in recalculating it. (He Yongqiang via cdouglas)
HADOOP-5551. Prevent directory destruction on file create.
(Brian Bockelman via shv)
HADOOP-5671. Fix FNF exceptions when copying from old versions of
HftpFileSystem. (Tsz Wo (Nicholas), SZE via cdouglas)
HADOOP-5579. Set errno correctly in libhdfs for permission, quota, and FNF
conditions. (Brian Bockelman via cdouglas)
HADOOP-5816. Fixes a problem in the KeyFieldBasedComparator to do with
ArrayIndexOutOfBounds exception. (He Yongqiang via ddas)
HADOOP-5951. Add Apache license header to StorageInfo.java. (Suresh
Srinivas via szetszwo)
Release 0.19.1 - 2009-02-23
IMPROVEMENTS
HADOOP-4739. Fix spelling and grammar, improve phrasing of some sections in
mapred tutorial. (Vivek Ratan via cdouglas)
HADOOP-3894. DFSClient logging improvements. (Steve Loughran via shv)
HADOOP-5126. Remove empty file BlocksWithLocations.java (shv)
HADOOP-5127. Remove public methods in FSDirectory. (Jakob Homan via shv)
BUG FIXES
HADOOP-4697. Fix getBlockLocations in KosmosFileSystem to handle multiple
blocks correctly. (Sriram Rao via cdouglas)
HADOOP-4420. Add null checks for job, caused by invalid job IDs.
(Aaron Kimball via tomwhite)
HADOOP-4632. Fix TestJobHistoryVersion to use test.build.dir instead of the
current working directory for scratch space. (Amar Kamat via cdouglas)
HADOOP-4508. Fix FSDataOutputStream.getPos() for append. (dhruba via
szetszwo)
HADOOP-4727. Fix a group checking bug in fill_stat_structure(...) in
fuse-dfs. (Brian Bockelman via szetszwo)
HADOOP-4836. Correct typos in mapred related documentation. (Jordà Polo
via szetszwo)
HADOOP-4821. Usage descriptions in the Quotas guide documentation are
incorrect. (Boris Shkolnik via hairong)
HADOOP-4847. Moves the loading of OutputCommitter to the Task.
(Amareshwari Sriramadasu via ddas)
HADOOP-4966. Marks completed setup tasks for removal.
(Amareshwari Sriramadasu via ddas)
HADOOP-4982. TestFsck should run in Eclipse. (shv)
HADOOP-5008. TestReplication#testPendingReplicationRetry leaves an opened
fd unclosed. (hairong)
HADOOP-4906. Fix TaskTracker OOM by keeping a shallow copy of JobConf in
TaskTracker.TaskInProgress. (Sharad Agarwal via acmurthy)
HADOOP-4918. Fix bzip2 compression to work with Sequence Files.
(Zheng Shao via dhruba).
HADOOP-4965. TestFileAppend3 should close FileSystem. (shv)
HADOOP-4967. Fixes a race condition in the JvmManager to do with killing
tasks. (ddas)
HADOOP-5009. DataNode#shutdown sometimes leaves data block scanner
verification log unclosed. (hairong)
HADOOP-5086. Use the appropriate FileSystem for trash URIs. (cdouglas)
HADOOP-4955. Make DBOutputFormat use column names from setOutput().
(Kevin Peterson via enis)
HADOOP-4862. Minor : HADOOP-3678 did not remove all the cases of
spurious IOExceptions logged by DataNode. (Raghu Angadi)
HADOOP-5034. NameNode should send both replication and deletion requests
to DataNode in one reply to a heartbeat. (hairong)
HADOOP-5156. TestHeartbeatHandling uses MiniDFSCluster.getNamesystem()
which does not exist in branch 0.19 and 0.20. (hairong)
HADOOP-5161. Accepted sockets do not get placed in
DataXceiverServer#childSockets. (hairong)
HADOOP-5193. Correct calculation of edits modification time. (shv)
HADOOP-4494. Allow libhdfs to append to files.
(Pete Wyckoff via dhruba)
HADOOP-5166. Fix JobTracker restart to work when ACLs are configured
for the JobTracker. (Amar Kamat via yhemanth).
HADOOP-5067. Fixes TaskInProgress.java to keep track of count of failed and
killed tasks correctly. (Amareshwari Sriramadasu via ddas)
HADOOP-4760. HDFS streams should not throw exceptions when closed twice.
(enis)
Release 0.19.0 - 2008-11-18
INCOMPATIBLE CHANGES
HADOOP-3595. Remove deprecated methods for mapred.combine.once
functionality, which was necessary for providing backwards
compatible combiner semantics for 0.18. (cdouglas via omalley)
HADOOP-3667. Remove the following deprecated methods from JobConf:
addInputPath(Path)
getInputPaths()
getMapOutputCompressionType()
getOutputPath()
getSystemDir()
setInputPath(Path)
setMapOutputCompressionType(CompressionType style)
setOutputPath(Path)
(Amareshwari Sriramadasu via omalley)
HADOOP-3652. Remove deprecated class OutputFormatBase.
(Amareshwari Sriramadasu via cdouglas)
HADOOP-2885. Break the hadoop.dfs package into separate packages under
hadoop.hdfs that reflect whether they are client, server, protocol,
etc. DistributedFileSystem and DFSClient have moved and are now
considered package private. (Sanjay Radia via omalley)
HADOOP-2325. Require Java 6. (cutting)
HADOOP-372. Add support for multiple input paths with a different
InputFormat and Mapper for each path. (Chris Smith via tomwhite)
HADOOP-1700. Support appending to file in HDFS. (dhruba)
HADOOP-3792. Make FsShell -test consistent with unix semantics, returning
zero for true and non-zero for false. (Ben Slusky via cdouglas)
HADOOP-3664. Remove the deprecated method InputFormat.validateInput,
which is no longer needed. (tomwhite via omalley)
HADOOP-3549. Give more meaningful errno's in libhdfs. In particular,
EACCES is returned for permission problems. (Ben Slusky via omalley)
HADOOP-4036. ResourceStatus was added to TaskTrackerStatus by HADOOP-3759,
so increment the InterTrackerProtocol version. (Hemanth Yamijala via
omalley)
HADOOP-3150. Moves task promotion to tasks. Defines a new interface for
committing output files. Moves job setup to jobclient, and moves jobcleanup
to a separate task. (Amareshwari Sriramadasu via ddas)
HADOOP-3446. Keep map outputs in memory during the reduce. Remove
fs.inmemory.size.mb and replace with properties defining in memory map
output retention during the shuffle and reduce relative to maximum heap
usage. (cdouglas)
HADOOP-3245. Adds the feature for supporting JobTracker restart. Running
jobs can be recovered from the history file. The history file format has
been modified to support recovery. The task attempt ID now has the
JobTracker start time to distinguish attempts of the same TIP across
restarts. (Amar Ramesh Kamat via ddas)
HADOOP-4007. REMOVE DFSFileInfo - FileStatus is sufficient.
(Sanjay Radia via hairong)
HADOOP-3722. Fixed Hadoop Streaming and Hadoop Pipes to use the Tool
interface and GenericOptionsParser. (Enis Soztutar via acmurthy)
HADOOP-2816. Cluster summary at name node web reports the space
utilization as:
Configured Capacity: capacity of all the data directories - Reserved space
Present Capacity: Space available for dfs, i.e. remaining+used space
DFS Used%: DFS used space/Present Capacity
(Suresh Srinivas via hairong)
HADOOP-3938. Disk space quotas for HDFS. This is similar to namespace
quotas in 0.18. (rangadi)
HADOOP-4293. Make Configuration Writable and remove unreleased
WritableJobConf. Configuration.write is renamed to writeXml. (omalley)
HADOOP-4281. Change dfsadmin to report available disk space in a format
consistent with the web interface as defined in HADOOP-2816. (Suresh
Srinivas via cdouglas)
HADOOP-4430. Further change the cluster summary at name node web that was
changed in HADOOP-2816:
Non DFS Used - This indicates the disk space taken by non DFS file from
the Configured capacity
DFS Used % - DFS Used % of Configured Capacity
DFS Remaining % - Remaining % of Configured Capacity available for DFS use
DFS command line report reflects the same change. Config parameter
dfs.datanode.du.pct is no longer used and is removed from the
hadoop-default.xml. (Suresh Srinivas via hairong)
HADOOP-4116. Balancer should provide better resource management. (hairong)
HADOOP-4599. BlocksMap and BlockInfo made package private. (shv)
NEW FEATURES
HADOOP-3341. Allow streaming jobs to specify the field separator for map
and reduce input and output. The new configuration values are:
stream.map.input.field.separator
stream.map.output.field.separator
stream.reduce.input.field.separator
stream.reduce.output.field.separator
All of them default to "\t". (Zheng Shao via omalley)
HADOOP-3479. Defines the configuration file for the resource manager in
Hadoop. You can configure various parameters related to scheduling, such
as queues and queue properties here. The properties for a queue follow a
naming convention, such as hadoop.rm.queue.queue-name.property-name.
(Hemanth Yamijala via ddas)
HADOOP-3149. Adds a way in which map/reduce tasks can create multiple
outputs. (Alejandro Abdelnur via ddas)
HADOOP-3714. Add a new contrib, bash-tab-completion, which enables
bash tab completion for the bin/hadoop script. See the README file
in the contrib directory for the installation. (Chris Smith via enis)
HADOOP-3730. Adds a new JobConf constructor that disables loading
default configurations. (Alejandro Abdelnur via ddas)
HADOOP-3772. Add a new Hadoop Instrumentation api for the JobTracker and
the TaskTracker, refactor Hadoop Metrics as an implementation of the api.
(Ari Rabkin via acmurthy)
HADOOP-2302. Provides a comparator for numerical sorting of key fields.
(ddas)
HADOOP-153. Provides a way to skip bad records. (Sharad Agarwal via ddas)
HADOOP-657. Free disk space should be modelled and used by the scheduler
to make scheduling decisions. (Ari Rabkin via omalley)
HADOOP-3719. Initial checkin of Chukwa, which is a data collection and
analysis framework. (Jerome Boulon, Andy Konwinski, Ari Rabkin,
and Eric Yang)
HADOOP-3873. Add -filelimit and -sizelimit options to distcp to cap the
number of files/bytes copied in a particular run to support incremental
updates and mirroring. (TszWo (Nicholas), SZE via cdouglas)
HADOOP-3585. FailMon package for hardware failure monitoring and
analysis of anomalies. (Ioannis Koltsidas via dhruba)
HADOOP-1480. Add counters to the C++ Pipes API. (acmurthy via omalley)
HADOOP-3854. Add support for pluggable servlet filters in the HttpServers.
(Tsz Wo (Nicholas) Sze via omalley)
HADOOP-3759. Provides ability to run memory intensive jobs without
affecting other running tasks on the nodes. (Hemanth Yamijala via ddas)
HADOOP-3746. Add a fair share scheduler. (Matei Zaharia via omalley)
HADOOP-3754. Add a thrift interface to access HDFS. (dhruba via omalley)
HADOOP-3828. Provides a way to write skipped records to DFS.
(Sharad Agarwal via ddas)
HADOOP-3948. Separate name-node edits and fsimage directories.
(Lohit Vijayarenu via shv)
HADOOP-3939. Add an option to DistCp to delete files at the destination
not present at the source. (Tsz Wo (Nicholas) Sze via cdouglas)
HADOOP-3601. Add a new contrib module for Hive, which is a sql-like
query processing tool that uses map/reduce. (Ashish Thusoo via omalley)
HADOOP-3866. Added sort and multi-job updates in the JobTracker web ui.
(Craig Weisenfluh via omalley)
HADOOP-3698. Add access control to control who is allowed to submit or
modify jobs in the JobTracker. (Hemanth Yamijala via omalley)
HADOOP-1869. Support access times for HDFS files. (dhruba)
HADOOP-3941. Extend FileSystem API to return file-checksums.
(szetszwo)
HADOOP-3581. Prevents memory intensive user tasks from taking down
nodes. (Vinod K V via ddas)
HADOOP-3970. Provides a way to recover counters written to JobHistory.
(Amar Kamat via ddas)
HADOOP-3702. Adds ChainMapper and ChainReducer classes that allow composing
chains of Maps and Reduces in a single Map/Reduce job, something like
MAP+ / REDUCE MAP*. (Alejandro Abdelnur via ddas)
HADOOP-3445. Add capacity scheduler that provides guaranteed capacities to
queues as a percentage of the cluster. (Vivek Ratan via omalley)
HADOOP-3992. Add a synthetic load generation facility to the test
directory. (hairong via szetszwo)
HADOOP-3981. Implement a distributed file checksum algorithm in HDFS
and change DistCp to use file checksum for comparing src and dst files
(szetszwo)
HADOOP-3829. Narrow down skipped records based on user acceptable value.
(Sharad Agarwal via ddas)
HADOOP-3930. Add common interfaces for the pluggable schedulers and the
cli & gui clients. (Sreekanth Ramakrishnan via omalley)
HADOOP-4176. Implement getFileChecksum(Path) in HftpFileSystem. (szetszwo)
HADOOP-249. Reuse JVMs across Map-Reduce Tasks.
Configuration changes to hadoop-default.xml:
add mapred.job.reuse.jvm.num.tasks
(Devaraj Das via acmurthy)
HADOOP-4070. Provide a mechanism in Hive for registering UDFs from the
query language. (tomwhite)
HADOOP-2536. Implement JDBC based database input and output formats to
allow Map-Reduce applications to work with databases. (Fredrik Hedberg and
Enis Soztutar via acmurthy)
HADOOP-3019. A new library to support total order partitions.
(cdouglas via omalley)
HADOOP-3924. Added a 'KILLED' job status. (Subramaniam Krishnan via
acmurthy)
IMPROVEMENTS
HADOOP-4205. hive: metastore and ql to use the refactored SerDe library.
(zshao)
HADOOP-4106. libhdfs: add time, permission and user attribute support
(part 2). (Pete Wyckoff through zshao)
HADOOP-4104. libhdfs: add time, permission and user attribute support.
(Pete Wyckoff through zshao)
HADOOP-3908. libhdfs: better error message if libhdfs.so doesn't exist.
(Pete Wyckoff through zshao)
HADOOP-3732. Delay initialization of datanode block verification till
the verification thread is started. (rangadi)
HADOOP-1627. Various small improvements to 'dfsadmin -report' output.
(rangadi)
HADOOP-3577. Tools to inject blocks into name node and simulated
data nodes for testing. (Sanjay Radia via hairong)
HADOOP-2664. Add a lzop compatible codec, so that files compressed by lzop
may be processed by map/reduce. (cdouglas via omalley)
HADOOP-3655. Add additional ant properties to control junit. (Steve
Loughran via omalley)
HADOOP-3543. Update the copyright year to 2008. (cdouglas via omalley)
HADOOP-3587. Add a unit test for the contrib/data_join framework.
(cdouglas)
HADOOP-3402. Add terasort example program (omalley)
HADOOP-3660. Add replication factor for injecting blocks in simulated
datanodes. (Sanjay Radia via cdouglas)
HADOOP-3684. Add a cloning function to the contrib/data_join framework
permitting users to define a more efficient method for cloning values from
the reduce than serialization/deserialization. (Runping Qi via cdouglas)
HADOOP-3478. Improves the handling of map output fetching. Now the
randomization is by the hosts (and not the map outputs themselves).
(Jothi Padmanabhan via ddas)
HADOOP-3617. Removed redundant checks of accounting space in MapTask and
makes the spill thread persistent so as to avoid creating a new one for
each spill. (Chris Douglas via acmurthy)
HADOOP-3412. Factor the scheduler out of the JobTracker and make
it pluggable. (Tom White and Brice Arnould via omalley)
HADOOP-3756. Minor. Remove unused dfs.client.buffer.dir from
hadoop-default.xml. (rangadi)
HADOOP-3747. Adds counter support for MultipleOutputs.
(Alejandro Abdelnur via ddas)
HADOOP-3169. LeaseChecker daemon should not be started in DFSClient
constructor. (TszWo (Nicholas), SZE via hairong)
HADOOP-3824. Move base functionality of StatusHttpServer to a core
package. (TszWo (Nicholas), SZE via cdouglas)
HADOOP-3646. Add a bzip2 compatible codec, so bzip compressed data
may be processed by map/reduce. (Abdul Qadeer via cdouglas)
HADOOP-3861. MapFile.Reader and Writer should implement Closeable.
(tomwhite via omalley)
HADOOP-3791. Introduce generics into ReflectionUtils. (Chris Smith via
cdouglas)
HADOOP-3694. Improve unit test performance by changing
MiniDFSCluster to listen only on 127.0.0.1. (cutting)
HADOOP-3620. Namenode should synchronously resolve a datanode's network
location when the datanode registers. (hairong)
HADOOP-3860. NNThroughputBenchmark is extended with rename and delete
benchmarks. (shv)
HADOOP-3892. Include unix group name in JobConf. (Matei Zaharia via johan)
HADOOP-3875. Change the time period between heartbeats to be relative to
the end of the heartbeat rpc, rather than the start. This causes better
behavior if the JobTracker is overloaded. (acmurthy via omalley)
HADOOP-3853. Move multiple input format (HADOOP-372) extension to
library package. (tomwhite via johan)
HADOOP-9. Use roulette scheduling for temporary space when the size
is not known. (Ari Rabkin via omalley)
HADOOP-3202. Use recursive delete rather than FileUtil.fullyDelete.
(Amareshwari Sriramadasu via omalley)
HADOOP-3368. Remove common-logging.properties from conf. (Steve Loughran
via omalley)
HADOOP-3851. Fix spelling mistake in FSNamesystemMetrics. (Steve Loughran
via omalley)
HADOOP-3780. Remove asynchronous resolution of network topology in the
JobTracker (Amar Kamat via omalley)
HADOOP-3852. Add ShellCommandExecutor.toString method to make nicer
error messages. (Steve Loughran via omalley)
HADOOP-3844. Include message of local exception in RPC client failures.
(Steve Loughran via omalley)
HADOOP-3935. Split out inner classes from DataNode.java. (johan)
HADOOP-3905. Create generic interfaces for edit log streams. (shv)
HADOOP-3062. Add metrics to DataNode and TaskTracker to record network
traffic for HDFS reads/writes and MR shuffling. (cdouglas)
HADOOP-3742. Remove HDFS from public java doc and add javadoc-dev for
generative javadoc for developers. (Sanjay Radia via omalley)
HADOOP-3944. Improve documentation for public TupleWritable class in
join package. (Chris Douglas via enis)
HADOOP-2330. Preallocate HDFS transaction log to improve performance.
(dhruba and hairong)
HADOOP-3965. Convert DataBlockScanner into a package private class. (shv)
HADOOP-3488. Prevent hadoop-daemon from rsync'ing log files (Stefan
Groshupf and Craig Macdonald via omalley)
HADOOP-3342. Change the kill task actions to require http post instead of
get to prevent accidental crawls from triggering it. (enis via omalley)
HADOOP-3937. Limit the job name in the job history filename to 50
characters. (Matei Zaharia via omalley)
HADOOP-3943. Remove unnecessary synchronization in
NetworkTopology.pseudoSortByDistance. (hairong via omalley)
HADOOP-3498. File globbing alternation should be able to span path
components. (tomwhite)
HADOOP-3361. Implement renames for NativeS3FileSystem.
(Albert Chern via tomwhite)
HADOOP-3605. Make EC2 scripts show an error message if AWS_ACCOUNT_ID is
unset. (Al Hoang via tomwhite)
HADOOP-4147. Remove unused class JobWithTaskContext from class
JobInProgress. (Amareshwari Sriramadasu via johan)
HADOOP-4151. Add a byte-comparable interface that both Text and
BytesWritable implement. (cdouglas via omalley)
HADOOP-4174. Move fs image/edit log methods from ClientProtocol to
NamenodeProtocol. (shv via szetszwo)
HADOOP-4181. Include a .gitignore and saveVersion.sh change to support
developing under git. (omalley)
HADOOP-4186. Factor LineReader out of LineRecordReader. (tomwhite via
omalley)
HADOOP-4184. Break the module dependencies between core, hdfs, and
mapred. (tomwhite via omalley)
HADOOP-4075. test-patch.sh now spits out ant commands that it runs.
(Ramya R via nigel)
HADOOP-4117. Improve configurability of Hadoop EC2 instances.
(tomwhite)
HADOOP-2411. Add support for larger CPU EC2 instance types.
(Chris K Wensel via tomwhite)
HADOOP-4083. Changed the configuration attribute queue.name to
mapred.job.queue.name. (Hemanth Yamijala via acmurthy)
HADOOP-4194. Added the JobConf and JobID to job-related methods in
JobTrackerInstrumentation for better metrics. (Mac Yang via acmurthy)
HADOOP-3975. Change test-patch script to report the working dir
modifications preventing the suite from being run. (Ramya R via cdouglas)
HADOOP-4124. Added a command-line switch to allow users to set job
priorities, and allowed them to be manipulated via the web-ui. (Hemanth
Yamijala via acmurthy)
HADOOP-2165. Augmented JobHistory to include the URIs to the tasks'
userlogs. (Vinod Kumar Vavilapalli via acmurthy)
HADOOP-4062. Remove the synchronization on the output stream when a
connection is closed and also remove an undesirable exception when
a client is stopped while there is no pending RPC request. (hairong)
HADOOP-4227. Remove the deprecated class org.apache.hadoop.fs.ShellCommand.
(szetszwo)
HADOOP-4006. Clean up FSConstants and move some of the constants to
better places. (Sanjay Radia via rangadi)
HADOOP-4279. Trace the seeds of random sequences in append unit tests to
make intermittent failures reproducible. (szetszwo via cdouglas)
HADOOP-4209. Remove the change to the format of task attempt id by
incrementing the task attempt numbers by 1000 when the job restarts.
(Amar Kamat via omalley)
HADOOP-4301. Adds forrest doc for the skip bad records feature.
(Sharad Agarwal via ddas)
HADOOP-4354. Separate TestDatanodeDeath.testDatanodeDeath() into 4 tests.
(szetszwo)
HADOOP-3790. Add more unit tests for testing HDFS file append. (szetszwo)
HADOOP-4321. Include documentation for the capacity scheduler. (Hemanth
Yamijala via omalley)
HADOOP-4424. Change menu layout for Hadoop documentation (Boris Shkolnik
via cdouglas).
HADOOP-4438. Update forrest documentation to include missing FsShell
commands. (Suresh Srinivas via cdouglas)
HADOOP-4105. Add forrest documentation for libhdfs.
(Pete Wyckoff via cutting)
HADOOP-4510. Make getTaskOutputPath public. (Chris Wensel via omalley)
OPTIMIZATIONS
HADOOP-3556. Removed lock contention in MD5Hash by replacing the
singleton MessageDigester with an instance per Thread using
ThreadLocal. (Iván de Prado via omalley)
HADOOP-3328. When client is writing data to DFS, only the last
datanode in the pipeline needs to verify the checksum. Saves around
30% CPU on intermediate datanodes. (rangadi)
HADOOP-3863. Use a thread-local string encoder rather than a static one
that is protected by a lock. (acmurthy via omalley)
HADOOP-3864. Prevent the JobTracker from locking up when a job is being
initialized. (acmurthy via omalley)
HADOOP-3816. Faster directory listing in KFS. (Sriram Rao via omalley)
HADOOP-2130. Pipes submit job should have both blocking and non-blocking
versions. (acmurthy via omalley)
HADOOP-3769. Make the SampleMapper and SampleReducer from
GenericMRLoadGenerator public, so they can be used in other contexts.
(Lingyun Yang via omalley)
HADOOP-3514. Inline the CRCs in intermediate files as opposed to reading
them from a different .crc file. (Jothi Padmanabhan via ddas)
HADOOP-3638. Caches the iFile index files in memory to reduce seeks
(Jothi Padmanabhan via ddas)
HADOOP-4225. FSEditLog.logOpenFile() should persist accessTime
rather than modificationTime. (shv)
HADOOP-4380. Made several new classes (Child, JVMId,
JobTrackerInstrumentation, QueueManager, ResourceEstimator,
TaskTrackerInstrumentation, and TaskTrackerMetricsInst) in
org.apache.hadoop.mapred package private instead of public. (omalley)
BUG FIXES
HADOOP-3563. Refactor the distributed upgrade code so that it is
easier to identify datanode and namenode related code. (dhruba)
HADOOP-3640. Fix the read method in the NativeS3InputStream. (tomwhite via
omalley)
HADOOP-3711. Fixes the Streaming input parsing to properly find the
separator. (Amareshwari Sriramadasu via ddas)
HADOOP-3725. Prevent TestMiniMRMapDebugScript from swallowing exceptions.
(Steve Loughran via cdouglas)
HADOOP-3726. Throw exceptions from TestCLI setup and teardown instead of
swallowing them. (Steve Loughran via cdouglas)
HADOOP-3721. Refactor CompositeRecordReader and related mapred.join classes
to make them clearer. (cdouglas)
HADOOP-3720. Re-read the config file when dfsadmin -refreshNodes is invoked
so dfs.hosts and dfs.hosts.exclude are observed. (lohit vijayarenu via
cdouglas)
HADOOP-3485. Allow writing to files over fuse.
(Pete Wyckoff via dhruba)
HADOOP-3723. The flags to the libhdfs.create call can be treated as
a bitmask. (Pete Wyckoff via dhruba)
HADOOP-3643. Filter out completed tasks when asking for running tasks in
the JobTracker web/ui. (Amar Kamat via omalley)
HADOOP-3777. Ensure that Lzo compressors/decompressors correctly handle the
case where native libraries aren't available. (Chris Douglas via acmurthy)
HADOOP-3728. Fix SleepJob so that it doesn't depend on temporary files,
this ensures we can now run more than one instance of SleepJob
simultaneously. (Chris Douglas via acmurthy)
HADOOP-3795. Fix saving image files on Namenode with different checkpoint
stamps. (Lohit Vijayarenu via mahadev)
HADOOP-3624. Improving createeditslog to create tree directory structure.
(Lohit Vijayarenu via mahadev)
HADOOP-3778. DFSInputStream.seek() did not retry in case of some errors.
(LN via rangadi)
HADOOP-3661. The handling of moving files deleted through fuse-dfs to
Trash is made similar to the behaviour of the dfs shell.
(Pete Wyckoff via dhruba)
HADOOP-3819. Unset LANG and LC_CTYPE in saveVersion.sh to make it
compatible with non-English locales. (Rong-En Fan via cdouglas)
HADOOP-3848. Cache calls to getSystemDir in the TaskTracker instead of
calling it for each task start. (acmurthy via omalley)
HADOOP-3131. Fix reduce progress reporting for compressed intermediate
data. (Matei Zaharia via acmurthy)
HADOOP-3796. fuse-dfs configuration is implemented as file system
mount options. (Pete Wyckoff via dhruba)
HADOOP-3836. Fix TestMultipleOutputs to correctly clean up. (Alejandro
Abdelnur via acmurthy)
HADOOP-3805. Improve fuse-dfs write performance.
(Pete Wyckoff via zshao)
HADOOP-3846. Fix unit test CreateEditsLog to generate paths correctly.
(Lohit Vijayarenu via cdouglas)
HADOOP-3904. Fix unit tests using the old dfs package name.
(TszWo (Nicholas), SZE via johan)
HADOOP-3319. Fix some HOD error messages to go to stderr instead of
stdout. (Vinod Kumar Vavilapalli via omalley)
HADOOP-3907. Move INodeDirectoryWithQuota to its own .java file.
(Tsz Wo (Nicholas), SZE via hairong)
HADOOP-3919. Fix attribute name in hadoop-default for
mapred.jobtracker.instrumentation. (Ari Rabkin via omalley)
HADOOP-3903. Change the package name for the servlets to be hdfs instead of
dfs. (Tsz Wo (Nicholas) Sze via omalley)
HADOOP-3773. Change Pipes to set the default map output key and value
types correctly. (Koji Noguchi via omalley)
HADOOP-3952. Fix compilation error in TestDataJoin referencing dfs package.
(omalley)
HADOOP-3951. Fix package name for FSNamesystem logs and modify other
hard-coded Logs to use the class name. (cdouglas)
HADOOP-3889. Improve error reporting from HftpFileSystem, handling in
DistCp. (Tsz Wo (Nicholas), SZE via cdouglas)
HADOOP-3946. Fix TestMapRed after hadoop-3664. (tomwhite via omalley)
HADOOP-3949. Remove duplicate jars from Chukwa. (Jerome Boulon via omalley)
HADOOP-3933. DataNode sometimes sends up to io.bytes.per.checksum bytes
more than required to client. (Ning Li via rangadi)
HADOOP-3962. Shell command "fs -count" should support paths with different
file systems. (Tsz Wo (Nicholas), SZE via mahadev)
HADOOP-3957. Fix javac warnings in DistCp and TestCopyFiles. (Tsz Wo
(Nicholas), SZE via cdouglas)
HADOOP-3958. Fix TestMapRed to check the success of test-job. (omalley via
acmurthy)
HADOOP-3985. Fix TestHDFSServerPorts to use random ports. (Hairong Kuang
via omalley)
HADOOP-3964. Fix javadoc warnings introduced by FailMon. (dhruba)
HADOOP-3785. Fix FileSystem cache to be case-insensitive for scheme and
authority. (Bill de hOra via cdouglas)
HADOOP-3506. Fix a rare NPE caused by error handling in S3. (Tom White via
cdouglas)
HADOOP-3705. Fix mapred.join parser to accept InputFormats named with
underscore and static, inner classes. (cdouglas)
HADOOP-4023. Fix javadoc warnings introduced when the HDFS javadoc was
made private. (omalley)
HADOOP-4030. Remove lzop from the default list of codecs. (Arun Murthy via
cdouglas)
HADOOP-3961. Fix task disk space requirement estimates for virtual
input jobs. Delays limiting task placement until after 10% of the maps
have finished. (Ari Rabkin via omalley)
HADOOP-2168. Fix problem with C++ record reader's progress not being
reported to framework. (acmurthy via omalley)
HADOOP-3966. Copy findbugs generated output files to PATCH_DIR while
running test-patch. (Ramya R via lohit)
HADOOP-4037. Fix the eclipse plugin for versions of kfs and log4j. (nigel
via omalley)
HADOOP-3950. Cause the Mini MR cluster to wait for task trackers to
register before continuing. (enis via omalley)
HADOOP-3910. Remove unused ClusterTestDFSNamespaceLogging and
ClusterTestDFS. (Tsz Wo (Nicholas), SZE via cdouglas)
HADOOP-3954. Disable record skipping by default. (Sharad Agarwal via
cdouglas)
HADOOP-4050. Fix TestFairScheduler to use absolute paths for the work
directory. (Matei Zaharia via omalley)
HADOOP-4069. Keep temporary test files from TestKosmosFileSystem under
test.build.data instead of /tmp. (lohit via omalley)
HADOOP-4078. Create test files for TestKosmosFileSystem in separate
directory under test.build.data. (lohit)
HADOOP-3968. Fix getFileBlockLocations calls to use FileStatus instead
of Path reflecting the new API. (Pete Wyckoff via lohit)
HADOOP-3963. libhdfs does not exit on its own, instead it returns error
to the caller and behaves as a true library. (Pete Wyckoff via dhruba)
HADOOP-4100. Removes the cleanupTask scheduling from the Scheduler
implementations and moves it to the JobTracker.
(Amareshwari Sriramadasu via ddas)
HADOOP-4097. Make hive work well with speculative execution turned on.
(Joydeep Sen Sarma via dhruba)
HADOOP-4113. Changes to libhdfs to not exit on its own, rather return
an error code to the caller. (Pete Wyckoff via dhruba)
HADOOP-4054. Remove duplicate lease removal during edit log loading.
(hairong)
HADOOP-4071. FSNameSystem.isReplicationInProgress should add an
underReplicated block to the neededReplication queue using method
"add" not "update". (hairong)
HADOOP-4154. Fix type warnings in WritableUtils. (szetszwo via omalley)
HADOOP-4133. Log files generated by Hive should reside in the
build directory. (Prasad Chakka via dhruba)
HADOOP-4094. Hive now has hive-default.xml and hive-site.xml similar
to core hadoop. (Prasad Chakka via dhruba)
HADOOP-4112. Handles cleanupTask in JobHistory
(Amareshwari Sriramadasu via ddas)
HADOOP-3831. Very slow reading clients sometimes failed while reading.
(rangadi)
HADOOP-4155. Use JobTracker's start time while initializing JobHistory's
JobTracker Unique String. (lohit)
HADOOP-4099. Fix null pointer when using HFTP from an 0.18 server.
(dhruba via omalley)
HADOOP-3570. Includes user specified libjar files in the client side
classpath. (Sharad Agarwal via ddas)
HADOOP-4129. Changed memory limits of TaskTracker and Tasks to be in
KiloBytes rather than bytes. (Vinod Kumar Vavilapalli via acmurthy)
HADOOP-4139. Optimize Hive multi group-by.
(Namit Jain via dhruba)
HADOOP-3911. Add a check to fsck options to make sure -files is not
the first option to resolve conflicts with GenericOptionsParser
(lohit)
HADOOP-3623. Refactor LeaseManager. (szetszwo)
HADOOP-4125. Handles Reduce cleanup tip on the web ui.
(Amareshwari Sriramadasu via ddas)
HADOOP-4087. Hive Metastore API for php and python clients.
(Prasad Chakka via dhruba)
HADOOP-4197. Update DATA_TRANSFER_VERSION for HADOOP-3981. (szetszwo)
HADOOP-4138. Refactor the Hive SerDe library to better structure
the interfaces to the serializer and de-serializer.
(Zheng Shao via dhruba)
HADOOP-4195. Close compressor before returning to codec pool.
(acmurthy via omalley)
HADOOP-2403. Escapes some special characters before logging to
history files. (Amareshwari Sriramadasu via ddas)
HADOOP-4200. Fix a bug in the test-patch.sh script.
(Ramya R via nigel)
HADOOP-4084. Add explain plan capabilities to Hive Query Language.
(Ashish Thusoo via dhruba)
HADOOP-4121. Preserve cause for exception if the initialization of
HistoryViewer for JobHistory fails. (Amareshwari Sri Ramadasu via
acmurthy)
HADOOP-4213. Fixes NPE in TestLimitTasksPerJobTaskScheduler.
(Sreekanth Ramakrishnan via ddas)
HADOOP-4077. Setting access and modification time for a file
requires write permissions on the file. (dhruba)
HADOOP-3592. Fix a couple of possible file leaks in FileUtil
(Bill de hOra via rangadi)
HADOOP-4120. Hive interactive shell records the time taken by a
query. (Raghotham Murthy via dhruba)
HADOOP-4090. The hive scripts pick up hadoop from HADOOP_HOME
and then the path. (Raghotham Murthy via dhruba)
HADOOP-4242. Remove extra ";" in FSDirectory that blocks compilation
in some IDE's. (szetszwo via omalley)
HADOOP-4249. Fix eclipse path to include the hsqldb.jar. (szetszwo via
omalley)
HADOOP-4247. Move InputSampler into org.apache.hadoop.mapred.lib, so that
examples.jar doesn't depend on tools.jar. (omalley)
HADOOP-4269. Fix the deprecation of LineReader by extending the new class
into the old name and deprecating it. Also update the tests to test the
new class. (cdouglas via omalley)
HADOOP-4280. Fix conversions between seconds in C and milliseconds in
Java for access times for files. (Pete Wyckoff via rangadi)
HADOOP-4254. -setSpaceQuota command does not convert "TB" extension to
terabytes properly. Implementation now uses StringUtils for parsing this.
(Raghu Angadi)
HADOOP-4259. Findbugs should run over tools.jar also. (cdouglas via
omalley)
HADOOP-4275. Move public method isJobValidName from JobID to a private
method in JobTracker. (omalley)
HADOOP-4173. Fix failures in TestProcfsBasedProcessTree and
TestTaskTrackerMemoryManager tests. ProcfsBasedProcessTree and
memory management in TaskTracker are disabled on Windows.
(Vinod K V via rangadi)
HADOOP-4189. Fixes the history blocksize & intertracker protocol version
issues introduced as part of HADOOP-3245. (Amar Kamat via ddas)
HADOOP-4190. Fixes the backward compatibility issue with Job History
introduced by HADOOP-3245 and HADOOP-2403. (Amar Kamat via ddas)
HADOOP-4237. Fixes the TestStreamingBadRecords.testNarrowDown testcase.
(Sharad Agarwal via ddas)
HADOOP-4274. Capacity scheduler accidentally modifies the underlying
data structures when browsing the job lists. (Hemanth Yamijala via omalley)
HADOOP-4309. Fix eclipse-plugin compilation. (cdouglas)
HADOOP-4232. Fix race condition in JVM reuse when multiple slots become
free. (ddas via acmurthy)
HADOOP-4302. Fix a race condition in TestReduceFetch that can yield false
negatives. (cdouglas)
HADOOP-3942. Update distcp documentation to include features introduced in
HADOOP-3873, HADOOP-3939. (Tsz Wo (Nicholas), SZE via cdouglas)
HADOOP-4319. fuse-dfs dfs_read function returns as many bytes as it is
told to read unless end-of-file is reached. (Pete Wyckoff via dhruba)
HADOOP-4246. Ensure we have the correct lower bound on the number of
retries for fetching map-outputs; also fixed the case, for small jobs,
where the reducer automatically kills itself when too many unique
map-outputs could not be fetched. (Amareshwari Sri Ramadasu via acmurthy)
HADOOP-4163. Report FSErrors from map output fetch threads instead of
merely logging them. (Sharad Agarwal via cdouglas)
HADOOP-4261. Adds a setup task for jobs. This is required so that we
don't setup jobs that haven't been inited yet (since init could lead
to job failure). Only after the init has successfully happened do we
launch the setupJob task. (Amareshwari Sriramadasu via ddas)
HADOOP-4256. Removes Completed and Failed Job tables from
jobqueue_details.jsp. (Sreekanth Ramakrishnan via ddas)
HADOOP-4267. Occasional exceptions while shutting down HSQLDB are logged
but not rethrown. (enis)
HADOOP-4018. The number of tasks for a single job cannot exceed a
pre-configured maximum value. (dhruba)
HADOOP-4288. Fixes a NPE problem in CapacityScheduler.
(Amar Kamat via ddas)
HADOOP-4014. Create hard links with 'fsutil hardlink' on Windows. (shv)
HADOOP-4393. Merged org.apache.hadoop.fs.permission.AccessControlException
and org.apache.hadoop.security.AccessControlIOException into a single
class hadoop.security.AccessControlException. (omalley via acmurthy)
HADOOP-4287. Fixes an issue to do with maintaining counts of running/pending
maps/reduces. (Sreekanth Ramakrishnan via ddas)
HADOOP-4361. Makes sure that jobs killed from command line are killed
fast (i.e., there is a slot to run the cleanup task soon).
(Amareshwari Sriramadasu via ddas)
HADOOP-4400. Add "hdfs://" to fs.default.name on quickstart.html.
(Jeff Hammerbacher via omalley)
HADOOP-4378. Fix TestJobQueueInformation to use SleepJob rather than
WordCount via TestMiniMRWithDFS. (Sreekanth Ramakrishnan via acmurthy)
HADOOP-4376. Fix formatting in hadoop-default.xml for
hadoop.http.filter.initializers. (Enis Soztutar via acmurthy)
HADOOP-4410. Adds an extra arg to the API FileUtil.makeShellPath to
determine whether to canonicalize file paths or not.
(Amareshwari Sriramadasu via ddas)
HADOOP-4236. Ensure un-initialized jobs are killed correctly on
user-demand. (Sharad Agarwal via acmurthy)
HADOOP-4373. Fix calculation of Guaranteed Capacity for the
capacity-scheduler. (Hemanth Yamijala via acmurthy)
HADOOP-4053. Schedulers must be notified when jobs complete. (Amar Kamat via omalley)
HADOOP-4335. Fix FsShell -ls for filesystems without owners/groups. (David
Phillips via cdouglas)
HADOOP-4426. TestCapacityScheduler broke due to the two commits HADOOP-4053
and HADOOP-4373. This patch fixes that. (Hemanth Yamijala via ddas)
HADOOP-4418. Updates documentation in forrest for Mapred, streaming and pipes.
(Amareshwari Sriramadasu via ddas)
HADOOP-3155. Ensure that there is only one thread fetching
TaskCompletionEvents on TaskTracker re-init. (Dhruba Borthakur via
acmurthy)
HADOOP-4425. Fix EditLogInputStream to overload the bulk read method.
(cdouglas)
HADOOP-4427. Adds the new queue/job commands to the manual.
(Sreekanth Ramakrishnan via ddas)
HADOOP-4278. Increase debug logging for unit test TestDatanodeDeath.
Fix the case when primary is dead. (dhruba via szetszwo)
HADOOP-4423. Keep block length when the block recovery is triggered by
append. (szetszwo)
HADOOP-4449. Fix dfsadmin usage. (Raghu Angadi via cdouglas)
HADOOP-4455. Added TestSerDe so that unit tests can run successfully.
(Ashish Thusoo via dhruba)
HADOOP-4457. Fixes an input split logging problem introduced by
HADOOP-3245. (Amareshwari Sriramadasu via ddas)
HADOOP-4464. Separate out TestFileCreationClient from TestFileCreation.
(Tsz Wo (Nicholas), SZE via cdouglas)
HADOOP-4404. saveFSImage() removes files from a storage directory that do
not correspond to its type. (shv)
HADOOP-4149. Fix handling of updates to the job priority, by changing the
list of jobs to be keyed by the priority, submit time, and job tracker id.
(Amar Kamat via omalley)
HADOOP-4296. Fix job client failures by not retiring a job as soon as it
is finished. (dhruba)
HADOOP-4439. Remove configuration variables that aren't usable yet, in
particular mapred.tasktracker.tasks.maxmemory and mapred.task.max.memory.
(Hemanth Yamijala via omalley)
HADOOP-4230. Fix for serde2 interface, limit operator, select * operator,
UDF trim functions and sampling. (Ashish Thusoo via dhruba)
HADOOP-4358. No need to truncate access time in INode. Also fixes NPE
in CreateEditsLog. (Raghu Angadi)
HADOOP-4387. TestHDFSFileSystemContract fails on windows nightly builds.
(Raghu Angadi)
HADOOP-4466. Ensure that SequenceFileOutputFormat isn't tied to Writables
and can be used with other Serialization frameworks. (Chris Wensel via
acmurthy)
HADOOP-4525. Fix ipc.server.ipcnodelay originally missed in HADOOP-2232.
(Clint Morgan via cdouglas)
HADOOP-4498. Ensure that JobHistory correctly escapes the job name so that
regex patterns work. (Chris Wensel via acmurthy)
HADOOP-4446. Modify guaranteed capacity labels in capacity scheduler's UI
to reflect the information being displayed. (Sreekanth Ramakrishnan via
yhemanth)
HADOOP-4282. Some user facing URLs are not filtered by user filters.
(szetszwo)
HADOOP-4595. Fixes two race conditions - one to do with updating free slot count,
and another to do with starting the MapEventsFetcher thread. (ddas)
HADOOP-4552. Fix a deadlock in RPC server. (Raghu Angadi)
HADOOP-4471. Sort running jobs by priority in the capacity scheduler.
(Amar Kamat via yhemanth)
HADOOP-4500. Fix MultiFileSplit to get the FileSystem from the relevant
path rather than the JobClient. (Joydeep Sen Sarma via cdouglas)
Release 0.18.4 - Unreleased
BUG FIXES
HADOOP-5114. Remove timeout for accept() in DataNode. This makes accept()
fail in JDK on Windows and causes many tests to fail. (Raghu Angadi)
HADOOP-5192. Block receiver should not remove a block that's created or
being written by other threads. (hairong)
HADOOP-5134. FSNamesystem#commitBlockSynchronization adds under-construction
block locations to blocksMap. (Dhruba Borthakur via hairong)
HADOOP-5412. Simulated DataNode should not write to a block that's being
written by another thread. (hairong)
HADOOP-5465. Fix the problem of blocks remaining under-replicated by
providing synchronized modification to the counter xmitsInProgress in
DataNode. (hairong)
HADOOP-5557. Fixes some minor problems in TestOverReplicatedBlocks.
(szetszwo)
HADOOP-5644. Namenode is stuck in safe mode. (Suresh Srinivas via hairong)
HADOOP-6017. Lease Manager in NameNode does not handle certain characters
in filenames. This results in fatal errors in Secondary NameNode and while
restarting NameNode. (Tsz Wo (Nicholas), SZE via rangadi)
Release 0.18.3 - 2009-01-27
IMPROVEMENTS
HADOOP-4150. Include librecordio in hadoop releases. (Giridharan Kesavan
via acmurthy)
HADOOP-4668. Improve documentation for setCombinerClass to clarify the
restrictions on combiners. (omalley)
BUG FIXES
HADOOP-4499. DFSClient should invoke checksumOk only once. (Raghu Angadi)
HADOOP-4597. Calculate mis-replicated blocks when safe-mode is turned
off manually. (shv)
HADOOP-3121. lsr should keep listing the remaining items but not
terminate if there is any IOException. (szetszwo)
HADOOP-4610. Always calculate mis-replicated blocks when safe-mode is
turned off. (shv)
HADOOP-3883. Limit namenode to assign at most one generation stamp for
a particular block within a short period. (szetszwo)
HADOOP-4556. Block went missing. (hairong)
HADOOP-4643. NameNode should exclude excessive replicas when counting
live replicas for a block. (hairong)
HADOOP-4703. Should not wait for proxy forever in lease recovering.
(szetszwo)
HADOOP-4647. NamenodeFsck should close the DFSClient it has created.
(szetszwo)
HADOOP-4616. Fuse-dfs can handle bad values from FileSystem.read call.
(Pete Wyckoff via dhruba)
HADOOP-4061. Throttle Datanode decommission monitoring in Namenode.
(szetszwo)
HADOOP-4659. Root cause of connection failure is being lost to code that
uses it for delaying startup. (Steve Loughran and Hairong via hairong)
HADOOP-4614. Lazily open segments when merging map spills to avoid using
too many file descriptors. (Yuri Pradkin via cdouglas)
HADOOP-4257. The DFS client should pick only one datanode as the candidate
to initiate lease recovery. (Tsz Wo (Nicholas), SZE via dhruba)
HADOOP-4713. Fix librecordio to handle records larger than 64k. (Christian
Kunz via cdouglas)
HADOOP-4635. Fix a memory leak in fuse dfs. (Pete Wyckoff via mahadev)
HADOOP-4714. Report status between merges and make the number of records
between progress reports configurable. (Jothi Padmanabhan via cdouglas)
HADOOP-4726. Fix documentation typos "the the". (Edward J. Yoon via
szetszwo)
HADOOP-4679. Datanode prints tons of log messages: waiting for threadgroup
to exit, active threads is XX. (hairong)
HADOOP-4746. Job output directory should be normalized. (hairong)
HADOOP-4717. Removal of default port# in NameNode.getUri() causes a
map/reduce job to fail to promote temporary output. (hairong)
HADOOP-4778. Check for zero size block meta file when updating a block.
(szetszwo)
HADOOP-4742. Replica gets deleted by mistake. (Wang Xu via hairong)
HADOOP-4702. Failed block replication leaves an incomplete block in
receiver's tmp data directory. (hairong)
HADOOP-4613. Fix block browsing on Web UI. (Johan Oskarsson via shv)
HADOOP-4806. HDFS rename should not use src path as a regular expression.
(szetszwo)
HADOOP-4795. Prevent lease monitor from getting into an infinite loop when
leases and the namespace tree do not match. (szetszwo)
HADOOP-4620. Fixes Streaming to handle well the cases of map/reduce with empty
input/output. (Ravi Gummadi via ddas)
HADOOP-4857. Fixes TestUlimit to have exactly 1 map in the jobs spawned.
(Ravi Gummadi via ddas)
HADOOP-4810. Data lost at cluster startup time. (hairong)
HADOOP-4797. Improve how RPC server reads and writes large buffers. Avoids
soft-leak of direct buffers and excess copies in NIO layer. (Raghu Angadi)
HADOOP-4840. TestNodeCount sometimes fails with NullPointerException.
(hairong)
HADOOP-4904. Fix deadlock while leaving safe mode. (shv)
HADOOP-1980. 'dfsadmin -safemode enter' should prevent the namenode from
leaving safemode automatically. (shv)
HADOOP-4951. Lease monitor should acquire the LeaseManager lock but not the
Monitor lock. (szetszwo)
HADOOP-4935. processMisReplicatedBlocks() should not clear
excessReplicateMap. (shv)
HADOOP-4961. Fix ConcurrentModificationException in lease recovery
of empty files. (shv)
HADOOP-4971. A long (unexpected) delay at datanodes could cause subsequent
block reports from many datanodes to arrive at the same time. (Raghu Angadi)
HADOOP-4910. NameNode should exclude replicas when choosing excessive
replicas to delete to avoid data loss. (hairong)
HADOOP-4983. Fixes a problem in updating Counters in the status reporting.
(Amareshwari Sriramadasu via ddas)
Release 0.18.2 - 2008-11-03
BUG FIXES
HADOOP-3614. Fix a bug that Datanode may use an old GenerationStamp to get
meta file. (szetszwo)
HADOOP-4314. Simulated datanodes should not include blocks that are still
being written in their block report. (Raghu Angadi)
HADOOP-4228. dfs datanode metrics, bytes_read and bytes_written, overflow
due to incorrect type used. (hairong)
HADOOP-4395. The FSEditLog loading is incorrect for the case OP_SET_OWNER.
(szetszwo)
HADOOP-4351. FSNamesystem.getBlockLocationsInternal throws
ArrayIndexOutOfBoundsException. (hairong)
HADOOP-4403. Make TestLeaseRecovery and TestFileCreation more robust.
(szetszwo)
HADOOP-4292. Do not support append() for LocalFileSystem. (hairong)
HADOOP-4399. Make fuse-dfs multi-thread access safe.
(Pete Wyckoff via dhruba)
HADOOP-4369. Use setMetric(...) instead of incrMetric(...) for metrics
averages. (Brian Bockelman via szetszwo)
HADOOP-4469. Rename and add the ant task jar file to the tar file. (nigel)
HADOOP-3914. DFSClient sends Checksum Ok only once for a block.
(Christian Kunz via hairong)
HADOOP-4467. SerializationFactory now uses the current context ClassLoader
allowing for user supplied Serialization instances. (Chris Wensel via
acmurthy)
HADOOP-4517. Release FSDataset lock before joining ongoing create threads.
(szetszwo)
HADOOP-4526. fsck failing with NullPointerException. (hairong)
HADOOP-4483. Honor the max parameter in DatanodeDescriptor.getBlockArray(..)
(Ahad Rana and Hairong Kuang via szetszwo)
HADOOP-4340. Correctly set the exit code from JobShell.main so that the
'hadoop jar' command returns the right code to the user. (acmurthy)
NEW FEATURES
HADOOP-2421. Add jdiff output to documentation, listing all API
changes from the prior release. (cutting)
Release 0.18.1 - 2008-09-17
IMPROVEMENTS
HADOOP-3934. Upgrade log4j to 1.2.15. (omalley)
BUG FIXES
HADOOP-3995. In case of quota failure on HDFS, rename does not restore
source filename. (rangadi)
HADOOP-3821. Prevent SequenceFile and IFile from duplicating codecs in
CodecPool when closed more than once. (Arun Murthy via cdouglas)
HADOOP-4040. Remove coded default of the IPC idle connection timeout
from the TaskTracker, which was causing HDFS client connections to not be
collected. (ddas via omalley)
HADOOP-4046. Made WritableComparable's constructor protected instead of
private to re-enable class derivation. (cdouglas via omalley)
HADOOP-3940. Fix in-memory merge condition to wait when there are no map
outputs or when the final map outputs are being fetched without contention.
(cdouglas)
Release 0.18.0 - 2008-08-19
INCOMPATIBLE CHANGES
HADOOP-2703. The default options to fsck skip checking files
that are being written to. The output of fsck is incompatible
with the previous release. (lohit vijayarenu via dhruba)
HADOOP-2865. FsShell.ls() printout format changed to print file names
at the end of the line. (Edward J. Yoon via shv)
HADOOP-3283. The Datanode has an RPC server. It currently supports
two RPCs: the first RPC retrieves the metadata about a block and the
second RPC sets the generation stamp of an existing block.
(Tsz Wo (Nicholas), SZE via dhruba)
HADOOP-2797. Code related to upgrading to 0.14 (Block CRCs) is
removed. As a result, upgrade to 0.18 or later from 0.13 or earlier
is not supported. If upgrading from 0.13 or earlier is required,
please upgrade to an intermediate version (0.14-0.17) and then
to this version. (rangadi)
HADOOP-544. This issue introduces new classes JobID, TaskID and
TaskAttemptID, which should be used instead of their string counterparts.
Functions in JobClient, TaskReport, RunningJob, jobcontrol.Job and
TaskCompletionEvent that use string arguments are deprecated in favor
of the corresponding ones that use ID objects. Applications can use
xxxID.toString() and xxxID.forName() methods to convert/restore objects
to/from strings. (Enis Soztutar via ddas)
HADOOP-2188. RPC client sends a ping rather than throwing timeouts.
RPC server does not throw away old RPCs. If clients and the server are on
different versions, they are not able to function well. In addition,
the property ipc.client.timeout is removed from the default hadoop
configuration. It also removes the metric RpcOpsDiscardedOPsNum. (hairong)
HADOOP-2181. This issue adds logging for input splits in Jobtracker log
and jobHistory log. Also adds web UI for viewing input splits in job UI
and history UI. (Amareshwari Sriramadasu via ddas)
HADOOP-3226. Run combiners multiple times over map outputs as they
are merged in both the map and the reduce tasks. (cdouglas via omalley)
HADOOP-3329. DatanodeDescriptor objects should not be stored in the
fsimage. (dhruba)
HADOOP-2656. The Block object has a generation stamp inside it.
Existing blocks get a generation stamp of 0. This is needed to support
appends. (dhruba)
HADOOP-3390. Removed deprecated ClientProtocol.abandonFileInProgress().
(Tsz Wo (Nicholas), SZE via rangadi)
HADOOP-3405. Made some map/reduce internal classes non-public:
MapTaskStatus, ReduceTaskStatus, JobSubmissionProtocol,
CompletedJobStatusStore. (enis via omalley)
HADOOP-3265. Removed deprecated API getFileCacheHints().
(Lohit Vijayarenu via rangadi)
HADOOP-3310. The namenode instructs the primary datanode to do lease
recovery. The block gets a new generation stamp.
(Tsz Wo (Nicholas), SZE via dhruba)
HADOOP-2909. Improve IPC idle connection management. Property
ipc.client.maxidletime is removed from the default configuration;
instead, it is defined as twice the value of ipc.client.connection.maxidletime.
A connection with outstanding requests won't be treated as idle.
(hairong)
HADOOP-3459. Change in the output format of dfs -ls to more closely match
/bin/ls. New format is: perm repl owner group size date name
(Mukund Madhugiri via omalley)
HADOOP-3113. An fsync invoked on an HDFS file really really
persists data! The datanode moves blocks in the tmp directory to
the real block directory on a datanode-restart. (dhruba)
HADOOP-3452. Change fsck to return non-zero status for a corrupt
FileSystem. (lohit vijayarenu via cdouglas)
HADOOP-3193. Include the address of the client that found the corrupted
block in the log. Also include a CorruptedBlocks metric to track the size
of the corrupted block map. (cdouglas)
HADOOP-3512. Separate out the tools into a tools jar. (omalley)
HADOOP-3598. Ensure that temporary task-output directories are not created
if they are not necessary e.g. for Maps with no side-effect files.
(acmurthy)
HADOOP-3665. Modify WritableComparator so that it only creates instances
of the keytype if the type does not define a WritableComparator. Calling
the superclass compare will throw a NullPointerException. Also define
a RawComparator for NullWritable and permit it to be written as a key
to SequenceFiles. (cdouglas)
HADOOP-3673. Avoid deadlock caused by DataNode RPC recoverBlock().
(Tsz Wo (Nicholas), SZE via rangadi)
NEW FEATURES
HADOOP-3074. Provides a UrlStreamHandler for DFS and other FS,
relying on FileSystem (taton)
HADOOP-2585. Name-node imports namespace data from a recent checkpoint
accessible via an NFS mount. (shv)
HADOOP-3061. Writable types for doubles and bytes. (Andrzej
Bialecki via omalley)
HADOOP-2857. Allow libhdfs to set jvm options. (Craig Macdonald
via omalley)
HADOOP-3317. Add default port for HDFS namenode. The port in
"hdfs:" URIs now defaults to 8020, so that one may simply use URIs
of the form "hdfs://example.com/dir/file". (cutting)
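The defaulting rule in HADOOP-3317 can be pictured with a short sketch. This is an illustration in Python, not Hadoop code: the helper name namenode_port is hypothetical, and only the 8020 default comes from the entry above.

```python
from urllib.parse import urlsplit

DEFAULT_NAMENODE_PORT = 8020  # default stated in HADOOP-3317


def namenode_port(uri):
    """Return the port named in an hdfs: URI, or the default if none is given.

    Hypothetical helper for illustration only; not part of Hadoop.
    """
    parts = urlsplit(uri)
    if parts.scheme != "hdfs":
        raise ValueError("not an hdfs: URI: " + uri)
    # urlsplit reports port as None when the authority has no ":port" suffix.
    return parts.port if parts.port is not None else DEFAULT_NAMENODE_PORT
```

With this sketch, "hdfs://example.com/dir/file" resolves to the default port, while a URI that names a port explicitly keeps it.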
HADOOP-2019. Adds support for .tar, .tgz and .tar.gz files in
DistributedCache (Amareshwari Sriramadasu via ddas)
HADOOP-3058. Add FSNamesystem status metrics.
(Lohit Vijayarenu via rangadi)
HADOOP-1915. Allow users to specify counters via strings instead
of enumerations. (tomwhite via omalley)
HADOOP-2065. Delay invalidating corrupt replicas of a block until it
is removed from the under-replicated state. If all replicas are found to
be corrupt, retain all copies and mark the block as corrupt.
(Lohit Vijayarenu via rangadi)
HADOOP-3221. Adds org.apache.hadoop.mapred.lib.NLineInputFormat, which
splits files into splits each of N lines. N can be specified by
configuration property "mapred.line.input.format.linespermap", which
defaults to 1. (Amareshwari Sriramadasu via ddas)
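The grouping described in HADOOP-3221 amounts to cutting the input into runs of N lines. A minimal sketch in Python, assuming the lines are already in memory; the function name n_line_splits is hypothetical, and the real NLineInputFormat works on files and byte offsets rather than on lists.

```python
def n_line_splits(lines, lines_per_map=1):
    """Group lines into consecutive chunks of at most lines_per_map lines.

    lines_per_map stands in for the mapred.line.input.format.linespermap
    property and defaults to 1, as the entry above states.
    """
    if lines_per_map < 1:
        raise ValueError("lines_per_map must be at least 1")
    return [lines[i:i + lines_per_map]
            for i in range(0, len(lines), lines_per_map)]
```

The last chunk may be shorter than N when the number of lines is not a multiple of N.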
HADOOP-3336. Direct a subset of annotated FSNamesystem calls for audit
logging. (cdouglas)
HADOOP-3400. A new API FileSystem.deleteOnExit() that facilitates
handling of temporary files in HDFS. (dhruba)
HADOOP-4. Add fuse-dfs to contrib, permitting one to mount an
HDFS filesystem on systems that support FUSE, e.g., Linux.
(Pete Wyckoff via cutting)
HADOOP-3246. Add FTPFileSystem. (Ankur Goel via cutting)
HADOOP-3250. Extend FileSystem API to allow appending to files.
(Tsz Wo (Nicholas), SZE via cdouglas)
HADOOP-3177. Implement Syncable interface for FileSystem.
(Tsz Wo (Nicholas), SZE via dhruba)
HADOOP-1328. Implement user counters in streaming. (tomwhite via
omalley)
HADOOP-3187. Quotas for namespace management. (Hairong Kuang via ddas)
HADOOP-3307. Support for Archives in Hadoop. (Mahadev Konar via ddas)
HADOOP-3460. Add SequenceFileAsBinaryOutputFormat to permit direct
writes of serialized data. (Koji Noguchi via cdouglas)
HADOOP-3230. Add ability to get counter values from command
line. (tomwhite via omalley)
HADOOP-930. Add support for native S3 files. (tomwhite via cutting)
HADOOP-3502. Quota API needs documentation in Forrest. (hairong)
HADOOP-3413. Allow SequenceFile.Reader to use serialization
framework. (tomwhite via omalley)
HADOOP-3541. Import of the namespace from a checkpoint documented
in hadoop user guide. (shv)
IMPROVEMENTS
HADOOP-3677. Simplify generation stamp upgrade by making it a
local upgrade on datanodes. Deleted distributed upgrade.
(rangadi)
HADOOP-2928. Remove deprecated FileSystem.getContentLength().
(Lohit Vijayarenu via rangadi)
HADOOP-3130. Make the connect timeout smaller for getFile.
(Amar Ramesh Kamat via ddas)
HADOOP-3160. Remove deprecated exists() from ClientProtocol and
FSNamesystem. (Lohit Vijayarenu via rangadi)
HADOOP-2910. Throttle IPC Clients during bursts of requests or
server slowdown. Clients retry connection for up to 15 minutes
when socket connection times out. (hairong)
HADOOP-3295. Allow TextOutputFormat to use configurable separators.
(Zheng Shao via cdouglas)
HADOOP-3308. Improve QuickSort by excluding values equal to the pivot from the
partition. (cdouglas)
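One common way to keep values equal to the pivot out of both sub-partitions, as HADOOP-3308 describes, is a three-way partition. The sketch below is in Python and is only an illustration of that idea; the entry does not say which partitioning scheme Hadoop's QuickSort uses, and the function names are hypothetical.

```python
def three_way_partition(a, lo, hi):
    """Partition a[lo:hi] in place and return (lt, gt).

    Afterwards a[lo:lt] < pivot, a[lt:gt] == pivot and a[gt:hi] > pivot.
    """
    pivot = a[(lo + hi - 1) // 2]
    lt, i, gt = lo, lo, hi
    while i < gt:
        if a[i] < pivot:
            a[lt], a[i] = a[i], a[lt]
            lt += 1
            i += 1
        elif a[i] > pivot:
            gt -= 1
            a[gt], a[i] = a[i], a[gt]
        else:
            i += 1
    return lt, gt


def quicksort(a, lo=0, hi=None):
    """Sort a in place; the run of pivot-equal values is never revisited."""
    if hi is None:
        hi = len(a)
    if hi - lo < 2:
        return a
    lt, gt = three_way_partition(a, lo, hi)
    quicksort(a, lo, lt)
    quicksort(a, gt, hi)
    return a
```

Because the middle run a[lt:gt] is excluded from both recursive calls, inputs with many duplicate keys shrink quickly instead of being re-partitioned.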
HADOOP-2461. Trim property names in configuration.
(Tsz Wo (Nicholas), SZE via shv)
HADOOP-2799. Deprecate o.a.h.io.Closeable in favor of java.io.Closeable.
(Tsz Wo (Nicholas), SZE via cdouglas)
HADOOP-3345. Enhance the hudson-test-patch target to cleanup messages,
fix minor defects, and add eclipse plugin and python unit tests. (nigel)
HADOOP-3144. Improve robustness of LineRecordReader by defining a maximum
line length (mapred.linerecordreader.maxlength), thereby avoiding reading
too far into the following split. (Zheng Shao via cdouglas)
HADOOP-3334. Move lease handling from FSNamesystem into a separate class.
(Tsz Wo (Nicholas), SZE via rangadi)
HADOOP-3332. Reduces the amount of logging in Reducer's shuffle phase.
(Devaraj Das)
HADOOP-3355. Enhances Configuration class to accept hex numbers for getInt
and getLong. (Amareshwari Sriramadasu via ddas)
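HADOOP-3355 lets integer properties be written in hexadecimal. A minimal sketch of that kind of parsing, in Python rather than Hadoop's Java; the function name parse_int and the handling of a leading minus sign are assumptions made for illustration, not taken from the Configuration class.

```python
def parse_int(value):
    """Parse a decimal or 0x-prefixed hexadecimal integer string.

    Hypothetical helper; it sketches the idea, not Configuration.getInt.
    """
    negative = value.startswith("-")
    digits = value[1:] if negative else value
    if digits[:2].lower() == "0x":
        number = int(digits[2:], 16)  # hex digits after the 0x prefix
    else:
        number = int(digits, 10)
    return -number if negative else number
```

Under these assumptions a property value of "0x1F" reads as 31, and a plain "42" still reads as 42.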
HADOOP-3350. Add an argument to distcp to permit the user to limit the
number of maps. (cdouglas)
HADOOP-3013. Add corrupt block reporting to fsck.
(lohit vijayarenu via cdouglas)
HADOOP-3377. Remove TaskRunner::replaceAll and replace with equivalent
String::replace. (Brice Arnould via cdouglas)
HADOOP-3398. Minor improvement to a utility function that participates
in backoff calculation. (cdouglas)
HADOOP-3381. Clear references when directories are deleted so that
the effect of memory leaks is not multiplied. (rangadi)
HADOOP-2867. Adds the task's CWD to its LD_LIBRARY_PATH.
(Amareshwari Sriramadasu via ddas)
HADOOP-3232. DU class runs the 'du' command in a separate thread so
that it does not block user. DataNode misses heartbeats in large
nodes otherwise. (Johan Oskarsson via rangadi)
HADOOP-3035. During block transfers between datanodes, the receiving
datanode can now report corrupt replicas received from src node to
the namenode. (Lohit Vijayarenu via rangadi)
HADOOP-3434. Retain the cause of the bind failure in Server::bind.
(Steve Loughran via cdouglas)
HADOOP-3429. Increases the size of the buffers used for the communication
for Streaming jobs. (Amareshwari Sriramadasu via ddas)
HADOOP-3486. Change default for initial block report to 0 seconds
and document it. (Sanjay Radia via omalley)
HADOOP-3448. Improve the text in the assertion making sure the
layout versions are consistent in the data node. (Steve Loughran
via omalley)
HADOOP-2095. Improve the Map-Reduce shuffle/merge by cutting down
buffer-copies; changed intermediate sort/merge to use the new IFile format
rather than SequenceFiles and compression of map-outputs is now
implemented by compressing the entire file rather than SequenceFile
compression. Shuffle also has been changed to use a simple byte-buffer
manager rather than the InMemoryFileSystem.
Configuration changes to hadoop-default.xml:
deprecated mapred.map.output.compression.type
(acmurthy)
HADOOP-236. JobTracker now refuses connection from a task tracker with a
different version number. (Sharad Agarwal via ddas)
HADOOP-3427. Improves the shuffle scheduler. It now waits for notifications
from shuffle threads when it has scheduled enough, before scheduling more.
(ddas)
HADOOP-2393. Moves the handling of dir deletions in the tasktracker to
a separate thread. (Amareshwari Sriramadasu via ddas)
HADOOP-3501. Deprecate InMemoryFileSystem. (cutting via omalley)
HADOOP-3366. Stall the shuffle while in-memory merge is in progress.
(acmurthy)
HADOOP-2916. Refactor src structure, but leave package structure alone.
(Raghu Angadi via mukund)
HADOOP-3492. Add forrest documentation for user archives.
(Mahadev Konar via hairong)
HADOOP-3467. Improve documentation for FileSystem::deleteOnExit.
(Tsz Wo (Nicholas), SZE via cdouglas)
HADOOP-3379. Documents stream.non.zero.exit.status.is.failure for Streaming.
(Amareshwari Sriramadasu via ddas)
HADOOP-3096. Improves documentation about the Task Execution Environment in
the Map-Reduce tutorial. (Amareshwari Sriramadasu via ddas)
HADOOP-2984. Add forrest documentation for DistCp. (cdouglas)
HADOOP-3406. Add forrest documentation for Profiling.
(Amareshwari Sriramadasu via ddas)
HADOOP-2762. Add forrest documentation for controls of memory limits on
hadoop daemons and Map-Reduce tasks. (Amareshwari Sriramadasu via ddas)
HADOOP-3535. Fix documentation and name of IOUtils.close to
reflect that it should only be used in cleanup contexts. (omalley)
HADOOP-3593. Updates the mapred tutorial. (ddas)
HADOOP-3547. Documents the way in which native libraries can be distributed
via the DistributedCache. (Amareshwari Sriramadasu via ddas)
HADOOP-3606. Updates the Streaming doc. (Amareshwari Sriramadasu via ddas)
HADOOP-3532. Add jdiff reports to the build scripts. (omalley)
HADOOP-3100. Develop tests to test the DFS command line interface. (mukund)
HADOOP-3688. Fix up HDFS docs. (Robert Chansler via hairong)
OPTIMIZATIONS
HADOOP-3274. The default constructor of BytesWritable creates empty
byte array. (Tsz Wo (Nicholas), SZE via shv)
HADOOP-3272. Remove redundant copy of Block object in BlocksMap.
(Lohit Vijayarenu via shv)
HADOOP-3164. Reduce DataNode CPU usage by using FileChannel.transferTo().
On Linux DataNode takes 5 times less CPU while serving data. Results may
vary on other platforms. (rangadi)
HADOOP-3248. Optimization of saveFSImage. (Dhruba via shv)
HADOOP-3297. Fetch more task completion events from the job
tracker and task tracker. (ddas via omalley)
HADOOP-3364. Faster image and log edits loading. (shv)
HADOOP-3369. Fast block processing during name-node startup. (shv)
HADOOP-1702. Reduce buffer copies when data is written to DFS.
DataNodes take 30% less CPU while writing data. (rangadi)
HADOOP-3095. Speed up split generation in the FileInputSplit,
especially for non-HDFS file systems. Deprecates
InputFormat.validateInput. (tomwhite via omalley)
HADOOP-3552. Add forrest documentation for Hadoop commands.
(Sharad Agarwal via cdouglas)
BUG FIXES
HADOOP-2905. 'fsck -move' triggers NPE in NameNode.
(Lohit Vijayarenu via rangadi)
Increment ClientProtocol.versionID missed by HADOOP-2585. (shv)
HADOOP-3254. Restructure internal namenode methods that process
heartbeats to use well-defined BlockCommand object(s) instead of
using the base java Object. (Tsz Wo (Nicholas), SZE via dhruba)
HADOOP-3176. Change lease record when an open-for-write file
gets renamed. (dhruba)
HADOOP-3269. Fix a case when namenode fails to restart
while processing a lease record. (Tsz Wo (Nicholas), SZE via dhruba)
HADOOP-3282. Port issues in TestCheckpoint resolved. (shv)
HADOOP-3268. file:// URLs issue in TestUrlStreamHandler under Windows.
(taton)
HADOOP-3127. Deleting files in trash should really remove them.
(Brice Arnould via omalley)
HADOOP-3300. Fix locking of explicit locks in NetworkTopology.
(tomwhite via omalley)
HADOOP-3270. Constant DatanodeCommands are stored in static final
immutable variables for better code clarity.
(Tsz Wo (Nicholas), SZE via dhruba)
HADOOP-2793. Fix broken links for worst performing shuffle tasks in
the job history page. (Amareshwari Sriramadasu via ddas)
HADOOP-3313. Avoid unnecessary calls to System.currentTimeMillis
in RPC::Invoker. (cdouglas)
HADOOP-3318. Recognize "Darwin" as an alias for "Mac OS X" to
support Soylatte. (Sam Pullara via omalley)
HADOOP-3301. Fix misleading error message when S3 URI hostname
contains an underscore. (tomwhite via omalley)
HADOOP-3338. Fix Eclipse plugin to compile after HADOOP-544 was
committed. Updated all references to use the new JobID representation.
(taton via nigel)
HADOOP-3337. Loading FSEditLog was broken by HADOOP-3283 since it
changed Writable serialization of DatanodeInfo. This patch handles it.
(Tsz Wo (Nicholas), SZE via rangadi)
HADOOP-3101. Prevent JobClient from throwing an exception when printing
usage. (Edward J. Yoon via cdouglas)
HADOOP-3119. Update javadoc for Text::getBytes to better describe its
behavior. (Tim Nelson via cdouglas)
HADOOP-2294. Fix documentation in libhdfs to refer to the correct free
function. (Craig Macdonald via cdouglas)
HADOOP-3335. Prevent the libhdfs build from deleting the wrong
files on make clean. (cutting via omalley)
HADOOP-2930. Make {start,stop}-balancer.sh work even if hadoop-daemon.sh
is not in the PATH. (Spiros Papadimitriou via hairong)
HADOOP-3085. Catch Exception in metrics util classes to ensure that
misconfigured metrics don't prevent others from updating. (cdouglas)
HADOOP-3299. CompositeInputFormat should configure the sub-input
formats. (cdouglas via omalley)
HADOOP-3309. Lower io.sort.mb and fs.inmemory.size.mb for MiniMRDFSSort
unit test so it passes on Windows. (lohit vijayarenu via cdouglas)
HADOOP-3348. TestUrlStreamHandler should set URLStreamFactory after
DataNodes are initialized. (Lohit Vijayarenu via rangadi)
HADOOP-3371. Ignore InstanceAlreadyExistsException from
MBeanUtil::registerMBean. (lohit vijayarenu via cdouglas)
HADOOP-3349. A file rename was incorrectly changing the name inside a
lease record. (Tsz Wo (Nicholas), SZE via dhruba)
HADOOP-3365. Removes an unnecessary copy of the key from SegmentDescriptor
to MergeQueue. (Devaraj Das)
HADOOP-3388. Fix for TestDatanodeBlockScanner to handle blocks with
generation stamps in them. (dhruba)
HADOOP-3203. Fixes TaskTracker::localizeJob to pass correct file sizes
for the jarfile and the jobfile. (Amareshwari Sriramadasu via ddas)
HADOOP-3391. Fix a findbugs warning introduced by HADOOP-3248 (rangadi)
HADOOP-3393. Fix datanode shutdown to call DataBlockScanner::shutdown and
close its log, even if the scanner thread is not running. (lohit vijayarenu
via cdouglas)
HADOOP-3399. A debug message was logged at info level. (rangadi)
HADOOP-3396. TestDatanodeBlockScanner occasionally fails.
(Lohit Vijayarenu via rangadi)
HADOOP-3339. Some of the failures on 3rd datanode in DFS write pipeline
are not detected properly. This could lead to hard failure of client's
write operation. (rangadi)
HADOOP-3409. Namenode should save the root inode into fsimage. (hairong)
HADOOP-3296. Fix task cache to work for more than two levels in the cache
hierarchy. This also adds a new counter to track cache hits at levels
greater than two. (Amar Kamat via cdouglas)
HADOOP-3375. Lease paths were sometimes not removed from
LeaseManager.sortedLeasesByPath. (Tsz Wo (Nicholas), SZE via dhruba)
HADOOP-3424. Values returned by getPartition should be checked to
make sure they are in the range 0 to #reduces - 1 (cdouglas via
omalley)
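The range check described in HADOOP-3424 can be sketched as follows. This is a Python illustration under stated assumptions: the function name check_partition and the error message are hypothetical, and only the valid range (0 to #reduces - 1) comes from the entry.

```python
def check_partition(partition, num_reduces):
    """Return partition if it lies in 0 .. num_reduces - 1, else raise.

    Hypothetical helper; the exception type and message are assumptions.
    """
    if not 0 <= partition < num_reduces:
        raise ValueError(
            "Illegal partition %d (expected 0 to %d)"
            % (partition, num_reduces - 1))
    return partition
```

Failing fast at the point where a partitioner returns its value makes a buggy partitioner visible immediately, rather than as a later indexing error.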
HADOOP-3408. Change FSNamesystem to send its metrics as integers to
accommodate collectors that don't support long values. (lohit vijayarenu
via cdouglas)
HADOOP-3403. Fixes a problem in the JobTracker to do with handling of lost
tasktrackers. (Arun Murthy via ddas)
HADOOP-1318. Completed maps are not failed if the number of reducers is
zero. (Amareshwari Sriramadasu via ddas)
HADOOP-3351. Fixes the history viewer tool to not do huge StringBuffer
allocations. (Amareshwari Sriramadasu via ddas)
HADOOP-3419. Fixes TestFsck to wait for updates to happen before
checking results to make the test more reliable. (Lohit Vijaya
Renu via omalley)
HADOOP-3259. Makes failure to read system properties due to a
security manager non-fatal. (Edward Yoon via omalley)
HADOOP-3451. Update libhdfs to use FileSystem::getFileBlockLocations
instead of removed getFileCacheHints. (lohit vijayarenu via cdouglas)
HADOOP-3401. Update FileBench to set the new
"mapred.work.output.dir" property to work post-3041. (cdouglas via omalley)
HADOOP-2669. DFSClient locks pendingCreates appropriately. (dhruba)
HADOOP-3410. Fix KFS implementation to return correct file
modification time. (Sriram Rao via cutting)
HADOOP-3340. Fix DFS metrics for BlocksReplicated, HeartbeatsNum, and
BlockReportsAverageTime. (lohit vijayarenu via cdouglas)
HADOOP-3435. Remove the assumption in the scripts that bash is at
/bin/bash and fix the test patch to require bash instead of sh.
(Brice Arnould via omalley)
HADOOP-3471. Fix spurious errors from TestIndexedSort and add additional
logging to let failures be reproducible. (cdouglas)
HADOOP-3443. Avoid copying map output across partitions when renaming a
single spill. (omalley via cdouglas)
HADOOP-3454. Fix Text::find to search only valid byte ranges. (Chad Whipkey
via cdouglas)
HADOOP-3417. Removes the static configuration variable,
commandLineConfig from JobClient. Moves the cli parsing from
JobShell to GenericOptionsParser. Thus removes the class
org.apache.hadoop.mapred.JobShell. (Amareshwari Sriramadasu via
ddas)
HADOOP-2132. Only RUNNING/PREP jobs can be killed. (Jothi Padmanabhan
via ddas)
HADOOP-3476. Code cleanup in fuse-dfs.
(Peter Wyckoff via dhruba)
HADOOP-2427. Ensure that the cwd of completed tasks is cleaned-up
correctly on task-completion. (Amareshwari Sri Ramadasu via acmurthy)
HADOOP-2565. Remove DFSPath cache of FileStatus.
(Tsz Wo (Nicholas), SZE via hairong)
HADOOP-3326. Cleanup the local-fs and in-memory merge in the ReduceTask by
spawning only one thread each for the on-disk and in-memory merge.
(Sharad Agarwal via acmurthy)
HADOOP-3493. Fix TestStreamingFailure to use FileUtil.fullyDelete to
ensure correct cleanup. (Lohit Vijayarenu via acmurthy)
HADOOP-3455. Fix NPE in ipc.Client in case of connection failure and
improve its synchronization. (hairong)
HADOOP-3240. Fix a testcase to not create files in the current directory.
Instead the file is created in the test directory (Mahadev Konar via ddas)
HADOOP-3496. Fix failure in TestHarFileSystem.testArchives due to change
in HADOOP-3095. (tomwhite)
HADOOP-3135. Get the system directory from the JobTracker instead of from
the conf. (Subramaniam Krishnan via ddas)
HADOOP-3503. Fix a race condition when client and namenode start
simultaneous recovery of the same block. (dhruba & Tsz Wo
(Nicholas), SZE)
HADOOP-3440. Fixes DistributedCache to not create symlinks for paths which
don't have fragments even when createSymLink is true.
(Abhijit Bagri via ddas)
HADOOP-3463. Hadoop-daemons script should cd to $HADOOP_HOME. (omalley)
HADOOP-3489. Fix NPE in SafeModeMonitor. (Lohit Vijayarenu via shv)
HADOOP-3509. Fix NPE in FSNamesystem.close. (Tsz Wo (Nicholas), SZE via
shv)
HADOOP-3491. Name-node shutdown causes InterruptedException in
ResolutionMonitor. (Lohit Vijayarenu via shv)
HADOOP-3511. Fixes namenode image to not set the root's quota to an
invalid value when the quota was not saved in the image. (hairong)
HADOOP-3516. Ensure the JobClient in HadoopArchives is initialized
with a configuration. (Subramaniam Krishnan via omalley)
HADOOP-3513. Improve NNThroughputBenchmark log messages. (shv)
HADOOP-3519. Fix NPE in DFS FileSystem rename. (hairong via tomwhite)
HADOOP-3528. Metrics FilesCreated and files_deleted metrics
do not match. (Lohit via Mahadev)
HADOOP-3418. When a directory is deleted, any leases that point to files
in the subdirectory are removed. (Tsz Wo (Nicholas), SZE via dhruba)
HADOOP-3542. Disables the creation of _logs directory for the archives
directory. (Mahadev Konar via ddas)
HADOOP-3544. Fixes a documentation issue for hadoop archives.
(Mahadev Konar via ddas)
HADOOP-3517. Fixes a problem in the reducer due to which the last InMemory
merge may be missed. (Arun Murthy via ddas)
HADOOP-3548. Fixes build.xml to copy all *.jar files to the dist.
(Owen O'Malley via ddas)
HADOOP-3363. Fix unformatted storage detection in FSImage. (shv)
HADOOP-3560. Fixes a problem to do with split creation in archives.
(Mahadev Konar via ddas)
HADOOP-3545. Fixes an overflow problem in archives.
(Mahadev Konar via ddas)
HADOOP-3561. Prevent the trash from deleting its parent directories.
(cdouglas)
HADOOP-3575. Fix the clover ant target after package refactoring.
(Nigel Daley via cdouglas)
HADOOP-3539. Fix the tool path in the bin/hadoop script under
cygwin. (Tsz Wo (Nicholas), Sze via omalley)
HADOOP-3520. TestDFSUpgradeFromImage triggers a race condition in the
Upgrade Manager. Fixed. (dhruba)
HADOOP-3586. Provide deprecated, backwards compatible semantics for the
combiner to be run once and only once on each record. (cdouglas)
HADOOP-3533. Add deprecated methods to provide API compatibility
between 0.18 and 0.17. Remove the deprecated methods in trunk. (omalley)
HADOOP-3580. Fixes a problem to do with specifying a har as an input to
a job. (Mahadev Konar via ddas)
HADOOP-3333. Don't assign a task to a tasktracker that it failed to
execute earlier (used to happen in the case of lost tasktrackers where
the tasktracker would reinitialize and bind to a different port).
(Jothi Padmanabhan and Arun Murthy via ddas)
HADOOP-3534. Log IOExceptions that happen in closing the name
system when the NameNode shuts down. (Tsz Wo (Nicholas) Sze via omalley)
HADOOP-3546. TaskTracker re-initialization gets stuck in cleaning up.
(Amareshwari Sriramadasu via ddas)
HADOOP-3576. Fix NullPointerException when renaming a directory
to its subdirectory. (Tsz Wo (Nicholas), SZE via hairong)
HADOOP-3320. Fix NullPointerException in NetworkTopology.getDistance().
(hairong)
HADOOP-3569. KFS input stream read() now correctly reads 1 byte
instead of 4. (Sriram Rao via omalley)
HADOOP-3599. Fix JobConf::setCombineOnceOnly to modify the instance rather
than a parameter. (Owen O'Malley via cdouglas)
HADOOP-3590. Null pointer exception in JobTracker when the task tracker is
not yet resolved. (Amar Ramesh Kamat via ddas)
HADOOP-3603. Fix MapOutputCollector to spill when io.sort.spill.percent is
1.0 and to detect spills when emitted records write no data. (cdouglas)
HADOOP-3615. Set DatanodeProtocol.versionID to the correct value.
(Tsz Wo (Nicholas), SZE via cdouglas)
HADOOP-3559. Fix the libhdfs test script and config to work with the
current semantics. (lohit vijayarenu via cdouglas)
HADOOP-3480. Need to update Eclipse template to reflect current trunk.
(Brice Arnould via tomwhite)
HADOOP-3588. Fixed usability issues with archives. (mahadev)
HADOOP-3635. Uncaught exception in DataBlockScanner.
(Tsz Wo (Nicholas), SZE via hairong)
HADOOP-3639. Exception when closing DFSClient while multiple files are
open. (Benjamin Gufler via hairong)
HADOOP-3572. SetQuotas usage interface has some minor bugs. (hairong)
HADOOP-3649. Fix bug in removing blocks from the corrupted block map.
(Lohit Vijayarenu via shv)
HADOOP-3604. Work around a JVM synchronization problem observed while
retrieving the address of direct buffers from compression code by obtaining
a lock during this call. (Arun C Murthy via cdouglas)
HADOOP-3683. Fix dfs metrics to count file listings rather than files
listed. (lohit vijayarenu via cdouglas)
HADOOP-3597. Fix SortValidator to use filesystems other than the default as
input. Validation job still runs on default fs.
(Jothi Padmanabhan via cdouglas)
HADOOP-3693. Fix archives, distcp and native library documentation to
conform to style guidelines. (Amareshwari Sriramadasu via cdouglas)
HADOOP-3653. Fix test-patch target to properly account for Eclipse
classpath jars. (Brice Arnould via nigel)
HADOOP-3692. Fix documentation for Cluster setup and Quick start guides.
(Amareshwari Sriramadasu via ddas)
HADOOP-3691. Fix streaming and tutorial docs. (Jothi Padmanabhan via ddas)
HADOOP-3630. Fix NullPointerException in CompositeRecordReader from empty
sources (cdouglas)
HADOOP-3706. Fix a ClassLoader issue in the mapred.join Parser that
prevents it from loading user-specified InputFormats.
(Jingkei Ly via cdouglas)
HADOOP-3718. Fix KFSOutputStream::write(int) to output a byte instead of
an int, per the OutputStream contract. (Sriram Rao via cdouglas)
HADOOP-3647. Add debug logs to help track down a very occasional,
hard-to-reproduce, bug in shuffle/merge on the reducer. (acmurthy)
HADOOP-3716. Prevent listStatus in KosmosFileSystem from returning
null for valid, empty directories. (Sriram Rao via cdouglas)
HADOOP-3752. Fix audit logging to record rename events. (cdouglas)
HADOOP-3737. Fix CompressedWritable to call Deflater::end to release
compressor memory. (Grant Glouser via cdouglas)
HADOOP-3670. Fixes JobTracker to clear out split bytes when no longer
required. (Amareshwari Sriramadasu via ddas)
HADOOP-3755. Update gridmix to work with HOD 0.4 (Runping Qi via cdouglas)
HADOOP-3743. Fix -libjars, -files, -archives options to work even if
user code does not implement tools. (Amareshwari Sriramadasu via mahadev)
HADOOP-3774. Fix typos in shell output. (Tsz Wo (Nicholas), SZE via
cdouglas)
HADOOP-3762. Fixed FileSystem cache to work with the default port. (cutting
via omalley)
HADOOP-3798. Fix tests compilation. (Mukund Madhugiri via omalley)
HADOOP-3794. Return modification time instead of zero for KosmosFileSystem.
(Sriram Rao via cdouglas)
HADOOP-3806. Remove debug statement to stdout from QuickSort. (cdouglas)
HADOOP-3776. Fix NPE at NameNode when datanode reports a block after it is
deleted at NameNode. (rangadi)
HADOOP-3537. Disallow adding a datanode to a network topology when its
network location is not resolved. (hairong)
HADOOP-3571. Fix bug in block removal used in lease recovery. (shv)
HADOOP-3645. MetricsTimeVaryingRate returns wrong value for
metric_avg_time. (Lohit Vijayarenu via hairong)
HADOOP-3521. Reverted the missing cast to float for sending Counters' values
to Hadoop metrics which was removed by HADOOP-544. (acmurthy)
HADOOP-3820. Fixes two problems in the gridmix-env - a syntax error, and a
wrong definition of USE_REAL_DATASET by default. (Arun Murthy via ddas)
HADOOP-3724. Fixes two problems related to storing and recovering lease
in the fsimage. (dhruba)
HADOOP-3827. Fixed compression of empty map-outputs. (acmurthy)
HADOOP-3865. Remove reference to FSNamesystem from metrics preventing
garbage collection. (Lohit Vijayarenu via cdouglas)
HADOOP-3884. Fix so that Eclipse plugin builds against recent
Eclipse releases. (cutting)
HADOOP-3837. Streaming jobs report progress status. (dhruba)
HADOOP-3897. Fix a NPE in secondary namenode. (Lohit Vijayarenu via
cdouglas)
HADOOP-3901. Fix bin/hadoop to correctly set classpath under cygwin.
(Tsz Wo (Nicholas) Sze via omalley)
HADOOP-3947. Fix a problem in tasktracker reinitialization.
(Amareshwari Sriramadasu via ddas)
Release 0.17.3 - Unreleased
IMPROVEMENTS
HADOOP-4164. Chinese translation of the documentation. (Xuebing Yan via
omalley)
BUG FIXES
HADOOP-4277. Checksum verification was mistakenly disabled for
LocalFileSystem. (Raghu Angadi)
HADOOP-4271. Checksum input stream can sometimes return invalid
data to the user. (Ning Li via rangadi)
HADOOP-4318. DistCp should use absolute paths for cleanup. (szetszwo)
HADOOP-4326. ChecksumFileSystem does not override create(...) correctly.
(szetszwo)
Release 0.17.2 - 2008-08-11
BUG FIXES
HADOOP-3678. Avoid spurious exceptions logged at DataNode when clients
read from DFS. (rangadi)
HADOOP-3707. NameNode keeps a count of number of blocks scheduled
to be written to a datanode and uses it to avoid allocating more
blocks than a datanode can hold. (rangadi)
HADOOP-3760. Fix a bug with HDFS file close() mistakenly introduced
by HADOOP-3681. (Lohit Vijayarenu via rangadi)
HADOOP-3681. DFSClient can get into an infinite loop while closing
a file if there are some errors. (Lohit Vijayarenu via rangadi)
HADOOP-3002. Hold off block removal while in safe mode. (shv)
HADOOP-3685. Unbalanced replication target. (hairong)
HADOOP-3758. Shutdown datanode on version mismatch instead of retrying
continuously, preventing excessive logging at the namenode.
(lohit vijayarenu via cdouglas)
HADOOP-3633. Correct exception handling in DataXceiveServer, and throttle
the number of xceiver threads in a data-node. (shv)
HADOOP-3370. Ensure that the TaskTracker.runningJobs data-structure is
correctly cleaned-up on task completion. (Zheng Shao via acmurthy)
HADOOP-3813. Fix task-output clean-up on HDFS to use the recursive
FileSystem.delete rather than the FileUtil.fullyDelete. (Amareshwari
Sri Ramadasu via acmurthy)
HADOOP-3859. Allow the maximum number of xceivers in the data node to
be configurable. (Johan Oskarsson via omalley)
HADOOP-3931. Fix corner case in the map-side sort that causes some values
to be counted as too large and cause premature spills to disk. Some values
will also bypass the combiner incorrectly. (cdouglas via omalley)
Release 0.17.1 - 2008-06-23
INCOMPATIBLE CHANGES
HADOOP-3565. Fix the Java serialization, which is not enabled by
default, to clear the state of the serializer between objects.
(tomwhite via omalley)
IMPROVEMENTS
HADOOP-3522. Improve documentation on reduce pointing out that
input keys and values will be reused. (omalley)
HADOOP-3487. Balancer uses thread pools to manage its threads, which
provides better resource management. (hairong)
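Illustration for HADOOP-3522 (a self-contained sketch that does not use the
Hadoop API): when an iterator hands out one shared instance and overwrites
it on every call, references retained by the caller all end up showing the
last value unless each value is copied first. MutableInt, reusingIterator
and retain are hypothetical names chosen for this example.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical demo class; not part of Hadoop.
final class ValueReuseSketch {

    /** Mutable holder standing in for a reusable value object. */
    static final class MutableInt {
        int value;
        MutableInt(int value) { this.value = value; }
        MutableInt copy() { return new MutableInt(value); }
    }

    /** Iterator that returns the same instance, overwritten on each next(). */
    static Iterator<MutableInt> reusingIterator(final int... raw) {
        return new Iterator<MutableInt>() {
            private final MutableInt shared = new MutableInt(0);
            private int pos = 0;

            @Override
            public boolean hasNext() { return pos < raw.length; }

            @Override
            public MutableInt next() {
                if (pos >= raw.length) throw new NoSuchElementException();
                shared.value = raw[pos++];
                return shared;
            }
        };
    }

    /** Retains every value seen, optionally copying it first. */
    static List<Integer> retain(boolean copyEachValue, int... raw) {
        List<MutableInt> kept = new ArrayList<>();
        Iterator<MutableInt> it = reusingIterator(raw);
        while (it.hasNext()) {
            MutableInt v = it.next();
            kept.add(copyEachValue ? v.copy() : v);
        }
        List<Integer> out = new ArrayList<>();
        for (MutableInt m : kept) out.add(m.value);
        return out;
    }

    public static void main(String[] args) {
        System.out.println(retain(false, 1, 2, 3)); // [3, 3, 3]
        System.out.println(retain(true, 1, 2, 3));  // [1, 2, 3]
    }
}
```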
BUG FIXES
HADOOP-2159. Namenode stuck in safemode. The counter blockSafe should
not be decremented for invalid blocks. (hairong)
HADOOP-3472. MapFile.Reader getClosest() function returns incorrect results
when before is true. (Todd Lipcon via Stack)
HADOOP-3442. Limit recursion depth on the stack for QuickSort to prevent
StackOverflowErrors. To avoid O(n*n) cases, when partitioning depth exceeds
a multiple of log(n), change to HeapSort. (cdouglas)
HADOOP-3477. Fix build to not package contrib/*/bin twice in
distributions. (Adam Heath via cutting)
HADOOP-3475. Fix MapTask to correctly size the accounting allocation of
io.sort.mb. (cdouglas)
HADOOP-3550. Fix the serialization data structures in MapTask where the
value lengths are incorrectly calculated. (cdouglas)
HADOOP-3526. Fix contrib/data_join framework by cloning values retained
in the reduce. (Spyros Blanas via cdouglas)
HADOOP-1979. Speed up fsck by adding a buffered stream. (Lohit
Vijaya Renu via omalley)
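Illustration for HADOOP-3442 (a sketch of the general technique, not the
Hadoop sorter itself): quicksort tracks its partitioning depth and hands
the current range to heapsort once a limit proportional to log(n) is used
up. The class name, the multiplier of 2 and the last-element pivot are
assumptions made for this example.

```java
import java.util.Arrays;

// Hypothetical demo class; not part of Hadoop.
final class DepthLimitedQuickSort {

    static void sort(int[] a) {
        if (a.length < 2) return;
        // Assumed limit: 2 * floor(log2(n)) partitioning levels.
        int maxDepth = 2 * (31 - Integer.numberOfLeadingZeros(a.length));
        sort(a, 0, a.length - 1, maxDepth);
    }

    private static void sort(int[] a, int lo, int hi, int depth) {
        while (lo < hi) {
            if (depth-- == 0) {
                // Depth budget exhausted: finish this range in O(n log n).
                heapSort(a, lo, hi);
                return;
            }
            int p = partition(a, lo, hi);
            // Recurse into the smaller side and loop on the larger one,
            // which also keeps the call stack shallow.
            if (p - lo < hi - p) {
                sort(a, lo, p - 1, depth);
                lo = p + 1;
            } else {
                sort(a, p + 1, hi, depth);
                hi = p - 1;
            }
        }
    }

    /** Lomuto partition around the last element; returns the pivot index. */
    private static int partition(int[] a, int lo, int hi) {
        int pivot = a[hi];
        int i = lo;
        for (int j = lo; j < hi; j++) {
            if (a[j] < pivot) swap(a, i++, j);
        }
        swap(a, i, hi);
        return i;
    }

    /** In-place max-heap sort of a[lo..hi], inclusive. */
    private static void heapSort(int[] a, int lo, int hi) {
        int n = hi - lo + 1;
        for (int i = n / 2 - 1; i >= 0; i--) siftDown(a, lo, i, n);
        for (int end = n - 1; end > 0; end--) {
            swap(a, lo, lo + end);
            siftDown(a, lo, 0, end);
        }
    }

    private static void siftDown(int[] a, int base, int root, int size) {
        while (true) {
            int child = 2 * root + 1;
            if (child >= size) return;
            if (child + 1 < size && a[base + child] < a[base + child + 1]) child++;
            if (a[base + root] >= a[base + child]) return;
            swap(a, base + root, base + child);
            root = child;
        }
    }

    private static void swap(int[] a, int i, int j) {
        int t = a[i];
        a[i] = a[j];
        a[j] = t;
    }

    public static void main(String[] args) {
        int[] data = {5, 3, 9, 1, 3};
        sort(data);
        System.out.println(Arrays.toString(data)); // [1, 3, 3, 5, 9]
    }
}
```

Already-sorted input is the worst case for this pivot choice; with the
depth limit, the remaining range is passed to heapSort after a bounded
number of partitioning steps rather than degrading to O(n*n).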
Release 0.17.0 - 2008-05-18
INCOMPATIBLE CHANGES
HADOOP-2786. Move hbase out of hadoop core
HADOOP-2345. New HDFS transactions to support appending
to files. Disk layout version changed from -11 to -12. (dhruba)
HADOOP-2192. Error messages from "dfs mv" command improved.
(Mahadev Konar via dhruba)
HADOOP-1902. "dfs du" command without any arguments operates on the
current working directory. (Mahadev Konar via dhruba)
HADOOP-2873. Fixed bad disk format introduced by HADOOP-2345.
Disk layout version changed from -12 to -13. See changelist 630992
(dhruba)
HADOOP-1985. This addresses rack-awareness for Map tasks and for
HDFS in a uniform way. (ddas)
HADOOP-1986. Add support for a general serialization mechanism for
Map Reduce. (tomwhite)
HADOOP-771. FileSystem.delete() takes an explicit parameter that
specifies whether a recursive delete is intended.
(Mahadev Konar via dhruba)
HADOOP-2470. Remove getContentLength(String), open(String, long, long)
and isDir(String) from ClientProtocol. ClientProtocol version changed
from 26 to 27. (Tsz Wo (Nicholas), SZE via cdouglas)
HADOOP-2822. Remove deprecated code for classes InputFormatBase and
PhasedFileSystem. (Amareshwari Sriramadasu via enis)
HADOOP-2116. Changes the layout of the task execution directory.
(Amareshwari Sriramadasu via ddas)
HADOOP-2828. The following deprecated methods in Configuration.java
have been removed:
getObject(String name)
setObject(String name, Object value)
get(String name, Object defaultValue)
set(String name, Object value)
Iterator entries()
(Amareshwari Sriramadasu via ddas)
HADOOP-2824. Removes one deprecated constructor from MiniMRCluster.
(Amareshwari Sriramadasu via ddas)
HADOOP-2823. Removes deprecated methods getColumn(), getLine() from
org.apache.hadoop.record.compiler.generated.SimpleCharStream.
(Amareshwari Sriramadasu via ddas)
HADOOP-3060. Removes one unused constructor argument from MiniMRCluster.
(Amareshwari Sriramadasu via ddas)
HADOOP-2854. Remove deprecated o.a.h.ipc.Server::getUserInfo().
(lohit vijayarenu via cdouglas)
HADOOP-2563. Remove deprecated FileSystem::listPaths.
(lohit vijayarenu via cdouglas)
HADOOP-2818. Remove deprecated methods in Counters.
(Amareshwari Sriramadasu via tomwhite)
HADOOP-2831. Remove deprecated o.a.h.dfs.INode::getAbsoluteName().
(lohit vijayarenu via cdouglas)
HADOOP-2839. Remove deprecated FileSystem::globPaths.
(lohit vijayarenu via cdouglas)
HADOOP-2634. Deprecate ClientProtocol::exists.
(lohit vijayarenu via cdouglas)
HADOOP-2410. Make EC2 cluster nodes more independent of each other.
Multiple concurrent EC2 clusters are now supported, and nodes may be
added to a cluster on the fly with new nodes starting in the same EC2
availability zone as the cluster. Ganglia monitoring and large
instance sizes have also been added. (Chris K Wensel via tomwhite)
HADOOP-2826. Deprecated FileSplit.getFile(), LineRecordReader.readLine().
(Amareshwari Sriramadasu via ddas)
HADOOP-3239. getFileInfo() returns null for non-existing files instead
of throwing FileNotFoundException. (Lohit Vijayarenu via shv)
HADOOP-3266. Removed HOD changes from CHANGES.txt, as they are now inside
src/contrib/hod (Hemanth Yamijala via ddas)
HADOOP-3280. Separate the configuration of the virtual memory size
(mapred.child.ulimit) from the jvm heap size, so that 64 bit
streaming applications are supported even when running with 32 bit
jvms. (acmurthy via omalley)
NEW FEATURES
HADOOP-1398. Add HBase in-memory block cache. (tomwhite)
HADOOP-2178. Job History on DFS. (Amareshwari Sri Ramadasu via ddas)
HADOOP-2063. A new parameter to dfs -get command to fetch a file
even if it is corrupted. (Tsz Wo (Nicholas), SZE via dhruba)
HADOOP-2219. A new command "dfs -count" that counts the number of
files and directories. (Tsz Wo (Nicholas), SZE via dhruba)
HADOOP-2906. Add an OutputFormat capable of using keys, values, and
config params to map records to different output files.
(Runping Qi via cdouglas)
HADOOP-2346. Utilities to support timeout while writing to sockets.
DFSClient and DataNode sockets have a 10-minute write timeout. (rangadi)
HADOOP-2951. Add a contrib module that provides a utility to
build or update Lucene indexes using Map/Reduce. (Ning Li via cutting)
HADOOP-1622. Allow multiple jar files for map reduce.
(Mahadev Konar via dhruba)
HADOOP-2055. Allows users to set PathFilter on the FileInputFormat.
(Alejandro Abdelnur via ddas)
HADOOP-2551. More environment variables like HADOOP_NAMENODE_OPTS
for better control of HADOOP_OPTS for each component. (rangadi)
HADOOP-3001. Add job counters that measure the number of bytes
read and written to HDFS, S3, KFS, and local file systems. (omalley)
HADOOP-3048. A new Interface and a default implementation to convert
and restore serializations of objects to/from strings. (enis)
IMPROVEMENTS
HADOOP-2655. Copy on write for data and metadata files in the
presence of snapshots. Needed for supporting appends to HDFS
files. (dhruba)
HADOOP-1967. When a Path specifies the same scheme as the default
FileSystem but no authority, the default FileSystem's authority is
used. Also add warnings for old-format FileSystem names, accessor
methods for fs.default.name, and check for null authority in HDFS.
(cutting)
HADOOP-2895. Let the profiling string be configurable.
(Martin Traverso via cdouglas)
HADOOP-910. Enables Reduces to do merges for the on-disk map output files
in parallel with their copying. (Amar Kamat via ddas)
HADOOP-730. Use rename rather than copy for local renames. (cdouglas)
HADOOP-2810. Updated the Hadoop Core logo. (nigel)
HADOOP-2057. Streaming should optionally treat a non-zero exit status
of a child process as a failed task. (Rick Cox via tomwhite)