Hadoop Change Log
Release 0.19.3 - Unreleased
BUG FIXES
HADOOP-4963. Log messages reporting org.apache.hadoop.util.DiskChecker$DiskErrorException
in the TaskTracker are not relevant. (Amareshwari Sriramadasu via ddas)
Release 0.19.2 - 2009-06-30
BUG FIXES
HADOOP-5379. CBZip2InputStream now throws IOException on data CRC errors.
(Rodrigo Schmidt via zshao)
HADOOP-5326. Fixes a data corruption problem in CBZip2OutputStream.
(Rodrigo Schmidt via zshao)
HADOOP-5154. Fixes a deadlock in the fairshare scheduler.
(Matei Zaharia via yhemanth)
HADOOP-5269. Fixes a problem where the TaskTracker held on to FAILED_UNCLEAN
or KILLED_UNCLEAN tasks forever. (Amareshwari Sriramadasu via ddas)
HADOOP-5280. Adds a check to prevent a task state transition from FAILED to
any of UNASSIGNED, RUNNING, COMMIT_PENDING or SUCCEEDED. (ddas)
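The rule above can be pictured as a simple guard on state transitions. The sketch below is hypothetical: TaskStateGuard and its simplified State enum are illustrative names, not Hadoop's TaskStatus API, and transitions other than the ones listed in the entry are left unchecked.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical guard illustrating HADOOP-5280. State is a simplified enum,
// not org.apache.hadoop.mapred.TaskStatus.State.
final class TaskStateGuard {
    enum State { UNASSIGNED, RUNNING, COMMIT_PENDING, SUCCEEDED, FAILED, KILLED }

    // Target states that a FAILED task must never move to.
    private static final Set<State> FORBIDDEN_AFTER_FAILED =
        EnumSet.of(State.UNASSIGNED, State.RUNNING, State.COMMIT_PENDING, State.SUCCEEDED);

    /**
     * Returns false for FAILED -> UNASSIGNED/RUNNING/COMMIT_PENDING/SUCCEEDED.
     * Every other transition is accepted; this sketch does not check them.
     */
    static boolean isAllowed(State from, State to) {
        return !(from == State.FAILED && FORBIDDEN_AFTER_FAILED.contains(to));
    }
}
```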
HADOOP-5241. Fixes a bug in disk-space resource estimation. Makes the estimation
formula linear where blowUp = Total-Output/Total-Input. (Sharad Agarwal via ddas)
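A minimal sketch of the linear estimate, assuming the expected output of a task scales with its input by the observed blowUp ratio. LinearOutputEstimator and its methods are hypothetical names, and the fallback ratio of 1.0 before any map completes is an assumption made only for illustration.

```java
// Hypothetical sketch of a linear disk-space estimate in the spirit of HADOOP-5241.
final class LinearOutputEstimator {
    private long totalInputBytes;   // input bytes consumed by completed maps
    private long totalOutputBytes;  // output bytes produced by completed maps

    /** Records the input and output sizes of one completed map task. */
    void recordCompletedMap(long inputBytes, long outputBytes) {
        totalInputBytes += inputBytes;
        totalOutputBytes += outputBytes;
    }

    /** blowUp = Total-Output / Total-Input; assumed to be 1.0 until a map completes. */
    double blowUp() {
        return totalInputBytes == 0 ? 1.0 : (double) totalOutputBytes / totalInputBytes;
    }

    /** Linear estimate: expected output grows in proportion to the task's input. */
    long estimateOutputBytes(long taskInputBytes) {
        return Math.round(taskInputBytes * blowUp());
    }
}
```

For example, after maps with (input, output) sizes of (100, 250) and (300, 750), blowUp is 1000/400 = 2.5, so a task reading 40 bytes is estimated to write 100.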
HADOOP-5233. Addresses three issues: a race condition in updating status,
an NPE in TaskTracker task localization when the conf file is missing
(HADOOP-5234), and an NPE in handling the KillTaskAction of a cleanup task
(HADOOP-5235). (Amareshwari Sriramadasu via ddas)
HADOOP-5247. Introduces a broadcast of KillJobAction to all trackers when
a job finishes. This fixes several NPEs that occurred when a completed job
was no longer in memory and a tasktracker sent the jobtracker a status
report for a task belonging to that job. (Amar Kamat via ddas)
HADOOP-5146. Fixes a race condition that causes LocalDirAllocator to miss
files. (Devaraj Das via yhemanth)
HADOOP-4638. Fixes job recovery so that a problem with a single job file
does not crash the JobTracker. (Amar Kamat via yhemanth)
HADOOP-5384. Fix a problem where DataNodeCluster creates blocks with
generationStamp == 1. (szetszwo)
HADOOP-5376. Fixes the code handling lost tasktrackers to set the task state
to KILLED_UNCLEAN only for the relevant types of tasks.
(Amareshwari Sriramadasu via yhemanth)
HADOOP-5285. Fixes three issues: (1) obtainTaskCleanupTask checks whether the
job is inited before trying to lock the JobInProgress; (2) the CleanupQueue
class is moved out of the TaskTracker and made a generic class that the
JobTracker also uses to delete paths on the job's output fs; (3) the
references to completedJobStore are moved outside the block where the
JobTracker is locked. (ddas)
HADOOP-5392. Fixes a problem where the JobTracker crashed during recovery
when the job files were garbled. (Amar Kamat via ddas)
HADOOP-5421. Removes the test TestRecoveryManager.java from the 0.19 branch
as it has compilation issues. (ddas)
HADOOP-5332. Appending to files is not allowed (by default) unless
dfs.support.append is set to true. (dhruba)
HADOOP-5333. libhdfs supports appending to files. (dhruba)
HADOOP-3998. Fix a DFSClient exception when the JVM is shut down. (dhruba)
HADOOP-5440. Fixes a problem with removing a taskId from the list of
taskIds managed by the TaskTracker's TaskMemoryManager.
(Amareshwari Sriramadasu via ddas)
HADOOP-5446. Restore TaskTracker metrics. (cdouglas)
HADOOP-5449. Fixes the history cleaner thread.
(Amareshwari Sriramadasu via ddas)
HADOOP-5479. NameNode should not send empty block replication requests to
DataNodes. (hairong)
HADOOP-5522. Documents the setup/cleanup tasks in the mapred tutorial.
(Amareshwari Sriramadasu via ddas)
HADOOP-5549. ReplicationMonitor should schedule both replication and
deletion work in one iteration. (hairong)
HADOOP-5554. DataNodeCluster and CreateEditsLog should create blocks with
the same generation stamp value. (hairong via szetszwo)
HADOOP-5231. Clones the TaskStatus before passing it to the JobInProgress.
(Amareshwari Sriramadasu via ddas)
HADOOP-4719. Fix documentation of 'ls' format for FsShell. (Ravi Phulari
via cdouglas)
HADOOP-5374. Fixes an NPE in the getTasksToSave method.
(Amareshwari Sriramadasu via ddas)
HADOOP-4780. Cache the size of directories in DistributedCache, avoiding
long delays in recalculating it. (He Yongqiang via cdouglas)
HADOOP-5551. Prevent directory destruction on file create.
(Brian Bockelman via shv)
HADOOP-5671. Fix FNF exceptions when copying from old versions of
HftpFileSystem. (Tsz Wo (Nicholas), SZE via cdouglas)
HADOOP-5213. Fix a NullPointerException that occurred when bzip2 compression
was used and the user closed an output stream without writing any data.
(Zheng Shao via dhruba)
HADOOP-5579. Set errno correctly in libhdfs for permission, quota, and FNF
conditions. (Brian Bockelman via cdouglas)
HADOOP-5728. Fixed FSEditLog.printStatistics IndexOutOfBoundsException.
(Wang Xu via johan)
HADOOP-5816. Fixes an ArrayIndexOutOfBoundsException in
KeyFieldBasedComparator. (He Yongqiang via ddas)
HADOOP-5951. Add Apache license header to StorageInfo.java. (Suresh
Srinivas via szetszwo)
Release 0.19.1 - 2009-02-23
INCOMPATIBLE CHANGES
HADOOP-5225. Workaround for tmp file handling in HDFS. As a result, sync()
is incomplete. Committed only to 0.19.x. (Raghu Angadi)
HADOOP-5224. HDFS append() is disabled. It throws
UnsupportedOperationException. Committed only to 0.19.x. (Raghu Angadi)
IMPROVEMENTS
HADOOP-4739. Fix spelling and grammar, improve phrasing of some sections in
mapred tutorial. (Vivek Ratan via cdouglas)
HADOOP-3894. DFSClient logging improvements. (Steve Loughran via shv)
HADOOP-5126. Remove empty file BlocksWithLocations.java (shv)
HADOOP-5127. Remove public methods in FSDirectory. (Jakob Homan via shv)
BUG FIXES
HADOOP-4697. Fix getBlockLocations in KosmosFileSystem to handle multiple
blocks correctly. (Sriram Rao via cdouglas)
HADOOP-4420. Add null checks for job, caused by invalid job IDs.
(Aaron Kimball via tomwhite)
HADOOP-4632. Fix TestJobHistoryVersion to use test.build.dir instead of the
current working directory for scratch space. (Amar Kamat via cdouglas)
HADOOP-4508. Fix FSDataOutputStream.getPos() for append. (dhruba via
szetszwo)
HADOOP-4727. Fix a group checking bug in fill_stat_structure(...) in
fuse-dfs. (Brian Bockelman via szetszwo)
HADOOP-4731. Fix capacity scheduler to correctly remove job on completion
from waiting queue. (Amar Kamat via yhemanth)
HADOOP-4836. Correct typos in mapred related documentation. (Jordà Polo
via szetszwo)
HADOOP-4821. The usage description in the Quotas guide documentation is
incorrect. (Boris Shkolnik via hairong)
HADOOP-4847. Moves the loading of OutputCommitter to the Task.
(Amareshwari Sriramadasu via ddas)
HADOOP-4966. Marks completed setup tasks for removal.
(Amareshwari Sriramadasu via ddas)
HADOOP-4992. Fixes a package name problem introduced by HADOOP-4847.
(Amareshwari Sriramadasu via ddas)
HADOOP-4982. TestFsck should run in Eclipse. (shv)
HADOOP-5008. TestReplication#testPendingReplicationRetry leaves an open
fd unclosed. (hairong)
HADOOP-4906. Fix TaskTracker OOM by keeping a shallow copy of JobConf in
TaskTracker.TaskInProgress. (Sharad Agarwal via acmurthy)
HADOOP-4918. Fix bzip2 compression to work with Sequence Files.
(Zheng Shao via dhruba).
HADOOP-4965. TestFileAppend3 should close FileSystem. (shv)
HADOOP-4967. Fixes a race condition in the JvmManager to do with killing
tasks. (ddas)
HADOOP-5002. Fixes a problem to do with the order of initialization of
reduce task and instantiating the reducer class.
(Amareshwari Sriramadasu via ddas)
HADOOP-5009. DataNode#shutdown sometimes leaves data block scanner
verification log unclosed. (hairong)
HADOOP-5086. Use the appropriate FileSystem for trash URIs. (cdouglas)
HADOOP-4955. Make DBOutputFormat use column names from setOutput().
(Kevin Peterson via enis)
HADOOP-4862. Minor : HADOOP-3678 did not remove all the cases of
spurious IOExceptions logged by DataNode. (Raghu Angadi)
HADOOP-5034. NameNode should send both replication and deletion requests
to DataNode in one reply to a heartbeat. (hairong)
HADOOP-5156. TestHeartbeatHandling uses MiniDFSCluster.getNamesystem(),
which does not exist in branches 0.19 and 0.20. (hairong)
HADOOP-4759. Removes the temporary output directory for failed and
killed tasks by launching special CLEANUP tasks for them.
(Amareshwari Sriramadasu via ddas)
HADOOP-5161. Accepted sockets do not get placed in
DataXceiverServer#childSockets. (hairong)
HADOOP-5193. Correct calculation of edits modification time. (shv)
HADOOP-4494. Allow libhdfs to append to files.
(Pete Wyckoff via dhruba)
HADOOP-5166. Fix JobTracker restart to work when ACLs are configured
for the JobTracker. (Amar Kamat via yhemanth).
HADOOP-5067. Fixes TaskInProgress.java to keep track of count of failed and
killed tasks correctly. (Amareshwari Sriramadasu via ddas)
HADOOP-4760. HDFS streams should not throw exceptions when closed twice.
(enis)
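A common way to make close() idempotent is to track a closed flag, which matches the java.io.Closeable contract that closing an already-closed stream has no effect. CloseOnceOutputStream is a hypothetical wrapper for illustration, not one of the HDFS stream classes.

```java
import java.io.IOException;
import java.io.OutputStream;

// Hypothetical wrapper: a second close() is a no-op, but writes after close fail.
final class CloseOnceOutputStream extends OutputStream {
    private final OutputStream out;
    private boolean closed;

    CloseOnceOutputStream(OutputStream out) {
        this.out = out;
    }

    private void ensureOpen() throws IOException {
        if (closed) {
            throw new IOException("Stream closed");
        }
    }

    @Override
    public void write(int b) throws IOException {
        ensureOpen();
        out.write(b);
    }

    @Override
    public void close() throws IOException {
        if (closed) {
            return; // already closed: nothing to do, no exception
        }
        closed = true; // mark first so a failing close is not retried
        out.close();
    }
}
```

Setting the flag before delegating means a close() that fails is not retried on the next call; a real stream might prefer to retry, which is a design choice this sketch does not settle.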
Release 0.19.0 - 2008-11-18
INCOMPATIBLE CHANGES
HADOOP-3595. Remove deprecated methods for mapred.combine.once
functionality, which was necessary to provide backwards-compatible
combiner semantics for 0.18. (cdouglas via omalley)
HADOOP-3667. Remove the following deprecated methods from JobConf:
addInputPath(Path)
getInputPaths()
getMapOutputCompressionType()
getOutputPath()
getSystemDir()
setInputPath(Path)
setMapOutputCompressionType(CompressionType style)
setOutputPath(Path)
(Amareshwari Sriramadasu via omalley)
HADOOP-3652. Remove deprecated class OutputFormatBase.
(Amareshwari Sriramadasu via cdouglas)
HADOOP-2885. Break the hadoop.dfs package into separate packages under
hadoop.hdfs that reflect whether they are client, server, protocol,
etc. DistributedFileSystem and DFSClient have moved and are now
considered package private. (Sanjay Radia via omalley)
HADOOP-2325. Require Java 6. (cutting)
HADOOP-372. Add support for multiple input paths with a different
InputFormat and Mapper for each path. (Chris Smith via tomwhite)
HADOOP-1700. Support appending to file in HDFS. (dhruba)
HADOOP-3792. Make FsShell -test consistent with unix semantics, returning
zero for true and non-zero for false. (Ben Slusky via cdouglas)
HADOOP-3664. Remove the deprecated method InputFormat.validateInput,
which is no longer needed. (tomwhite via omalley)
HADOOP-3549. Give more meaningful errno's in libhdfs. In particular,
EACCES is returned for permission problems. (Ben Slusky via omalley)
HADOOP-4036. ResourceStatus was added to TaskTrackerStatus by HADOOP-3759,
so increment the InterTrackerProtocol version. (Hemanth Yamijala via
omalley)
HADOOP-3150. Moves task promotion to tasks. Defines a new interface for
committing output files. Moves job setup to jobclient, and moves job cleanup
to a separate task. (Amareshwari Sriramadasu via ddas)
HADOOP-3446. Keep map outputs in memory during the reduce. Remove
fs.inmemory.size.mb and replace it with properties defining in-memory map
output retention during the shuffle and reduce relative to maximum heap
usage. (cdouglas)
HADOOP-3245. Adds support for JobTracker restart. Running
jobs can be recovered from the history file. The history file format has
been modified to support recovery. The task attempt ID now includes the
JobTracker start time to distinguish attempts of the same TIP across
restarts. (Amar Ramesh Kamat via ddas)
HADOOP-4007. REMOVE DFSFileInfo - FileStatus is sufficient.
(Sanjay Radia via hairong)
HADOOP-3722. Fixed Hadoop Streaming and Hadoop Pipes to use the Tool
interface and GenericOptionsParser. (Enis Soztutar via acmurthy)
HADOOP-2816. The cluster summary on the name node web UI reports space
utilization as:
Configured Capacity: capacity of all the data directories - Reserved space
Present Capacity: space available for DFS, i.e. remaining + used space
DFS Used%: DFS used space / Present Capacity
(Suresh Srinivas via hairong)
HADOOP-3938. Disk space quotas for HDFS. This is similar to namespace
quotas in 0.18. (rangadi)
HADOOP-4293. Make Configuration Writable and remove unreleased
WritableJobConf. Configuration.write is renamed to writeXml. (omalley)
HADOOP-4281. Change dfsadmin to report available disk space in a format
consistent with the web interface as defined in HADOOP-2816. (Suresh
Srinivas via cdouglas)
HADOOP-4430. Further change the cluster summary on the name node web UI
that was changed in HADOOP-2816:
Non DFS Used - the disk space taken by non-DFS files out of the
Configured Capacity
DFS Used % - DFS Used as a percentage of Configured Capacity
DFS Remaining % - Remaining percentage of Configured Capacity available
for DFS use
The DFS command line report reflects the same change. The config parameter
dfs.datanode.du.pct is no longer used and is removed from
hadoop-default.xml. (Suresh Srinivas via hairong)
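The definitions above imply that the configured capacity splits into DFS used, non-DFS used, and DFS remaining space. The sketch below assumes Non DFS Used = Configured Capacity - DFS Used - DFS Remaining; ClusterSummary is a hypothetical class, not the NameNode's implementation.

```java
// Hypothetical illustration of the HADOOP-4430 cluster-summary figures.
final class ClusterSummary {
    final long configuredCapacity; // capacity of all data dirs minus reserved space
    final long dfsUsed;            // bytes used by DFS
    final long dfsRemaining;       // bytes still available for DFS

    ClusterSummary(long configuredCapacity, long dfsUsed, long dfsRemaining) {
        this.configuredCapacity = configuredCapacity;
        this.dfsUsed = dfsUsed;
        this.dfsRemaining = dfsRemaining;
    }

    /** Assumed: whatever is neither DFS used nor DFS remaining is taken by non-DFS files. */
    long nonDfsUsed() {
        return configuredCapacity - dfsUsed - dfsRemaining;
    }

    /** DFS Used % of Configured Capacity. */
    double dfsUsedPercent() {
        return percentOfConfigured(dfsUsed);
    }

    /** DFS Remaining % of Configured Capacity. */
    double dfsRemainingPercent() {
        return percentOfConfigured(dfsRemaining);
    }

    private double percentOfConfigured(long bytes) {
        return configuredCapacity == 0 ? 0.0 : 100.0 * bytes / configuredCapacity;
    }
}
```

With a configured capacity of 1000 bytes, 250 DFS used and 600 remaining, this reports 150 bytes of non-DFS use, 25% DFS used and 60% DFS remaining.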
HADOOP-4116. Balancer should provide better resource management. (hairong)
NEW FEATURES
HADOOP-3341. Allow streaming jobs to specify the field separator for map
and reduce input and output. The new configuration values are:
stream.map.input.field.separator
stream.map.output.field.separator
stream.reduce.input.field.separator
stream.reduce.output.field.separator
All of them default to "\t". (Zheng Shao via omalley)
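As an illustration of how a configured separator might be applied, the sketch below splits a line into key and value at the first occurrence of the separator and treats the whole line as the key when the separator is absent. This first-occurrence rule is an assumption modelled on streaming's default behaviour; StreamingLineSplitter is a hypothetical name.

```java
// Hypothetical helper: split a streaming record at the first field separator.
final class StreamingLineSplitter {
    // The stream.*.field.separator properties above all default to a tab.
    static final String DEFAULT_SEPARATOR = "\t";

    /** Returns {key, value}; the value keeps any later separators intact. */
    static String[] splitKeyValue(String line, String separator) {
        int idx = line.indexOf(separator);
        if (idx < 0) {
            return new String[] {line, ""}; // no separator: whole line is the key
        }
        return new String[] {
            line.substring(0, idx),
            line.substring(idx + separator.length())
        };
    }
}
```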
HADOOP-3479. Defines the configuration file for the resource manager in
Hadoop. You can configure various parameters related to scheduling, such
as queues and queue properties, here. The properties for a queue follow a
naming convention such as hadoop.rm.queue.queue-name.property-name.
(Hemanth Yamijala via ddas)
HADOOP-3149. Adds a way in which map/reduce tasks can create multiple
outputs. (Alejandro Abdelnur via ddas)
HADOOP-3714. Add a new contrib, bash-tab-completion, which enables
bash tab completion for the bin/hadoop script. See the README file
in the contrib directory for the installation. (Chris Smith via enis)
HADOOP-3730. Adds a new JobConf constructor that disables loading
default configurations. (Alejandro Abdelnur via ddas)
HADOOP-3772. Add a new Hadoop Instrumentation api for the JobTracker and
the TaskTracker, refactor Hadoop Metrics as an implementation of the api.
(Ari Rabkin via acmurthy)
HADOOP-2302. Provides a comparator for numerical sorting of key fields.
(ddas)
HADOOP-153. Provides a way to skip bad records. (Sharad Agarwal via ddas)
HADOOP-657. Free disk space should be modelled and used by the scheduler
to make scheduling decisions. (Ari Rabkin via omalley)
HADOOP-3719. Initial checkin of Chukwa, which is a data collection and
analysis framework. (Jerome Boulon, Andy Konwinski, Ari Rabkin,
and Eric Yang)
HADOOP-3873. Add -filelimit and -sizelimit options to distcp to cap the
number of files/bytes copied in a particular run to support incremental
updates and mirroring. (TszWo (Nicholas), SZE via cdouglas)
HADOOP-3585. FailMon package for hardware failure monitoring and
analysis of anomalies. (Ioannis Koltsidas via dhruba)
HADOOP-1480. Add counters to the C++ Pipes API. (acmurthy via omalley)
HADOOP-3854. Add support for pluggable servlet filters in the HttpServers.
(Tsz Wo (Nicholas) Sze via omalley)
HADOOP-3759. Provides ability to run memory intensive jobs without
affecting other running tasks on the nodes. (Hemanth Yamijala via ddas)
HADOOP-3746. Add a fair share scheduler. (Matei Zaharia via omalley)
HADOOP-3754. Add a thrift interface to access HDFS. (dhruba via omalley)
HADOOP-3828. Provides a way to write skipped records to DFS.
(Sharad Agarwal via ddas)
HADOOP-3948. Separate name-node edits and fsimage directories.
(Lohit Vijayarenu via shv)
HADOOP-3939. Add an option to DistCp to delete files at the destination
not present at the source. (Tsz Wo (Nicholas) Sze via cdouglas)
HADOOP-3601. Add a new contrib module for Hive, which is an SQL-like
query processing tool that uses map/reduce. (Ashish Thusoo via omalley)
HADOOP-3866. Added sort and multi-job updates in the JobTracker web ui.
(Craig Weisenfluh via omalley)
HADOOP-3698. Add access control to control who is allowed to submit or
modify jobs in the JobTracker. (Hemanth Yamijala via omalley)
HADOOP-1869. Support access times for HDFS files. (dhruba)
HADOOP-3941. Extend FileSystem API to return file-checksums.
(szetszwo)
HADOOP-3581. Prevents memory intensive user tasks from taking down
nodes. (Vinod K V via ddas)
HADOOP-3970. Provides a way to recover counters written to JobHistory.
(Amar Kamat via ddas)
HADOOP-3702. Adds ChainMapper and ChainReducer classes that allow composing
chains of Maps and Reduces in a single Map/Reduce job, something like
MAP+ / REDUCE MAP*. (Alejandro Abdelnur via ddas)
HADOOP-3445. Add capacity scheduler that provides guaranteed capacities to
queues as a percentage of the cluster. (Vivek Ratan via omalley)
HADOOP-3992. Add a synthetic load generation facility to the test
directory. (hairong via szetszwo)
HADOOP-3981. Implement a distributed file checksum algorithm in HDFS
and change DistCp to use file checksums for comparing src and dst files.
(szetszwo)
HADOOP-3829. Narrow down skipped records based on a user-acceptable value.
(Sharad Agarwal via ddas)
HADOOP-3930. Add common interfaces for the pluggable schedulers and the
cli & gui clients. (Sreekanth Ramakrishnan via omalley)
HADOOP-4176. Implement getFileChecksum(Path) in HftpFileSystem. (szetszwo)
HADOOP-249. Reuse JVMs across Map-Reduce Tasks.
Configuration changes to hadoop-default.xml:
add mapred.job.reuse.jvm.num.tasks
(Devaraj Das via acmurthy)
HADOOP-4070. Provide a mechanism in Hive for registering UDFs from the
query language. (tomwhite)
HADOOP-2536. Implement JDBC-based database input and output formats to
allow Map-Reduce applications to work with databases. (Fredrik Hedberg and
Enis Soztutar via acmurthy)
HADOOP-3019. A new library to support total order partitions.
(cdouglas via omalley)
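Total order partitioning is commonly done by binary search over a sorted list of split points, so that every key in partition i sorts before every key in partition i + 1. The sketch below is a simplified, hypothetical illustration of that idea (SplitPointPartitioner is not the library's class) and assumes the split points are strictly increasing.

```java
import java.util.Arrays;

// Hypothetical partitioner: n - 1 sorted split points define n ordered partitions.
final class SplitPointPartitioner {
    private final String[] splitPoints; // strictly increasing

    SplitPointPartitioner(String[] sortedSplitPoints) {
        this.splitPoints = sortedSplitPoints.clone();
    }

    int numPartitions() {
        return splitPoints.length + 1;
    }

    /**
     * Keys below splitPoints[0] go to partition 0; keys in
     * [splitPoints[i - 1], splitPoints[i]) go to partition i.
     */
    int getPartition(String key) {
        int pos = Arrays.binarySearch(splitPoints, key);
        // An exact match on split point pos starts partition pos + 1;
        // otherwise binarySearch returns -(insertionPoint) - 1.
        return pos >= 0 ? pos + 1 : -pos - 1;
    }
}
```

Because partitions are contiguous key ranges, concatenating the sorted outputs of partitions 0..n-1 yields a globally sorted result.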
HADOOP-3924. Added a 'KILLED' job status. (Subramaniam Krishnan via
acmurthy)
IMPROVEMENTS
HADOOP-4205. hive: metastore and ql to use the refactored SerDe library.
(zshao)
HADOOP-4106. libhdfs: add time, permission and user attribute support
(part 2). (Pete Wyckoff through zshao)
HADOOP-4104. libhdfs: add time, permission and user attribute support.
(Pete Wyckoff through zshao)
HADOOP-3908. libhdfs: better error message if libhdfs.so doesn't exist.
(Pete Wyckoff through zshao)
HADOOP-3732. Delay initialization of datanode block verification until
the verification thread is started. (rangadi)
HADOOP-1627. Various small improvements to 'dfsadmin -report' output.
(rangadi)
HADOOP-3577. Tools to inject blocks into name node and simulated
data nodes for testing. (Sanjay Radia via hairong)
HADOOP-2664. Add a lzop compatible codec, so that files compressed by lzop
may be processed by map/reduce. (cdouglas via omalley)
HADOOP-3655. Add additional ant properties to control junit. (Steve
Loughran via omalley)
HADOOP-3543. Update the copyright year to 2008. (cdouglas via omalley)
HADOOP-3587. Add a unit test for the contrib/data_join framework.
(cdouglas)
HADOOP-3402. Add terasort example program. (omalley)
HADOOP-3660. Add replication factor for injecting blocks in simulated
datanodes. (Sanjay Radia via cdouglas)
HADOOP-3684. Add a cloning function to the contrib/data_join framework
permitting users to define a more efficient method for cloning values from
the reduce than serialization/deserialization. (Runping Qi via cdouglas)
HADOOP-3478. Improves the handling of map output fetching. Now the
randomization is by the hosts (and not the map outputs themselves).
(Jothi Padmanabhan via ddas)
HADOOP-3617. Removes redundant accounting-space checks in MapTask and
makes the spill thread persistent so as to avoid creating a new one for
each spill. (Chris Douglas via acmurthy)
HADOOP-3412. Factor the scheduler out of the JobTracker and make
it pluggable. (Tom White and Brice Arnould via omalley)
HADOOP-3756. Minor. Remove unused dfs.client.buffer.dir from
hadoop-default.xml. (rangadi)
HADOOP-3747. Adds counter support for MultipleOutputs.
(Alejandro Abdelnur via ddas)
HADOOP-3169. LeaseChecker daemon should not be started in DFSClient
constructor. (TszWo (Nicholas), SZE via hairong)
HADOOP-3824. Move base functionality of StatusHttpServer to a core
package. (TszWo (Nicholas), SZE via cdouglas)
HADOOP-3646. Add a bzip2 compatible codec, so bzip compressed data
may be processed by map/reduce. (Abdul Qadeer via cdouglas)
HADOOP-3861. MapFile.Reader and Writer should implement Closeable.
(tomwhite via omalley)
HADOOP-3791. Introduce generics into ReflectionUtils. (Chris Smith via
cdouglas)
HADOOP-3694. Improve unit test performance by changing
MiniDFSCluster to listen only on 127.0.0.1. (cutting)
HADOOP-3620. Namenode should synchronously resolve a datanode's network
location when the datanode registers. (hairong)
HADOOP-3860. NNThroughputBenchmark is extended with rename and delete
benchmarks. (shv)
HADOOP-3892. Include unix group name in JobConf. (Matei Zaharia via johan)
HADOOP-3875. Change the time period between heartbeats to be relative to
the end of the heartbeat rpc, rather than the start. This causes better
behavior if the JobTracker is overloaded. (acmurthy via omalley)
HADOOP-3853. Move multiple input format (HADOOP-372) extension to
library package. (tomwhite via johan)
HADOOP-9. Use roulette scheduling for temporary space when the size
is not known. (Ari Rabkin via omalley)
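Roulette (fitness-proportionate) selection picks an item with probability proportional to its weight. The sketch below assumes the weight is each directory's available space; RouletteDirChooser is a hypothetical name and not the LocalDirAllocator API.

```java
import java.util.Random;

// Hypothetical roulette-wheel choice of a temporary directory.
final class RouletteDirChooser {
    /**
     * Returns index i with probability availableBytes[i] / total, or -1 if
     * no directory has free space. Negative entries are treated as zero.
     */
    static int choose(long[] availableBytes, Random random) {
        long total = 0;
        for (long bytes : availableBytes) {
            total += Math.max(0L, bytes);
        }
        if (total == 0) {
            return -1;
        }
        // Spin the wheel: a point in [0, total), clamped against rounding.
        long point = Math.min(total - 1, (long) (random.nextDouble() * total));
        for (int i = 0; i < availableBytes.length; i++) {
            point -= Math.max(0L, availableBytes[i]);
            if (point < 0) {
                return i; // the point fell inside directory i's slice
            }
        }
        throw new IllegalStateException("unreachable: point is always < total");
    }
}
```

Unlike always picking the emptiest directory, this spreads writes of unknown size across all directories while still favouring those with more room.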
HADOOP-3202. Use recursive delete rather than FileUtil.fullyDelete.
(Amareshwari Sriramadasu via omalley)
HADOOP-3368. Remove common-logging.properties from conf. (Steve Loughran
via omalley)
HADOOP-3851. Fix spelling mistake in FSNamesystemMetrics. (Steve Loughran
via omalley)
HADOOP-3780. Remove asynchronous resolution of network topology in the
JobTracker. (Amar Kamat via omalley)
HADOOP-3852. Add ShellCommandExecutor.toString method to make nicer
error messages. (Steve Loughran via omalley)
HADOOP-3844. Include message of local exception in RPC client failures.
(Steve Loughran via omalley)
HADOOP-3935. Split out inner classes from DataNode.java. (johan)
HADOOP-3905. Create generic interfaces for edit log streams. (shv)
HADOOP-3062. Add metrics to DataNode and TaskTracker to record network
traffic for HDFS reads/writes and MR shuffling. (cdouglas)
HADOOP-3742. Remove HDFS from the public javadoc and add javadoc-dev for
generating javadoc for developers. (Sanjay Radia via omalley)
HADOOP-3944. Improve documentation for public TupleWritable class in
join package. (Chris Douglas via enis)
HADOOP-2330. Preallocate HDFS transaction log to improve performance.
(dhruba and hairong)
HADOOP-3965. Convert DataBlockScanner into a package private class. (shv)
HADOOP-3488. Prevent hadoop-daemon from rsync'ing log files. (Stefan
Groshupf and Craig Macdonald via omalley)
HADOOP-3342. Change the kill task actions to require http post instead of
get to prevent accidental crawls from triggering it. (enis via omalley)
HADOOP-3937. Limit the job name in the job history filename to 50
characters. (Matei Zaharia via omalley)
HADOOP-3943. Remove unnecessary synchronization in
NetworkTopology.pseudoSortByDistance. (hairong via omalley)
HADOOP-3498. File globbing alternation should be able to span path
components. (tomwhite)
HADOOP-3361. Implement renames for NativeS3FileSystem.
(Albert Chern via tomwhite)
HADOOP-3605. Make EC2 scripts show an error message if AWS_ACCOUNT_ID is
unset. (Al Hoang via tomwhite)
HADOOP-4147. Remove unused class JobWithTaskContext from class
JobInProgress. (Amareshwari Sriramadasu via johan)
HADOOP-4151. Add a byte-comparable interface that both Text and
BytesWritable implement. (cdouglas via omalley)
HADOOP-4174. Move fs image/edit log methods from ClientProtocol to
NamenodeProtocol. (shv via szetszwo)
HADOOP-4181. Include a .gitignore and saveVersion.sh change to support
developing under git. (omalley)
HADOOP-4186. Factor LineReader out of LineRecordReader. (tomwhite via
omalley)
HADOOP-4184. Break the module dependencies between core, hdfs, and
mapred. (tomwhite via omalley)
HADOOP-4075. test-patch.sh now spits out ant commands that it runs.
(Ramya R via nigel)
HADOOP-4117. Improve configurability of Hadoop EC2 instances.
(tomwhite)
HADOOP-2411. Add support for larger CPU EC2 instance types.
(Chris K Wensel via tomwhite)
HADOOP-4083. Changed the configuration attribute queue.name to
mapred.job.queue.name. (Hemanth Yamijala via acmurthy)
HADOOP-4194. Added the JobConf and JobID to job-related methods in
JobTrackerInstrumentation for better metrics. (Mac Yang via acmurthy)
HADOOP-3975. Change the test-patch script to report the working dir
modifications that prevent the suite from being run. (Ramya R via cdouglas)
HADOOP-4124. Added a command-line switch to allow users to set job
priorities, also allow it to be manipulated via the web-ui. (Hemanth
Yamijala via acmurthy)
HADOOP-2165. Augmented JobHistory to include the URIs to the tasks'
userlogs. (Vinod Kumar Vavilapalli via acmurthy)
HADOOP-4062. Remove the synchronization on the output stream when a
connection is closed and also remove an undesirable exception when
a client is stopped while there is no pending RPC request. (hairong)
HADOOP-4227. Remove the deprecated class org.apache.hadoop.fs.ShellCommand.
(szetszwo)
HADOOP-4006. Clean up FSConstants and move some of the constants to
better places. (Sanjay Radia via rangadi)
HADOOP-4279. Trace the seeds of random sequences in append unit tests to
make intermittent failures reproducible. (szetszwo via cdouglas)
HADOOP-4209. Remove the change to the format of task attempt id by
incrementing the task attempt numbers by 1000 when the job restarts.
(Amar Kamat via omalley)
HADOOP-4301. Adds forrest doc for the skip bad records feature.
(Sharad Agarwal via ddas)
HADOOP-4354. Separate TestDatanodeDeath.testDatanodeDeath() into 4 tests.
(szetszwo)
HADOOP-3790. Add more unit tests for testing HDFS file append. (szetszwo)
HADOOP-4321. Include documentation for the capacity scheduler. (Hemanth
Yamijala via omalley)
HADOOP-4424. Change menu layout for Hadoop documentation (Boris Shkolnik
via cdouglas).
HADOOP-4438. Update forrest documentation to include missing FsShell
commands. (Suresh Srinivas via cdouglas)
HADOOP-4105. Add forrest documentation for libhdfs.
(Pete Wyckoff via cutting)
HADOOP-4510. Make getTaskOutputPath public. (Chris Wensel via omalley)
OPTIMIZATIONS
HADOOP-3556. Removed lock contention in MD5Hash by replacing the
singleton MessageDigest with a per-thread instance using
ThreadLocal. (Iván de Prado via omalley)
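A minimal sketch of the per-thread approach using ThreadLocal. MessageDigest is not thread-safe, so each thread gets its own instance and no lock is needed. PerThreadMd5 is a hypothetical class, not Hadoop's MD5Hash, and ThreadLocal.withInitial requires Java 8 or later.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical per-thread MD5 helper: no shared digester, so no lock.
final class PerThreadMd5 {
    private static final ThreadLocal<MessageDigest> DIGESTER =
        ThreadLocal.withInitial(() -> {
            try {
                return MessageDigest.getInstance("MD5");
            } catch (NoSuchAlgorithmException e) {
                throw new IllegalStateException("MD5 not available", e);
            }
        });

    static byte[] md5(byte[] data) {
        MessageDigest digester = DIGESTER.get();
        digester.reset(); // discard any state left by an earlier interrupted use
        return digester.digest(data);
    }

    static String md5Hex(String text) {
        StringBuilder hex = new StringBuilder();
        for (byte b : md5(text.getBytes(StandardCharsets.UTF_8))) {
            hex.append(String.format("%02x", b & 0xff));
        }
        return hex.toString();
    }
}
```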
HADOOP-3328. When client is writing data to DFS, only the last
datanode in the pipeline needs to verify the checksum. Saves around
30% CPU on intermediate datanodes. (rangadi)
HADOOP-3863. Use a thread-local string encoder rather than a static one
that is protected by a lock. (acmurthy via omalley)
HADOOP-3864. Prevent the JobTracker from locking up when a job is being
initialized. (acmurthy via omalley)
HADOOP-3816. Faster directory listing in KFS. (Sriram Rao via omalley)
HADOOP-2130. Pipes submit job should have both blocking and non-blocking
versions. (acmurthy via omalley)
HADOOP-3769. Make the SampleMapper and SampleReducer from
GenericMRLoadGenerator public, so they can be used in other contexts.
(Lingyun Yang via omalley)
HADOOP-3514. Inline the CRCs in intermediate files as opposed to reading
them from a separate .crc file. (Jothi Padmanabhan via ddas)
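One way to inline a checksum is to append a CRC32 trailer to the data it covers and verify it on read, instead of keeping it in a separate .crc file. The sketch below is hypothetical (InlineChecksum is not the IFile format) and uses a 4-byte big-endian trailer as an assumed layout.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;
import java.util.zip.CRC32;

// Hypothetical inline checksum: payload followed by its CRC32.
final class InlineChecksum {
    /** Returns data followed by a 4-byte big-endian CRC32 of data. */
    static byte[] appendCrc(byte[] data) {
        CRC32 crc = new CRC32();
        crc.update(data, 0, data.length);
        return ByteBuffer.allocate(data.length + 4)
            .put(data)
            .putInt((int) crc.getValue())
            .array();
    }

    /** Checks the trailing CRC32 and returns the payload; throws on mismatch. */
    static byte[] verifyAndStrip(byte[] framed) {
        if (framed.length < 4) {
            throw new IllegalArgumentException("too short for a CRC trailer");
        }
        int payloadLength = framed.length - 4;
        CRC32 crc = new CRC32();
        crc.update(framed, 0, payloadLength);
        int stored = ByteBuffer.wrap(framed, payloadLength, 4).getInt();
        if (stored != (int) crc.getValue()) {
            throw new IllegalStateException("checksum mismatch");
        }
        return Arrays.copyOf(framed, payloadLength);
    }
}
```

Keeping the checksum in the same file means a reader needs one stream and one sequential pass instead of an extra open and seek on a side file.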
HADOOP-3638. Caches the iFile index files in memory to reduce seeks.
(Jothi Padmanabhan via ddas)
HADOOP-4225. FSEditLog.logOpenFile() should persist accessTime
rather than modificationTime. (shv)
HADOOP-4380. Made several new classes (Child, JVMId,
JobTrackerInstrumentation, QueueManager, ResourceEstimator,
TaskTrackerInstrumentation, and TaskTrackerMetricsInst) in
org.apache.hadoop.mapred package private instead of public. (omalley)
BUG FIXES
HADOOP-3563. Refactor the distributed upgrade code so that it is
easier to identify datanode and namenode related code. (dhruba)
HADOOP-3640. Fix the read method in the NativeS3InputStream. (tomwhite via
omalley)
HADOOP-3711. Fixes the Streaming input parsing to properly find the
separator. (Amareshwari Sriramadasu via ddas)
HADOOP-3725. Prevent TestMiniMRMapDebugScript from swallowing exceptions.
(Steve Loughran via cdouglas)
HADOOP-3726. Throw exceptions from TestCLI setup and teardown instead of
swallowing them. (Steve Loughran via cdouglas)
HADOOP-3721. Refactor CompositeRecordReader and related mapred.join classes
to make them clearer. (cdouglas)
HADOOP-3720. Re-read the config file when dfsadmin -refreshNodes is invoked
so dfs.hosts and dfs.hosts.exclude are observed. (lohit vijayarenu via
cdouglas)
HADOOP-3485. Allow writing to files over fuse.
(Pete Wyckoff via dhruba)
HADOOP-3723. The flags to the libhdfs.create call can be treated as
a bitmask. (Pete Wyckoff via dhruba)
HADOOP-3643. Filter out completed tasks when asking for running tasks in
the JobTracker web/ui. (Amar Kamat via omalley)
HADOOP-3777. Ensure that Lzo compressors/decompressors correctly handle the
case where native libraries aren't available. (Chris Douglas via acmurthy)
HADOOP-3728. Fix SleepJob so that it doesn't depend on temporary files;
this ensures that more than one instance of SleepJob can now run
simultaneously. (Chris Douglas via acmurthy)
HADOOP-3795. Fix saving image files on Namenode with different checkpoint
stamps. (Lohit Vijayarenu via mahadev)
HADOOP-3624. Improve CreateEditsLog to create a tree directory structure.
(Lohit Vijayarenu via mahadev)
HADOOP-3778. DFSInputStream.seek() did not retry in case of some errors.
(LN via rangadi)
HADOOP-3661. Moving files deleted through fuse-dfs to the Trash now
behaves the same way as in the dfs shell.
(Pete Wyckoff via dhruba)
HADOOP-3819. Unset LANG and LC_CTYPE in saveVersion.sh to make it
compatible with non-English locales. (Rong-En Fan via cdouglas)
HADOOP-3848. Cache calls to getSystemDir in the TaskTracker instead of
calling it for each task start. (acmurthy via omalley)
HADOOP-3131. Fix reduce progress reporting for compressed intermediate
data. (Matei Zaharia via acmurthy)
HADOOP-3796. fuse-dfs configuration is implemented as file system
mount options. (Pete Wyckoff via dhruba)
HADOOP-3836. Fix TestMultipleOutputs to correctly clean up. (Alejandro
Abdelnur via acmurthy)
HADOOP-3805. Improve fuse-dfs write performance.
(Pete Wyckoff via zshao)
HADOOP-3846. Fix unit test CreateEditsLog to generate paths correctly.
(Lohit Vijayarenu via cdouglas)
HADOOP-3904. Fix unit tests using the old dfs package name.
(TszWo (Nicholas), SZE via johan)
HADOOP-3319. Fix some HOD error messages to go stderr instead of
stdout. (Vinod Kumar Vavilapalli via omalley)
HADOOP-3907. Move INodeDirectoryWithQuota to its own .java file.
(Tsz Wo (Nicholas), SZE via hairong)
HADOOP-3919. Fix attribute name in hadoop-default for
mapred.jobtracker.instrumentation. (Ari Rabkin via omalley)
HADOOP-3903. Change the package name for the servlets to be hdfs instead of
dfs. (Tsz Wo (Nicholas) Sze via omalley)
HADOOP-3773. Change Pipes to set the default map output key and value
types correctly. (Koji Noguchi via omalley)
HADOOP-3952. Fix compilation error in TestDataJoin referencing dfs package.
(omalley)
HADOOP-3951. Fix package name for FSNamesystem logs and modify other
hard-coded Logs to use the class name. (cdouglas)
HADOOP-3889. Improve error reporting from HftpFileSystem, handling in
DistCp. (Tsz Wo (Nicholas), SZE via cdouglas)
HADOOP-3946. Fix TestMapRed after HADOOP-3664. (tomwhite via omalley)
HADOOP-3949. Remove duplicate jars from Chukwa. (Jerome Boulon via omalley)
HADOOP-3933. DataNode sometimes sends up to io.bytes.per.checksum more
bytes than required to the client. (Ning Li via rangadi)
HADOOP-3962. Shell command "fs -count" should support paths with different
file systems. (Tsz Wo (Nicholas), SZE via mahadev)
HADOOP-3957. Fix javac warnings in DistCp and TestCopyFiles. (Tsz Wo
(Nicholas), SZE via cdouglas)
HADOOP-3958. Fix TestMapRed to check the success of test-job. (omalley via
acmurthy)
HADOOP-3985. Fix TestHDFSServerPorts to use random ports. (Hairong Kuang
via omalley)
HADOOP-3964. Fix javadoc warnings introduced by FailMon. (dhruba)
HADOOP-3785. Fix FileSystem cache to be case-insensitive for scheme and
authority. (Bill de hOra via cdouglas)
HADOOP-3506. Fix a rare NPE caused by error handling in S3. (Tom White via
cdouglas)
HADOOP-3705. Fix mapred.join parser to accept InputFormats named with
underscore and static, inner classes. (cdouglas)
HADOOP-4023. Fix javadoc warnings introduced when the HDFS javadoc was
made private. (omalley)
HADOOP-4030. Remove lzop from the default list of codecs. (Arun Murthy via
cdouglas)
HADOOP-3961. Fix task disk space requirement estimates for virtual
input jobs. Delays limiting task placement until after 10% of the maps
have finished. (Ari Rabkin via omalley)
HADOOP-2168. Fix problem with C++ record reader's progress not being
reported to framework. (acmurthy via omalley)
HADOOP-3966. Copy findbugs generated output files to PATCH_DIR while
running test-patch. (Ramya R via lohit)
HADOOP-4037. Fix the eclipse plugin for versions of kfs and log4j. (nigel
via omalley)
HADOOP-3950. Cause the Mini MR cluster to wait for task trackers to
register before continuing. (enis via omalley)
HADOOP-3910. Remove unused ClusterTestDFSNamespaceLogging and
ClusterTestDFS. (Tsz Wo (Nicholas), SZE via cdouglas)
HADOOP-3954. Disable record skipping by default. (Sharad Agarwal via
cdouglas)
HADOOP-4050. Fix TestFairScheduler to use absolute paths for the work
directory. (Matei Zaharia via omalley)
HADOOP-4069. Keep temporary test files from TestKosmosFileSystem under
test.build.data instead of /tmp. (lohit via omalley)
HADOOP-4078. Create test files for TestKosmosFileSystem in separate
directory under test.build.data. (lohit)
HADOOP-3968. Fix getFileBlockLocations calls to use FileStatus instead
of Path reflecting the new API. (Pete Wyckoff via lohit)
HADOOP-3963. libhdfs does not exit on its own; instead it returns an error
to the caller and behaves as a true library. (Pete Wyckoff via dhruba)
HADOOP-4100. Removes the cleanupTask scheduling from the Scheduler
implementations and moves it to the JobTracker.
(Amareshwari Sriramadasu via ddas)
HADOOP-4097. Make hive work well with speculative execution turned on.
(Joydeep Sen Sarma via dhruba)
HADOOP-4113. Changes to libhdfs to not exit on its own, rather return
an error code to the caller. (Pete Wyckoff via dhruba)
HADOOP-4054. Remove duplicate lease removal during edit log loading.
(hairong)
HADOOP-4071. FSNameSystem.isReplicationInProgress should add an
underReplicated block to the neededReplication queue using method
"add" not "update". (hairong)
HADOOP-4154. Fix type warnings in WritableUtils. (szetszwo via omalley)
HADOOP-4133. Log files generated by Hive should reside in the
build directory. (Prasad Chakka via dhruba)
HADOOP-4094. Hive now has hive-default.xml and hive-site.xml similar
to core hadoop. (Prasad Chakka via dhruba)
HADOOP-4112. Handles cleanupTask in JobHistory
(Amareshwari Sriramadasu via ddas)
HADOOP-3831. Very slow reading clients sometimes failed while reading.
(rangadi)
HADOOP-4155. Use JobTracker's start time while initializing JobHistory's
JobTracker Unique String. (lohit)
HADOOP-4099. Fix null pointer when using HFTP from an 0.18 server.
(dhruba via omalley)
HADOOP-3570. Includes user-specified libjar files in the client-side
classpath. (Sharad Agarwal via ddas)
HADOOP-4129. Changed memory limits of TaskTracker and Tasks to be in
kilobytes rather than bytes. (Vinod Kumar Vavilapalli via acmurthy)
HADOOP-4139. Optimize Hive multi group-by.
(Namit Jain via dhruba)
HADOOP-3911. Add a check to fsck options to make sure -files is not
the first option to resolve conflicts with GenericOptionsParser
(lohit)
HADOOP-3623. Refactor LeaseManager. (szetszwo)
HADOOP-4125. Handles Reduce cleanup tip on the web ui.
(Amareshwari Sriramadasu via ddas)
HADOOP-4087. Hive Metastore API for php and python clients.
(Prasad Chakka via dhruba)
HADOOP-4197. Update DATA_TRANSFER_VERSION for HADOOP-3981. (szetszwo)
HADOOP-4138. Refactor the Hive SerDe library to better structure
the interfaces to the serializer and de-serializer.
(Zheng Shao via dhruba)
HADOOP-4195. Close compressor before returning to codec pool.
(acmurthy via omalley)
HADOOP-2403. Escapes some special characters before logging to
history files. (Amareshwari Sriramadasu via ddas)
HADOOP-4200. Fix a bug in the test-patch.sh script.
(Ramya R via nigel)
HADOOP-4084. Add explain plan capabilities to Hive Query Language.
(Ashish Thusoo via dhruba)
HADOOP-4121. Preserve cause for exception if the initialization of
HistoryViewer for JobHistory fails. (Amareshwari Sri Ramadasu via
acmurthy)
HADOOP-4213. Fixes NPE in TestLimitTasksPerJobTaskScheduler.
(Sreekanth Ramakrishnan via ddas)
HADOOP-4077. Setting access and modification time for a file
requires write permissions on the file. (dhruba)
HADOOP-3592. Fix a couple of possible file leaks in FileUtil
(Bill de hOra via rangadi)
HADOOP-4120. Hive interactive shell records the time taken by a
query. (Raghotham Murthy via dhruba)
HADOOP-4090. The hive scripts pick up hadoop from HADOOP_HOME
and then the path. (Raghotham Murthy via dhruba)
HADOOP-4242. Remove extra ";" in FSDirectory that blocks compilation
in some IDEs. (szetszwo via omalley)
HADOOP-4249. Fix eclipse path to include the hsqldb.jar. (szetszwo via
omalley)
HADOOP-4247. Move InputSampler into org.apache.hadoop.mapred.lib, so that
examples.jar doesn't depend on tools.jar. (omalley)
HADOOP-4269. Fix the deprecation of LineReader by extending the new class
into the old name and deprecating it. Also update the tests to test the
new class. (cdouglas via omalley)
HADOOP-4280. Fix conversions between seconds in C and milliseconds in
Java for access times for files. (Pete Wyckoff via rangadi)
HADOOP-4254. -setSpaceQuota command does not convert "TB" extension to
terabytes properly. Implementation now uses StringUtils for parsing this.
(Raghu Angadi)
HADOOP-4259. Findbugs should run over tools.jar also. (cdouglas via
omalley)
HADOOP-4275. Move public method isJobValidName from JobID to a private
method in JobTracker. (omalley)
HADOOP-4173. Fix failures in TestProcfsBasedProcessTree and
TestTaskTrackerMemoryManager tests. ProcfsBasedProcessTree and
memory management in TaskTracker are disabled on Windows.
(Vinod K V via rangadi)
HADOOP-4189. Fixes the history blocksize & intertracker protocol version
issues introduced as part of HADOOP-3245. (Amar Kamat via ddas)
HADOOP-4190. Fixes the backward compatibility issue with Job History
introduced by HADOOP-3245 and HADOOP-2403. (Amar Kamat via ddas)
HADOOP-4237. Fixes the TestStreamingBadRecords.testNarrowDown testcase.
(Sharad Agarwal via ddas)
HADOOP-4274. Capacity scheduler accidentally modifies the underlying
data structures when browsing the job lists. (Hemanth Yamijala via omalley)
HADOOP-4309. Fix eclipse-plugin compilation. (cdouglas)
HADOOP-4232. Fix race condition in JVM reuse when multiple slots become
free. (ddas via acmurthy)
HADOOP-4302. Fix a race condition in TestReduceFetch that can yield false
negatives. (cdouglas)
HADOOP-3942. Update distcp documentation to include features introduced in
HADOOP-3873, HADOOP-3939. (Tsz Wo (Nicholas), SZE via cdouglas)
HADOOP-4319. fuse-dfs dfs_read function returns as many bytes as it is
told to read unless end-of-file is reached. (Pete Wyckoff via dhruba)
HADOOP-4246. Ensure we have the correct lower bound on the number of
retries for fetching map-outputs; also fixed, for small jobs, the case where
the reducer automatically kills itself when too many unique map-outputs
could not be fetched. (Amareshwari Sri Ramadasu via acmurthy)
HADOOP-4163. Report FSErrors from map output fetch threads instead of
merely logging them. (Sharad Agarwal via cdouglas)
HADOOP-4261. Adds a setup task for jobs. This is required so that we
don't set up jobs that haven't been initialized yet (since init could lead
to job failure). Only after the init has successfully happened do we
launch the setupJob task. (Amareshwari Sriramadasu via ddas)
HADOOP-4256. Removes Completed and Failed Job tables from
jobqueue_details.jsp. (Sreekanth Ramakrishnan via ddas)
HADOOP-4267. Occasional exceptions during HSQLDB shutdown are logged
but not rethrown. (enis)
HADOOP-4018. The number of tasks for a single job cannot exceed a
pre-configured maximum value. (dhruba)
HADOOP-4288. Fixes a NPE problem in CapacityScheduler.
(Amar Kamat via ddas)
HADOOP-4014. Create hard links with 'fsutil hardlink' on Windows. (shv)
HADOOP-4393. Merged org.apache.hadoop.fs.permission.AccessControlException
and org.apache.hadoop.security.AccessControlIOException into a single
class hadoop.security.AccessControlException. (omalley via acmurthy)
HADOOP-4287. Fixes an issue to do with maintaining counts of running/pending
maps/reduces. (Sreekanth Ramakrishnan via ddas)
HADOOP-4361. Makes sure that jobs killed from command line are killed
fast (i.e., there is a slot to run the cleanup task soon).
(Amareshwari Sriramadasu via ddas)
HADOOP-4400. Add "hdfs://" to fs.default.name on quickstart.html.
(Jeff Hammerbacher via omalley)
HADOOP-4378. Fix TestJobQueueInformation to use SleepJob rather than
WordCount via TestMiniMRWithDFS. (Sreekanth Ramakrishnan via acmurthy)
HADOOP-4376. Fix formatting in hadoop-default.xml for
hadoop.http.filter.initializers. (Enis Soztutar via acmurthy)
HADOOP-4410. Adds an extra arg to the API FileUtil.makeShellPath to
determine whether to canonicalize file paths or not.
(Amareshwari Sriramadasu via ddas)
HADOOP-4236. Ensure un-initialized jobs are killed correctly on
user-demand. (Sharad Agarwal via acmurthy)
HADOOP-4373. Fix calculation of Guaranteed Capacity for the
capacity-scheduler. (Hemanth Yamijala via acmurthy)
HADOOP-4053. Schedulers must be notified when jobs complete. (Amar Kamat via omalley)
HADOOP-4335. Fix FsShell -ls for filesystems without owners/groups. (David
Phillips via cdouglas)
HADOOP-4426. TestCapacityScheduler broke due to the two commits HADOOP-4053
and HADOOP-4373. This patch fixes that. (Hemanth Yamijala via ddas)
HADOOP-4418. Updates documentation in forrest for Mapred, streaming and pipes.
(Amareshwari Sriramadasu via ddas)
HADOOP-3155. Ensure that there is only one thread fetching
TaskCompletionEvents on TaskTracker re-init. (Dhruba Borthakur via
acmurthy)
HADOOP-4425. Fix EditLogInputStream to overload the bulk read method.
(cdouglas)
HADOOP-4427. Adds the new queue/job commands to the manual.
(Sreekanth Ramakrishnan via ddas)
HADOOP-4278. Increase debug logging for unit test TestDatanodeDeath.
Fix the case when primary is dead. (dhruba via szetszwo)
HADOOP-4423. Keep block length when the block recovery is triggered by
append. (szetszwo)
HADOOP-4449. Fix dfsadmin usage. (Raghu Angadi via cdouglas)
HADOOP-4455. Added TestSerDe so that unit tests can run successfully.
(Ashish Thusoo via dhruba)
HADOOP-4457. Fixes an input split logging problem introduced by
HADOOP-3245. (Amareshwari Sriramadasu via ddas)
HADOOP-4464. Separate out TestFileCreationClient from TestFileCreation.
(Tsz Wo (Nicholas), SZE via cdouglas)
HADOOP-4404. saveFSImage() removes files from a storage directory that do
not correspond to its type. (shv)
HADOOP-4149. Fix handling of updates to the job priority, by changing the
list of jobs to be keyed by the priority, submit time, and job tracker id.
(Amar Kamat via omalley)
HADOOP-4296. Fix job client failures by not retiring a job as soon as it
is finished. (dhruba)
HADOOP-4439. Remove configuration variables that aren't usable yet, in
particular mapred.tasktracker.tasks.maxmemory and mapred.task.max.memory.
(Hemanth Yamijala via omalley)
HADOOP-4230. Fix for serde2 interface, limit operator, select * operator,
UDF trim functions and sampling. (Ashish Thusoo via dhruba)
HADOOP-4358. No need to truncate access time in INode. Also fixes NPE
in CreateEditsLog. (Raghu Angadi)
HADOOP-4387. TestHDFSFileSystemContract fails on windows nightly builds.
(Raghu Angadi)
HADOOP-4466. Ensure that SequenceFileOutputFormat isn't tied to Writables
and can be used with other Serialization frameworks. (Chris Wensel via
acmurthy)
HADOOP-4525. Fix ipc.server.ipcnodelay originally missed in HADOOP-2232.
(cdouglas via Clint Morgan)
HADOOP-4498. Ensure that JobHistory correctly escapes the job name so that
regex patterns work. (Chris Wensel via acmurthy)
HADOOP-4446. Modify guaranteed capacity labels in capacity scheduler's UI
to reflect the information being displayed. (Sreekanth Ramakrishnan via
yhemanth)
HADOOP-4282. Some user facing URLs are not filtered by user filters.
(szetszwo)
HADOOP-4595. Fixes two race conditions - one to do with updating free slot count,
and another to do with starting the MapEventsFetcher thread. (ddas)
HADOOP-4552. Fix a deadlock in RPC server. (Raghu Angadi)
HADOOP-4471. Sort running jobs by priority in the capacity scheduler.
(Amar Kamat via yhemanth)
HADOOP-4500. Fix MultiFileSplit to get the FileSystem from the relevant
path rather than the JobClient. (Joydeep Sen Sarma via dhruba)
Release 0.18.4 - Unreleased
BUG FIXES
HADOOP-5114. Remove timeout for accept() in DataNode. This makes accept()
fail in JDK on Windows and causes many tests to fail. (Raghu Angadi)
HADOOP-5192. Block receiver should not remove a block that's created or
being written by other threads. (hairong)
HADOOP-5134. FSNamesystem#commitBlockSynchronization adds under-construction
block locations to blocksMap. (Dhruba Borthakur via hairong)
HADOOP-5412. Simulated DataNode should not write to a block that's being
written by another thread. (hairong)
HADOOP-5465. Fix the problem of blocks remaining under-replicated by
providing synchronized modification to the counter xmitsInProgress in
DataNode. (hairong)
HADOOP-5557. Fixes some minor problems in TestOverReplicatedBlocks.
(szetszwo)
HADOOP-5644. NameNode is stuck in safe mode. (Suresh Srinivas via hairong)
HADOOP-6017. Lease Manager in NameNode does not handle certain characters
in filenames. This results in fatal errors in Secondary NameNode and while
restarting NameNode. (Tsz Wo (Nicholas), SZE via rangadi)
Release 0.18.3 - 2009-01-27
IMPROVEMENTS
HADOOP-4150. Include librecordio in hadoop releases. (Giridharan Kesavan
via acmurthy)
BUG FIXES
HADOOP-4499. DFSClient should invoke checksumOk only once. (Raghu Angadi)
HADOOP-4597. Calculate mis-replicated blocks when safe-mode is turned
off manually. (shv)
HADOOP-3121. lsr should keep listing the remaining items but not
terminate if there is any IOException. (szetszwo)
HADOOP-4610. Always calculate mis-replicated blocks when safe-mode is
turned off. (shv)
HADOOP-3883. Limit namenode to assign at most one generation stamp for
a particular block within a short period. (szetszwo)
HADOOP-4556. Block went missing. (hairong)
HADOOP-4643. NameNode should exclude excessive replicas when counting
live replicas for a block. (hairong)
HADOOP-4703. Should not wait for proxy forever in lease recovering.
(szetszwo)
HADOOP-4647. NamenodeFsck should close the DFSClient it has created.
(szetszwo)
HADOOP-4616. Fuse-dfs can handle bad values from FileSystem.read call.
(Pete Wyckoff via dhruba)
HADOOP-4061. Throttle Datanode decommission monitoring in Namenode.
(szetszwo)
HADOOP-4659. Root cause of connection failure is being lost to code that
uses it for delaying startup. (Steve Loughran and Hairong via hairong)
HADOOP-4614. Lazily open segments when merging map spills to avoid using
too many file descriptors. (Yuri Pradkin via cdouglas)
HADOOP-4257. The DFS client should pick only one datanode as the candidate
to initiate lease recovery. (Tsz Wo (Nicholas), SZE via dhruba)
HADOOP-4713. Fix librecordio to handle records larger than 64k. (Christian
Kunz via cdouglas)
HADOOP-4635. Fix a memory leak in fuse dfs. (pete wyckoff via mahadev)
HADOOP-4714. Report status between merges and make the number of records
between progress reports configurable. (Jothi Padmanabhan via cdouglas)
HADOOP-4726. Fix documentation typos "the the". (Edward J. Yoon via
szetszwo)
HADOOP-4679. Datanode prints tons of log messages: waiting for threadgroup
to exit, active threads is XX. (hairong)
HADOOP-4746. Job output directory should be normalized. (hairong)
HADOOP-4717. Removal of default port# in NameNode.getUri() causes
map/reduce jobs to fail to promote temporary output. (hairong)
HADOOP-4778. Check for zero size block meta file when updating a block.
(szetszwo)
HADOOP-4742. Replica gets deleted by mistake. (Wang Xu via hairong)
HADOOP-4702. Failed block replication leaves an incomplete block in
receiver's tmp data directory. (hairong)
HADOOP-4613. Fix block browsing on Web UI. (Johan Oskarsson via shv)
HADOOP-4806. HDFS rename should not use src path as a regular expression.
(szetszwo)
HADOOP-4795. Prevent lease monitor getting into an infinite loop when
leases and the namespace tree do not match. (szetszwo)
HADOOP-4620. Fixes Streaming to correctly handle map/reduce jobs with empty
input/output. (Ravi Gummadi via ddas)
HADOOP-4857. Fixes TestUlimit to have exactly 1 map in the jobs spawned.
(Ravi Gummadi via ddas)
HADOOP-4810. Data lost at cluster startup time. (hairong)
HADOOP-4797. Improve how RPC server reads and writes large buffers. Avoids
soft-leak of direct buffers and excess copies in NIO layer. (Raghu Angadi)
HADOOP-4840. TestNodeCount sometimes fails with NullPointerException.
(hairong)
HADOOP-4904. Fix deadlock while leaving safe mode. (shv)
HADOOP-1980. 'dfsadmin -safemode enter' should prevent the namenode from
leaving safemode automatically. (shv)
HADOOP-4951. Lease monitor should acquire the LeaseManager lock but not the
Monitor lock. (szetszwo)
HADOOP-4935. processMisReplicatedBlocks() should not clear
excessReplicateMap. (shv)
HADOOP-4961. Fix ConcurrentModificationException in lease recovery
of empty files. (shv)
HADOOP-4971. A long (unexpected) delay at datanodes could cause subsequent
block reports from many datanodes to arrive at the same time. (Raghu Angadi)
HADOOP-4910. NameNode should exclude replicas when choosing excessive
replicas to delete to avoid data loss. (hairong)
HADOOP-4983. Fixes a problem in updating Counters in the status reporting.
(Amareshwari Sriramadasu via ddas)
HADOOP-4924. Fixes a race condition in TaskTracker re-init. (ddas)
Release 0.18.2 - 2008-11-03
BUG FIXES
HADOOP-3614. Fix a bug where the Datanode may use an old GenerationStamp to
get the meta file. (szetszwo)
HADOOP-4314. Simulated datanodes should not include blocks that are still
being written in their block report. (Raghu Angadi)
HADOOP-4228. dfs datanode metrics, bytes_read and bytes_written, overflow
due to incorrect type used. (hairong)
HADOOP-4395. The FSEditLog loading is incorrect for the case OP_SET_OWNER.
(szetszwo)
HADOOP-4351. FSNamesystem.getBlockLocationsInternal throws
ArrayIndexOutOfBoundsException. (hairong)
HADOOP-4403. Make TestLeaseRecovery and TestFileCreation more robust.
(szetszwo)
HADOOP-4292. Do not support append() for LocalFileSystem. (hairong)
HADOOP-4399. Make fuse-dfs multi-thread access safe.
(Pete Wyckoff via dhruba)
HADOOP-4369. Use setMetric(...) instead of incrMetric(...) for metrics
averages. (Brian Bockelman via szetszwo)
HADOOP-4469. Rename and add the ant task jar file to the tar file. (nigel)
HADOOP-3914. DFSClient sends Checksum Ok only once for a block.
(Christian Kunz via hairong)
HADOOP-4467. SerializationFactory now uses the current context ClassLoader
allowing for user supplied Serialization instances. (Chris Wensel via
acmurthy)
HADOOP-4517. Release FSDataset lock before joining ongoing create threads.
(szetszwo)
HADOOP-4526. fsck failing with NullPointerException. (hairong)
HADOOP-4483. Honor the max parameter in DatanodeDescriptor.getBlockArray(..)
(Ahad Rana and Hairong Kuang via szetszwo)
HADOOP-4340. Correctly set the exit code from JobShell.main so that the
'hadoop jar' command returns the right code to the user. (acmurthy)
NEW FEATURES
HADOOP-2421. Add jdiff output to documentation, listing all API
changes from the prior release. (cutting)
Release 0.18.1 - 2008-09-17
IMPROVEMENTS
HADOOP-3934. Upgrade log4j to 1.2.15. (omalley)
BUG FIXES
HADOOP-3995. In case of quota failure on HDFS, rename does not restore
source filename. (rangadi)
HADOOP-3821. Prevent SequenceFile and IFile from duplicating codecs in
CodecPool when closed more than once. (Arun Murthy via cdouglas)
HADOOP-4040. Remove hard-coded default of the IPC idle connection timeout
from the TaskTracker, which was causing HDFS client connections to not be
collected. (ddas via omalley)
HADOOP-4046. Made WritableComparator's constructor protected instead of
private to re-enable class derivation. (cdouglas via omalley)
HADOOP-3940. Fix in-memory merge condition to wait when there are no map
outputs or when the final map outputs are being fetched without contention.
(cdouglas)
Release 0.18.0 - 2008-08-19
INCOMPATIBLE CHANGES
HADOOP-2703. By default, fsck skips checking files
that are being written to. The output of fsck is incompatible
with the previous release. (lohit vijayarenu via dhruba)
HADOOP-2865. FsShell.ls() printout format changed to print file names
at the end of the line. (Edward J. Yoon via shv)
HADOOP-3283. The Datanode has an RPC server. It currently supports
two RPCs: the first RPC retrieves the metadata about a block and the
second RPC sets the generation stamp of an existing block.
(Tsz Wo (Nicholas), SZE via dhruba)
HADOOP-2797. Code related to upgrading to 0.14 (Block CRCs) is
removed. As a result, upgrade to 0.18 or later from 0.13 or earlier
is not supported. If upgrading from 0.13 or earlier is required,
please upgrade to an intermediate version (0.14-0.17) and then
to this version. (rangadi)
HADOOP-544. This issue introduces new classes JobID, TaskID and
TaskAttemptID, which should be used instead of their string counterparts.
Functions in JobClient, TaskReport, RunningJob, jobcontrol.Job and
TaskCompletionEvent that use string arguments are deprecated in favor
of the corresponding ones that use ID objects. Applications can use
xxxID.toString() and xxxID.forName() methods to convert/restore objects
to/from strings. (Enis Soztutar via ddas)
HADOOP-2188. RPC client sends a ping rather than throw timeouts.
RPC server does not throw away old RPCs. If clients and the server are on
different versions, they are not able to function well. In addition,
the property ipc.client.timeout is removed from the default hadoop
configuration. It also removes the metric RpcOpsDiscardedOPsNum. (hairong)
HADOOP-2181. This issue adds logging for input splits in Jobtracker log
and jobHistory log. Also adds web UI for viewing input splits in job UI
and history UI. (Amareshwari Sriramadasu via ddas)
HADOOP-3226. Run combiners multiple times over map outputs as they
are merged in both the map and the reduce tasks. (cdouglas via omalley)
HADOOP-3329. DatanodeDescriptor objects should not be stored in the
fsimage. (dhruba)
HADOOP-2656. The Block object has a generation stamp inside it.
Existing blocks get a generation stamp of 0. This is needed to support
appends. (dhruba)
HADOOP-3390. Removed deprecated ClientProtocol.abandonFileInProgress().
(Tsz Wo (Nicholas), SZE via rangadi)
HADOOP-3405. Made some map/reduce internal classes non-public:
MapTaskStatus, ReduceTaskStatus, JobSubmissionProtocol,
CompletedJobStatusStore. (enis via omalley)
HADOOP-3265. Removed deprecated API getFileCacheHints().
(Lohit Vijayarenu via rangadi)
HADOOP-3310. The namenode instructs the primary datanode to do lease
recovery. The block gets a new generation stamp.
(Tsz Wo (Nicholas), SZE via dhruba)
HADOOP-2909. Improve IPC idle connection management. Property
ipc.client.maxidletime is removed from the default configuration,
instead, it is defined as twice the ipc.client.connection.maxidletime.
A connection with outstanding requests won't be treated as idle.
(hairong)
HADOOP-3459. Change in the output format of dfs -ls to more closely match
/bin/ls. New format is: perm repl owner group size date name
(Mukund Madhugiri via omalley)
HADOOP-3113. An fsync invoked on an HDFS file really really
persists data! The datanode moves blocks in the tmp directory to
the real block directory on a datanode-restart. (dhruba)
HADOOP-3452. Change fsck to return non-zero status for a corrupt
FileSystem. (lohit vijayarenu via cdouglas)
HADOOP-3193. Include the address of the client that found the corrupted
block in the log. Also include a CorruptedBlocks metric to track the size
of the corrupted block map. (cdouglas)
HADOOP-3512. Separate out the tools into a tools jar. (omalley)
HADOOP-3598. Ensure that temporary task-output directories are not created
if they are not necessary e.g. for Maps with no side-effect files.
(acmurthy)
HADOOP-3665. Modify WritableComparator so that it only creates instances
of the keytype if the type does not define a WritableComparator. Calling
the superclass compare will throw a NullPointerException. Also define
a RawComparator for NullWritable and permit it to be written as a key
to SequenceFiles. (cdouglas)
HADOOP-3673. Avoid deadlock caused by DataNode RPC recoverBlock().
(Tsz Wo (Nicholas), SZE via rangadi)
NEW FEATURES
HADOOP-3074. Provides a UrlStreamHandler for DFS and other FS,
relying on FileSystem (taton)
HADOOP-2585. Name-node imports namespace data from a recent checkpoint
accessible via a NFS mount. (shv)
HADOOP-3061. Writable types for doubles and bytes. (Andrzej
Bialecki via omalley)
HADOOP-2857. Allow libhdfs to set jvm options. (Craig Macdonald
via omalley)
HADOOP-3317. Add default port for HDFS namenode. The port in
"hdfs:" URIs now defaults to 8020, so that one may simply use URIs
of the form "hdfs://example.com/dir/file". (cutting)
HADOOP-2019. Adds support for .tar, .tgz and .tar.gz files in
DistributedCache (Amareshwari Sriramadasu via ddas)
HADOOP-3058. Add FSNamesystem status metrics.
(Lohit Vijayarenu via rangadi)
HADOOP-1915. Allow users to specify counters via strings instead
of enumerations. (tomwhite via omalley)
HADOOP-2065. Delay invalidating corrupt replicas of a block until it
is removed from the under-replicated state. If all replicas are found to
be corrupt, retain all copies and mark the block as corrupt.
(Lohit Vijayarenu via rangadi)
HADOOP-3221. Adds org.apache.hadoop.mapred.lib.NLineInputFormat, which
splits files into splits each of N lines. N can be specified by
configuration property "mapred.line.input.format.linespermap", which
defaults to 1. (Amareshwari Sriramadasu via ddas)
HADOOP-3336. Direct a subset of annotated FSNamesystem calls for audit
logging. (cdouglas)
HADOOP-3400. A new API FileSystem.deleteOnExit() that facilitates
handling of temporary files in HDFS. (dhruba)
HADOOP-4. Add fuse-dfs to contrib, permitting one to mount an
HDFS filesystem on systems that support FUSE, e.g., Linux.
(Pete Wyckoff via cutting)
HADOOP-3246. Add FTPFileSystem. (Ankur Goel via cutting)
HADOOP-3250. Extend FileSystem API to allow appending to files.
(Tsz Wo (Nicholas), SZE via cdouglas)
HADOOP-3177. Implement Syncable interface for FileSystem.
(Tsz Wo (Nicholas), SZE via dhruba)
HADOOP-1328. Implement user counters in streaming. (tomwhite via
omalley)
HADOOP-3187. Quotas for namespace management. (Hairong Kuang via ddas)
HADOOP-3307. Support for Archives in Hadoop. (Mahadev Konar via ddas)
HADOOP-3460. Add SequenceFileAsBinaryOutputFormat to permit direct
writes of serialized data. (Koji Noguchi via cdouglas)
HADOOP-3230. Add ability to get counter values from command
line. (tomwhite via omalley)
HADOOP-930. Add support for native S3 files. (tomwhite via cutting)
HADOOP-3502. Quota API needs documentation in Forrest. (hairong)
HADOOP-3413. Allow SequenceFile.Reader to use serialization
framework. (tomwhite via omalley)
HADOOP-3541. Import of the namespace from a checkpoint documented
in hadoop user guide. (shv)
IMPROVEMENTS
HADOOP-3677. Simplify generation stamp upgrade by making it a
local upgrade on datanodes. Deleted distributed upgrade.
(rangadi)
HADOOP-2928. Remove deprecated FileSystem.getContentLength().
(Lohit Vijayarenu via rangadi)
HADOOP-3130. Make the connect timeout smaller for getFile.
(Amar Ramesh Kamat via ddas)
HADOOP-3160. Remove deprecated exists() from ClientProtocol and
FSNamesystem (Lohit Vijayarenu via rangadi)
HADOOP-2910. Throttle IPC Clients during bursts of requests or
server slowdown. Clients retry connection for up to 15 minutes
when socket connection times out. (hairong)
HADOOP-3295. Allow TextOutputFormat to use configurable separators.
(Zheng Shao via cdouglas).
HADOOP-3308. Improve QuickSort by excluding values equal to the pivot from the
partition. (cdouglas)
HADOOP-2461. Trim property names in configuration.
(Tsz Wo (Nicholas), SZE via shv)
HADOOP-2799. Deprecate o.a.h.io.Closable in favor of java.io.Closeable.
(Tsz Wo (Nicholas), SZE via cdouglas)
HADOOP-3345. Enhance the hudson-test-patch target to cleanup messages,
fix minor defects, and add eclipse plugin and python unit tests. (nigel)
HADOOP-3144. Improve robustness of LineRecordReader by defining a maximum
line length (mapred.linerecordreader.maxlength), thereby avoiding reading
too far into the following split. (Zheng Shao via cdouglas)
HADOOP-3334. Move lease handling from FSNamesystem into a separate class.
(Tsz Wo (Nicholas), SZE via rangadi)
HADOOP-3332. Reduces the amount of logging in Reducer's shuffle phase.
(Devaraj Das)
HADOOP-3355. Enhances Configuration class to accept hex numbers for getInt
and getLong. (Amareshwari Sriramadasu via ddas)
HADOOP-3350. Add an argument to distcp to permit the user to limit the
number of maps. (cdouglas)
HADOOP-3013. Add corrupt block reporting to fsck.
(lohit vijayarenu via cdouglas)
HADOOP-3377. Remove TaskRunner::replaceAll and replace with equivalent
String::replace. (Brice Arnould via cdouglas)
HADOOP-3398. Minor improvement to a utility function that participates
in backoff calculation. (cdouglas)
HADOOP-3381. Clear references when directories are deleted so that the
effect of memory leaks is not multiplied. (rangadi)
HADOOP-2867. Adds the task's CWD to its LD_LIBRARY_PATH.
(Amareshwari Sriramadasu via ddas)
HADOOP-3232. DU class runs the 'du' command in a separate thread so
that it does not block user. DataNode misses heartbeats in large
nodes otherwise. (Johan Oskarsson via rangadi)
HADOOP-3035. During block transfers between datanodes, the receiving
datanode can now report corrupt replicas received from src node to
the namenode. (Lohit Vijayarenu via rangadi)
HADOOP-3434. Retain the cause of the bind failure in Server::bind.
(Steve Loughran via cdouglas)
HADOOP-3429. Increases the size of the buffers used for the communication
for Streaming jobs. (Amareshwari Sriramadasu via ddas)
HADOOP-3486. Change default for initial block report to 0 seconds
and document it. (Sanjay Radia via omalley)
HADOOP-3448. Improve the text in the assertion making sure the
layout versions are consistent in the data node. (Steve Loughran
via omalley)
HADOOP-2095. Improve the Map-Reduce shuffle/merge by cutting down
buffer-copies; changed intermediate sort/merge to use the new IFile format
rather than SequenceFiles and compression of map-outputs is now
implemented by compressing the entire file rather than SequenceFile
compression. Shuffle also has been changed to use a simple byte-buffer
manager rather than the InMemoryFileSystem.
Configuration changes to hadoop-default.xml:
deprecated mapred.map.output.compression.type
(acmurthy)
HADOOP-236. JobTracker now refuses connection from a task tracker with a
different version number. (Sharad Agarwal via ddas)
HADOOP-3427. Improves the shuffle scheduler. It now waits for notifications
from shuffle threads when it has scheduled enough, before scheduling more.
(ddas)
HADOOP-2393. Moves the handling of dir deletions in the tasktracker to
a separate thread. (Amareshwari Sriramadasu via ddas)
HADOOP-3501. Deprecate InMemoryFileSystem. (cutting via omalley)
HADOOP-3366. Stall the shuffle while in-memory merge is in progress.
(acmurthy)
HADOOP-2916. Refactor src structure, but leave package structure alone.
(Raghu Angadi via mukund)
HADOOP-3492. Add forrest documentation for user archives.
(Mahadev Konar via hairong)
HADOOP-3467. Improve documentation for FileSystem::deleteOnExit.
(Tsz Wo (Nicholas), SZE via cdouglas)
HADOOP-3379. Documents stream.non.zero.exit.status.is.failure for Streaming.
(Amareshwari Sriramadasu via ddas)
HADOOP-3096. Improves documentation about the Task Execution Environment in
the Map-Reduce tutorial. (Amareshwari Sriramadasu via ddas)
HADOOP-2984. Add forrest documentation for DistCp. (cdouglas)
HADOOP-3406. Add forrest documentation for Profiling.
(Amareshwari Sriramadasu via ddas)
HADOOP-2762. Add forrest documentation for controls of memory limits on
hadoop daemons and Map-Reduce tasks. (Amareshwari Sriramadasu via ddas)
HADOOP-3535. Fix documentation and name of IOUtils.close to
reflect that it should only be used in cleanup contexts. (omalley)
HADOOP-3593. Updates the mapred tutorial. (ddas)
HADOOP-3547. Documents the way in which native libraries can be distributed
via the DistributedCache. (Amareshwari Sriramadasu via ddas)
HADOOP-3606. Updates the Streaming doc. (Amareshwari Sriramadasu via ddas)
HADOOP-3532. Add jdiff reports to the build scripts. (omalley)
HADOOP-3100. Develop tests to test the DFS command line interface. (mukund)
HADOOP-3688. Fix up HDFS docs. (Robert Chansler via hairong)
OPTIMIZATIONS
HADOOP-3274. The default constructor of BytesWritable creates an empty
byte array. (Tsz Wo (Nicholas), SZE via shv)
HADOOP-3272. Remove redundant copy of Block object in BlocksMap.
(Lohit Vijayarenu via shv)
HADOOP-3164. Reduce DataNode CPU usage by using FileChannel.transferTo().
On Linux, the DataNode uses one fifth as much CPU while serving data.
Results may vary on other platforms. (rangadi)
HADOOP-3248. Optimization of saveFSImage. (Dhruba via shv)
HADOOP-3297. Fetch more task completion events from the job
tracker and task tracker. (ddas via omalley)
HADOOP-3364. Faster image and log edits loading. (shv)
HADOOP-3369. Fast block processing during name-node startup. (shv)
HADOOP-1702. Reduce buffer copies when data is written to DFS.
DataNodes take 30% less CPU while writing data. (rangadi)
HADOOP-3095. Speed up split generation in the FileInputSplit,
especially for non-HDFS file systems. Deprecates
InputFormat.validateInput. (tomwhite via omalley)
HADOOP-3552. Add forrest documentation for Hadoop commands.
(Sharad Agarwal via cdouglas)
BUG FIXES
HADOOP-2905. 'fsck -move' triggers NPE in NameNode.
(Lohit Vijayarenu via rangadi)
Increment ClientProtocol.versionID missed by HADOOP-2585. (shv)
HADOOP-3254. Restructure internal namenode methods that process
heartbeats to use well-defined BlockCommand object(s) instead of
using the base java Object. (Tsz Wo (Nicholas), SZE via dhruba)
HADOOP-3176. Change lease record when an open-for-write file
gets renamed. (dhruba)
HADOOP-3269. Fix a case where the namenode fails to restart
while processing a lease record. (Tsz Wo (Nicholas), SZE via dhruba)
HADOOP-3282. Port issues in TestCheckpoint resolved. (shv)
HADOOP-3268. file:// URLs issue in TestUrlStreamHandler under Windows.
(taton)
HADOOP-3127. Deleting files in trash should really remove them.
(Brice Arnould via omalley)
HADOOP-3300. Fix locking of explicit locks in NetworkTopology.
(tomwhite via omalley)
HADOOP-3270. Constant DatanodeCommands are stored in static final
immutable variables for better code clarity.
(Tsz Wo (Nicholas), SZE via dhruba)
HADOOP-2793. Fix broken links for worst performing shuffle tasks in
the job history page. (Amareshwari Sriramadasu via ddas)
HADOOP-3313. Avoid unnecessary calls to System.currentTimeMillis
in RPC::Invoker. (cdouglas)
HADOOP-3318. Recognize "Darwin" as an alias for "Mac OS X" to
support Soylatte. (Sam Pullara via omalley)
HADOOP-3301. Fix misleading error message when S3 URI hostname
contains an underscore. (tomwhite via omalley)
HADOOP-3338. Fix Eclipse plugin to compile after HADOOP-544 was
committed. Updated all references to use the new JobID representation.
(taton via nigel)
HADOOP-3337. Loading FSEditLog was broken by HADOOP-3283 since it
changed Writable serialization of DatanodeInfo. This patch handles it.
(Tsz Wo (Nicholas), SZE via rangadi)
HADOOP-3101. Prevent JobClient from throwing an exception when printing
usage. (Edward J. Yoon via cdouglas)
HADOOP-3119. Update javadoc for Text::getBytes to better describe its
behavior. (Tim Nelson via cdouglas)
HADOOP-2294. Fix documentation in libhdfs to refer to the correct free
function. (Craig Macdonald via cdouglas)
HADOOP-3335. Prevent the libhdfs build from deleting the wrong
files on make clean. (cutting via omalley)
HADOOP-2930. Make {start,stop}-balancer.sh work even if hadoop-daemon.sh
is not in the PATH. (Spiros Papadimitriou via hairong)
HADOOP-3085. Catch Exception in metrics util classes to ensure that
misconfigured metrics don't prevent others from updating. (cdouglas)
HADOOP-3299. CompositeInputFormat should configure the sub-input
formats. (cdouglas via omalley)
HADOOP-3309. Lower io.sort.mb and fs.inmemory.size.mb for MiniMRDFSSort
unit test so it passes on Windows. (lohit vijayarenu via cdouglas)
HADOOP-3348. TestUrlStreamHandler should set URLStreamFactory after
DataNodes are initialized. (Lohit Vijayarenu via rangadi)
HADOOP-3371. Ignore InstanceAlreadyExistsException from
MBeanUtil::registerMBean. (lohit vijayarenu via cdouglas)
HADOOP-3349. A file rename was incorrectly changing the name inside a
lease record. (Tsz Wo (Nicholas), SZE via dhruba)
HADOOP-3365. Removes an unnecessary copy of the key from SegmentDescriptor
to MergeQueue. (Devaraj Das)
HADOOP-3388. Fix for TestDatanodeBlockScanner to handle blocks with
generation stamps in them. (dhruba)
HADOOP-3203. Fixes TaskTracker::localizeJob to pass correct file sizes
for the jarfile and the jobfile. (Amareshwari Sriramadasu via ddas)
HADOOP-3391. Fix a findbugs warning introduced by HADOOP-3248. (rangadi)
HADOOP-3393. Fix datanode shutdown to call DataBlockScanner::shutdown and
close its log, even if the scanner thread is not running. (lohit vijayarenu
via cdouglas)
HADOOP-3399. A debug message was logged at info level. (rangadi)
HADOOP-3396. TestDatanodeBlockScanner occasionally fails.
(Lohit Vijayarenu via rangadi)
HADOOP-3339. Some of the failures on the 3rd datanode in the DFS write
pipeline are not detected properly. This could lead to a hard failure of
the client's write operation. (rangadi)
HADOOP-3409. Namenode should save the root inode into fsimage. (hairong)
HADOOP-3296. Fix task cache to work for more than two levels in the cache
hierarchy. This also adds a new counter to track cache hits at levels
greater than two. (Amar Kamat via cdouglas)
HADOOP-3375. Lease paths were sometimes not removed from
LeaseManager.sortedLeasesByPath. (Tsz Wo (Nicholas), SZE via dhruba)
HADOOP-3424. Values returned by getPartition should be checked to
make sure they are in the range 0 to #reduces - 1. (cdouglas via
omalley)
HADOOP-3408. Change FSNamesystem to send its metrics as integers to
accommodate collectors that don't support long values. (lohit vijayarenu
via cdouglas)
HADOOP-3403. Fixes a problem in the JobTracker to do with handling of lost
tasktrackers. (Arun Murthy via ddas)
HADOOP-1318. Completed maps are not failed if the number of reducers is
zero. (Amareshwari Sriramadasu via ddas)
HADOOP-3351. Fixes the history viewer tool to not do huge StringBuffer
allocations. (Amareshwari Sriramadasu via ddas)
HADOOP-3419. Fixes TestFsck to wait for updates to happen before
checking results to make the test more reliable. (Lohit Vijaya
Renu via omalley)
HADOOP-3259. Makes failure to read system properties due to a
security manager non-fatal. (Edward Yoon via omalley)
HADOOP-3451. Update libhdfs to use FileSystem::getFileBlockLocations
instead of removed getFileCacheHints. (lohit vijayarenu via cdouglas)
HADOOP-3401. Update FileBench to set the new "mapred.work.output.dir"
property to work after HADOOP-3041. (cdouglas via omalley)
HADOOP-2669. DFSClient locks pendingCreates appropriately. (dhruba)
HADOOP-3410. Fix KFS implementation to return correct file
modification time. (Sriram Rao via cutting)
HADOOP-3340. Fix DFS metrics for BlocksReplicated, HeartbeatsNum, and
BlockReportsAverageTime. (lohit vijayarenu via cdouglas)
HADOOP-3435. Remove the assumption in the scripts that bash is at
/bin/bash and fix the test patch to require bash instead of sh.
(Brice Arnould via omalley)
HADOOP-3471. Fix spurious errors from TestIndexedSort and add additional
logging to let failures be reproducible. (cdouglas)
HADOOP-3443. Avoid copying map output across partitions when renaming a
single spill. (omalley via cdouglas)
HADOOP-3454. Fix Text::find to search only valid byte ranges. (Chad Whipkey
via cdouglas)
HADOOP-3417. Removes the static configuration variable
commandLineConfig from JobClient. Moves the cli parsing from
JobShell to GenericOptionsParser. Thus removes the class
org.apache.hadoop.mapred.JobShell. (Amareshwari Sriramadasu via
ddas)
HADOOP-2132. Only RUNNING/PREP jobs can be killed. (Jothi Padmanabhan
via ddas)
HADOOP-3476. Code cleanup in fuse-dfs.
(Peter Wyckoff via dhruba)
HADOOP-2427. Ensure that the cwd of completed tasks is cleaned-up
correctly on task-completion. (Amareshwari Sri Ramadasu via acmurthy)
HADOOP-2565. Remove DFSPath cache of FileStatus.
(Tsz Wo (Nicholas), SZE via hairong)
HADOOP-3326. Cleanup the local-fs and in-memory merge in the ReduceTask by
spawning only one thread each for the on-disk and in-memory merge.
(Sharad Agarwal via acmurthy)
HADOOP-3493. Fix TestStreamingFailure to use FileUtil.fullyDelete to
ensure correct cleanup. (Lohit Vijayarenu via acmurthy)
HADOOP-3455. Fix NPE in ipc.Client in case of connection failure and
improve its synchronization. (hairong)
HADOOP-3240. Fix a testcase to not create files in the current directory.
Instead, the file is created in the test directory. (Mahadev Konar via ddas)
HADOOP-3496. Fix failure in TestHarFileSystem.testArchives due to change
in HADOOP-3095. (tomwhite)
HADOOP-3135. Get the system directory from the JobTracker instead of from
the conf. (Subramaniam Krishnan via ddas)
HADOOP-3503. Fix a race condition when client and namenode start
simultaneous recovery of the same block. (dhruba & Tsz Wo
(Nicholas), SZE)
HADOOP-3440. Fixes DistributedCache to not create symlinks for paths which
don't have fragments even when createSymLink is true.
(Abhijit Bagri via ddas)
HADOOP-3463. Hadoop-daemons script should cd to $HADOOP_HOME. (omalley)
HADOOP-3489. Fix NPE in SafeModeMonitor. (Lohit Vijayarenu via shv)
HADOOP-3509. Fix NPE in FSNamesystem.close. (Tsz Wo (Nicholas), SZE via
shv)
HADOOP-3491. Name-node shutdown causes InterruptedException in
ResolutionMonitor. (Lohit Vijayarenu via shv)
HADOOP-3511. Fixes namenode image to not set the root's quota to an
invalid value when the quota was not saved in the image. (hairong)
HADOOP-3516. Ensure the JobClient in HadoopArchives is initialized
with a configuration. (Subramaniam Krishnan via omalley)
HADOOP-3513. Improve NNThroughputBenchmark log messages. (shv)
HADOOP-3519. Fix NPE in DFS FileSystem rename. (hairong via tomwhite)
HADOOP-3528. Metrics FilesCreated and files_deleted metrics
do not match. (Lohit via Mahadev)
HADOOP-3418. When a directory is deleted, any leases that point to files
in the subdirectory are removed. (Tsz Wo (Nicholas), SZE via dhruba)
HADOOP-3542. Disables the creation of the _logs directory for the archives
directory. (Mahadev Konar via ddas)
HADOOP-3544. Fixes a documentation issue for hadoop archives.
(Mahadev Konar via ddas)
HADOOP-3517. Fixes a problem in the reducer due to which the last InMemory
merge may be missed. (Arun Murthy via ddas)
HADOOP-3548. Fixes build.xml to copy all *.jar files to the dist.
(Owen O'Malley via ddas)
HADOOP-3363. Fix unformatted storage detection in FSImage. (shv)
HADOOP-3560. Fixes a problem to do with split creation in archives.
(Mahadev Konar via ddas)
HADOOP-3545. Fixes an overflow problem in archives.
(Mahadev Konar via ddas)
HADOOP-3561. Prevent the trash from deleting its parent directories.
(cdouglas)
HADOOP-3575. Fix the clover ant target after package refactoring.
(Nigel Daley via cdouglas)
HADOOP-3539. Fix the tool path in the bin/hadoop script under
cygwin. (Tsz Wo (Nicholas), Sze via omalley)
HADOOP-3520. TestDFSUpgradeFromImage triggers a race condition in the
Upgrade Manager. Fixed. (dhruba)
HADOOP-3586. Provide deprecated, backwards-compatible semantics for the
combiner to be run once and only once on each record. (cdouglas)
HADOOP-3533. Add deprecated methods to provide API compatibility
between 0.18 and 0.17. Remove the deprecated methods in trunk. (omalley)
HADOOP-3580. Fixes a problem to do with specifying a har as an input to
a job. (Mahadev Konar via ddas)
HADOOP-3333. Don't assign a task to a tasktracker that it failed to
execute earlier (used to happen in the case of lost tasktrackers where
the tasktracker would reinitialize and bind to a different port).
(Jothi Padmanabhan and Arun Murthy via ddas)
HADOOP-3534. Log IOExceptions that happen in closing the name
system when the NameNode shuts down. (Tsz Wo (Nicholas) Sze via omalley)
HADOOP-3546. TaskTracker re-initialization gets stuck in cleaning up.
(Amareshwari Sriramadasu via ddas)
HADOOP-3576. Fix NullPointerException when renaming a directory
to its subdirectory. (Tsz Wo (Nicholas), SZE via hairong)
HADOOP-3320. Fix NullPointerException in NetworkTopology.getDistance().
(hairong)
HADOOP-3569. KFS input stream read() now correctly reads 1 byte
instead of 4. (Sriram Rao via omalley)
HADOOP-3599. Fix JobConf::setCombineOnceOnly to modify the instance rather
than a parameter. (Owen O'Malley via cdouglas)
HADOOP-3590. Null pointer exception in JobTracker when the task tracker is
not yet resolved. (Amar Ramesh Kamat via ddas)
HADOOP-3603. Fix MapOutputCollector to spill when io.sort.spill.percent is
1.0 and to detect spills when emitted records write no data. (cdouglas)
HADOOP-3615. Set DatanodeProtocol.versionID to the correct value.
(Tsz Wo (Nicholas), SZE via cdouglas)
HADOOP-3559. Fix the libhdfs test script and config to work with the
current semantics. (lohit vijayarenu via cdouglas)
HADOOP-3480. Need to update Eclipse template to reflect current trunk.
(Brice Arnould via tomwhite)
HADOOP-3588. Fixed usability issues with archives. (mahadev)
HADOOP-3635. Uncaught exception in DataBlockScanner.
(Tsz Wo (Nicholas), SZE via hairong)
HADOOP-3639. Exception when closing DFSClient while multiple files are
open. (Benjamin Gufler via hairong)
HADOOP-3572. SetQuotas usage interface has some minor bugs. (hairong)
HADOOP-3649. Fix bug in removing blocks from the corrupted block map.
(Lohit Vijayarenu via shv)
HADOOP-3604. Work around a JVM synchronization problem observed while
retrieving the address of direct buffers from compression code by obtaining
a lock during this call. (Arun C Murthy via cdouglas)
HADOOP-3683. Fix dfs metrics to count file listings rather than files
listed. (lohit vijayarenu via cdouglas)
HADOOP-3597. Fix SortValidator to use filesystems other than the default as
input. Validation job still runs on default fs.
(Jothi Padmanabhan via cdouglas)
HADOOP-3693. Fix archives, distcp and native library documentation to
conform to style guidelines. (Amareshwari Sriramadasu via cdouglas)
HADOOP-3653. Fix test-patch target to properly account for Eclipse
classpath jars. (Brice Arnould via nigel)
HADOOP-3692. Fix documentation for Cluster setup and Quick start guides.
(Amareshwari Sriramadasu via ddas)
HADOOP-3691. Fix streaming and tutorial docs. (Jothi Padmanabhan via ddas)
HADOOP-3630. Fix NullPointerException in CompositeRecordReader from empty
sources. (cdouglas)
HADOOP-3706. Fix a ClassLoader issue in the mapred.join Parser that
prevents it from loading user-specified InputFormats.
(Jingkei Ly via cdouglas)
HADOOP-3718. Fix KFSOutputStream::write(int) to output a byte instead of
an int, per the OutputStream contract. (Sriram Rao via cdouglas)
HADOOP-3647. Add debug logs to help track down a very occasional,
hard-to-reproduce, bug in shuffle/merge on the reducer. (acmurthy)
HADOOP-3716. Prevent listStatus in KosmosFileSystem from returning
null for valid, empty directories. (Sriram Rao via cdouglas)
HADOOP-3752. Fix audit logging to record rename events. (cdouglas)
HADOOP-3737. Fix CompressedWritable to call Deflater::end to release
compressor memory. (Grant Glouser via cdouglas)
HADOOP-3670. Fixes JobTracker to clear out split bytes when no longer
required. (Amareshwari Sriramadasu via ddas)
HADOOP-3755. Update gridmix to work with HOD 0.4. (Runping Qi via cdouglas)
HADOOP-3743. Fix -libjars, -files, -archives options to work even if
user code does not implement Tool. (Amareshwari Sriramadasu via mahadev)
HADOOP-3774. Fix typos in shell output. (Tsz Wo (Nicholas), SZE via
cdouglas)
HADOOP-3762. Fixed FileSystem cache to work with the default port. (cutting
via omalley)
HADOOP-3798. Fix tests compilation. (Mukund Madhugiri via omalley)
HADOOP-3794. Return modification time instead of zero for KosmosFileSystem.
(Sriram Rao via cdouglas)
HADOOP-3806. Remove debug statement to stdout from QuickSort. (cdouglas)
HADOOP-3776. Fix NPE at NameNode when datanode reports a block after it is
deleted at NameNode. (rangadi)
HADOOP-3537. Disallow adding a datanode to a network topology when its
network location is not resolved. (hairong)
HADOOP-3571. Fix bug in block removal used in lease recovery. (shv)
HADOOP-3645. MetricsTimeVaryingRate returns wrong value for
metric_avg_time. (Lohit Vijayarenu via hairong)
HADOOP-3521. Restored the cast to float, removed by HADOOP-544, when
sending Counters' values to Hadoop metrics. (acmurthy)
HADOOP-3820. Fixes two problems in the gridmix-env - a syntax error, and a
wrong definition of USE_REAL_DATASET by default. (Arun Murthy via ddas)
HADOOP-3724. Fixes two problems related to storing and recovering leases
in the fsimage. (dhruba)
HADOOP-3827. Fixed compression of empty map-outputs. (acmurthy)
HADOOP-3865. Remove reference to FSNamesystem from metrics preventing
garbage collection. (Lohit Vijayarenu via cdouglas)
HADOOP-3884. Fix so that Eclipse plugin builds against recent
Eclipse releases. (cutting)
HADOOP-3837. Streaming jobs report progress status. (dhruba)
HADOOP-3897. Fix an NPE in the secondary namenode. (Lohit Vijayarenu via
cdouglas)
HADOOP-3901. Fix bin/hadoop to correctly set classpath under cygwin.
(Tsz Wo (Nicholas) Sze via omalley)
HADOOP-3947. Fix a problem in tasktracker reinitialization.
(Amareshwari Sriramadasu via ddas)
Release 0.17.3 - Unreleased
IMPROVEMENTS
HADOOP-4164. Chinese translation of the documentation. (Xuebing Yan via
omalley)
BUG FIXES
HADOOP-4277. Checksum verification was mistakenly disabled for
LocalFileSystem. (Raghu Angadi)
HADOOP-4271. Checksum input stream can sometimes return invalid
data to the user. (Ning Li via rangadi)
HADOOP-4318. DistCp should use absolute paths for cleanup. (szetszwo)
HADOOP-4326. ChecksumFileSystem does not override create(...) correctly.
(szetszwo)
Release 0.17.2 - 2008-08-11
BUG FIXES
HADOOP-3678. Avoid spurious exceptions logged at DataNode when clients
read from DFS. (rangadi)
HADOOP-3707. NameNode keeps a count of number of blocks scheduled
to be written to a datanode and uses it to avoid allocating more
blocks than a datanode can hold. (rangadi)
HADOOP-3760. Fix a bug with HDFS file close() mistakenly introduced
by HADOOP-3681. (Lohit Vijayarenu via rangadi)
HADOOP-3681. DFSClient can get into an infinite loop while closing
a file if there are some errors. (Lohit Vijayarenu via rangadi)
HADOOP-3002. Hold off block removal while in safe mode. (shv)
HADOOP-3685. Unbalanced replication target. (hairong)
HADOOP-3758. Shutdown datanode on version mismatch instead of retrying
continuously, preventing excessive logging at the namenode.
(lohit vijayarenu via cdouglas)
HADOOP-3633. Correct exception handling in DataXceiveServer, and throttle
the number of xceiver threads in a data-node. (shv)
HADOOP-3370. Ensure that the TaskTracker.runningJobs data-structure is
correctly cleaned-up on task completion. (Zheng Shao via acmurthy)
HADOOP-3813. Fix task-output clean-up on HDFS to use the recursive
FileSystem.delete rather than the FileUtil.fullyDelete. (Amareshwari
Sri Ramadasu via acmurthy)
HADOOP-3859. Allow the maximum number of xceivers in the data node to
be configurable. (Johan Oskarsson via omalley)
HADOOP-3931. Fix corner case in the map-side sort that causes some values
to be counted as too large and causes premature spills to disk. Some values
will also bypass the combiner incorrectly. (cdouglas via omalley)
Release 0.17.1 - 2008-06-23
INCOMPATIBLE CHANGES
HADOOP-3565. Fix the Java serialization, which is not enabled by
default, to clear the state of the serializer between objects.
(tomwhite via omalley)
IMPROVEMENTS
HADOOP-3522. Improve documentation on reduce pointing out that
input keys and values will be reused. (omalley)
HADOOP-3487. Balancer uses thread pools for managing its threads and
therefore provides better resource management. (hairong)
BUG FIXES
HADOOP-2159. Namenode stuck in safemode. The counter blockSafe should
not be decremented for invalid blocks. (hairong)
HADOOP-3472. MapFile.Reader getClosest() function returns incorrect results
when before is true. (Todd Lipcon via Stack)
HADOOP-3442. Limit recursion depth on the stack for QuickSort to prevent
StackOverflowErrors. To avoid O(n*n) cases, when partitioning depth exceeds
a multiple of log(n), change to HeapSort. (cdouglas)
HADOOP-3477. Fix build to not package contrib/*/bin twice in
distributions. (Adam Heath via cutting)
HADOOP-3475. Fix MapTask to correctly size the accounting allocation of
io.sort.mb. (cdouglas)
HADOOP-3550. Fix the serialization data structures in MapTask where the
value lengths are incorrectly calculated. (cdouglas)
HADOOP-3526. Fix contrib/data_join framework by cloning values retained
in the reduce. (Spyros Blanas via cdouglas)
HADOOP-1979. Speed up fsck by adding a buffered stream. (Lohit
Vijaya Renu via omalley)
Release 0.17.0 - 2008-05-18
INCOMPATIBLE CHANGES
HADOOP-2786. Move hbase out of hadoop core.
HADOOP-2345. New HDFS transactions to support appending
to files. Disk layout version changed from -11 to -12. (dhruba)
HADOOP-2192. Error messages from "dfs mv" command improved.
(Mahadev Konar via dhruba)
HADOOP-1902. "dfs du" command without any arguments operates on the
current working directory. (Mahadev Konar via dhruba)
HADOOP-2873. Fixed bad disk format introduced by HADOOP-2345.
Disk layout version changed from -12 to -13. See changelist 630992
(dhruba)
HADOOP-1985. This addresses rack-awareness for Map tasks and for
HDFS in a uniform way. (ddas)
HADOOP-1986. Add support for a general serialization mechanism for
Map Reduce. (tomwhite)
HADOOP-771. FileSystem.delete() takes an explicit parameter that
specifies whether a recursive delete is intended.
(Mahadev Konar via dhruba)
HADOOP-2470. Remove getContentLength(String), open(String, long, long)
and isDir(String) from ClientProtocol. ClientProtocol version changed
from 26 to 27. (Tsz Wo (Nicholas), SZE via cdouglas)
HADOOP-2822. Remove deprecated code for classes InputFormatBase and
PhasedFileSystem. (Amareshwari Sriramadasu via enis)
HADOOP-2116. Changes the layout of the task execution directory.
(Amareshwari Sriramadasu via ddas)
HADOOP-2828. The following deprecated methods in Configuration.java
have been removed
getObject(String name)
setObject(String name, Object value)
get(String name, Object defaultValue)
set(String name, Object value)
Iterator entries()
(Amareshwari Sriramadasu via ddas)
HADOOP-2824. Removes one deprecated constructor from MiniMRCluster.
(Amareshwari Sriramadasu via ddas)
HADOOP-2823. Removes deprecated methods getColumn(), getLine() from
org.apache.hadoop.record.compiler.generated.SimpleCharStream.
(Amareshwari Sriramadasu via ddas)
HADOOP-3060. Removes one unused constructor argument from MiniMRCluster.
(Amareshwari Sriramadasu via ddas)
HADOOP-2854. Remove deprecated o.a.h.ipc.Server::getUserInfo().
(lohit vijayarenu via cdouglas)
HADOOP-2563. Remove deprecated FileSystem::listPaths.
(lohit vijayarenu via cdouglas)
HADOOP-2818. Remove deprecated methods in Counters.
(Amareshwari Sriramadasu via tomwhite)
HADOOP-2831. Remove deprecated o.a.h.dfs.INode::getAbsoluteName()
(lohit vijayarenu via cdouglas)
HADOOP-2839. Remove deprecated FileSystem::globPaths.
(lohit vijayarenu via cdouglas)
HADOOP-2634. Deprecate ClientProtocol::exists.
(lohit vijayarenu via cdouglas)
HADOOP-2410. Make EC2 cluster nodes more independent of each other.
Multiple concurrent EC2 clusters are now supported, and nodes may be
added to a cluster on the fly with new nodes starting in the same EC2
availability zone as the cluster. Ganglia monitoring and large
instance sizes have also been added. (Chris K Wensel via tomwhite)
HADOOP-2826. Deprecated FileSplit.getFile(), LineRecordReader.readLine().
(Amareshwari Sriramadasu via ddas)
HADOOP-3239. getFileInfo() returns null for non-existing files instead
of throwing FileNotFoundException. (Lohit Vijayarenu via shv)
HADOOP-3266. Removed HOD changes from CHANGES.txt, as they are now inside
src/contrib/hod. (Hemanth Yamijala via ddas)
HADOOP-3280. Separate the configuration of the virtual memory size
(mapred.child.ulimit) from the jvm heap size, so that 64 bit
streaming applications are supported even when running with 32 bit
jvms. (acmurthy via omalley)
NEW FEATURES
HADOOP-1398. Add HBase in-memory block cache. (tomwhite)
HADOOP-2178. Job History on DFS. (Amareshwari Sri Ramadasu via ddas)
HADOOP-2063. A new parameter to dfs -get command to fetch a file
even if it is corrupted. (Tsz Wo (Nicholas), SZE via dhruba)
HADOOP-2219. A new command "dfs -count" that counts the number of
files and directories. (Tsz Wo (Nicholas), SZE via dhruba)
HADOOP-2906. Add an OutputFormat capable of using keys, values, and
config params to map records to different output files.
(Runping Qi via cdouglas)
HADOOP-2346. Utilities to support timeout while writing to sockets.
DFSClient and DataNode sockets have a 10-minute write timeout. (rangadi)
HADOOP-2951. Add a contrib module that provides a utility to
build or update Lucene indexes using Map/Reduce. (Ning Li via cutting)
HADOOP-1622. Allow multiple jar files for map reduce.
(Mahadev Konar via dhruba)
HADOOP-2055. Allows users to set PathFilter on the FileInputFormat.
(Alejandro Abdelnur via ddas)
HADOOP-2551. More environment variables like HADOOP_NAMENODE_OPTS
for better control of HADOOP_OPTS for each component. (rangadi)
HADOOP-3001. Add job counters that measure the number of bytes
read and written to HDFS, S3, KFS, and local file systems. (omalley)
HADOOP-3048. A new Interface and a default implementation to convert
and restore serializations of objects to/from strings. (enis)
IMPROVEMENTS
HADOOP-2655. Copy on write for data and metadata files in the
presence of snapshots. Needed for supporting appends to HDFS
files. (dhruba)
HADOOP-1967. When a Path specifies the same scheme as the default
FileSystem but no authority, the default FileSystem's authority is
used. Also add warnings for old-format FileSystem names, accessor
methods for fs.default.name, and check for null authority in HDFS.
(cutting)
HADOOP-2895. Let the profiling string be configurable.
(Martin Traverso via cdouglas)
HADOOP-910. Enables Reduces to do merges for the on-disk map output files
in parallel with their copying. (Amar Kamat via ddas)
HADOOP-730. Use rename rather than copy for local renames. (cdouglas)
HADOOP-2810. Updated the Hadoop Core logo. (nigel)
HADOOP-2057. Streaming should optionally treat a non-zero exit status
of a child process as a failed task. (Rick Cox via tomwhite)
HADOOP-2765. Enables specifying ulimits for streaming/pipes tasks. (ddas)
HADOOP-2888. Make gridmix scripts more readily configurable and amenable
to automated execution. (Mukund Madhugiri via cdouglas)
HADOOP-2908. A document that describes the DFS Shell command.
(Mahadev Konar via dhruba)
HADOOP-2981. Update README.txt to reflect the upcoming use of
cryptography. (omalley)
HADOOP-2804. Add support to publish CHANGES.txt as HTML when running
the Ant 'docs' target. (nigel)
HADOOP-2559. Change DFS block placement to allocate the first replica
locally, the second off-rack, and the third intra-rack from the
second. (lohit vijayarenu via cdouglas)
HADOOP-2939. Make the automated patch testing process an executable
Ant target, test-patch. (nigel)
HADOOP-2239. Add HsftpFileSystem to permit transferring files over SSL.
(cdouglas)
HADOOP-2886. Track individual RPC metrics.
(girish vaitheeswaran via dhruba)
HADOOP-2373. Improvement in safe-mode reporting. (shv)
HADOOP-3091. Modify FsShell command -put to accept multiple sources.
(Lohit Vijaya Renu via cdouglas)
HADOOP-3092. Show counter values from job -status command.
(Tom White via ddas)
HADOOP-1228. Ant task to generate Eclipse project files. (tomwhite)
HADOOP-3093. Adds Configuration.getStrings(name, default-value) and
the corresponding setStrings. (Amareshwari Sriramadasu via ddas)
HADOOP-3106. Adds documentation in forrest for debugging.
(Amareshwari Sriramadasu via ddas)
HADOOP-3099. Add an option to distcp to preserve user, group, and
permission information. (Tsz Wo (Nicholas), SZE via cdouglas)
HADOOP-2841. Unwrap AccessControlException and FileNotFoundException
from RemoteException for DFSClient. (shv)
HADOOP-3152. Make index interval configurable when using
MapFileOutputFormat for map-reduce job. (Rong-En Fan via cutting)
HADOOP-3143. Decrease number of slaves from 4 to 3 in TestMiniMRDFSSort,
as Hudson generates false negatives under the current load.
(Nigel Daley via cdouglas)
HADOOP-3174. Illustrative example for MultipleFileInputFormat. (Enis
Soztutar via acmurthy)
HADOOP-2993. Clarify the usage of JAVA_HOME in the Quick Start guide.
(acmurthy via nigel)
HADOOP-3124. Make DataNode socket write timeout configurable. (rangadi)
OPTIMIZATIONS
HADOOP-2790. Fixed inefficient method hasSpeculativeTask by removing
repetitive calls to get the current time and by checking earlier whether
speculation is enabled at all. (omalley)
HADOOP-2758. Reduce buffer copies in DataNode when data is read from
HDFS, without negatively affecting read throughput. (rangadi)
HADOOP-2399. Input keys and values to the combiner and reducer are reused.
(Owen O'Malley via ddas)
HADOOP-2423. Code optimization in FSNamesystem.mkdirs.
(Tsz Wo (Nicholas), SZE via dhruba)
HADOOP-2606. ReplicationMonitor selects data-nodes to replicate directly
from needed replication blocks instead of looking up the blocks for
each live data-node. (shv)
HADOOP-2148. Eliminate redundant data-node blockMap lookups. (shv)
HADOOP-2027. Return the number of bytes in each block in a file
via a single rpc to the namenode to speed up job planning.
(Lohit Vijaya Renu via omalley)
HADOOP-2902. Replace uses of "fs.default.name" with calls to the
accessor methods added in HADOOP-1967. (cutting)
HADOOP-2119. Optimize scheduling of jobs with large numbers of
tasks by replacing static arrays with lists of runnable tasks.
(Amar Kamat via omalley)
HADOOP-2919. Reduce the number of memory copies done during the
map output sorting. Also adds two config variables:
io.sort.spill.percent - the percentage of io.sort.mb that should
cause a spill (default 80%)
io.sort.record.percent - the percent of io.sort.mb that should
hold key/value indexes (default 5%)
(cdouglas via omalley)
HADOOP-3140. Does not add a task to the commit queue if the task has not
generated any output. (Amar Kamat via ddas)
HADOOP-3168. Reduce the amount of logging in streaming to an
exponentially increasing number of records (up to 10,000
records/log). (Zheng Shao via omalley)
BUG FIXES
HADOOP-2195. '-mkdir' behaviour is now closer to Linux shell in case of
errors. (Mahadev Konar via rangadi)
HADOOP-2190. Bring the behaviour of '-ls' and '-du' closer to Linux shell
commands in case of errors. (Mahadev Konar via rangadi)
HADOOP-2193. 'fs -rm' and 'fs -rmr' show error message when the target
file does not exist. (Mahadev Konar via rangadi)
HADOOP-2738. Text is not subclassable because set(Text) and compareTo(Object)
access the other instance's private members directly. (jimk)
HADOOP-2779. Remove the references to HBase in the build.xml. (omalley)
HADOOP-2194. dfs cat on a non-existent file throws FileNotFoundException.
(Mahadev Konar via dhruba)
HADOOP-2767. Fix for NetworkTopology erroneously skipping the last leaf
node on a rack. (Hairong Kuang and Mark Butler via dhruba)
HADOOP-1593. FsShell works with paths in non-default FileSystem.
(Mahadev Konar via dhruba)
HADOOP-2191. The du and dus commands on a non-existent directory give an
appropriate error message. (Mahadev Konar via dhruba)
HADOOP-2832. Remove tabs from code of DFSClient for better
indentation. (dhruba)
HADOOP-2844. distcp closes file handles for sequence files.
(Tsz Wo (Nicholas), SZE via dhruba)
HADOOP-2727. Fix links in Web UI of the hadoop daemons and some docs.
(Amareshwari Sri Ramadasu via ddas)
HADOOP-2871. Fixes a problem to do with file: URI in the JobHistory init.
(Amareshwari Sri Ramadasu via ddas)
HADOOP-2800. Deprecate SetFile.Writer constructor not the whole class.
(Johan Oskarsson via tomwhite)
HADOOP-2891. DFSClient.close() closes all open files. (dhruba)
HADOOP-2845. Fix dfsadmin disk utilization report on Solaris.
(Martin Traverso via tomwhite)
HADOOP-2912. MiniDFSCluster restart should wait for namenode to exit
safemode. This was causing TestFsck to fail. (Mahadev Konar via dhruba)
HADOOP-2820. The following classes in streaming are removed :
StreamLineRecordReader StreamOutputFormat StreamSequenceRecordReader.
(Amareshwari Sri Ramadasu via ddas)
HADOOP-2819. The following methods in JobConf are removed:
getInputKeyClass() setInputKeyClass getInputValueClass()
setInputValueClass(Class theClass) setSpeculativeExecution
getSpeculativeExecution() (Amareshwari Sri Ramadasu via ddas)
HADOOP-2817. Removes deprecated mapred.tasktracker.tasks.maximum and
ClusterStatus.getMaxTasks(). (Amareshwari Sri Ramadasu via ddas)
HADOOP-2821. Removes deprecated ShellUtil and ToolBase classes from
the util package. (Amareshwari Sri Ramadasu via ddas)
HADOOP-2934. The namenode was encountering an NPE while loading
leases from the fsimage. Fixed. (dhruba)
HADOOP-2938. Some fs commands did not glob paths.
(Tsz Wo (Nicholas), SZE via rangadi)
HADOOP-2943. Compression of intermediate map output causes failures
in the merge. (cdouglas)
HADOOP-2870. DataNode and NameNode close all connections while
shutting down. (Hairong Kuang via dhruba)
HADOOP-2973. Fix TestLocalDFS for Windows platform.
(Tsz Wo (Nicholas), SZE via dhruba)
HADOOP-2971. Call select multiple times in SocketIOWithTimeout if it
returns early. (rangadi)
HADOOP-2955. Fix TestCrcCorruption test failures caused by HADOOP-2758
(rangadi)
HADOOP-2657. A flush call on the DFSOutputStream flushes the last
partial CRC chunk too. (dhruba)
HADOOP-2974. IPC unit tests used "0.0.0.0" to connect to server, which
is not always supported. (rangadi)
HADOOP-2996. Fixes uses of StringBuffer in StreamUtils class.
(Dave Brosius via ddas)
HADOOP-2995. Fixes StreamBaseRecordReader's getProgress to return a
floating point number. (Dave Brosius via ddas)
HADOOP-2972. Fix for an NPE in FSDataset.invalidate.
(Mahadev Konar via dhruba)
HADOOP-2994. Code cleanup for DFSClient: remove redundant
conversions from string to string. (Dave Brosius via dhruba)
HADOOP-3009. TestFileCreation sometimes fails because restarting
minidfscluster sometimes creates datanodes with ports that are
different from their original instance. (dhruba)
HADOOP-2992. Distributed Upgrade framework works correctly with
more than one upgrade object. (Konstantin Shvachko via dhruba)
HADOOP-2679. Fix a typo in libhdfs. (Jason via dhruba)
HADOOP-2976. When a lease expires, the Namenode ensures that
blocks of the file are adequately replicated. (dhruba)
HADOOP-2901. Fixes the creation of info servers in the JobClient
and JobTracker. Removes the creation from JobClient and removes
additional info server from the JobTracker. Also adds the command
line utility to view the history files (HADOOP-2896), and fixes
bugs in JSPs to do with analysis - HADOOP-2742, HADOOP-2792.
(Amareshwari Sri Ramadasu via ddas)
HADOOP-2890. If different datanodes report the same block but
with different sizes to the namenode, the namenode picks the
replica(s) with the largest size as the only valid replica(s). (dhruba)
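A minimal sketch of the HADOOP-2890 selection rule: among datanodes reporting different lengths for the same block, only those reporting the largest length are kept as valid. The class, method, and datanode names are hypothetical. This is not the namenode's actual code.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of the HADOOP-2890 rule: keep only the longest replicas. */
public class ReplicaSizeRule {
    /** Returns the datanodes whose reported length equals the maximum reported length. */
    public static List<String> validReplicas(Map<String, Long> reportedLengths) {
        long max = Long.MIN_VALUE;
        for (long len : reportedLengths.values()) {
            max = Math.max(max, len);
        }
        List<String> valid = new ArrayList<>();
        for (Map.Entry<String, Long> e : reportedLengths.entrySet()) {
            if (e.getValue() == max) {
                valid.add(e.getKey());
            }
        }
        return valid;
    }

    public static void main(String[] args) {
        Map<String, Long> reports = new LinkedHashMap<>();
        reports.put("dn1", 4096L);
        reports.put("dn2", 8192L);
        reports.put("dn3", 8192L);
        System.out.println(validReplicas(reports));  // [dn2, dn3]
    }
}
```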
HADOOP-2825. Deprecated MapOutputLocation.getFile() is removed.
(Amareshwari Sri Ramadasu via ddas)
HADOOP-2806. Fixes a streaming document.
(Amareshwari Sriramadasu via ddas)
HADOOP-3008. SocketIOWithTimeout throws InterruptedIOException if the
thread is interrupted while it is waiting. (rangadi)
HADOOP-3006. Fix wrong packet size reported by DataNode when a block
is being replicated. (rangadi)
HADOOP-3029. Datanode prints log message "firstbadlink" only if
it detects a bad connection to another datanode in the pipeline. (dhruba)
HADOOP-3030. Release reserved space for file in InMemoryFileSystem if
checksum reservation fails. (Devaraj Das via cdouglas)
HADOOP-3036. Fix findbugs warnings in UpgradeUtilities. (Konstantin
Shvachko via cdouglas)
HADOOP-3025. ChecksumFileSystem supports the delete method with
the recursive flag. (Mahadev Konar via dhruba)
HADOOP-3012. dfs -mv file to user home directory throws exception if
the user home directory does not exist. (Mahadev Konar via dhruba)
HADOOP-3066. Should not require superuser privilege to query if hdfs is in
safe mode (jimk)
HADOOP-3040. If the input line starts with the separator char, the key
is set as empty. (Amareshwari Sriramadasu via ddas)
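A minimal sketch of the HADOOP-3040 behaviour, in which a line is split at the first separator so that a leading separator yields an empty key. The class name is hypothetical. Treating a line with no separator as all key and an empty value is an assumption made for the sketch.

```java
/** Hypothetical sketch of the HADOOP-3040 rule for splitting a line into key and value. */
public class KeyValueSplit {
    /** Splits at the first separator; returns {key, value}. */
    public static String[] split(String line, char separator) {
        int pos = line.indexOf(separator);
        if (pos == -1) {
            // Assumption: with no separator, the whole line is the key and the value is empty.
            return new String[] {line, ""};
        }
        // A leading separator (pos == 0) yields an empty key.
        return new String[] {line.substring(0, pos), line.substring(pos + 1)};
    }

    public static void main(String[] args) {
        String[] kv = split("\tvalue", '\t');
        System.out.println("key='" + kv[0] + "' value='" + kv[1] + "'");  // key='' value='value'
    }
}
```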
HADOOP-3080. Removes flush calls from JobHistory.
(Amareshwari Sriramadasu via ddas)
HADOOP-3086. Adds the testcase missed during commit of hadoop-3040.
(Amareshwari Sriramadasu via ddas)
HADOOP-3046. Fix the raw comparators for Text and BytesWritables
to use the provided length rather than recompute it. (omalley)
HADOOP-3094. Fix BytesWritable.toString to avoid extending the sign bit
(Owen O'Malley via cdouglas)
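The sign-bit problem behind HADOOP-3094 can be shown with plain Java. A negative byte widened to int is sign-extended, so it prints as eight hex digits unless it is masked with 0xff first. The helper below is a hypothetical sketch, not BytesWritable's code, and the space-separated two-digit output format is an assumption.

```java
/** Hypothetical sketch of the HADOOP-3094 fix: mask each byte before hex-formatting it. */
public class HexBytes {
    public static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < bytes.length; i++) {
            if (i > 0) {
                sb.append(' ');
            }
            int v = bytes[i] & 0xff;  // drop the sign-extended high bits
            if (v < 0x10) {
                sb.append('0');       // pad to two hex digits
            }
            sb.append(Integer.toHexString(v));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte b = (byte) 0xf0;
        System.out.println(Integer.toHexString(b));       // fffffff0 (sign bit extended)
        System.out.println(toHex(new byte[] {b, 0x0a}));  // f0 0a
    }
}
```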
HADOOP-3067. DFSInputStream's position read does not close the sockets.
(rangadi)
HADOOP-3073. close() on SocketInputStream or SocketOutputStream should
close the underlying channel. (rangadi)
HADOOP-3087. Fixes a problem to do with refreshing of loadHistory.jsp.
(Amareshwari Sriramadasu via ddas)
HADOOP-3065. Better logging message if the rack location of a datanode
cannot be determined. (Devaraj Das via dhruba)
HADOOP-3064. Commas in a file path should not be treated as delimiters.
(Hairong Kuang via shv)
HADOOP-2997. Adds test for non-writable serializer. Also fixes a problem
introduced by HADOOP-2399. (Tom White via ddas)
HADOOP-3114. Fix TestDFSShell on Windows. (Lohit Vijaya Renu via cdouglas)
HADOOP-3118. Fix Namenode NPE while loading fsimage after a cluster
upgrade from older disk format. (dhruba)
HADOOP-3161. Fix FileUtil.HardLink.getLinkCount on Mac OS. (nigel
via omalley)
HADOOP-2927. Fix TestDU to accurately calculate the expected file size.
(shv via nigel)
HADOOP-3123. Fix the native library build scripts to work on Solaris.
(tomwhite via omalley)
HADOOP-3089. Streaming should accept stderr from task before
first key arrives. (Rick Cox via tomwhite)
HADOOP-3146. The DFSOutputStream.flush method is renamed to
DFSOutputStream.fsync. (dhruba)
HADOOP-3165. -put/-copyFromLocal did not treat input file "-" as stdin.
(Lohit Vijayarenu via rangadi)
HADOOP-3041. Deprecate JobConf.setOutputPath and JobConf.getOutputPath.
Deprecate OutputFormatBase. Add FileOutputFormat. Existing output formats
extending OutputFormatBase now extend FileOutputFormat. Add the following
APIs in FileOutputFormat: setOutputPath, getOutputPath, getWorkOutputPath.
(Amareshwari Sriramadasu via nigel)
HADOOP-3083. The fsimage does not store leases. This would have to be
reworked in the next release to support appends. (dhruba)
HADOOP-3166. Fix an ArrayIndexOutOfBoundsException in the spill thread
and make exception handling more promiscuous to catch this condition.
(cdouglas)
HADOOP-3050. DataNode sends one and only one block report after
it registers with the namenode. (Hairong Kuang)
HADOOP-3044. NNBench sets the right configuration for the mapper.
(Hairong Kuang)
HADOOP-3178. Fix GridMix scripts for small and medium jobs
to handle input paths differently. (Mukund Madhugiri via nigel)
HADOOP-1911. Fix an infinite loop in DFSClient when all replicas of a
block are bad (cdouglas)
HADOOP-3157. Fix path handling in DistributedCache and TestMiniMRLocalFS.
(Doug Cutting via rangadi)
HADOOP-3018. Fix the eclipse plug-in contrib wrt removed deprecated
methods (taton)
HADOOP-3183. Fix TestJobShell to use 'ls' instead of java.io.File::exists
since cygwin symlinks are unsupported.
(Mahadev Konar via cdouglas)
HADOOP-3175. Fix FsShell.CommandFormat to handle "-" in arguments.
(Edward J. Yoon via rangadi)
HADOOP-3220. Safemode message corrected. (shv)
HADOOP-3208. Fix WritableDeserializer to set the Configuration on
deserialized Writables. (Enis Soztutar via cdouglas)
HADOOP-3224. 'dfs -du /dir' does not return correct size.
(Lohit Vijayarenu via rangadi)
HADOOP-3223. Fix typo in help message for -chmod. (rangadi)
HADOOP-1373. checkPath() should ignore case when it compares authority.
(Edward J. Yoon via rangadi)
HADOOP-3204. Fixes a problem to do with ReduceTask's LocalFSMerger not
catching Throwable. (Amar Ramesh Kamat via ddas)
HADOOP-3229. Report progress when collecting records from the mapper and
the combiner. (Doug Cutting via cdouglas)
HADOOP-3225. Unwrapping methods of RemoteException should initialize
detailedMassage field. (Mahadev Konar, shv, cdouglas)
HADOOP-3247. Fix gridmix scripts to use the correct globbing syntax and
change maxentToSameCluster to run the correct number of jobs.
(Runping Qi via cdouglas)
HADOOP-3242. Fix the RecordReader of SequenceFileAsBinaryInputFormat to
correctly read from the start of the split and not the beginning of the
file. (cdouglas via acmurthy)
HADOOP-3256. Encodes the job name used in the filename for history files.
(Arun Murthy via ddas)
HADOOP-3162. Ensure that comma-separated input paths are treated correctly
as multiple input paths. (Amareshwari Sri Ramadasu via acmurthy)
HADOOP-3263. Ensure that the job-history log file always follows the
pattern of hostname_timestamp_jobid_username_jobname even if username
and/or jobname are not specified. This helps to avoid wrong assumptions
made about the job-history log filename in jobhistory.jsp. (acmurthy)
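A minimal sketch of a job-history file name that follows the hostname_timestamp_jobid_username_jobname pattern even when the user name or job name is missing. The class and method names are hypothetical. Substituting the empty string for a missing field and URL-encoding the job name (as one reading of HADOOP-3256) are assumptions, not JobHistory's confirmed behaviour.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

/** Hypothetical sketch of a history file name that never drops a field of the pattern. */
public class HistoryFileName {
    /** Assumption: a missing field becomes the empty string so its separator is kept. */
    private static String orEmpty(String s) {
        return s == null ? "" : s;
    }

    public static String build(String host, long timestamp, String jobId,
                               String user, String jobName) {
        String encodedName;
        try {
            // Assumed encoding: URL-encode the job name so it is safe inside a file name.
            encodedName = URLEncoder.encode(orEmpty(jobName), "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
        return host + "_" + timestamp + "_" + jobId + "_" + orEmpty(user) + "_" + encodedName;
    }

    public static void main(String[] args) {
        System.out.println(build("host1", 1234L, "job_1", null, "word count"));
        // host1_1234_job_1__word+count
    }
}
```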
HADOOP-3251. Fixes getFilesystemName in JobTracker and LocalJobRunner to
use FileSystem.getUri instead of FileSystem.getName. (Arun Murthy via ddas)
HADOOP-3237. Fixes TestDFSShell.testErrOutPut on Windows platform.
(Mahadev Konar via ddas)
HADOOP-3279. TaskTracker checks for SUCCEEDED task status in addition to
COMMIT_PENDING status when it fails maps due to lost map.
(Devaraj Das)
HADOOP-3286. Prevent collisions in gridmix output dirs by increasing the
granularity of the timestamp. (Runping Qi via cdouglas)
HADOOP-3285. Fix input split locality when the splits align to
fs blocks. (omalley)
HADOOP-3372. Fix heap management in streaming tests. (Arun Murthy via
cdouglas)
HADOOP-3031. Fix javac warnings in test classes. (cdouglas)
HADOOP-3382. Fix memory leak when files are not cleanly closed (rangadi)
HADOOP-3322. Fix to push MetricsRecord for rpc metrics. (Eric Yang via
mukund)
Release 0.16.4 - 2008-05-05
BUG FIXES
HADOOP-3138. DFS mkdirs() should not throw an exception if the directory
already exists. (rangadi via mukund)
HADOOP-3294. Fix distcp to check the destination length and retry the copy
if it doesn't match the src length. (Tsz Wo (Nicholas), SZE via mukund)
HADOOP-3186. Fix incorrect permission checking for mv and renameTo
in HDFS. (Tsz Wo (Nicholas), SZE via mukund)
Release 0.16.3 - 2008-04-16
BUG FIXES
HADOOP-3010. Fix ConcurrentModificationException in ipc.Server.Responder.
(rangadi)
HADOOP-3154. Catch all Throwables from the SpillThread in MapTask, rather
than IOExceptions only. (ddas via cdouglas)
HADOOP-3159. Avoid file system cache being overwritten whenever
configuration is modified. (Tsz Wo (Nicholas), SZE via hairong)
HADOOP-3139. Remove the consistency check for the FileSystem cache in
closeAll() that causes spurious warnings and a deadlock.
(Tsz Wo (Nicholas), SZE via cdouglas)
HADOOP-3195. Fix TestFileSystem to be deterministic.
(Tsz Wo (Nicholas), SZE via cdouglas)
HADOOP-3069. Primary name-node should not truncate image when transferring
it from the secondary. (shv)
HADOOP-3182. Change permissions of the job-submission directory to 777
from 733 to ensure sharing of HOD clusters works correctly. (Tsz Wo
(Nicholas), Sze and Amareshwari Sri Ramadasu via acmurthy)
Release 0.16.2 - 2008-04-02
BUG FIXES
HADOOP-3011. Prohibit distcp from overwriting directories on the
destination filesystem with files. (cdouglas)
HADOOP-3033. The BlockReceiver thread in the datanode writes data to
the block file, changes file position (if needed) and flushes all by
itself. The PacketResponder thread does not flush block file. (dhruba)
HADOOP-2978. Fixes the JobHistory log format for counters.
(Runping Qi via ddas)
HADOOP-2985. Fixes LocalJobRunner to tolerate null job output path.
Also makes the _temporary a constant in MRConstants.java.
(Amareshwari Sriramadasu via ddas)
HADOOP-3003. FileSystem cache key is updated after a
FileSystem object is created. (Tsz Wo (Nicholas), SZE via dhruba)
HADOOP-3042. Updates the Javadoc in JobConf.getOutputPath to reflect
the actual temporary path. (Amareshwari Sriramadasu via ddas)
HADOOP-3007. Tolerate mirror failures while DataNode is replicating
blocks as it used to before. (rangadi)
HADOOP-2944. Fixes a "Run on Hadoop" wizard NPE when creating a
Location from the wizard. (taton)
HADOOP-3049. Fixes a problem in MultithreadedMapRunner to do with
catching RuntimeExceptions. (Alejandro Abdelnur via ddas)
HADOOP-3039. Fixes a problem to do with exceptions in tasks not
killing jobs. (Amareshwari Sriramadasu via ddas)
HADOOP-3027. Fixes a problem to do with adding a shutdown hook in
FileSystem. (Amareshwari Sriramadasu via ddas)
HADOOP-3056. Fix distcp when the target is an empty directory by
making sure the directory is created first. (cdouglas and acmurthy
via omalley)
HADOOP-3070. Protect the trash emptier thread from null pointer
exceptions. (Koji Noguchi via omalley)
HADOOP-3084. Fix HftpFileSystem to work for zero-length files.
(cdouglas)
HADOOP-3107. Fix NPE when fsck invokes getListings. (dhruba)
HADOOP-3104. Limit MultithreadedMapRunner to have a fixed length queue
between the RecordReader and the map threads. (Alejandro Abdelnur via
omalley)
HADOOP-2833. Do not use "Dr. Who" as the default user in JobClient.
A valid user name is required. (Tsz Wo (Nicholas), SZE via rangadi)
HADOOP-3128. Throw RemoteException in setPermissions and setOwner of
DistributedFileSystem. (shv via nigel)
Release 0.16.1 - 2008-03-13
INCOMPATIBLE CHANGES
HADOOP-2869. Deprecate SequenceFile.setCompressionType in favor of
SequenceFile.createWriter, SequenceFileOutputFormat.setCompressionType,
and JobConf.setMapOutputCompressionType. (Arun C Murthy via cdouglas)
Configuration changes to hadoop-default.xml:
deprecated io.seqfile.compression.type
IMPROVEMENTS
HADOOP-2371. User guide for file permissions in HDFS.
(Robert Chansler via rangadi)
HADOOP-3098. Allow more characters in user and group names while
using -chown and -chgrp commands. (rangadi)
BUG FIXES
HADOOP-2789. Race condition in IPC Server Responder that could close
connections early. (Raghu Angadi)
HADOOP-2785. minor. Fix a typo in Datanode block verification
(Raghu Angadi)
HADOOP-2788. minor. Fix help message for chgrp shell command (Raghu Angadi).
HADOOP-1188. fstime file is updated when a storage directory containing
namespace image becomes inaccessible. (shv)
HADOOP-2787. An application can set a configuration variable named
dfs.umask to set the umask that is used by DFS.
(Tsz Wo (Nicholas), SZE via dhruba)
HADOOP-2780. The default socket buffer size for DataNodes is 128K.
(dhruba)
HADOOP-2716. Superuser privileges for the Balancer.
(Tsz Wo (Nicholas), SZE via shv)
HADOOP-2754. Filter out .crc files from local file system listing.
(Hairong Kuang via shv)
HADOOP-2733. Fix compiler warnings in test code.
(Tsz Wo (Nicholas), SZE via cdouglas)
HADOOP-2725. Modify distcp to avoid leaving partially copied files at
the destination after encountering an error. (Tsz Wo (Nicholas), SZE
via cdouglas)
HADOOP-2391. Cleanup job output directory before declaring a job as
SUCCESSFUL. (Amareshwari Sri Ramadasu via ddas)
HADOOP-2808. Minor fix to FileUtil::copy to respect the overwrite
parameter. (cdouglas)
HADOOP-2683. Moving UGI out of the RPC Server.
(Tsz Wo (Nicholas), SZE via shv)
HADOOP-2814. Fix for NPE in datanode in unit test TestDataTransferProtocol.
(Raghu Angadi via dhruba)
HADOOP-2811. Dump of counters in job history does not add comma between
groups. (runping via omalley)
HADOOP-2735. Enables setting TMPDIR for tasks.
(Amareshwari Sri Ramadasu via ddas)
HADOOP-2843. Fix protections on map-side join classes to enable derivation.
(cdouglas via omalley)
HADOOP-2840. Fix gridmix scripts to correctly invoke the java sort through
the proper jar. (Mukund Madhugiri via cdouglas)
HADOOP-2769. TestNNThroughputBenchmark should not use a fixed port for
the namenode http port. (omalley)
HADOOP-2852. Update gridmix benchmark to avoid an artificially long tail.
(cdouglas)
HADOOP-2894. Fix a problem to do with tasktrackers failing to connect to
JobTracker upon reinitialization. (Owen O'Malley via ddas).
HADOOP-2903. Fix exception generated by Metrics while using pushMetric().
(girish vaitheeswaran via dhruba)
HADOOP-2904. Fix to RPC metrics to log the correct host name.
(girish vaitheeswaran via dhruba)
HADOOP-2918. Improve error logging so that DFS write failures with
"No lease on file" can be diagnosed. (dhruba)
HADOOP-2923. Add SequenceFileAsBinaryInputFormat, which was
missed in the commit for HADOOP-2603. (cdouglas via omalley)
HADOOP-2931. IOException thrown by DFSOutputStream had wrong stack
trace in some cases. (Michael Bieniosek via rangadi)
HADOOP-2883. Write failures and data corruptions on HDFS files.
The write timeout is back to what it was on 0.15 release. Also, the
datanode flushes the block file's buffered output stream before
sending a positive ack for the packet back to the client. (dhruba)
HADOOP-2756. NPE in DFSClient while closing DFSOutputStreams
under load. (rangadi)
HADOOP-2958. Fixed FileBench which broke due to HADOOP-2391 which performs
a check for existence of the output directory and a trivial bug in
GenericMRLoadGenerator where min/max word lengths were identical since
they were looking at the same config variables (Chris Douglas via
acmurthy)
HADOOP-2915. Fixed FileSystem.CACHE so that a username is included
in the cache key. (Tsz Wo (Nicholas), SZE via nigel)
HADOOP-2813. TestDU unit test uses its own directory to run its
sequence of tests. (Mahadev Konar via dhruba)
Release 0.16.0 - 2008-02-07
INCOMPATIBLE CHANGES
HADOOP-1245. Use the mapred.tasktracker.tasks.maximum value
configured on each tasktracker when allocating tasks, instead of
the value configured on the jobtracker. InterTrackerProtocol
version changed from 5 to 6. (Michael Bieniosek via omalley)
HADOOP-1843. Removed code from Configuration and JobConf deprecated by
HADOOP-785 and a minor fix to Configuration.toString. Specifically the
important change is that mapred-default.xml is no longer supported and
Configuration no longer supports the notion of default/final resources.
(acmurthy)
HADOOP-1302. Remove deprecated abacus code from the contrib directory.
This also fixes a configuration bug in AggregateWordCount, so that the
job now works. (enis)
HADOOP-2288. Enhance FileSystem API to support access control.
(Tsz Wo (Nicholas), SZE via dhruba)
HADOOP-2184. RPC Support for user permissions and authentication.
(Raghu Angadi via dhruba)
HADOOP-2185. RPC Server uses any available port if the specified
port is zero. Otherwise it uses the specified port. Also combines
the configuration attributes for the servers' bind address and
port from "x.x.x.x" and "y" to "x.x.x.x:y".
Deprecated configuration variables:
dfs.info.bindAddress
dfs.info.port
dfs.datanode.bindAddress
dfs.datanode.port
dfs.datanode.info.bindAddress
dfs.datanode.info.port
dfs.secondary.info.bindAddress
dfs.secondary.info.port
mapred.job.tracker.info.bindAddress
mapred.job.tracker.info.port
mapred.task.tracker.report.bindAddress
tasktracker.http.bindAddress
tasktracker.http.port
New configuration variables (post HADOOP-2404):
dfs.secondary.http.address
dfs.datanode.address
dfs.datanode.http.address
dfs.http.address
mapred.job.tracker.http.address
mapred.task.tracker.report.address
mapred.task.tracker.http.address
(Konstantin Shvachko via dhruba)
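A minimal sketch of the two HADOOP-2185 ideas: a single "x.x.x.x:y" value replaces the separate bind-address and port settings, and port 0 means "any available port". The parse helper is hypothetical and is not Hadoop's own address-parsing code. The addresses used are illustrative values only.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;

/** Hypothetical sketch of HADOOP-2185: one "host:port" value, port 0 = any free port. */
public class HostPortConfig {
    /** Parses a combined "x.x.x.x:y" value into an unresolved socket address. */
    public static InetSocketAddress parse(String value) {
        int colon = value.lastIndexOf(':');
        if (colon < 0) {
            throw new IllegalArgumentException("Expected host:port but got " + value);
        }
        String host = value.substring(0, colon);
        int port = Integer.parseInt(value.substring(colon + 1));
        return InetSocketAddress.createUnresolved(host, port);
    }

    public static void main(String[] args) throws IOException {
        InetSocketAddress addr = parse("127.0.0.1:0");
        // Binding to port 0 lets the operating system pick any available port.
        try (ServerSocket server = new ServerSocket(addr.getPort(), 50,
                InetAddress.getByName(addr.getHostString()))) {
            System.out.println("bound to port " + server.getLocalPort());
        }
    }
}
```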
HADOOP-2401. Only the current leaseholder can abandon a block for
a HDFS file. ClientProtocol version changed from 20 to 21.
(Tsz Wo (Nicholas), SZE via dhruba)
HADOOP-2381. Support permission information in FileStatus. Client
Protocol version changed from 21 to 22. (Raghu Angadi via dhruba)
HADOOP-2110. Block report processing creates fewer transient objects.
Datanode Protocol version changed from 10 to 11.
(Sanjay Radia via dhruba)
HADOOP-2567. Add FileSystem#getHomeDirectory(), which returns the
user's home directory in a FileSystem as a fully-qualified path.
FileSystem#getWorkingDirectory() is also changed to return a
fully-qualified path, which can break applications that attempt
to, e.g., pass LocalFileSystem#getWorkingDir().toString() directly
to java.io methods that accept file names. (cutting)
HADOOP-2514. Change trash feature to maintain a per-user trash
directory, named ".Trash" in the user's home directory. The
"fs.trash.root" parameter is no longer used. Full source paths
are also no longer reproduced within the trash.
HADOOP-2012. Periodic data verification on Datanodes.
(Raghu Angadi via dhruba)
HADOOP-1707. The DFSClient does not use a local disk file to cache
writes to a HDFS file. Changed Data Transfer Version from 7 to 8.
(dhruba)
HADOOP-2652. Fix permission issues for HftpFileSystem. This is an
incompatible change since distcp may not be able to copy files
from cluster A (compiled with this patch) to cluster B (compiled
with previous versions). (Tsz Wo (Nicholas), SZE via dhruba)
NEW FEATURES
HADOOP-1857. Ability to run a script when a task fails to capture stack
traces. (Amareshwari Sri Ramadasu via ddas)
HADOOP-2299. Definition of a login interface. A simple implementation for
Unix users and groups. (Hairong Kuang via dhruba)
HADOOP-1652. A utility to balance data among datanodes in a HDFS cluster.
(Hairong Kuang via dhruba)
HADOOP-2085. A library to support map-side joins of consistently
partitioned and sorted data sets. (Chris Douglas via omalley)
HADOOP-2336. Shell commands to modify file permissions. (rangadi)
HADOOP-1298. Implement file permissions for HDFS.
(Tsz Wo (Nicholas) & taton via cutting)
HADOOP-2447. HDFS can be configured to limit the total number of
objects (inodes and blocks) in the file system. (dhruba)
HADOOP-2487. Added an option to get statuses for all submitted/run jobs.
This information can be used to develop tools for analysing jobs.
(Amareshwari Sri Ramadasu via acmurthy)
HADOOP-1873. Implement user permissions for Map/Reduce framework.
(Hairong Kuang via shv)
HADOOP-2532. Add to MapFile a getClosest method that returns the key
that comes just before if the key is not present. (stack via tomwhite)
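The "closest key at or before" lookup described for MapFile can be illustrated with java.util.TreeMap.floorKey. This is only an analogy for the semantics. It does not use MapFile, and the class name is hypothetical.

```java
import java.util.TreeMap;

/** Analogy for the HADOOP-2532 lookup using java.util.TreeMap, not MapFile itself. */
public class ClosestBefore {
    /** Returns the key itself if present, else the greatest smaller key, else null. */
    public static Integer closestBefore(TreeMap<Integer, String> index, int key) {
        return index.floorKey(key);
    }

    public static void main(String[] args) {
        TreeMap<Integer, String> index = new TreeMap<>();
        index.put(10, "a");
        index.put(20, "b");
        index.put(30, "c");
        System.out.println(closestBefore(index, 25));  // 20
        System.out.println(closestBefore(index, 5));   // null
    }
}
```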
HADOOP-1883. Add versioning to Record I/O. (Vivek Ratan via ddas)
HADOOP-2603. Add SequenceFileAsBinaryInputFormat, which reads
sequence files as BytesWritable/BytesWritable regardless of the
key and value types used to write the file. (cdouglas via omalley)
HADOOP-2367. Add ability to profile a subset of map/reduce tasks and fetch
the result to the local filesystem of the submitting application. Also
includes a general IntegerRanges extension to Configuration for setting
positive, ranged parameters. (Owen O'Malley via cdouglas)
IMPROVEMENTS
HADOOP-2045. Change committer list on website to a table, so that
folks can list their organization, timezone, etc. (cutting)
HADOOP-2058. Facilitate creating new datanodes dynamically in
MiniDFSCluster. (Hairong Kuang via dhruba)
HADOOP-1855. fsck verifies block placement policies and reports
violations. (Konstantin Shvachko via dhruba)
HADOOP-1604. A system administrator can finalize namenode upgrades
without running the cluster. (Konstantin Shvachko via dhruba)
HADOOP-1839. Link-ify the Pending/Running/Complete/Killed grid in
jobdetails.jsp to help quickly narrow down and see categorized TIPs'
details via jobtasks.jsp. (Amar Kamat via acmurthy)
HADOOP-1210. Log counters in job history. (Owen O'Malley via ddas)
HADOOP-1912. Datanode has two new commands COPY and REPLACE. These are
needed for supporting data rebalance. (Hairong Kuang via dhruba)
HADOOP-2086. This patch adds the ability to add dependencies to a job
(run via JobControl) after construction. (Adrian Woodhead via ddas)
HADOOP-1185. Support changing the logging level of a server without
restarting the server. (Tsz Wo (Nicholas), SZE via dhruba)
HADOOP-2134. Remove developer-centric requirements from overview.html and
keep it end-user focussed, specifically sections related to subversion and
building Hadoop. (Jim Kellerman via acmurthy)
HADOOP-1989. Support simulated DataNodes. This helps creating large virtual
clusters for testing purposes. (Sanjay Radia via dhruba)
HADOOP-1274. Support different number of mappers and reducers per
TaskTracker to allow administrators to better configure and utilize
heterogeneous clusters.
Configuration changes to hadoop-default.xml:
add mapred.tasktracker.map.tasks.maximum (default value of 2)
add mapred.tasktracker.reduce.tasks.maximum (default value of 2)
remove mapred.tasktracker.tasks.maximum (deprecated for 0.16.0)
(Amareshwari Sri Ramadasu via acmurthy)
HADOOP-2104. Adds a description to the ant targets. This makes the
output of "ant -projecthelp" sensible. (Chris Douglas via ddas)
HADOOP-2127. Added a pipes sort example to benchmark trivial pipes
application versus trivial java application. (omalley via acmurthy)
HADOOP-2113. A new shell command "dfs -text" to view the contents of
a gzipped file or a SequenceFile. (Chris Douglas via dhruba)
HADOOP-2207. Add a "package" target for contrib modules that
permits each to determine what files are copied into release
builds. (stack via cutting)
HADOOP-1984. Makes the backoff for failed fetches exponential.
Earlier, it was a random backoff from an interval.
(Amar Kamat via ddas)
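A minimal sketch of an exponential backoff schedule of the kind HADOOP-1984 describes, in which each failed fetch doubles the wait instead of drawing it at random from an interval. The base delay, the cap, and all names are hypothetical values chosen for illustration. They are not the ReduceTask's actual parameters.

```java
/** Hypothetical sketch of HADOOP-1984-style exponential backoff between fetch retries. */
public class FetchBackoff {
    /** Delay before retry number {@code attempt} (0-based): base * 2^attempt, capped at maxMillis. */
    public static long backoffMillis(int attempt, long baseMillis, long maxMillis) {
        // Limit the shift so the delay stays in range for modest base values.
        long delay = baseMillis << Math.min(attempt, 30);
        return Math.min(delay, maxMillis);
    }

    public static void main(String[] args) {
        for (int attempt = 0; attempt < 5; attempt++) {
            // Assumed values: 4 s base delay, 60 s cap.
            System.out.println(backoffMillis(attempt, 4000L, 60000L));
        }
        // 4000, 8000, 16000, 32000, 60000
    }
}
```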
HADOOP-1327. Include website documentation for streaming. (Rob Weltman
via omalley)
HADOOP-2000. Rewrite NNBench to measure namenode performance accurately.
It now uses the map-reduce framework for load generation.
(Mukund Madhugiri via dhruba)
HADOOP-2248. Speeds up the framework w.r.t Counters. Also has API
updates to the Counters part. (Owen O'Malley via ddas)
HADOOP-2326. The initial block report at Datanode startup time has
a random backoff period. (Sanjay Radia via dhruba)
HADOOP-2432. HDFS includes the name of the file while throwing
"File does not exist" exception. (Jim Kellerman via dhruba)
HADOOP-2457. Added a 'forrest.home' property to the 'docs' target in
build.xml. (acmurthy)
HADOOP-2149. A new benchmark for three name-node operations: file create,
open, and block report, to evaluate the name-node performance
for optimizations or new features. (Konstantin Shvachko via shv)
HADOOP-2466. Change FileInputFormat.computeSplitSize to a protected
non-static method to allow sub-classes to provide alternate
implementations. (Alejandro Abdelnur via acmurthy)
HADOOP-2425. Change TextOutputFormat to handle Text specifically for better
performance. Make NullWritable implement Comparable. Make TextOutputFormat
treat NullWritable like null. (omalley)
HADOOP-1719. Improves the utilization of shuffle copier threads.
(Amar Kamat via ddas)
HADOOP-2390. Added documentation for user-controls for intermediate
map-outputs & final job-outputs and native-hadoop libraries. (acmurthy)
HADOOP-1660. Add the cwd of the map/reduce task to the java.library.path
of the child-jvm to support loading of native libraries distributed via
the DistributedCache. (acmurthy)
HADOOP-2285. Speeds up TextInputFormat. Also includes updates to the
Text API. (Owen O'Malley via cdouglas)
HADOOP-2233. Adds a generic load generator for modeling MR jobs. (cdouglas)
HADOOP-2369. Adds a set of scripts for simulating a mix of user map/reduce
workloads. (Runping Qi via cdouglas)
HADOOP-2547. Removes use of a 'magic number' in build.xml.
(Hrishikesh via nigel)
HADOOP-2268. Fix org.apache.hadoop.mapred.jobcontrol classes to use the
List/Map interfaces rather than concrete ArrayList/HashMap classes
internally. (Adrian Woodhead via acmurthy)
HADOOP-2406. Add a benchmark for measuring read/write performance through
the InputFormat interface, particularly with compression. (cdouglas)
HADOOP-2131. Allow finer-grained control over speculative-execution. Now
users can set it for maps and reduces independently.
Configuration changes to hadoop-default.xml:
deprecated mapred.speculative.execution
add mapred.map.tasks.speculative.execution
add mapred.reduce.tasks.speculative.execution
(Amareshwari Sri Ramadasu via acmurthy)
HADOOP-1965. Interleave sort/spill in the map-task along with calls to the
Mapper.map method. This is done by splitting the 'io.sort.mb' buffer into
two and using one half for collecting map-outputs and the other half for
sort/spill. (Amar Kamat via acmurthy)
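A simplified, single-threaded sketch of the split-buffer idea in HADOOP-1965. Records are collected into one half while the other, full half is sorted and spilled, and the halves then swap roles. In the map task the sort/spill overlaps with calls to Mapper.map. Here it runs inline for clarity, and the class, its methods, and the record-count capacity are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical, single-threaded sketch of a buffer split into collect and sort/spill halves. */
public class SplitSortBuffer {
    private final int halfCapacity;
    private List<Integer> collecting = new ArrayList<>();
    private List<Integer> spilling = new ArrayList<>();
    private final List<List<Integer>> spills = new ArrayList<>();

    public SplitSortBuffer(int halfCapacity) {
        this.halfCapacity = halfCapacity;
    }

    public void collect(int record) {
        collecting.add(record);
        if (collecting.size() == halfCapacity) {
            // Swap halves: the full half goes to sort/spill, the empty one keeps collecting.
            List<Integer> full = collecting;
            collecting = spilling;
            spilling = full;
            sortAndSpill();  // would run on a separate thread in the real map task
        }
    }

    private void sortAndSpill() {
        Collections.sort(spilling);
        spills.add(new ArrayList<>(spilling));  // stand-in for writing a sorted spill file
        spilling.clear();
    }

    public List<List<Integer>> getSpills() {
        return spills;
    }

    public List<Integer> getPending() {
        return collecting;
    }

    public static void main(String[] args) {
        SplitSortBuffer buf = new SplitSortBuffer(3);
        for (int r : new int[] {5, 1, 4, 9, 2, 7, 3}) {
            buf.collect(r);
        }
        System.out.println(buf.getSpills());   // [[1, 4, 5], [2, 7, 9]]
        System.out.println(buf.getPending());  // [3]
    }
}
```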
HADOOP-2464. Unit tests for chmod, chown, and chgrp using DFS.
(Raghu Angadi)
HADOOP-1876. Persist statuses of completed jobs in HDFS so that the
JobClient can query and get information about decommissioned jobs and also
across JobTracker restarts.
Configuration changes to hadoop-default.xml:
add mapred.job.tracker.persist.jobstatus.active (default value of false)
add mapred.job.tracker.persist.jobstatus.hours (default value of 0)
add mapred.job.tracker.persist.jobstatus.dir (default value of
/jobtracker/jobsInfo)
(Alejandro Abdelnur via acmurthy)
HADOOP-2077. Added version and build information to STARTUP_MSG for all
hadoop daemons to aid error-reporting, debugging etc. (acmurthy)
HADOOP-2398. Additional instrumentation for NameNode and RPC server.
Add support for accessing instrumentation statistics via JMX.
(Sanjay Radia via dhruba)
HADOOP-2449. A return of the non-MR version of NNBench.
(Sanjay Radia via shv)
HADOOP-1989. Remove 'datanodecluster' command from bin/hadoop.
(Sanjay Radia via shv)
HADOOP-1742. Improve JavaDoc documentation for ClientProtocol, DFSClient,
and FSNamesystem. (Konstantin Shvachko)
HADOOP-2298. Add Ant target for a binary-only distribution.
(Hrishikesh via nigel)
HADOOP-2509. Add Ant target for Rat report (Apache license header
reports). (Hrishikesh via nigel)
HADOOP-2469. WritableUtils.clone should take a Configuration
instead of a JobConf. (stack via omalley)
HADOOP-2659. Introduce superuser permissions for admin operations.
(Tsz Wo (Nicholas), SZE via shv)
HADOOP-2596. Added a SequenceFile.createWriter api which allows the user
to specify the blocksize, replication factor and the buffersize to be
used for the underlying HDFS file. (Alejandro Abdelnur via acmurthy)
HADOOP-2431. Test HDFS File Permissions. (Hairong Kuang via shv)
HADOOP-2232. Add an option to disable Nagle's algorithm in the IPC stack.
(Clint Morgan via cdouglas)
HADOOP-2342. Created a micro-benchmark for measuring
local-file versus hdfs reads. (Owen O'Malley via nigel)
HADOOP-2529. First version of HDFS User Guide. (Raghu Angadi)
HADOOP-2690. Add jar-test target to build.xml, separating compilation
and packaging of the test classes. (Enis Soztutar via cdouglas)
OPTIMIZATIONS
HADOOP-1898. Release the lock protecting the last time of the last stack
dump while the dump is happening. (Amareshwari Sri Ramadasu via omalley)
HADOOP-1900. Makes the interval of heartbeat and task-event queries
dependent on the cluster size. (Amareshwari Sri Ramadasu via ddas)
HADOOP-2208. Counter update frequency (from TaskTracker to JobTracker) is
capped at 1 minute. (Amareshwari Sri Ramadasu via ddas)
HADOOP-2284. Reduce the number of progress updates during the sorting in
the map task. (Amar Kamat via ddas)
BUG FIXES
HADOOP-2583. Fixes a bug in the Eclipse plug-in UI to edit locations.
Plug-in version is now synchronized with Hadoop version.
HADOOP-2100. Remove faulty check for existence of $HADOOP_PID_DIR and let
'mkdir -p' check & create it. (Michael Bieniosek via acmurthy)
HADOOP-1642. Ensure jobids generated by LocalJobRunner are unique to
avoid collisions and hence job-failures. (Doug Cutting via acmurthy)
HADOOP-2096. Close open file-descriptors held by streams while localizing
job.xml in the JobTracker and while displaying it on the webui in
jobconf.jsp. (Amar Kamat via acmurthy)
HADOOP-2098. Log start & completion of empty jobs to JobHistory, which
also ensures that we close the file-descriptor of the job's history log
opened during job-submission. (Amar Kamat via acmurthy)
HADOOP-2112. Adding back changes to build.xml lost while reverting
HADOOP-1622 i.e. http://svn.apache.org/viewvc?view=rev&revision=588771.
(acmurthy)
HADOOP-2089. Fixes the command line argument handling to handle multiple
-cacheArchive in Hadoop streaming. (Lohit Vijayarenu via ddas)
HADOOP-2071. Fix StreamXmlRecordReader to use a BufferedInputStream
wrapped over the DFSInputStream since mark/reset aren't supported by
DFSInputStream anymore. (Lohit Vijayarenu via acmurthy)
HADOOP-1348. Allow XML comments inside configuration files.
(Rajagopal Natarajan and Enis Soztutar via enis)
HADOOP-1952. Improve handling of invalid, user-specified classes while
configuring streaming jobs such as combiner, input/output formats etc.
Now invalid options are caught, logged and jobs are failed early. (Lohit
Vijayarenu via acmurthy)
HADOOP-2151. FileSystem.globPaths validates the list of Paths that
it returns. (Lohit Vijayarenu via dhruba)
HADOOP-2121. Clean up DFSOutputStream when the stream encounters errors
because Datanodes became full. (Raghu Angadi via dhruba)
HADOOP-1130. The FileSystem.closeAll() method closes all existing
DFSClients. (Chris Douglas via dhruba)
HADOOP-2204. DFSTestUtil.waitReplication was not waiting for all replicas
to get created, thus causing unit test failure.
(Raghu Angadi via dhruba)
HADOOP-2078. A zero-size file may have no blocks associated with it.
(Konstantin Shvachko via dhruba)
HADOOP-2212. ChecksumFileSystem.getSumBufferSize might throw
java.lang.ArithmeticException. The fix is to initialize bytesPerChecksum
to 0. (Michael Bieniosek via ddas)
HADOOP-2216. Fix jobtasks.jsp to ensure that it first collects the
taskids which satisfy the filtering criteria and then uses that list to
print only the required task-reports. Previously it ignored the filtering
and hence used the wrong index into the array of task-reports.
(Amar Kamat via acmurthy)
HADOOP-2272. Fix findbugs target to reflect changes made to the location
of the streaming jar file by HADOOP-2207. (Adrian Woodhead via nigel)
HADOOP-2244. Fixes the MapWritable.readFields to clear the instance
field variable every time readFields is called. (Michael Stack via ddas).
HADOOP-2245. Fixes LocalJobRunner to include a jobId in the mapId. Also,
adds a testcase for JobControl. (Adrian Woodhead via ddas).
HADOOP-2275. Fix erroneous detection of corrupted file when namenode
fails to allocate any datanodes for newly allocated block.
(Dhruba Borthakur via dhruba)
HADOOP-2256. Fix a bug in the namenode that could cause it to encounter
an infinite loop while deleting excess replicas that were created by
block rebalancing. (Hairong Kuang via dhruba)
HADOOP-2209. SecondaryNamenode process exits if it encounters exceptions
that it cannot handle. (Dhruba Borthakur via dhruba)
HADOOP-2314. Prevent TestBlockReplacement from occasionally getting
into an infinite loop. (Hairong Kuang via dhruba)
HADOOP-2300. This fixes a bug where mapred.tasktracker.tasks.maximum
would be ignored even if it was set in hadoop-site.xml.
(Amareshwari Sri Ramadasu via ddas)
HADOOP-2349. Improve code layout in file system transaction logging code.
(Tsz Wo (Nicholas), SZE via dhruba)
HADOOP-2368. Fix unit tests on Windows.
(Tsz Wo (Nicholas), SZE via dhruba)
HADOOP-2363. This fix allows running multiple instances of the unit test
in parallel. The bug was introduced in HADOOP-2185 that changed
port-rolling behaviour. (Konstantin Shvachko via dhruba)
HADOOP-2271. Fix chmod task to be non-parallel. (Adrian Woodhead via
omalley)
HADOOP-2313. Fail the build if building libhdfs fails. (nigel via omalley)
HADOOP-2359. Remove warning for interrupted exception when closing down
minidfs. (dhruba via omalley)
HADOOP-1841. Prevent slow clients from consuming threads in the NameNode.
(dhruba)
HADOOP-2323. JobTracker.close() should not print stack traces for
normal exit. (jimk via cutting)
HADOOP-2376. Prevents sort example from overriding the number of maps.
(Owen O'Malley via ddas)
HADOOP-2434. FSDatasetInterface read interface causes HDFS reads to occur
in 1 byte chunks, causing performance degradation.
(Raghu Angadi via dhruba)
HADOOP-2459. Fix package target so that src/docs/build files are not
included in the release. (nigel)
HADOOP-2215. Fix documentation in cluster_setup.html &
mapred_tutorial.html to reflect that mapred.tasktracker.tasks.maximum has
been superseded by mapred.tasktracker.{map|reduce}.tasks.maximum.
(Amareshwari Sri Ramadasu via acmurthy)
HADOOP-2352. Remove AC_CHECK_LIB for libz and liblzo to ensure that
libhadoop.so doesn't have a dependency on them. (acmurthy)
HADOOP-2453. Fix the configuration for wordcount-simple example in Hadoop
Pipes which currently produces an XML parsing error. (Amareshwari Sri
Ramadasu via acmurthy)
HADOOP-2476. Unit test failure while reading permission bits of local
file system (on Windows) fixed. (Raghu Angadi via dhruba)
HADOOP-2247. Fine-tune the strategies for killing mappers and reducers
due to failures while fetching map-outputs. Now the map-completion times
and number of currently running reduces are taken into account by the
JobTracker before killing the mappers, while the progress made by the
reducer and the number of fetch-failures vis-a-vis total number of
fetch-attempts are taken into account before the reducer kills itself.
(Amar Kamat via acmurthy)
HADOOP-2452. Fix the Eclipse plug-in build.xml to refer to the correct
location where hadoop-*-core.jar is generated. (taton)
HADOOP-2492. Additional debugging in the rpc server to better
diagnose ConcurrentModificationException. (dhruba)
HADOOP-2344. Enhance the utility for executing shell commands to read the
stdout/stderr streams while waiting for the command to finish (to free up
the buffers). Also, this patch throws away stderr of the DF utility.
@deprecated
org.apache.hadoop.fs.ShellCommand for org.apache.hadoop.util.Shell
org.apache.hadoop.util.ShellUtil for
org.apache.hadoop.util.Shell.ShellCommandExecutor
(Amar Kamat via acmurthy)
HADOOP-2511. Fix a javadoc warning in org.apache.hadoop.util.Shell
introduced by HADOOP-2344. (acmurthy)
HADOOP-2442. Fix TestLocalFileSystemPermission.testLocalFSsetOwner
to work on more platforms. (Raghu Angadi via nigel)
HADOOP-2488. Fix a regression in random read performance.
(Michael Stack via rangadi)
HADOOP-2523. Fix TestDFSShell.testFilePermissions on Windows.
(Raghu Angadi via nigel)
HADOOP-2535. Removed support for deprecated mapred.child.heap.size and
fixed some indentation issues in TaskRunner. (acmurthy)
Configuration changes to hadoop-default.xml:
remove mapred.child.heap.size
HADOOP-2512. Fix error stream handling in Shell. Use exit code to
detect shell command errors in RawLocalFileSystem. (Raghu Angadi)
HADOOP-2446. Fixes TestHDFSServerPorts and TestMRServerPorts so they
do not rely on statically configured ports and clean up better. (nigel)
HADOOP-2537. Make build process compatible with Ant 1.7.0.
(Hrishikesh via nigel)
HADOOP-1281. Ensure running tasks of completed map TIPs (e.g. speculative
tasks) are killed as soon as the TIP completes. (acmurthy)
HADOOP-2571. Suppress a spurious warning in test code. (cdouglas)
HADOOP-2481. NNBench reports its progress periodically.
(Hairong Kuang via dhruba)