| <?xml version="1.0" encoding="UTF-8"?> |
| <document xmlns="http://maven.apache.org/XDOC/2.0" |
| xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" |
| xsi:schemaLocation="http://maven.apache.org/XDOC/2.0 http://maven.apache.org/xsd/xdoc-2.0.xsd"> |
| <head> |
| <title>DistCp</title> |
| </head> |
| <body> |
| <section name="Overview"> |
| <p> |
| DistCp (distributed copy) is a tool for large inter-cluster and intra-cluster |
| copying. It uses MapReduce to effect its distribution, error |
| handling and recovery, and reporting. It expands a list of files and |
| directories into input to map tasks, each of which copies a partition |
| of the files specified in the source list. |
| </p> |
| <p> |
| The legacy implementation of DistCp had its share of quirks and |
| drawbacks in its usage, extensibility, and |
| performance. The purpose of the DistCp refactor was to fix these shortcomings, |
| enabling it to be used and extended programmatically. New paradigms have |
| been introduced to improve runtime and setup performance, while |
| retaining the legacy behaviour as the default. |
| </p> |
| <p> |
| This document describes the design of the new DistCp, its |
| new features, their optimal use, and any deviations from the legacy |
| implementation. |
| </p> |
| </section> |
| </body> |
| </document> |