| <!--- |
| Licensed under the Apache License, Version 2.0 (the "License"); |
| you may not use this file except in compliance with the License. |
| You may obtain a copy of the License at |
| |
| http://www.apache.org/licenses/LICENSE-2.0 |
| |
| Unless required by applicable law or agreed to in writing, software |
| distributed under the License is distributed on an "AS IS" BASIS, |
| WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. |
| See the License for the specific language governing permissions and |
| limitations under the License. See accompanying LICENSE file. |
| --> |
| |
| Apache Hadoop ${project.version} |
| ================================ |
| |
| Apache Hadoop ${project.version} is an update to the Hadoop 3.3.x release branch. |
| |
| Overview of Changes |
| =================== |
| |
| Users are encouraged to read the full set of release notes. |
| This page provides an overview of the major changes. |
| |
| Azure ABFS: Critical Stream Prefetch Fix |
| --------------------------------------------- |
| |
The ABFS connector has a critical bug fix
| [HADOOP-18546](https://issues.apache.org/jira/browse/HADOOP-18546). |
| *ABFS. Disable purging list of in-progress reads in abfs stream close().* |
| |
All users of the ABFS connector in Hadoop releases 3.3.2+ MUST either upgrade
or disable prefetching by setting `fs.azure.readaheadqueue.depth` to `0`.
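
As an illustration, a minimal sketch of disabling prefetching programmatically;
the option can equally be set in `core-site.xml`:

```java
import org.apache.hadoop.conf.Configuration;

// Disable ABFS read-ahead prefetching until the cluster can be upgraded.
Configuration conf = new Configuration();
conf.setInt("fs.azure.readaheadqueue.depth", 0);
```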
| |
| Consult the parent JIRA [HADOOP-18521](https://issues.apache.org/jira/browse/HADOOP-18521) |
| *ABFS ReadBufferManager buffer sharing across concurrent HTTP requests* |
| for root cause analysis, details on what is affected, and mitigations. |
| |
| |
| Vectored IO API |
| --------------- |
| |
| [HADOOP-18103](https://issues.apache.org/jira/browse/HADOOP-18103). |
| *High performance vectored read API in Hadoop* |
| |
The `PositionedReadable` interface now includes an operation for
Vectored IO (also known as Scatter/Gather IO):
| |
| ```java |
void readVectored(List<? extends FileRange> ranges, IntFunction<ByteBuffer> allocate)
    throws IOException
| ``` |
| |
All the requested ranges will be retrieved into the supplied byte buffers, possibly asynchronously,
possibly in parallel, with results potentially arriving out of order.
| |
1. The default implementation uses a series of `readFully()` calls, so delivers
performance equivalent to reading each range individually.
| 2. The local filesystem uses java native IO calls for higher performance reads than `readFully()`. |
| 3. The S3A filesystem issues parallel HTTP GET requests in different threads. |
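
For illustration, a minimal sketch of reading two ranges of a file through the
new API; the path and offsets are hypothetical:

```java
import java.nio.ByteBuffer;
import java.util.Arrays;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileRange;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class VectoredReadExample {
  public static void main(String[] args) throws Exception {
    Path path = new Path("s3a://bucket/data/records.bin"); // hypothetical file
    FileSystem fs = path.getFileSystem(new Configuration());
    List<FileRange> ranges = Arrays.asList(
        FileRange.createFileRange(0, 4096),           // first 4 KiB
        FileRange.createFileRange(1_048_576, 4096));  // 4 KiB at offset 1 MiB
    try (FSDataInputStream in = fs.open(path)) {
      // Ranges may be fetched asynchronously, possibly in parallel.
      in.readVectored(ranges, ByteBuffer::allocate);
      for (FileRange range : ranges) {
        ByteBuffer data = range.getData().join(); // block until this range arrives
        // ... process the buffer ...
      }
    }
  }
}
```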
| |
Benchmarking of enhanced Apache ORC and Apache Parquet clients through `file://` and `s3a://`
shows significant improvements in query performance.
| |
| Further Reading: [FsDataInputStream](./hadoop-project-dist/hadoop-common/filesystem/fsdatainputstream.html). |
| |
Mapreduce: Manifest Committer for Azure ABFS and Google GCS
| ---------------------------------------------------------- |
| |
| The new _Intermediate Manifest Committer_ uses a manifest file |
| to commit the work of successful task attempts, rather than |
| renaming directories. |
Job commit is a matter of reading all the manifests, creating the
destination directories (in parallel) and renaming the files,
again in parallel.
| |
| This is both fast and correct on Azure Storage and Google GCS, |
| and should be used there instead of the classic v1/v2 file |
| output committers. |
| |
It is also safe to use on HDFS, where it should be faster
than the v1 committer. It is, however, optimized for
cloud storage, where list and rename operations are significantly
slower, so the benefits on HDFS may be smaller.
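
A minimal sketch of binding `abfs://` and `gs://` output paths to the new
committer via the committer factory mechanism; treat the property and class
names as assumptions to verify against the documentation linked below:

```java
import org.apache.hadoop.conf.Configuration;

// Route output committed to abfs:// and gs:// paths through the manifest
// committer factory (property and class names assumed; see the docs below).
Configuration conf = new Configuration();
String factory =
    "org.apache.hadoop.mapreduce.lib.output.committer.manifest.ManifestCommitterFactory";
conf.set("mapreduce.outputcommitter.factory.scheme.abfs", factory);
conf.set("mapreduce.outputcommitter.factory.scheme.gs", factory);
```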
| |
More details are available in the
[manifest committer](./hadoop-mapreduce-client/hadoop-mapreduce-client-core/manifest_committer.html)
documentation.
| |
| |
| HDFS: Router Based Federation |
| ----------------------------- |
| |
A lot of effort has been invested in stabilizing and improving the HDFS Router Based Federation feature.
| |
1. [HDFS-13522](https://issues.apache.org/jira/browse/HDFS-13522), [HDFS-16767](https://issues.apache.org/jira/browse/HDFS-16767) and related JIRAs: Allow Observer Reads in HDFS Router Based Federation.
2. [HDFS-13248](https://issues.apache.org/jira/browse/HDFS-13248): RBF supports Client Locality.
| |
| HDFS: Dynamic Datanode Reconfiguration |
| -------------------------------------- |
| |
[HDFS-16400](https://issues.apache.org/jira/browse/HDFS-16400), [HDFS-16399](https://issues.apache.org/jira/browse/HDFS-16399), [HDFS-16396](https://issues.apache.org/jira/browse/HDFS-16396), [HDFS-16397](https://issues.apache.org/jira/browse/HDFS-16397), [HDFS-16413](https://issues.apache.org/jira/browse/HDFS-16413), [HDFS-16457](https://issues.apache.org/jira/browse/HDFS-16457).
| |
A number of datanode configuration options can now be changed without having to restart
the datanode. This makes it possible to tune deployment configurations without
cluster-wide datanode restarts.
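
For example, after updating a datanode's `hdfs-site.xml`, the reconfiguration
can be started and monitored with `hdfs dfsadmin`; the hostname here is
illustrative, and 9867 is the default datanode IPC port:

```bash
# Ask the datanode to reload its (reconfigurable) configuration
hdfs dfsadmin -reconfig datanode dn1.example.com:9867 start

# Poll for the outcome of the reconfiguration
hdfs dfsadmin -reconfig datanode dn1.example.com:9867 status
```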
| |
| See [DataNode.java](https://github.com/apache/hadoop/blob/branch-3.3.5/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataNode.java#L346-L361) |
| for the list of dynamically reconfigurable attributes. |
| |
| |
| Transitive CVE fixes |
| -------------------- |
| |
| A lot of dependencies have been upgraded to address recent CVEs. |
Many of the CVEs were not actually exploitable through Hadoop itself,
so much of this work is simply due diligence.
However, applications which have these libraries on their classpath may
be vulnerable, and the upgrades should also reduce the number of false
positives reported by security scanners.
| |
We have not been able to upgrade every single dependency to its latest
version; some of those upgrades would simply be incompatible.
If you have concerns about the state of a specific library, consult the Apache JIRA
issue tracker to see whether a JIRA has been filed, discussions have taken place about
the library in question, and whether or not there is already a fix in the pipeline.
*Please don't file new JIRAs about dependency-X.Y.Z having a CVE without
searching for any existing issue first.*
| |
As an open source project, contributions in this area are always welcome,
especially in testing the active branches, testing applications downstream of
those branches, and verifying whether updated dependencies trigger regressions.
| |
| Getting Started |
| =============== |
| |
| The Hadoop documentation includes the information you need to get started using |
| Hadoop. Begin with the |
| [Single Node Setup](./hadoop-project-dist/hadoop-common/SingleCluster.html) |
| which shows you how to set up a single-node Hadoop installation. |
| Then move on to the |
| [Cluster Setup](./hadoop-project-dist/hadoop-common/ClusterSetup.html) |
| to learn how to set up a multi-node Hadoop installation. |
| |