hadoop-project/src/site/markdown/index.md.vm - hadoop - Git at Google

 <!---
   Licensed under the Apache License, Version 2.0 (the "License");
   you may not use this file except in compliance with the License.
   You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

   Unless required by applicable law or agreed to in writing, software
   distributed under the License is distributed on an "AS IS" BASIS,
   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
   See the License for the specific language governing permissions and
   limitations under the License. See accompanying LICENSE file.
 -->

 Apache Hadoop ${project.version}
 ================================

 Apache Hadoop ${project.version} incorporates a number of significant
 enhancements over the previous major release line (hadoop-3.2).

 Overview
 ========

 Users are encouraged to read the full set of release notes.
 This page provides an overview of the major changes.

 ABFS: fix for Sever Name Indication (SNI)
 ----------
 ABFS: Bug fix to support Server Name Indication (SNI).


 Setting permissions on name directory fails on non posix compliant filesystems
 ----------
 Fixed namenode/journal startup on Windows.


 Backport HDFS persistent memory read cache support to branch-3.2**
 ----------
 Non-volatile storage class memory (SCM, also known as persistent memory) is
 supported in HDFS cache. To enable SCM cache, user just needs to configure SCM
 volume for property “dfs.datanode.cache.pmem.dirs” in hdfs-site.xml. And all
 HDFS cache directives keep unchanged. There are two implementations for HDFS
 SCM Cache, one is pure java code implementation and the other is native PMDK
 based implementation. The latter implementation can bring user better
 performance gain in cache write and cache read. If PMDK native libs could be
 loaded, it will use PMDK based implementation otherwise it will fallback to
 java code implementation. To enable PMDK based implementation, user should
 install PMDK library by referring to the official site http://pmem.io/. Then,
 build Hadoop with PMDK support by referring to "PMDK library build options"
 section in \`BUILDING.txt\` in the source code. If multiple SCM volumes are
 configured, a round-robin policy is used to select an available volume for
 caching a block. Consistent with DRAM cache, SCM cache also has no cache
 eviction mechanism. When DataNode receives a data read request from a client,
 if the corresponding block is cached into SCM, DataNode will instantiate an
 InputStream with the block location path on SCM (pure java implementation) or
 cache address on SCM (PMDK based implementation). Once the InputStream is
 created, DataNode will send the cached data to the client. Please refer
 "Centralized Cache Management" guide for more details.


 Consistent Reads from Standby Node
 ----------
 Observer is a new type of a NameNode in addition to Active and Standby Nodes in
 HA settings. An Observer Node maintains a replica of the namespace same as a
 Standby Node.  It additionally allows execution of clients read requests. To
 ensure read-after-write consistency within a single client, a state ID is
 introduced in RPC headers. The Observer responds to the client request only
 after its own state has caught up with the client’s state ID, which it
 previously received from the Active NameNode.

 Clients can explicitly invoke a new client protocol call msync(), which ensures
 that subsequent reads by this client from an Observer are consistent.

 A new client-side ObserverReadProxyProvider is introduced to provide automatic
 switching between Active and Observer NameNodes for submitting respectively
 write and read requests.


 Update checkstyle to 8.26 and maven-checkstyle-plugin to 3.1.0
 ----------
 Updated checkstyle to 8.26 and updated maven-checkstyle-plugin to 3.1.0.


 ZKFC ignores dfs.namenode.rpc-bind-host and uses dfs.namenode.rpc-address to bind to host address
 ----------
 ZKFC binds host address to "dfs.namenode.servicerpc-bind-host", if configured.
 Otherwise, it binds to "dfs.namenode.rpc-bind-host". If neither of those is
 configured, ZKFC binds itself to NameNode RPC server address (effectively
 "dfs.namenode.rpc-address").


 ListStatus on ViewFS root (ls "/") should list the linkFallBack root (configured target root).
 ----------
 ViewFS#listStatus on root("/") considers listing from fallbackLink if available.
 If the same directory name is present in configured mount path as well as in
 fallback link, then only the configured mount path will be listed in the
 returned result.


 NMs should supply a health status when registering with RM
 ----------
 Improved node registration with node health status.


 Getting Started
 ===============

 The Hadoop documentation includes the information you need to get started using
 Hadoop. Begin with the
 [Single Node Setup](./hadoop-project-dist/hadoop-common/SingleCluster.html)
 which shows you how to set up a single-node Hadoop installation.
 Then move on to the
 [Cluster Setup](./hadoop-project-dist/hadoop-common/ClusterSetup.html)
 to learn how to set up a multi-node Hadoop installation.
	<!---
	Licensed under the Apache License, Version 2.0 (the "License");
	you may not use this file except in compliance with the License.
	You may obtain a copy of the License at

	http://www.apache.org/licenses/LICENSE-2.0

	Unless required by applicable law or agreed to in writing, software
	distributed under the License is distributed on an "AS IS" BASIS,
	WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
	See the License for the specific language governing permissions and
	limitations under the License. See accompanying LICENSE file.
	-->

	Apache Hadoop ${project.version}
	================================

	Apache Hadoop ${project.version} incorporates a number of significant
	enhancements over the previous major release line (hadoop-3.2).

	Overview
	========

	Users are encouraged to read the full set of release notes.
	This page provides an overview of the major changes.

	ABFS: fix for Sever Name Indication (SNI)
	----------
	ABFS: Bug fix to support Server Name Indication (SNI).


	Setting permissions on name directory fails on non posix compliant filesystems
	----------
	Fixed namenode/journal startup on Windows.


	Backport HDFS persistent memory read cache support to branch-3.2**
	----------
	Non-volatile storage class memory (SCM, also known as persistent memory) is
	supported in HDFS cache. To enable SCM cache, user just needs to configure SCM
	volume for property “dfs.datanode.cache.pmem.dirs” in hdfs-site.xml. And all
	HDFS cache directives keep unchanged. There are two implementations for HDFS
	SCM Cache, one is pure java code implementation and the other is native PMDK
	based implementation. The latter implementation can bring user better
	performance gain in cache write and cache read. If PMDK native libs could be
	loaded, it will use PMDK based implementation otherwise it will fallback to
	java code implementation. To enable PMDK based implementation, user should
	install PMDK library by referring to the official site http://pmem.io/. Then,
	build Hadoop with PMDK support by referring to "PMDK library build options"
	section in \`BUILDING.txt\` in the source code. If multiple SCM volumes are
	configured, a round-robin policy is used to select an available volume for
	caching a block. Consistent with DRAM cache, SCM cache also has no cache
	eviction mechanism. When DataNode receives a data read request from a client,
	if the corresponding block is cached into SCM, DataNode will instantiate an
	InputStream with the block location path on SCM (pure java implementation) or
	cache address on SCM (PMDK based implementation). Once the InputStream is
	created, DataNode will send the cached data to the client. Please refer
	"Centralized Cache Management" guide for more details.


	Consistent Reads from Standby Node
	----------
	Observer is a new type of a NameNode in addition to Active and Standby Nodes in
	HA settings. An Observer Node maintains a replica of the namespace same as a
	Standby Node. It additionally allows execution of clients read requests. To
	ensure read-after-write consistency within a single client, a state ID is
	introduced in RPC headers. The Observer responds to the client request only
	after its own state has caught up with the client’s state ID, which it
	previously received from the Active NameNode.

	Clients can explicitly invoke a new client protocol call msync(), which ensures
	that subsequent reads by this client from an Observer are consistent.

	A new client-side ObserverReadProxyProvider is introduced to provide automatic
	switching between Active and Observer NameNodes for submitting respectively
	write and read requests.


	Update checkstyle to 8.26 and maven-checkstyle-plugin to 3.1.0
	----------
	Updated checkstyle to 8.26 and updated maven-checkstyle-plugin to 3.1.0.


	ZKFC ignores dfs.namenode.rpc-bind-host and uses dfs.namenode.rpc-address to bind to host address
	----------
	ZKFC binds host address to "dfs.namenode.servicerpc-bind-host", if configured.
	Otherwise, it binds to "dfs.namenode.rpc-bind-host". If neither of those is
	configured, ZKFC binds itself to NameNode RPC server address (effectively
	"dfs.namenode.rpc-address").


	ListStatus on ViewFS root (ls "/") should list the linkFallBack root (configured target root).
	----------
	ViewFS#listStatus on root("/") considers listing from fallbackLink if available.
	If the same directory name is present in configured mount path as well as in
	fallback link, then only the configured mount path will be listed in the
	returned result.


	NMs should supply a health status when registering with RM
	----------
	Improved node registration with node health status.


	Getting Started
	===============

	The Hadoop documentation includes the information you need to get started using
	Hadoop. Begin with the
	[Single Node Setup](./hadoop-project-dist/hadoop-common/SingleCluster.html)
	which shows you how to set up a single-node Hadoop installation.
	Then move on to the
	[Cluster Setup](./hadoop-project-dist/hadoop-common/ClusterSetup.html)
	to learn how to set up a multi-node Hadoop installation.