branch-2.0.4-alpha/hadoop-hdfs-project/hadoop-hdfs/src/site/apt/HdfsPermissionsGuide.apt.vm - hadoop - Git at Google

 ~~ Licensed under the Apache License, Version 2.0 (the "License");
 ~~ you may not use this file except in compliance with the License.
 ~~ You may obtain a copy of the License at
 ~~
 ~~   http://www.apache.org/licenses/LICENSE-2.0
 ~~
 ~~ Unless required by applicable law or agreed to in writing, software
 ~~ distributed under the License is distributed on an "AS IS" BASIS,
 ~~ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 ~~ See the License for the specific language governing permissions and
 ~~ limitations under the License. See accompanying LICENSE file.

   ---
   HDFS Permissions Guide
   ---
   ---
   ${maven.build.timestamp}

 HDFS Permissions Guide

   \[ {{{./index.html}Go Back}} \]

 %{toc|section=1|fromDepth=0}

 * Overview

    The Hadoop Distributed File System (HDFS) implements a permissions
    model for files and directories that shares much of the POSIX model.
    Each file and directory is associated with an owner and a group. The
    file or directory has separate permissions for the user that is the
    owner, for other users that are members of the group, and for all other
    users. For files, the r permission is required to read the file, and
    the w permission is required to write or append to the file. For
    directories, the r permission is required to list the contents of the
    directory, the w permission is required to create or delete files or
    directories, and the x permission is required to access a child of the
    directory.

    In contrast to the POSIX model, there are no setuid or setgid bits for
    files as there is no notion of executable files. For directories, there
    are no setuid or setgid bits directory as a simplification. The Sticky
    bit can be set on directories, preventing anyone except the superuser,
    directory owner or file owner from deleting or moving the files within
    the directory. Setting the sticky bit for a file has no effect.
    Collectively, the permissions of a file or directory are its mode. In
    general, Unix customs for representing and displaying modes will be
    used, including the use of octal numbers in this description. When a
    file or directory is created, its owner is the user identity of the
    client process, and its group is the group of the parent directory (the
    BSD rule).

    Each client process that accesses HDFS has a two-part identity composed
    of the user name, and groups list. Whenever HDFS must do a permissions
    check for a file or directory foo accessed by a client process,

      * If the user name matches the owner of foo, then the owner
        permissions are tested;
      * Else if the group of foo matches any of member of the groups list,
        then the group permissions are tested;
      * Otherwise the other permissions of foo are tested.

    If a permissions check fails, the client operation fails.

 * User Identity

    As of Hadoop 0.22, Hadoop supports two different modes of operation to
    determine the user's identity, specified by the
    hadoop.security.authentication property:

    * <<simple>>

           In this mode of operation, the identity of a client process is
           determined by the host operating system. On Unix-like systems,
           the user name is the equivalent of `whoami`.

    * <<kerberos>>

           In Kerberized operation, the identity of a client process is
           determined by its Kerberos credentials. For example, in a
           Kerberized environment, a user may use the kinit utility to
           obtain a Kerberos ticket-granting-ticket (TGT) and use klist to
           determine their current principal. When mapping a Kerberos
           principal to an HDFS username, all components except for the
           primary are dropped. For example, a principal
           todd/foobar@CORP.COMPANY.COM will act as the simple username
           todd on HDFS.

    Regardless of the mode of operation, the user identity mechanism is
    extrinsic to HDFS itself. There is no provision within HDFS for
    creating user identities, establishing groups, or processing user
    credentials.

 * Group Mapping

    Once a username has been determined as described above, the list of
    groups is determined by a group mapping service, configured by the
    hadoop.security.group.mapping property. The default implementation,
    org.apache.hadoop.security.ShellBasedUnixGroupsMapping, will shell out
    to the Unix bash -c groups command to resolve a list of groups for a
    user.

    An alternate implementation, which connects directly to an LDAP server
    to resolve the list of groups, is available via
    org.apache.hadoop.security.LdapGroupsMapping. However, this provider
    should only be used if the required groups reside exclusively in LDAP,
    and are not materialized on the Unix servers. More information on
    configuring the group mapping service is available in the Javadocs.

    For HDFS, the mapping of users to groups is performed on the NameNode.
    Thus, the host system configuration of the NameNode determines the
    group mappings for the users.

    Note that HDFS stores the user and group of a file or directory as
    strings; there is no conversion from user and group identity numbers as
    is conventional in Unix.

 * Understanding the Implementation

    Each file or directory operation passes the full path name to the name
    node, and the permissions checks are applied along the path for each
    operation. The client framework will implicitly associate the user
    identity with the connection to the name node, reducing the need for
    changes to the existing client API. It has always been the case that
    when one operation on a file succeeds, the operation might fail when
    repeated because the file, or some directory on the path, no longer
    exists. For instance, when the client first begins reading a file, it
    makes a first request to the name node to discover the location of the
    first blocks of the file. A second request made to find additional
    blocks may fail. On the other hand, deleting a file does not revoke
    access by a client that already knows the blocks of the file. With the
    addition of permissions, a client's access to a file may be withdrawn
    between requests. Again, changing permissions does not revoke the
    access of a client that already knows the file's blocks.

 * Changes to the File System API

    All methods that use a path parameter will throw <<<AccessControlException>>>
    if permission checking fails.

    New methods:

      * <<<public FSDataOutputStream create(Path f, FsPermission permission,
        boolean overwrite, int bufferSize, short replication, long
        blockSize, Progressable progress) throws IOException;>>>

      * <<<public boolean mkdirs(Path f, FsPermission permission) throws
        IOException;>>>

      * <<<public void setPermission(Path p, FsPermission permission) throws
        IOException;>>>

      * <<<public void setOwner(Path p, String username, String groupname)
        throws IOException;>>>

      * <<<public FileStatus getFileStatus(Path f) throws IOException;>>>

        will additionally return the user, group and mode associated with the
        path.

    The mode of a new file or directory is restricted my the umask set as a
    configuration parameter. When the existing <<<create(path, …)>>> method
    (without the permission parameter) is used, the mode of the new file is
    <<<0666 & ^umask>>>. When the new <<<create(path, permission, …)>>> method
    (with the permission parameter P) is used, the mode of the new file is
    <<<P & ^umask & 0666>>>. When a new directory is created with the existing
    <<<mkdirs(path)>>>
    method (without the permission parameter), the mode of the new
    directory is <<<0777 & ^umask>>>. When the new <<<mkdirs(path, permission)>>>
    method (with the permission parameter P) is used, the mode of new
    directory is <<<P & ^umask & 0777>>>.

 * Changes to the Application Shell

    New operations:

      * <<<chmod [-R] mode file …>>>

        Only the owner of a file or the super-user is permitted to change
        the mode of a file.

      * <<<chgrp [-R] group file …>>>

        The user invoking chgrp must belong to the specified group and be
        the owner of the file, or be the super-user.

      * <<<chown [-R] [owner][:[group]] file …>>>

        The owner of a file may only be altered by a super-user.

      * <<<ls file …>>>

      * <<<lsr file …>>>

        The output is reformatted to display the owner, group and mode.

 * The Super-User

    The super-user is the user with the same identity as name node process
    itself. Loosely, if you started the name node, then you are the
    super-user. The super-user can do anything in that permissions checks
    never fail for the super-user. There is no persistent notion of who was
    the super-user; when the name node is started the process identity
    determines who is the super-user for now. The HDFS super-user does not
    have to be the super-user of the name node host, nor is it necessary
    that all clusters have the same super-user. Also, an experimenter
    running HDFS on a personal workstation, conveniently becomes that
    installation's super-user without any configuration.

    In addition, the administrator my identify a distinguished group using
    a configuration parameter. If set, members of this group are also
    super-users.

 * The Web Server

    By default, the identity of the web server is a configuration
    parameter. That is, the name node has no notion of the identity of the
    real user, but the web server behaves as if it has the identity (user
    and groups) of a user chosen by the administrator. Unless the chosen
    identity matches the super-user, parts of the name space may be
    inaccessible to the web server.

 * Configuration Parameters

      * <<<dfs.permissions = true>>>

        If yes use the permissions system as described here. If no,
        permission checking is turned off, but all other behavior is
        unchanged. Switching from one parameter value to the other does not
        change the mode, owner or group of files or directories.
        Regardless of whether permissions are on or off, chmod, chgrp and
        chown always check permissions. These functions are only useful in
        the permissions context, and so there is no backwards compatibility
        issue. Furthermore, this allows administrators to reliably set
        owners and permissions in advance of turning on regular permissions
        checking.

      * <<<dfs.web.ugi = webuser,webgroup>>>

        The user name to be used by the web server. Setting this to the
        name of the super-user allows any web client to see everything.
        Changing this to an otherwise unused identity allows web clients to
        see only those things visible using "other" permissions. Additional
        groups may be added to the comma-separated list.

      * <<<dfs.permissions.superusergroup = supergroup>>>

        The name of the group of super-users.

      * <<<fs.permissions.umask-mode = 0022>>>

        The umask used when creating files and directories. For
        configuration files, the decimal value 18 may be used.

      * <<<dfs.cluster.administrators = ACL-for-admins>>>

        The administrators for the cluster specified as an ACL. This
        controls who can access the default servlets, etc. in the HDFS.
	~~ Licensed under the Apache License, Version 2.0 (the "License");
	~~ you may not use this file except in compliance with the License.
	~~ You may obtain a copy of the License at
	~~
	~~ http://www.apache.org/licenses/LICENSE-2.0
	~~
	~~ Unless required by applicable law or agreed to in writing, software
	~~ distributed under the License is distributed on an "AS IS" BASIS,
	~~ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
	~~ See the License for the specific language governing permissions and
	~~ limitations under the License. See accompanying LICENSE file.

	---
	HDFS Permissions Guide
	---
	---
	${maven.build.timestamp}

	HDFS Permissions Guide

	\[ {{{./index.html}Go Back}} \]

	%{toc\|section=1\|fromDepth=0}

	* Overview

	The Hadoop Distributed File System (HDFS) implements a permissions
	model for files and directories that shares much of the POSIX model.
	Each file and directory is associated with an owner and a group. The
	file or directory has separate permissions for the user that is the
	owner, for other users that are members of the group, and for all other
	users. For files, the r permission is required to read the file, and
	the w permission is required to write or append to the file. For
	directories, the r permission is required to list the contents of the
	directory, the w permission is required to create or delete files or
	directories, and the x permission is required to access a child of the
	directory.

	In contrast to the POSIX model, there are no setuid or setgid bits for
	files as there is no notion of executable files. For directories, there
	are no setuid or setgid bits directory as a simplification. The Sticky
	bit can be set on directories, preventing anyone except the superuser,
	directory owner or file owner from deleting or moving the files within
	the directory. Setting the sticky bit for a file has no effect.
	Collectively, the permissions of a file or directory are its mode. In
	general, Unix customs for representing and displaying modes will be
	used, including the use of octal numbers in this description. When a
	file or directory is created, its owner is the user identity of the
	client process, and its group is the group of the parent directory (the
	BSD rule).

	Each client process that accesses HDFS has a two-part identity composed
	of the user name, and groups list. Whenever HDFS must do a permissions
	check for a file or directory foo accessed by a client process,

	* If the user name matches the owner of foo, then the owner
	permissions are tested;
	* Else if the group of foo matches any of member of the groups list,
	then the group permissions are tested;
	* Otherwise the other permissions of foo are tested.

	If a permissions check fails, the client operation fails.

	* User Identity

	As of Hadoop 0.22, Hadoop supports two different modes of operation to
	determine the user's identity, specified by the
	hadoop.security.authentication property:

	* <<simple>>

	In this mode of operation, the identity of a client process is
	determined by the host operating system. On Unix-like systems,
	the user name is the equivalent of `whoami`.

	* <<kerberos>>

	In Kerberized operation, the identity of a client process is
	determined by its Kerberos credentials. For example, in a
	Kerberized environment, a user may use the kinit utility to
	obtain a Kerberos ticket-granting-ticket (TGT) and use klist to
	determine their current principal. When mapping a Kerberos
	principal to an HDFS username, all components except for the
	primary are dropped. For example, a principal
	todd/foobar@CORP.COMPANY.COM will act as the simple username
	todd on HDFS.

	Regardless of the mode of operation, the user identity mechanism is
	extrinsic to HDFS itself. There is no provision within HDFS for
	creating user identities, establishing groups, or processing user
	credentials.

	* Group Mapping

	Once a username has been determined as described above, the list of
	groups is determined by a group mapping service, configured by the
	hadoop.security.group.mapping property. The default implementation,
	org.apache.hadoop.security.ShellBasedUnixGroupsMapping, will shell out
	to the Unix bash -c groups command to resolve a list of groups for a
	user.

	An alternate implementation, which connects directly to an LDAP server
	to resolve the list of groups, is available via
	org.apache.hadoop.security.LdapGroupsMapping. However, this provider
	should only be used if the required groups reside exclusively in LDAP,
	and are not materialized on the Unix servers. More information on
	configuring the group mapping service is available in the Javadocs.

	For HDFS, the mapping of users to groups is performed on the NameNode.
	Thus, the host system configuration of the NameNode determines the
	group mappings for the users.

	Note that HDFS stores the user and group of a file or directory as
	strings; there is no conversion from user and group identity numbers as
	is conventional in Unix.

	* Understanding the Implementation

	Each file or directory operation passes the full path name to the name
	node, and the permissions checks are applied along the path for each
	operation. The client framework will implicitly associate the user
	identity with the connection to the name node, reducing the need for
	changes to the existing client API. It has always been the case that
	when one operation on a file succeeds, the operation might fail when
	repeated because the file, or some directory on the path, no longer
	exists. For instance, when the client first begins reading a file, it
	makes a first request to the name node to discover the location of the
	first blocks of the file. A second request made to find additional
	blocks may fail. On the other hand, deleting a file does not revoke
	access by a client that already knows the blocks of the file. With the
	addition of permissions, a client's access to a file may be withdrawn
	between requests. Again, changing permissions does not revoke the
	access of a client that already knows the file's blocks.

	* Changes to the File System API

	All methods that use a path parameter will throw <<<AccessControlException>>>
	if permission checking fails.

	New methods:

	* <<<public FSDataOutputStream create(Path f, FsPermission permission,
	boolean overwrite, int bufferSize, short replication, long
	blockSize, Progressable progress) throws IOException;>>>

	* <<<public boolean mkdirs(Path f, FsPermission permission) throws
	IOException;>>>

	* <<<public void setPermission(Path p, FsPermission permission) throws
	IOException;>>>

	* <<<public void setOwner(Path p, String username, String groupname)
	throws IOException;>>>

	* <<<public FileStatus getFileStatus(Path f) throws IOException;>>>

	will additionally return the user, group and mode associated with the
	path.

	The mode of a new file or directory is restricted my the umask set as a
	configuration parameter. When the existing <<<create(path, …)>>> method
	(without the permission parameter) is used, the mode of the new file is
	<<<0666 & ^umask>>>. When the new <<<create(path, permission, …)>>> method
	(with the permission parameter P) is used, the mode of the new file is
	<<<P & ^umask & 0666>>>. When a new directory is created with the existing
	<<<mkdirs(path)>>>
	method (without the permission parameter), the mode of the new
	directory is <<<0777 & ^umask>>>. When the new <<<mkdirs(path, permission)>>>
	method (with the permission parameter P) is used, the mode of new
	directory is <<<P & ^umask & 0777>>>.

	* Changes to the Application Shell

	New operations:

	* <<<chmod [-R] mode file …>>>

	Only the owner of a file or the super-user is permitted to change
	the mode of a file.

	* <<<chgrp [-R] group file …>>>

	The user invoking chgrp must belong to the specified group and be
	the owner of the file, or be the super-user.

	* <<<chown [-R] [owner][:[group]] file …>>>

	The owner of a file may only be altered by a super-user.

	* <<<ls file …>>>

	* <<<lsr file …>>>

	The output is reformatted to display the owner, group and mode.

	* The Super-User

	The super-user is the user with the same identity as name node process
	itself. Loosely, if you started the name node, then you are the
	super-user. The super-user can do anything in that permissions checks
	never fail for the super-user. There is no persistent notion of who was
	the super-user; when the name node is started the process identity
	determines who is the super-user for now. The HDFS super-user does not
	have to be the super-user of the name node host, nor is it necessary
	that all clusters have the same super-user. Also, an experimenter
	running HDFS on a personal workstation, conveniently becomes that
	installation's super-user without any configuration.

	In addition, the administrator my identify a distinguished group using
	a configuration parameter. If set, members of this group are also
	super-users.

	* The Web Server

	By default, the identity of the web server is a configuration
	parameter. That is, the name node has no notion of the identity of the
	real user, but the web server behaves as if it has the identity (user
	and groups) of a user chosen by the administrator. Unless the chosen
	identity matches the super-user, parts of the name space may be
	inaccessible to the web server.

	* Configuration Parameters

	* <<<dfs.permissions = true>>>

	If yes use the permissions system as described here. If no,
	permission checking is turned off, but all other behavior is
	unchanged. Switching from one parameter value to the other does not
	change the mode, owner or group of files or directories.
	Regardless of whether permissions are on or off, chmod, chgrp and
	chown always check permissions. These functions are only useful in
	the permissions context, and so there is no backwards compatibility
	issue. Furthermore, this allows administrators to reliably set
	owners and permissions in advance of turning on regular permissions
	checking.

	* <<<dfs.web.ugi = webuser,webgroup>>>

	The user name to be used by the web server. Setting this to the
	name of the super-user allows any web client to see everything.
	Changing this to an otherwise unused identity allows web clients to
	see only those things visible using "other" permissions. Additional
	groups may be added to the comma-separated list.

	* <<<dfs.permissions.superusergroup = supergroup>>>

	The name of the group of super-users.

	* <<<fs.permissions.umask-mode = 0022>>>

	The umask used when creating files and directories. For
	configuration files, the decimal value 18 may be used.

	* <<<dfs.cluster.administrators = ACL-for-admins>>>

	The administrators for the cluster specified as an ACL. This
	controls who can access the default servlets, etc. in the HDFS.