---+ Securing Falcon
---++ Overview
Apache Falcon provides the following security features:
* Credential provider alias for passwords used in Falcon server.
* Authentication to identify proper users.
* Authorization to specify resource access permission for users or groups.
* Cross-Site Request Forgery (CSRF) prevention.
* SSL to provide transport level security for data confidentiality and integrity.
---++ Credential Provider Alias for Passwords
Server-side configuration files such as startup.properties contain passwords and other sensitive information.
Instead of specifying these values in plain text, the user can refer to a credential provider alias in the property file.
Take the SMTP password as an example. The user can store the password in a
[[http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/CommandsManual.html#credential][Hadoop credential provider]]
with the alias name _SMTPPasswordAlias_. In startup.properties where SMTP password is needed, the user can refer to its
alias name _SMTPPasswordAlias_ instead of providing the real password.
The alias property to be resolved through Hadoop credential provider should have the format:
_credential.provider.alias.for.[property-key]_. For example,
_credential.provider.alias.for.falcon.email.smtp.password=SMTPPasswordAlias_ for SMTP password.
At startup, the Falcon server automatically retrieves the real password for the given alias.
The user can specify the provider path with the property key _credential.provider.path_,
e.g. _credential.provider.path=jceks://file/tmp/test.jceks_.
If not specified, Falcon will use the default Hadoop credential provider path in core-site.xml.
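The naming convention above can be illustrated with a minimal Python sketch. It only shows how an alias key maps back to the property it stands for. The function name find_credential_aliases and the dictionary input are hypothetical, and the sketch does not contact a credential provider or reflect Falcon's actual implementation.

```python
ALIAS_PREFIX = "credential.provider.alias.for."


def find_credential_aliases(properties):
    """Map each target property key to the alias that should supply its value.

    `properties` is a plain dict standing in for a parsed startup.properties
    (a hypothetical input format chosen for illustration).
    """
    aliases = {}
    for key, alias in properties.items():
        if key.startswith(ALIAS_PREFIX):
            # Strip the prefix to recover the property the alias stands for.
            target_key = key[len(ALIAS_PREFIX):]
            aliases[target_key] = alias
    return aliases


example = {
    "credential.provider.alias.for.falcon.email.smtp.password": "SMTPPasswordAlias",
    "credential.provider.path": "jceks://file/tmp/test.jceks",
}
resolved = find_credential_aliases(example)
```

Here resolved maps falcon.email.smtp.password to SMTPPasswordAlias, while credential.provider.path is left alone because it does not carry the alias prefix.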
---++ Authentication (User Identity)
Apache Falcon enforces authentication on protected resources. Once authentication has been established it sets a
signed HTTP Cookie that contains an authentication token with the user name, user principal,
authentication type and expiration time.
It does so by using [[http://hadoop.apache.org/docs/current/hadoop-auth/index.html][Hadoop Auth]].
Hadoop Auth is a Java library consisting of client and server components that enable Kerberos SPNEGO authentication
for HTTP. Hadoop Auth also supports additional authentication mechanisms on the client and the server side via two
simple interfaces.
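The idea of a signed token carrying the user name, principal, authentication type and expiration time can be sketched as follows. This is an illustration only: the field layout, the use of HMAC-SHA256 and the function names sign_token and verify_token are assumptions made for the example, not a description of the Hadoop Auth wire format.

```python
import hashlib
import hmac
import time


def sign_token(user, principal, auth_type, expires_ms, secret):
    """Build a token string and append an HMAC-SHA256 signature.

    The u/p/t/e/s field names are an assumed layout for illustration.
    Values containing '&' or '=' are not escaped in this sketch.
    """
    payload = f"u={user}&p={principal}&t={auth_type}&e={expires_ms}"
    signature = hmac.new(secret.encode(), payload.encode(), hashlib.sha256).hexdigest()
    return f"{payload}&s={signature}"


def verify_token(token, secret, now_ms=None):
    """Return the token fields if the signature is valid and the token is not expired."""
    payload, separator, signature = token.rpartition("&s=")
    if not separator:
        raise ValueError("token is not signed")
    expected = hmac.new(secret.encode(), payload.encode(), hashlib.sha256).hexdigest()
    # Constant-time comparison avoids leaking how many characters matched.
    if not hmac.compare_digest(signature.encode(), expected.encode()):
        raise ValueError("invalid signature")
    fields = dict(part.split("=", 1) for part in payload.split("&"))
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    if now_ms > int(fields["e"]):
        raise ValueError("token expired")
    return fields
```

A server would issue such a token in a cookie after successful authentication and call the verification step on each later request, rejecting the request when a ValueError is raised.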
---+++ Authentication Methods
Falcon supports two authentication methods out of the box: simple and kerberos.
---++++ Pseudo/Simple Authentication
Falcon authenticates the user by simply trusting the value of the query string parameter 'user.name'. This is the
default mode Falcon is configured with.
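As a small illustration of pseudo/simple authentication from the client side, the following Python sketch adds the user.name parameter to a request URL. The helper name with_user_name is hypothetical, and the host and port are placeholders.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_user_name(url, user):
    """Return `url` with its user.name query parameter set to `user`."""
    parts = urlsplit(url)
    # Keep every existing parameter except a previous user.name value.
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != "user.name"
    ]
    query.append(("user.name", user))
    return urlunsplit(parts._replace(query=urlencode(query)))


version_url = with_user_name("http://localhost:15000/api/admin/version", "venkatesh")
```

Because the server simply trusts this value, the parameter identifies the caller but does not prove who the caller is.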
---++++ Kerberos Authentication
Falcon uses HTTP Kerberos SPNEGO to authenticate the user.
---++ Authorization
Falcon also enforces authorization on Entities using ACLs (Access Control Lists). ACLs are useful
for implementing permission requirements and provide a way to set different permissions for
specific users or named groups.
By default, support for authorization is disabled and can be enabled in startup.properties.
---+++ ACLs in Entity
Every entity has an ACL, which must be present if authorization is enabled. Only the owner of an entity,
that is, the user who owns or created it, is allowed to update or delete it.
The ACL sets permissions for a specific user and a named group, for example:
<verbatim>
<ACL owner="test-user" group="test-group" permission="*"/>
</verbatim>
* ACL is the access control list for this entity.
* owner is the owner of this entity.
* group is the group that has read access.
* permission indicates the rwx permissions, which are not enforced at this time.
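The ACL element shown above can be read with a few lines of Python. This is a minimal sketch: the function name read_acl is hypothetical, and it assumes a bare ACL element without an XML namespace, whereas an ACL inside a real entity definition would normally carry the namespace of that entity.

```python
import xml.etree.ElementTree as ET


def read_acl(acl_xml):
    """Parse a standalone ACL element into a dict of its attributes."""
    element = ET.fromstring(acl_xml)
    if element.tag != "ACL":
        raise ValueError(f"expected an ACL element, got {element.tag}")
    # owner and group are treated as mandatory in this sketch.
    missing = [name for name in ("owner", "group") if not element.get(name)]
    if missing:
        raise ValueError(f"ACL is missing attributes: {', '.join(missing)}")
    return {
        "owner": element.get("owner"),
        "group": element.get("group"),
        # Returned as-is; rwx permissions are not enforced at this time.
        "permission": element.get("permission"),
    }


acl = read_acl('<ACL owner="test-user" group="test-group" permission="*"/>')
```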
---+++ Super-User
The super-user is the user with the same identity as the Falcon process itself. Loosely, if you
started Falcon, then you are the super-user. The super-user can do anything, in that
permission checks never fail for the super-user. There is no persistent notion of who the
super-user was; when Falcon is started, the process identity determines who the super-user is.
The Falcon super-user does not have to be the super-user of the Falcon host, nor is it
necessary that all clusters have the same super-user. Also, an experimenter running Falcon on a
personal workstation conveniently becomes that installation's super-user without any configuration.
Falcon also allows a super-user group to be configured; users belonging to this
group are treated as super-users.
ACL owner and group must be valid even if the authenticated user is a super-user.
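The super-user rule described above can be sketched in Python as follows. The function is_super_user and its parameters are hypothetical names chosen for the example, and the current OS user stands in for the identity of the server process.

```python
import getpass


def is_super_user(user, user_groups, process_user=None, super_user_group=None):
    """Return True if `user` should be treated as a super-user.

    `process_user` defaults to the OS user running this code, standing in
    for the identity of the Falcon process.
    """
    if process_user is None:
        process_user = getpass.getuser()
    if user == process_user:
        return True
    # Membership in the configured super-user group also grants the role.
    return super_user_group is not None and super_user_group in user_groups
```

For example, is_super_user("alice", ["falcon"], process_user="svc", super_user_group="falcon") evaluates to True through group membership, even though alice did not start the process.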
---+++ Group Memberships
Once a user has been authenticated and a username has been determined, the list of groups is
determined by a group mapping service, configured by the hadoop.security.group.mapping property
in Hadoop. The default implementation, org.apache.hadoop.security.ShellBasedUnixGroupsMapping,
will shell out to the Unix bash -c groups command to resolve a list of groups for a user.
Note that Falcon stores the user and group of an Entity as strings; there is no
conversion from user and group identity numbers as is conventional in Unix.
The only limitation is that a user cannot add a group to an ACL unless they belong to that group.
---+++ Authorization Provider
Falcon provides a plugin-able provider interface for Authorization. It also ships with a default
implementation that enforces the following authorization policy.
---++++ Entity and Instance Management Operations Policy
* All Entity and Instance operations are authorized for the users who created them, for owners, and for users with group membership
* References to entities within a feed or process are allowed without enforcing permissions
Any Feed or Process can refer to a Cluster entity not owned by the Feed or Process owner. Any Process can refer to a Feed entity not owned by the Process owner.
The authorization is enforced in the following way:
* If admin resource,
   * If the authenticated user name matches the admin users configuration
   * Else if the groups of the authenticated user match the admin groups configuration
   * Else an authorization exception is thrown
* Else if entities or instance resource
   * If the authenticated user matches the owner in the ACL for the entity
   * Else if the groups of the authenticated user match the group in the ACL for the entity
   * Else an authorization exception is thrown
* Else if lineage resource
   * All users have read-only permissions, so that anyone can examine the dependencies and reuse entities
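The decision flow listed above can be written down as a short Python sketch. It is not the default provider's code: the names authorize and AuthorizationError, the resource labels, and the rejection of unknown resource types are assumptions made for the example. The sketch does not model the super-user or per-endpoint exceptions.

```python
class AuthorizationError(Exception):
    """Raised when the authenticated user may not access a resource."""


def authorize(resource, user, user_groups, admin_users=(), admin_groups=(), acl=None):
    """Return normally if access is allowed, otherwise raise AuthorizationError.

    `acl` is a dict with "owner" and "group" keys for the entity concerned.
    """
    groups = set(user_groups)
    if resource == "admin":
        if user in admin_users or groups & set(admin_groups):
            return
        raise AuthorizationError(f"{user} is not an admin user or in an admin group")
    if resource in ("entities", "instance"):
        if acl is None:
            raise AuthorizationError("entity has no ACL")
        if user == acl["owner"] or acl["group"] in groups:
            return
        raise AuthorizationError(f"{user} is neither the owner nor in the ACL group")
    if resource == "lineage":
        return  # lineage is read-only and open to all users
    # Assumption: anything else is rejected rather than silently allowed.
    raise AuthorizationError(f"unknown resource type: {resource}")
```

A call such as authorize("entities", "bob", ["test-group"], acl={"owner": "test-user", "group": "test-group"}) passes because bob belongs to the ACL group, while the same call with groups ["dev"] raises AuthorizationError.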
To authenticate a user for REST API calls, append "user.name=<username>" to the query string.
*operations on Entity Resource*
| *Resource* | *Description* | *Authorization* |
| [[restapi/EntityValidate][api/entities/validate/:entity-type]] | Validate the entity | Owner/Group |
| [[restapi/EntitySubmit][api/entities/submit/:entity-type]] | Submit the entity | Owner/Group |
| [[restapi/EntityUpdate][api/entities/update/:entity-type/:entity-name]] | Update the entity | Owner/Group |
| [[restapi/EntitySubmitAndSchedule][api/entities/submitAndSchedule/:entity-type]] | Submit & Schedule the entity | Owner/Group |
| [[restapi/EntitySchedule][api/entities/schedule/:entity-type/:entity-name]] | Schedule the entity | Owner/Group |
| [[restapi/EntitySuspend][api/entities/suspend/:entity-type/:entity-name]] | Suspend the entity | Owner/Group |
| [[restapi/EntityResume][api/entities/resume/:entity-type/:entity-name]] | Resume the entity | Owner/Group |
| [[restapi/EntityDelete][api/entities/delete/:entity-type/:entity-name]] | Delete the entity | Owner/Group |
| [[restapi/EntityStatus][api/entities/status/:entity-type/:entity-name]] | Get the status of the entity | Owner/Group |
| [[restapi/EntityDefinition][api/entities/definition/:entity-type/:entity-name]] | Get the definition of the entity | Owner/Group |
| [[restapi/EntityList][api/entities/list/:entity-type?fields=:fields]] | Get the list of entities | Owner/Group |
| [[restapi/EntityDependencies][api/entities/dependencies/:entity-type/:entity-name]] | Get the dependencies of the entity | Owner/Group |
*REST Call on Feed and Process Instances*
| *Resource* | *Description* | *Authorization* |
| [[restapi/InstanceRunning][api/instance/running/:entity-type/:entity-name]] | List of running instances. | Owner/Group |
| [[restapi/InstanceStatus][api/instance/status/:entity-type/:entity-name]] | Status of a given instance | Owner/Group |
| [[restapi/InstanceKill][api/instance/kill/:entity-type/:entity-name]] | Kill a given instance | Owner/Group |
| [[restapi/InstanceSuspend][api/instance/suspend/:entity-type/:entity-name]] | Suspend a running instance | Owner/Group |
| [[restapi/InstanceResume][api/instance/resume/:entity-type/:entity-name]] | Resume a given instance | Owner/Group |
| [[restapi/InstanceRerun][api/instance/rerun/:entity-type/:entity-name]] | Rerun a given instance | Owner/Group |
| [[restapi/InstanceLogs][api/instance/logs/:entity-type/:entity-name]] | Get logs of a given instance | Owner/Group |
---++++ Admin Resources Policy
Only admin users and members of admin groups have access to these resources, unless the table below
states otherwise. Admin membership is determined by a static configuration parameter.
| *Resource* | *Description* | *Authorization* |
| [[restapi/AdminVersion][api/admin/version]] | Get version of the server | No restriction |
| [[restapi/AdminStack][api/admin/stack]] | Get stack of the server | Admin User/Group |
| [[restapi/AdminConfig][api/admin/config/:config-type]] | Get configuration information of the server | Admin User/Group |
---++++ Lineage Resource Policy
Lineage is read-only and hence all users can look at lineage for their respective entities.
*Note:* This gap will be fixed in a later release.
---++ Authentication Configuration
The following is the server-side configuration for authentication.
---+++ Common Configuration Parameters
<verbatim>
# Authentication type must be specified: simple|kerberos
*.falcon.authentication.type=kerberos
</verbatim>
---+++ Kerberos Configuration
<verbatim>
##### Service Configuration
# Indicates the Kerberos principal to be used in Falcon Service.
*.falcon.service.authentication.kerberos.principal=falcon/_HOST@EXAMPLE.COM
# Location of the keytab file with the credentials for the Service principal.
*.falcon.service.authentication.kerberos.keytab=/etc/security/keytabs/falcon.service.keytab
# name node principal to talk to config store
*.dfs.namenode.kerberos.principal=nn/_HOST@EXAMPLE.COM
# Indicates how long (in seconds) falcon authentication token is valid before it has to be renewed.
*.falcon.service.authentication.token.validity=86400
##### SPNEGO Configuration
# Authentication type must be specified: simple|kerberos|<class>
# org.apache.falcon.security.RemoteUserInHeaderBasedAuthenticationHandler can be used for backwards compatibility
*.falcon.http.authentication.type=kerberos
# Indicates how long (in seconds) an authentication token is valid before it has to be renewed.
*.falcon.http.authentication.token.validity=36000
# The signature secret for signing the authentication tokens.
*.falcon.http.authentication.signature.secret=falcon
# The domain to use for the HTTP cookie that stores the authentication token.
*.falcon.http.authentication.cookie.domain=
# Indicates if anonymous requests are allowed when using 'simple' authentication.
*.falcon.http.authentication.simple.anonymous.allowed=true
# Indicates the Kerberos principal to be used for HTTP endpoint.
# The principal MUST start with 'HTTP/' as per Kerberos HTTP SPNEGO specification.
*.falcon.http.authentication.kerberos.principal=HTTP/_HOST@EXAMPLE.COM
# Location of the keytab file with the credentials for the HTTP principal.
*.falcon.http.authentication.kerberos.keytab=/etc/security/keytabs/spnego.service.keytab
# The kerberos names rules is to resolve kerberos principal names, refer to Hadoop's KerberosName for more details.
*.falcon.http.authentication.kerberos.name.rules=DEFAULT
# Comma separated list of black listed users
*.falcon.http.authentication.blacklisted.users=
# Increase Jetty request buffer size to accommodate the generated Kerberos token
*.falcon.jetty.request.buffer.size=16192
</verbatim>
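A few of the constraints stated in the comments above can be checked mechanically before starting the server. The following Python sketch is one way to do that. The function names are hypothetical, the parser ignores Java properties features such as line continuations, and stripping the leading "*." from keys is an assumption made to keep the lookup simple.

```python
def parse_properties(text):
    """Parse simple key=value lines, skipping blank lines and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        key = key.strip()
        if key.startswith("*."):
            key = key[2:]  # assumption: drop the wildcard prefix for lookup
        props[key] = value.strip()
    return props


def check_kerberos_settings(props):
    """Return a list of human-readable problems found in the settings."""
    problems = []
    if props.get("falcon.authentication.type") not in ("simple", "kerberos"):
        problems.append("falcon.authentication.type must be simple or kerberos")
    if props.get("falcon.http.authentication.type") == "kerberos":
        principal = props.get("falcon.http.authentication.kerberos.principal", "")
        if not principal.startswith("HTTP/"):
            problems.append("HTTP principal must start with 'HTTP/'")
        if not props.get("falcon.http.authentication.kerberos.keytab"):
            problems.append("HTTP keytab location is not set")
    return problems
```

Running check_kerberos_settings(parse_properties(text)) on the contents of a properties file returns an empty list when none of these checks fail.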
---+++ Pseudo/Simple Configuration
<verbatim>
##### SPNEGO Configuration
# Authentication type must be specified: simple|kerberos|<class>
# org.apache.falcon.security.RemoteUserInHeaderBasedAuthenticationHandler can be used for backwards compatibility
*.falcon.http.authentication.type=simple
# Indicates how long (in seconds) an authentication token is valid before it has to be renewed.
*.falcon.http.authentication.token.validity=36000
# The signature secret for signing the authentication tokens.
*.falcon.http.authentication.signature.secret=falcon
# The domain to use for the HTTP cookie that stores the authentication token.
*.falcon.http.authentication.cookie.domain=
# Indicates if anonymous requests are allowed when using 'simple' authentication.
*.falcon.http.authentication.simple.anonymous.allowed=true
# Comma separated list of black listed users
*.falcon.http.authentication.blacklisted.users=
</verbatim>
---++ Authorization Configuration
---+++ Enabling Authorization
By default, support for authorization is disabled and specifying ACLs in entities is optional.
To enable support for authorization, set falcon.security.authorization.enabled to true in the
startup configuration.
<verbatim>
# Authorization Enabled flag: false|true
*.falcon.security.authorization.enabled=true
</verbatim>
---+++ Authorization Provider
Falcon ships with a basic bundled implementation for authorization, org.apache.falcon.security.DefaultAuthorizationProvider.
This can be overridden by a custom implementation in the startup configuration.
<verbatim>
# Authorization Provider Fully Qualified Class Name
*.falcon.security.authorization.provider=org.apache.falcon.security.DefaultAuthorizationProvider
</verbatim>
---+++ Super User Group
Super user group is determined by the configuration:
<verbatim>
# The name of the group of super-users
*.falcon.security.authorization.superusergroup=falcon
</verbatim>
---+++ Admin Membership
Administrative users are determined by the configuration:
<verbatim>
# Admin Users, comma separated users
*.falcon.security.authorization.admin.users=falcon,ambari-qa,seetharam
</verbatim>
Administrative groups are determined by the configuration:
<verbatim>
# Admin Group Membership, comma separated groups
*.falcon.security.authorization.admin.groups=falcon,testgroup,staff
</verbatim>
---++ Cross-Site Request Forgery (CSRF) Prevention
Cross-Site Request Forgery (CSRF) is an attack that forces an end user to execute unwanted actions on a web application in which they're currently authenticated.
Falcon provides an option to prevent CSRF on its REST APIs using the Hadoop CSRF filter. By default, the Falcon CSRF filter is disabled.
To enable support for CSRF prevention, set falcon.security.csrf.enabled to true in the startup configuration.
The custom header and the browser user agents can also be configured.
<verbatim>
# CSRF filter enabled flag: false (default) | true
*.falcon.security.csrf.enabled=true
# Custom header for CSRF filter
*.falcon.security.csrf.header=FALCON-CSRF-FILTER
# Browser user agents to be filtered
*.falcon.security.csrf.browser=^Mozilla.*,^Opera.*
</verbatim>
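The way a header-based CSRF filter uses these three settings can be sketched in Python. This is a simplified model for illustration, not the Hadoop filter's code. The function name is_request_allowed is hypothetical, and the list of HTTP methods exempt from the check is an assumption that does not appear in the configuration above.

```python
import re


def is_request_allowed(method, headers,
                       custom_header="FALCON-CSRF-FILTER",
                       browser_patterns=("^Mozilla.*", "^Opera.*"),
                       methods_to_ignore=("GET", "OPTIONS", "HEAD", "TRACE")):
    """Return True if the request passes the CSRF check.

    `methods_to_ignore` is an assumed default for read-only methods.
    """
    # HTTP header names are case-insensitive, so normalize them first.
    lowered = {name.lower(): value for name, value in headers.items()}
    user_agent = lowered.get("user-agent", "")
    from_browser = any(re.match(pattern, user_agent) for pattern in browser_patterns)
    if not from_browser:
        return True  # only browser user agents are filtered
    if method.upper() in methods_to_ignore:
        return True
    # A cross-site form post cannot set a custom header, so its presence
    # indicates the request was made deliberately by a script or client.
    return custom_header.lower() in lowered
```

Under this model, a POST from a browser user agent is rejected unless it carries the FALCON-CSRF-FILTER header, while non-browser clients such as curl are unaffected.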
---++ SSL
Falcon provides transport level security ensuring data confidentiality and integrity. This is
enabled by default for communicating over HTTP between the client and the server.
---+++ SSL Configuration
<verbatim>
*.falcon.enableTLS=true
*.keystore.file=/path/to/keystore/file
*.keystore.password=password
</verbatim>
---+++ Distributed Falcon Setup
Falcon should be configured to communicate with Prism over TLS in secure mode. It is not enabled by default.
---++ Changes to ownership and permissions of directories managed by Falcon
| *Directory* | *Location* | *Owner* | *Permissions* |
| Configuration Store | ${config.store.uri} | falcon | 700 |
| Cluster Staging Location | ${cluster.staging-location} | falcon | 777 |
| Cluster Working Location | ${cluster.working-location} | falcon | 755 |
| Shared libs | {cluster.working}/{lib,libext} | falcon | 755 |
| Oozie coord/bundle XMLs | ${cluster.staging-location}/workflows/{entity}/{entity-name} | $user | cluster umask |
| App logs | ${cluster.staging-location}/workflows/{entity}/{entity-name}/logs | $user | cluster umask |
*Note:* The cluster staging and working locations MUST be created prior to
submitting a cluster entity to Falcon. Also note that the parent directories must have execute
permissions.
---++ Backwards compatibility
---+++ Scheduled Entities
Entities already scheduled with an earlier version of Falcon are not compatible with this version.
---+++ Falcon Clients
Older Falcon clients are backwards compatible with respect to authentication. User information sent as part of the HTTP
header Remote-User is still honoured when the authentication type is configured as below:
<verbatim>
*.falcon.http.authentication.type=org.apache.falcon.security.RemoteUserInHeaderBasedAuthenticationHandler
</verbatim>
---+++ Blacklisted super users for authentication
The list of blacklisted users used to contain the following super users: hdfs, mapreduce, oozie, and falcon.
The list has been externalized from code into the startup.properties file. It is now empty by default and must be
configured explicitly in that file.
---+++ Falcon Dashboard
To initialize the current user for the dashboard, append the query parameter "user.name=<username>" to the REST API call.
To change the current user, a dashboard user should do the following:
* Delete the hadoop.auth cookie from the browser cache.
* Append the query parameter "user.name=<new_user>" to the next REST API call.
With the Kerberos method, the browser must support HTTP Kerberos SPNEGO.
---++ Known Limitations
* ActiveMQ topics are not secure but will be in the near future
* Entities already scheduled with an earlier version of Falcon are not compatible with this version, because new
workflow parameters that are passed back into Falcon, such as the user, are now required
* Use of hftp as the scheme for the read-only interface in a cluster entity [[https://issues.apache.org/jira/browse/HADOOP-10215][will not work in Oozie]].
The alternative is to use the webhdfs scheme instead, which has been tested with DistCp.
---++ Examples
---+++ Accessing the server using Falcon CLI (Java client)
There is no change in the way the CLI is used. The CLI has been changed to work with the configured authentication
method.
---+++ Accessing the server using curl
Try accessing protected resources using curl, for example:
<verbatim>
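# Obtain a Kerberos ticket (needed for the SPNEGO request below)
$ kinit
Please enter the password for venkatesh@LOCALHOST:
# Request without any user information
$ curl http://localhost:15000/api/admin/version
# Pseudo/simple authentication
$ curl "http://localhost:15000/api/admin/version?user.name=venkatesh"
# Kerberos SPNEGO authentication, keeping the authentication cookie in a cookie jar
$ curl --negotiate -u foo -b ~/cookiejar.txt -c ~/cookiejar.txt http://localhost:15000/api/admin/version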
</verbatim>