| ---+ Securing Falcon |
| |
| ---++ Overview |
| |
| Apache Falcon provides the following security features: |
| * Credential provider alias for passwords used in Falcon server. |
| * Authentication to identify proper users. |
| * Authorization to specify resource access permission for users or groups. |
| * Cross-Site Request Forgery (CSRF) prevention. |
| * SSL to provide transport level security for data confidentiality and integrity. |
| |
| ---++ Credential Provider Alias for Passwords |
| Server-side configuration properties (i.e. startup.properties) contain passwords and other sensitive information. |
| In addition to specifying properties in plain text, we provide the user an option to use credential provider alias in the property file. |
| |
| Take SMTP password for example. The user can store the password in a |
| [[http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/CommandsManual.html#credential][Hadoop credential provider]] |
| with the alias name _SMTPPasswordAlias_. In startup.properties where SMTP password is needed, the user can refer to its |
| alias name _SMTPPasswordAlias_ instead of providing the real password. |
| |
| The alias property to be resolved through Hadoop credential provider should have the format: |
| _credential.provider.alias.for.[property-key]_. For example, |
| _credential.provider.alias.for.falcon.email.smtp.password=SMTPPasswordAlias_ for SMTP password. |
| Falcon server, during the start, will automatically retrieve the real password provided the alias name. |
| |
| The user can specify the provider path with the property key _credential.provider.path_, |
| e.g. _credential.provider.path=jceks://file/tmp/test.jceks_. |
| If not specified, Falcon will use the default Hadoop credential provider path in core-site.xml. |
| |
| |
| ---++ Authentication (User Identity) |
| |
| Apache Falcon enforces authentication on protected resources. Once authentication has been established it sets a |
| signed HTTP Cookie that contains an authentication token with the user name, user principal, |
| authentication type and expiration time. |
| |
| It does so by using [[http://hadoop.apache .org/docs/current/hadoop-auth/index.html][Hadoop Auth]]. |
| Hadoop Auth is a Java library consisting of a client and a server components to enable Kerberos SPNEGO authentication |
| for HTTP. Hadoop Auth also supports additional authentication mechanisms on the client and the server side via 2 |
| simple interfaces. |
| |
| |
| ---+++ Authentication Methods |
| |
| It supports 2 authentication methods, simple and kerberos out of the box. |
| |
| ---++++ Pseudo/Simple Authentication |
| |
| Falcon authenticates the user by simply trusting the value of the query string parameter 'user.name'. This is the |
| default mode Falcon is configured with. |
| |
| ---++++ Kerberos Authentication |
| |
| Falcon uses HTTP Kerberos SPNEGO to authenticate the user. |
| |
| |
| ---++ Authorization |
| |
| Falcon also enforces authorization on Entities using ACLs (Access Control Lists). ACLs are useful |
| for implementing permission requirements and provide a way to set different permissions for |
| specific users or named groups. |
| |
| By default, support for authorization is disabled and can be enabled in startup.properties. |
| |
| ---+++ ACLs in Entity |
| |
| All Entities now have ACL which needs to be present if authorization is enabled. Only owners who |
| own or created the entity will be allowed to update or delete their entities. |
| |
| An entity has ACLs (Access Control Lists) that are useful for implementing permission requirements |
| and provide a way to set different permissions for specific users or named groups. |
| <verbatim> |
| <ACL owner="test-user" group="test-group" permission="*"/> |
| </verbatim> |
| ACL indicates the Access control list for this cluster. |
| owner is the Owner of this entity. |
| group is the one which has access to read. |
| permission indicates the rwx is not enforced at this time. |
| |
| ---+++ Super-User |
| |
| The super-user is the user with the same identity as falcon process itself. Loosely, if you |
| started the falcon, then you are the super-user. The super-user can do anything in that |
| permissions checks never fail for the super-user. There is no persistent notion of who was the |
| super-user; when the falcon is started the process identity determines who is the super-user |
| for now. The Falcon super-user does not have to be the super-user of the falcon host, nor is it |
| necessary that all clusters have the same super-user. Also, an experimenter running Falcon on a |
| personal workstation, conveniently becomes that installation's super-user without any configuration. |
| |
| Falcon also allows users to configure a super user group and allows users belonging to this |
| group to be a super user. |
| |
| ACL owner and group must be valid even if the authenticated user is a super-user. |
| |
| ---+++ Group Memberships |
| |
| Once a user has been authenticated and a username has been determined, the list of groups is |
| determined by a group mapping service, configured by the hadoop.security.group.mapping property |
| in Hadoop. The default implementation, org.apache.hadoop.security.ShellBasedUnixGroupsMapping, |
| will shell out to the Unix bash -c groups command to resolve a list of groups for a user. |
| |
| Note that Falcon stores the user and group of an Entity as strings; there is no |
| conversion from user and group identity numbers as is conventional in Unix. |
| |
| The only limitation is that a user cannot add a group in ACL that he does not belong to. |
| |
| ---+++ Authorization Provider |
| |
| Falcon provides a plugin-able provider interface for Authorization. It also ships with a default |
| implementation that enforces the following authorization policy. |
| |
| ---++++ Entity and Instance Management Operations Policy |
| |
| * All Entity and Instance operations are authorized for users who created them, Owners and users with group memberships |
| * Reference to entities with in a feed or process is allowed with out enforcing permissions |
| |
| Any Feed or Process can refer to a Cluster entity not owned by the Feed or Process owner. Any Process can refer to a Feed entity not owned by the Process owner |
| |
| The authorization is enforced in the following way: |
| |
| * if admin resource, |
| * If authenticated user name matches the admin users configuration |
| * Else if groups of the authenticated user matches the admin groups configuration |
| * Else authorization exception is thrown |
| * Else if entities or instance resource |
| * If the authenticated user matches the owner in ACL for the entity |
| * Else if the groups of the authenticated user matches the group in ACL for the entity |
| * Else authorization exception is thrown |
| * Else if lineage resource |
| * All have read-only permissions, reason being folks should be able to examine the dependency and allow reuse |
| |
| To authenticate user for REST api calls, user should append "user.name=<username>" to the query. |
| |
| *operations on Entity Resource* |
| |
| | *Resource* | *Description* | *Authorization* | |
| | [[restapi/EntityValidate][api/entities/validate/:entity-type]] | Validate the entity | Owner/Group | |
| | [[restapi/EntitySubmit][api/entities/submit/:entity-type]] | Submit the entity | Owner/Group | |
| | [[restapi/EntityUpdate][api/entities/update/:entity-type/:entity-name]] | Update the entity | Owner/Group | |
| | [[restapi/EntitySubmitAndSchedule][api/entities/submitAndSchedule/:entity-type]] | Submit & Schedule the entity | Owner/Group | |
| | [[restapi/EntitySchedule][api/entities/schedule/:entity-type/:entity-name]] | Schedule the entity | Owner/Group | |
| | [[restapi/EntitySuspend][api/entities/suspend/:entity-type/:entity-name]] | Suspend the entity | Owner/Group | |
| | [[restapi/EntityResume][api/entities/resume/:entity-type/:entity-name]] | Resume the entity | Owner/Group | |
| | [[restapi/EntityDelete][api/entities/delete/:entity-type/:entity-name]] | Delete the entity | Owner/Group | |
| | [[restapi/EntityStatus][api/entities/status/:entity-type/:entity-name]] | Get the status of the entity | Owner/Group | |
| | [[restapi/EntityDefinition][api/entities/definition/:entity-type/:entity-name]] | Get the definition of the entity | Owner/Group | |
| | [[restapi/EntityList][api/entities/list/:entity-type?fields=:fields]] | Get the list of entities | Owner/Group | |
| | [[restapi/EntityDependencies][api/entities/dependencies/:entity-type/:entity-name]] | Get the dependencies of the entity | Owner/Group | |
| |
| *REST Call on Feed and Process Instances* |
| |
| | *Resource* | *Description* | *Authorization* | |
| | [[restapi/InstanceRunning][api/instance/running/:entity-type/:entity-name]] | List of running instances. | Owner/Group | |
| | [[restapi/InstanceStatus][api/instance/status/:entity-type/:entity-name]] | Status of a given instance | Owner/Group | |
| | [[restapi/InstanceKill][api/instance/kill/:entity-type/:entity-name]] | Kill a given instance | Owner/Group | |
| | [[restapi/InstanceSuspend][api/instance/suspend/:entity-type/:entity-name]] | Suspend a running instance | Owner/Group | |
| | [[restapi/InstanceResume][api/instance/resume/:entity-type/:entity-name]] | Resume a given instance | Owner/Group | |
| | [[restapi/InstanceRerun][api/instance/rerun/:entity-type/:entity-name]] | Rerun a given instance | Owner/Group | |
| | [[InstanceLogs][api/instance/logs/:entity-type/:entity-name]] | Get logs of a given instance | Owner/Group | |
| |
| ---++++ Admin Resources Policy |
| |
| Only users belonging to admin users or groups have access to this resource. Admin membership is |
| determined by a static configuration parameter. |
| |
| | *Resource* | *Description* | *Authorization* | |
| | [[restapi/AdminVersion][api/admin/version]] | Get version of the server | No restriction | |
| | [[restapi/AdminStack][api/admin/stack]] | Get stack of the server | Admin User/Group | |
| | [[restapi/AdminConfig][api/admin/config/:config-type]] | Get configuration information of the server | Admin User/Group | |
| |
| |
| ---++++ Lineage Resource Policy |
| |
| Lineage is read-only and hence all users can look at lineage for their respective entities. |
| *Note:* This gap will be fixed in a later release. |
| |
| |
| ---++ Authentication Configuration |
| |
| Following is the Server Side Configuration Setup for Authentication. |
| |
| ---+++ Common Configuration Parameters |
| |
| <verbatim> |
| # Authentication type must be specified: simple|kerberos |
| *.falcon.authentication.type=kerberos |
| </verbatim> |
| |
| ---+++ Kerberos Configuration |
| |
| <verbatim> |
| ##### Service Configuration |
| |
| # Indicates the Kerberos principal to be used in Falcon Service. |
| *.falcon.service.authentication.kerberos.principal=falcon/_HOST@EXAMPLE.COM |
| |
| # Location of the keytab file with the credentials for the Service principal. |
| *.falcon.service.authentication.kerberos.keytab=/etc/security/keytabs/falcon.service.keytab |
| |
| # name node principal to talk to config store |
| *.dfs.namenode.kerberos.principal=nn/_HOST@EXAMPLE.COM |
| |
| # Indicates how long (in seconds) falcon authentication token is valid before it has to be renewed. |
| *.falcon.service.authentication.token.validity=86400 |
| |
| ##### SPNEGO Configuration |
| |
| # Authentication type must be specified: simple|kerberos|<class> |
| # org.apache.falcon.security.RemoteUserInHeaderBasedAuthenticationHandler can be used for backwards compatibility |
| *.falcon.http.authentication.type=kerberos |
| |
| # Indicates how long (in seconds) an authentication token is valid before it has to be renewed. |
| *.falcon.http.authentication.token.validity=36000 |
| |
| # The signature secret for signing the authentication tokens. |
| *.falcon.http.authentication.signature.secret=falcon |
| |
| # The domain to use for the HTTP cookie that stores the authentication token. |
| *.falcon.http.authentication.cookie.domain= |
| |
| # Indicates if anonymous requests are allowed when using 'simple' authentication. |
| *.falcon.http.authentication.simple.anonymous.allowed=true |
| |
| # Indicates the Kerberos principal to be used for HTTP endpoint. |
| # The principal MUST start with 'HTTP/' as per Kerberos HTTP SPNEGO specification. |
| *.falcon.http.authentication.kerberos.principal=HTTP/_HOST@EXAMPLE.COM |
| |
| # Location of the keytab file with the credentials for the HTTP principal. |
| *.falcon.http.authentication.kerberos.keytab=/etc/security/keytabs/spnego.service.keytab |
| |
| # The kerberos names rules is to resolve kerberos principal names, refer to Hadoop's KerberosName for more details. |
| *.falcon.http.authentication.kerberos.name.rules=DEFAULT |
| |
| # Comma separated list of black listed users |
| *.falcon.http.authentication.blacklisted.users= |
| |
| # Increase Jetty request buffer size to accommodate the generated Kerberos token |
| *.falcon.jetty.request.buffer.size=16192 |
| </verbatim> |
| |
| ---+++ Pseudo/Simple Configuration |
| |
| <verbatim> |
| ##### SPNEGO Configuration |
| |
| # Authentication type must be specified: simple|kerberos|<class> |
| # org.apache.falcon.security.RemoteUserInHeaderBasedAuthenticationHandler can be used for backwards compatibility |
| *.falcon.http.authentication.type=simple |
| |
| # Indicates how long (in seconds) an authentication token is valid before it has to be renewed. |
| *.falcon.http.authentication.token.validity=36000 |
| |
| # The signature secret for signing the authentication tokens. |
| *.falcon.http.authentication.signature.secret=falcon |
| |
| # The domain to use for the HTTP cookie that stores the authentication token. |
| *.falcon.http.authentication.cookie.domain= |
| |
| # Indicates if anonymous requests are allowed when using 'simple' authentication. |
| *.falcon.http.authentication.simple.anonymous.allowed=true |
| |
| # Comma separated list of black listed users |
| *.falcon.http.authentication.blacklisted.users= |
| </verbatim> |
| |
| ---++ Authorization Configuration |
| |
| ---+++ Enabling Authorization |
| By default, support for authorization is disabled and specifying ACLs in entities are optional. |
| To enable support for authorization, set falcon.security.authorization.enabled to true in the |
| startup configuration. |
| |
| <verbatim> |
| # Authorization Enabled flag: false|true |
| *.falcon.security.authorization.enabled=true |
| </verbatim> |
| |
| ---+++ Authorization Provider |
| |
| Falcon provides a basic implementation for Authorization bundled, org.apache.falcon.security .DefaultFalconAuthorizationProvider. |
| This can be overridden by custom implementations in the startup configuration. |
| |
| <verbatim> |
| # Authorization Provider Fully Qualified Class Name |
| *.falcon.security.authorization.provider=org.apache.falcon.security.DefaultAuthorizationProvider |
| </verbatim> |
| |
| ---+++ Super User Group |
| |
| Super user group is determined by the configuration: |
| |
| <verbatim> |
| # The name of the group of super-users |
| *.falcon.security.authorization.superusergroup=falcon |
| </verbatim> |
| |
| ---+++ Admin Membership |
| |
| Administrative users are determined by the configuration: |
| |
| <verbatim> |
| # Admin Users, comma separated users |
| *.falcon.security.authorization.admin.users=falcon,ambari-qa,seetharam |
| </verbatim> |
| |
| Administrative groups are determined by the configuration: |
| |
| <verbatim> |
| # Admin Group Membership, comma separated users |
| *.falcon.security.authorization.admin.groups=falcon,testgroup,staff |
| </verbatim> |
| |
| ---++ Cross-Site Request Forgery (CSRF) Prevention |
| |
| Cross-Site Request Forgery (CSRF) is an attack that forces an end user to execute unwanted actions on a web application in which they're currently authenticated. |
| Falcon provides an option to prevent CSRF with Hadoop CSRF filter for REST APIs. By default, Falcon CSRF filter is disabled. |
| To enable the support for CSRF prevention, set falcon.security.csrf.enabled to true in the startup configuration. |
| We also provide options to configure custom header and browser user agents. |
| |
| <verbatim> |
| # CSRF filter enabled flag: false (default) | true |
| *.falcon.security.csrf.enabled=true |
| # Custom header for CSRF filter |
| *.falcon.security.csrf.header=FALCON-CSRF-FILTER |
| # Browser user agents to be filtered |
| *.falcon.security.csrf.browser=^Mozilla.*,^Opera.* |
| </verbatim> |
| |
| |
| ---++ SSL |
| |
| Falcon provides transport level security ensuring data confidentiality and integrity. This is |
| enabled by default for communicating over HTTP between the client and the server. |
| |
| ---+++ SSL Configuration |
| |
| <verbatim> |
| *.falcon.enableTLS=true |
| *.keystore.file=/path/to/keystore/file |
| *.keystore.password=password |
| </verbatim> |
| |
| ---+++ Distributed Falcon Setup |
| |
| Falcon should be configured to communicate with Prism over TLS in secure mode. Its not enabled by default. |
| |
| |
| ---++ Changes to ownership and permissions of directories managed by Falcon |
| |
| | *Directory* | *Location* | *Owner* | *Permissions* | |
| | Configuration Store | ${config.store.uri} | falcon | 700 | |
| | Cluster Staging Location | ${cluster.staging-location} | falcon | 777 | |
| | Cluster Working Location | ${cluster.working-location} | falcon | 755 | |
| | Shared libs | {cluster.working}/{lib,libext} | falcon | 755 | |
| | Oozie coord/bundle XMLs | ${cluster.staging-location}/workflows/{entity}/{entity-name} | $user | cluster umask | |
| | App logs | ${cluster.staging-location}/workflows/{entity}/{entity-name}/logs | $user | cluster umask | |
| |
| *Note:* Please note that the cluster staging and working locations MUST be created prior to |
| submitting a cluster entity to Falcon. Also, note that the the parent dirs must have execute |
| permissions. |
| |
| |
| ---++ Backwards compatibility |
| |
| ---+++ Scheduled Entities |
| |
| Entities already scheduled with an earlier version of Falcon are not compatible with this version |
| |
| ---+++ Falcon Clients |
| |
| Older Falcon clients are backwards compatible wrt Authentication and user information sent as part of the HTTP |
| header, Remote-User is still honoured when the authentication type is configured as below: |
| |
| <verbatim> |
| *.falcon.http.authentication.type=org.apache.falcon.security.RemoteUserInHeaderBasedAuthenticationHandler |
| </verbatim> |
| |
| ---+++ Blacklisted super users for authentication |
| |
| The blacklist users used to have the following super users: hdfs, mapreduce, oozie, and falcon. |
| The list is externalized from code into Startup.properties file and is empty now and needs to be |
| configured specifically in the file. |
| |
| |
| ---+++ Falcon Dashboard |
| |
| To initialize the current user for dashboard, user should append query param "user.name=<username>" to the REST api call. |
| |
| If dashboard user wishes to change the current user, they should do the following. |
| * delete the hadoop.auth cookie from browser cache. |
| * append query param "user.name=<new_user>" to the next REST API call. |
| |
| In Kerberos method, the browser must support HTTP Kerberos SPNEGO. |
| |
| |
| ---++ Known Limitations |
| |
| * ActiveMQ topics are not secure but will be in the near future |
| * Entities already scheduled with an earlier version of Falcon are not compatible with this version as new |
| workflow parameters are being passed back into Falcon such as the user are required |
| * Use of hftp as the scheme for read only interface in cluster entity [[https://issues.apache.org/jira/browse/HADOOP-10215][will not work in Oozie]] |
| The alternative is to use webhdfs scheme instead and its been tested with DistCp. |
| |
| |
| ---++ Examples |
| |
| ---+++ Accessing the server using Falcon CLI (Java client) |
| |
| There is no change in the way the CLI is used. The CLI has been changed to work with the configured authentication |
| method. |
| |
| ---+++ Accessing the server using curl |
| |
| Try accessing protected resources using curl. The protected resources are: |
| |
| <verbatim> |
| $ kinit |
| Please enter the password for venkatesh@LOCALHOST: |
| |
| $ curl http://localhost:15000/api/admin/version |
| |
| $ curl http://localhost:15000/api/admin/version?user.name=venkatesh |
| |
| $ curl --negotiate -u foo -b ~/cookiejar.txt -c ~/cookiejar.txt curl http://localhost:15000/api/admin/version |
| </verbatim> |