| <noautolink> |
| |
| [[index][::Go back to Oozie Documentation Index::]] |
| |
| ---+!! Oozie Installation and Configuration |
| |
| %TOC% |
| |
| ---++ Basic Setup |
| |
| Follow the instructions at [[DG_QuickStart][Oozie Quick Start]]. |
| |
| ---++ Environment Setup |
| |
| *IMPORTANT:* Oozie ignores any set value for =OOZIE_HOME=, Oozie computes its home automatically. |
| |
| When running Oozie with its embedded Tomcat server, the =conf/oozie-env.sh= file can be |
| used to configure the following environment variables used by Oozie: |
| |
| *CATALINA_OPTS* : settings for the Embedded Tomcat that runs Oozie Java System properties |
| for Oozie should be specified in this variable. No default value. |
| |
| *OOZIE_CONFIG_FILE* : Oozie configuration file to load from Oozie configuration directory. |
| Default value =oozie-site.xml=. |
| |
| *OOZIE_LOGS* : Oozie logs directory. Default value =logs/= directory in the Oozie installation |
| directory. |
| |
| *OOZIE_LOG4J_FILE* : Oozie Log4J configuration file to load from Oozie configuration directory. |
| Default value =oozie-log4j.properties=. |
| |
| *OOZIE_LOG4J_RELOAD* : Reload interval of the Log4J configuration file, in seconds. |
| Default value =10= |
| |
| *OOZIE_HTTP_PORT* : The port Oozie server runs. Default value =11000=. |
| |
| *OOZIE_ADMIN_PORT* : The admin port Oozie server runs. Default value =11001=. |
| |
| *OOZIE_HTTP_HOSTNAME* : The host name Oozie server runs on. Default value is the output of the |
| command =hostname -f=. |
| |
| *OOZIE_BASE_URL* : The base URL for actions callback URLs to Oozie. The default value |
| is =http://${OOZIE_HTTP_HOSTNAME}:${OOZIE_HTTP_PORT}/oozie=. |
| |
| *OOZIE_CHECK_OWNER* : If set to =true=, Oozie setup/start/run/stop scripts will check that the |
| owner of the Oozie installation directory matches the user invoking the script. The default |
| value is undefined and interpreted as a =false=. |
| |
| If Oozie is configured to use HTTPS (SSL), then the following environment variables are also used: |
| |
| *OOZIE_HTTPS_PORT* : The port Oozie server runs when using HTTPS. Default value =11443=. |
| |
| *OOZIE_HTTPS_KEYSTORE_FILE* : The location of the keystore file containing the certificate information. |
| Default value =${HOME}/.keystore= (i.e. the home dir of the Oozie user). |
| |
| *OOZIE_HTTPS_KEYSTORE_PASS* : The password of the keystore file. Default value =password=. |
| |
| *OOZIE_INSTANCE_ID* : The instance id of the Oozie server. When using HA, each server instance should have a unique instance id. |
| Default value =${OOZIE_HTTP_HOSTNAME}= |
| |
| ---++ Oozie Server Setup |
| |
| The =oozie-setup.sh= script prepares the embedded Tomcat server to run Oozie. |
| |
| The =oozie-setup.sh= script options are: |
| |
| <verbatim> |
| Usage : oozie-setup.sh <Command and OPTIONS> |
| prepare-war [-d directory] [-secure] (-d identifies an alternative directory for processing jars |
| -secure will configure the war file to use HTTPS (SSL)) |
| sharelib create -fs FS_URI [-locallib SHARED_LIBRARY] [-concurrency CONCURRENCY] |
| (create sharelib for oozie, |
| FS_URI is the fs.default.name |
| for hdfs uri; SHARED_LIBRARY, path to the |
| Oozie sharelib to install, it can be a tarball |
| or an expanded version of it. If omitted, |
| the Oozie sharelib tarball from the Oozie |
| installation directory will be used. |
| CONCURRENCY is a number of threads to be used |
| for copy operations. |
| By default 1 thread will be used) |
| (action fails if sharelib is already installed |
| in HDFS) |
| sharelib upgrade -fs FS_URI [-locallib SHARED_LIBRARY] ([deprecated][use create command to create new version] |
| upgrade existing sharelib, fails if there |
| is no existing sharelib installed in HDFS) |
| db create|upgrade|postupgrade -run [-sqlfile <FILE>] (create, upgrade or postupgrade oozie db with an |
| optional sql File) |
| export <file> exports the oozie database to the specified |
| file in zip format |
| import <file> imports the oozie database from the zip file |
| created by export |
| (without options prints this usage information) |
| </verbatim> |
| |
| If a directory =libext/= is present in Oozie installation directory, the =oozie-setup.sh= script |
| include all JARs in the =libext/= directory in Oozie WAR file. |
| |
| If the ExtJS ZIP file is present in the =libext/= directory, it will be added to Oozie WAR as well. |
| The ExtJS library file name be =ext-2.2.zip=. |
| |
| ---+++ Setting Up Oozie with an Alternate Tomcat |
| |
| Use the =addtowar.sh= script to prepare the Oozie server only if Oozie will run with a different |
| servlet container than the embedded Tomcat provided with the distribution. |
| |
| The =addtowar.sh= script adds Hadoop JARs, JDBC JARs and the ExtJS library to the Oozie WAR file. |
| |
| The =addtowar.sh= script options are: |
| |
| <verbatim> |
| Usage : addtowar <OPTIONS> |
| Options: -inputwar INPUT_OOZIE_WAR |
| -outputwar OUTPUT_OOZIE_WAR |
| [-hadoop HADOOP_VERSION HADOOP_PATH] |
| [-extjs EXTJS_PATH] |
| [-jars JARS_PATH] (multiple JAR path separated by ':') |
| [-secureWeb WEB_XML_PATH] (path to secure web.xml) |
| </verbatim> |
| |
| The original =oozie.war= file is in the Oozie server installation directory. |
| |
| After the Hadoop JARs and the ExtJS library has been added to the =oozie.war= file Oozie is ready to run. |
| |
| Delete any previous deployment of the =oozie.war= from the servlet container (if using Tomcat, delete |
| =oozie.war= and =oozie= directory from Tomcat's =webapps/= directory) |
| |
| Deploy the prepared =oozie.war= file (the one that contains the Hadoop JARs and the ExtJS library) in the |
| servlet container (if using Tomcat, copy the prepared =oozie.war= file to Tomcat's =webapps/= directory). |
| |
| *IMPORTANT:* Only one Oozie instance can be deployed per Tomcat instance. |
| |
| ---++ Database Configuration |
| |
| Oozie works with HSQL, Derby, MySQL, Oracle, PostgreSQL or SQL Server databases. |
| |
| By default, Oozie is configured to use Embedded Derby. |
| |
| Oozie bundles the JDBC drivers for HSQL, Embedded Derby and PostgreSQL. |
| |
| HSQL is normally used for test cases as it is an in-memory database and all data is lost every time Oozie is stopped. |
| |
| If using Derby, MySQL, Oracle, PostgreSQL, or SQL Server, the Oozie database schema must be created using the =ooziedb.sh= command |
| line tool. |
| |
| If using MySQL, Oracle, or SQL Server, the corresponding JDBC driver JAR file must be copied to Oozie's =libext/= directory and |
| it must be added to Oozie WAR file using the =bin/addtowar.sh= or the =oozie-setup.sh= scripts using the =-jars= option. |
| |
| *IMPORTANT:* It is recommended to set the database's timezone to GMT (consult your database's documentation on how to do this). |
| Databases don't handle Daylight Saving Time shifts correctly, and may cause problems if you run any Coordinators with actions |
| scheduled to materialize during the 1 hour period where we "fall back". For Derby, you can add '-Duser.timezone=GMT' |
| to =CATALINA_OPTS= in oozie-env.sh to set this. Alternatively, if using MySQL, you can have Oozie use GMT with MySQL without |
| setting MySQL's timezone to GMT by adding 'useLegacyDatetimeCode=false&serverTimezone=GMT' arguments to the JDBC |
| URL, =oozie.service.JPAService.jdbc.url=. Be advised that changing the timezone on an existing Oozie database while Coordinators |
| are already running may cause Coordinators to shift by the offset of their timezone from GMT once after making this change. |
| |
| The SQL database used by Oozie is configured using the following configuration properties (default values shown): |
| |
| <verbatim> |
| oozie.db.schema.name=oozie |
| oozie.service.JPAService.create.db.schema=false |
| oozie.service.JPAService.validate.db.connection=false |
| oozie.service.JPAService.jdbc.driver=org.apache.derby.jdbc.EmbeddedDriver |
| oozie.service.JPAService.jdbc.url=jdbc:derby:${oozie.data.dir}/${oozie.db.schema.name}-db;create=true |
| oozie.service.JPAService.jdbc.username=sa |
| oozie.service.JPAService.jdbc.password= |
| oozie.service.JPAService.pool.max.active.conn=10 |
| </verbatim> |
| |
| *NOTE:* If the =oozie.db.schema.create= property is set to =true= (default value is =false=) the Oozie tables |
| will be created automatically without having to use the =ooziedb= command line tool. Setting this property to |
| =true= it is recommended only for development. |
| |
| *NOTE:* If the =oozie.db.schema.create= property is set to true, the =oozie.service.JPAService.validate.db.connection= |
| property value is ignored and Oozie handles it as set to =false=. |
| |
| Once =oozie-site.xml= has been configured with the database configuration execute the =ooziedb.sh= command line tool to |
| create the database: |
| |
| <verbatim> |
| $ bin/ooziedb.sh create -sqlfile oozie.sql -run |
| |
| Validate DB Connection. |
| DONE |
| Check DB schema does not exist |
| DONE |
| Check OOZIE_SYS table does not exist |
| DONE |
| Create SQL schema |
| DONE |
| DONE |
| Create OOZIE_SYS table |
| DONE |
| |
| Oozie DB has been created for Oozie version '3.2.0' |
| |
| The SQL commands have been written to: oozie.sql |
| |
| $ |
| </verbatim> |
| |
| NOTE: If using MySQL, Oracle, or SQL Server, copy the corresponding JDBC driver JAR file to the =libext/= directory before running |
| the =ooziedb.sh= command line tool. |
| |
| NOTE: If instead using the '-run' option, the '-sqlfile <FILE>' option is used, then all the |
| database changes will be written to the specified file and the database won't be modified. |
| |
| If using HSQL there is no need to use the =ooziedb= command line tool as HSQL is an in-memory database. Use the |
| following configuration properties in the oozie-site.xml: |
| |
| <verbatim> |
| oozie.db.schema.name=oozie |
| oozie.service.JPAService.create.db.schema=true |
| oozie.service.JPAService.validate.db.connection=false |
| oozie.service.JPAService.jdbc.driver=org.hsqldb.jdbcDriver |
| oozie.service.JPAService.jdbc.url=jdbc:hsqldb:mem:${oozie.db.schema.name} |
| oozie.service.JPAService.jdbc.username=sa |
| oozie.service.JPAService.jdbc.password= |
| oozie.service.JPAService.pool.max.active.conn=10 |
| </verbatim> |
| |
| ---++ Database Migration |
| |
| Oozie provides an easy way to switch between databases without losing any data. Oozie servers should be stopped during the |
| database migraition process. |
| The export of the database can be done using the following command: |
| <verbatim> |
| $ bin/oozie-setup.sh export /tmp/oozie_db.zip |
| 1 rows exported from OOZIE_SYS |
| 50 rows exported from WF_JOBS |
| 340 rows exported from WF_ACTIONS |
| 10 rows exported from COORD_JOBS |
| 70 rows exported from COORD_ACTIONS |
| 0 rows exported from BUNDLE_JOBS |
| 0 rows exported from BUNDLE_ACTIONS |
| 0 rows exported from SLA_REGISTRATION |
| 0 rows exported from SLA_SUMMARY |
| </verbatim> |
| |
| The database configuration is read from =oozie-site.xml=. After updating the configuration to point to the new database, |
| the tables have to be created with ooziedb.sh in the [[AG_Install#Database_Configuration][Database configuration]] |
| section above. |
| Once the tables are created, they can be filled with data using the following command: |
| |
| <verbatim> |
| $ bin/oozie-setup.sh import /tmp/oozie_db.zip |
| Loading to Oozie database version 3 |
| 50 rows imported to WF_JOBS |
| 340 rows imported to WF_ACTIONS |
| 10 rows imported to COORD_JOBS |
| 70 rows imported to COORD_ACTIONS |
| 0 rows imported to BUNDLE_JOBS |
| 0 rows imported to BUNDLE_ACTIONS |
| 0 rows imported to SLA_REGISTRATION |
| 0 rows imported to SLA_SUMMARY |
| </verbatim> |
| |
| NOTE: The database version of the zip must match the version of the Oozie database it's imported to. |
| |
| After starting the Oozie server, the history and the currently running workflows should be available. |
| |
| *IMPORTANT:* The tool was primarily developed to make the migration from embedded databases (e.g. Derby) to standalone databases |
| (e.g. MySQL, Posgresql, Oracle, MS SQL Server), though it will work between any supported databases. |
| It is *not* optimized to handle databases over 1 Gb. If the database size is larger, it should be purged before migration. |
| |
| ---++ Oozie Configuration |
| |
| By default, Oozie configuration is read from Oozie's =conf/= directory |
| |
| The Oozie configuration is distributed in 3 different files: |
| |
| * =oozie-site.xml= : Oozie server configuration |
| * =oozie-log4j.properties= : Oozie logging configuration |
| * =adminusers.txt= : Oozie admin users list |
| |
| ---+++ Oozie Configuration Properties |
| |
| All Oozie configuration properties and their default values are defined in the =oozie-default.xml= file. |
| |
| Oozie resolves configuration property values in the following order: |
| |
| * If a Java System property is defined, it uses its value |
| * Else, if the Oozie configuration file (=oozie-site.xml=) contains the property, it uses its value |
| * Else, it uses the default value documented in the =oozie-default.xml= file |
| |
| *NOTE:* The =oozie-default.xml= file found in Oozie's =conf/= directory is not used by Oozie, it is there |
| for reference purposes only. |
| |
| ---+++ Logging Configuration |
| |
| By default, Oozie log configuration is defined in the =oozie-log4j.properties= configuration file. |
| |
| If the Oozie log configuration file changes, Oozie reloads the new settings automatically. |
| |
| By default, Oozie logs to Oozie's =logs/= directory. |
| |
| Oozie logs in 4 different files: |
| |
| * oozie.log: web services log streaming works from this log |
| * oozie-ops.log: messages for Admin/Operations to monitor |
| * oozie-instrumentation.log: instrumentation data, every 60 seconds (configurable) |
| * oozie-audit.log: audit messages, workflow jobs changes |
| |
| The embedded Tomcat and embedded Derby log files are also written to Oozie's =logs/= directory. |
| |
| ---+++ Oozie User Authentication Configuration |
| |
| Oozie supports Kerberos HTTP SPNEGO authentication, pseudo/simple authentication and anonymous access |
| for client connections. |
| |
| Anonymous access (*default*) does not require the user to authenticate and the user ID is obtained from |
| the job properties on job submission operations, other operations are anonymous. |
| |
| Pseudo/simple authentication requires the user to specify the user name on the request, this is done by |
| the PseudoAuthenticator class by injecting the =user.name= parameter in the query string of all requests. |
| The =user.name= parameter value is taken from the client process Java System property =user.name=. |
| |
| Kerberos HTTP SPNEGO authentication requires the user to perform a Kerberos HTTP SPNEGO authentication sequence. |
| |
| If Pseudo/simple or Kerberos HTTP SPNEGO authentication mechanisms are used, Oozie will return the user an |
| authentication token HTTP Cookie that can be used in later requests as identity proof. |
| |
| Oozie uses Apache Hadoop-Auth (Java HTTP SPNEGO) library for authentication. |
| This library can be extended to support other authentication mechanisms. |
| |
| Oozie user authentication is configured using the following configuration properties (default values shown): |
| |
| <verbatim> |
| oozie.authentication.type=simple |
| oozie.authentication.token.validity=36000 |
| oozie.authentication.signature.secret= |
| oozie.authentication.cookie.domain= |
| oozie.authentication.simple.anonymous.allowed=true |
| oozie.authentication.kerberos.principal=HTTP/localhost@${local.realm} |
| oozie.authentication.kerberos.keytab=${oozie.service.HadoopAccessorService.keytab.file} |
| </verbatim> |
| |
| The =type= defines authentication used for Oozie HTTP endpoint, the supported values are: |
| simple | kerberos | #AUTHENTICATION_HANDLER_CLASSNAME#. |
| |
| The =token.validity= indicates how long (in seconds) an authentication token is valid before it has |
| to be renewed. |
| |
| The =signature.secret= is the signature secret for signing the authentication tokens. It is recommended to not set this, in which |
| case Oozie will randomly generate one on startup. |
| |
| The =oozie.authentication.cookie.domain= The domain to use for the HTTP cookie that stores the |
| authentication token. In order to authentication to work correctly across all Hadoop nodes web-consoles |
| the domain must be correctly set. |
| |
| The =simple.anonymous.allowed= indicates if anonymous requests are allowed. This setting is meaningful |
| only when using 'simple' authentication. |
| |
| The =kerberos.principal= indicates the Kerberos principal to be used for HTTP endpoint. |
| The principal MUST start with 'HTTP/' as per Kerberos HTTP SPNEGO specification. |
| |
| The =kerberos.keytab= indicates the location of the keytab file with the credentials for the principal. |
| It should be the same keytab file Oozie uses for its Kerberos credentials for Hadoop. |
| |
| ---+++ Oozie Hadoop Authentication Configuration |
| |
| Oozie works with Hadoop versions which support Kerberos authentication. |
| |
| Oozie Hadoop authentication is configured using the following configuration properties (default values shown): |
| |
| <verbatim> |
| oozie.service.HadoopAccessorService.kerberos.enabled=false |
| local.realm=LOCALHOST |
| oozie.service.HadoopAccessorService.keytab.file=${user.home}/oozie.keytab |
| oozie.service.HadoopAccessorService.kerberos.principal=${user.name}/localhost@{local.realm} |
| </verbatim> |
| |
| The above default values are for a Hadoop 0.20 secure distribution (with support for Kerberos authentication). |
| |
| To enable Kerberos authentication, the following property must be set: |
| |
| <verbatim> |
| oozie.service.HadoopAccessorService.kerberos.enabled=true |
| </verbatim> |
| |
| When using Kerberos authentication, the following properties must be set to the correct values (default values shown): |
| |
| <verbatim> |
| local.realm=LOCALHOST |
| oozie.service.HadoopAccessorService.keytab.file=${user.home}/oozie.keytab |
| oozie.service.HadoopAccessorService.kerberos.principal=${user.name}/localhost@{local.realm} |
| </verbatim> |
| |
| *IMPORTANT:* When using Oozie with a Hadoop 20 with Security distribution, the Oozie user in Hadoop must be configured |
| as a proxy user. |
| |
| ---+++ User ProxyUser Configuration |
| |
| Oozie supports impersonation or proxyuser functionality (identical to Hadoop proxyuser capabilities and conceptually |
| similar to Unix 'sudo'). |
| |
| Proxyuser enables other systems that are Oozie clients to submit jobs on behalf of other users. |
| |
| Because proxyuser is a powerful capability, Oozie provides the following restriction capabilities |
| (similar to Hadoop): |
| |
| * Proxyuser is an explicit configuration on per proxyuser user basis. |
| * A proxyuser user can be restricted to impersonate other users from a set of hosts. |
| * A proxyuser user can be restricted to impersonate users belonging to a set of groups. |
| |
| There are 2 configuration properties needed to set up a proxyuser: |
| |
| * oozie.service.ProxyUserService.proxyuser.#USER#.hosts: hosts from where the user #USER# can impersonate other users. |
| * oozie.service.ProxyUserService.proxyuser.#USER#.groups: groups the users being impersonated by user #USER# must belong to. |
| |
| Both properties support the '*' wildcard as value. Although this is recommended only for testing/development. |
| |
| ---+++ User Authorization Configuration |
| |
| Oozie has a basic authorization model: |
| |
| * Users have read access to all jobs |
| * Users have write access to their own jobs |
| * Users have write access to jobs based on an Access Control List (list of users and groups) |
| * Users have read access to admin operations |
| * Admin users have write access to all jobs |
| * Admin users have write access to admin operations |
| |
| If security is disabled all users are admin users. |
| |
| Oozie security is set via the following configuration property (default value shown): |
| |
| <verbatim> |
| oozie.service.AuthorizationService.security.enabled=false |
| </verbatim> |
| |
| NOTE: the old ACL model where a group was provided is still supported if the following property is set |
| in =oozie-site.xml=: |
| |
| <verbatim> |
| oozie.service.AuthorizationService.default.group.as.acl=true |
| </verbatim> |
| |
| Admin users are determined from the list of admin groups, specified in |
| =oozie.service.AuthorizationService.admin.groups= property. Use commas to separate multiple groups, spaces, tabs |
| and ENTER characters are trimmed. |
| |
| If the above property for admin groups is not set, then the admin users are the users specified in the |
| =conf/adminusers.txt= file. The syntax of this file is: |
| |
| * One user name per line |
| * Empty lines and lines starting with '#' are ignored |
| |
| ---+++ Oozie System ID Configuration |
| |
| Oozie has a system ID that is is used to generate the Oozie temporary runtime directory, the workflow job IDs, and the |
| workflow action IDs. |
| |
| Two Oozie systems running with the same ID will not have any conflict but in case of troubleshooting it will be easier |
| to identify resources created/used by the different Oozie systems if they have different system IDs (default value |
| shown): |
| |
| <verbatim> |
| oozie.system.id=oozie-${user.name} |
| </verbatim> |
| |
| ---+++ Filesystem Configuration |
| |
| Oozie lets you to configure the allowed Filesystems by using the following configuration property in oozie-site.xml: |
| <verbatim> |
| <property> |
| <name>oozie.service.HadoopAccessorService.supported.filesystems</name> |
| <value>hdfs</value> |
| </property> |
| </verbatim> |
| |
| The above value, =hdfs=, which is the default, means that Oozie will only allow HDFS filesystems to be used. Examples of other |
| filesystems that Oozie is compatible with are: hdfs, hftp, webhdfs, and viewfs. Multiple filesystems can be specified as |
| comma-separated values. Putting a * will allow any filesystem type, effectively disabling this check. |
| |
| ---+++ HCatalog Configuration |
| |
| Refer to the [[DG_HCatalogIntegration][Oozie HCatalog Integration]] document for a overview of HCatalog and |
| integration of Oozie with HCatalog. This section explains the various settings to be configured in oozie-site.xml on |
| the Oozie server to enable Oozie to work with HCatalog. |
| |
| *Adding HCatalog jars to Oozie war:* |
| |
| For Oozie server to talk to HCatalog server, HCatalog and hive jars need to be in the server classpath. |
| hive-site.xml which has the configuration to talk to the HCatalog server also needs to be in the classpath or specified by the |
| following configuration property in oozie-site.xml: |
| <verbatim> |
| <property> |
| <name>oozie.service.HCatAccessorService.hcat.configuration</name> |
| <value>/local/filesystem/path/to/hive-site.xml</value> |
| </property> |
| </verbatim> |
| The hive-site.xml can also be placed in a location on HDFS and the above property can have a value |
| of =hdfs://HOST:PORT/path/to/hive-site.xml= to point there instead of the local file system. |
| |
| The oozie-[version]-hcataloglibs.tar.gz in the oozie distribution bundles the required hcatalog and hive jars that |
| needs to be placed in the Oozie server classpath. If using a version of HCatalog bundled in |
| Oozie hcataloglibs/, copy the corresponding HCatalog jars from hcataloglibs/ to the libext/ directory. If using a |
| different version of HCatalog, copy the required HCatalog jars from such version in the libext/ directory. |
| This needs to be done before running the =oozie-setup.sh= script so that these jars get added to the Oozie WAR file. |
| |
| *Configure HCatalog URI Handling:* |
| |
| <verbatim> |
| <property> |
| <name>oozie.service.URIHandlerService.uri.handlers</name> |
| <value>org.apache.oozie.dependency.FSURIHandler,org.apache.oozie.dependency.HCatURIHandler</value> |
| <description> |
| Enlist the different uri handlers supported for data availability checks. |
| </description> |
| </property> |
| </verbatim> |
| |
| The above configuration defines the different uri handlers which check for existence of data dependencies defined in a |
| Coordinator. The default value is =org.apache.oozie.dependency.FSURIHandler=. FSURIHandler supports uris with |
| schemes defined in the configuration =oozie.service.HadoopAccessorService.supported.filesystems= which are hdfs, hftp |
| and webhcat by default. HCatURIHandler supports uris with the scheme as hcat. |
| |
| *Configure HCatalog services:* |
| |
| <verbatim> |
| <property> |
| <name>oozie.services.ext</name> |
| <value> |
| org.apache.oozie.service.JMSAccessorService, |
| org.apache.oozie.service.PartitionDependencyManagerService, |
| org.apache.oozie.service.HCatAccessorService |
| </value> |
| <description> |
| To add/replace services defined in 'oozie.services' with custom implementations. |
| Class names must be separated by commas. |
| </description> |
| </property> |
| </verbatim> |
| |
| PartitionDependencyManagerService and HCatAccessorService are required to work with HCatalog and support Coordinators |
| having HCatalog uris as data dependency. If the HCatalog server is configured to publish partition availability |
| notifications to a JMS compliant messaging provider like ActiveMQ, then JMSAccessorService needs to be added |
| to =oozie.services.ext= to handle those notifications. |
| |
| *Configure JMS Provider JNDI connection mapping for HCatalog:* |
| |
| <verbatim> |
| <property> |
| <name>oozie.service.HCatAccessorService.jmsconnections</name> |
| <value> |
| hcat://hcatserver.colo1.com:8020=java.naming.factory.initial#Dummy.Factory;java.naming.provider.url#tcp://broker.colo1.com:61616, |
| default=java.naming.factory.initial#org.apache.activemq.jndi.ActiveMQInitialContextFactory;java.naming.provider.url#tcp://broker.colo.com:61616;connectionFactoryNames#ConnectionFactory |
| </value> |
| <description> |
| Specify the map of endpoints to JMS configuration properties. In general, endpoint |
| identifies the HCatalog server URL. "default" is used if no endpoint is mentioned |
| in the query. If some JMS property is not defined, the system will use the property |
| defined jndi.properties. jndi.properties files is retrieved from the application classpath. |
| Mapping rules can also be provided for mapping Hcatalog servers to corresponding JMS providers. |
| hcat://${1}.${2}.com:8020=java.naming.factory.initial#Dummy.Factory;java.naming.provider.url#tcp://broker.${2}.com:61616 |
| </description> |
| </property> |
| </verbatim> |
| |
| Currently HCatalog does not provide APIs to get the connection details to connect to the JMS Provider it publishes |
| notifications to. It only has APIs which provide the topic name in the JMS Provider to which the notifications are |
| published for a given database table. So the JMS Provider's connection properties needs to be manually configured |
| in Oozie using the above setting. You can either provide a =default= JNDI configuration which will be used as the |
| JMS Provider for all HCatalog servers, or can specify a configuration per HCatalog server URL or provide a |
| configuration based on a rule matching multiple HCatalog server URLs. For example: With the configuration of |
| hcat://${1}.${2}.com:8020=java.naming.factory.initial#Dummy.Factory;java.naming.provider.url#tcp://broker.${2}.com:61616, |
| request URL of hcat://server1.colo1.com:8020 will map to tcp://broker.colo1.com:61616, hcat://server2.colo2.com:8020 |
| will map to tcp://broker.colo2.com:61616 and so on. |
| |
| *Configure HCatalog Polling Frequency:* |
| |
| <verbatim> |
| <property> |
| <name>oozie.service.coord.push.check.requeue.interval |
| </name> |
| <value>600000</value> |
| <description>Command re-queue interval for push dependencies (in millisecond). |
| </description> |
| </property> |
| </verbatim> |
| |
| If there is no JMS Provider configured for a HCatalog Server, then oozie polls HCatalog based on the frequency defined |
| in =oozie.service.coord.input.check.requeue.interval=. This config also applies to HDFS polling. |
| If there is a JMS provider configured for a HCatalog Server, then oozie polls HCatalog based on the frequency defined |
| in =oozie.service.coord.push.check.requeue.interval= as a fallback. |
| The defaults for =oozie.service.coord.input.check.requeue.interval= and =oozie.service.coord.push.check.requeue.interval= |
| are 1 minute and 10 minutes respectively. |
| |
| ---+++ Notifications Configuration |
| |
| Oozie supports publishing notifications to a JMS Provider for job status changes and SLA met and miss events. For |
| more information on the feature, refer [[DG_JMSNotifications][JMS Notifications]] documentation. Oozie can also send email |
| notifications on SLA misses. |
| |
| * *Message Broker Installation*: <br/> |
| For Oozie to send/receive messages, a JMS-compliant broker should be installed. Apache ActiveMQ is a popular JMS-compliant |
| broker usable for this purpose. See [[http://activemq.apache.org/getting-started.html][here]] for instructions on |
| installing and running ActiveMQ. |
| |
| * *Services*: <br/> |
| Add/modify =oozie.services.ext= property in =oozie-site.xml= to include the following services. |
| <verbatim> |
| <property> |
| <name>oozie.services.ext</name> |
| <value> |
| org.apache.oozie.service.JMSAccessorService, |
| org.apache.oozie.service.JMSTopicService, |
| org.apache.oozie.service.EventHandlerService, |
| org.apache.oozie.sla.service.SLAService |
| </value> |
| </property> |
| </verbatim> |
| |
| * *Event Handlers*: <br/> |
| <verbatim> |
| <property> |
| <name>oozie.service.EventHandlerService.event.listeners</name> |
| <value> |
| org.apache.oozie.jms.JMSJobEventListener, |
| org.apache.oozie.sla.listener.SLAJobEventListener, |
| org.apache.oozie.jms.JMSSLAEventListener, |
| org.apache.oozie.sla.listener.SLAEmailEventListener |
| </value> |
| </property> |
| </verbatim> |
| It is also recommended to increase =oozie.service.SchedulerService.threads= to 15 for faster event processing and sending notifications. The services and their functions are as follows: <br/> |
| JMSJobEventListener - Sends JMS job notifications <br/> |
| JMSSLAEventListener - Sends JMS SLA notifications <br/> |
| SLAEmailEventListener - Sends Email SLA notifications <br/> |
| SLAJobEventListener - Processes job events and calculates SLA. Does not send any notifications |
| * *JMS properties*: <br/> |
| Add =oozie.jms.producer.connection.properties= property in =oozie-site.xml=. Its value corresponds to an |
| identifier (e.g. default) assigned to a semi-colon separated key#value list of properties from your JMS broker's |
| =jndi.properties= file. The important properties are =java.naming.factory.initial= and =java.naming.provider.url=. |
| |
| As an example, if using ActiveMQ in local env, the property can be set to |
| <verbatim> |
| <property> |
| <name>oozie.jms.producer.connection.properties</name> |
| <value> |
| java.naming.factory.initial#org.apache.activemq.jndi.ActiveMQInitialContextFactory;java.naming.provider.url#tcp://localhost:61616;connectionFactoryNames#ConnectionFactory |
| </value> |
| </property> |
| </verbatim> |
| * *JMS Topic name*: <br/> |
| JMS consumers listen on a particular "topic". Hence Oozie needs to define a topic variable with which to publish messages |
| about the various jobs. |
| <verbatim> |
| <property> |
| <name>oozie.service.JMSTopicService.topic.name</name> |
| <value> |
| default=${username} |
| </value> |
| <description> |
| Topic options are ${username}, ${jobId}, or a fixed string which can be specified as default or for a |
| particular job type. |
| For e.g To have a fixed string topic for workflows, coordinators and bundles, |
| specify in the following comma-separated format: {jobtype1}={some_string1}, {jobtype2}={some_string2} |
| where job type can be WORKFLOW, COORDINATOR or BUNDLE. |
| Following example defines topic for workflow job, workflow action, coordinator job, coordinator action, |
| bundle job and bundle action |
| WORKFLOW=workflow, |
| COORDINATOR=coordinator, |
| BUNDLE=bundle |
| For jobs with no defined topic, default topic will be ${username} |
| </description> |
| </property> |
| </verbatim> |
| |
| Another related property is the topic prefix. |
| <verbatim> |
| <property> |
| <name>oozie.service.JMSTopicService.topic.prefix</name> |
| <value></value> |
| <description> |
| This can be used to append a prefix to the topic in oozie.service.JMSTopicService.topic.name. For eg: oozie. |
| </description> |
| </property> |
| </verbatim> |
| |
| |
| ---+++ Setting Up Oozie with HTTPS (SSL) |
| |
| *IMPORTANT*: |
| The default HTTPS configuration will cause all Oozie URLs to use HTTPS except for the JobTracker callback URLs. This is to simplify |
| configuration (no changes needed outside of Oozie), but this is okay because Oozie doesn't inherently trust the callbacks anyway; |
| they are used as hints. |
| |
| The related environment variables are explained at [[AG_Install#Environment_Setup][Environment Setup]]. |
| |
| You can use either a certificate from a Certificate Authority or a Self-Signed Certificate. Using a self-signed certificate |
| requires some additional configuration on each Oozie client machine. If possible, a certificate from a Certificate Authority is |
| recommended because it's simpler to configure. |
| |
| There's also some additional considerations when using Oozie HA with HTTPS. |
| |
| ---++++To use a Self-Signed Certificate |
| There are many ways to create a Self-Signed Certificate, this is just one way. We will be using |
| the [[http://docs.oracle.com/javase/6/docs/technotes/tools/solaris/keytool.html][keytool]] program, which is |
| included with your JRE. If it's not on your path, you should be able to find it in $JAVA_HOME/bin. |
| |
| 1. Run the following command (as the Oozie user) to create the keystore file, which will be named =.keystore= and located in the |
| Oozie user's home directory. |
| <verbatim> |
| keytool -genkeypair -alias tomcat -keyalg RSA -dname "CN=hostname" -storepass password -keypass password |
| </verbatim> |
| The =hostname= should be the host name of the Oozie Server or a wildcard on the subdomain it belongs to. Make sure to include |
| the "CN=" part. You can change =storepass= and =keypass= values, but they should be the same. If you do want to use something |
| other than password, you'll also need to change the =OOZIE_HTTPS_KEYSTORE_PASS= environment variable in oozie-env.sh to |
| match; =password= is the default. |
| |
| For example, if your Oozie server was at oozie.int.example.com, then you would do this: |
| <verbatim> |
| keytool -genkeypair -alias tomcat -keyalg RSA -dname "CN=oozie.int.example.com" -storepass password -keypass password |
| </verbatim> |
| If you're going to be using Oozie HA, it's simplest if you have a single certificate that all Oozie servers in the HA group can use. |
| To do that, you'll need to use a wildcard on the subdomain it belongs to: |
| <verbatim> |
| keytool -genkeypair -alias tomcat -keyalg RSA -dname "CN=*.int.example.com" -storepass password -keypass password |
| </verbatim> |
| The above would work on any server in the int.example.com domain. |
| |
| 2. Run the following command (as the Oozie user) to export a certificate file from the keystore file: |
| <verbatim> |
| keytool -exportcert -alias tomcat -file path/to/anywhere/certificate.cert -storepass password |
| </verbatim> |
| |
| 3. Run the following command (as any user) to create a truststore containing the certificate we just exported: |
| <verbatim> |
| keytool -import -alias tomcat -file path/to/certificate.cert -keystore /path/to/anywhere/oozie.truststore -storepass password2 |
| </verbatim> |
| You'll need the =oozie.truststore= later if you're using the Oozie client (or other Java-based client); otherwise, you can skip |
| this step. The =storepass= value here is only used to verify or change the truststore and isn't typically required when only |
| reading from it; so it does not have to be given to users only using the client. |
| |
| ---++++To use a Certificate from a Certificate Authority |
| |
| 1. You will need to make a request to a Certificate Authority in order to obtain a proper Certificate; please consult a Certificate |
| Authority on this procedure. If you're going to be using Oozie HA, it's simplest if you have a single certificate that all Oozie |
| servers in the HA group can use. To do that, you'll need to use a wild on the subdomain it belongs to (e.g. "*.int.example.com"). |
| |
| 2. Once you have your .cert file, run the following command (as the Oozie user) to create a keystore file from your certificate: |
| <verbatim> |
| keytool -import -alias tomcat -file path/to/certificate.cert |
| </verbatim> |
| The keystore file will be named =.keystore= and located in the Oozie user's home directory. |
| |
| ---++++Configure the Oozie Server to use SSL (HTTPS) |
| |
| 1. Make sure the Oozie server isn't running |
| |
| 2. Run the following command (as the Oozie user): |
| <verbatim> |
| oozie-setup.sh prepare-war -secure |
| </verbatim> |
| This will configure Oozie to use HTTPS instead of HTTP. To revert back to HTTP, simply rerun the command without =-secure=. |
| |
| 3. Start the Oozie server |
| |
| *Note:* If using Oozie HA, make sure that each Oozie server has a copy of the .keystore file. |
| |
| ---++++Configure the Oozie Client to connect using SSL (HTTPS) |
| |
| The first two steps are only necessary if you are using a Self-Signed Certificate; the third is required either way. |
| Also, these steps must be done on every machine where you intend to use the Oozie Client. |
| |
| 1. Copy or download the oozie.truststore file onto the client machine |
| |
| 2. When using any Java-based program, you'll need to pass =-Djavax.net.ssl.trustStore= to the JVM. To |
| do this for the Oozie client: |
| <verbatim> |
| export OOZIE_CLIENT_OPTS='-Djavax.net.ssl.trustStore=/path/to/oozie.truststore' |
| </verbatim> |
| |
| 3. When using the Oozie Client, you will need to use https://oozie.server.hostname:11443/oozie instead of |
| http://oozie.server.hostname:11000/oozie -- Java will not automatically redirect from the http address to the https address. |
| |
| ---++++Connect to the Oozie Web UI using SSL (HTTPS) |
| |
| 1. Use https://oozie.server.hostname:11443/oozie |
| though most browsers should automatically redirect you if you use http://oozie.server.hostname:11000/oozie |
| |
| *IMPORTANT*: If using a Self-Signed Certificate, your browser will warn you that it can't verify the certificate or something |
| similar. You will probably have to add your certificate as an exception. |
| |
| ---++++Additional considerations for Oozie HA with SSL |
| |
| You'll need to configure the load balancer to do SSL pass-through. This will allow the clients talking to Oozie to use the |
| SSL certificate provided by the Oozie servers (so the load balancer does not need one). Please consult your load balancer's |
| documentation on how to configure this. Make sure to point the load balancer at the https://HOST:HTTPS_PORT addresses for your |
| Oozie servers. Clients can then connect to the load balancer at https://LOAD_BALANCER_HOST:PORT. |
| |
| *Important:* Callbacks from the JobTracker/ResourceManager are done via http or https depending on what you enter for the |
| =OOZIE_BASE_URL= property. If you are using a Certificate from a Certificate Authority, you can simply put the https address here. |
| If you are using a self-signed certificate, you have to do one of the following options (Option 1 is recommended): |
| |
| Option 1) You'll need to follow the steps in |
| the [[AG_Install#Configure_the_Oozie_Client_to_connect_using_SSL_HTTPS][Configure the Oozie Client to connect using SSL (HTTPS)]] |
| section, but on the host of the JobTracker/ResourceManager. You can then set =OOZIE_BASE_URL= to the load balancer https address. |
| This will allow the JobTracker/ResourceManager to contact the Oozie server with https (like the Oozie client, they are also Java |
| programs). |
| |
| Option 2) You'll need setup another load balancer, or another "pool" on the existing load balancer, with the http addresses of the |
| Oozie servers. You can then set =OOZIE_BASE_URL= to the load balancer http address. Clients should use the https load balancer |
| address. This will allow clients to use https while the JobTracker/ResourceManager uses http for callbacks. |
| |
| ---+++ Fine Tuning an Oozie Server |
| |
| Refer to the [[./oozie-default.xml][oozie-default.xml]] for details. |
| |
| ---+++ Using Metrics instead of Instrumentation |
| |
| As of version 4.1.0, Oozie includes a replacement for the Instrumentation based on Codahale's Metrics library. It includes a |
| number of improvements over the original Instrumentation included in Oozie. They both report most of the same information, though |
| the formatting is slightly different and there's some additional information in the Metrics version; the format of the output to the |
| oozie-instrumentation log is also different. The Metrics version can be enabled by adding the =MetricsInstrumentationService= to |
| the list of services: |
| <verbatim> |
| <property> |
| <name>oozie.services.ext</name> |
| <value> |
| org.apache.oozie.service.MetricsInstrumentationService |
| </value> |
| </property> |
| </verbatim> |
| |
| Once enabled, the =admin/instrumentation= REST endpoint will no longer be available and instead the =admin/metrics= endpoint should |
| be used (see the [[WebServicesAPI#Oozie_Metrics][Web Services API]] documentation for more details); the Oozie Web UI will also |
| replace the "Instrumentation" tab with a "Metrics" tab. |
| |
| We can also publish the instrumentation metrics to the external server graphite or ganglia. For this the following |
| properties should be specified in oozie-site.xml : |
| <verbatim> |
| <property> |
| <name>oozie.external_monitoring.enable</name> |
| <value>false</value> |
| <description> |
| If the oozie functional metrics needs to be exposed to the metrics-server backend, set it to true |
| If set to true, the following properties has to be specified : oozie.metrics.server.name, |
| oozie.metrics.host, oozie.metrics.prefix, oozie.metrics.report.interval.sec, oozie.metrics.port |
| </description> |
| </property> |
| |
| <property> |
| <name>oozie.external_monitoring.type</name> |
| <value>graphite</value> |
| <description> |
| The name of the server to which we want to send the metrics, would be graphite or ganglia. |
| </description> |
| </property> |
| |
| <property> |
| <name>oozie.external_monitoring.address</name> |
| <value>http://localhost:2020</value> |
| </property> |
| |
| <property> |
| <name>oozie.external_monitoring.metricPrefix</name> |
| <value>oozie</value> |
| </property> |
| |
| <property> |
| <name>oozie.external_monitoring.reporterIntervalSecs</name> |
| <value>60</value> |
| </property> |
| </verbatim> |
| |
| We can also publish the instrumentation metrics via JMX interface. For this the following property should be specified |
| in oozie-site.xml : |
| <verbatim> |
| <property> |
| <name>oozie.jmx_monitoring.enable</name> |
| <value>false</value> |
| <description> |
| If the oozie functional metrics needs to be exposed via JMX interface, set it to true. |
| </description> |
| </property>> |
| </verbatim> |
| |
| #HA |
| ---+++ High Availability (HA) |
| |
| Multiple Oozie Servers can be configured against the same database to provide High Availability (HA) of the Oozie service. |
| |
| ---++++ Pre-requisites |
| |
| 1. A database that supports multiple concurrent connections. In order to have full HA, the database should also have HA support, or |
| it becomes a single point of failure. |
| |
| *NOTE:* The default derby database does not support this |
| |
| 2. A ZooKeeper ensemble. |
| |
| Apache ZooKeeper is a distributed, open-source coordination service for distributed applications; the Oozie servers use it for |
| coordinating access to the database and communicating with each other. In order to have full HA, there should be at least 3 |
| ZooKeeper servers. |
| More information on ZooKeeper can be found [[http://zookeeper.apache.org][here]]. |
| |
| 3. Multiple Oozie servers. |
| |
| *IMPORTANT:* While not strictly required for all configuration properties, all of the servers should ideally have exactly the same |
| configuration for consistency's sake. |
| |
| 4. A Loadbalancer, Virtual IP, or Round-Robin DNS. |
| |
| This is used to provide a single entry-point for users and for callbacks from the JobTracker/ResourceManager. The load balancer |
| should be configured for round-robin between the Oozie servers to distribute the requests. Users (using either the Oozie client, a |
| web browser, or the REST API) should connect through the load balancer. In order to have full HA, the load balancer should also |
| have HA support, or it becomes a single point of failure. |
| |
| ---++++ Installation/Configuration Steps |
| |
| 1. Install identically configured Oozie servers normally. Make sure they are all configured against the same database and make sure |
| that you DO NOT start them yet. |
| |
| 2. Add the following services to the extension services configuration property in oozie-site.xml in all Oozie servers. This will |
| make Oozie use the ZooKeeper versions of these services instead of the default implementations. |
| |
| <verbatim> |
| <property> |
| <name>oozie.services.ext</name> |
| <value> |
| org.apache.oozie.service.ZKLocksService, |
| org.apache.oozie.service.ZKXLogStreamingService, |
| org.apache.oozie.service.ZKJobsConcurrencyService, |
| org.apache.oozie.service.ZKUUIDService |
| </value> |
| </property> |
| </verbatim> |
| |
| 3. Add the following property to oozie-site.xml in all Oozie servers. It should be a comma-separated list of host:port pairs of the |
| ZooKeeper servers. The default value is shown below. |
| |
| <verbatim> |
| <property> |
| <name>oozie.zookeeper.connection.string</name> |
| <value>localhost:2181</value> |
| </property> |
| </verbatim> |
| |
| 4. (Optional) Add the following property to oozie-site.xml in all Oozie servers to specify the namespace to use. All of the Oozie |
| Servers that are planning on talking to each other should have the same namespace. If there are multiple Oozie setups each doing |
| their own HA, they should have their own namespace. The default value is shown below. |
| |
| <verbatim> |
| <property> |
| <name>oozie.zookeeper.namespace</name> |
| <value>oozie</value> |
| </property> |
| </verbatim> |
| |
| 5. Change the value of =OOZIE_BASE_URL= in oozie-site.xml to point to the loadbalancer or virtual IP, for example: |
| |
| <verbatim> |
| <property> |
| <name>oozie.base.url</name> |
| <value>http://my.loadbalancer.hostname:11000/oozie</value> |
| </property> |
| </verbatim> |
| |
| 6. (Optional) Add the following property to oozie-site.xml in all Oozie servers to specify the each host instance id. |
| Each Oozie server in HA should have its own unique instance id. The default is =${OOZIE_HTTP_HOSTNAME}= (i.e. the hostname). |
| |
| <verbatim> |
| <property> |
| <name>oozie.instance.id</name> |
| <value>hostname</value> |
| </property> |
| </verbatim> |
| |
| 7. (Optional) If using a secure cluster, see [[AG_Install#Security][Security]] below on configuring Kerberos with Oozie HA. |
| |
| 8. Start the ZooKeeper servers. |
| |
| 9. Start the Oozie servers. |
| |
| Note: If one of the Oozie servers becomes unavailable, querying Oozie for the logs from a job in the Web UI, REST API, or client may |
| be missing information until that server comes back up. |
| |
| ---++++ Security |
| |
| Oozie HA works with the existing Oozie security framework and settings. For HA features (log streaming, share lib, etc) to work |
| properly in a secure setup, following property can be set on each server. If =oozie.server.authentication.type= is not set, then |
| server-server authentication will fall back on =oozie.authentication.type=. |
| |
| <verbatim> |
| <property> |
| <name>oozie.server.authentication.type</name> |
| <value>kerberos</value> |
| </property> |
| </verbatim> |
| |
| Below are some additional steps and information specific to Oozie HA: |
| |
| 1. (Optional) To prevent unauthorized users or programs from interacting with or reading the znodes used by Oozie in ZooKeeper, |
| you can tell Oozie to use Kerberos-backed ACLs. To enforce this for all of the Oozie-related znodes, simply add the following |
| property to oozie-site.xml in all Oozie servers and set it to =true=. The default is =false=. |
| |
| <verbatim> |
| <property> |
| <name>oozie.zookeeper.secure</name> |
| <value>true</value> |
| </property> |
| </verbatim> |
| |
| Note: The Kerberos principals of each of the Oozie servers should have the same primary name (i.e. in =primary/instance@REALM=, each |
| server should have the same value for =primary=). |
| |
| *Important:* Once this property is set to =true=, it will set the ACLs on all existing Oozie-related znodes to only allow Kerberos |
| authenticated users with a principal that has the same primary as described above (also for any subsequently created new znodes). |
| This means that if you ever want to turn this feature off, you will have to manually connect to ZooKeeper using a Kerberos principal |
| with the same primary and either delete all znodes under and including the namespace (i.e. if =oozie.zookeeper.namespace= = =oozie= |
| then that would be =/oozie=); alternatively, instead of deleting them all, you can manually set all of their ACLs to =world:anyone=. |
| In either case, make sure that no Oozie servers are running while this is being done. |
| |
| Also, in your zoo.cfg for ZooKeeper, make sure to set the following properties: |
| <verbatim> |
| authProvider.1=org.apache.zookeeper.server.auth.SASLAuthenticationProvider |
| kerberos.removeHostFromPrincipal=true |
| kerberos.removeRealmFromPrincipal=true |
| </verbatim> |
| |
| 2. Until Hadoop 2.5.0 and later, there is a known limitation where each Oozie server can only use one HTTP principal. However, |
| for Oozie HA, we need to use two HTTP principals: =HTTP/oozie-server-host@realm= and =HTTP/load-balancer-host@realm=. This |
| allows access to each Oozie server directly and through the load balancer. While users should always go through the load balancer, |
| certain features (e.g. log streaming) require the Oozie servers to talk to each other directly; it can also be helpful for an |
| administrator to talk directly to an Oozie server. So, if using a Hadoop version prior to 2.5.0, you will have to choose which |
| HTTP principal to use as you cannot use both; it is recommended to choose =HTTP/load-balancer-host@realm= so users can connect |
| through the load balancer. This will prevent Oozie servers from talking to each other directly, which will effectively disable |
| log streaming. |
| |
| For Hadoop 2.5.0 and later: |
| |
| 2a. When creating the keytab used by Oozie, make sure to include Oozie's principal and the two HTTP principals mentioned above. |
| |
| 2b. Set =oozie.authentication.kerberos.principal= to * (that is, an asterisks) so it will use both HTTP principals. |
| |
| For earlier versions of Hadoop: |
| |
| 2a. When creating the keytab used by Oozie, make sure to include Oozie's principal and the load balancer HTTP principal |
| |
| 2b. Set =oozie.authentication.kerberos.principal= to =HTTP/load-balancer-host@realm=. |
| |
| 3. With Hadoop 2.6.0 and later, a rolling random secret that is synchronized across all Oozie servers will be used for signing the |
| Oozie auth tokens. This is done automatically when HA is enabled; no additional configuration is needed. |
| |
| For earlier versions of Hadoop, each server will have a different random secret. This will still work but will likely result in |
| additional calls to the KDC to authenticate users to the Oozie server (because the auth tokens will not be accepted by other |
| servers, which will cause a fallback to Kerberos). |
| |
| 4. If you'd like to use HTTPS (SSL) with Oozie HA, there's some additional considerations that need to be made. |
| See the [[AG_Install#Setting_Up_Oozie_with_HTTPS_SSL][Setting Up Oozie with HTTPS (SSL)]] section for more information. |
| |
| ---++++ JobId sequence |
| Oozie in HA mode, uses ZK to generate job id sequence. Job Ids are of following format. |
| <Id sequence>-<yyMMddHHmmss(server start time)>-<system_id>-<W/C/B> |
| |
| Where, <systemId> is configured as =oozie.system.id= (default is "oozie-" + "user.name") |
| W/C/B is suffix to job id indicating that generated job is a type of workflow or coordinator or bundle. |
| |
| Maximum allowed character for job id sequence is 40. "Id sequence" is stored in ZK and reset to 0 once maximum job id sequence is |
| reached. Maximum job id sequence is configured as =oozie.service.ZKUUIDService.jobid.sequence.max=, default value is 99999999990. |
| |
| <verbatim> |
| <property> |
| <name>oozie.service.ZKUUIDService.jobid.sequence.max</name> |
| <value>99999999990</value> |
| </property> |
| </verbatim> |
| |
| ---++ Starting and Stopping Oozie |
| |
| Use the standard Tomcat commands to start and stop Oozie. |
| |
| ---++ Oozie Command Line Installation |
| |
| Copy and expand the =oozie-client= TAR.GZ file bundled with the distribution. Add the =bin/= directory to the =PATH=. |
| |
| Refer to the [[DG_CommandLineTool][Command Line Interface Utilities]] document for a full reference of the =oozie= |
| command line tool. |
| |
| ---++ Oozie Share Lib |
| |
| The Oozie sharelib TAR.GZ file bundled with the distribution contains the necessary files to run Oozie map-reduce streaming, pig, |
| hive, sqooop, and distcp actions. There is also a sharelib for HCatalog. The sharelib is required for these actions to work; any |
| other actions (mapreduce, shell, ssh, and java) do not require the sharelib to be installed. |
| |
| As of Oozie 4.0, the following property is included. If true, Oozie will create and ship a "launcher jar" to hdfs that contains |
| classes necessary for the launcher job. If false, Oozie will not do this, and it is assumed that the necessary classes are in their |
| respective sharelib jars or the "oozie" sharelib instead. When false, the sharelib is required for ALL actions; when true, the |
| sharelib is only required for actions that need additional jars (the original list from above). |
| |
| <verbatim> |
| <property> |
| <name>oozie.action.ship.launcher.jar</name> |
| <value>true</value> |
| </property> |
| </verbatim> |
| |
| Using sharelib CLI, sharelib files are copied to new lib_<timestamped> directory. At start, server picks the sharelib from latest |
| time-stamp directory. While starting, server also purges sharelib directory which are older than sharelib retention days |
| (defined as oozie.service.ShareLibService.temp.sharelib.retention.days and 7 days is default). |
| |
| Sharelib mapping file can be also configured. Configured file is a key value mapping, where key will be the sharelib name for the |
| action and value is a comma separated list of DFS directories or jar files. |
| This can be configured in oozie-site.xml as : |
| <verbatim> |
| <!-- OOZIE --> |
| <property> |
| <name>oozie.service.ShareLibService.mapping.file</name> |
| <value></value> |
| <description> |
| Sharelib mapping files contains list of key=value, |
| where key will be the sharelib name for the action and value is a comma separated list of |
| DFS directories or jar files. |
| Example. |
| oozie.pig_10=hdfs:///share/lib/pig/pig-0.10.1/lib/ |
| oozie.pig=hdfs:///share/lib/pig/pig-0.11.1/lib/ |
| oozie.distcp=hdfs:///share/lib/hadoop-2.2.0/share/hadoop/tools/lib/hadoop-distcp-2.2.0.jar |
| oozie.spark=hdfs:///share/lib/spark/lib/,hdfs:///share/lib/spark/python/lib/pyspark.zip,hdfs:///share/lib/spark/python/lib/py4j-0-9-src.zip |
| </description> |
| </property> |
| </verbatim> |
| |
| Oozie sharelib TAR.GZ file bundled with the distribution does not contain pyspark and py4j zip files since they vary |
| with Apache Spark version. Therefore, to run pySpark using Spark Action, user need to specify pyspark and py4j zip |
| files. These files can be added either to workflow's lib/ directory, to the sharelib or in sharelib mapping file. |
| |
| |
| ---++ Oozie Coordinators/Bundles Processing Timezone |
| |
| By default Oozie runs coordinator and bundle jobs using =UTC= timezone for datetime values specified in the application |
| XML and in the job parameter properties. This includes coordinator applications start and end times of jobs, coordinator |
| datasets initial-instance, and bundle applications kickoff times. In addition, coordinator dataset instance URI templates |
| will be resolved using datetime values of the Oozie processing timezone. |
| |
| It is possible to set the Oozie processing timezone to a timezone that is an offset of UTC, alternate timezones must |
| expressed in using a GMT offset ( =GMT+/-####= ). For example: =GMT+0530= (India timezone). |
| |
| To change the default =UTC= timezone, use the =oozie.processing.timezone= property in the =oozie-site.xml=. For example: |
| |
| <verbatim> |
| <configuration> |
| <property> |
| <name>oozie.processing.timezone</name> |
| <value>GMT+0530</value> |
| </property> |
| </configuration> |
| </verbatim> |
| |
| *IMPORTANT:* If using a processing timezone other than =UTC=, all datetime values in coordinator and bundle jobs must |
| be expressed in the corresponding timezone, for example =2012-08-08T12:42+0530=. |
| |
| *NOTE:* It is strongly encouraged to use =UTC=, the default Oozie processing timezone. |
| |
| For more details on using an alternate Oozie processing timezone, please refer to the |
| [[CoordinatorFunctionalSpec#datetime][Coordinator Functional Specification, section '4. Datetime']] |
| |
| #UberJar |
| ---++ MapReduce Workflow Uber Jars |
| For Map-Reduce jobs (not including streaming or pipes), additional jar files can also be included via an uber jar. An uber jar is a |
| jar file that contains additional jar files within a "lib" folder (see |
| [[WorkflowFunctionalSpec#AppDeployment][Workflow Functional Specification]] for more information). Submitting a workflow with an uber jar |
| requires at least Hadoop 2.2.0 or 1.2.0. As such, using uber jars in a workflow is disabled by default. To enable this feature, use |
| the =oozie.action.mapreduce.uber.jar.enable= property in the =oozie-site.xml= (and make sure to use a supported version of Hadoop). |
| |
| <verbatim> |
| <configuration> |
| <property> |
| <name>oozie.action.mapreduce.uber.jar.enable</name> |
| <value>true</value> |
| </property> |
| </configuration> |
| </verbatim> |
| |
| ---++ Advanced/Custom Environment Settings |
| |
| Oozie can be configured to use Unix standard filesystem hierarchy for its different files |
| (configuration, logs, data and temporary files). |
| |
| These settings must be done in the =bin/oozie-env.sh= script. |
| |
| This script is sourced before the configuration =oozie-env.sh= and supports additional |
| environment variables (shown with their default values): |
| |
| <verbatim> |
| export OOZIE_CONFIG=${OOZIE_HOME}/conf |
| export OOZIE_DATA={OOZIE_HOME}/data |
| export OOZIE_LOG={OOZIE_HOME}/logs |
| export CATALINA_BASE=${OOZIE_HOME}/oozie-server |
| export CATALINA_TMPDIR=${OOZIE_HOME}/oozie-server/temp |
| export CATALINA_OUT=${OOZIE_LOGS}/catalina.out |
| export CATALINA_PID=/tmp/oozie.pid |
| </verbatim> |
| |
| Sample values to make Oozie follow Unix standard filesystem hierarchy: |
| |
| <verbatim> |
| export OOZIE_CONFIG=/etc/oozie |
| export OOZIE_DATA=/var/lib/oozie |
| export OOZIE_LOG=/var/log/oozie |
| export CATALINA_BASE=${OOZIE_DATA}/oozie-server |
| export CATALINA_TMPDIR=/tmp |
| export CATALINA_PID=/tmp/oozie.pid |
| </verbatim> |
| |
| [[index][::Go back to Oozie Documentation Index::]] |
| |
| </noautolink> |