| <noautolink> |
| |
| [[index][::Go back to Oozie Documentation Index::]] |
| |
| ---+!! Building Oozie |
| |
| %TOC% |
| |
| ---++ System Requirements |
| |
| * Unix box (tested on Mac OS X and Linux) |
| * Java JDK 1.7+ |
| * [[http://maven.apache.org/][Maven 3.0.1+]] |
| * [[http://hadoop.apache.org/core/releases.html][Hadoop 0.20.2+]] |
| * [[http://hadoop.apache.org/pig/releases.html][Pig 0.7+]] |
| |
| JDK commands (java, javac) must be in the command path. |
| |
| The Maven command (mvn) must be in the command path. |
| |
| ---++ Oozie Documentation Generation |
| |
| To generate the documentation, Oozie uses a patched Doxia plugin for Maven with improved twiki support. |
| |
| The source of the modified plugin is available in the Oozie GitHub repository, in the =ydoxia= branch. |
| |
| To build and install it locally run the following command in the =ydoxia= branch: |
| |
| <verbatim> |
| $ mvn install |
| </verbatim> |
| |
| #SshSetup |
| ---++ Passphrase-less SSH Setup |
| |
| *NOTE: SSH actions are deprecated in Oozie 2.* |
| |
| To run SSH Testcases and for easier Hadoop start/stop configure SSH to localhost to be passphrase-less. |
| |
| Create your SSH keys without a passphrase and add the public key to the authorized file: |
| |
| <verbatim> |
| $ ssh-keygen -t dsa |
| $ cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys2 |
| </verbatim> |
| |
| Test that you can ssh without password: |
| |
| <verbatim> |
| $ ssh localhost |
| </verbatim> |
| |
| ---++ Building with different Java Versions |
| |
| Oozie requires a minimum Java version of 1.7. Any newer version can be used but by default bytecode will be generated |
| which is compatible with 1.7. This can be changed by specifying the build property *targetJavaVersion*. |
| |
| ---++ Building and Testing Oozie |
| |
| The JARs for the specified Hadoop and Pig versions must be available in one of the Maven repositories defined in Oozie |
| main 'pom.xml' file. Or they must be installed in the local Maven cache. |
| |
| ---+++ Examples Running Oozie Testcases with Different Configurations |
| |
| *Using embedded Hadoop minicluster with 'simple' authentication:* |
| |
| <verbatim> |
| $ mvn clean test |
| </verbatim> |
| |
| *Using a Hadoop cluster with 'simple' authentication:* |
| |
| <verbatim> |
| $ mvn clean test -Doozie.test.hadoop.minicluster=false |
| </verbatim> |
| |
| *Using embedded Hadoop minicluster with 'simple' authentication and Derby database:* |
| |
| <verbatim> |
| $ mvn clean test -Doozie.test.hadoop.minicluster=false -Doozie.test.db=derby |
| </verbatim> |
| |
| *Using a Hadoop cluster with 'kerberos' authentication:* |
| |
| <verbatim> |
| $ mvn clean test -Doozie.test.hadoop.minicluster=false -Doozie.test.hadoop.security=kerberos |
| </verbatim> |
| |
| NOTE: The embedded minicluster cannot be used when testing with 'kerberos' authentication. |
| |
| *Using a custom Oozie configuration for testcases:* |
| |
| <verbatim> |
| $ mvn clean test -Doozie.test.config.file=/home/tucu/custom-oozie-sitel.xml |
| </verbatim> |
| |
| *Running the testcases with different databases:* |
| |
| <verbatim> |
| $ mvn clean test -Doozie.test.db=[hsqldb*|derby|mysql|postgres|oracle] |
| </verbatim> |
| |
| Using =mysql= and =oracle= enables profiles that will include their JARs files in the build. If using |
| =oracle=, the Oracle JDBC JAR file must be manually installed in the local Maven cache (the JAR is |
| not available in public Maven repos). |
| |
| ---+++ Build Options Reference |
| |
| All these options can be set using *-D*. |
| |
| Except for the options marked with =(*)=, the options can be specified in the =test.properties= in the root |
| of the Oozie project. The options marked with =(*)= are used in Maven POMs, thus they don't take effect if |
| specified in the =test.properties= file (which is loaded by the =XTestCase= class at class initialization time). |
| |
| *hadoop.version* =(*)=: indicates the Hadoop version(Hadoop-1 or Hadoop-2) you wish to build Oozie against specifically. It will |
| substitute this value in the Oozie POM properties and pull the corresponding Hadoop artifacts from Maven. Default version is 1.2.1 |
| for Hadoop-1 (the most common case). For Hadoop-2, the version you can pass is *2.4.0*. |
| |
| *generateSite* (*): generates Oozie documentation, default is undefined (no documentation is generated) |
| |
| *skipTests* (*): skips the execution of all testcases, no value required, default is undefined |
| |
| *test*= (*): runs a single test case, to run a test give the test class name without package and extension, no default |
| |
| *oozie.test.db*= (*): indicates the database to use for running the testcases, supported values are 'hsqldb', 'derby', |
| 'mysql', 'postgres' and 'oracle'; default value is 'hsqldb'. For each database there is |
| =core/src/test/resources/DATABASE-oozie-site.xml= file preconfigured. |
| |
| *oozie.test.properties* (*): indicates the file to load the test properties from, by default is =test.properties=. |
| Having this option allows having different test properties sets, for example: minicluster, simple & kerberos. |
| |
| *oozie.test.waitfor.ratio*= : multiplication factor for testcases using waitfor, the ratio is used to adjust the |
| effective time out. For slow machines the ratio should be increased. The default value is =1=. |
| |
| *oozie.test.config.file*= : indicates a custom Oozie configuration file for running the testcases. The specified file |
| must be an absolute path. For example, it can be useful to specify different database than HSQL for running the |
| testcases. |
| |
| *oozie.test.hadoop.minicluster*= : indicates if Hadoop minicluster should be started for testcases, default value 'true' |
| |
| *oozie.test.job.tracker*= : indicates the URI of the JobTracker when using a Hadoop cluster for testing, default value |
| 'localhost:8021' |
| |
| *oozie.test.name.node*= : indicates the URI of the NameNode when using a Hadoop cluster for testing, default value |
| 'hdfs://localhost:8020' |
| |
| *oozie.test.hadoop.security*= : indicates the type of Hadoop authentication for testing, valid values are 'simple' or |
| 'kerberos, default value 'simple' |
| |
| *oozie.test.kerberos.keytab.file*= : indicates the location of the keytab file, default value |
| '${user.home}/oozie.keytab' |
| |
| *oozie.test.kerberos.realm*= : indicates the Kerberos real, default value 'LOCALHOST' |
| |
| *oozie.test.kerberos.oozie.principal*= : indicates the Kerberos principal for oozie, default value |
| '${user.name}/localhost' |
| |
| *oozie.test.kerberos.jobtracker.principal*= : indicates the Kerberos principal for the JobTracker, default value |
| 'mapred/localhost' |
| |
| *oozie.test.kerberos.namenode.principal*= : indicates the Kerberos principal for the NameNode, default value |
| 'hdfs/localhost' |
| |
| *oozie.test.user.oozie*= : specifies the user ID used to start Oozie server in testcases, default value |
| is =${user.name}=. |
| |
| *oozie.test.user.test*= : specifies primary user ID used as the user submitting jobs to Oozie Server in testcases, |
| default value is =test=. |
| |
| *oozie.test.user.test2*= : specifies secondary user ID used as the user submitting jobs to Oozie Server in testcases, |
| default value is =test2=. |
| |
| *oozie.test.user.test3*= : specifies secondary user ID used as the user submitting jobs to Oozie Server in testcases, |
| default value is =test3=. |
| |
| *oozie.test.group*= : specifies group ID used as group when submitting jobs to Oozie Server in testcases, |
| default value is =testg=. |
| |
| NOTE: The users/group specified in *oozie.test.user.test2*, *oozie.test.user.test3*= and *oozie.test.user.group*= |
| are used for the authorization testcases only. |
| |
| *oozie.test.dir*= : specifies the directory where the =oozietests= directory will be created, default value is =/tmp=. |
| The =oozietests= directory is used by testcases when they need a local filesystem directory. |
| |
| *hadoop.log.dir*= : specifies the directory where Hadoop minicluster will write its logs during testcases, default |
| value is =/tmp=. |
| |
| *test.exclude*= : specifies a testcase class (just the class name) to exclude for the tests run, for example =TestSubmitCommand=. |
| |
| *test.exclude.pattern*= : specifies one or more patterns for testcases to exclude, for example =**/Test*Command.java=. |
| |
| ---+++ Testing Map Reduce Pipes Action |
| |
| Pipes testcases require Hadoop's *wordcount-simple* pipes binary example to run. The *wordcount-simple* pipes binary |
| should be compiled for the build platform and copied into Oozie's *core/src/test/resources/* directory. The binary file |
| must be named *wordcount-simple*. |
| |
| If the *wordcount-simple* pipes binary file is not available the testcase will do a NOP and it will print to its output |
| file the following message 'SKIPPING TEST: TestPipesMain, binary 'wordcount-simple' not available in the classpath'. |
| |
| There are 2 testcases that use the *wordcount-simple* pipes binary, *TestPipesMain* and *TestMapReduceActionExecutor*, |
| the 'SKIPPING TEST..." message would appear in the testcase log file of both testcases. |
| |
| ---++ Building an Oozie Distribution |
| |
| An Oozie distribution bundles an embedded Tomcat server. The Oozie distro module downloads Tomcat TAR.GZ from Apache |
| once (in the =distro/downloads/= directory) and uses it when creating the distro. |
| |
| The simplest way to build Oozie is to run the =mkdistro.sh= script: |
| <verbatim> |
| $ bin/mkdistro.sh [-DskipTests] |
| Running =mkdistro.sh= will create the binary distribution of Oozie. The following options are available to customise |
| the versions of the dependencies: |
| -Puber - Bundle required hadoop and hcatalog libraries in oozie war |
| -P<profile> - default hadoop-2. Valid are hadoop-1, hadoop-2 or hadoop-3. Choose the correct hadoop |
| profile depending on the hadoop version used. |
| -Dhadoop.version=<version> - default 1.2.1 for hadoop-1, 2.4.0 for hadoop-2 and 3.0.0-SNAPSHOT for hadoop-3 |
| -Dhadoop.auth.version=<version> - defaults to hadoop version |
| -Ddistcp.version=<version> - defaults to hadoop version |
| -Dpig.version=<version> - default 0.16.0 |
| -Dpig.classifier=<classifier> - default none |
| -Dsqoop.version=<version> - default 1.4.3 |
| -Dsqoop.classifier=<classifier> - default hadoop100 |
| -Dtomcat.version=<version> - default 6.0.53 |
| -Dopenjpa.version=<version> - default 2.2.2 |
| -Dxerces.version=<version> - default 2.10.0 |
| -Dcurator.version=<version> - default 2.5.0 |
| -Dhive.version=<version> - default 0.13.1 for hadoop-1, 1.2.0 for hadoop-2 and hadoop-3 profile |
| -Dhbase.version=<version> - default 0.94.2 |
| </verbatim> |
| |
| The following properties should be specified when building a release: |
| |
| * -DgenerateDocs : forces the generation of Oozie documentation |
| * -Dbuild.time= : timestamps the distribution |
| * -Dvc.revision= : specifies the source control revision number of the distribution |
| * -Dvc.url= : specifies the source control URL of the distribution |
| |
| The provided <code>bin/mkdistro.sh</code> script runs the above Maven invocation setting all these properties to the |
| right values (the 'vc.*' properties are obtained from the local git repository). |
| |
| ---++ IDE Setup |
| |
| Eclipse and IntelliJ can use directly Oozie Maven project files. |
| |
| The only special consideration is that the following source directories from the =client= module must be added to |
| the =core= module source path: |
| |
| * =client/src/main/java= : as source directory |
| * =client/src/main/resources= : as source directory |
| * =client/src/test/java= : as test-source directory |
| * =client/src/test/resources= : as test-source directory |
| |
| [[index][::Go back to Oozie Documentation Index::]] |
| |
| </noautolink> |