| Build instructions for Hadoop |
| |
| ---------------------------------------------------------------------------------- |
| Requirements: |
| |
| * Unix System |
| * JDK 1.8+ |
| * Maven 3.0 or later |
| * Findbugs 1.3.9 (if running findbugs) |
| * ProtocolBuffer 2.5.0 |
| * CMake 2.6 or newer (if compiling native code), must be 3.0 or newer on Mac |
| * Zlib devel (if compiling native code) |
| * openssl devel (if compiling native hadoop-pipes and to get the best HDFS encryption performance) |
| * Linux FUSE (Filesystem in Userspace) version 2.6 or above (if compiling fuse_dfs) |
| * Internet connection for first build (to fetch all Maven and Hadoop dependencies) |
| * python (for releasedocs) |
| * bats (for shell code testing) |
| * Node.js / bower / Ember-cli (for YARN UI v2 building) |
| |
| ---------------------------------------------------------------------------------- |
| The easiest way to get an environment with all the appropriate tools is by means |
| of the provided Docker config. |
| This requires a recent version of docker (1.4.1 and higher are known to work). |
| |
| On Linux: |
| Install Docker and run this command: |
| |
| $ ./start-build-env.sh |
| |
| On Mac: |
| First make sure VirtualBox and Docker Toolbox are installed. |
| You can use Docker Toolbox as described in http://docs.docker.com/mac/step_one/. |
| $ docker-machine create --driver virtualbox \ |
| --virtualbox-memory "4096" hadoopdev |
| $ eval $(docker-machine env hadoopdev) |
| $ ./start-build-env.sh |
| |
| The prompt which is then presented runs inside a container with a mounted version of the |
| source tree, where all the tools required for testing and building have been installed |
| and configured. |
| |
| Note that from within this Docker environment you ONLY have access to the Hadoop source |
| tree from which you started. So if you need to run |
| dev-support/bin/test-patch /path/to/my.patch |
| then the patch must be placed inside the Hadoop source tree. |
| |
| Known issues: |
| - On Mac with Boot2Docker the performance on the mounted directory is currently extremely slow. |
| This is a known problem related to boot2docker on the Mac. |
| See: |
| https://github.com/boot2docker/boot2docker/issues/593 |
| This issue has been resolved as a duplicate, and it points to a new feature for utilizing NFS |
| mounts as the proposed solution: |
| https://github.com/boot2docker/boot2docker/issues/64 |
| An alternative solution to this problem is to install Linux natively inside a virtual machine |
| and run your IDE, Docker, etc. inside that VM. |
| |
| ---------------------------------------------------------------------------------- |
| Installing required packages for a clean install of Ubuntu 14.04 LTS Desktop: |
| |
| * Oracle JDK 1.8 (preferred) |
| $ sudo apt-get purge openjdk* |
| $ sudo apt-get install software-properties-common |
| $ sudo add-apt-repository ppa:webupd8team/java |
| $ sudo apt-get update |
| $ sudo apt-get install oracle-java8-installer |
| * Maven |
| $ sudo apt-get -y install maven |
| * Native libraries |
| $ sudo apt-get -y install build-essential autoconf automake libtool cmake zlib1g-dev pkg-config libssl-dev |
| * ProtocolBuffer 2.5.0 (required) |
| $ sudo apt-get -y install protobuf-compiler |
| |
| Optional packages: |
| |
| * Snappy compression |
| $ sudo apt-get install snappy libsnappy-dev |
| * Intel ISA-L library for erasure coding |
| Please refer to https://01.org/intel%C2%AE-storage-acceleration-library-open-source-version |
| (OR https://github.com/01org/isa-l) |
| * Bzip2 |
| $ sudo apt-get install bzip2 libbz2-dev |
| * Jansson (C Library for JSON) |
| $ sudo apt-get install libjansson-dev |
| * Linux FUSE |
| $ sudo apt-get install fuse libfuse-dev |
| |
| ---------------------------------------------------------------------------------- |
| Maven main modules: |
| |
| hadoop (Main Hadoop project) |
| - hadoop-project (Parent POM for all Hadoop Maven modules.) |
| (All plugin and dependency versions are defined here.) |
| - hadoop-project-dist (Parent POM for modules that generate distributions.) |
| - hadoop-annotations (Generates the Hadoop doclet used to generate the Javadocs) |
| - hadoop-assemblies (Maven assemblies used by the different modules) |
| - hadoop-common-project (Hadoop Common) |
| - hadoop-hdfs-project (Hadoop HDFS) |
| - hadoop-mapreduce-project (Hadoop MapReduce) |
| - hadoop-tools (Hadoop tools like Streaming, Distcp, etc.) |
| - hadoop-dist (Hadoop distribution assembler) |
| |
| ---------------------------------------------------------------------------------- |
| Where to run Maven from? |
| |
| Maven can be run from any module. The only catch is that if it is not run from the top |
| level (trunk), all modules that are not part of the build must already be installed in |
| the local Maven cache or be available in a Maven repository. |
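| |
| For example, assuming the Hadoop artifacts this module depends on are already in your |
| local Maven cache (see "Building components separately" below), you could compile a |
| single module such as hadoop-common on its own: |
| |
| $ cd hadoop-common-project/hadoop-common |
| $ mvn compile |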
| |
| ---------------------------------------------------------------------------------- |
| Maven build goals: |
| |
| * Clean : mvn clean [-Preleasedocs] |
| * Compile : mvn compile [-Pnative] |
| * Run tests : mvn test [-Pnative] [-Pshelltest] |
| * Create JAR : mvn package |
| * Run findbugs : mvn compile findbugs:findbugs |
| * Run checkstyle : mvn compile checkstyle:checkstyle |
| * Install JAR in M2 cache : mvn install |
| * Deploy JAR to Maven repo : mvn deploy |
| * Run clover : mvn test -Pclover [-DcloverLicenseLocation=${user.name}/.clover.license] |
| * Run Rat : mvn apache-rat:check |
| * Build javadocs : mvn javadoc:javadoc |
| * Build distribution : mvn package [-Pdist][-Pdocs][-Psrc][-Pnative][-Dtar][-Preleasedocs][-Pyarn-ui] |
| * Change Hadoop version : mvn versions:set -DnewVersion=NEWVERSION |
| |
| Build options: |
| |
| * Use -Pnative to compile/bundle native code |
| * Use -Pdocs to generate & bundle the documentation in the distribution (using -Pdist) |
| * Use -Psrc to create a project source TAR.GZ |
| * Use -Dtar to create a TAR with the distribution (using -Pdist) |
| * Use -Preleasedocs to include the changelog and release docs (requires Internet connectivity) |
| * Use -Pyarn-ui to build YARN UI v2. (Requires Internet connectivity) |
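| |
| For example, a distribution build can combine several of these options in one invocation |
| (a sketch; pick only the profiles you actually need): |
| |
| $ mvn clean package -Pdist,native -Dtar -Preleasedocs -Pyarn-ui -DskipTests |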
| |
| Snappy build options: |
| |
| Snappy is a compression library that can be utilized by the native code. |
| It is currently an optional component, meaning that Hadoop can be built with |
| or without this dependency. |
| |
| * Use -Drequire.snappy to fail the build if libsnappy.so is not found. |
| If this option is not specified and the snappy library is missing, |
| we silently build a version of libhadoop.so that cannot make use of snappy. |
| This option is recommended if you plan on making use of snappy and want |
| to get more repeatable builds. |
| |
| * Use -Dsnappy.prefix to specify a nonstandard location for the libsnappy |
| header files and library files. You do not need this option if you have |
| installed snappy using a package manager. |
| * Use -Dsnappy.lib to specify a nonstandard location for the libsnappy library |
| files. Similarly to snappy.prefix, you do not need this option if you have |
| installed snappy using a package manager. |
| * Use -Dbundle.snappy to copy the contents of the snappy.lib directory into |
| the final tar file. This option requires that -Dsnappy.lib is also given, |
| and it ignores the -Dsnappy.prefix option. If -Dsnappy.lib isn't given, the |
| bundling and building will fail. |
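| |
| For example, to require Snappy and bundle it into the distribution tarball from a |
| nonstandard location (the /opt/snappy path below is only an illustration; point it at |
| wherever libsnappy.so actually lives on your system): |
| |
| $ mvn package -Pdist,native -DskipTests -Dtar -Drequire.snappy -Dsnappy.lib=/opt/snappy/lib -Dbundle.snappy |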
| |
| OpenSSL build options: |
| |
| OpenSSL includes a crypto library that can be utilized by the native code. |
| It is currently an optional component, meaning that Hadoop can be built with |
| or without this dependency. |
| |
| * Use -Drequire.openssl to fail the build if libcrypto.so is not found. |
| If this option is not specified and the openssl library is missing, |
| we silently build a version of libhadoop.so that cannot make use of |
| openssl. This option is recommended if you plan on making use of openssl |
| and want to get more repeatable builds. |
| * Use -Dopenssl.prefix to specify a nonstandard location for the libcrypto |
| header files and library files. You do not need this option if you have |
| installed openssl using a package manager. |
| * Use -Dopenssl.lib to specify a nonstandard location for the libcrypto library |
| files. Similarly to openssl.prefix, you do not need this option if you have |
| installed openssl using a package manager. |
| * Use -Dbundle.openssl to copy the contents of the openssl.lib directory into |
| the final tar file. This option requires that -Dopenssl.lib is also given, |
| and it ignores the -Dopenssl.prefix option. If -Dopenssl.lib isn't given, the |
| bundling and building will fail. |
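| |
| For example, to fail the build unless OpenSSL is found, using a nonstandard install |
| prefix (the /opt/openssl path below is only an illustration): |
| |
| $ mvn package -Pdist,native -DskipTests -Dtar -Drequire.openssl -Dopenssl.prefix=/opt/openssl |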
| |
| Tests options: |
| |
| * Use -DskipTests to skip tests when running the following Maven goals: |
| 'package', 'install', 'deploy' or 'verify' |
| * Use -Dtest=<TESTCLASSNAME>,<TESTCLASSNAME#METHODNAME>,.... to run specific test classes or methods |
| * Use -Dtest.exclude=<TESTCLASSNAME> to exclude a single test class |
| * Use -Dtest.exclude.pattern=**/<TESTCLASSNAME1>.java,**/<TESTCLASSNAME2>.java to exclude tests matching the given patterns |
| * To run all native unit tests, use: mvn test -Pnative -Dtest=allNative |
| * To run a specific native unit test, use: mvn test -Pnative -Dtest=<test> |
| For example, to run test_bulk_crc32, you would use: |
| mvn test -Pnative -Dtest=test_bulk_crc32 |
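| |
| For example, to run only selected Java test classes or methods, or to exclude some |
| (the class and method names below are placeholders, not a recommendation): |
| |
| $ mvn test -Dtest=TestFoo,TestBar#testSomething |
| $ mvn test -Dtest.exclude.pattern=**/TestFoo.java,**/TestBar.java |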
| |
| Intel ISA-L build options: |
| |
| Intel ISA-L is an erasure coding library that can be utilized by the native code. |
| It is currently an optional component, meaning that Hadoop can be built with |
| or without this dependency. Note that the library is used via a dynamic module. Please |
| refer to the official site for details about the library. |
| https://01.org/intel%C2%AE-storage-acceleration-library-open-source-version |
| (OR https://github.com/01org/isa-l) |
| |
| * Use -Drequire.isal to fail the build if libisal.so is not found. |
| If this option is not specified and the isal library is missing, |
| we silently build a version of libhadoop.so that cannot make use of ISA-L and |
| the native raw erasure coders. |
| This option is recommended if you plan on making use of native raw erasure |
| coders and want to get more repeatable builds. |
| * Use -Disal.prefix to specify a nonstandard location for the libisal |
| library files. You do not need this option if you have installed ISA-L to the |
| system library path. |
| * Use -Disal.lib to specify a nonstandard location for the libisal library |
| files. |
| * Use -Dbundle.isal to copy the contents of the isal.lib directory into |
| the final tar file. This option requires that -Disal.lib is also given, |
| and it ignores the -Disal.prefix option. If -Disal.lib isn't given, the |
| bundling and building will fail. |
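| |
| For example, to require ISA-L and bundle it from a nonstandard location (the /opt/isa-l |
| path below is only an illustration): |
| |
| $ mvn package -Pdist,native -DskipTests -Dtar -Drequire.isal -Disal.lib=/opt/isa-l/lib -Dbundle.isal |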
| |
| Special plugins: OWASP's dependency-check: |
| |
| OWASP's dependency-check plugin will scan the third party dependencies |
| of this project for known CVEs (publicly known security vulnerabilities). |
| It will produce a report in target/dependency-check-report.html. To |
| invoke, run 'mvn dependency-check:aggregate'. Note that this plugin |
| requires Maven 3.1.1 or greater. |
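| |
| For example, from the top level of the source tree (the aggregate goal collects results |
| across modules): |
| |
| $ mvn dependency-check:aggregate |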
| |
| ---------------------------------------------------------------------------------- |
| Building components separately |
| |
| If you are building a submodule directory, all the Hadoop dependencies of that |
| submodule will be resolved like any other third-party dependency, that is, |
| from the Maven cache or from a Maven repository (if not available in the cache |
| or if the SNAPSHOT has 'timed out'). |
| An alternative is to run 'mvn install -DskipTests' once from the top level of the |
| Hadoop source tree and then work from the submodule. Keep in mind that SNAPSHOTs |
| time out after a while; using the Maven '-nsu' (no snapshot updates) option will stop |
| Maven from trying to update SNAPSHOTs from external repositories. |
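| |
| For example, using hadoop-hdfs as the submodule you want to work on (a sketch of the |
| workflow described above): |
| |
| $ mvn install -DskipTests   (once, from the top level of the source tree) |
| $ cd hadoop-hdfs-project/hadoop-hdfs |
| $ mvn test -nsu             (work in the submodule without snapshot updates) |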
| |
| ---------------------------------------------------------------------------------- |
| Protocol Buffer compiler |
| |
| The version of Protocol Buffer compiler, protoc, must match the version of the |
| protobuf JAR. |
| |
| If you have multiple versions of protoc in your system, you can set the |
| HADOOP_PROTOC_PATH environment variable in your build shell to point to the one you |
| want to use for the Hadoop build. If you don't define this environment variable, |
| protoc is looked up on the PATH. |
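| |
| For example (the protoc install path below is only an illustration): |
| |
| $ export HADOOP_PROTOC_PATH=/opt/protobuf-2.5.0/bin/protoc |
| $ mvn clean package -DskipTests |
| |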
| ---------------------------------------------------------------------------------- |
| Importing projects to eclipse |
| |
| Before importing the project into Eclipse, install hadoop-maven-plugins: |
| |
| $ cd hadoop-maven-plugins |
| $ mvn install |
| |
| Then, generate the Eclipse project files: |
| |
| $ mvn eclipse:eclipse -DskipTests |
| |
| Finally, import the project into Eclipse by specifying the root directory of the project via |
| [File] > [Import] > [Existing Projects into Workspace]. |
| |
| ---------------------------------------------------------------------------------- |
| Building distributions: |
| |
| Create binary distribution without native code and without documentation: |
| |
| $ mvn package -Pdist -DskipTests -Dtar -Dmaven.javadoc.skip=true |
| |
| Create binary distribution with native code and with documentation: |
| |
| $ mvn package -Pdist,native,docs -DskipTests -Dtar |
| |
| Create source distribution: |
| |
| $ mvn package -Psrc -DskipTests |
| |
| Create source and binary distributions with native code and documentation: |
| |
| $ mvn package -Pdist,native,docs,src -DskipTests -Dtar |
| |
| Create a local staging version of the website (in /tmp/hadoop-site) |
| |
| $ mvn clean site -Preleasedocs; mvn site:stage -DstagingDirectory=/tmp/hadoop-site |
| |
| ---------------------------------------------------------------------------------- |
| Installing Hadoop |
| |
| Look for these HTML files after you have built the documentation with the commands above. |
| |
| * Single Node Setup: |
| hadoop-project-dist/hadoop-common/SingleCluster.html |
| |
| * Cluster Setup: |
| hadoop-project-dist/hadoop-common/ClusterSetup.html |
| |
| ---------------------------------------------------------------------------------- |
| |
| Handling out of memory errors in builds |
| |
| ---------------------------------------------------------------------------------- |
| |
| If the build process fails with an out of memory error, you should be able to fix |
| it by increasing the memory available to Maven, which can be done via the |
| MAVEN_OPTS environment variable. |
| |
| Here is an example setting that allocates between 256 MB and 512 MB of heap space to |
| Maven: |
| |
| export MAVEN_OPTS="-Xms256m -Xmx512m" |
| |
| ---------------------------------------------------------------------------------- |
| |
| Building on Windows |
| |
| ---------------------------------------------------------------------------------- |
| Requirements: |
| |
| * Windows System |
| * JDK 1.8+ |
| * Maven 3.0 or later |
| * Findbugs 1.3.9 (if running findbugs) |
| * ProtocolBuffer 2.5.0 |
| * CMake 2.6 or newer |
| * Windows SDK 7.1 or Visual Studio 2010 Professional |
| * Windows SDK 8.1 (if building CPU rate control for the container executor) |
| * zlib headers (if building native code bindings for zlib) |
| * Internet connection for first build (to fetch all Maven and Hadoop dependencies) |
| * Unix command-line tools from GnuWin32: sh, mkdir, rm, cp, tar, gzip. These |
| tools must be present on your PATH. |
| * Python (for generation of docs using 'mvn site') |
| |
| Unix command-line tools are also included with the Windows Git package, which |
| can be downloaded from http://git-scm.com/downloads |
| |
| If using Visual Studio, it must be Visual Studio 2010 Professional (not 2012). |
| Do not use Visual Studio Express. It does not support compiling for 64-bit, |
| which is problematic if running a 64-bit system. The Windows SDK 7.1 is free to |
| download here: |
| |
| http://www.microsoft.com/en-us/download/details.aspx?id=8279 |
| |
| The Windows SDK 8.1 is available to download at: |
| |
| http://msdn.microsoft.com/en-us/windows/bg162891.aspx |
| |
| Cygwin is neither required nor supported. |
| |
| ---------------------------------------------------------------------------------- |
| Building: |
| |
| Keep the source code tree in a short path to avoid running into problems related |
| to the Windows maximum path length limitation (for example, C:\hdc). |
| |
| Run builds from a Windows SDK Command Prompt. (Start, All Programs, |
| Microsoft Windows SDK v7.1, Windows SDK 7.1 Command Prompt). |
| |
| JAVA_HOME must be set, and the path must not contain spaces. If the full path |
| would contain spaces, then use the Windows short path instead. |
| |
| You must set the Platform environment variable to either x64 or Win32 depending |
| on whether you're running a 64-bit or 32-bit system. Note that this is |
| case-sensitive. It must be "Platform", not "PLATFORM" or "platform". |
| Environment variables on Windows are usually case-insensitive, but Maven treats |
| them as case-sensitive. Failure to set this environment variable correctly will |
| cause msbuild to fail while building the native code in hadoop-common. |
| |
| set Platform=x64 (when building on a 64-bit system) |
| set Platform=Win32 (when building on a 32-bit system) |
| |
| Several tests require that the user have the Create Symbolic Links |
| privilege. |
| |
| All Maven goals are the same as described above with the exception that |
| native code is built by enabling the 'native-win' Maven profile. -Pnative-win |
| is enabled by default when building on Windows since the native components |
| are required (not optional) on Windows. |
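| |
| For example, a distribution build on a 64-bit system might look like this (a sketch; |
| the native-win profile is implied on Windows as described above): |
| |
| set Platform=x64 |
| mvn package -Pdist -DskipTests -Dtar |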
| |
| If native code bindings for zlib are required, then the zlib headers must be |
| deployed on the build machine. Set the ZLIB_HOME environment variable to the |
| directory containing the headers. |
| |
| set ZLIB_HOME=C:\zlib-1.2.7 |
| |
| At runtime, zlib1.dll must be accessible on the PATH. Hadoop has been tested |
| with zlib 1.2.7, built using Visual Studio 2010 out of contrib\vstudio\vc10 in |
| the zlib 1.2.7 source tree. |
| |
| http://www.zlib.net/ |
| |
| ---------------------------------------------------------------------------------- |
| Building distributions: |
| |
| * Build distribution with native code : mvn package [-Pdist][-Pdocs][-Psrc][-Dtar] |
| |
| ---------------------------------------------------------------------------------- |
| Running compatibility checks with checkcompatibility.py |
| |
| Invoke `./dev-support/bin/checkcompatibility.py` to run Java API Compliance Checker |
| to compare the public Java APIs of two git objects. This can be used by release |
| managers to compare the compatibility of a previous and current release. |
| |
| As an example, this invocation will check the compatibility of interfaces annotated as Public or LimitedPrivate: |
| |
| ./dev-support/bin/checkcompatibility.py --annotation org.apache.hadoop.classification.InterfaceAudience.Public --annotation org.apache.hadoop.classification.InterfaceAudience.LimitedPrivate --include "hadoop.*" branch-2.7.2 trunk |