| Build instructions for Hadoop |
| |
| ---------------------------------------------------------------------------------- |
| Requirements: |
| |
| * Unix System |
| * JDK 1.8 |
| * Maven 3.3 or later |
| * Boost 1.72 (if compiling native code) |
| * Protocol Buffers 3.21.12 (if compiling native code) |
| * CMake 3.19 or newer (if compiling native code) |
| * Zlib devel (if compiling native code) |
| * Cyrus SASL devel (if compiling native code) |
| * A compiler that supports thread_local storage: GCC 9.3.0 or later, Visual Studio, |
|   Clang (community version), or Clang for iOS 9 and later (if compiling native code) |
| * openssl devel (if compiling native hadoop-pipes and to get the best HDFS encryption performance) |
| * Linux FUSE (Filesystem in Userspace) version 2.6 or above (if compiling fuse_dfs) |
| * Doxygen (if compiling libhdfspp and generating the documents) |
| * Internet connection for first build (to fetch all Maven and Hadoop dependencies) |
| * python (for releasedocs) |
| * bats (for shell code testing) |
| * Node.js / bower / Ember-cli (for YARN UI v2 building) |
| |
| ---------------------------------------------------------------------------------- |
| The easiest way to get an environment with all the appropriate tools is by means |
| of the provided Docker config. |
| This requires a recent version of docker (1.4.1 and higher are known to work). |
| |
| On Linux / Mac: |
| Install Docker and run this command: |
| |
| $ ./start-build-env.sh |
| |
| The prompt which is then presented runs inside a container in which the source tree |
| is mounted, and all required tools for testing and building are installed and configured. |
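| |
| Once inside, a typical first build (one possible invocation; see the Maven build |
| goals section below for the full list of options) is: |
| |
|   $ mvn clean install -DskipTests |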
| |
| Note that from within this Docker environment you ONLY have access to the Hadoop source |
| tree from which you started. So if you need to run |
|   dev-support/bin/test-patch /path/to/my.patch |
| then the patch must be placed inside the Hadoop source tree. |
| |
| Known issues: |
| - On Mac with Boot2Docker the performance on the mounted directory is currently extremely slow. |
| This is a known problem related to boot2docker on the Mac. |
| See: |
| https://github.com/boot2docker/boot2docker/issues/593 |
| This issue has been resolved as a duplicate, and they point to a new feature for utilizing NFS mounts |
| as the proposed solution: |
| https://github.com/boot2docker/boot2docker/issues/64 |
| An alternative solution to this problem is to install native Linux inside a virtual |
| machine and run your IDE, Docker, etc. inside that VM. |
| |
| ---------------------------------------------------------------------------------- |
| Installing required packages for clean install of Ubuntu 18.04 LTS Desktop. |
| (For Ubuntu 20.04, gcc/g++ and cmake bundled with Ubuntu can be used. |
| Refer to dev-support/docker/Dockerfile): |
| |
| * OpenJDK 1.8 |
| $ sudo apt-get update |
| $ sudo apt-get -y install openjdk-8-jdk |
| * Maven |
| $ sudo apt-get -y install maven |
| * Native libraries |
| $ sudo apt-get -y install build-essential autoconf automake libtool cmake zlib1g-dev pkg-config libssl-dev libsasl2-dev |
| * GCC 9.3.0 |
| $ sudo apt-get -y install software-properties-common |
| $ sudo add-apt-repository -y ppa:ubuntu-toolchain-r/test |
| $ sudo apt-get update |
| $ sudo apt-get -y install g++-9 gcc-9 |
| $ sudo update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-9 60 --slave /usr/bin/g++ g++ /usr/bin/g++-9 |
| * CMake 3.19 |
| $ curl -L https://cmake.org/files/v3.19/cmake-3.19.0.tar.gz > cmake-3.19.0.tar.gz |
| $ tar -zxvf cmake-3.19.0.tar.gz && cd cmake-3.19.0 |
| $ ./bootstrap |
| $ make -j$(nproc) |
| $ sudo make install |
| * Protocol Buffers 3.21.12 (required to build native code) |
| $ curl -L https://github.com/protocolbuffers/protobuf/archive/refs/tags/v3.21.12.tar.gz > protobuf-3.21.12.tar.gz |
| $ tar -zxvf protobuf-3.21.12.tar.gz && cd protobuf-3.21.12 |
| $ ./autogen.sh |
| $ ./configure |
| $ make -j$(nproc) |
| $ sudo make install |
| * Boost |
| $ curl -L https://sourceforge.net/projects/boost/files/boost/1.72.0/boost_1_72_0.tar.bz2/download > boost_1_72_0.tar.bz2 |
| $ tar --bzip2 -xf boost_1_72_0.tar.bz2 && cd boost_1_72_0 |
| $ ./bootstrap.sh --prefix=/usr/ |
| $ ./b2 --without-python |
| $ sudo ./b2 --without-python install |
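| |
| After the steps above, it is worth confirming that the expected tool versions are |
| on the PATH (version numbers shown are the ones these instructions target; exact |
| patch levels may differ): |
| |
|   $ gcc --version    # expect 9.x |
|   $ cmake --version  # expect 3.19.x |
|   $ protoc --version # expect libprotoc 3.21.12 |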
| |
| Optional packages: |
| |
| * Snappy compression (only used for hadoop-mapreduce-client-nativetask) |
| $ sudo apt-get install libsnappy-dev |
| * Intel ISA-L library for erasure coding |
| Please refer to https://01.org/intel%C2%AE-storage-acceleration-library-open-source-version |
| (OR https://github.com/01org/isa-l) |
| * Bzip2 |
| $ sudo apt-get install bzip2 libbz2-dev |
| * Linux FUSE |
| $ sudo apt-get install fuse libfuse-dev |
| * ZStandard compression |
| $ sudo apt-get install libzstd1-dev |
| * PMDK library for storage class memory (SCM) as HDFS cache backend |
| Please refer to http://pmem.io/ and https://github.com/pmem/pmdk |
| |
| ---------------------------------------------------------------------------------- |
| Maven main modules: |
| |
| hadoop (Main Hadoop project) |
| - hadoop-project (Parent POM for all Hadoop Maven modules.) |
|   (All plugin & dependency versions are defined here.) |
| - hadoop-project-dist (Parent POM for modules that generate distributions.) |
| - hadoop-annotations (Generates the Hadoop doclet used to generate the Javadocs) |
| - hadoop-assemblies (Maven assemblies used by the different modules) |
| - hadoop-maven-plugins (Maven plugins used in project) |
| - hadoop-build-tools (Build tools like checkstyle, etc.) |
| - hadoop-common-project (Hadoop Common) |
| - hadoop-hdfs-project (Hadoop HDFS) |
| - hadoop-yarn-project (Hadoop YARN) |
| - hadoop-mapreduce-project (Hadoop MapReduce) |
| - hadoop-tools (Hadoop tools like Streaming, Distcp, etc.) |
| - hadoop-dist (Hadoop distribution assembler) |
| - hadoop-client-modules (Hadoop client modules) |
| - hadoop-minicluster (Hadoop minicluster artifacts) |
| - hadoop-cloud-storage-project (Generates artifacts to access cloud storage like aws, azure, etc.) |
| |
| ---------------------------------------------------------------------------------- |
| Where to run Maven from? |
| |
| Maven can be run from any module. The only catch is that, if it is not run from |
| trunk, all modules that are not part of that build must already be installed in |
| the local Maven cache or be available from a Maven repository. |
| |
| ---------------------------------------------------------------------------------- |
| Maven build goals: |
| |
| * Clean : mvn clean [-Preleasedocs] |
| * Compile : mvn compile [-Pnative] |
| * Run tests : mvn test [-Pnative] [-Pshelltest] |
| * Create JAR : mvn package |
| * Run spotbugs : mvn compile spotbugs:spotbugs |
| * Run checkstyle : mvn compile checkstyle:checkstyle |
| * Install JAR in M2 cache : mvn install |
| * Deploy JAR to Maven repo : mvn deploy |
| * Run clover : mvn test -Pclover |
| * Run Rat : mvn apache-rat:check |
| * Build javadocs : mvn javadoc:javadoc |
| * Build distribution : mvn package [-Pdist][-Pdocs][-Psrc][-Pnative][-Dtar][-Preleasedocs][-Pyarn-ui] |
| * Change Hadoop version : mvn versions:set -DnewVersion=NEWVERSION |
| |
| Build options: |
| |
| * Use -Pnative to compile/bundle native code |
| * Use -Pdocs to generate & bundle the documentation in the distribution (using -Pdist) |
| * Use -Psrc to create a project source TAR.GZ |
| * Use -Dtar to create a TAR with the distribution (using -Pdist) |
| * Use -Preleasedocs to include the changelog and release docs (requires Internet connectivity) |
| * Use -Pyarn-ui to build YARN UI v2. (Requires Internet connectivity) |
| * Use -DskipShade to disable client jar shading to speed up build times (in |
| development environments only, not to build release artifacts) |
| |
| YARN Application Timeline Service V2 build options: |
| |
| YARN Timeline Service v.2 uses Apache HBase as the primary backing storage. The supported |
| version of Apache HBase is 2.5.8. |
| |
| Snappy build options: |
| |
| Snappy is a compression library that can be utilized by the native code. |
| It is currently an optional component, meaning that Hadoop can be built with |
| or without this dependency. As an optional dependency, the Snappy library is |
| only used by hadoop-mapreduce-client-nativetask. |
| |
| * Use -Drequire.snappy to fail the build if libsnappy.so is not found. |
| If this option is not specified and the snappy library is missing, |
| we silently build a version of libhadoop.so that cannot make use of snappy. |
| This option is recommended if you plan on making use of snappy and want |
| to get more repeatable builds. |
| * Use -Dsnappy.prefix to specify a nonstandard location for the libsnappy |
| header files and library files. You do not need this option if you have |
| installed snappy using a package manager. |
| * Use -Dsnappy.lib to specify a nonstandard location for the libsnappy library |
| files. Similarly to snappy.prefix, you do not need this option if you have |
| installed snappy using a package manager. |
| * Use -Dbundle.snappy to copy the contents of the snappy.lib directory into |
| the final tar file. This option requires that -Dsnappy.lib is also given, |
| and it ignores the -Dsnappy.prefix option. If -Dsnappy.lib isn't given, the |
| bundling and building will fail. |
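| |
| As an illustration, a distribution build that insists on snappy and bundles it from |
| a nonstandard location (the /opt/snappy path below is only an example) could be |
| invoked as follows; the zstd, OpenSSL and ISA-L options described later follow the |
| same pattern: |
| |
|   $ mvn package -Pdist,native -DskipTests -Dtar \ |
|       -Drequire.snappy -Dsnappy.lib=/opt/snappy/lib -Dbundle.snappy |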
| |
| |
| ZStandard build options: |
| |
| ZStandard is a compression library that can be utilized by the native code. |
| It is currently an optional component, meaning that Hadoop can be built with |
| or without this dependency. |
| |
| * Use -Drequire.zstd to fail the build if libzstd.so is not found. |
|   If this option is not specified and the zstd library is missing, |
|   we silently build a version of libhadoop.so that cannot make use of zstd. |
| |
| * Use -Dzstd.prefix to specify a nonstandard location for the libzstd |
| header files and library files. You do not need this option if you have |
| installed zstandard using a package manager. |
| |
| * Use -Dzstd.lib to specify a nonstandard location for the libzstd library |
| files. Similarly to zstd.prefix, you do not need this option if you have |
| installed zstd using a package manager. |
| |
| * Use -Dbundle.zstd to copy the contents of the zstd.lib directory into |
| the final tar file. This option requires that -Dzstd.lib is also given, |
| and it ignores the -Dzstd.prefix option. If -Dzstd.lib isn't given, the |
| bundling and building will fail. |
| |
| OpenSSL build options: |
| |
| OpenSSL includes a crypto library that can be utilized by the native code. |
| It is currently an optional component, meaning that Hadoop can be built with |
| or without this dependency. |
| |
| * Use -Drequire.openssl to fail the build if libcrypto.so is not found. |
| If this option is not specified and the openssl library is missing, |
| we silently build a version of libhadoop.so that cannot make use of |
| openssl. This option is recommended if you plan on making use of openssl |
| and want to get more repeatable builds. |
| * Use -Dopenssl.prefix to specify a nonstandard location for the libcrypto |
| header files and library files. You do not need this option if you have |
| installed openssl using a package manager. |
| * Use -Dopenssl.lib to specify a nonstandard location for the libcrypto library |
| files. Similarly to openssl.prefix, you do not need this option if you have |
| installed openssl using a package manager. |
| * Use -Dbundle.openssl to copy the contents of the openssl.lib directory into |
| the final tar file. This option requires that -Dopenssl.lib is also given, |
| and it ignores the -Dopenssl.prefix option. If -Dopenssl.lib isn't given, the |
| bundling and building will fail. |
| |
| Tests options: |
| |
| * Use -DskipTests to skip tests when running the following Maven goals: |
| 'package', 'install', 'deploy' or 'verify' |
| * Use -Dtest=<TESTCLASSNAME>,<TESTCLASSNAME#METHODNAME>,... to run specific |
|   test classes and/or methods |
| * Use -Dtest.exclude=<TESTCLASSNAME> to exclude a test class |
| * Use -Dtest.exclude.pattern=**/<TESTCLASSNAME1>.java,**/<TESTCLASSNAME2>.java |
|   to exclude tests by filename pattern |
| * To run all native unit tests, use: mvn test -Pnative -Dtest=allNative |
| * To run a specific native unit test, use: mvn test -Pnative -Dtest=<test> |
| For example, to run test_bulk_crc32, you would use: |
| mvn test -Pnative -Dtest=test_bulk_crc32 |
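| |
| Similarly, to run two specific JUnit classes, one restricted to a single method |
| (the class and method names here are purely illustrative): |
| |
|   $ mvn test -Dtest=TestFileUtil,TestText#testCoding |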
| |
| Intel ISA-L build options: |
| |
| Intel ISA-L is an erasure coding library that can be utilized by the native code. |
| It is currently an optional component, meaning that Hadoop can be built with |
| or without this dependency. Note that the library is used as a dynamically |
| loaded module. Please see the official site for the library details: |
| https://01.org/intel%C2%AE-storage-acceleration-library-open-source-version |
| (OR https://github.com/01org/isa-l) |
| |
| * Use -Drequire.isal to fail the build if libisal.so is not found. |
| If this option is not specified and the isal library is missing, |
| we silently build a version of libhadoop.so that cannot make use of ISA-L and |
| the native raw erasure coders. |
| This option is recommended if you plan on making use of native raw erasure |
| coders and want to get more repeatable builds. |
| * Use -Disal.prefix to specify a nonstandard location for the libisal |
| library files. You do not need this option if you have installed ISA-L to the |
| system library path. |
| * Use -Disal.lib to specify a nonstandard location for the libisal library |
| files. |
| * Use -Dbundle.isal to copy the contents of the isal.lib directory into |
| the final tar file. This option requires that -Disal.lib is also given, |
| and it ignores the -Disal.prefix option. If -Disal.lib isn't given, the |
| bundling and building will fail. |
| |
| Special plugins: OWASP's dependency-check: |
| |
| OWASP's dependency-check plugin will scan the third-party dependencies |
| of this project for known CVEs (security vulnerabilities). It produces |
| a report in target/dependency-check-report.html. To invoke, run |
| 'mvn dependency-check:aggregate'. Note that this plugin |
| requires Maven 3.1.1 or greater. |
| |
| PMDK library build options: |
| |
| The Persistent Memory Development Kit (PMDK), formerly known as NVML, is a growing |
| collection of libraries which have been developed for various use cases, tuned, |
| validated to production quality, and thoroughly documented. These libraries are built |
| on the Direct Access (DAX) feature available in both Linux and Windows, which allows |
| applications direct load/store access to persistent memory by memory-mapping files |
| on a persistent-memory-aware file system. |
| |
| It is currently an optional component, meaning that Hadoop can be built without |
| this dependency. Note that the library is used as a dynamically loaded module. |
| For more details, please refer to the official sites: |
| http://pmem.io/ and https://github.com/pmem/pmdk. |
| |
| * Use -Drequire.pmdk to force the build to use the PMDK libraries. With this |
|   option provided, the build will fail if the libpmem library is not found. If |
|   this option is not given, the build will still generate a version of Hadoop |
|   with libhadoop.so, and the storage class memory (SCM) backed HDFS cache is |
|   still supported without PMDK. Because PMDK can bring better caching write/read |
|   performance, building with this option is recommended if you plan to use the |
|   SCM backed HDFS cache. |
| * -Dpmdk.lib is used to specify a nonstandard location for PMDK libraries if they are not |
| under /usr/lib or /usr/lib64. |
| * -Dbundle.pmdk is used to copy the specified libpmem libraries into the distribution tar |
| package. This option requires that -Dpmdk.lib is specified. With -Dbundle.pmdk provided, |
| the build will fail if -Dpmdk.lib is not specified. |
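| |
| For instance, a native build that insists on PMDK and bundles it, assuming the |
| libraries were installed under /usr/local/lib64 (an illustrative path), might |
| look like: |
| |
|   $ mvn package -Pdist,native -DskipTests -Dtar \ |
|       -Drequire.pmdk -Dpmdk.lib=/usr/local/lib64 -Dbundle.pmdk |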
| |
| Controlling the redistribution of the protobuf-2.5 dependency: |
| |
| The protobuf 2.5.0 library is used at compile time to compile the class |
| org.apache.hadoop.ipc.ProtobufHelper; this class is known to have been used by |
| external projects in the past. Protobuf 2.5 is not used directly in |
| the Hadoop codebase; alongside the move to Protobuf 3.x, a private successor |
| class, org.apache.hadoop.ipc.internal.ShadedProtobufHelper, is now used. |
| |
| The hadoop-common module no longer exports its compile-time dependency on |
| protobuf-java-2.5. |
| Any application declaring a dependency on hadoop-common will no longer get |
| the artifact added to its classpath. |
| If it is still required, then it must be explicitly declared: |
| |
| <dependency> |
| <groupId>com.google.protobuf</groupId> |
| <artifactId>protobuf-java</artifactId> |
| <version>2.5.0</version> |
| </dependency> |
| |
| In Hadoop builds the scope of the dependency can be set with the |
| option "common.protobuf2.scope". |
| This can be upgraded from "provided" to "compile" on the maven command line: |
| |
| -Dcommon.protobuf2.scope=compile |
| |
| If this is done then protobuf-java-2.5.0.jar will again be exported as a |
| hadoop-common dependency, and included in the share/hadoop/common/lib/ |
| directory of any Hadoop distribution built. |
| |
| Note that protobuf-java-2.5.0.jar is still placed in |
| share/hadoop/yarn/timelineservice/lib; this is needed by the hbase client |
| library. |
| |
| ---------------------------------------------------------------------------------- |
| Building components separately |
| |
| If you are building a submodule directory, all the Hadoop dependencies of that |
| submodule will be resolved like any other third-party dependency: that is, |
| from the Maven cache or from a Maven repository (if not available in the cache |
| or the SNAPSHOT has 'timed out'). |
| An alternative is to run 'mvn install -DskipTests' from the Hadoop source top |
| level once, and then work from the submodule. Keep in mind that SNAPSHOTs |
| time out after a while; the Maven '-nsu' option will stop Maven from trying |
| to update SNAPSHOTs from external repositories. |
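| |
| For example (the module path below is just an illustration): |
| |
|   $ mvn install -DskipTests        # once, from the top level of the source tree |
|   $ cd hadoop-common-project/hadoop-common |
|   $ mvn compile -nsu               # -nsu: do not re-check remote SNAPSHOTs |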
| |
| ---------------------------------------------------------------------------------- |
| Importing projects to eclipse |
| |
| First, install the artifacts, including hadoop-maven-plugins, from the top of the source tree. |
| |
| $ mvn clean install -DskipTests -DskipShade |
| |
| Then, import to eclipse by specifying the root directory of the project via |
| [File] > [Import] > [Maven] > [Existing Maven Projects]. |
| |
| ---------------------------------------------------------------------------------- |
| Building distributions: |
| |
| Create binary distribution without native code and without Javadocs: |
| |
| $ mvn package -Pdist -DskipTests -Dtar -Dmaven.javadoc.skip=true |
| |
| Create binary distribution with native code: |
| |
| $ mvn package -Pdist,native -DskipTests -Dtar |
| |
| Create source distribution: |
| |
| $ mvn package -Psrc -DskipTests |
| |
| Create source and binary distributions with native code: |
| |
| $ mvn package -Pdist,native,src -DskipTests -Dtar |
| |
| Create a local staging version of the website (in /tmp/hadoop-site) |
| |
| $ mvn site site:stage -Preleasedocs,docs -DstagingDirectory=/tmp/hadoop-site |
| |
| Note that the site needs to be built in a second pass after other artifacts. |
| |
| ---------------------------------------------------------------------------------- |
| Installing Hadoop |
| |
| Look for these HTML files after building the documentation with the commands above. |
| |
| * Single Node Setup: |
| hadoop-project-dist/hadoop-common/SingleCluster.html |
| |
| * Cluster Setup: |
| hadoop-project-dist/hadoop-common/ClusterSetup.html |
| |
| ---------------------------------------------------------------------------------- |
| |
| Handling out of memory errors in builds |
| |
| ---------------------------------------------------------------------------------- |
| |
| If the build process fails with an out of memory error, you should be able to fix |
| it by increasing the memory available to Maven via the environment variable |
| MAVEN_OPTS. |
| |
| Here is an example setting that allocates between 256 MB and 1.5 GB of heap space |
| to Maven: |
| |
| export MAVEN_OPTS="-Xms256m -Xmx1536m" |
| |
| ---------------------------------------------------------------------------------- |
| |
| Building on macOS (without Docker) |
| |
| ---------------------------------------------------------------------------------- |
| Installing required dependencies for clean install of macOS 10.14: |
| |
| * Install Xcode Command Line Tools |
| $ xcode-select --install |
| * Install Homebrew |
| $ /usr/bin/ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)" |
| * Install OpenJDK 8 |
| $ brew tap AdoptOpenJDK/openjdk |
| $ brew cask install adoptopenjdk8 |
| * Install maven and tools |
| $ brew install maven autoconf automake cmake wget |
| * Install native libraries. Only openssl is required to compile native code; |
|   you may optionally install zlib, lz4, etc. |
| $ brew install openssl |
| * Protocol Buffers 3.21.12 (required to compile native code) |
| $ curl -L https://github.com/protocolbuffers/protobuf/archive/refs/tags/v3.21.12.tar.gz > protobuf-3.21.12.tar.gz |
| $ tar -zxvf protobuf-3.21.12.tar.gz && cd protobuf-3.21.12 |
| $ ./autogen.sh |
| $ ./configure |
| $ make |
| $ make check |
| $ make install |
| $ protoc --version |
| |
| Note that building Hadoop 3.1.1/3.1.2/3.2.0 native code from source is broken |
| on macOS. For 3.1.1/3.1.2, you need to manually backport YARN-8622. For 3.2.0, |
| you need to backport both YARN-8622 and YARN-9487 in order to build native code. |
| |
| ---------------------------------------------------------------------------------- |
| Building command example: |
| |
| * Create binary distribution with native code but without documentation: |
| $ mvn package -Pdist,native -DskipTests -Dmaven.javadoc.skip \ |
| -Dopenssl.prefix=/usr/local/opt/openssl |
| |
| Note that the command above manually specifies the openssl library and include |
| path via -Dopenssl.prefix. This is necessary at least for Homebrew-installed OpenSSL. |
| |
| |
| ---------------------------------------------------------------------------------- |
| |
| Building on CentOS 8 |
| |
| ---------------------------------------------------------------------------------- |
| |
| |
| * Install development tools such as GCC, autotools, OpenJDK and Maven. |
| $ sudo dnf group install --with-optional 'Development Tools' |
| $ sudo dnf install java-1.8.0-openjdk-devel maven |
| |
| * Install python2 for building documentation. |
| $ sudo dnf install python2 |
| |
| * Install Protocol Buffers v3.21.12. |
| $ curl -L https://github.com/protocolbuffers/protobuf/archive/refs/tags/v3.21.12.tar.gz > protobuf-3.21.12.tar.gz |
| $ tar -zxvf protobuf-3.21.12.tar.gz && cd protobuf-3.21.12 |
| $ ./autogen.sh |
| $ ./configure --prefix=/usr/local |
| $ make |
| $ sudo make install |
| $ cd .. |
| |
| * Install libraries provided by CentOS 8. |
| $ sudo dnf install libtirpc-devel zlib-devel lz4-devel bzip2-devel openssl-devel cyrus-sasl-devel libpmem-devel |
| |
| * Install GCC 9.3.0 |
| $ sudo dnf -y install gcc-toolset-9-gcc gcc-toolset-9-gcc-c++ |
| $ source /opt/rh/gcc-toolset-9/enable |
| |
| * Install CMake 3.19 |
| $ curl -L https://cmake.org/files/v3.19/cmake-3.19.0.tar.gz > cmake-3.19.0.tar.gz |
| $ tar -zxvf cmake-3.19.0.tar.gz && cd cmake-3.19.0 |
| $ ./bootstrap |
| $ make -j$(nproc) |
| $ sudo make install |
| |
| * Install boost. |
| $ curl -L -o boost_1_72_0.tar.bz2 https://sourceforge.net/projects/boost/files/boost/1.72.0/boost_1_72_0.tar.bz2/download |
| $ tar xjf boost_1_72_0.tar.bz2 |
| $ cd boost_1_72_0 |
| $ ./bootstrap.sh --prefix=/usr/local |
| $ ./b2 |
| $ sudo ./b2 install |
| |
| * Install optional dependencies (snappy-devel). |
| $ sudo dnf --enablerepo=PowerTools install snappy-devel |
| |
| * Install optional dependencies (libzstd-devel). |
| $ sudo dnf install https://dl.fedoraproject.org/pub/epel/epel-release-latest-8.noarch.rpm |
| $ sudo dnf --enablerepo=epel install libzstd-devel |
| |
| * Install optional dependencies (isa-l). |
| $ sudo dnf --enablerepo=PowerTools install nasm |
| $ git clone https://github.com/intel/isa-l |
| $ cd isa-l/ |
| $ ./autogen.sh |
| $ ./configure |
| $ make |
| $ sudo make install |
| |
| ---------------------------------------------------------------------------------- |
| |
| Building on Windows 10 |
| |
| ---------------------------------------------------------------------------------- |
| Requirements: |
| |
| * Windows 10 |
| * JDK 1.8 |
| * Maven 3.0 or later (maven.apache.org) |
| * Boost 1.72 (boost.org) |
| * Protocol Buffers 3.21.12 (https://github.com/protocolbuffers/protobuf/tags) |
| * CMake 3.19 or newer (cmake.org) |
| * Visual Studio 2019 (visualstudio.com) |
| * Windows SDK 8.1 (optional, if building CPU rate control for the container executor. Get this from |
| http://msdn.microsoft.com/en-us/windows/bg162891.aspx) |
| * Zlib (zlib.net, if building native code bindings for zlib) |
| * Git (preferably, get this from https://git-scm.com/download/win since the package also contains |
| Unix command-line tools that are needed during packaging). |
| * Python (python.org, for generation of docs using 'mvn site') |
| * Internet connection for first build (to fetch all Maven and Hadoop dependencies) |
| |
| ---------------------------------------------------------------------------------- |
| |
| Building guidelines: |
| |
| The Hadoop repository provides a Dockerfile for building Hadoop on Windows 10, located at |
| dev-support/docker/Dockerfile_windows_10. It is highly recommended to use it to create the |
| Docker image for building Hadoop on Windows 10, since you then don't have to install |
| anything other than Docker, and no additional steps are required to align the |
| environment (paths, etc.). |
| |
| However, if you prefer not to use Docker, this Dockerfile_windows_10 will still be |
| immensely useful as a raw guide to all the steps involved in creating the environment |
| needed to build Hadoop on Windows 10. |
| |
| Building using Docker: |
| We first need to build the Docker image for building Hadoop on Windows 10. Run this command from |
| the root of the Hadoop repository. |
| > docker build -t hadoop-windows-10-builder -f .\dev-support\docker\Dockerfile_windows_10 .\dev-support\docker\ |
| |
| Start the container with the image that we just built. |
| > docker run --rm -it hadoop-windows-10-builder |
| |
| You can now clone the Hadoop repo inside this container and proceed with the build. |
| |
| NOTE: |
| While it may be tempting to mount the locally cloned (on the host filesystem) Hadoop |
| repository into the container (using the -v option), we have seen the build fail because |
| Maven could not locate some files. Thus, we suggest cloning the Hadoop repository to a |
| non-mounted folder inside the container and proceeding with the build there. When the |
| build is complete, you may use the "docker cp" command to copy the built Hadoop tar.gz |
| file from the Docker container to the host filesystem. If you would still like to mount |
| the Hadoop codebase, a workaround is to copy the mounted codebase into another folder |
| (one that does not point to a mount) in the container's filesystem and use that folder |
| for building. |
| |
| However, we noticed no build issues when the Maven repository from the host filesystem was mounted |
| into the container. One may use this to greatly reduce the build time. Assuming that the Maven |
| repository is located at D:\Maven\Repository in the host filesystem, one can use the following |
| command to mount the same onto the default Maven repository location while launching the container. |
| > docker run --rm -v D:\Maven\Repository:C:\Users\ContainerAdministrator\.m2\repository -it hadoop-windows-10-builder |
| |
| Building: |
| |
| Keep the source code tree in a short path to avoid running into problems related |
| to Windows maximum path length limitation (for example, C:\hdc). |
| |
| There is one support command file located in dev-support called win-paths-eg.cmd. |
| It should be copied somewhere convenient and modified to fit your needs. |
| |
| win-paths-eg.cmd sets up the environment for use; modify it to match your |
| installation. It puts all of the required components on the command path, |
| configures the bit-ness of the build, and sets several optional components. |
| |
| Several tests require that the user have the Create Symbolic Links |
| privilege. |
| |
| To simplify the installation of the Boost, Protocol Buffers, OpenSSL and Zlib dependencies, we can use |
| vcpkg (https://github.com/Microsoft/vcpkg.git). Upon cloning the vcpkg repo, checkout the commit |
| 7ffa425e1db8b0c3edf9c50f2f3a0f25a324541d to get the required versions of the dependencies |
| mentioned above. |
| > git clone https://github.com/Microsoft/vcpkg.git |
| > cd vcpkg |
| > git checkout 7ffa425e1db8b0c3edf9c50f2f3a0f25a324541d |
| > .\bootstrap-vcpkg.bat |
| > .\vcpkg.exe install boost:x64-windows |
| > .\vcpkg.exe install protobuf:x64-windows |
| > .\vcpkg.exe install openssl:x64-windows |
| > .\vcpkg.exe install zlib:x64-windows |
| |
| Set the following environment variables - |
| (Assuming that vcpkg was checked out at C:\vcpkg) |
| > set PROTOBUF_HOME=C:\vcpkg\installed\x64-windows |
| > set MAVEN_OPTS=-Xmx2048M -Xss128M |
| |
| All Maven goals are the same as described above with the exception that |
| native code is built by enabling the 'native-win' Maven profile. -Pnative-win |
| is enabled by default when building on Windows since the native components |
| are required (not optional) on Windows. |
| |
| If native code bindings for zlib are required, then the zlib headers must be |
| deployed on the build machine. Set the ZLIB_HOME environment variable to the |
| directory containing the headers. |
| |
| set ZLIB_HOME=C:\zlib-1.2.7 |
| |
| At runtime, zlib1.dll must be accessible on the PATH. Hadoop has been tested |
| with zlib 1.2.7, built using Visual Studio 2010 out of contrib\vstudio\vc10 in |
| the zlib 1.2.7 source tree. |
| |
| http://www.zlib.net/ |
| |
| |
| Build command: |
| The following command builds all the modules in the Hadoop project and generates the tar.gz file in |
| hadoop-dist/target upon successful build. Run these commands from an |
| "x64 Native Tools Command Prompt for VS 2019" which can be found under "Visual Studio 2019" in the |
| Windows start menu. If you're using the Docker image from Dockerfile_windows_10, you'll be |
| logged into "x64 Native Tools Command Prompt for VS 2019" automatically when you start the |
| container. |
| |
| > set classpath= |
| > set PROTOBUF_HOME=C:\vcpkg\installed\x64-windows |
| > mvn clean package -Dhttps.protocols=TLSv1.2 -DskipTests -DskipDocs -Pnative-win,dist^ |
| -Drequire.openssl -Drequire.test.libhadoop -Pyarn-ui -Dshell-executable=C:\Git\bin\bash.exe^ |
| -Dtar -Dopenssl.prefix=C:\vcpkg\installed\x64-windows^ |
| -Dcmake.prefix.path=C:\vcpkg\installed\x64-windows^ |
| -Dwindows.cmake.toolchain.file=C:\vcpkg\scripts\buildsystems\vcpkg.cmake -Dwindows.cmake.build.type=RelWithDebInfo^ |
| -Dwindows.build.hdfspp.dll=off -Dwindows.no.sasl=on -Duse.platformToolsetVersion=v142 |
| |
| Building the release tarball: |
| Assuming that we're still running in the Docker container hadoop-windows-10-builder, run the |
| following command to create the Apache Hadoop release tarball - |
| |
| > set IS_WINDOWS=1 |
| > set MVN_ARGS="-Dshell-executable=C:\Git\bin\bash.exe -Dhttps.protocols=TLSv1.2 -Pnative-win -Drequire.openssl -Dopenssl.prefix=C:\vcpkg\installed\x64-windows -Dcmake.prefix.path=C:\vcpkg\installed\x64-windows -Dwindows.cmake.toolchain.file=C:\vcpkg\scripts\buildsystems\vcpkg.cmake -Dwindows.cmake.build.type=RelWithDebInfo -Dwindows.build.hdfspp.dll=off -Duse.platformToolsetVersion=v142 -Dwindows.no.sasl=on -DskipTests -DskipDocs -Drequire.test.libhadoop" |
| > C:\Git\bin\bash.exe C:\hadoop\dev-support\bin\create-release --mvnargs=%MVN_ARGS% |
| |
| Note: |
| If the build fails due to an issue with long paths, rename the Hadoop root directory to just a |
| letter (like 'h') and rebuild - |
| |
| > C:\Git\bin\bash.exe C:\h\dev-support\bin\create-release --mvnargs=%MVN_ARGS% |
| |
| ---------------------------------------------------------------------------------- |
| Building distributions: |
| |
| * Build distribution with native code : mvn package [-Pdist][-Pdocs][-Psrc][-Dtar][-Dmaven.javadoc.skip=true] |
| |
| ---------------------------------------------------------------------------------- |
| Running compatibility checks with checkcompatibility.py |
| |
| Invoke `./dev-support/bin/checkcompatibility.py` to run Java API Compliance Checker |
| to compare the public Java APIs of two git objects. This can be used by release |
| managers to compare the compatibility of a previous and current release. |
| |
| As an example, this invocation will check the compatibility of interfaces annotated as Public or LimitedPrivate: |
| |
| ./dev-support/bin/checkcompatibility.py --annotation org.apache.hadoop.classification.InterfaceAudience.Public --annotation org.apache.hadoop.classification.InterfaceAudience.LimitedPrivate --include "hadoop.*" branch-2.7.2 trunk |
| |
| ---------------------------------------------------------------------------------- |
| Changing the Hadoop version returned by VersionInfo |
| |
| If for compatibility reasons the version of Hadoop has to be declared as a 2.x release in the information returned by |
| org.apache.hadoop.util.VersionInfo, set the property declared.hadoop.version to the desired version. |
| For example: mvn package -Pdist -Ddeclared.hadoop.version=2.11 |
| |
| If unset, the project version declared in the POM file is used. |