 ~~ Licensed under the Apache License, Version 2.0 (the "License");
 ~~ you may not use this file except in compliance with the License.
 ~~ You may obtain a copy of the License at
 ~~
 ~~   http://www.apache.org/licenses/LICENSE-2.0
 ~~
 ~~ Unless required by applicable law or agreed to in writing, software
 ~~ distributed under the License is distributed on an "AS IS" BASIS,
 ~~ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 ~~ See the License for the specific language governing permissions and
 ~~ limitations under the License. See accompanying LICENSE file.

   ---
   Native Libraries Guide
   ---
   ---
   ${maven.build.timestamp}

 Native Libraries Guide

 %{toc|section=1|fromDepth=0}

 * Overview

    This guide describes the native hadoop library and includes a small
    discussion about native shared libraries.

    Note: Depending on your environment, the term "native libraries" could
    refer to all *.so's you need to compile; and, the term "native
    compression" could refer to all *.so's you need to compile that are
    specifically related to compression. Currently, however, this document
    only addresses the native hadoop library (<<<libhadoop.so>>>).

 * Native Hadoop Library

    Hadoop has native implementations of certain components, either for
    performance reasons or because no Java implementation is available.
    These components are available in a single, dynamically-linked native
    library called the native hadoop library. On *nix platforms the library
    is named <<<libhadoop.so>>>.

 * Usage

    It is fairly easy to use the native hadoop library:

     [[1]] Review the components.

     [[2]] Review the supported platforms.

     [[3]] Either download a hadoop release, which includes a pre-built
        version of the native hadoop library, or build your own version of
        the native hadoop library. Whether you download or build, the name
        of the library is the same: <<<libhadoop.so>>>.

     [[4]] Install the compression codec development packages (>zlib-1.2,
        >gzip-1.2):
           + If you download the library, install one or more development
             packages - whichever compression codecs you want to use with
             your deployment.
           + If you build the library, it is mandatory to install both
             development packages.

     [[5]] Check the runtime log files.

 * Components

    The native hadoop library includes two components, the zlib and gzip
    compression codecs:

      * zlib

      * gzip

    The native hadoop library is imperative for gzip to work.

 * Supported Platforms

    The native hadoop library is supported on *nix platforms only. The
    library does not work with Cygwin or the Mac OS X platform.

    The native hadoop library is mainly used on the GNU/Linux platform and
    has been tested on these distributions:

      * RHEL4/Fedora

      * Ubuntu

      * Gentoo

    On all the above distributions, a 32-bit native hadoop library works
    with a 32-bit JVM and a 64-bit native hadoop library works with a
    64-bit JVM.
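
    As a rough sketch of how the word sizes could be compared: on Linux,
    shared libraries are ELF files, and the fifth byte of the ELF header
    (EI_CLASS) is 1 for 32-bit objects and 2 for 64-bit objects. The class
    <<<ElfBitness>>> and its method are hypothetical helpers, not part of
    hadoop, and <<<sun.arch.data.model>>> is a JVM-specific property that
    may be absent on some JVMs.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

public class ElfBitness {
    // Returns 32 or 64 for an ELF header, or -1 if the bytes are not a
    // recognizable ELF header.
    static int elfBits(byte[] header) {
        if (header == null || header.length < 5) {
            return -1;
        }
        boolean magic = header[0] == 0x7F && header[1] == 'E'
                && header[2] == 'L' && header[3] == 'F';
        if (!magic) {
            return -1;
        }
        if (header[4] == 1) {
            return 32; // ELFCLASS32
        }
        if (header[4] == 2) {
            return 64; // ELFCLASS64
        }
        return -1;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java ElfBitness <path-to-libhadoop.so>");
            return;
        }
        // Reads the whole file for simplicity; only the first 5 bytes are used.
        byte[] bytes = Files.readAllBytes(Paths.get(args[0]));
        System.out.println("library bits: " + elfBits(bytes));
        // JVM-specific property; prints "null" if the JVM does not define it.
        System.out.println("jvm bits:     "
                + System.getProperty("sun.arch.data.model"));
    }
}
```

    If the two values printed differ, the JVM cannot be expected to load
    that build of the library.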

 * Download

    The pre-built 32-bit i386-Linux native hadoop library is available as
    part of the hadoop distribution and is located in the <<<lib/native>>>
    directory. You can download the hadoop distribution from Hadoop Common
    Releases.

    Be sure to install the zlib and/or gzip development packages -
    whichever compression codecs you want to use with your deployment.

 * Build

    The native hadoop library is written in ANSI C and is built using the
    GNU autotools-chain (autoconf, autoheader, automake, autoscan,
    libtool). This means it should be straightforward to build the library
    on any platform with a standards-compliant C compiler and the GNU
    autotools-chain (see the supported platforms).

    The packages you need to install on the target platform are:

      * C compiler (e.g. GNU C Compiler)

      * GNU Autotools Chain: autoconf, automake, libtool

      * zlib-development package (stable version >= 1.2.0)

    Once you have installed the prerequisite packages, use the standard
    hadoop build.xml file and pass along the compile.native flag (set to
    true) to build the native hadoop library:

 ----
    $ ant -Dcompile.native=true <target>
 ----

    You should see the newly-built library in:

 ----
    build/native/<platform>/lib
 ----

    where <platform> is a combination of the system properties
    ${os.name}-${os.arch}-${sun.arch.data.model} (for example,
    Linux-i386-32).
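
    To see what <platform> would look like on a given machine, the three
    properties can be read from a running JVM. This is only a sketch: the
    class <<<NativePlatformName>>> is a hypothetical helper, it joins the
    values exactly as described above without any further normalization
    the build itself might apply, and <<<sun.arch.data.model>>> may be
    undefined on some JVMs.

```java
public class NativePlatformName {
    // Joins the three values in the order the guide gives:
    // os.name, os.arch, sun.arch.data.model.
    static String platformName(String osName, String osArch, String dataModel) {
        return osName + "-" + osArch + "-" + dataModel;
    }

    public static void main(String[] args) {
        String platform = platformName(
                System.getProperty("os.name"),
                System.getProperty("os.arch"),
                System.getProperty("sun.arch.data.model"));
        // Directory where the guide says the built library should appear.
        System.out.println("build/native/" + platform + "/lib");
    }
}
```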

    Please note the following:

      * It is mandatory to install both the zlib and gzip development
        packages on the target platform in order to build the native hadoop
        library; however, for deployment it is sufficient to install just
        one package if you wish to use only one codec.

      * It is necessary to have the correct 32/64 libraries for zlib,
        depending on the 32/64 bit jvm for the target platform, in order to
        build and deploy the native hadoop library.

 * Runtime

    The bin/hadoop script ensures that the native hadoop library is on the
    library path via the system property:
    <<<-Djava.library.path=<path> >>>
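
    To confirm what the JVM actually received, a small program can list
    the directories on <<<java.library.path>>> that contain the library
    file. This is an illustrative sketch; <<<LibraryPathCheck>>> is a
    hypothetical helper and is not shipped with hadoop.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

public class LibraryPathCheck {
    // Returns every file named fileName found directly inside a directory
    // listed on the given library path string.
    static List<File> findOnPath(String libraryPath, String fileName) {
        List<File> hits = new ArrayList<File>();
        if (libraryPath == null || libraryPath.isEmpty()) {
            return hits;
        }
        for (String dir : libraryPath.split(File.pathSeparator)) {
            if (dir.isEmpty()) {
                continue;
            }
            File candidate = new File(dir, fileName);
            if (candidate.isFile()) {
                hits.add(candidate);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        String path = System.getProperty("java.library.path");
        // Maps "hadoop" to the platform file name (libhadoop.so on Linux).
        String fileName = System.mapLibraryName("hadoop");
        System.out.println("java.library.path = " + path);
        for (File hit : findOnPath(path, fileName)) {
            System.out.println("found " + hit.getAbsolutePath());
        }
    }
}
```

    Run it with the same <<<-Djava.library.path=<path> >>> setting that the
    hadoop script passes; no "found" line means the library is not in any
    listed directory.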

    During runtime, check the hadoop log files for your MapReduce tasks.

      * If the library loads successfully, the log contains:
        <<<DEBUG util.NativeCodeLoader - Trying to load the custom-built native-hadoop library...>>>
        <<<INFO util.NativeCodeLoader - Loaded the native-hadoop library>>>

      * If the library fails to load, the log contains:
        <<<INFO util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable>>>
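
    The two outcomes above can be told apart mechanically by searching a
    log file for the message text. The sketch below assumes the messages
    appear exactly as quoted in this guide; <<<NativeLoadLogCheck>>> and
    its <<<Status>>> enum are hypothetical names, and the log file location
    depends on your deployment.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

public class NativeLoadLogCheck {
    enum Status { LOADED, FALLBACK, UNKNOWN }

    // Classifies one log line using the message text quoted in the guide.
    static Status classify(String logLine) {
        if (logLine.contains("Loaded the native-hadoop library")) {
            return Status.LOADED;
        }
        if (logLine.contains("Unable to load native-hadoop library")) {
            return Status.FALLBACK;
        }
        return Status.UNKNOWN;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java NativeLoadLogCheck <log-file>");
            return;
        }
        try (BufferedReader in = new BufferedReader(new FileReader(args[0]))) {
            String line;
            while ((line = in.readLine()) != null) {
                Status status = classify(line);
                if (status != Status.UNKNOWN) {
                    // Report the first decisive line and stop.
                    System.out.println(status + ": " + line);
                    return;
                }
            }
        }
        System.out.println("No NativeCodeLoader load message found");
    }
}
```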

 * Native Shared Libraries

    You can load any native shared library using DistributedCache for
    distributing and symlinking the library files.

    This example shows you how to distribute a shared library, mylib.so,
    and load it from a MapReduce task.

     [[1]] First copy the library to HDFS:
        <<<bin/hadoop fs -copyFromLocal mylib.so.1 /libraries/mylib.so.1>>>

     [[2]] The job launching program should contain the following:
        <<<DistributedCache.createSymlink(conf);>>>
        <<<DistributedCache.addCacheFile("hdfs://host:port/libraries/mylib.so.1#mylib.so", conf);>>>

     [[3]] The MapReduce task can contain:
        <<<System.loadLibrary("mylib.so");>>>
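
    In the cache file address from step 2, the part after <<<#>>> is a URI
    fragment, which DistributedCache uses as the name of the symlink
    created for the task, while the path before it is the file's location
    in HDFS. The sketch below only illustrates how such an address splits
    into those two parts using <<<java.net.URI>>>; the class
    <<<CacheUriParts>>> is hypothetical and the port 8020 is a placeholder
    standing in for your NameNode port.

```java
import java.net.URI;

public class CacheUriParts {
    // Returns { HDFS path, symlink name } for a cache file address.
    // The symlink name is null if the address has no "#fragment".
    static String[] split(String cacheUri) {
        URI uri = URI.create(cacheUri);
        return new String[] { uri.getPath(), uri.getFragment() };
    }

    public static void main(String[] args) {
        // Placeholder host and port; substitute your own NameNode address.
        String[] parts = split("hdfs://host:8020/libraries/mylib.so.1#mylib.so");
        System.out.println("HDFS path:    " + parts[0]);
        System.out.println("symlink name: " + parts[1]);
    }
}
```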

    Note: If you downloaded or built the native hadoop library, you don’t
    need to use DistributedCache to make the library available to your
    MapReduce tasks.