common/src/docs/src/documentation/content/xdocs/native_libraries.xml - hadoop - Git at Google

 <?xml version="1.0"?>
 <!--
   Licensed to the Apache Software Foundation (ASF) under one or more
   contributor license agreements.  See the NOTICE file distributed with
   this work for additional information regarding copyright ownership.
   The ASF licenses this file to You under the Apache License, Version 2.0
   (the "License"); you may not use this file except in compliance with
   the License.  You may obtain a copy of the License at

       http://www.apache.org/licenses/LICENSE-2.0

   Unless required by applicable law or agreed to in writing, software
   distributed under the License is distributed on an "AS IS" BASIS,
   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
   See the License for the specific language governing permissions and
   limitations under the License.
 -->

 <!DOCTYPE document PUBLIC "-//APACHE//DTD Documentation V2.0//EN" "http://forrest.apache.org/dtd/document-v20.dtd">

 <document>

   <header>
     <title>Native Libraries Guide</title>
   </header>

   <body>

   <section>
   <title>Overview</title>

 <p>This guide describes the native hadoop library and includes a small discussion about native shared libraries.</p>

       <p><strong>Note:</strong> Depending on your environment, the term "native libraries" <em>could</em>
       refer to all *.so's you need to compile; and, the term "native compression" <em>could</em> refer to all *.so's
       you need to compile that are specifically related to compression.
       Currently, however, this document only addresses the native hadoop library (<em>libhadoop.so</em>).</p>

   </section>

     <section>
       <title>Native Hadoop Library </title>

       <p>Hadoop has native implementations of certain components for
       performance reasons and for non-availability of Java implementations. These
       components are available in a single, dynamically-linked native library called
        the native hadoop library. On the *nix platforms the library is named <em>libhadoop.so</em>. </p>

     <section>
       <title>Usage</title>

       <p>It is fairly easy to use the native hadoop library:</p>

       <ol>
               <li>
           Review the <a href="#Components">components</a>.
         </li>
         <li>
           Review the <a href="#Supported+Platforms">supported platforms</a>.
         </li>
         <li>
           Either <a href="#Download">download</a> a hadoop release, which will
           include a pre-built version of the native hadoop library, or
           <a href="#Build">build</a> your own version of the
           native hadoop library. Whether you download or build, the name for the library is
           the same: <em>libhadoop.so</em>
         </li>
         <li>
           Install the compression codec development packages
           (<strong>&gt;zlib-1.2</strong>, <strong>&gt;gzip-1.2</strong>):
           <ul>
               <li>If you download the library, install one or more development packages -
               whichever compression codecs you want to use with your deployment.</li>
               <li>If you build the library, it is <strong>mandatory</strong>
               to install both development packages.</li>
           </ul>
         </li>
          <li>
           Check the <a href="#Runtime">runtime</a> log files.
         </li>
       </ol>
      </section>
     <section>
       <title>Components</title>
      <p>The native hadoop library includes two components, the zlib and gzip
       <a href="http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/io/compress/CompressionCodec.html">
       compression codecs</a>:
       </p>
       <ul>
         <li><a href="ext:zlib">zlib</a></li>
         <li><a href="ext:gzip">gzip</a></li>
       </ul>
       <p>The native hadoop library is imperative for gzip to work.</p>
     </section>

     <section>
       <title>Supported Platforms</title>

       <p>The native hadoop library is supported on *nix platforms only.
       The library does not to work with <a href="ext:cygwin">Cygwin</a>
       or the <a href="ext:osx">Mac OS X</a> platform.</p>

       <p>The native hadoop library is mainly used on the GNU/Linus platform and
       has been tested on these distributions:</p>
       <ul>
         <li>
           <a href="http://www.redhat.com/rhel/">RHEL4</a>/<a href="http://fedora.redhat.com/">Fedora</a>
         </li>
         <li><a href="http://www.ubuntu.com/">Ubuntu</a></li>
         <li><a href="http://www.gentoo.org/">Gentoo</a></li>
       </ul>

       <p>On all the above distributions a 32/64 bit native hadoop library will work
       with a respective 32/64 bit jvm.</p>
     </section>

     <section>
       <title>Download</title>

       <p>The pre-built 32-bit i386-Linux native hadoop library is available as part of the
       hadoop distribution and is located in the <code>lib/native</code> directory. You can download the
       hadoop distribution from <a href="ext:releases/download">Hadoop Common Releases</a>.</p>

       <p>Be sure to install the zlib and/or gzip development packages - whichever compression
       codecs you want to use with your deployment.</p>
      </section>

     <section>
       <title>Build</title>

       <p>The native hadoop library is written in <a href="http://en.wikipedia.org/wiki/ANSI_C">ANSI C</a>
       and is built using the GNU autotools-chain (autoconf, autoheader, automake, autoscan, libtool).
       This means it should be straight-forward to build the library on any platform with a standards-compliant
       C compiler and the GNU autotools-chain (see the <a href="#Supported+Platforms">supported platforms</a>).</p>

       <p>The packages you need to install on the target platform are:</p>
       <ul>
         <li>
           C compiler (e.g. <a href="http://gcc.gnu.org/">GNU C Compiler</a>)
         </li>
         <li>
           GNU Autools Chain:
           <a href="http://www.gnu.org/software/autoconf/">autoconf</a>,
           <a href="http://www.gnu.org/software/automake/">automake</a>,
           <a href="http://www.gnu.org/software/libtool/">libtool</a>
         </li>
         <li>
           zlib-development package (stable version >= 1.2.0)
         </li>
       </ul>

       <p>Once you installed the prerequisite packages use the standard hadoop <code>build.xml</code>
       file and pass along the <code>compile.native</code> flag (set to <code>true</code>) to build the native hadoop library:</p>

       <p><code>$ ant -Dcompile.native=true &lt;target&gt;</code></p>

       <p>You should see the newly-built library in:</p>

       <p><code>$ build/native/&lt;platform&gt;/lib</code></p>

       <p>where &lt;<code>platform</code>&gt; is a combination of the system-properties:
       <code>${os.name}-${os.arch}-${sun.arch.data.model}</code> (for example, Linux-i386-32).</p>

       <p>Please note the following:</p>
         <ul>
           <li>
             It is <strong>mandatory</strong> to install both the zlib and gzip
             development packages on the target platform in order to build the
             native hadoop library; however, for deployment it is sufficient to
             install just one package if you wish to use only one codec.
           </li>
           <li>
             It is necessary to have the correct 32/64 libraries for zlib,
             depending on the 32/64 bit jvm for the target platform, in order to
             build and deploy the native hadoop library.
           </li>
         </ul>
     </section>

      <section>
       <title>Runtime</title>
       <p>The <code>bin/hadoop</code> script ensures that the native hadoop
       library is on the library path via the system property: <br/>
       <em>-Djava.library.path=&lt;path&gt;</em></p>

       <p>During runtime, check the hadoop log files for your MapReduce tasks.</p>

       <ul>
          <li>If everything is all right, then:<br/><br/>
           <code> DEBUG util.NativeCodeLoader - Trying to load the custom-built native-hadoop library...  </code><br/>
           <code> INFO  util.NativeCodeLoader - Loaded the native-hadoop library </code><br/>
          </li>

          <li>If something goes wrong, then:<br/><br/>
          <code>
           INFO util.NativeCodeLoader - Unable to load native-hadoop library for
           your platform... using builtin-java classes where applicable
         </code>

          </li>
       </ul>
     </section>
      </section>

     <section>
       <title>Native Shared Libraries</title>
       <p>You can load <strong>any</strong> native shared library using
       <a href="http://hadoop.apache.org/mapreduce/docs/current/mapred_tutorial.html#DistributedCache">DistributedCache</a>
       for <em>distributing</em> and <em>symlinking</em> the library files.</p>

       <p>This example shows you how to distribute a shared library, <code>mylib.so</code>,
       and load it from a MapReduce task.</p>
       <ol>
       <li> First copy the library to the HDFS: <br/>
       <code>bin/hadoop fs -copyFromLocal mylib.so.1 /libraries/mylib.so.1</code>
       </li>
       <li> The job launching program should contain the following: <br/>
       <code> DistributedCache.createSymlink(conf); </code> <br/>
       <code> DistributedCache.addCacheFile("hdfs://host:port/libraries/mylib.so.1#mylib.so", conf);
       </code>
       </li>
       <li> The MapReduce task can contain: <br/>
       <code> System.loadLibrary("mylib.so"); </code>
       </li>
       </ol>

      <p><br/><strong>Note:</strong> If you downloaded or built the native hadoop library, you don’t need to use DistibutedCache to
      make the library available to your MapReduce tasks.</p>
     </section>
   </body>

 </document>
	<?xml version="1.0"?>
	<!--
	Licensed to the Apache Software Foundation (ASF) under one or more
	contributor license agreements. See the NOTICE file distributed with
	this work for additional information regarding copyright ownership.
	The ASF licenses this file to You under the Apache License, Version 2.0
	(the "License"); you may not use this file except in compliance with
	the License. You may obtain a copy of the License at

	http://www.apache.org/licenses/LICENSE-2.0

	Unless required by applicable law or agreed to in writing, software
	distributed under the License is distributed on an "AS IS" BASIS,
	WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
	See the License for the specific language governing permissions and
	limitations under the License.
	-->

	<!DOCTYPE document PUBLIC "-//APACHE//DTD Documentation V2.0//EN" "http://forrest.apache.org/dtd/document-v20.dtd">

	<document>

	<header>
	<title>Native Libraries Guide</title>
	</header>

	<body>

	<section>
	<title>Overview</title>

	<p>This guide describes the native hadoop library and includes a small discussion about native shared libraries.</p>

	<p><strong>Note:</strong> Depending on your environment, the term "native libraries" <em>could</em>
	refer to all .so's you need to compile; and, the term "native compression" <em>could</em> refer to all .so's
	you need to compile that are specifically related to compression.
	Currently, however, this document only addresses the native hadoop library (<em>libhadoop.so</em>).</p>

	</section>

	<section>
	<title>Native Hadoop Library </title>

	<p>Hadoop has native implementations of certain components for
	performance reasons and for non-availability of Java implementations. These
	components are available in a single, dynamically-linked native library called
	the native hadoop library. On the *nix platforms the library is named <em>libhadoop.so</em>. </p>

	<section>
	<title>Usage</title>

	<p>It is fairly easy to use the native hadoop library:</p>

	<ol>
	<li>
	Review the <a href="#Components">components</a>.
	</li>
	<li>
	Review the <a href="#Supported+Platforms">supported platforms</a>.
	</li>
	<li>
	Either <a href="#Download">download</a> a hadoop release, which will
	include a pre-built version of the native hadoop library, or
	<a href="#Build">build</a> your own version of the
	native hadoop library. Whether you download or build, the name for the library is
	the same: <em>libhadoop.so</em>
	</li>
	<li>
	Install the compression codec development packages
	(<strong>>zlib-1.2</strong>, <strong>>gzip-1.2</strong>):
	<ul>
	<li>If you download the library, install one or more development packages -
	whichever compression codecs you want to use with your deployment.</li>
	<li>If you build the library, it is <strong>mandatory</strong>
	to install both development packages.</li>
	</ul>
	</li>
	<li>
	Check the <a href="#Runtime">runtime</a> log files.
	</li>
	</ol>
	</section>
	<section>
	<title>Components</title>
	<p>The native hadoop library includes two components, the zlib and gzip
	<a href="http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/io/compress/CompressionCodec.html">
	compression codecs</a>:
	</p>
	<ul>
	<li><a href="ext:zlib">zlib</a></li>
	<li><a href="ext:gzip">gzip</a></li>
	</ul>
	<p>The native hadoop library is imperative for gzip to work.</p>
	</section>

	<section>
	<title>Supported Platforms</title>

	<p>The native hadoop library is supported on *nix platforms only.
	The library does not to work with <a href="ext:cygwin">Cygwin</a>
	or the <a href="ext:osx">Mac OS X</a> platform.</p>

	<p>The native hadoop library is mainly used on the GNU/Linus platform and
	has been tested on these distributions:</p>
	<ul>
	<li>
	<a href="http://www.redhat.com/rhel/">RHEL4</a>/<a href="http://fedora.redhat.com/">Fedora</a>
	</li>
	<li><a href="http://www.ubuntu.com/">Ubuntu</a></li>
	<li><a href="http://www.gentoo.org/">Gentoo</a></li>
	</ul>

	<p>On all the above distributions a 32/64 bit native hadoop library will work
	with a respective 32/64 bit jvm.</p>
	</section>

	<section>
	<title>Download</title>

	<p>The pre-built 32-bit i386-Linux native hadoop library is available as part of the
	hadoop distribution and is located in the <code>lib/native</code> directory. You can download the
	hadoop distribution from <a href="ext:releases/download">Hadoop Common Releases</a>.</p>

	<p>Be sure to install the zlib and/or gzip development packages - whichever compression
	codecs you want to use with your deployment.</p>
	</section>

	<section>
	<title>Build</title>

	<p>The native hadoop library is written in <a href="http://en.wikipedia.org/wiki/ANSI_C">ANSI C</a>
	and is built using the GNU autotools-chain (autoconf, autoheader, automake, autoscan, libtool).
	This means it should be straight-forward to build the library on any platform with a standards-compliant
	C compiler and the GNU autotools-chain (see the <a href="#Supported+Platforms">supported platforms</a>).</p>

	<p>The packages you need to install on the target platform are:</p>
	<ul>
	<li>
	C compiler (e.g. <a href="http://gcc.gnu.org/">GNU C Compiler</a>)
	</li>
	<li>
	GNU Autools Chain:
	<a href="http://www.gnu.org/software/autoconf/">autoconf</a>,
	<a href="http://www.gnu.org/software/automake/">automake</a>,
	<a href="http://www.gnu.org/software/libtool/">libtool</a>
	</li>
	<li>
	zlib-development package (stable version >= 1.2.0)
	</li>
	</ul>

	<p>Once you installed the prerequisite packages use the standard hadoop <code>build.xml</code>
	file and pass along the <code>compile.native</code> flag (set to <code>true</code>) to build the native hadoop library:</p>

	<p><code>$ ant -Dcompile.native=true <target></code></p>

	<p>You should see the newly-built library in:</p>

	<p><code>$ build/native/<platform>/lib</code></p>

	<p>where <<code>platform</code>> is a combination of the system-properties:
	<code>${os.name}-${os.arch}-${sun.arch.data.model}</code> (for example, Linux-i386-32).</p>

	<p>Please note the following:</p>
	<ul>
	<li>
	It is <strong>mandatory</strong> to install both the zlib and gzip
	development packages on the target platform in order to build the
	native hadoop library; however, for deployment it is sufficient to
	install just one package if you wish to use only one codec.
	</li>
	<li>
	It is necessary to have the correct 32/64 libraries for zlib,
	depending on the 32/64 bit jvm for the target platform, in order to
	build and deploy the native hadoop library.
	</li>
	</ul>
	</section>

	<section>
	<title>Runtime</title>
	<p>The <code>bin/hadoop</code> script ensures that the native hadoop
	library is on the library path via the system property: <br/>
	<em>-Djava.library.path=<path></em></p>

	<p>During runtime, check the hadoop log files for your MapReduce tasks.</p>

	<ul>
	<li>If everything is all right, then:<br/><br/>
	<code> DEBUG util.NativeCodeLoader - Trying to load the custom-built native-hadoop library... </code><br/>
	<code> INFO util.NativeCodeLoader - Loaded the native-hadoop library </code><br/>
	</li>

	<li>If something goes wrong, then:<br/><br/>
	<code>
	INFO util.NativeCodeLoader - Unable to load native-hadoop library for
	your platform... using builtin-java classes where applicable
	</code>

	</li>
	</ul>
	</section>
	</section>

	<section>
	<title>Native Shared Libraries</title>
	<p>You can load <strong>any</strong> native shared library using
	<a href="http://hadoop.apache.org/mapreduce/docs/current/mapred_tutorial.html#DistributedCache">DistributedCache</a>
	for <em>distributing</em> and <em>symlinking</em> the library files.</p>

	<p>This example shows you how to distribute a shared library, <code>mylib.so</code>,
	and load it from a MapReduce task.</p>
	<ol>
	<li> First copy the library to the HDFS: <br/>
	<code>bin/hadoop fs -copyFromLocal mylib.so.1 /libraries/mylib.so.1</code>
	</li>
	<li> The job launching program should contain the following: <br/>
	<code> DistributedCache.createSymlink(conf); </code> <br/>
	<code> DistributedCache.addCacheFile("hdfs://host:port/libraries/mylib.so.1#mylib.so", conf);
	</code>
	</li>
	<li> The MapReduce task can contain: <br/>
	<code> System.loadLibrary("mylib.so"); </code>
	</li>
	</ol>

	<p><br/><strong>Note:</strong> If you downloaded or built the native hadoop library, you don’t need to use DistibutedCache to
	make the library available to your MapReduce tasks.</p>
	</section>
	</body>

	</document>