To build the integration, pass the following option to CMake:
-DARROW_HDFS=on
For convenience, we have bundled hdfs.h for libhdfs from Apache Hadoop in Arrow's thirdparty. If you wish to build against the hdfs.h in your installed Hadoop distribution, set the $HADOOP_HOME environment variable.
By default, the HDFS client C++ class in libarrow_io uses the libhdfs JNI interface to the Java Hadoop client. This library is loaded at runtime (rather than at link / library load time, since the library may not be in your LD_LIBRARY_PATH), and relies on some environment variables:
* HADOOP_HOME: the root of your installed Hadoop distribution. Often contains lib/native/libhdfs.so.
* JAVA_HOME: the location of your Java SDK installation.
* CLASSPATH: must contain the Hadoop jars. You can set these using:

  export CLASSPATH=`$HADOOP_HOME/bin/hadoop classpath --glob`

* ARROW_LIBHDFS_DIR (optional): explicit location of libhdfs.so if it is installed somewhere other than $HADOOP_HOME/lib/native.

To accommodate distribution-specific nuances, the JAVA_HOME variable may be set to the root path for the Java SDK, the JRE path itself, or to the directory containing the libjvm library.
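Putting the variables above together, a minimal setup script might look like the following sketch. The /usr/local/hadoop and /usr/lib/jvm/default-java fallback paths are placeholders for your own installation, not values the library requires:

```shell
#!/bin/sh
# Sketch of the runtime environment for the libhdfs-based client.
# The fallback paths below are placeholders; adjust for your system.
export HADOOP_HOME=${HADOOP_HOME:-/usr/local/hadoop}
export JAVA_HOME=${JAVA_HOME:-/usr/lib/jvm/default-java}

# CLASSPATH must contain the Hadoop jars; `hadoop classpath --glob`
# prints them when the hadoop launcher is available.
if [ -x "$HADOOP_HOME/bin/hadoop" ]; then
    export CLASSPATH=$("$HADOOP_HOME/bin/hadoop" classpath --glob)
fi

# Optional: point at libhdfs.so when it lives outside $HADOOP_HOME/lib/native.
# export ARROW_LIBHDFS_DIR=/opt/hadoop/lib/native
```

Sourcing a script like this before launching your program ensures the JNI machinery can find both the JVM and the Hadoop jars at runtime.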
The installed location of Java on OS X can vary; however, the following snippet will set it automatically for you:
export JAVA_HOME=$(/usr/libexec/java_home)
Homebrew's Hadoop does not have native libs. Apache doesn't build these, so users must build Hadoop to get the native libs. See this Stack Overflow answer for details: http://stackoverflow.com/a/40051353/478288
Be sure to include the path to the native libs in JAVA_LIBRARY_PATH:
export JAVA_LIBRARY_PATH=$HADOOP_HOME/lib/native:$JAVA_LIBRARY_PATH
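A quick way to confirm that the native library is actually present before debugging further is a small check like the one below; the /usr/local/hadoop default is a placeholder:

```shell
#!/bin/sh
# Hypothetical sanity check: is libhdfs.so where the loader will look?
HADOOP_HOME=${HADOOP_HOME:-/usr/local/hadoop}
if [ -e "$HADOOP_HOME/lib/native/libhdfs.so" ]; then
    status="found"
else
    # Either build Hadoop with native libs or set ARROW_LIBHDFS_DIR
    # to the directory that contains libhdfs.so.
    status="missing"
fi
echo "libhdfs.so: $status"
```

If the check reports "missing" on a Homebrew install, that is the native-libs issue described above.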
If you get an error about needing to install Java 6, then add BundledApp and JNI to the JVMCapabilities in $JAVA_HOME/../Info.plist. See https://oliverdowling.com.au/2015/10/09/oracles-jre-8-on-mac-os-x-el-capitan/ for details.