To build the HDFS integration, pass the following option to CMake:
-DARROW_HDFS=on
For convenience, we have bundled hdfs.h for libhdfs from Apache Hadoop in Arrow's thirdparty directory. If you wish to build against the hdfs.h in your installed Hadoop distribution, set the $HADOOP_HOME environment variable.
By default, the HDFS client C++ class in libarrow_io uses the libhdfs JNI interface to the Java Hadoop client. This library is loaded at runtime (rather than at link / library load time, since the library may not be in your LD_LIBRARY_PATH), and relies on the following environment variables:
HADOOP_HOME: the root of your installed Hadoop distribution. If you are unsure which directory this is, look for the one whose lib/native directory contains libhdfs.so.
JAVA_HOME: the location of your Java SDK installation.
CLASSPATH: must contain the Hadoop jars. You can set it using:
export CLASSPATH=`$HADOOP_HOME/bin/hadoop classpath --glob`
The installed location of Java on OS X can vary; the following snippet sets JAVA_HOME automatically:
export JAVA_HOME=$(/usr/libexec/java_home)
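
Because libhdfs is loaded at runtime, a missing variable tends to surface only when the HDFS client is first used. As a rough sketch, the checks described above could be wrapped in a small shell function run beforehand. The name check_hdfs_env is hypothetical and is not part of Arrow or Hadoop, and the libhdfs.so file name assumes Linux-style library naming.

```shell
# Hypothetical helper (not shipped with Arrow or Hadoop): report which of
# the environment variables described above are missing or look wrong.
check_hdfs_env() {
    missing=0

    if [ -z "${HADOOP_HOME:-}" ]; then
        echo "HADOOP_HOME is not set"
        missing=1
    elif [ ! -e "$HADOOP_HOME/lib/native/libhdfs.so" ]; then
        # Assumption: Linux-style library name, located in lib/native.
        echo "libhdfs.so not found under $HADOOP_HOME/lib/native"
        missing=1
    fi

    if [ -z "${JAVA_HOME:-}" ]; then
        echo "JAVA_HOME is not set"
        missing=1
    fi

    if [ -z "${CLASSPATH:-}" ]; then
        echo "CLASSPATH is empty; it must contain the Hadoop jars"
        missing=1
    fi

    # Zero when every check passed, non-zero otherwise.
    return "$missing"
}
```

Calling check_hdfs_env prints one line per problem and returns a non-zero status if anything is missing, so it can gate a launch script, for example with check_hdfs_env && ./my_program (where my_program is a placeholder for your own binary).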