Using Arrow's HDFS (Apache Hadoop Distributed File System) interface

Build requirements

To build the integration, pass the following option to CMake:

-DARROW_HDFS=on

For convenience, we have bundled hdfs.h for libhdfs from Apache Hadoop in Arrow's thirdparty directory. If you wish to build against the hdfs.h in your installed Hadoop distribution instead, set the $HADOOP_HOME environment variable.

Runtime requirements

By default, the HDFS client C++ class in libarrow_io uses the libhdfs JNI interface to the Java Hadoop client. This library is loaded at runtime (rather than at link / library load time, since the library may not be in your LD_LIBRARY_PATH), and relies on the following environment variables:

  • HADOOP_HOME: the root of your installed Hadoop distribution. Often has lib/native/libhdfs.so.
  • JAVA_HOME: the location of your Java SDK installation.
  • CLASSPATH: must contain the Hadoop jars. You can set it using:
export CLASSPATH=`$HADOOP_HOME/bin/hadoop classpath --glob`
  • ARROW_LIBHDFS_DIR (optional): explicit location of libhdfs.so if it is installed somewhere other than $HADOOP_HOME/lib/native.
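
The variables above can be checked before the HDFS client is used. The following is a minimal sketch of such a pre-flight check, not the lookup Arrow itself performs: the function names missing_hdfs_vars and find_libhdfs_dirs are hypothetical, the order in which directories are listed is an arbitrary choice of this sketch, and only the libhdfs.so file name mentioned above is checked.

```shell
# Print one line per required variable that is unset or empty.
# Returns non-zero if at least one is missing. The body runs in a
# subshell (parentheses) so the helper variables do not leak out.
missing_hdfs_vars() (
    missing=0
    for name in HADOOP_HOME JAVA_HOME CLASSPATH; do
        # Indirect lookup of the variable whose name is in $name.
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            printf '%s is not set\n' "$name"
            missing=1
        fi
    done
    return "$missing"
)

# Print each candidate directory that actually contains libhdfs.so:
# $ARROW_LIBHDFS_DIR (if set) and $HADOOP_HOME/lib/native (if set).
# Assumption: the listing order is illustrative, not Arrow's search order.
find_libhdfs_dirs() (
    for dir in "${ARROW_LIBHDFS_DIR:-}" "${HADOOP_HOME:+$HADOOP_HOME/lib/native}"; do
        if [ -n "$dir" ] && [ -f "$dir/libhdfs.so" ]; then
            printf '%s\n' "$dir"
        fi
    done
)
```

If missing_hdfs_vars prints anything, or find_libhdfs_dirs prints nothing, the environment is probably incomplete and one of the variables above needs to be set or corrected.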

To accommodate distribution-specific nuances, the JAVA_HOME variable may be set to the root path for the Java SDK, the JRE path itself, or to the directory containing the libjvm library.
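
Since JAVA_HOME may point at any of these three locations, a quick way to confirm that a given value is usable is to look for the libjvm library at or below it. The sketch below does that with find. It is a diagnostic aid only, with a hypothetical function name (find_libjvm), and does not reproduce how Arrow locates libjvm.

```shell
# Print every libjvm library found at or below $JAVA_HOME.
# Returns non-zero if JAVA_HOME is unset, is not a directory, or
# contains no libjvm. "find -H" follows JAVA_HOME itself if it is a
# symbolic link, which is common for distribution-managed Java installs.
find_libjvm() (
    [ -n "${JAVA_HOME:-}" ] && [ -d "$JAVA_HOME" ] || return 1
    found=$(find -H "$JAVA_HOME" -name 'libjvm.*' 2>/dev/null)
    # The status of this last command is the function's status.
    [ -n "$found" ] && printf '%s\n' "$found"
)
```

The same call works whether JAVA_HOME is the SDK root, the JRE path, or the directory that holds libjvm, because the search starts at JAVA_HOME and descends from there.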

Mac Specifics

The installed location of Java on OS X can vary. The following snippet sets JAVA_HOME automatically:

export JAVA_HOME=$(/usr/libexec/java_home)

Homebrew’s Hadoop does not include the native libraries. Apache doesn’t build these, so you must build Hadoop yourself to get them. See this Stack Overflow answer for details:

http://stackoverflow.com/a/40051353/478288

Be sure to include the path to the native libs in JAVA_LIBRARY_PATH:

export JAVA_LIBRARY_PATH=$HADOOP_HOME/lib/native:$JAVA_LIBRARY_PATH
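
When JAVA_LIBRARY_PATH is initially unset, the export above leaves a trailing colon, that is, an empty entry at the end of the list. The following sketch shows one way to avoid that. The helper name prepend_path is hypothetical, and skipping a directory that is already listed is a design choice of this sketch rather than a requirement.

```shell
# Prepend directory $1 to the colon-separated list $2 and print the result.
# No separator is added when the list is empty, and the list is returned
# unchanged if the directory is already one of its entries.
prepend_path() (
    dir=$1
    list=$2
    case ":$list:" in
        *":$dir:"*) printf '%s\n' "$list" ;;
        *)          printf '%s\n' "$dir${list:+:$list}" ;;
    esac
)

# Apply it to JAVA_LIBRARY_PATH only when HADOOP_HOME is known.
if [ -n "${HADOOP_HOME:-}" ]; then
    JAVA_LIBRARY_PATH=$(prepend_path "$HADOOP_HOME/lib/native" "${JAVA_LIBRARY_PATH:-}")
    export JAVA_LIBRARY_PATH
fi
```

Because already-listed directories are skipped, the snippet can be sourced repeatedly (for example from a shell profile) without growing the variable.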

If you get an error about needing to install Java 6, add BundledApp and JNI to the JVMCapabilities in $JAVA_HOME/../Info.plist. See:

https://oliverdowling.com.au/2015/10/09/oracles-jre-8-on-mac-os-x-el-capitan/

https://derflounder.wordpress.com/2015/08/08/modifying-oracles-java-sdk-to-run-java-applications-on-os-x/