blob: 23ff678ba5178a00c7eec083bc34953d4e9bd90d [file] [log] [blame]
~~ Licensed under the Apache License, Version 2.0 (the "License");
~~ you may not use this file except in compliance with the License.
~~ You may obtain a copy of the License at
~~
~~ http://www.apache.org/licenses/LICENSE-2.0
~~
~~ Unless required by applicable law or agreed to in writing, software
~~ distributed under the License is distributed on an "AS IS" BASIS,
~~ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
~~ See the License for the specific language governing permissions and
~~ limitations under the License. See accompanying LICENSE file.
---
C API libhdfs
---
---
${maven.build.timestamp}
C API libhdfs
%{toc|section=1|fromDepth=0}
* Overview
libhdfs is a JNI based C API for Hadoop's Distributed File System
(HDFS). It provides C APIs to a subset of the HDFS APIs to manipulate
HDFS files and the filesystem. libhdfs is part of the Hadoop
distribution and comes pre-compiled in
<<<${HADOOP_HDFS_HOME}/lib/native/libhdfs.so>>> . libhdfs is compatible with
Windows and can be built on Windows by running <<<mvn compile>>> within the
<<<hadoop-hdfs-project/hadoop-hdfs>>> directory of the source tree.
* The APIs
The libhdfs APIs are a subset of the
{{{../../api/org/apache/hadoop/fs/FileSystem.html}Hadoop FileSystem APIs}}.
The header file for libhdfs describes each API in detail and is
available in <<<${HADOOP_HDFS_HOME}/include/hdfs.h>>>.
* A Sample Program
----
\#include "hdfs.h"
int main(int argc, char **argv) {
hdfsFS fs = hdfsConnect("default", 0);
const char* writePath = "/tmp/testfile.txt";
hdfsFile writeFile = hdfsOpenFile(fs, writePath, O_WRONLY|O_CREAT, 0, 0, 0);
if(!writeFile) {
fprintf(stderr, "Failed to open %s for writing!\n", writePath);
exit(-1);
}
char* buffer = "Hello, World!";
tSize num_written_bytes = hdfsWrite(fs, writeFile, (void*)buffer, strlen(buffer)+1);
if (hdfsFlush(fs, writeFile)) {
fprintf(stderr, "Failed to 'flush' %s\n", writePath);
exit(-1);
}
hdfsCloseFile(fs, writeFile);
}
----
* How To Link With The Library
See the CMake file for <<<test_libhdfs_ops.c>>> in the libhdfs source
directory (<<<hadoop-hdfs-project/hadoop-hdfs/src/CMakeLists.txt>>>) or
something like:
<<<gcc above_sample.c -I${HADOOP_HDFS_HOME}/include -L${HADOOP_HDFS_HOME}/lib/native -lhdfs -o above_sample>>>
* Common Problems
The most common problem is the <<<CLASSPATH>>> is not set properly when
calling a program that uses libhdfs. Make sure you set it to all the
Hadoop jars needed to run Hadoop itself as well as the right configuration
directory containing <<<hdfs-site.xml>>>. It is not valid to use wildcard
syntax for specifying multiple jars. It may be useful to run
<<<hadoop classpath --glob>>> or <<<hadoop classpath --jar <path>>>> to
generate the correct classpath for your deployment. See
{{{../hadoop-common/CommandsManual.html#classpath}Hadoop Commands Reference}}
for more information on this command.
* Thread Safe
libdhfs is thread safe.
* Concurrency and Hadoop FS "handles"
The Hadoop FS implementation includes a FS handle cache which
caches based on the URI of the namenode along with the user
connecting. So, all calls to <<<hdfsConnect>>> will return the same
handle but calls to <<<hdfsConnectAsUser>>> with different users will
return different handles. But, since HDFS client handles are
completely thread safe, this has no bearing on concurrency.
* Concurrency and libhdfs/JNI
The libhdfs calls to JNI should always be creating thread local
storage, so (in theory), libhdfs should be as thread safe as the
underlying calls to the Hadoop FS.