~~ Licensed under the Apache License, Version 2.0 (the "License");
~~ you may not use this file except in compliance with the License.
~~ You may obtain a copy of the License at
~~
~~ http://www.apache.org/licenses/LICENSE-2.0
~~
~~ Unless required by applicable law or agreed to in writing, software
~~ distributed under the License is distributed on an "AS IS" BASIS,
~~ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
~~ See the License for the specific language governing permissions and
~~ limitations under the License. See accompanying LICENSE file.
  ---
  C API libhdfs
  ---
  ---
  ${maven.build.timestamp}

C API libhdfs

%{toc|section=1|fromDepth=0}
* Overview
  libhdfs is a JNI-based C API for Hadoop's Distributed File System
  (HDFS). It provides C APIs for a subset of the HDFS APIs to manipulate
  HDFS files and the filesystem. libhdfs is part of the Hadoop
  distribution and comes pre-compiled in
  <<<${HADOOP_PREFIX}/libhdfs/libhdfs.so>>>.
* The APIs
  The libhdfs APIs are a subset of the {{{hadoop fs APIs}}}.

  The header file for libhdfs describes each API in detail and is
  available in <<<${HADOOP_PREFIX}/src/c++/libhdfs/hdfs.h>>>.
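  For orientation, a few representative declarations are shown below.
  These are sketched from memory of <<<hdfs.h>>>; consult the shipped
  header for the authoritative signatures.

----
/* representative libhdfs entry points; see hdfs.h for the full list */
hdfsFS hdfsConnect(const char* host, tPort port);
int hdfsDisconnect(hdfsFS fs);
hdfsFile hdfsOpenFile(hdfsFS fs, const char* path, int flags,
                      int bufferSize, short replication, tSize blocksize);
tSize hdfsRead(hdfsFS fs, hdfsFile file, void* buffer, tSize length);
tSize hdfsWrite(hdfsFS fs, hdfsFile file, const void* buffer, tSize length);
int hdfsCloseFile(hdfsFS fs, hdfsFile file);
----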
* A Sample Program
----
\#include "hdfs.h"
int main(int argc, char **argv) {
hdfsFS fs = hdfsConnect("default", 0);
const char* writePath = "/tmp/testfile.txt";
hdfsFile writeFile = hdfsOpenFile(fs, writePath, O_WRONLY|O_CREAT, 0, 0, 0);
if(!writeFile) {
fprintf(stderr, "Failed to open %s for writing!\n", writePath);
exit(-1);
}
char* buffer = "Hello, World!";
tSize num_written_bytes = hdfsWrite(fs, writeFile, (void*)buffer, strlen(buffer)+1);
if (hdfsFlush(fs, writeFile)) {
fprintf(stderr, "Failed to 'flush' %s\n", writePath);
exit(-1);
}
hdfsCloseFile(fs, writeFile);
}
----
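  For comparison, here is a minimal read-back sketch for the file
  written above. Error handling is abbreviated, and the 32-byte buffer
  is sized for this example only; since the writer stored the trailing
  NUL, the buffer can be printed directly.

----
\#include "hdfs.h"

\#include <fcntl.h>   /* O_RDONLY */
\#include <stdio.h>
\#include <stdlib.h>

int main(int argc, char **argv) {
    hdfsFS fs = hdfsConnect("default", 0);
    const char* readPath = "/tmp/testfile.txt";
    hdfsFile readFile = hdfsOpenFile(fs, readPath, O_RDONLY, 0, 0, 0);
    if (!readFile) {
        fprintf(stderr, "Failed to open %s for reading!\n", readPath);
        exit(-1);
    }
    char buffer[32];
    tSize num_read_bytes = hdfsRead(fs, readFile, (void*)buffer, sizeof(buffer));
    if (num_read_bytes < 0) {
        fprintf(stderr, "Failed to read %s!\n", readPath);
        exit(-1);
    }
    printf("Read %d bytes: %s\n", (int)num_read_bytes, buffer);
    hdfsCloseFile(fs, readFile);
    hdfsDisconnect(fs);
    return 0;
}
----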
* How To Link With The Library
  See the Makefile for <<<hdfs_test.c>>> in the libhdfs source directory
  (<<<${HADOOP_PREFIX}/src/c++/libhdfs/Makefile>>>), or use a command like
  the following:

  <<<gcc above_sample.c -I${HADOOP_PREFIX}/src/c++/libhdfs -L${HADOOP_PREFIX}/libhdfs -lhdfs -o above_sample>>>
* Common Problems
  The most common problem is that the <<<CLASSPATH>>> is not set properly
  when calling a program that uses libhdfs. Make sure you set it to all
  the Hadoop jars needed to run Hadoop itself, as well as the right
  configuration directory containing <<<hdfs-site.xml>>>. Currently there
  is no way to programmatically generate the classpath, but a good bet is
  to include all the jar files in <<<${HADOOP_PREFIX}>>> and
  <<<${HADOOP_PREFIX}/lib>>>.
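  Because libhdfs reads <<<CLASSPATH>>> when it starts the embedded JVM,
  a program can also set the variable itself before its first libhdfs
  call, assuming the JVM has not yet been created. A minimal sketch; the
  conf and jar paths below are placeholders, to be replaced with the real
  entries described above:

----
\#include "hdfs.h"

\#include <stdio.h>
\#include <stdlib.h>

int main(int argc, char **argv) {
    /* placeholder paths: list the conf dir plus every Hadoop jar */
    setenv("CLASSPATH",
           "/path/to/conf:/path/to/hadoop-1.jar:/path/to/hadoop-2.jar",
           1);
    hdfsFS fs = hdfsConnect("default", 0);
    if (!fs) {
        fprintf(stderr, "hdfsConnect failed -- check CLASSPATH\n");
        exit(-1);
    }
    hdfsDisconnect(fs);
    return 0;
}
----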
* Thread Safe
  libhdfs is thread safe.
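  As an illustration, the sketch below shares a single <<<hdfsFS>>>
  handle across several threads, with each thread writing its own file.
  The file names are hypothetical; compile with <<<-lpthread>>>.

----
\#include "hdfs.h"

\#include <fcntl.h>
\#include <pthread.h>
\#include <stdio.h>

static hdfsFS fs;  /* one connection, shared by every thread */

static void* writer(void* arg) {
    char path[64];
    snprintf(path, sizeof(path), "/tmp/thread-%ld.txt", (long)arg);
    hdfsFile f = hdfsOpenFile(fs, path, O_WRONLY|O_CREAT, 0, 0, 0);
    if (f) {
        hdfsWrite(fs, f, (void*)"hello", 6);
        hdfsCloseFile(fs, f);
    }
    return NULL;
}

int main(void) {
    pthread_t threads[4];
    long i;
    fs = hdfsConnect("default", 0);
    for (i = 0; i < 4; i++)
        pthread_create(&threads[i], NULL, writer, (void*)i);
    for (i = 0; i < 4; i++)
        pthread_join(threads[i], NULL);
    hdfsDisconnect(fs);
    return 0;
}
----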
* Concurrency and Hadoop FS "handles"
  The Hadoop FS implementation includes an FS handle cache keyed on the
  URI of the namenode along with the user connecting. So, all calls to
  <<<hdfsConnect>>> will return the same handle, but calls to
  <<<hdfsConnectAsUser>>> with different users will return different
  handles. Since HDFS client handles are completely thread safe, this
  has no bearing on concurrency.
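  For illustration, a short sketch of the caching rule above. The user
  name "other" is a hypothetical value, and the three-argument
  <<<hdfsConnectAsUser>>> signature is the one in recent <<<hdfs.h>>>
  revisions.

----
\#include "hdfs.h"

int main(void) {
    /* same namenode, default user: served from the FS handle cache */
    hdfsFS fs = hdfsConnect("default", 0);

    /* a different user keys a different cache entry, so this is a
       separate FS handle */
    hdfsFS fsAsOther = hdfsConnectAsUser("default", 0, "other");

    /* ... work with either handle, from any thread ... */

    hdfsDisconnect(fsAsOther);
    hdfsDisconnect(fs);
    return 0;
}
----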
* Concurrency and libhdfs/JNI
  The libhdfs calls to JNI should always create thread-local storage,
  so (in theory) libhdfs should be as thread safe as the underlying
  calls to the Hadoop FS.