| ~~ Licensed under the Apache License, Version 2.0 (the "License"); |
| ~~ you may not use this file except in compliance with the License. |
| ~~ You may obtain a copy of the License at |
| ~~ |
| ~~ http://www.apache.org/licenses/LICENSE-2.0 |
| ~~ |
| ~~ Unless required by applicable law or agreed to in writing, software |
| ~~ distributed under the License is distributed on an "AS IS" BASIS, |
| ~~ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. |
| ~~ See the License for the specific language governing permissions and |
| ~~ limitations under the License. See accompanying LICENSE file. |
| |
| --- |
| C API libhdfs |
| --- |
| --- |
| ${maven.build.timestamp} |
| |
| C API libhdfs |
| |
| %{toc|section=1|fromDepth=0} |
| |
| * Overview |
| |
| libhdfs is a JNI-based C API for Hadoop's Distributed File System |
| (HDFS). It provides C bindings for a subset of the HDFS APIs, which |
| manipulate HDFS files and the filesystem. libhdfs is part of the Hadoop |
| distribution and comes pre-compiled in |
| <<<${HADOOP_PREFIX}/libhdfs/libhdfs.so>>>. |
| |
| * The APIs |
| |
| The libhdfs APIs are a subset of the {{{hadoop fs APIs}}}. |
| |
| The header file for libhdfs describes each API in detail and is |
| available in <<<${HADOOP_PREFIX}/src/c++/libhdfs/hdfs.h>>>. |
| |
| * A Sample Program |
| |
| ---- |
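| \#include <fcntl.h>   /* O_WRONLY, O_CREAT */ |
| \#include <stdio.h>   /* fprintf */ |
| \#include <stdlib.h>  /* exit */ |
| \#include <string.h>  /* strlen */ |
| \#include "hdfs.h" |
| |
| int main(int argc, char **argv) { |
| |
|   /* Connect to the default filesystem named in the Hadoop configuration. */ |
|   hdfsFS fs = hdfsConnect("default", 0); |
|   if (!fs) { |
|     fprintf(stderr, "Failed to connect to HDFS!\n"); |
|     exit(-1); |
|   } |
|   const char* writePath = "/tmp/testfile.txt"; |
|   hdfsFile writeFile = hdfsOpenFile(fs, writePath, O_WRONLY|O_CREAT, 0, 0, 0); |
|   if(!writeFile) { |
|     fprintf(stderr, "Failed to open %s for writing!\n", writePath); |
|     exit(-1); |
|   } |
|   /* Write the string, including its terminating NUL byte. */ |
|   const char* buffer = "Hello, World!"; |
|   tSize num_written_bytes = hdfsWrite(fs, writeFile, (void*)buffer, strlen(buffer)+1); |
|   if (num_written_bytes < 0) { |
|     fprintf(stderr, "Failed to write to %s\n", writePath); |
|     exit(-1); |
|   } |
|   if (hdfsFlush(fs, writeFile)) { |
|     fprintf(stderr, "Failed to 'flush' %s\n", writePath); |
|     exit(-1); |
|   } |
|   hdfsCloseFile(fs, writeFile); |
|   hdfsDisconnect(fs); |
|   return 0; |
| } |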
| ---- |
| |
| * How To Link With The Library |
| |
| See the Makefile for <<<hdfs_test.c>>> in the libhdfs source directory |
| (<<<${HADOOP_PREFIX}/src/c++/libhdfs/Makefile>>>), or use a command along these lines: |
| <<<gcc above_sample.c -I${HADOOP_PREFIX}/src/c++/libhdfs -L${HADOOP_PREFIX}/libhdfs -lhdfs -o above_sample>>> |
| |
| * Common Problems |
| |
| The most common problem is that <<<CLASSPATH>>> is not set properly when |
| running a program that uses libhdfs. Make sure it includes all the |
| Hadoop jars needed to run Hadoop itself. Currently, there is no way to |
| generate the classpath programmatically, but a good bet is to include |
| all the jar files in <<<${HADOOP_PREFIX}>>> and <<<${HADOOP_PREFIX}/lib>>>, as well |
| as the configuration directory that contains <<<hdfs-site.xml>>>. |
| |
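Because a bad <<<CLASSPATH>>> typically shows up only once the JVM fails to start, a program can run a cheap sanity check before its first libhdfs call. The sketch below is not part of libhdfs: <<<count_jar_entries>>> and <<<classpath_has_jars>>> are hypothetical helper names, and the code assumes a Unix-style classpath whose entries are separated by <<<:>>>. It only counts entries ending in <<<.jar>>>, so it cannot tell whether the right Hadoop jars are listed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Returns 1 if the first len bytes of s end with suffix, 0 otherwise. */
static int ends_with(const char *s, size_t len, const char *suffix) {
    assert(s != NULL && suffix != NULL);
    size_t suffix_len = strlen(suffix);
    return len >= suffix_len &&
           memcmp(s + len - suffix_len, suffix, suffix_len) == 0;
}

/* Hypothetical helper: counts the ':'-separated entries of classpath
 * that end in ".jar". A NULL or empty classpath yields 0. */
size_t count_jar_entries(const char *classpath) {
    size_t jars = 0;
    if (classpath == NULL) {
        return 0;
    }
    const char *entry = classpath;
    while (*entry != '\0') {
        size_t len = strcspn(entry, ":");  /* length of the current entry */
        if (ends_with(entry, len, ".jar")) {
            jars++;
        }
        entry += len;
        if (*entry == ':') {
            entry++;  /* skip the separator */
        }
    }
    return jars;
}

/* Hypothetical pre-flight check: 1 if the CLASSPATH of the current
 * process names at least one jar file, 0 otherwise. */
int classpath_has_jars(void) {
    return count_jar_entries(getenv("CLASSPATH")) > 0;
}
```

A caller might test <<<classpath_has_jars()>>> at startup and print a hint about <<<CLASSPATH>>> before attempting to connect.
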
| * Thread Safe |
| |
| libhdfs is thread safe. |
| |
| * Concurrency and Hadoop FS "handles" |
| |
| The Hadoop FS implementation includes an FS handle cache that is |
| keyed on the URI of the namenode and the connecting user. As a |
| result, all calls to <<<hdfsConnect>>> for the same namenode return the same |
| handle, whereas calls to <<<hdfsConnectAsUser>>> with different users |
| return different handles. Because HDFS client handles are |
| completely thread safe, this sharing has no bearing on concurrency. |
| |
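The caching rule described above can be modeled with a small lookup table. This is a toy illustration only, not the Hadoop implementation: <<<toy_handle>>>, <<<toy_connect_as_user>>>, and the fixed table size are invented for the example, and unlike the real client this sketch makes no attempt at thread safety.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_HANDLES 16

/* Toy stand-in for a cached FS handle; not the real hdfsFS type. */
typedef struct {
    char uri[128];
    char user[64];
    int in_use;
} toy_handle;

/* Static storage is zero-initialized, so every slot starts unused. */
static toy_handle cache[MAX_HANDLES];

/* Returns the cached handle for (uri, user), creating it on first use.
 * Returns NULL if the cache is full or a key is too long. */
toy_handle *toy_connect_as_user(const char *uri, const char *user) {
    assert(uri != NULL && user != NULL);
    if (strlen(uri) >= sizeof cache[0].uri ||
        strlen(user) >= sizeof cache[0].user) {
        return NULL;
    }
    toy_handle *free_slot = NULL;
    for (size_t i = 0; i < MAX_HANDLES; i++) {
        toy_handle *h = &cache[i];
        if (!h->in_use) {
            if (free_slot == NULL) {
                free_slot = h;  /* remember the first empty slot */
            }
            continue;
        }
        if (strcmp(h->uri, uri) == 0 && strcmp(h->user, user) == 0) {
            return h;  /* cache hit: same namenode URI and same user */
        }
    }
    if (free_slot != NULL) {
        strcpy(free_slot->uri, uri);
        strcpy(free_slot->user, user);
        free_slot->in_use = 1;
    }
    return free_slot;
}
```

Calling <<<toy_connect_as_user>>> twice with the same URI and user yields the same pointer, while changing the user yields a different one, which mirrors the behavior described for <<<hdfsConnect>>> and <<<hdfsConnectAsUser>>>.
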
| * Concurrency and libhdfs/JNI |
| |
| libhdfs calls into JNI should always create thread-local |
| storage, so (in theory) libhdfs should be as thread safe as the |
| underlying calls to the Hadoop FS. |
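
For readers unfamiliar with thread-local storage, the following sketch shows the general idea with POSIX threads and the C11 <<<_Thread_local>>> qualifier. It is not taken from the libhdfs sources; <<<worker>>> and <<<run_two_workers>>> are hypothetical names, and how libhdfs actually stores its per-thread JNI state may differ. Each thread increments its own copy of the counter, so no locking is needed and the threads do not see each other's values.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Each thread gets its own copy of this counter, starting at 0. */
static _Thread_local int calls_on_this_thread = 0;

/* Increments the calling thread's counter n times, where n is read from
 * *arg, then writes the thread's final count back to *arg. */
static void *worker(void *arg) {
    int *slot = (int *)arg;
    int n = *slot;
    for (int i = 0; i < n; i++) {
        calls_on_this_thread++;  /* touches only this thread's copy */
    }
    *slot = calls_on_this_thread;
    return NULL;
}

/* Runs two workers concurrently and stores each worker's final count in
 * out[0] and out[1]. Returns 0 on success, -1 if a thread cannot start. */
int run_two_workers(int n1, int n2, int out[2]) {
    pthread_t t1, t2;
    assert(out != NULL);
    out[0] = n1;
    out[1] = n2;
    if (pthread_create(&t1, NULL, worker, &out[0]) != 0) {
        return -1;
    }
    if (pthread_create(&t2, NULL, worker, &out[1]) != 0) {
        pthread_join(t1, NULL);
        return -1;
    }
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return 0;
}
```

On older toolchains the example may need to be compiled with <<<-pthread>>>.
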