<!---
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->

C API libhdfs
=============

<!-- MACRO{toc|fromDepth=0|toDepth=3} -->

Overview
--------

libhdfs is a JNI-based C API for Hadoop's Distributed File System (HDFS). It exposes a subset of the HDFS APIs to C programs for manipulating HDFS files and the filesystem. libhdfs is part of the Hadoop distribution and comes pre-compiled in `$HADOOP_HDFS_HOME/lib/native/libhdfs.so`. libhdfs is also compatible with Windows and can be built there by running `mvn compile` within the `hadoop-hdfs-project/hadoop-hdfs` directory of the source tree.

The APIs
--------

The libhdfs APIs are a subset of the [Hadoop FileSystem APIs](../../api/org/apache/hadoop/fs/FileSystem.html).

The libhdfs header file describes each API in detail and is available at `$HADOOP_HDFS_HOME/include/hdfs.h`.

A Sample Program
----------------
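```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#include "hdfs.h"

int main(int argc, char **argv) {
    /* "default" selects the filesystem configured in the Hadoop XML configuration files. */
    hdfsFS fs = hdfsConnect("default", 0);
    if (!fs) {
        fprintf(stderr, "Failed to connect to HDFS!\n");
        exit(-1);
    }
    const char* writePath = "/tmp/testfile.txt";
    /* The three zeros request the default buffer size, replication and block size. */
    hdfsFile writeFile = hdfsOpenFile(fs, writePath, O_WRONLY | O_CREAT, 0, 0, 0);
    if (!writeFile) {
        fprintf(stderr, "Failed to open %s for writing!\n", writePath);
        exit(-1);
    }
    const char* buffer = "Hello, World!";
    /* Write the string together with its terminating NUL byte. */
    tSize num_written_bytes = hdfsWrite(fs, writeFile, (const void*)buffer, (tSize)(strlen(buffer) + 1));
    if (num_written_bytes < 0) {
        fprintf(stderr, "Failed to write to %s\n", writePath);
        exit(-1);
    }
    if (hdfsFlush(fs, writeFile)) {
        fprintf(stderr, "Failed to 'flush' %s\n", writePath);
        exit(-1);
    }
    hdfsCloseFile(fs, writeFile);
    hdfsDisconnect(fs);
    return 0;
}
```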

How To Link With The Library
----------------------------

See the CMake file for `test_libhdfs_ops.c` in the libhdfs source directory (`hadoop-hdfs-project/hadoop-hdfs/src/CMakeLists.txt`), or use a command along the lines of: `gcc above_sample.c -I$HADOOP_HDFS_HOME/include -L$HADOOP_HDFS_HOME/lib/native -lhdfs -o above_sample`

Common Problems
---------------

The most common problem is that the `CLASSPATH` is not set properly when running a program that uses libhdfs. Make sure it includes all the Hadoop jars needed to run Hadoop itself, as well as the configuration directory containing `hdfs-site.xml`.
Wildcard entries in the `CLASSPATH` are now supported by libhdfs.
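
Since a missing or empty `CLASSPATH` is the usual culprit, it can help to check the variable before the first libhdfs call. The following is a minimal sketch and not part of libhdfs: the helper names `count_classpath_entries` and `check_classpath_env` are hypothetical, and the code assumes the Unix `:` separator (Windows uses `;`). It only verifies that at least one entry is present; it cannot tell whether the entries point at the right jars or configuration directory.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: count the non-empty ':'-separated entries
 * in a CLASSPATH-style string. Returns 0 for NULL or an empty string. */
int count_classpath_entries(const char *cp) {
    int count = 0;
    if (cp == NULL) {
        return 0;
    }
    while (*cp != '\0') {
        size_t len = strcspn(cp, ":");  /* length of the current entry */
        if (len > 0) {
            count++;
        }
        cp += len;
        if (*cp == ':') {
            cp++;                       /* step over the separator */
        }
    }
    return count;
}

/* Hypothetical helper: returns 0 if the CLASSPATH environment variable
 * has at least one entry, or -1 (with a message on stderr) otherwise. */
int check_classpath_env(void) {
    const char *cp = getenv("CLASSPATH");
    if (count_classpath_entries(cp) == 0) {
        fprintf(stderr, "CLASSPATH is unset or empty; "
                        "the Hadoop jars and configuration will not be found\n");
        return -1;
    }
    return 0;
}
```

A program could call `check_classpath_env()` before `hdfsConnect` and exit early if it returns `-1`. On installations that provide the `hadoop classpath` command, one common way to populate the variable is `export CLASSPATH=$(hadoop classpath --glob)`.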

Thread Safe
-----------

libhdfs is thread safe.

*   Concurrency and Hadoop FS "handles"

    The Hadoop FS implementation includes an FS handle cache, which
    caches handles based on the URI of the namenode and the connecting
    user. As a result, all calls to `hdfsConnect` return the same
    handle, while calls to `hdfsConnectAsUser` with different users
    return different handles. Since HDFS client handles are
    completely thread safe, this has no bearing on concurrency.

*   Concurrency and libhdfs/JNI

    The libhdfs calls to JNI should always create thread-local
    storage, so (in theory) libhdfs should be as thread safe as the
    underlying calls to the Hadoop FS.