blob: a410b9a5b5f615895297c1608b2d561a5cc5ee79 [file] [view]
A distributed file system that provides high-throughput access to application data.
## Capabilities
This service can be used to:
- [x] create_dir
- [x] stat
- [x] read
- [x] write
- [x] delete
- [x] list
- [ ] copy
- [x] rename
- [ ] ~~presign~~
## Differences with webhdfs
[Webhdfs][crate::services::Webhdfs] is powered by hdfs's RESTful HTTP API.
## Features
HDFS support needs to enable feature `services-hdfs`.
## Configuration
- `root`: Set the work dir for backend.
- `name_node`: Set the name node for backend.
- `kerberos_ticket_cache_path`: Set the kerberos ticket cache path for backend, this should be gotten by `klist` after `kinit`
- `user`: Set the user for backend
- `enable_append`: enable the append capacity. Default is false.
Refer to [`HdfsBuilder`]'s public API docs for more information.
## Environment
HDFS needs some environment set correctly.
- `JAVA_HOME`: the path to java home, could be found via `java -XshowSettings:properties -version`
- `HADOOP_HOME`: the path to hadoop home, opendal relays on this env to discover hadoop jars and set `CLASSPATH` automatically.
Most of the time, setting `JAVA_HOME` and `HADOOP_HOME` is enough. But there are some edge cases:
- If meeting errors like the following:
```shell
error while loading shared libraries: libjvm.so: cannot open shared object file: No such file or directory
```
Java's lib are not including in pkg-config find path, please set `LD_LIBRARY_PATH`:
```shell
export LD_LIBRARY_PATH=${JAVA_HOME}/lib/server:${LD_LIBRARY_PATH}
```
The path of `libjvm.so` could be different, please keep an eye on it.
- If meeting errors like the following:
```shell
(unable to get stack trace for java.lang.NoClassDefFoundError exception: ExceptionUtils::getStackTrace error.)
```
`CLASSPATH` is not set correctly or your hadoop installation is incorrect.
To set `CLASSPATH`:
```shell
export CLASSPATH=$(find $HADOOP_HOME -iname "*.jar" | xargs echo | tr ' ' ':'):${CLASSPATH}
```
- If HDFS has High Availability (HA) enabled with multiple available NameNodes, some configuration is required:
1. Obtain the entire HDFS config folder (usually located at HADOOP_HOME/etc/hadoop).
2. Set the environment variable HADOOP_CONF_DIR to the path of this folder.
```shell
export HADOOP_CONF_DIR=<path of the config folder>
```
3. Append the HADOOP_CONF_DIR to the `CLASSPATH`
```shell
export CLASSPATH=$HADOOP_CONF_DIR:$HADOOP_CLASSPATH:$CLASSPATH
```
4. Use the `cluster_name` specified in the `core-site.xml` file (located in the HADOOP_CONF_DIR folder) to replace namenode:port.
```ignore
builder.name_node("hdfs://cluster_name");
```
### macOS Specific Note
If you encounter an issue during the build process on macOS with an error message similar to:
```shell
ld: unknown file type in $HADOOP_HOME/lib/native/libhdfs.so.0.0.0
clang: error: linker command failed with exit code 1 (use -v to see invocation)
```
This error is likely due to the fact that the official Hadoop build includes the libhdfs.so file for the x86-64 architecture, which is not compatible with aarch64 architecture required for MacOS.
To resolve this issue, you can add hdrs as a dependency in your Rust application's Cargo.toml file, and enable the vendored feature:
```toml
[dependencies]
hdrs = { version = "<version_number>", features = ["vendored"] }
```
Enabling the vendored feature ensures that hdrs includes the necessary libhdfs.so library built for the correct architecture.
## Example
### Via Builder
```rust,no_run
use opendal_core::Operator;
use opendal_service_hdfs::Hdfs;
#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
// Create fs backend builder.
let mut builder = Hdfs::default()
// Set the name node for hdfs.
// If the string starts with a protocol type such as file://, hdfs://, or gs://, this protocol type will be used.
.name_node("hdfs://127.0.0.1:9000")
// Set the root for hdfs, all operations will happen under this root.
//
// NOTE: the root must be absolute path.
.root("/tmp")
// Enable the append capacity for hdfs.
//
// Note: HDFS run in non-distributed mode doesn't support append.
.enable_append(true);
// `Accessor` provides the low level APIs, we will use `Operator` normally.
let op: Operator = Operator::new(builder)?.finish();
Ok(())
}
```