tree: 267b967a009c9e8bc33ae237b77b1f117319e475 [path history] [tgz]
  1. src/
  2. pom.xml
  3. README.md
hbase-oss/README.md

HBOSS: HBase / Object Store Semantics adapter

Introduction

This module provides an implementation of Apache Hadoop's FileSystem interface that bridges the gap between Apache HBase, which assumes that many operations are atomic, and object-store implementations of FileSystem (such as s3a) which inherently cannot provide atomic semantics to those operations natively.

This is implemented separately from s3a so that it can potentially be used for other object stores. It is also impractical to provide the required semantics for the general case without significant drawbacks in some cases. A separate implementation allows all trade-offs to be made on HBase's terms.

Lock Implementations

TreeLockManager implements generic logic for managing read / write locks on branches of filesystem hierarchies, but needs to be extended by an implementation that provides individual read / write locks and methods to traverse the tree.

The desired implementation must be configured by setting to one of the values below:

fs.hboss.sync.impl

Null Implementation (org.apache.hadoop.hbase.oss.sync.NullTreeLockManager)

The null implementation just provides no-op methods instead of actual locking operations. This functions as an easy way to verify that a test case has successfully reproduced a problem that is hidden by the other implementations.

Local Implementation (org.apache.hadoop.hbase.oss.sync.LocalTreeLockManager)

Primarily intended to help with development and validation, but could possibly work for a standalone instance of HBase. This implementation uses Java's built-in ReentrantReadWriteLock.

ZooKeeper Implementation (org.apache.hadoop.hbase.oss.sync.ZKTreeLockManager)

This implementation is intended for production use once it is stable. It uses Apache Curator's implementation of read / write locks on Apache ZooKeeper. It could share a ZooKeeper ensemble with the HBase cluster.

At a minimum, you must configure the ZooKeeper connection string (including root znode):

fs.hboss.sync.zk.connectionString

You may also want to configure:

fs.hboss.sync.zk.sleep.base.ms (default 1000)
fs.hboss.sync.zk.sleep.max.retries (default 3)

DynamoDB Implementation (not implemented)

An implementation based on Amazon‘s DynamoDB lock library was considered but was not completed due to the lack of an efficient way to traverse the tree and discover locks on child nodes. The benefit is that S3Guard is required for s3a use and as such there’s a dependency on DynamoDB anyway.

Storage Implementations

Currently HBOSS is primarily designed for and exclusively tested with Hadoop's s3a client against Amazon S3. S3Guard must be enabled. Both this requirement and the use of an external data store for locking have serious implications if any other client accesses the same data.

In theory, HBOSS could also work well with Google's cloud storage client (gs) or other object storage clients.

FileSystem Instantiation

There are 2 ways to get an HBOSS instance. It can be instantiated directly and given a URI and Configuration object that can be used to get the underlying FileSystem:

Configuration conf = new Configuration();
FileSystem hboss = new HBaseObjectStoreSemantics();
hboss.initialize("s3a://bucket/", conf);

If the application code cannot be changed, you can remap the object store client's schema to the HBOSS implementation, and set fs.hboss.fs.<scheme>.impl to the underlying implementation that HBOSS should wrap:

Configuration conf = new Configuration();
conf.set("fs.hboss.fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem");
conf.set("fs.s3a.impl", "org.apache.hadoop.hbase.oss.HBaseObjectStoreSemantics");
FileSystem hboss = FileSystem.get("s3a://bucket/", conf);

Testing

You can quickly run HBOSS's tests with any of the implementations:

mvn verify -Pnull  # reproduce the problems
mvn verify -Plocal # useful for debugging
mvn verify -Pzk    # the default

If the ‘zk’ profile is activated, it will start an embedded ZooKeeper process. The tests can also be run against a distributed ZooKeeper ensemble by setting fs.hboss.sync.zk.connectionString in src/test/resources/core-site.xml.

By default, the tests will also be run against a mock S3 client that works on in-memory data structures. One can also set fs.hboss.data.uri to point to any other storage in src/test/resources/core-site.xml.

Any required credentials or other individal configuration should be set in src/test/resources/auth-keys.xml, which should be ignored by source control.