| --- |
| layout: post |
| status: PUBLISHED |
| published: true |
| title: 'HGraphDB: Apache HBase As An Apache TinkerPop Graph Database' |
| id: fa1b2a57-54d8-4e19-8caf-06dd8a5d6b66 |
| date: '2016-12-29 20:19:49 -0500' |
| categories: hbase |
| tags: [] |
| permalink: hbase/entry/hgraphdb_hbase_as_a_tinkerpop |
| --- |
| <p style="text-align: justify;">Robert Yokota is a Software Engineer at Yammer.</p> |
| <p style="text-align: justify;">An earlier version of this post was published <a href="https://yokota.blog/2016/11/10/hgraphdb-hbase-as-a-tinkerpop-graph-database/">here</a> on Robert's blog.</p></p> |
| <p style="text-align: justify;">Be sure to also check out the excellent follow on post <a href="https://yokota.blog/2016/12/13/graph-analytics-on-hbase-with-hgraphdb-and-giraph/">Graph Analytics on HBase with HGraphDB and Giraph</a>.</p></p> |
| <hr /> |
| <h1>HGraphDB: Apache HBase As An Apache TinkerPop Graph Database</h1> |
| <p style="text-align: justify;"> |
| The use of graph databases is common among social networking companies. A social network can easily be represented as a graph model, so a graph database is a natural fit. For instance, Facebook has a graph database called <a href="https://www.facebook.com/notes/facebook-engineering/tao-the-power-of-the-graph/10151525983993920/">Tao</a>, Twitter has <a href="https://github.com/twitter/flockdb">FlockDB</a>, and Pinterest has <a href="http://www.slideshare.net/InfoQ/zen-pinterests-graph-storage-service">Zen</a>. At Yammer, an enterprise social network, we rely on Apache HBase for much of our messaging infrastructure, so I decided to see if HBase could also be used for some graph modelling and analysis.</p> |
| <p style="text-align: justify;"> |
| Below I put together a wish list of what I wanted to see in a graph database.</p> |
| <ul> |
| <li>It should be implemented directly on top of HBase.</li> |
| <li>It should support the <a href="http://tinkerpop.apache.org/">TinkerPop 3 API</a>.</li> |
| <li>It should allow the user to supply IDs for both vertices and edges.</li> |
| <li>It should allow user-supplied IDs to be either strings or numbers.</li> |
| <li>It should allow property values to be of arbitrary type, including maps, arrays, and serializable objects.</li> |
| <li>It should support indexing vertices by label and property.</li> |
| <li>It should support indexing edges by label and property, specific to a given vertex.</li> |
| <li>It should support range queries and pagination with both vertex indices and edge indices.</li> |
| </ul> |
| <p><span style="text-align: justify;">I did not find a graph database that met all of the above criteria. For instance, </span><a href="https://github.com/thinkaurelius/titan" style="text-align: justify;">Titan</a><span style="text-align: justify;"> is a graph database that supports the TinkerPop API, but it is not implemented directly on HBase. Rather, it is implemented on top of an abstraction layer that can be integrated with Apache HBase, Apache Cassandra, or Berkeley DB as its underlying store. Also, Titan does not support user-supplied IDs. </span><a href="https://s2graph.incubator.apache.org/" style="text-align: justify;">Apache S2Graph Incubating</a><span style="text-align: justify;"> is a graph database that is implemented directly on HBase, and it supports both user-supplied IDs and indices on edges, but it does not yet support the TinkerPop API nor does it support indices on vertices.</span></p> |
| <p style="text-align: justify;"> |
| This led me to create <strong><a href="https://github.com/rayokota/hgraphdb">HGraphDB</a></strong>, a TinkerPop 3 layer for HBase. It provides support for all of the above bullet points. Feel free to try it out if you are interested in using HBase as a graph database.</p> |