<?xml version="1.0" encoding="UTF-8"?>
<!--
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
-->
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<!-- Generated by Apache Maven Doxia at 2018-08-13 -->
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
<title>Apache James Project &#x2013; Mailbox HBase</title>
<style type="text/css" media="all">
@import url("../css/james.css");
@import url("../css/maven-base.css");
@import url("../css/maven-theme.css");
@import url("../css/site.css");
@import url("../js/jquery/css/custom-theme/jquery-ui-1.8.5.custom.css");
@import url("../js/jquery/css/print.css");
@import url("../js/fancybox/jquery.fancybox-1.3.4.css");
</style>
<script type="text/javascript" src="../js/jquery/js/jquery-1.4.2.min.js"></script>
<script type="text/javascript" src="../js/jquery/js/jquery-ui-1.8.5.custom.min.js"></script>
<script type="text/javascript" src="../js/fancybox/jquery.fancybox-1.3.4.js"></script>
<link rel="stylesheet" href="../css/print.css" type="text/css" media="print" />
<meta name="Date-Revision-yyyymmdd" content="20180813" />
<meta http-equiv="Content-Language" content="en" />
<!-- Google Analytics -->
<script type="text/javascript">
var _gaq = _gaq || [];
_gaq.push(['_setAccount', 'UA-1384591-1']);
_gaq.push(['_trackPageview']);
(function() {
var ga = document.createElement('script'); ga.type = 'text/javascript'; ga.async = true;
ga.src = ('https:' == document.location.protocol ? 'https://ssl' : 'http://www') + '.google-analytics.com/ga.js';
var s = document.getElementsByTagName('script').item(0); s.parentNode.insertBefore(ga, s);
})();
</script>
</head>
<body class="composite">
<div id="banner">
<a href="../index.html" id="bannerLeft" title="james-logo.png">
<img src="../images/logos/james-logo.png" alt="James Project" />
</a>
<a href="http://www.apache.org/index.html" id="bannerRight">
<img src="images/logos/asf_logo_small.png" alt="The Apache Software Foundation" />
</a>
<div class="clear">
<hr/>
</div>
</div>
<div id="breadcrumbs">
<div class="xleft">
<span id="publishDate">Last Published: 2018-08-13</span>
</div>
<div class="xright"> <a href="../index.html" title="Home">Home</a>
|
<a href="../documentation.html" title="James">James</a>
|
<a href="../mime4j/index.html" title="Mime4J">Mime4J</a>
|
<a href="../jsieve/index.html" title="jSieve">jSieve</a>
|
<a href="../jspf/index.html" title="jSPF">jSPF</a>
|
<a href="../jdkim/index.html" title="jDKIM">jDKIM</a>
|
<a href="../hupa/index.html" title="Hupa">Hupa</a>
</div>
<div class="clear">
<hr/>
</div>
</div>
<div id="leftColumn">
<div id="navcolumn">
<h5>James components</h5>
<ul>
<li class="collapsed">
<a href="../documentation.html" title="About James">About James</a>
</li>
<li class="collapsed">
<a href="../server/index.html" title="Server">Server</a>
</li>
<li class="collapsed">
<a href="../mailet/index.html" title="Mailets">Mailets</a>
</li>
<li class="expanded">
<a href="../mailbox/index.html" title="Mailbox">Mailbox</a>
<ul>
<li class="none">
<a href="../mailbox/source-code.html" title="Source Code">Source Code</a>
</li>
<li class="none">
<a href="../mailbox/apidocs/index.html" title="Javadoc">Javadoc</a>
</li>
<li class="none">
<a href="https://issues.apache.org/jira/browse/MAILBOX" title="Issue Tracker">Issue Tracker</a>
</li>
<li class="expanded">
<a href="../mailbox/mailbox-api.html" title="Framework">Framework</a>
<ul>
<li class="none">
<a href="../mailbox/mailbox-store.html" title="Mailbox Store">Mailbox Store</a>
</li>
<li class="none">
<a href="../mailbox/mailbox-tool.html" title="Mailbox Tool">Mailbox Tool</a>
</li>
</ul>
</li>
<li class="expanded">
<a href="../mailbox/index.html" title="Implementations">Implementations</a>
<ul>
<li class="none">
<a href="../mailbox/mailbox-memory.html" title="Mailbox Memory">Mailbox Memory</a>
</li>
<li class="none">
<a href="../mailbox/mailbox-cassandra.html" title="Mailbox Cassandra">Mailbox Cassandra</a>
</li>
<li class="none">
<a href="../mailbox/mailbox-maildir.html" title="Mailbox Maildir">Mailbox Maildir</a>
</li>
<li class="none">
<a href="../mailbox/mailbox-jpa.html" title="Mailbox JPA">Mailbox JPA</a>
</li>
<li class="none">
<a href="../mailbox/mailbox-jcr.html" title="Mailbox JCR">Mailbox JCR</a>
</li>
<li class="none">
<strong>Mailbox HBase</strong>
</li>
</ul>
</li>
<li class="none">
<a href="../mailbox/mailbox-spring.html" title="Wiring">Wiring</a>
</li>
<li class="none">
<a href="../download.cgi" title="Download releases">Download releases</a>
</li>
</ul>
</li>
<li class="collapsed">
<a href="../protocols/index.html" title="Protocols">Protocols</a>
</li>
<li class="collapsed">
<a href="../mpt/index.html" title="MPT">MPT</a>
</li>
</ul>
<h5>Apache Software Foundation</h5>
<ul>
<li>
<strong>
<a title="ASF" href="http://www.apache.org/">ASF</a>
</strong>
</li>
<li>
<a title="Get Involved" href="http://www.apache.org/foundation/getinvolved.html">Get Involved</a>
</li>
<li>
<a title="FAQ" href="http://www.apache.org/foundation/faq.html">FAQ</a>
</li>
<li>
<a title="License" href="http://www.apache.org/licenses/" >License</a>
</li>
<li>
<a title="Sponsorship" href="http://www.apache.org/foundation/sponsorship.html">Sponsorship</a>
</li>
<li>
<a title="Thanks" href="http://www.apache.org/foundation/thanks.html">Thanks</a>
</li>
<li>
<a title="Security" href="http://www.apache.org/security/">Security</a>
</li>
</ul>
<a href="http://maven.apache.org/" title="Built by Maven" class="poweredBy">
<img class="poweredBy" alt="Built by Maven" src="../images/logos/maven-feather.png" />
</a>
</div>
</div>
<div id="bodyColumn">
<div id="contentBox">
<div class="section">
<h2><a name="Mailbox_HBase_Responsibility"></a>Mailbox HBase Responsibility</h2>
<p>This module provides a mailbox implementation that persists mailboxes, messages and subscriptions in an HBase cluster.</p>
<p>It only supports the Basic capability.</p>
</div>
<div class="section">
<h2><a name="Overview"></a>Overview</h2>
<p>
This section gives an overview of the design and implementation of Mailbox HBase.
</p>
<div class="section">
<h3><a name="Tables"></a>Tables</h3>
<p>The current implementation stores messages, mailboxes and subscriptions in their own tables:</p>
<ul>
<li>JAMES_MAILBOXES - for storing mailboxes.</li>
<li>JAMES_MESSAGES - for storing messages.</li>
<li>JAMES_SUBSCRIPTIONS - for storing user subscriptions.</li>
</ul>
</div>
<div class="section">
<h3><a name="Mailbox_UID_generation"></a>Mailbox UID generation</h3>
<p>Each mailbox is identified by a
<a class="externalLink" href="http://download.oracle.com/javase/6/docs/api/java/util/UUID.html">UUID</a>.
</p>
</div>
<div class="section">
<h3><a name="Message_UID_generation"></a>Message UID generation</h3>
<p>The IMAP RFC states that mailboxes should keep message UIDs unique and in ascending order. Mailbox HBase uses
<a class="externalLink" href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HTable.html#incrementColumnValue(byte[],%20byte[],%20byte[],%20long)">incrementColumnValue</a>
in the HBaseUidProvider implementation to achieve this.
</p>
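<p>The counter behaviour described above can be illustrated without an HBase cluster. The following is a minimal in-memory sketch, not the HBaseUidProvider code: the class name <i>InMemoryUidSketch</i> and the method <i>nextUid</i> are hypothetical, an AtomicLong stands in for the atomic HBase counter, and starting the sequence at 1 is an assumption.</p>

```java
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical in-memory stand-in for a per-mailbox atomic counter.
// The real provider relies on an atomic increment inside HBase instead.
final class InMemoryUidSketch {
    // One counter per mailbox, keyed by the mailbox UUID.
    private final ConcurrentMap<UUID, AtomicLong> lastUidByMailbox = new ConcurrentHashMap<>();

    // Atomically increments the counter of the given mailbox and returns the new value,
    // so UIDs within a mailbox are unique and strictly ascending.
    long nextUid(UUID mailboxId) {
        return lastUidByMailbox
                .computeIfAbsent(mailboxId, id -> new AtomicLong(0))
                .incrementAndGet();
    }
}
```

<p>Because the increment and the read happen in one atomic step, two concurrent appends to the same mailbox can never receive the same UID.</p>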
</div>
<div class="section">
<h3><a name="HBase_row_keys"></a>HBase row keys</h3>
<p>HBase accesses values by row key. The current design uses the following row key structure:</p>
<ul>
<li>JAMES_MAILBOXES: row key is mailbox UUID</li>
<li>JAMES_MESSAGES: row key is a compound key formed by concatenating the mailbox UUID and the message UID (in reverse order).
This groups messages by mailbox and sorts them in descending order (most recent first).
</li>
<li>JAMES_SUBSCRIPTIONS: row key is user name.</li>
</ul>
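<p>The compound JAMES_MESSAGES key can be sketched as follows. This is an illustration under stated assumptions rather than the encoding Mailbox HBase actually uses: the class <i>RowKeySketch</i> and its methods are hypothetical, and "reverse order" is modelled here as <i>Long.MAX_VALUE - uid</i> written as a big-endian long. The <i>compare</i> helper mimics the unsigned, byte-wise lexicographic ordering that HBase applies to row keys.</p>

```java
import java.nio.ByteBuffer;
import java.util.UUID;

// Hypothetical sketch of a compound row key: 16 bytes of mailbox UUID + 8 bytes of reversed UID.
final class RowKeySketch {
    private RowKeySketch() {}

    // Builds the key. Subtracting the UID from Long.MAX_VALUE (an assumed encoding)
    // makes larger, more recent UIDs produce smaller byte sequences.
    static byte[] messageRowKey(UUID mailboxId, long messageUid) {
        return ByteBuffer.allocate(24)
                .putLong(mailboxId.getMostSignificantBits())
                .putLong(mailboxId.getLeastSignificantBits())
                .putLong(Long.MAX_VALUE - messageUid)
                .array();
    }

    // Unsigned lexicographic comparison of two byte arrays, the order in which HBase sorts rows.
    static int compare(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xff) - (b[i] & 0xff);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length;
    }
}
```

<p>With this layout, all rows of one mailbox share the same 16-byte prefix, so a scan over that prefix returns the mailbox's messages with the highest UID first.</p>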
</div>
<div class="section">
<h3><a name="Misc"></a>Misc</h3>
<p>A message sent to many users is stored once per recipient, including its body and, more importantly, any large attachments. There is no space sharing yet.</p>
<p>Message data and message meta-data (flags and properties) are stored in different column families
so that column family optimization options can be applied to each separately. Keep in mind that message data does not change, while meta-data does.
</p>
</div>
</div>
<div class="section">
<h2><a name="Installation"></a>Installation</h2>
<p>For the mailbox implementation to work, you have to point it at your HBase cluster. Putting
<i>hbase-site.xml</i> on the class path should be enough. Mailbox HBase will pick it up and read all the configuration parameters from it.
</p>
</div>
<div class="section">
<h2><a name="Mailbox_HBase_Classes"></a>Mailbox HBase Classes</h2>
<p>This is an overview of the most important classes in the implementation.</p>
<div class="section">
<h3><a name="HBaseMailboxManager"></a>HBaseMailboxManager</h3>
<p>
<b>HBaseMailboxManager</b> extends the
<b>StoreMailboxManager</b> class.
It has a simple implementation that just overrides the
<i>doCreateMailbox</i> method to return a HBaseMailbox implementation and the
<i>createMessageManager</i> method to return a HBaseMessageManager implementation.
Other than that, it relies on the default StoreMailboxManager implementation.
</p>
</div>
<div class="section">
<h3><a name="HBaseMessageManager"></a>HBaseMessageManager</h3>
<p>
<b>HBaseMessageManager</b> extends StoreMessageManager and provides an implementation of the getPermanentFlags method.
</p>
</div>
<div class="section">
<h3><a name="Chunked_Streams"></a>Chunked Streams</h3>
<p>Message bodies can have varying sizes. Some have attachments of up to 25 MB, some even larger.
There are practical limits to the size of an HBase column (see
<a class="externalLink" href="http://hbase.apache.org/book.html#supported.datatypes">http://hbase.apache.org/book.html#supported.datatypes</a>).
To address this issue, the implementation splits the message into smaller chunks and saves each chunk into a separate column.
The columns have increasing integer names starting with 1 and there can be at most Long.MAX_VALUE chunks.
</p>
<p>
The work is done by
<b>ChunkInputStream</b> and
<b>ChunkOutputStream</b>, which extend
InputStream and OutputStream from the java.io package.
<br />
Data is retrieved with HBase Get operations and buffered in an internal byte array.
Data is written with HBase Put operations, split into chunks of a configurable size
(<b>chunkSize</b>).
Things could be more efficient if HBase had streaming support.
</p>
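<p>The chunking scheme can be sketched in plain Java. This is not the ChunkInputStream or ChunkOutputStream code: the class <i>ChunkSketch</i> and its methods are hypothetical, and a sorted map stands in for the columns of one HBase row, with the column name as the map key. It only shows how a body is cut into numbered chunks starting at 1 and how reading the chunks back in order restores the original bytes.</p>

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical sketch: a SortedMap stands in for the chunk columns of a single row.
final class ChunkSketch {
    private ChunkSketch() {}

    // Splits the body into chunks of at most chunkSize bytes, named 1, 2, 3, ...
    static SortedMap<Long, byte[]> split(byte[] body, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        SortedMap<Long, byte[]> columns = new TreeMap<>();
        long name = 1;
        int offset = 0;
        while (offset < body.length) {
            int length = Math.min(chunkSize, body.length - offset);
            columns.put(name++, Arrays.copyOfRange(body, offset, offset + length));
            offset += length;
        }
        return columns;
    }

    // Reassembles the body by concatenating the chunks in ascending column order.
    static byte[] join(SortedMap<Long, byte[]> columns) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (byte[] chunk : columns.values()) {
            out.write(chunk, 0, chunk.length);
        }
        return out.toByteArray();
    }
}
```

<p>For example, a 10-byte body with a chunk size of 4 yields three columns named 1, 2 and 3, holding 4, 4 and 2 bytes.</p>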
</div>
<div class="section">
<h3><a name="HBaseMessage"></a>HBaseMessage</h3>
<p>Extends AbstractMessage and represents a message in the message store.
Note that the current implementation retrieves only the message meta-data from HBase up front
and uses ChunkInputStream to load the message body only when it is needed.
</p>
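<p>The deferred body loading described above follows a common lazy-loading pattern, sketched here. The class <i>LazyBodyMessage</i> and its methods are hypothetical and do not reflect the HBaseMessage API. The assumption is that meta-data is held in fields while the body is obtained from a supplier that opens the stream only on request.</p>

```java
import java.io.InputStream;
import java.util.function.Supplier;

// Hypothetical sketch: meta-data is eager, the body stream is opened on demand.
final class LazyBodyMessage {
    private final long uid;
    private final Supplier<InputStream> bodyLoader;

    // The loader is stored but not invoked, so constructing the message reads no body data.
    LazyBodyMessage(long uid, Supplier<InputStream> bodyLoader) {
        this.uid = uid;
        this.bodyLoader = bodyLoader;
    }

    // Meta-data access never touches the body.
    long getUid() {
        return uid;
    }

    // Opens a fresh body stream each time it is called.
    InputStream getBodyContent() {
        return bodyLoader.get();
    }
}
```

<p>Listing a mailbox then only needs the meta-data of each message, and the cost of reading the chunks is paid only for messages whose body is actually fetched.</p>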
</div>
</div>
</div>
</div>
<div class="clear">
<hr/>
</div>
<div id="footer">
<div class="xright">Copyright &#169; 2006-2018
<a href="https://www.apache.org/">The Apache Software Foundation</a>.
All Rights Reserved.
</div>
<div class="clear">
<hr/>
</div>
</div>
</body>
</html>