<?xml version="1.0" encoding="UTF-8"?>
<!--
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
-->
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<!-- Generated by Apache Maven Doxia at Apr 17, 2012 -->
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
<title>
Mailbox HBase</title>
<style type="text/css" media="all">
@import url("./css/james.css");
@import url("./css/maven-base.css");
@import url("./css/maven-theme.css");
@import url("./css/site.css");
@import url("./js/jquery/css/custom-theme/jquery-ui-1.8.5.custom.css");
@import url("./js/jquery/css/print.css");
@import url("./js/fancybox/jquery.fancybox-1.3.4.css");
</style>
<script type="text/javascript" src="./js/jquery/js/jquery-1.4.2.min.js"></script>
<script type="text/javascript" src="./js/jquery/js/jquery-ui-1.8.5.custom.min.js"></script>
<script type="text/javascript" src="./js/fancybox/jquery.fancybox-1.3.4.js"></script>
<link rel="stylesheet" href="./css/print.css" type="text/css" media="print" />
<meta name="Date-Revision-yyyymmdd" content="20120417" />
<meta http-equiv="Content-Language" content="en" />
<link title="DOAP" rel="meta" type="application/rdf+xml" href="http://james.apache.org//doap_james-project.rdf"/>
<!-- Google Analytics -->
<script type="text/javascript">
var _gaq = _gaq || [];
_gaq.push(['_setAccount', 'UA-1384591-1']);
_gaq.push(['_trackPageview']);
(function() {
var ga = document.createElement('script'); ga.type = 'text/javascript'; ga.async = true;
ga.src = ('https:' == document.location.protocol ? 'https://ssl' : 'http://www') + '.google-analytics.com/ga.js';
var s = document.getElementsByTagName('script').item(0); s.parentNode.insertBefore(ga, s);
})();
</script>
</head>
<body class="composite">
<div id="banner">
<a href="../server/index.html" id="bannerLeft" title="james-server-logo.gif">
<img src="images/logos/james-server-logo.gif" alt="James Server" />
</a>
<a href="http://www.apache.org/index.html" id="bannerRight">
<img src="images/logos/asf-logo-reduced.gif" alt="The Apache Software Foundation" />
</a>
<div class="clear">
<hr/>
</div>
</div>
<div id="breadcrumbs">
<div class="xleft">
<span id="publishDate">Last Published: 2012-04-17</span>
</div>
<div class="xright"> <a href="../index.html" title="Home">Home</a>
|
<a href="../server/index.html" title="Server">Server</a>
|
<a href="../hupa/index.html" title="Hupa">Hupa</a>
|
<a href="../protocols/index.html" title="Protocols">Protocols</a>
|
<a href="../imap/index.html" title="IMAP">IMAP</a>
|
<a href="../mailet/index.html" title="Mailets">Mailets</a>
|
<a href="index.html" title="Mailbox">Mailbox</a>
|
<a href="../mime4j/index.html" title="Mime4J">Mime4J</a>
|
<a href="../jsieve/index.html" title="jSieve">jSieve</a>
|
<a href="../jspf/index.html" title="jSPF">jSPF</a>
|
<a href="../jdkim/index.html" title="jDKIM">jDKIM</a>
|
<a href="../mpt/index.html" title="MPT">MPT</a>
|
<a href="../postage/index.html" title="Postage">Postage</a>
</div>
<div class="clear">
<hr/>
</div>
</div>
<div id="leftColumn">
<div id="navcolumn">
<h5>Mailbox</h5>
<ul>
<li class="none">
<a href="index.html" title="Introduction">Introduction</a>
</li>
<li class="none">
<a href="source-code.html" title="Source Code">Source Code</a>
</li>
</ul>
<h5>Framework</h5>
<ul>
<li class="none">
<a href="mailbox-api.html" title="Mailbox API">Mailbox API</a>
</li>
<li class="none">
<a href="mailbox-store.html" title="Mailbox Store">Mailbox Store</a>
</li>
<li class="none">
<a href="mailbox-tool.html" title="Mailbox Tool">Mailbox Tool</a>
</li>
</ul>
<h5>Implementations</h5>
<ul>
<li class="none">
<a href="mailbox-memory.html" title="Mailbox Memory">Mailbox Memory</a>
</li>
<li class="none">
<a href="mailbox-maildir.html" title="Mailbox Maildir">Mailbox Maildir</a>
</li>
<li class="none">
<a href="mailbox-jpa.html" title="Mailbox JPA">Mailbox JPA</a>
</li>
<li class="none">
<a href="mailbox-jcr.html" title="Mailbox JCR">Mailbox JCR</a>
</li>
<li class="none">
<strong>Mailbox HBase</strong>
</li>
</ul>
<h5>Wiring</h5>
<ul>
<li class="none">
<a href="mailbox-spring.html" title="Spring">Spring</a>
</li>
<li class="none">
<a href="mailbox-guice.html" title="Guice">Guice</a>
</li>
</ul>
<h5>References</h5>
<ul>
<li class="none">
<a href="apidocs/index.html" title="Javadoc">Javadoc</a>
</li>
<li class="none">
<a href="https://issues.apache.org/jira/browse/MAILBOX" title="Issue Tracker">Issue Tracker</a>
</li>
</ul>
<h5>About James</h5>
<ul>
<li class="none">
<a href="../index.html" title="Overview">Overview</a>
</li>
<li class="none">
<a href="../newsarchive.html" title="News">News</a>
</li>
<li class="none">
<a href="../mail.html" title="Mailing Lists">Mailing Lists</a>
</li>
<li class="none">
<a href="../contribute.html" title="Contributing">Contributing</a>
</li>
<li class="none">
<a href="../guidelines.html" title="Guidelines">Guidelines</a>
</li>
<li class="none">
<a href="http://wiki.apache.org/james" title="Wiki">Wiki</a>
</li>
<li class="none">
<a href="../team-list.html" title="Who We Are">Who We Are</a>
</li>
<li class="none">
<a href="../license.html" title="License">License</a>
</li>
</ul>
<h5>Download</h5>
<ul>
<li class="none">
<a href="../download.cgi" title="Releases">Releases</a>
</li>
<li class="none">
<a href="https://repository.apache.org/content/repositories/snapshots/org/apache/james/" title="">Snapshots</a>
</li>
</ul>
<h5>Apache Software Foundation</h5>
<ul>
<li>
<strong>
<a title="ASF" href="http://www.apache.org/">ASF</a>
</strong>
</li>
<li>
<a title="Get Involved" href="http://www.apache.org/foundation/getinvolved.html">Get Involved</a>
</li>
<li>
<a title="FAQ" href="http://www.apache.org/foundation/faq.html">FAQ</a>
</li>
<li>
<a title="License" href="http://www.apache.org/licenses/" >License</a>
</li>
<li>
<a title="Sponsorship" href="http://www.apache.org/foundation/sponsorship.html">Sponsorship</a>
</li>
<li>
<a title="Thanks" href="http://www.apache.org/foundation/thanks.html">Thanks</a>
</li>
<li>
<a title="Security" href="http://www.apache.org/security/">Security</a>
</li>
</ul>
<a href="http://maven.apache.org/" title="Built by Maven" class="poweredBy">
<img class="poweredBy" alt="Built by Maven" src="./images/logos/maven-feather.png" />
</a>
</div>
</div>
<div id="bodyColumn">
<div id="contentBox">
<div class="section"><h2>Mailbox HBase Responsibility<a name="Mailbox_HBase_Responsibility"></a></h2>
<p>This module provides a mailbox implementation that persists mailboxes, messages, and subscriptions in an HBase cluster.</p>
</div>
<div class="section"><h2>Overview<a name="Overview"></a></h2>
<p>
This section gives an overview of the design and implementation of Mailbox HBase.
<!-- The current design is illustrated in <img src="apache-james-mailbox-hbase/images/james-hbase-mailbox-schema.svg"/>. It is generated from a MindMap file that you can find in the sources. -->
</p>
<div class="section"><h3>Tables<a name="Tables"></a></h3>
<p>The current implementation stores messages, mailboxes and subscriptions in separate tables:</p>
<ul>
<li>JAMES_MAILBOXES - for storing mailboxes.</li>
<li>JAMES_MESSAGES - for storing messages.</li>
<li>JAMES_SUBSCRIPTIONS - for storing user subscriptions.</li>
</ul>
</div>
<div class="section"><h3>Mailbox UID generation<a name="Mailbox_UID_generation"></a></h3>
<p>Each mailbox is identified by a unique
<a class="externalLink" href="http://download.oracle.com/javase/6/docs/api/java/util/UUID.html">UUID</a>.
</p>
</div>
<div class="section"><h3>Message UID generation<a name="Message_UID_generation"></a></h3>
<p>The IMAP specification (RFC 3501) requires message UIDs to be unique within a mailbox and assigned in strictly ascending order. To achieve this, the HBaseUidProvider implementation uses
<a class="externalLink" href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HTable.html#incrementColumnValue(byte[],%20byte[],%20byte[],%20long)">incrementColumnValue</a>,
which atomically increments a counter stored in HBase and returns the new value.
</p>
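<p>The following is a minimal, self-contained Java sketch of the counter semantics this relies on. It does not talk to HBase: the class UidProviderSketch and its method nextUid are hypothetical, and an AtomicLong per mailbox stands in for the HBase cell that incrementColumnValue would update. Like incrementColumnValue, nextUid atomically adds one and returns the new value, so concurrent deliveries to the same mailbox cannot receive the same UID.</p>

```java
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicLong;

/**
 * Hypothetical in-memory stand-in for a per-mailbox UID counter.
 * In HBase the counter would be a table cell bumped with
 * incrementColumnValue(row, family, qualifier, 1L), which adds the amount
 * atomically and returns the new value. Here an AtomicLong plays that role.
 */
public class UidProviderSketch {

    // One counter per mailbox, keyed by the mailbox UUID.
    private final ConcurrentMap<UUID, AtomicLong> counters = new ConcurrentHashMap<>();

    /** Atomically increments the mailbox counter and returns the new value as the next UID. */
    public long nextUid(UUID mailboxId) {
        return counters.computeIfAbsent(mailboxId, id -> new AtomicLong()).incrementAndGet();
    }

    public static void main(String[] args) throws InterruptedException {
        UidProviderSketch provider = new UidProviderSketch();
        UUID inbox = UUID.randomUUID();

        // Two concurrent "deliveries" of 1000 messages each still get distinct UIDs.
        Runnable deliver = () -> {
            for (int i = 0; i < 1000; i++) {
                provider.nextUid(inbox);
            }
        };
        Thread a = new Thread(deliver);
        Thread b = new Thread(deliver);
        a.start();
        b.start();
        a.join();
        b.join();

        System.out.println(provider.nextUid(inbox)); // 2001
    }
}
```

<p>Because the new value is produced and returned by the same atomic operation, no separate read is needed. Which row and column hold the counter in the real implementation is not covered here.</p>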
</div>
<div class="section"><h3>HBase row keys<a name="HBase_row_keys"></a></h3>
<p>HBase accesses values by row key. The current design uses the following row keys:</p>
<ul>
<li>JAMES_MAILBOXES: the row key is the mailbox UUID.</li>
<li>JAMES_MESSAGES: the row key is a compound key formed by concatenating the mailbox UUID and the message UID, with the message UID stored in reverse order.
This groups messages by mailbox and sorts them in descending UID order (most recent first).
</li>
<li>JAMES_SUBSCRIPTIONS: the row key is the user name.</li>
</ul>
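<p>A sketch of how such keys could be built, assuming the mailbox UUID is encoded as its 16 raw bytes (most significant bits first) and that "reverse order" means storing Long.MAX_VALUE - uid as a big-endian long. Both encodings are assumptions for illustration, and RowKeySketch is a hypothetical class. Since HBase sorts row keys by unsigned byte comparison, a scan over one mailbox's key prefix would then return the highest UIDs first.</p>

```java
import java.nio.ByteBuffer;
import java.util.UUID;

/** Hypothetical helpers illustrating the JAMES_MAILBOXES and JAMES_MESSAGES row keys. */
public class RowKeySketch {

    /** Mailbox key (assumed encoding): the UUID as 16 big-endian bytes. */
    static byte[] mailboxRowKey(UUID mailboxId) {
        return ByteBuffer.allocate(16)
                .putLong(mailboxId.getMostSignificantBits())
                .putLong(mailboxId.getLeastSignificantBits())
                .array();
    }

    /** Message key (assumed encoding): mailbox key followed by the reversed message UID. */
    static byte[] messageRowKey(UUID mailboxId, long uid) {
        if (uid < 0) {
            throw new IllegalArgumentException("uid must be non-negative");
        }
        return ByteBuffer.allocate(24)
                .put(mailboxRowKey(mailboxId))
                .putLong(Long.MAX_VALUE - uid) // larger UIDs give smaller keys
                .array();
    }

    /** Unsigned lexicographic comparison, the order HBase uses for row keys. */
    static int compareKeys(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int cmp = Integer.compare(a[i] & 0xff, b[i] & 0xff);
            if (cmp != 0) {
                return cmp;
            }
        }
        return Integer.compare(a.length, b.length);
    }

    public static void main(String[] args) {
        UUID mailbox = UUID.randomUUID();
        byte[] older = messageRowKey(mailbox, 41);
        byte[] newer = messageRowKey(mailbox, 42);
        // Within one mailbox the newer message sorts first.
        System.out.println(compareKeys(newer, older) < 0); // true
    }
}
```

<p>Putting the mailbox key first keeps all of a mailbox's messages in one contiguous key range, and reversing the UID puts the most recent messages at the start of that range.</p>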
</div>
<div class="section"><h3>Misc<a name="Misc"></a></h3>
<p>Message bodies (most importantly, large attachments) sent to many users are stored once for each recipient. There is no space sharing yet.</p>
<p>Message data and message metadata (flags and properties) are stored in separate column families,
so that column family optimization options can be applied to each independently. Keep in mind that message data never changes, while metadata does (for example, when flags are updated).
</p>
</div>
</div>
<div class="section"><h2>Installation<a name="Installation"></a></h2>
<p>For the mailbox implementation to work, it must be able to reach your HBase cluster. Putting
<i>hbase-site.xml</i> on the classpath should be enough: Mailbox HBase picks it up and reads all configuration parameters from it.
</p>
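<p>As a quick sanity check, the hypothetical helper below reports whether hbase-site.xml is visible on the classpath of the running JVM. It only performs a standard class loader resource lookup; it does not read or validate the file, and HBaseSiteCheck is not part of Mailbox HBase.</p>

```java
import java.net.URL;
import java.util.Optional;

/** Hypothetical helper: is a given resource (such as hbase-site.xml) on the classpath? */
public class HBaseSiteCheck {

    static Optional<URL> findOnClasspath(String resourceName) {
        // Prefer the thread context class loader, falling back to this class's loader.
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        if (loader == null) {
            loader = HBaseSiteCheck.class.getClassLoader();
        }
        return Optional.ofNullable(loader.getResource(resourceName));
    }

    public static void main(String[] args) {
        Optional<URL> site = findOnClasspath("hbase-site.xml");
        if (site.isPresent()) {
            System.out.println("hbase-site.xml found at " + site.get());
        } else {
            System.out.println("hbase-site.xml is not on the classpath");
        }
    }
}
```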
</div>
<div class="section"><h2>Mailbox HBase Classes<a name="Mailbox_HBase_Classes"></a></h2>
<p>This is an overview of the most important classes in the implementation.</p>
<div class="section"><h3>HBaseMailboxManager<a name="HBaseMailboxManager"></a></h3>
<p>
<b>HBaseMailboxManager</b> extends the
<b>StoreMailboxManager</b> class.
Its implementation is simple: it overrides the
<i>doCreateMailbox</i> method to return an HBaseMailbox instance and the
<i>createMessageManager</i> method to return an HBaseMessageManager instance.
Otherwise it relies on the default StoreMailboxManager implementation.
</p>
</div>
<div class="section"><h3>HBaseMessageManager<a name="HBaseMessageManager"></a></h3>
<p>
<b>HBaseMessageManager</b> extends StoreMessageManager and provides an implementation of the getPermanentFlags method.
</p>
</div>
<div class="section"><h3>Chunked Streams<a name="Chunked_Streams"></a></h3>
<p>Message bodies vary in size: some carry attachments of 25 MB, and some are even larger.
There are practical limits to the size of an HBase column value (see
<a class="externalLink" href="http://hbase.apache.org/book.html#supported.datatypes">http://hbase.apache.org/book.html#supported.datatypes</a>).
To address this issue, the implementation splits the message into smaller chunks and saves each chunk in a separate column.
The columns are named with increasing integers starting at 1, and there can be at most Long.MAX_VALUE chunks.
</p>
<p>
The work is done by
<b>ChunkInputStream</b> and
<b>ChunkOutputStream</b>, which extend
java.io.InputStream and java.io.OutputStream respectively.
<br />
When reading, data is retrieved with an HBase Get operation and buffered in an internal byte array.
When writing, data is split into chunks whose size is set by the configurable
<b>chunkSize</b> parameter, and stored with an HBase Put operation.
This could be more efficient if HBase supported streaming.
</p>
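<p>The chunking idea can be sketched with plain java.io classes. This is not the ChunkOutputStream or ChunkInputStream code: ChunkSketch is a hypothetical class, and a TreeMap keyed by column qualifier stands in for the columns of a single HBase row, where a real implementation would issue Put and Get operations. Chunks are numbered from 1, and the last chunk may be shorter than chunkSize.</p>

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.SortedMap;
import java.util.TreeMap;

/** Hypothetical sketch: a TreeMap keyed by qualifier (1, 2, 3, ...) stands in for one HBase row. */
public class ChunkSketch {

    /** Buffers bytes and stores every full chunk under the next qualifier, starting at 1. */
    static class ChunkingOutputStream extends OutputStream {
        private final SortedMap<Long, byte[]> columns;
        private final byte[] buffer;
        private int pos = 0;
        private long nextQualifier = 1;

        ChunkingOutputStream(SortedMap<Long, byte[]> columns, int chunkSize) {
            if (chunkSize <= 0) {
                throw new IllegalArgumentException("chunkSize must be positive");
            }
            this.columns = columns;
            this.buffer = new byte[chunkSize];
        }

        @Override
        public void write(int b) {
            buffer[pos++] = (byte) b;
            if (pos == buffer.length) {
                flushChunk();
            }
        }

        /** Copies the buffered bytes into a new "column" (a real store would issue a Put here). */
        private void flushChunk() {
            if (pos > 0) {
                byte[] chunk = new byte[pos];
                System.arraycopy(buffer, 0, chunk, 0, pos);
                columns.put(nextQualifier++, chunk);
                pos = 0;
            }
        }

        @Override
        public void close() {
            flushChunk(); // the last chunk may be shorter than chunkSize
        }
    }

    /** Splits data into chunks of at most chunkSize bytes. */
    static SortedMap<Long, byte[]> split(byte[] data, int chunkSize) {
        SortedMap<Long, byte[]> columns = new TreeMap<>();
        ChunkingOutputStream out = new ChunkingOutputStream(columns, chunkSize);
        for (byte b : data) {
            out.write(b);
        }
        out.close();
        return columns;
    }

    /** Reassembles the body into one byte array, reading chunks in ascending qualifier order. */
    static byte[] join(SortedMap<Long, byte[]> columns) {
        ByteArrayOutputStream all = new ByteArrayOutputStream();
        for (byte[] chunk : columns.values()) {
            all.write(chunk, 0, chunk.length);
        }
        return all.toByteArray();
    }

    public static void main(String[] args) {
        byte[] body = "Hello, chunked HBase world!".getBytes(StandardCharsets.US_ASCII); // 27 bytes
        SortedMap<Long, byte[]> columns = split(body, 10);
        System.out.println(columns.keySet()); // [1, 2, 3]
        System.out.println(new String(join(columns), StandardCharsets.US_ASCII));
    }
}
```

<p>Buffering a whole chunk before storing it keeps the number of stored columns proportional to the body size divided by chunkSize, rather than to the number of write calls.</p>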
</div>
<div class="section"><h3>HBaseMessage<a name="HBaseMessage"></a></h3>
<p><b>HBaseMessage</b> extends AbstractMessage and represents a message in the message store.
It is important to remember that the current implementation retrieves only the message metadata from HBase
and uses ChunkInputStream to load the message body only when it is needed.
</p>
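<p>A minimal sketch of this deferred loading, assuming a Supplier stands in for the code that reads the body chunks. LazyMessageSketch is hypothetical and keeps only the split between eagerly held metadata and an on-demand body; it does not cache the body, so each call to getBody reads it again.</p>

```java
import java.nio.charset.StandardCharsets;
import java.util.function.Supplier;

/** Hypothetical illustration: metadata is held eagerly, the body is loaded on demand. */
public class LazyMessageSketch {

    private final long uid;
    private final String flags;                 // cheap metadata, available immediately
    private final Supplier<byte[]> bodyLoader;  // expensive read, deferred until requested
    private int loads = 0;                      // counts loader invocations, for illustration

    LazyMessageSketch(long uid, String flags, Supplier<byte[]> bodyLoader) {
        this.uid = uid;
        this.flags = flags;
        this.bodyLoader = bodyLoader;
    }

    long getUid() {
        return uid;
    }

    String getFlags() {
        return flags;
    }

    /** Runs the loader only now; constructing the message read no body data. */
    byte[] getBody() {
        loads++;
        return bodyLoader.get();
    }

    int loadCount() {
        return loads;
    }

    public static void main(String[] args) {
        LazyMessageSketch msg = new LazyMessageSketch(42, "\\Seen",
                () -> "Subject: hi\r\n\r\nbody".getBytes(StandardCharsets.US_ASCII));
        System.out.println(msg.getFlags() + " loads=" + msg.loadCount()); // \Seen loads=0
        msg.getBody();
        System.out.println("loads=" + msg.loadCount()); // loads=1
    }
}
```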
</div>
</div>
</div>
</div>
<div class="clear">
<hr/>
</div>
<div id="footer">
<div class="xright">Copyright &#169; 2010-2012
<a href="http://www.apache.org/">The Apache Software Foundation</a>.
All Rights Reserved.
</div>
<div class="clear">
<hr/>
</div>
</div>
</body>
</html>