<?xml version="1.0" encoding="UTF-8"?>
<document>
<properties>
<author email="akarasulu@apache.org">Alex Karasulu</author>
<title>Partitions</title>
</properties>
<body>
<section name="Partitions">
<p>
Partitions are entry stores assigned to a naming context. The idea
behind a partition is that it stores a subset of the Directory
Information Base (DIB). Partitions can be implemented in any way so
long as they adhere to the partition interfaces.
</p>
<subsection name="Status">
<p>
Presently the server has a single partition implementation. This
implementation is used for both the system partition and user
partitions. It uses <a href="http://jdbm.sourceforge.net/">JDBM</a>
as the underlying B+Tree implementation for storing entries.
</p>
<p>
Other implementations are possible. I'm particularly interested in
memory based partitions, either BTree based or based on something like
Prevayler.
</p>
<p>
Partitions have simple interfaces that can be used to align any data
source to the LDAP data model thereby accessing it via JNDI or via
LDAP over the wire. This makes the server very flexible as a bridge
to standardize access to disparate data sources and formats. Dynamic
mapping based backends are also interesting.
</p>
</subsection>
<subsection name="System Partition">
<p>
The system partition is a very special partition that is hardcoded to
hang off of the <b>ou=system</b> naming context. It is always present
and contains administrative and operational information needed by the
server to operate. Hence its name.
</p>
<p>
The server's subsystems will use this partition to store information
critical to their operation. Things like triggers, stored procedures,
access control instructions and schema information can be maintained
here.
</p>
</subsection>
<subsection name="Root Nexus">
<p>
Several partitions can be assigned to different naming contexts within
the server so long as their names do not overlap, that is, no
partition's naming context is contained within another's. The root
nexus is a fake partition that does not really store entries. It maps
other entry storing partitions to naming contexts and routes backing
store calls to the partition containing the entry associated with the
operation.
</p>
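<p>
As a rough illustration of the routing described above, here is a
minimal sketch. It is not the server's actual nexus code: the class
name <code>NexusSketch</code>, its methods and the naive string based
DN normalization are all assumptions made for this example. Real DN
matching must also deal with escaped commas and attribute aliases.
</p>
<source>
```java
import java.util.Locale;

/** Hypothetical sketch of nexus-style routing; not the server's classes. */
class NexusSketch {
    // Naming contexts (suffixes) of the registered partitions.
    private final String[] suffixes;

    NexusSketch(String[] suffixes) {
        this.suffixes = suffixes;
    }

    /** Naive normalization: strip spaces around commas, then lower-case. */
    static String normalize(String dn) {
        return dn.trim().replaceAll("\\s*,\\s*", ",").toLowerCase(Locale.ROOT);
    }

    /** Returns the suffix of the partition holding the entry, or null. */
    String route(String dn) {
        String target = normalize(dn);
        for (String suffix : suffixes) {
            String s = normalize(suffix);
            // An entry lives in a partition when its DN is the suffix
            // itself or ends with "," followed by the suffix.
            if (target.equals(s) || target.endsWith("," + s)) {
                return suffix;
            }
        }
        return null; // no partition is mapped to this name
    }

    public static void main(String[] args) {
        NexusSketch nexus = new NexusSketch(
            new String[] { "ou=system", "dc=apache,dc=org", "ou=test" });
        System.out.println(nexus.route("cn=Alex, dc=apache, dc=org"));
    }
}
```
</source>
<p>
Because naming contexts may not overlap, at most one suffix can match
a given name, so the sketch simply returns the first match.
</p>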
</subsection>
<subsection name="User Partitions">
<p>
User partitions are partitions added by users. When you download and
start using the server you may want to create a separate partition to
store the entries of your application. To us, user partitions
(sometimes also referred to as application partitions) are simply those
that are not the system partition. In the following section we describe
how a user partition can be created in the server.
</p>
</subsection>
</section>
<section name="Adding User Partitions">
<p>
Adding new application partitions to the server is a matter of
setting the right JNDI environment properties. These properties are
used in both standalone and embedded configurations. We will show
you how to configure partitions by example, using properties files and
programmatically.
</p>
<subsection name="Using Properties Files">
<p>
Properties files are not the best way to configure a large system
like an LDAP server. However, they are the JNDI standard for pulling
in configuration, and the server's JNDI provider tries to honor this,
hence their use here. Below is the configuration of two user defined
partitions within a properties file. These partitions are for the
naming contexts:
<code>dc=apache,dc=org</code> and <code>ou=test</code>.
</p>
<source>
# all multivalued properties are space separated like the list of partitions here
server.db.partitions=apache test
# apache partition configuration
server.db.partition.suffix.apache=dc=apache,dc=org
server.db.partition.indices.apache=ou cn objectClass uid
server.db.partition.attributes.apache.dc=apache
server.db.partition.attributes.apache.objectClass=top domain extensibleObject
# test partition configuration
server.db.partition.suffix.test=ou=test
server.db.partition.indices.test=ou objectClass
server.db.partition.attributes.test.ou=test
server.db.partition.attributes.test.objectClass=top organizationalUnit extensibleObject
</source>
<p>
Although somewhat ugly, the way we use properties for settings is
portable across JNDI LDAP providers. Hopefully we can build a tool
on top of this to save the user some hassle. Another approach may be
to use XML, or some other format that is easier to write, and generate
these properties from it. For now it's the best means we have, without
being specific to the server's provider, to inject settings through
JNDI environment Hashtables while still being able to load them from
properties files. Properties from properties files are the common
denominator. An easier way to configure the server is possible
programmatically.
</p>
<h3>Partition Id</h3>
<p>
Briefly, we'll explain these properties and the scheme used. The
properties belonging to a partition are tied together by the
partition's id. All partition ids are listed as a space separated list
using the <b>server.db.partitions</b> property: above it lists the ids
for the two partitions, <i>apache</i> and <i>test</i>.
</p>
<h3>Naming Context</h3>
<p>
Partitions need to know the naming context they will store entries
for. This naming context is also referred to as the suffix since all
entries in the partition have this common suffix. The suffix is a
distinguished name. The property key for the suffix of a partition is
composed of the following property key base
<b>server.db.partition.suffix.</b> concatenated with the id of the
partition: <b>server.db.partition.suffix.</b><i>${id}</i>. For example
if the partition id is foo, then the suffix key would be
<b>server.db.partition.suffix.foo</b>.
</p>
<h3>User Defined Indices</h3>
<p>
Partitions can have indices on attributes. Unlike OpenLDAP, where you
can build specific types of indices, the server's indices are of a
single type. For each partition, a key is assembled from the
property key base and the partition id:
<b>server.db.partition.indices.</b><i>${id}</i>. So
again for foo the key for attribute indices would be
<b>server.db.partition.indices.foo</b>. Its value is a space separated
list of attributeType names to index. For example, the apache
partition has indices built on <b>ou</b>, <b>cn</b>,
<b>objectClass</b> and <b>uid</b>.
</p>
<h3>Suffix Entry</h3>
<p>
When creating a context the root entry of the context corresponding
to the suffix of the partition must be created. This entry is
composed of single-valued and multi-valued attributes. We must
specify these attributes as well as their values. To do so we again
use a key composed of a base, however this time we use both the id
of the partition and the name of the attribute:
<b>server.db.partition.attributes.</b><i>${id}</i>.<i>${name}</i>. So
for partition foo and attribute bar the following key would be used:
<b>server.db.partition.attributes.foo.bar</b>. The value of the key
is a space separated list of values for the bar attribute. For
example the apache partition's suffix has an objectClass attribute
and its values are set to: top domain extensibleObject.
</p>
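<p>
The key scheme described above can be sketched in a few lines of Java.
This is only an illustration: the <code>PartitionKeys</code> class and
its method names are hypothetical and not part of the server. Only the
property key bases and the sample values are taken from the properties
file shown earlier.
</p>
<source>
```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

/** Hypothetical helper illustrating the key scheme; not part of the server. */
class PartitionKeys {
    static final String PARTITIONS = "server.db.partitions";
    static final String SUFFIX = "server.db.partition.suffix.";
    static final String INDICES = "server.db.partition.indices.";
    static final String ATTRIBUTES = "server.db.partition.attributes.";

    static String suffixKey(String id) {
        return SUFFIX + id;
    }

    static String indicesKey(String id) {
        return INDICES + id;
    }

    static String attributeKey(String id, String name) {
        return ATTRIBUTES + id + "." + name;
    }

    /** Splits a space separated multivalued property; empty if absent. */
    static String[] values(Properties props, String key) {
        String raw = props.getProperty(key, "").trim();
        return raw.isEmpty() ? new String[0] : raw.split("\\s+");
    }

    public static void main(String[] args) throws IOException {
        // A cut-down copy of the properties file from the example above.
        Properties props = new Properties();
        props.load(new StringReader(
            "server.db.partitions=apache test\n"
            + "server.db.partition.suffix.apache=dc=apache,dc=org\n"
            + "server.db.partition.indices.apache=ou cn objectClass uid\n"
            + "server.db.partition.suffix.test=ou=test\n"
            + "server.db.partition.indices.test=ou objectClass\n"));

        // Walk every partition id and look up its per-partition settings.
        for (String id : values(props, PARTITIONS)) {
            System.out.println(id + " -> " + props.getProperty(suffixKey(id)));
            for (String index : values(props, indicesKey(id))) {
                System.out.println("  index on " + index);
            }
        }
    }
}
```
</source>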
</subsection>
<subsection name="Programatically">
<p>
This is simple: create a Hashtable and stuff it with those properties.
But that's a real pain. The other option is to set all the properties
that way except the ones for the suffix entry's attributes. We have
a shortcut where you can set an Attributes object within the Hashtable
and it will get picked up instead of using the standard property
scheme above.
</p>
<p>
Simply put the Attributes into the Hashtable using the following
key <b>server.db.partition.attributes.</b><i>${id}</i>. Below we show
how this can be done for two partitions, <i>testing</i> and
<i>example</i>:
</p>
<source>
// Assumes java.util.Hashtable, the javax.naming.directory classes
// (BasicAttributes, BasicAttribute) and the server's EnvKeys are imported.
// The JNDI environment the settings are placed into.
Hashtable extras = new Hashtable();

// suffix entry attributes for the testing partition
BasicAttributes attrs = new BasicAttributes( true );
BasicAttribute attr = new BasicAttribute( "objectClass" );
attr.add( "top" );
attr.add( "organizationalUnit" );
attr.add( "extensibleObject" );
attrs.put( attr );
attr = new BasicAttribute( "ou" );
attr.add( "testing" );
attrs.put( attr );

// list both partition ids, then configure the testing partition
extras.put( EnvKeys.PARTITIONS, "testing example" );
extras.put( EnvKeys.SUFFIX + "testing", "ou=testing" );
extras.put( EnvKeys.INDICES + "testing", "ou objectClass" );
extras.put( EnvKeys.ATTRIBUTES + "testing", attrs );

// suffix entry attributes for the example partition
attrs = new BasicAttributes( true );
attr = new BasicAttribute( "objectClass" );
attr.add( "top" );
attr.add( "domain" );
attr.add( "extensibleObject" );
attrs.put( attr );
attr = new BasicAttribute( "dc" );
attr.add( "example" );
attrs.put( attr );

// configure the example partition
extras.put( EnvKeys.SUFFIX + "example", "dc=example" );
extras.put( EnvKeys.INDICES + "example", "ou dc objectClass" );
extras.put( EnvKeys.ATTRIBUTES + "example", attrs );
</source>
<p>
OK, that does not look any shorter. We'll add to this in the future.
Perhaps we will enable configuration beans that can be used with an
SPI specific to the server. However, this starts making your code
specific to the server's provider: you can no longer just change
properties and switch to the SUN provider, so your code stops being
location independent.
</p>
</subsection>
</section>
<section name="Future Progress">
<subsection name="Partition Nesting">
<p>
Today we have some limitations to the way we can partition the DIB.
Namely, we can't have a partition within a partition, and sometimes
this makes sense. Eventually we intend to enable this kind of
functionality using a special type of nexus which is both a router
and a backing store for entries. It's smart enough to know what to
route versus when to use its own database. Here's a <a href=
"http://issues.apache.org/jira/browse/DIREVE-23">JIRA improvement</a>
specifically aimed at achieving this goal.
</p>
</subsection>
<subsection name="Partition Variety">
<p>
Obviously we want as many different kinds of partitions as possible.
Some really cool ideas have floated around out there for a while.
Here's a list of theoretically possible partition types that might
be useful or just cool:
</p>
<ul>
<li>
Partitions that use JDBC to store entries. These would probably
be way too slow. However they might be useful if some mapping
were to be used to represent an existing application's database
schema as an LDAP DIT. This would allow us to expose any database
data via LDAP.
</li>
<li>
Partitions using other LDAP servers to store their entries. Why
do this when it introduces latency? Perhaps you want to proxy other
servers or make other servers behave like the server.
</li>
<li>
A partition that serves out the Windows registry via LDAP. A
standard mechanism to map the Windows registry to an LDAP DIT is
pretty simple. This would be a neat way to expose client machine
registry management.
</li>
<li>
A partition based on SleepyCat's JE. I was going to try this
and see how it performs against JDBM.
</li>
<li>
A partition based on an in-memory BTree implementation. This would
be fast and really cool for storing things like schema info. It
would also be cool for staging data between memory and disk.
</li>
<li>
A partition based on Prevayler. This is like an in-memory partition
but you can save it at the end of the day. This might be really
useful, especially for things like the system partition which almost
always need to be in memory. The system partition can do this by
using really large caches, sized to the number of entries in the
system partition.
</li>
</ul>
</subsection>
<subsection name="Partitioning entries under a single context?">
<p>
Other aspirations include entry partitioning within a container
context. Imagine having 250 million entries under
<code>ou=citizens,dc=census,dc=gov</code>. You don't want all 250
million in one partition but would like to sub-partition these entries
under the same context based on some attribute. The attribute's value
would be used to hash entries across buckets (the buckets are other
partitions). Yeah this is a bit wild but it would be useful in several
situations.
</p>
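<p>
To make the hashing idea concrete, here is a minimal sketch. Nothing
like this exists in the server today: the <code>BucketRouter</code>
class, the bucket names and the choice of <code>String.hashCode()</code>
as the hash function are all assumptions made for illustration only.
</p>
<source>
```java
/** Hypothetical sketch: hash an attribute value across bucket partitions. */
class BucketRouter {
    // Ids of the partitions acting as buckets under one context.
    private final String[] buckets;

    BucketRouter(String[] buckets) {
        if (buckets.length == 0) {
            throw new IllegalArgumentException("at least one bucket is required");
        }
        this.buckets = buckets;
    }

    /** Picks the bucket partition for an entry from its attribute value. */
    String bucketFor(String attributeValue) {
        // floorMod keeps the index non-negative even for negative hash codes.
        int index = Math.floorMod(attributeValue.hashCode(), buckets.length);
        return buckets[index];
    }

    public static void main(String[] args) {
        BucketRouter router = new BucketRouter(
            new String[] { "citizens0", "citizens1", "citizens2", "citizens3" });
        // The same attribute value always lands in the same bucket.
        System.out.println(router.bucketFor("some-attribute-value"));
    }
}
```
</source>
<p>
A plain modulo like this reshuffles most entries whenever the number
of buckets changes, so a real design would need to consider how
buckets are added later.
</p>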
</subsection>
</section>
</body>
</document>