<?xml version="1.0" encoding="UTF-8"?>
 <document>
   <properties>
     <author email="akarasulu@apache.org">Alex Karasulu</author>
     <title>Partitions</title>
   </properties>

   <body>
     <section name="Partitions">
       <p>
         Partitions are entry stores assigned to a naming context.  The idea
         behind a partition is that it stores a subset of the Directory
         Information Base (DIB).  Partitions can be implemented in any way, so
         long as they adhere to the partition interfaces.
       </p>

       <subsection name="Status">
         <p>
           Presently the server has a single partition implementation.  This
           implementation is used for both the system partition and user
           partitions.  It uses <a href="http://jdbm.sourceforge.net/">JDBM</a>
           as the underlying B+Tree implementation for storing entries.
         </p>

         <p>
           Other implementations are possible.  I'm particularly interested in
           memory-based partitions, either BTree-based or built on something
           like Prevayler.
         </p>

         <p>
           Partitions have simple interfaces that can be used to align any data
           source with the LDAP data model, so that it can be accessed via JNDI
           or via LDAP over the wire.  This makes the server very flexible as a
           bridge that standardizes access to disparate data sources and
           formats.  Backends based on dynamic mapping are also interesting.
         </p>
       </subsection>

       <subsection name="System Partition">
         <p>
           The system partition is a very special partition that is hardcoded to
           hang off of the <b>ou=system</b> naming context.  It is always present
           and contains administrative and operational information needed by the
           server to operate.  Hence its name.
         </p>

         <p>
           The server's subsystems will use this partition to store information
           critical to their operation.  Things like triggers, stored procedures,
           access control instructions and schema information can be maintained
           here.
         </p>
       </subsection>

       <subsection name="Root Nexus">
         <p>
           Several partitions can be assigned to different naming contexts
           within the server, so long as their names do not overlap such that
           one partition's naming context is contained within another's.  The
           root nexus is a virtual partition that does not actually store
           entries.  It maps the other, entry-storing partitions to their naming
           contexts and routes each backing store call to the partition
           containing the entry associated with the operation.
         </p>
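         <p>
           The routing rule can be sketched with the standard
           <code>javax.naming.ldap.LdapName</code> class, whose
           <code>startsWith</code> method compares names starting from the
           rightmost RDN, that is, by suffix.  The <code>RootNexusSketch</code>
           class and its methods are hypothetical illustrations, not the
           server's nexus API: the sketch rejects naming contexts that overlap
           and returns the suffix of the single partition that holds a given
           entry.
         </p>

 <source>
 import javax.naming.InvalidNameException;
 import javax.naming.ldap.LdapName;

 /** Hypothetical sketch of the root nexus routing rule; not the server's API. */
 class RootNexusSketch {
     private final LdapName[] suffixes;

     RootNexusSketch(String... namingContexts) {
         suffixes = new LdapName[namingContexts.length];
         int i = 0;
         for (String context : namingContexts) {
             suffixes[i++] = dn(context);
         }
         // Naming contexts must not overlap: no suffix may sit inside another.
         for (LdapName a : suffixes) {
             for (LdapName b : suffixes) {
                 if (a != b) {
                     if (a.startsWith(b)) {
                         throw new IllegalArgumentException(a + " is contained in " + b);
                     }
                 }
             }
         }
     }

     /** Returns the suffix of the partition holding the entry, or null if none does. */
     LdapName route(String entryDn) {
         LdapName name = dn(entryDn);
         for (LdapName suffix : suffixes) {
             // LdapName indexes RDNs from the right, so startsWith tests the suffix.
             if (name.startsWith(suffix)) {
                 return suffix;
             }
         }
         return null;
     }

     /** Parses a DN, turning the checked exception into an unchecked one. */
     static LdapName dn(String name) {
         try {
             return new LdapName(name);
         } catch (InvalidNameException e) {
             throw new IllegalArgumentException("Bad DN: " + name, e);
         }
     }

     public static void main(String[] args) {
         RootNexusSketch nexus = new RootNexusSketch("ou=system", "dc=apache,dc=org", "ou=test");
         System.out.println(nexus.route("uid=admin,ou=system"));         // ou=system
         System.out.println(nexus.route("ou=people,dc=apache,dc=org"));  // dc=apache,dc=org
         System.out.println(nexus.route("dc=example,dc=com"));           // null
     }
 }
 </source>

         <p>
           A linear scan is enough here because, with non-overlapping naming
           contexts, at most one suffix can match any entry.
         </p>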
       </subsection>

       <subsection name="User Partitions">
         <p>
           User partitions are partitions added by users.  When you download and
           start using the server, you may want to create a separate partition
           to store the entries of your application.  To us, user partitions
           (sometimes also referred to as application partitions) are simply
           those that are not the system partition.  In the following section we
           describe how a user partition can be created in the server.
         </p>
       </subsection>
     </section>

     <section name="Adding User Partitions">
       <p>
         Adding new application partitions to the server is a matter of
         setting the right JNDI environment properties.  These properties are
         used in both standalone and embedded configurations.  We will show
         you how to configure partitions by example, both with properties files
         and programmatically.
       </p>

       <subsection name="Using Properties Files">
         <p>
           Properties files are obviously not the best way to configure a large
           system like an LDAP server.  However, properties files are the JNDI
           standard for pulling in configuration, and the server's JNDI provider
           tries to honor this; hence the use of a properties file for
           configuration.  Below is the configuration of two user-defined
           partitions in a properties file.  These partitions are for the naming
           contexts <code>dc=apache,dc=org</code> and <code>ou=test</code>.

     <source>
 # all multivalued properties are space separated, like the list of partitions here
 server.db.partitions=apache test

 # apache partition configuration
 server.db.partition.suffix.apache=dc=apache,dc=org
 server.db.partition.indices.apache=ou cn objectClass uid
 server.db.partition.attributes.apache.dc=apache
 server.db.partition.attributes.apache.objectClass=top domain extensibleObject

 # test partition configuration
 server.db.partition.suffix.test=ou=test
 server.db.partition.indices.test=ou objectClass
 server.db.partition.attributes.test.ou=test
 server.db.partition.attributes.test.objectClass=top organizationalUnit extensibleObject
     </source>

         <p>
           Although somewhat ugly, this way of using properties for settings is
           portable across JNDI LDAP providers.  Hopefully we can build a tool
           on top of it to save users some hassle; another approach may be to
           generate these properties from XML or some other easier format.  For
           now it is the best provider-neutral means we have to inject settings
           through JNDI environment Hashtables while still being able to load
           them from properties files, which remain the common denominator.  The
           server can also be configured programmatically, which some may find
           easier.
         </p>
         <h3>Partition Id</h3>
         <p>
           Briefly, we'll explain these properties and the scheme used.  Each
           partition's properties are grouped together by the partition's id.
           All partition ids are listed, space separated, in the
           <b>server.db.partitions</b> property: above it lists the ids of the
           two partitions, <i>apache</i> and <i>test</i>.
         </p>
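         <p>
           As a sketch of this scheme, the standard
           <code>java.util.Properties</code> class is enough to recover the
           partition ids.  The <code>PartitionIds</code> class and its
           <code>parse</code> method are hypothetical helpers, not part of the
           server; they only split the value of <b>server.db.partitions</b> on
           whitespace.
         </p>

 <source>
 import java.io.IOException;
 import java.io.StringReader;
 import java.util.Arrays;
 import java.util.Properties;

 /** Hypothetical helper: reads the partition ids listed in server.db.partitions. */
 class PartitionIds {
     static final String PARTITIONS_KEY = "server.db.partitions";

     static String[] parse(Properties props) {
         String value = props.getProperty(PARTITIONS_KEY, "").trim();
         // An empty value means no user partitions; split would yield one empty id.
         return value.isEmpty() ? new String[0] : value.split("\\s+");
     }

     public static void main(String[] args) throws IOException {
         Properties props = new Properties();
         props.load(new StringReader("server.db.partitions=apache test\n"));
         System.out.println(Arrays.toString(parse(props)));  // [apache, test]
     }
 }
 </source>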
         <h3>Naming Context</h3>
         <p>
           Partitions need to know the naming context they will store entries
           for.  This naming context is also referred to as the suffix, since
           all entries in the partition share it as a common suffix.  The suffix
           is a distinguished name.  The property key for a partition's suffix
           is the property key base <b>server.db.partition.suffix.</b>
           concatenated with the id of the partition:
           <b>server.db.partition.suffix.</b><i>${id}</i>.  For example, if the
           partition id is foo, then the suffix key would be
           <b>server.db.partition.suffix.foo</b>.
         </p>
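         <p>
           Building the key and parsing its value can be sketched as follows.
           The <code>PartitionSuffix</code> class is a hypothetical helper that
           uses only <code>java.util.Properties</code> and
           <code>javax.naming.ldap.LdapName</code>; treating a missing or
           malformed suffix as a configuration error is an assumption of this
           sketch, and the suffix <code>dc=foo,dc=com</code> is made up for the
           example.
         </p>

 <source>
 import java.util.Properties;
 import javax.naming.InvalidNameException;
 import javax.naming.ldap.LdapName;

 /** Hypothetical helper: resolves server.db.partition.suffix.${id} to a DN. */
 class PartitionSuffix {
     static final String SUFFIX_BASE = "server.db.partition.suffix.";

     /** The key base concatenated with the partition id. */
     static String key(String id) {
         return SUFFIX_BASE + id;
     }

     static LdapName suffix(Properties props, String id) {
         String value = props.getProperty(key(id));
         if (value == null) {
             throw new IllegalArgumentException("No suffix configured for partition " + id);
         }
         try {
             return new LdapName(value.trim());
         } catch (InvalidNameException e) {
             throw new IllegalArgumentException("Bad suffix for partition " + id + ": " + value, e);
         }
     }

     public static void main(String[] args) {
         Properties props = new Properties();
         props.setProperty(key("foo"), "dc=foo,dc=com");
         System.out.println(key("foo"));            // server.db.partition.suffix.foo
         System.out.println(suffix(props, "foo"));  // dc=foo,dc=com
     }
 }
 </source>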
         <h3>User Defined Indices</h3>
         <p>
           Partitions can have indices on attributes.  Unlike OpenLDAP, where
           you can build specific types of indices, the server's indices are all
           of a single type.  For each partition, a key is assembled from the
           property key base and the partition id:
           <b>server.db.partition.indices.</b><i>${id}</i>.  So again, for foo
           the key for attribute indices would be
           <b>server.db.partition.indices.foo</b>.  Its value is a space
           separated list of attributeType names to index.  For example, the
           apache partition has indices built on <b>ou</b>, <b>cn</b>,
           <b>objectClass</b> and <b>uid</b>.
         </p>
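         <p>
           Reading this list back can be sketched in the same way.
           <code>PartitionIndices</code> is a hypothetical helper; dropping
           repeated names is a choice made by this sketch, not documented server
           behavior.
         </p>

 <source>
 import java.util.Arrays;
 import java.util.Properties;

 /** Hypothetical helper: reads server.db.partition.indices.${id}. */
 class PartitionIndices {
     static final String INDICES_BASE = "server.db.partition.indices.";

     static String[] indices(Properties props, String id) {
         String value = props.getProperty(INDICES_BASE + id, "").trim();
         if (value.isEmpty()) {
             return new String[0];  // no user-defined indices for this partition
         }
         // Space-separated attributeType names; repeated names are dropped.
         return Arrays.stream(value.split("\\s+")).distinct().toArray(String[]::new);
     }

     public static void main(String[] args) {
         Properties props = new Properties();
         props.setProperty(INDICES_BASE + "apache", "ou cn objectClass uid");
         System.out.println(Arrays.toString(indices(props, "apache")));  // [ou, cn, objectClass, uid]
     }
 }
 </source>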
         <h3>Suffix Entry</h3>
         <p>
           When the partition's context is created, its root entry (the entry
           whose distinguished name is the partition's suffix) must be created
           too.  This entry is composed of single-valued and multi-valued
           attributes, so we must specify both the attributes and their values.
           To do so we again use a key composed of a base, but this time we
           append both the id of the partition and the name of the attribute:
           <b>server.db.partition.attributes.</b><i>${id}</i>.<i>${name}</i>.  So
           for partition foo and attribute bar the key would be
           <b>server.db.partition.attributes.foo.bar</b>.  The value of the key
           is a space separated list of values for the bar attribute.  For
           example, the apache partition's suffix entry has an objectClass
           attribute whose values are top, domain and extensibleObject.
         </p>
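         <p>
           The key scheme above can be sketched as a small parser that folds
           every <b>server.db.partition.attributes.</b><i>${id}</i>.<i>${name}</i>
           property into a <code>javax.naming.directory.BasicAttributes</code>
           object, the same kind of Attributes object the programmatic shortcut
           accepts.  <code>SuffixEntryAttributes</code> is a hypothetical helper,
           not the server's own parser; it assumes everything after the id and
           its trailing dot is the attribute name, and it skips empty values.
         </p>

 <source>
 import java.util.Properties;
 import javax.naming.NamingException;
 import javax.naming.directory.Attribute;
 import javax.naming.directory.BasicAttribute;
 import javax.naming.directory.BasicAttributes;

 /** Hypothetical helper: builds the suffix entry from server.db.partition.attributes.${id}.${name}. */
 class SuffixEntryAttributes {
     static final String ATTRIBUTES_BASE = "server.db.partition.attributes.";

     static BasicAttributes attributes(Properties props, String id) {
         String prefix = ATTRIBUTES_BASE + id + ".";
         BasicAttributes attrs = new BasicAttributes(true);  // ignore attribute-name case
         for (String key : props.stringPropertyNames()) {
             // Skip keys of other partitions and keys without an attribute name.
             if (!key.startsWith(prefix) || key.length() == prefix.length()) {
                 continue;
             }
             Attribute attr = new BasicAttribute(key.substring(prefix.length()));
             // The value is a space-separated list of values for this attribute.
             for (String value : props.getProperty(key).trim().split("\\s+")) {
                 if (!value.isEmpty()) {
                     attr.add(value);
                 }
             }
             attrs.put(attr);
         }
         return attrs;
     }

     public static void main(String[] args) throws NamingException {
         Properties props = new Properties();
         props.setProperty(ATTRIBUTES_BASE + "apache.dc", "apache");
         props.setProperty(ATTRIBUTES_BASE + "apache.objectClass", "top domain extensibleObject");
         BasicAttributes attrs = attributes(props, "apache");
         System.out.println(attrs.get("objectclass").size());  // 3
         System.out.println(attrs.get("dc").get());            // apache
     }
 }
 </source>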
       </subsection>

       <subsection name="Programatically">
         <p>
           This is simple: create a Hashtable and stuff it with the same
           properties.  But doing that for every suffix entry attribute is a
           real pain.  The other option is to set all the properties that way
           except the ones for the suffix entry's attributes.  For those we have
           a shortcut: you can put an Attributes object into the Hashtable, and
           it will be picked up instead of the per-attribute property scheme
           described above.
         </p>

         <p>
           Simply put the Attributes object into the Hashtable under the key
           <b>server.db.partition.attributes.</b><i>${id}</i>.  Below we show
           how this can be done for two partitions like the ones above, with the
           ids <i>testing</i> and <i>example</i>:

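 <source>
 import java.util.Hashtable;
 import javax.naming.directory.BasicAttribute;
 import javax.naming.directory.BasicAttributes;

 // EnvKeys holds the server's environment property keys (server.db.partitions,
 // server.db.partition.suffix., and so on); import it from the server's JNDI provider.

 // the JNDI environment Hashtable the partition settings are added to
 Hashtable&lt;String, Object&gt; extras = new Hashtable&lt;String, Object&gt;();

 // suffix entry for the "testing" partition: ou=testing
 BasicAttributes attrs = new BasicAttributes( true );  // true: ignore attribute-name case
 BasicAttribute attr = new BasicAttribute( "objectClass" );
 attr.add( "top" );
 attr.add( "organizationalUnit" );
 attr.add( "extensibleObject" );
 attrs.put( attr );
 attr = new BasicAttribute( "ou" );
 attr.add( "testing" );
 attrs.put( attr );

 // list both partition ids, then configure "testing"
 extras.put( EnvKeys.PARTITIONS, "testing example" );
 extras.put( EnvKeys.SUFFIX + "testing", "ou=testing" );
 extras.put( EnvKeys.INDICES + "testing", "ou objectClass" );
 // an Attributes object replaces the per-attribute property keys
 extras.put( EnvKeys.ATTRIBUTES + "testing", attrs );

 // suffix entry for the "example" partition: dc=example
 attrs = new BasicAttributes( true );
 attr = new BasicAttribute( "objectClass" );
 attr.add( "top" );
 attr.add( "domain" );
 attr.add( "extensibleObject" );
 attrs.put( attr );
 attr = new BasicAttribute( "dc" );
 attr.add( "example" );
 attrs.put( attr );

 extras.put( EnvKeys.SUFFIX + "example", "dc=example" );
 extras.put( EnvKeys.INDICES + "example", "ou dc objectClass" );
 extras.put( EnvKeys.ATTRIBUTES + "example", attrs );
 </source>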

         <p>
           Ok, that does not look any shorter.  We'll add to this in the future.
           Perhaps we will enable the use of configuration beans with an SPI
           specific to the server.  However, this starts making your code
           server-provider specific: you could no longer just change properties
           and use the SUN provider to keep your code location independent.
         </p>
       </subsection>
     </section>

     <section name="Future Progress">
       <subsection name="Partition Nesting">
         <p>
           Today we have some limitations on the way we can partition the DIB.
           Namely, we cannot have a partition within another partition, even
           though sometimes this makes sense.  Eventually we intend to enable
           this kind of functionality using a special type of nexus that is both
           a router and a backing store for entries.  It would be smart enough to
           know what to route versus when to use its own database.  Here's a <a href=
           "http://issues.apache.org/jira/browse/DIREVE-23">JIRA improvement</a>
           specifically aimed at achieving this goal.
         </p>
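         <p>
           The routing decision such a nexus would need can be sketched as a
           longest-suffix match: when partitions nest, an entry belongs to the
           deepest partition whose suffix it ends with.
           <code>NestedRouterSketch</code> is hypothetical and only illustrates
           that rule; it does not show how the improvement linked above is
           actually implemented.
         </p>

 <source>
 import javax.naming.InvalidNameException;
 import javax.naming.ldap.LdapName;

 /** Hypothetical sketch: picks the most specific of several possibly nested suffixes. */
 class NestedRouterSketch {
     static LdapName dn(String name) {
         try {
             return new LdapName(name);
         } catch (InvalidNameException e) {
             throw new IllegalArgumentException("Bad DN: " + name, e);
         }
     }

     /** Returns the longest suffix that the entry DN ends with, or null if none matches. */
     static LdapName route(String entryDn, String... suffixes) {
         LdapName name = dn(entryDn);
         LdapName best = null;
         for (String s : suffixes) {
             LdapName suffix = dn(s);
             if (name.startsWith(suffix)) {
                 // Prefer the suffix with more RDNs: it names the deeper, nested partition.
                 if (best == null || suffix.size() > best.size()) {
                     best = suffix;
                 }
             }
         }
         return best;
     }

     public static void main(String[] args) {
         String[] suffixes = { "dc=apache,dc=org", "ou=users,dc=apache,dc=org" };
         System.out.println(route("uid=jdoe,ou=users,dc=apache,dc=org", suffixes));  // ou=users,dc=apache,dc=org
         System.out.println(route("ou=groups,dc=apache,dc=org", suffixes));          // dc=apache,dc=org
     }
 }
 </source>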
       </subsection>

       <subsection name="Partition Variety">
         <p>
           Obviously we want as many different kinds of partitions as possible.
           Some really cool ideas have floated around out there for a while.
           Here's a list of theoretically possible partition types that might
           be useful or just cool:
         </p>

         <ul>
           <li>
             Partitions that use JDBC to store entries.  These would probably
             be way too slow.  However they might be useful if some mapping
             were to be used to represent an existing application's database
             schema as an LDAP DIT.  This would allow us to expose any database
             data via LDAP.
           </li>

           <li>
             Partitions using other LDAP servers to store their entries.  Why
             do this when it introduces latency?  Perhaps you want to proxy
             other servers or make other servers behave like this server.
           </li>

           <li>
             A partition that serves out the Windows registry via LDAP.
             Defining a standard mapping from the Windows registry to an LDAP
             DIT would be fairly simple.  This would be a neat way to expose
             client machine registry management.
           </li>

           <li>
             A partition based on Sleepycat's JE (Berkeley DB Java Edition).  I
             was going to try this and see how it performs against JDBM.
           </li>

           <li>
             A partition based on an in-memory BTree implementation.  This would
             be fast and really cool for storing things like schema info.  It
             would also be cool for staging data between memory and disk.
           </li>

           <li>
             A partition based on Prevayler.  This is like an in-memory
             partition, but you can save it at the end of the day.  This might
             be really useful for things like the system partition, which almost
             always needs to be in memory.  Today the system partition can
             approximate this by using caches as large as the number of entries
             it holds.
           </li>
         </ul>
       </subsection>

       <subsection name="Partitioning entries under a single context?">
         <p>
           Other aspirations include partitioning entries within a single
           container context.  Imagine having 250 million entries under
           <code>ou=citizens,dc=census,dc=gov</code>.  You don't want all 250
           million in one partition, but you would like to sub-partition these
           entries under the same context based on some attribute.  The idea is
           to use that attribute's value to partition entries within a single
           context: the value is hashed to distribute entries across buckets
           (the buckets are other partitions).  Yeah, this is a bit wild, but it
           would be useful in several situations.
         </p>
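         <p>
           The bucket selection described above can be sketched as hashing the
           chosen attribute's value modulo the number of bucket partitions.
           <code>BucketChooser</code> and the bucket ids are hypothetical, and
           <code>String.hashCode</code> is used purely for illustration.  The
           sketch also assumes a case-insensitive attribute, so it case-folds
           the value first; otherwise equivalent values could land in different
           buckets.
         </p>

 <source>
 import java.util.Locale;

 /** Hypothetical sketch: spreads entries of one container across several bucket partitions. */
 class BucketChooser {
     private final String[] buckets;  // ids of the bucket partitions

     BucketChooser(String... buckets) {
         if (buckets.length == 0) {
             throw new IllegalArgumentException("At least one bucket partition is required");
         }
         this.buckets = buckets.clone();
     }

     /** Chooses a bucket from the value of the partitioning attribute. */
     String bucketFor(String attributeValue) {
         // Case-fold first so that equivalent values always hash to the same bucket.
         String normalized = attributeValue.trim().toLowerCase(Locale.ROOT);
         // floorMod keeps the index non-negative even when hashCode() is negative.
         return buckets[Math.floorMod(normalized.hashCode(), buckets.length)];
     }

     public static void main(String[] args) {
         BucketChooser chooser = new BucketChooser("citizens0", "citizens1", "citizens2", "citizens3");
         System.out.println(chooser.bucketFor("Smith"));
         System.out.println(chooser.bucketFor("smith"));  // same bucket as "Smith"
     }
 }
 </source>

         <p>
           A plain modulo makes adding a bucket move most entries; a scheme such
           as consistent hashing would reduce that, at the cost of more
           bookkeeping.
         </p>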
       </subsection>
     </section>
   </body>
 </document>