content/apacheds/advanced-ug/1.5-backend.mdtext - directory-site - Git at Google

 Title: 1.5 - Backend
 NavPrev: 1.4-interceptors.html
 NavPrevText: 1.4 - Interceptors
 NavUp: 1-architecture.html
 NavUpText: 1 - Architecture
 NavNext: 2-server-config.html
 NavNextText: 2 - Server Configuration
 Notice: Licensed to the Apache Software Foundation (ASF) under one
     or more contributor license agreements.  See the NOTICE file
     distributed with this work for additional information
     regarding copyright ownership.  The ASF licenses this file
     to you under the Apache License, Version 2.0 (the
     "License"); you may not use this file except in compliance
     with the License.  You may obtain a copy of the License at
     .
     http://www.apache.org/licenses/LICENSE-2.0
     .
     Unless required by applicable law or agreed to in writing,
     software distributed under the License is distributed on an
     "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
     KIND, either express or implied.  See the License for the
     specific language governing permissions and limitations
     under the License.

 # 1.5 - Backend

 The _Backend_ is the part of the server responsible for storing data in a way we can easily retrieve them. This storage does not necessarily have to be remanent : we can have a in-memory _backend_.

 ## Existing Backends

 We currently have 3 different backends :

 * JDBM
 * LDIF
 * In-Memory

 ### JDBM Backend

 The **JDBM** backend is storing data on disk, using **BTrees**. It's fast when it comes to retrieve data, slow when you have to add them.

 ### In-Memory Backend

 This Backend loads in memory a full set of entries. ALl of them must be hold by the existig memory, we don't write on disk anything nor we read anything from disk. If the server is stopped, everything is lost.

 ### LDIF Backend

 It comes in two forms : one single file, or many fles (one per entry). It's always backed by a in-memory _Backend_, otherwise it would not be possible to retrieve the entries.

 As we depend on a in-memory backend, which handles the indexes, we have to create those indexes when this _Backend_ is read, which can be a costly operation.

 ### Future Backends

 We intend to add another in-memory backend, based on _Mavibot_, a **MVCC BTREE**. The biggest advantage over the other systems is that it's fast, it allows concurrent reads without locks when the other _Backend_ block the reads when some write operation is being processed. Also it saves on disk it contents periodically, and has a Journal so that we can recover from a crash.

 The only drawback is that all the entries and indexes must hold in memory. On the other hand, we don't anymore need a cache.

 ## How it works

 Basically, each _Backend_ instance inherit from the _AbstractBTreePartition_ class. We can see that a _Backend_ **must** be a **Btree**.

 Data are stored into various tables. In fact, we have one table containing all the entries - the **MasterTable** - and many indexes.

 ### MasterTable

 The _MasterTable_ contains all the entries, serialized.

 This table is a <Key, Value> **BTree**, where the key is the entry's **UUID**, and the value the serialized entry.

 <DIV class="note" markdown="1">
 Theoretically, we could be able to read it, and restore the whole database, indexes included, assuming that we know which index we have to create. Sadly, it's not enough, as the entries are stored without any information about their position in the **DIT**.
 </DIV>

 ### Indexes

 Each index is also a <key, value> **BTree**, with some exceptions : as we may store multi-valued elements, it's perfectly possible that the value will grow up to a point it's extremely costly to store it serialized. For instance, the _ObjectClass_ index may have thousands of entries for the _Person_ key.

 In this case, we use a sub-btree, which is a <key,key> **BTree** (as strange as it sounds, it's an easy way to add a new key without having to rewrite the full value).

 The key can be a _String_, or a _ParentIdAndRdn_.

 We have 7 system indexes, which are created when the server is started :

 * ObjectClass : to easily find any entry associated with a give _ObjectClass_
 * EntryCsn : The Change Sequence Number index
 * Rdn : A special index containing a RDN and its parent
 * Presence : An index used when searching for the presence of an attributeType in an entry
 * Alias : An index used for aliases
 * OneAlias : An index used for children aliases
 * SubAlias : An index used of descendant aliases

 The user may define many different indexes, depending on his or her needs.

 ### The ParentIdAndRdn index

 This index is special, as it's used to associate an entry to a position in the **DIT**. Assuming that each entry has a _Dn_, and that this _Dn_ describes a hierarchy, the _ParentIdAndRdn_ index depicts this hierarchy.

 The _ParentId_ part refers to the _UUID_ of the parent for the current entry. The _Rdn_ part is the entry _Rdn_. In order to rebuild the full _Dn_ for a given entry, we must get all the _ParentIdAndRdn_ up to the root to grab all the needed _Rdn_.

 This index is also used to process one level and sub level indexes.
	Title: 1.5 - Backend
	NavPrev: 1.4-interceptors.html
	NavPrevText: 1.4 - Interceptors
	NavUp: 1-architecture.html
	NavUpText: 1 - Architecture
	NavNext: 2-server-config.html
	NavNextText: 2 - Server Configuration
	Notice: Licensed to the Apache Software Foundation (ASF) under one
	or more contributor license agreements. See the NOTICE file
	distributed with this work for additional information
	regarding copyright ownership. The ASF licenses this file
	to you under the Apache License, Version 2.0 (the
	"License"); you may not use this file except in compliance
	with the License. You may obtain a copy of the License at
	.
	http://www.apache.org/licenses/LICENSE-2.0
	.
	Unless required by applicable law or agreed to in writing,
	software distributed under the License is distributed on an
	"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
	KIND, either express or implied. See the License for the
	specific language governing permissions and limitations
	under the License.

	# 1.5 - Backend

	The _Backend_ is the part of the server responsible for storing data in a way we can easily retrieve them. This storage does not necessarily have to be remanent : we can have a in-memory _backend_.

	## Existing Backends

	We currently have 3 different backends :

	* JDBM
	* LDIF
	* In-Memory

	### JDBM Backend

	The JDBM backend is storing data on disk, using BTrees. It's fast when it comes to retrieve data, slow when you have to add them.

	### In-Memory Backend

	This Backend loads in memory a full set of entries. ALl of them must be hold by the existig memory, we don't write on disk anything nor we read anything from disk. If the server is stopped, everything is lost.

	### LDIF Backend

	It comes in two forms : one single file, or many fles (one per entry). It's always backed by a in-memory _Backend_, otherwise it would not be possible to retrieve the entries.

	As we depend on a in-memory backend, which handles the indexes, we have to create those indexes when this _Backend_ is read, which can be a costly operation.

	### Future Backends

	We intend to add another in-memory backend, based on _Mavibot_, a MVCC BTREE. The biggest advantage over the other systems is that it's fast, it allows concurrent reads without locks when the other _Backend_ block the reads when some write operation is being processed. Also it saves on disk it contents periodically, and has a Journal so that we can recover from a crash.

	The only drawback is that all the entries and indexes must hold in memory. On the other hand, we don't anymore need a cache.

	## How it works

	Basically, each _Backend_ instance inherit from the _AbstractBTreePartition_ class. We can see that a _Backend_ must be a Btree.

	Data are stored into various tables. In fact, we have one table containing all the entries - the MasterTable - and many indexes.

	### MasterTable

	The _MasterTable_ contains all the entries, serialized.

	This table is a <Key, Value> BTree, where the key is the entry's UUID, and the value the serialized entry.

	<DIV class="note" markdown="1">
	Theoretically, we could be able to read it, and restore the whole database, indexes included, assuming that we know which index we have to create. Sadly, it's not enough, as the entries are stored without any information about their position in the DIT.
	</DIV>

	### Indexes

	Each index is also a <key, value> BTree, with some exceptions : as we may store multi-valued elements, it's perfectly possible that the value will grow up to a point it's extremely costly to store it serialized. For instance, the _ObjectClass_ index may have thousands of entries for the _Person_ key.

	In this case, we use a sub-btree, which is a <key,key> BTree (as strange as it sounds, it's an easy way to add a new key without having to rewrite the full value).

	The key can be a _String_, or a _ParentIdAndRdn_.

	We have 7 system indexes, which are created when the server is started :

	* ObjectClass : to easily find any entry associated with a give _ObjectClass_
	* EntryCsn : The Change Sequence Number index
	* Rdn : A special index containing a RDN and its parent
	* Presence : An index used when searching for the presence of an attributeType in an entry
	* Alias : An index used for aliases
	* OneAlias : An index used for children aliases
	* SubAlias : An index used of descendant aliases

	The user may define many different indexes, depending on his or her needs.

	### The ParentIdAndRdn index

	This index is special, as it's used to associate an entry to a position in the DIT. Assuming that each entry has a _Dn_, and that this _Dn_ describes a hierarchy, the _ParentIdAndRdn_ index depicts this hierarchy.

	The _ParentId_ part refers to the _UUID_ of the parent for the current entry. The _Rdn_ part is the entry _Rdn_. In order to rebuild the full _Dn_ for a given entry, we must get all the _ParentIdAndRdn_ up to the root to grab all the needed _Rdn_.

	This index is also used to process one level and sub level indexes.