Title: 1.5 - Backend
NavPrev: 1.4-interceptors.html
NavPrevText: 1.4 - Interceptors
NavUp: 1-architecture.html
NavUpText: 1 - Architecture
NavNext: 2-server-config.html
NavNextText: 2 - Server Configuration
Notice: Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at
.
http://www.apache.org/licenses/LICENSE-2.0
.
Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
# 1.5 - Backend
The _Backend_ is the part of the server responsible for storing data in a way that lets us retrieve it easily. This storage does not necessarily have to be persistent: we can have an in-memory _Backend_.
## Existing Backends
We currently have three different backends:
* JDBM
* LDIF
* In-Memory
### JDBM Backend
The **JDBM** backend stores data on disk, using **BTrees**. It is fast at retrieving data, but slow when adding it.
### In-Memory Backend
This _Backend_ loads a full set of entries into memory. All of them must fit in the available memory: nothing is ever written to or read from disk. If the server is stopped, everything is lost.
### LDIF Backend
It comes in two forms: a single file, or many files (one per entry). It is always backed by an in-memory _Backend_, otherwise it would not be possible to retrieve the entries.
As we depend on an in-memory backend, which handles the indexes, we have to create those indexes when this _Backend_ is read, which can be a costly operation.
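To make the cost of this step concrete, here is a minimal Java sketch of loading LDIF into memory and rebuilding an index from scratch. It is not the ApacheDS implementation: the class `LdifLoadSketch`, the plain maps standing in for the master table and the _ObjectClass_ index, and the simplified LDIF parsing (no comments, folded lines, base64 values or change records) are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;
import java.util.UUID;

/** Hypothetical sketch: load simplified LDIF into memory and rebuild the objectClass index. */
final class LdifLoadSketch {
    // Stand-in for the master table: entry UUID -> attributes (name -> values)
    final Map<UUID, Map<String, List<String>>> master = new HashMap<>();
    // Stand-in for the ObjectClass index: lower-cased value -> UUIDs of the entries holding it
    final TreeMap<String, TreeSet<UUID>> objectClassIndex = new TreeMap<>();

    /** Reads "attr: value" lines; a blank line ends an entry. */
    void load(String ldif) {
        Map<String, List<String>> current = new LinkedHashMap<>();
        for (String line : ldif.split("\n")) {
            if (line.trim().isEmpty()) {
                if (!current.isEmpty()) {
                    addEntry(current);
                    current = new LinkedHashMap<>();
                }
                continue;
            }
            int colon = line.indexOf(':');
            if (colon < 0) {
                continue; // this sketch silently skips malformed lines
            }
            String attr = line.substring(0, colon).trim().toLowerCase();
            String value = line.substring(colon + 1).trim();
            current.computeIfAbsent(attr, k -> new ArrayList<>()).add(value);
        }
        if (!current.isEmpty()) {
            addEntry(current); // last entry, when the text does not end with a blank line
        }
    }

    private void addEntry(Map<String, List<String>> entry) {
        UUID id = UUID.randomUUID();
        master.put(id, entry);
        // Rebuilding the index: one update per indexed value of every entry read
        for (String oc : entry.getOrDefault("objectclass", new ArrayList<>())) {
            objectClassIndex.computeIfAbsent(oc.toLowerCase(), k -> new TreeSet<>()).add(id);
        }
    }
}
```

Every entry read triggers one index update per indexed value, so the load time grows with both the number of entries and the number of indexes configured.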
### Future Backends
We intend to add another in-memory backend, based on _Mavibot_, an **MVCC BTree**. Its biggest advantages over the other systems are that it is fast and that it allows concurrent reads without locks, whereas the other _Backends_ block reads while a write operation is being processed. It also periodically saves its contents to disk, and has a journal so that we can recover from a crash.
The only drawback is that all the entries and indexes must fit in memory. On the other hand, we no longer need a cache.
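The lock-free read property can be illustrated with a copy-on-write store, a much simpler technique than the one Mavibot uses: readers grab an immutable snapshot, and a writer publishes a new version atomically. The `CopyOnWriteStore` class below is hypothetical and copies the whole map on each write, whereas an MVCC BTree only copies the pages along the path it modifies.

```java
import java.util.Collections;
import java.util.NavigableMap;
import java.util.TreeMap;
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical sketch: readers see an immutable version, writers publish a new one. */
final class CopyOnWriteStore<K extends Comparable<K>, V> {
    private final AtomicReference<NavigableMap<K, V>> current =
            new AtomicReference<>(Collections.unmodifiableNavigableMap(new TreeMap<K, V>()));

    /** Readers never take a lock: they use whatever version is current. */
    NavigableMap<K, V> snapshot() {
        return current.get();
    }

    /** Writers are serialized among themselves, but never block readers. */
    synchronized void put(K key, V value) {
        // Copies the whole map; a real MVCC BTree copies only the modified path
        TreeMap<K, V> copy = new TreeMap<>(current.get());
        copy.put(key, value);
        current.set(Collections.unmodifiableNavigableMap(copy));
    }
}
```

A reader holding an older snapshot keeps a consistent view for the whole duration of its operation, even while writes are published.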
## How it works
Basically, each _Backend_ instance inherits from the _AbstractBTreePartition_ class. This means that a _Backend_ **must** be built on **BTrees**.
Data are stored in various tables: one table containing all the entries, the **MasterTable**, and many indexes.
### MasterTable
The _MasterTable_ contains all the entries, serialized.
This table is a <Key, Value> **BTree**, where the key is the entry's **UUID**, and the value the serialized entry.
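A minimal sketch of such a table, assuming a `TreeMap` as a stand-in for the on-disk BTree and a hypothetical attribute-map entry format with a hand-written serializer (ApacheDS has its own entry serialization, which is not shown here). `MasterTable` is an illustrative name, not the server's class.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.UUID;

/** Hypothetical sketch: entry UUID -> serialized entry. */
final class MasterTable {
    // Stand-in for the BTree: sorted map from UUID to serialized bytes
    private final TreeMap<UUID, byte[]> btree = new TreeMap<>();

    UUID add(Map<String, List<String>> entry) {
        UUID id = UUID.randomUUID();
        btree.put(id, serialize(entry));
        return id;
    }

    Map<String, List<String>> lookup(UUID id) {
        byte[] bytes = btree.get(id);
        return bytes == null ? null : deserialize(bytes);
    }

    // Layout: attribute count, then for each attribute its name, value count and values
    static byte[] serialize(Map<String, List<String>> entry) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            out.writeInt(entry.size());
            for (Map.Entry<String, List<String>> attr : entry.entrySet()) {
                out.writeUTF(attr.getKey());
                out.writeInt(attr.getValue().size());
                for (String value : attr.getValue()) {
                    out.writeUTF(value);
                }
            }
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static Map<String, List<String>> deserialize(byte[] data) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
            Map<String, List<String>> entry = new LinkedHashMap<>();
            int attrCount = in.readInt();
            for (int i = 0; i < attrCount; i++) {
                String name = in.readUTF();
                int valueCount = in.readInt();
                List<String> values = new ArrayList<>();
                for (int j = 0; j < valueCount; j++) {
                    values.add(in.readUTF());
                }
                entry.put(name, values);
            }
            return entry;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Nothing in the serialized bytes records where the entry sits in the **DIT**, which is why this table alone cannot rebuild the database.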
<DIV class="note" markdown="1">
In theory, we could read this table and restore the whole database, indexes included, provided we know which indexes to create. Sadly, this is not enough, as the entries are stored without any information about their position in the **DIT**.
</DIV>
### Indexes
Each index is also a <key, value> **BTree**, with one subtlety: as we may store multi-valued elements, the value can grow to the point where storing it serialized becomes extremely costly. For instance, the _ObjectClass_ index may have thousands of entries for the _Person_ key.
In this case, we use a sub-btree, which is a <key,key> **BTree** (as strange as it sounds, it's an easy way to add a new key without having to rewrite the full value).
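Here is a sketch of an attribute index where each key maps to a sorted set of entry UUIDs. The `TreeSet` plays the role of the <key, key> sub-BTree: adding an entry is a single insertion rather than a rewrite of the whole serialized value. The `AttributeIndex` class and its lower-casing `normalize` method are hypothetical; a real server normalizes values with the attribute's matching rules.

```java
import java.util.Collections;
import java.util.NavigableSet;
import java.util.TreeMap;
import java.util.TreeSet;
import java.util.UUID;

/** Hypothetical sketch of an index with a key-only sub-tree per value. */
final class AttributeIndex {
    // Normalized attribute value -> entry UUIDs; each TreeSet acts as the <key, key> sub-tree
    private final TreeMap<String, TreeSet<UUID>> forward = new TreeMap<>();

    void add(String value, UUID entryId) {
        // One insertion into the sub-tree, no reserialization of the existing UUIDs
        forward.computeIfAbsent(normalize(value), k -> new TreeSet<>()).add(entryId);
    }

    void remove(String value, UUID entryId) {
        String key = normalize(value);
        TreeSet<UUID> ids = forward.get(key);
        if (ids != null && ids.remove(entryId) && ids.isEmpty()) {
            forward.remove(key); // drop the key once its sub-tree is empty
        }
    }

    NavigableSet<UUID> lookup(String value) {
        TreeSet<UUID> ids = forward.get(normalize(value));
        return ids == null
                ? Collections.<UUID>emptyNavigableSet()
                : Collections.unmodifiableNavigableSet(ids);
    }

    private static String normalize(String value) {
        return value.trim().toLowerCase(); // hypothetical normalization
    }
}
```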
The key can be a _String_, or a _ParentIdAndRdn_.
We have seven system indexes, which are created when the server starts:
* ObjectClass: to easily find any entry associated with a given _ObjectClass_
* EntryCsn: The Change Sequence Number index
* Rdn: A special index containing an RDN and its parent
* Presence: An index used when searching for the presence of an attributeType in an entry
* Alias: An index used for aliases
* OneAlias: An index used for child aliases
* SubAlias: An index used for descendant aliases
The user may define many different indexes, depending on his or her needs.
### The ParentIdAndRdn index
This index is special, as it is used to associate an entry with a position in the **DIT**. Each entry has a _Dn_, and this _Dn_ describes a hierarchy: the _ParentIdAndRdn_ index depicts this hierarchy.
The _ParentId_ part refers to the _UUID_ of the current entry's parent. The _Rdn_ part is the entry's own _Rdn_. To rebuild the full _Dn_ of a given entry, we must follow the _ParentIdAndRdn_ links up to the root, collecting each _Rdn_ along the way.
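The upward walk can be sketched as follows. The `ParentIdAndRdn` and `DnBuilder` classes here are simplified stand-ins, not the ApacheDS types; in this sketch the partition's top entry carries the whole suffix as its _Rdn_ and has a `null` parent.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

/** Simplified stand-in: the parent's UUID plus the entry's own RDN. */
final class ParentIdAndRdn {
    final UUID parentId; // null for the top entry in this sketch
    final String rdn;

    ParentIdAndRdn(UUID parentId, String rdn) {
        this.parentId = parentId;
        this.rdn = rdn;
    }
}

/** Hypothetical helper rebuilding a DN by walking parent links up to the root. */
final class DnBuilder {
    // Reverse view of the index: entry UUID -> its ParentIdAndRdn
    private final Map<UUID, ParentIdAndRdn> byEntryId = new HashMap<>();

    void put(UUID entryId, ParentIdAndRdn value) {
        byEntryId.put(entryId, value);
    }

    String buildDn(UUID entryId) {
        StringBuilder dn = new StringBuilder();
        UUID current = entryId;
        while (current != null) {
            ParentIdAndRdn link = byEntryId.get(current);
            if (link == null) {
                throw new IllegalStateException("No ParentIdAndRdn for " + current);
            }
            if (dn.length() > 0) {
                dn.append(','); // RDNs go from the entry towards the root
            }
            dn.append(link.rdn);
            current = link.parentId;
        }
        return dn.toString();
    }
}
```

Each step costs one index lookup, so rebuilding a _Dn_ is proportional to the entry's depth in the **DIT**.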
This index is also used to process one level and sub level indexes.
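One way an index keyed on (parentId, rdn) can serve one-level and subtree searches: if keys sort by parent first, all the children of an entry are adjacent, so a one-level search is a single range scan, and a subtree search repeats that scan level by level. The `RdnKey` and `RdnIndex` classes are hypothetical; top-level entries use a sentinel parent UUID instead of `null`.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.UUID;

/** Hypothetical key ordered by parent first, so siblings are contiguous. */
final class RdnKey implements Comparable<RdnKey> {
    final UUID parentId; // must be non-null; use a sentinel UUID for top entries
    final String rdn;

    RdnKey(UUID parentId, String rdn) {
        this.parentId = parentId;
        this.rdn = rdn;
    }

    @Override
    public int compareTo(RdnKey o) {
        int c = parentId.compareTo(o.parentId);
        return c != 0 ? c : rdn.compareTo(o.rdn);
    }
}

final class RdnIndex {
    // Forward index: (parentId, rdn) -> entry UUID
    private final TreeMap<RdnKey, UUID> forward = new TreeMap<>();

    void add(UUID parentId, String rdn, UUID entryId) {
        forward.put(new RdnKey(parentId, rdn), entryId);
    }

    /** One-level scope: scan from (parentId, "") until the parent changes. */
    List<UUID> children(UUID parentId) {
        List<UUID> result = new ArrayList<>();
        for (Map.Entry<RdnKey, UUID> e : forward.tailMap(new RdnKey(parentId, ""), true).entrySet()) {
            if (!e.getKey().parentId.equals(parentId)) {
                break;
            }
            result.add(e.getValue());
        }
        return result;
    }

    /** Subtree scope: breadth-first repetition of the one-level scan. */
    List<UUID> descendants(UUID baseId) {
        List<UUID> result = new ArrayList<>();
        Deque<UUID> pending = new ArrayDeque<>();
        pending.add(baseId);
        while (!pending.isEmpty()) {
            for (UUID child : children(pending.poll())) {
                result.add(child);
                pending.add(child);
            }
        }
        return result;
    }
}
```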