 Title: 8 - Schema
 NavPrev: 7-ldap-messages.html
 NavPrevText: 7 - LDAP Messages
 NavUp: ../internal-design-guide.html
 NavUpText: Internal Design Guide
 NavNext: 9-dn.html
 NavNextText: 9 - DN
 Notice: Licensed to the Apache Software Foundation (ASF) under one
     or more contributor license agreements.  See the NOTICE file
     distributed with this work for additional information
     regarding copyright ownership.  The ASF licenses this file
     to you under the Apache License, Version 2.0 (the
     "License"); you may not use this file except in compliance
     with the License.  You may obtain a copy of the License at
     .
     http://www.apache.org/licenses/LICENSE-2.0
     .
     Unless required by applicable law or agreed to in writing,
     software distributed under the License is distributed on an
     "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
     KIND, either express or implied.  See the License for the
     specific language governing permissions and limitations
     under the License.

 # 8 - Schema

 ## Schema parsers

 We use a set of classes to parse schema elements. There are 11 kinds of schema elements: 8 of them are described in **RFC 4512**, and 3 are ApacheDS proprietary:

 * [AttributeType](https://tools.ietf.org/html/rfc4512#section-4.1.2)
 * [DitContentRule](https://tools.ietf.org/html/rfc4512#section-4.1.6)
 * [DitStructureRule](https://tools.ietf.org/html/rfc4512#section-4.1.7.1)
 * [LDAPSyntax](https://tools.ietf.org/html/rfc4512#section-4.1.5)
 * [MatchingRule](https://tools.ietf.org/html/rfc4512#section-4.1.3)
 * [MatchingRuleUse](https://tools.ietf.org/html/rfc4512#section-4.1.4)
 * [NameForm](https://tools.ietf.org/html/rfc4512#section-4.1.7.2)
 * [ObjectClass](https://tools.ietf.org/html/rfc4512#section-4.1.1)

 and

 * LdapComparator
 * Normalizer
 * SyntaxChecker

 We need to be able to parse those schema elements because they can be added into the server as a description (i.e., a String representing one of those schema elements as defined by the RFC). For the same reason, the **LDAP API** needs to validate those schema elements before sending them to an **LDAP server**, and must be able to properly parse what it gets from an **LDAP server**.

 ## Strict vs quirks mode

 Here we have a problem: most LDAP server implementations violate the RFC. We can't simply expect the String representing a schema element to be compliant with the RFC. Some typical deviations are:

 * OpenLDAP uses macros instead of OIDs. This is convenient, as it allows the root OID to be defined with a name and reused in the associated schema elements
 * AD and many other servers expect some specific characters to be accepted, like '_', ':', '#', ...
 * Sometimes values come without quotes where quotes are required
 * etc.

 We define the _strict mode_ as a mode that follows the **RFC** tightly, and the _quirks mode_ as a relaxed, more permissive version of the parser. Either mode can be selected using a flag.

 ### Strict mode

 Even in strict mode, the one thing we relax is the order in which the various parts appear in a schema description: we don't expect them to be ordered as described in the RFC.

 The various parts are defined using a few syntaxes:
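
To illustrate this order independence, here is a minimal Python sketch. It is not the ApacheDS parser: the function name `parse_parts`, the simplified tokenizer and the keyword subset are assumptions made for this example. It collects the parts of a simple **AttributeType** description into a dictionary, so two descriptions that differ only in the order of their parts give the same result.

```python
import re

# Simplified tokenizer (an assumption of this sketch): parentheses,
# single-quoted strings, or bare words. Escapes and '$' lists are ignored.
TOKEN = re.compile(r"\(|\)|'[^']*'|[^\s()']+")

# Illustrative subset of AttributeType keywords that take exactly one value.
KEYWORDS = {"NAME", "DESC", "SUP", "EQUALITY", "ORDERING", "SUBSTR", "SYNTAX"}


def parse_parts(description):
    """Return (oid, parts) for a simple AttributeType description."""
    tokens = TOKEN.findall(description)
    if len(tokens) < 3 or tokens[0] != "(" or tokens[-1] != ")":
        raise ValueError("description must be enclosed in parentheses")
    oid, rest = tokens[1], tokens[2:-1]
    parts = {}
    i = 0
    while i < len(rest):
        keyword = rest[i]
        if keyword not in KEYWORDS or i + 1 >= len(rest):
            raise ValueError("unexpected token: " + keyword)
        if keyword in parts:
            raise ValueError("duplicate part: " + keyword)
        # Store the value without its surrounding quotes, if any.
        parts[keyword] = rest[i + 1].strip("'")
        i += 2
    return oid, parts


rfc_order = "( 2.5.4.3 NAME 'cn' SUP name )"
other_order = "( 2.5.4.3 SUP name NAME 'cn' )"
assert parse_parts(rfc_order) == parse_parts(other_order)
```

Storing the parts by keyword rather than by position is what makes the order irrelevant; rejecting duplicates keeps the result unambiguous.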

 * _NAME_: qdescrs
 * _DESC_: qdstring
 * _SUP_ (**ObjectClass**), _MUST_, _MAY_, _APPLIES_, _AUX_, _NOT_: oids
 * _SUP_ (**AttributeType**), _EQUALITY_, _ORDERING_, _SUBSTR_, _FORM_, _OC_: oid
 * _SYNTAX_ (**AttributeType**): noidlen
 * _SYNTAX_ (**MatchingRule**): numericoid
 * _SUP_ (**DitStructureRule**): ruleids

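 Some lower-level constructs are shared by several of these syntaxes:

 * _descr_ is used by: oid, qdescrs
 * _qdescr_ is used by: qdescrs, qdescrlist

 A _qdescrs_ may contain one or more _qdescr_, and an _oids_ may contain one or more _oid_.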

 #### descr, strict

 The _descr_ construct is used by _oid_ and _qdescrs_ (an _OID_ can be a name). The strict mode uses this grammar:

     descr       ::= keystring
     keystring   ::= leadkeychar keychar*
     leadkeychar ::= ALPHA
     keychar     ::= ALPHA | DIGIT | HYPHEN
     ALPHA       ::= ['A'..'Z'] | ['a'..'z']
     DIGIT       ::= ['0'..'9']
     HYPHEN      ::= '-'
     SQUOTE      ::= '\''
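
As an illustration, the strict _descr_ grammar maps directly onto a regular expression. The sketch below is in Python and is not taken from the ApacheDS code base; the name `is_strict_descr` is hypothetical.

```python
import re

# descr ::= leadkeychar keychar*
# leadkeychar is an ASCII letter; keychar is an ASCII letter, digit or hyphen.
STRICT_DESCR = re.compile(r"[A-Za-z][A-Za-z0-9-]*")


def is_strict_descr(text):
    """True if the whole text matches the strict descr grammar."""
    return STRICT_DESCR.fullmatch(text) is not None


assert is_strict_descr("cn")
assert is_strict_descr("userCertificate")
assert is_strict_descr("x-my-attr2")
assert not is_strict_descr("2cn")      # must start with a letter
assert not is_strict_descr("my_attr")  # underscore is not a keychar
assert not is_strict_descr("")         # at least one character is required
```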

 #### qdstring, strict

 A _qdstring_ can contain any **UTF-8** character except the single quote and the backslash, which must be escaped (as `\27` and `\5C` respectively). It is always surrounded by single quotes:

     :::text
     qdstring    ::= SQUOTE dstring SQUOTE
     dstring     ::= ( QS | QQ | QUTF8 )*
     QQ          ::= ESC %x32 %x37
     QS          ::= ESC %x35 ( %x43 | %x63 )
     QUTF8       ::= QUTF1 | UTFMB
     QUTF1       ::= %x00-26 | %x28-5B | %x5D-7F
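
The escaping rules can be sketched as a pair of Python helpers. This is an illustration only: the names `decode_qdstring` and `encode_qdstring` are hypothetical, and the sketch works on Python strings (Unicode characters) rather than on raw UTF-8 bytes. Like the grammar above, it accepts an empty string between the quotes.

```python
import re

# Between the quotes: any character except "'" and "\",
# or one of the escapes \27 (quote) and \5C / \5c (backslash).
STRICT_QDSTRING = re.compile(r"'((?:[^'\\]|\\27|\\5[Cc])*)'")


def decode_qdstring(text):
    """Return the unescaped value of a strict qdstring, or raise ValueError."""
    match = STRICT_QDSTRING.fullmatch(text)
    if match is None:
        raise ValueError("not a valid qdstring: " + text)
    return re.sub(
        r"\\(27|5[Cc])",
        lambda m: "'" if m.group(1) == "27" else "\\",
        match.group(1),
    )


def encode_qdstring(value):
    """Escape backslashes first, then quotes, and add the surrounding quotes."""
    escaped = value.replace("\\", "\\5C").replace("'", "\\27")
    return "'" + escaped + "'"


assert decode_qdstring(r"'it\27s a name'") == "it's a name"
assert decode_qdstring(encode_qdstring("a'\\b")) == "a'\\b"
```

Backslashes are escaped before quotes in `encode_qdstring`, otherwise the backslash introduced by `\27` would itself be escaped a second time.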

 #### qdescr, strict

 A _qdescr_ is a quoted name whose first character must be a letter, and whose following characters must be letters, digits or hyphens. Here is the **ABNF** for _qdescr_:

     :::text
     qdescr      ::= SQUOTE descr SQUOTE
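
A _NAME_ part takes a _qdescrs_, which in RFC 4512 is either a single _qdescr_ or a parenthesised, space-separated list of them. The Python sketch below shows how such a value could be checked and split in strict mode. The function name `parse_qdescrs` is hypothetical, whitespace handling is simplified, and, following the description above, at least one _qdescr_ is required inside the parentheses.

```python
import re

# qdescr ::= SQUOTE descr SQUOTE
QDESCR = r"'[A-Za-z][A-Za-z0-9-]*'"

# Either one qdescr, or "(" qdescr (whitespace qdescr)* ")".
QDESCRS = re.compile(r"(?:%s|\(\s*%s(?:\s+%s)*\s*\))" % (QDESCR, QDESCR, QDESCR))


def parse_qdescrs(text):
    """Return the list of names in a strict qdescrs, or raise ValueError."""
    if QDESCRS.fullmatch(text) is None:
        raise ValueError("not a valid qdescrs: " + text)
    # The input is already validated, so the quoted names can be picked out directly.
    return re.findall(r"'([^']*)'", text)


assert parse_qdescrs("'cn'") == ["cn"]
assert parse_qdescrs("( 'cn' 'commonName' )") == ["cn", "commonName"]
```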

 #### noidlen, strict

 ### Relaxed mode

 #### qdstring, relaxed

 There

 #### descr, relaxed

 The relaxed _descr_ accepts more characters, such as underscore, semicolon, dot, colon or sharp ('#'). The leading alphabetic character (_leadkeychar_) is no longer mandatory either. Here is the **ABNF** we accept:

     relaxed-descr     ::= relaxed-keystring
     relaxed-keystring ::= relaxed-keychar+
     relaxed-keychar   ::= ALPHA | DIGIT | HYPHEN | UNDERSCORE | SEMICOLON | DOT | COLON | SHARP
     ALPHA             ::= ['A'..'Z'] | ['a'..'z']
     DIGIT             ::= ['0'..'9']
     HYPHEN            ::= '-'
     UNDERSCORE        ::= '_'
     SEMICOLON         ::= ';'
     COLON             ::= ':'
     DOT               ::= '.'
     SHARP             ::= '#'
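
The difference between the two modes can be sketched in Python with a single function taking a flag. The name `is_descr`, its `strict` parameter and the sample names are made up for this example; the actual parser flag may look different.

```python
import re

# Strict: a letter followed by letters, digits or hyphens.
STRICT_DESCR = re.compile(r"[A-Za-z][A-Za-z0-9-]*")

# Relaxed: one or more of letters, digits, '-', '_', ';', '.', ':' and '#',
# with no constraint on the first character.
RELAXED_DESCR = re.compile(r"[A-Za-z0-9\-_;.:#]+")


def is_descr(text, strict=True):
    """Check text against the strict or the relaxed descr grammar."""
    pattern = STRICT_DESCR if strict else RELAXED_DESCR
    return pattern.fullmatch(text) is not None


# Made-up names that only the relaxed grammar accepts.
for candidate in ("my_attr", "2cn", "ms-DS:attr#1", "myRoot:1.2"):
    assert not is_descr(candidate, strict=True)
    assert is_descr(candidate, strict=False)

assert is_descr("cn", strict=True) and is_descr("cn", strict=False)
assert not is_descr("a b", strict=False)  # whitespace is still rejected
```

Since digits and dots are allowed anywhere, a numeric OID such as `2.5.4.3` also matches the relaxed grammar, so a caller that needs to tell names from OIDs has to test for a numeric OID first.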


 #### qdescr, relaxed

 Compared to the strict mode, we will accept a non-quoted String, or a String using double quotes.

     :::text
     relaxed-qdescr  ::= SQUOTE relaxed-descr SQUOTE | DQUOTE relaxed-descr DQUOTE | relaxed-descr
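
A possible way to handle the three alternatives is to strip a matching pair of quotes, if present, and then validate what remains. The Python sketch below assumes this approach; `parse_relaxed_qdescr` is a hypothetical name, not an ApacheDS method.

```python
import re

# relaxed-descr: one or more of letters, digits, '-', '_', ';', '.', ':' and '#'.
RELAXED_DESCR = re.compile(r"[A-Za-z0-9\-_;.:#]+")


def parse_relaxed_qdescr(text):
    """Return the name from 'name', "name" or bare name, or raise ValueError."""
    value = text
    for quote in ("'", '"'):
        # Only strip quotes that are balanced and of the same kind.
        if len(text) >= 2 and text[0] == quote and text[-1] == quote:
            value = text[1:-1]
            break
    if RELAXED_DESCR.fullmatch(value) is None:
        raise ValueError("not a relaxed qdescr: " + text)
    return value


assert parse_relaxed_qdescr("'my_attr'") == "my_attr"
assert parse_relaxed_qdescr('"my_attr"') == "my_attr"
assert parse_relaxed_qdescr("my_attr") == "my_attr"
```

Mismatched or unterminated quotes are rejected, because a quote character is never part of a _relaxed-descr_.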

 #### oid, relaxed

 In relaxed mode, we accept single-quoted and double-quoted OIDs and names. Here is the supported **ABNF**:

     :::text
     relaxed-oid ::= SQUOTE relaxed-descr SQUOTE | DQUOTE relaxed-descr DQUOTE | relaxed-descr |
                     SQUOTE numericoid SQUOTE | DQUOTE numericoid DQUOTE | numericoid
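
The following Python sketch shows one way to read a relaxed OID: remove a balanced pair of quotes, then decide whether the value is a numeric OID or a name. The function name `parse_relaxed_oid` and the `("numericoid" | "descr", value)` return shape are assumptions of this example. The _numericoid_ pattern follows RFC 4512 (dot-separated numbers without leading zeros).

```python
import re

# numericoid: number ("." number)+, where a number has no leading zero.
NUMERICOID = re.compile(r"(?:0|[1-9][0-9]*)(?:\.(?:0|[1-9][0-9]*))+")

# relaxed-descr: one or more of letters, digits, '-', '_', ';', '.', ':' and '#'.
RELAXED_DESCR = re.compile(r"[A-Za-z0-9\-_;.:#]+")


def parse_relaxed_oid(text):
    """Return (kind, value) where kind is "numericoid" or "descr"."""
    value = text
    for quote in ("'", '"'):
        if len(text) >= 2 and text[0] == quote and text[-1] == quote:
            value = text[1:-1]
            break
    # Test numericoid first: a numeric OID also matches relaxed-descr.
    if NUMERICOID.fullmatch(value):
        return ("numericoid", value)
    if RELAXED_DESCR.fullmatch(value):
        return ("descr", value)
    raise ValueError("not a relaxed oid: " + text)


assert parse_relaxed_oid("'2.5.4.3'") == ("numericoid", "2.5.4.3")
assert parse_relaxed_oid('"name"') == ("descr", "name")
assert parse_relaxed_oid("name") == ("descr", "name")
```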

 #### noidlen, relaxed

 Here, we allow textual syntax names to be used, not only OIDs. For instance, something like _SYNTAX IA5String_ is allowed.

 We also allow single-quoted and double-quoted OIDs.
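
As a rough Python sketch of a relaxed _noidlen_: in RFC 4512 a _noidlen_ is a _numericoid_ optionally followed by a length in curly braces. The sketch makes two assumptions that the text above does not settle: the quotes, when present, wrap the whole value including the length, and a textual syntax name has the shape of a strict _descr_. The name `parse_relaxed_noidlen` is hypothetical.

```python
import re

NUMBER = r"(?:0|[1-9][0-9]*)"

# A numeric OID or a textual name (assumed here to follow the strict descr shape),
# optionally followed by "{len}".
NOIDLEN = re.compile(
    r"(?P<syntax>%s(?:\.%s)+|[A-Za-z][A-Za-z0-9-]*)(?:\{(?P<len>%s)\})?"
    % (NUMBER, NUMBER, NUMBER)
)


def parse_relaxed_noidlen(text):
    """Return (syntax, length); length is an int, or None when absent."""
    value = text
    for quote in ("'", '"'):
        # Assumption: the quotes surround the whole value, length included.
        if len(text) >= 2 and text[0] == quote and text[-1] == quote:
            value = text[1:-1]
            break
    match = NOIDLEN.fullmatch(value)
    if match is None:
        raise ValueError("not a relaxed noidlen: " + text)
    length = match.group("len")
    return match.group("syntax"), (int(length) if length is not None else None)


# 1.3.6.1.4.1.1466.115.121.1.26 is the OID of the IA5 String syntax.
assert parse_relaxed_noidlen("1.3.6.1.4.1.1466.115.121.1.26{256}") == (
    "1.3.6.1.4.1.1466.115.121.1.26",
    256,
)
assert parse_relaxed_noidlen("IA5String") == ("IA5String", None)
```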