 Title: 4 - ASN/1
 NavPrev: 3-building.html
 NavPrevText: 3 - Building
 NavUp: ../internal-design-guide.html
 NavUpText: Internal Design Guide
 NavNext: 4.1-asn1-tlv.html
 NavNextText: 4.1 - ASN/1 TLV
 Notice: Licensed to the Apache Software Foundation (ASF) under one
     or more contributor license agreements.  See the NOTICE file
     distributed with this work for additional information
     regarding copyright ownership.  The ASF licenses this file
     to you under the Apache License, Version 2.0 (the
     "License"); you may not use this file except in compliance
     with the License.  You may obtain a copy of the License at
     .
     http://www.apache.org/licenses/LICENSE-2.0
     .
     Unless required by applicable law or agreed to in writing,
     software distributed under the License is distributed on an
     "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
     KIND, either express or implied.  See the License for the
     specific language governing permissions and limitations
     under the License.

 # 4 - ASN/1

 To be completed...


 The **LDAP** protocol is based on an **ASN/1** description. We will not explain in detail what **ASN/1** is about: check [this page](https://en.wikipedia.org/wiki/Abstract_Syntax_Notation_One) for a very limited introduction or, if you feel the need to understand **ASN/1** in detail, read [Olivier Dubuisson's book on ASN.1](http://www.oss.com/asn1/resources/books-whitepapers-pubs/dubuisson-asn1-book.PDF) (this is probably the best reference!).

 Anyway, we only use a subset of **ASN/1**, as what we have to deal with is the **BER/DER** encoding (**BER** and **DER** stand for **B**asic **E**ncoding **R**ules and **D**istinguished **E**ncoding **R**ules). There are other possible encodings, like **PER**, **XER** and **CER**, but they are irrelevant for **LDAP**.

 What you need to know is that **ASN/1** is just a notation used to describe the messages exchanged between a client and a server. In order to use it, we need an encoder and a decoder on both sides:

 ![Client/Server communication](images/asn1-codec.png)

 ## ASN/1 implementation in Apache LDAP API

 It took a long time to get it right! And it's not perfect :-)

 The very first iteration used a proprietary library (**IBM SNACC**), but that was before **ApacheDS** became a **TLP**! The next iteration was based on a rewriting system, which was pretty slow. Then came **Snicker**, a _State Machine_ based decoder, which is what we currently use. We might switch to a faster implementation, like the one **Kerby** is using...

 ### ASN/1 messages

 Let's start with the basic information.

 An encoded ASN/1 message is a tuple containing two or three elements: a **T**ype, a **L**ength and optionally (i.e. if the length is not 0) a **V**alue. This tuple is called a **TLV**. Every message is a **TLV**.

 But a message can have a complex structure, so a **TLV** can itself encapsulate other **TLV**s: the **V** part can be a list of **TLV**s. This is recursive...

 A typical encoded message can therefore be represented this way:

     :::
     [TL [TLV] [TL [TLV] [TLV]]]

 Here, the value of the message **TLV** is made of two **TLV**s, the second one being itself composed of two **TLV**s.

 The **T** describes the type of the value, the **L** gives the length of this value (which can be 0) and the **V** is the value itself, which can in turn contain **TLV**s.
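To make the recursion concrete, here is a minimal sketch of a recursive **TLV** reader. It is not the decoder used by the API: `TlvSketch` and its methods are hypothetical names, and the sketch assumes single-byte tags and definite lengths only. The sample bytes are a hand-built SEQUENCE that has the same shape as the representation above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration only; not a class of the Apache LDAP API.
final class TlvSketch {
    final int tag;                  // the T (assumed to fit in one byte)
    final byte[] value;             // the raw V bytes (empty when L is 0)
    final List<TlvSketch> children = new ArrayList<>();

    private TlvSketch(int tag, byte[] value) {
        this.tag = tag;
        this.value = value;
    }

    // Parses every TLV found in data[start, end).
    static List<TlvSketch> parse(byte[] data, int start, int end) {
        List<TlvSketch> result = new ArrayList<>();
        int pos = start;
        while (pos < end) {
            int tag = data[pos++] & 0xFF;
            int length = data[pos++] & 0xFF;
            if ((length & 0x80) != 0) {
                // Long form: the low 7 bits give the number of length bytes.
                int nbBytes = length & 0x7F;
                if (nbBytes == 0 || nbBytes > 4) {
                    throw new IllegalArgumentException("Unsupported length form");
                }
                length = 0;
                for (int i = 0; i < nbBytes; i++) {
                    length = (length << 8) | (data[pos++] & 0xFF);
                }
            }
            if (length < 0 || length > end - pos) {
                throw new IllegalArgumentException("Truncated TLV");
            }
            TlvSketch tlv = new TlvSketch(tag, Arrays.copyOfRange(data, pos, pos + length));
            if ((tag & 0x20) != 0) {
                // Bit 6 of the tag marks a constructed type: its V is a list of TLVs.
                tlv.children.addAll(parse(data, pos, pos + length));
            }
            pos += length;
            result.add(tlv);
        }
        return result;
    }

    // Renders the nesting in the bracket notation used in this page.
    String shape() {
        if (children.isEmpty()) {
            return value.length == 0 ? "[TL]" : "[TLV]";
        }
        StringBuilder sb = new StringBuilder("[TL");
        for (TlvSketch child : children) {
            sb.append(' ').append(child.shape());
        }
        return sb.append(']').toString();
    }

    public static void main(String[] args) {
        // SEQUENCE { INTEGER 1, SEQUENCE { INTEGER 2, OCTET STRING "a" } }
        byte[] message = {
            0x30, 0x0B,
            0x02, 0x01, 0x01,
            0x30, 0x06,
            0x02, 0x01, 0x02,
            0x04, 0x01, 0x61
        };
        // Prints [TL [TLV] [TL [TLV] [TLV]]]
        System.out.println(parse(message, 0, message.length).get(0).shape());
    }
}
```

Keeping the raw **V** bytes on constructed nodes is only a convenience of this sketch; a real decoder would typically avoid copying them.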

 ### Encoder/Decoder

 There are two aspects we have to deal with:

 * encoding messages
 * decoding messages

 Those are two different things, and we don't use the same mechanism for both. **Decoding** is done using a _State Machine_, while **encoding** is hard-wired in each class implementing a message.

 As we said, it's not perfect: it's complex to implement, complex to extend with a new message, and complex to test. We don't have a compiler that generates the stubs to encode or decode messages.

 ### Decoder

 The _Decoder_'s job is to take a **byte[]** and transform it into an instance of a Java object. When we receive the **byte[]**, we don't know yet what kind of message we are dealing with, so the creation of the instance is deferred.

 We have built a generic decoder that takes some inputs and produces the result, based on these elements:

 * A _Grammar_
 * A _Container_
 * A _StateEnum_
 * A _Decorator_
 * and optionally a _Factory_

 The _Grammar_ describes the transitions and actions of the state machine used to decode a message. Note that the actions can be stored in separate classes.

 The _Container_ is a wrapper around a message that is fed by the State Machine and that will contain the Java instance once fully decoded. It's initially empty.

 The _StateEnum_ is a Java enumeration listing all the possible _Grammar_ states.

 The _Decorator_ is a wrapper used to store a decoded message.

 The _Factory_ is used to create the message instance (it's optional).

 And of course, you have the message class that will be created and stored in the _Decorator_.

 So what we have is based on a **State Engine**, which means you have to describe the states, the transitions between them and the actions attached to those transitions.


 ### Encoder

 It's slightly simpler: we use the _Decorator_ to implement the encoding of a message. Two methods are necessary:

 * _int computeLength()_ : compute the _ByteBuffer_ size necessary to store the encoded message
 * _ByteBuffer encode( ByteBuffer )_ : actually encode the message into a _ByteBuffer_
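The two-pass approach can be sketched as follows. This is only an illustration under stated assumptions: `SimpleMessageDecorator` is a hypothetical class (not one of the API's decorators) for an invented message made of an id and a name, and it only handles short-form lengths (content under 128 bytes). Only the two method names and signatures come from the description above.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical decorator for SEQUENCE { INTEGER id, OCTET STRING name }.
final class SimpleMessageDecorator {
    private final int id;          // assumed to fit in one byte (0..127)
    private final byte[] name;
    private int contentLength;     // length of the SEQUENCE value, set by computeLength()

    SimpleMessageDecorator(int id, String name) {
        if (id < 0 || id > 127) {
            throw new IllegalArgumentException("This sketch only handles ids in 0..127");
        }
        this.id = id;
        this.name = name.getBytes(StandardCharsets.UTF_8);
    }

    // First pass: compute the number of bytes the encoded message needs.
    int computeLength() {
        int idTlv = 1 + 1 + 1;               // T + L + one value byte
        int nameTlv = 1 + 1 + name.length;   // T + L + V
        contentLength = idTlv + nameTlv;
        if (contentLength > 127) {
            throw new IllegalStateException("Long-form lengths are not handled in this sketch");
        }
        return 1 + 1 + contentLength;        // T + L of the enclosing SEQUENCE, then its V
    }

    // Second pass: write the TLVs. Assumes computeLength() has been called first.
    ByteBuffer encode(ByteBuffer buffer) {
        buffer.put((byte) 0x30).put((byte) contentLength);          // SEQUENCE
        buffer.put((byte) 0x02).put((byte) 1).put((byte) id);       // INTEGER
        buffer.put((byte) 0x04).put((byte) name.length).put(name);  // OCTET STRING
        return buffer;
    }

    public static void main(String[] args) {
        SimpleMessageDecorator decorator = new SimpleMessageDecorator(5, "ab");
        ByteBuffer buffer = ByteBuffer.allocate(decorator.computeLength());
        decorator.encode(buffer);
        for (byte b : buffer.array()) {
            System.out.printf("%02X ", b);
        }
        System.out.println();
    }
}
```

Computing the length first is what lets the **L** of the enclosing **TLV** be written before its content, and lets the buffer be allocated at exactly the right size.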

 ### The state machine

 So we decode a message using a state machine, which basically transitions from one state to another, optionally executing an action in between:

 ![State Machine transition](images/sm-transition.png)
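Before looking at the real code, here is a deliberately tiny sketch of the idea. Every name in it (`StateMachineSketch`, `State`, `Container`, `Transition`, `GRAMMAR`) is hypothetical and much simpler than the actual classes: the grammar is a table mapping a state and a tag to the next state and an optional action, and the container carries the current state plus what has been decoded so far. It assumes short-form lengths and an invented message, SEQUENCE { INTEGER id, OCTET STRING name }.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;
import java.util.function.BiConsumer;

// Hypothetical, simplified state machine; not the API's grammar classes.
final class StateMachineSketch {
    // Plays the role of the StateEnum: every state of the grammar.
    enum State { START, SEQUENCE, ID, NAME }

    // Plays the role of the Container: current state and decoded values.
    static final class Container {
        State state = State.START;
        int id;
        String name;
    }

    // A transition: the state to move to, plus an optional action fed with the V bytes.
    static final class Transition {
        final State next;
        final BiConsumer<Container, byte[]> action;   // may be null

        Transition(State next, BiConsumer<Container, byte[]> action) {
            this.next = next;
            this.action = action;
        }
    }

    // Plays the role of the Grammar: allowed transitions per state and tag.
    static final Map<State, Map<Integer, Transition>> GRAMMAR = new HashMap<>();

    static {
        add(State.START, 0x30, State.SEQUENCE, null);
        add(State.SEQUENCE, 0x02, State.ID, (c, v) -> c.id = v[0]);
        add(State.ID, 0x04, State.NAME, (c, v) -> c.name = new String(v, StandardCharsets.UTF_8));
    }

    private static void add(State from, int tag, State to, BiConsumer<Container, byte[]> action) {
        GRAMMAR.computeIfAbsent(from, s -> new HashMap<>()).put(tag, new Transition(to, action));
    }

    static Container decode(byte[] data) {
        Container container = new Container();
        int pos = 0;
        while (pos < data.length) {
            int tag = data[pos++] & 0xFF;
            int length = data[pos++] & 0xFF;   // short-form lengths only in this sketch
            Map<Integer, Transition> allowed = GRAMMAR.get(container.state);
            Transition transition = allowed == null ? null : allowed.get(tag);
            if (transition == null) {
                throw new IllegalStateException("No transition from " + container.state
                    + " for tag 0x" + Integer.toHexString(tag));
            }
            boolean constructed = (tag & 0x20) != 0;
            // A constructed TLV has no value of its own: we step into it instead of skipping it.
            byte[] value = constructed ? new byte[0] : Arrays.copyOfRange(data, pos, pos + length);
            if (transition.action != null) {
                transition.action.accept(container, value);
            }
            if (!constructed) {
                pos += length;
            }
            container.state = transition.next;
        }
        return container;
    }

    public static void main(String[] args) {
        byte[] message = { 0x30, 0x07, 0x02, 0x01, 0x05, 0x04, 0x02, 0x61, 0x62 };
        Container container = decode(message);
        System.out.println(container.id + " " + container.name + " " + container.state);
    }
}
```

A message whose **TLV**s arrive in an order the table does not allow is rejected as soon as the unexpected tag is read, which is the main benefit of driving the decoding from a grammar.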

 Now, let's see a real example.