| <?xml version="1.0" encoding="UTF-8"?> |
| <!-- |
| Licensed to the Apache Software Foundation (ASF) under one |
| or more contributor license agreements. See the NOTICE file |
| distributed with this work for additional information |
| regarding copyright ownership. The ASF licenses this file |
| to you under the Apache License, Version 2.0 (the |
| "License"); you may not use this file except in compliance |
| with the License. You may obtain a copy of the License at |
| |
| http://www.apache.org/licenses/LICENSE-2.0 |
| |
| Unless required by applicable law or agreed to in writing, |
| software distributed under the License is distributed on an |
| "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY |
| KIND, either express or implied. See the License for the |
| specific language governing permissions and limitations |
| under the License. |
| --> |
| <document> |
| <properties> |
| <author email="akarasulu@apache.org">Alex Karasulu</author> |
| <title>Refactoring the ASN.1 Runtime</title> |
| </properties> |
| |
| <body> |
| <section name="Refactoring the ASN.1 Runtime"> |
| <p> |
| The use of Snacc4J as the runtime ASN.1 BER codec for LDAP impossed an |
| IP issue for the new Directory Project under incubation. This resulted |
| in the creation of our own implementation, and hence the Apache ASN.1 |
| Runtime library was created. |
| </p> |
| |
| <p> |
| Before continuing any further it might be a good idea to read about |
| the existing architecture to understand the changes that are being |
| proposed. |
| </p> |
| |
| <subsection name="High Level Goals and Changes"> |
| <p> |
| The internal 0.2 release was the first successful attempt to produce a |
| replacement for Snacc4J. As of release 0.8 of ApacheDS it provides |
| BER encoders and decoders for LDAP requests and responses. The library |
| was designed with performance in mind. Some very good ideas were |
| introduced and really put to the test. However the library does have |
| performance problems. The designs to make this into a high performance |
| library were not totally followed through. Furthermore the code base |
| is very difficult to maintain needing some reorganization. We hope to |
| refactor the library so it is more efficient, and easier to maintain |
| while reducing the number of dependencies it has. In the process we |
| would like to introduce some new features and improvements which are |
| listed below: |
| </p> |
| |
| <ul> |
| <li> |
| Better ByteBuffer utilization by splicing buffers instead of copying |
| them. |
| </li> |
| |
| <li> |
| Repace current Tuple class with well defined Tuple interfaces: |
| specifically we need to remove TLV field processing from a Tuple |
| as well as tag cooking functionality. Tag cooking refers to the |
| application of transformations that turn tag bytes into a 4 byte |
| Java primitive integers. These functions need to be localized |
| within utility classes. |
| </li> |
| |
| <li> |
| Some BER based protocols only use a subset of the encoding rules. |
| For example LDAP only uses definite length encodings for constructed |
| tuples. A reduced set of rules are much easier to code, maintain, |
| and often will perform significantly better than codecs designed for |
| the entire rule set. The key here however is to make sure that |
| the core of the codec can be replaced transparently without imposing |
| code changes. |
| </li> |
| |
| <li> |
| The Tuples of primitives like binary values store the Tag, Length |
| and Value of the primitive TLV Tuple in memory. Sometimes primitive |
| values can be dangerously large for a server to encode or decode. |
| Primitive tuples could be blobs of large binaries like images. If |
| tuple values are larger than some application defined limit they |
| aught to be streamed to disk rather than kept in main memory. |
| Streaming to disk makes the server more efficient overall since it |
| can maintain a constant sized decoding footprint. However switching |
| to disk based storage will rightfully slow down the current operation |
| which involves a large primitive. This is a tradeoff that should |
| be configurable by API users and ultimately ApacheDS administrators. |
| </li> |
| |
| <li> |
| Better logging and error handling for codecs with pershaps some |
| management interfaces to control the properties of codecs. |
| </li> |
| |
| <li> |
| A single deployable artifact where the ber and codec jars are fused. |
| </li> |
| |
| <li> |
| Make the code easier to maintain while improving its structure. |
| </li> |
| </ul> |
| </subsection> |
| |
| |
| </section> |
| |
| <section name="Tuple Interface/Class Hierarchies"> |
| <p> |
| Presently Tuples contain the functionality to decode and encode |
| fields. Tuples can even encode themselves to a buffer as BER or |
| DER. A Tuple is not a simple bean and that's all that it should be. |
| Hence one of our goals is to factor out this additional functionality. |
| </p> |
| |
| <p> |
| A Tuple is a single class that acts more like a union of different |
| types rather than using inheritance to differentiate. There are |
| distinct types of tuples, constructed verses primitive for example. |
| Instead of using complex logic to differentiate what kind of Tuple an |
| instance is it is much better to differentiate the Tuple into |
| subclasses. Hence we propose a new interface and implementation |
| hierarchy for Tuples. |
| </p> |
| |
| <p> |
| Let's start by proposing a minimal Tuple interface. |
| </p> |
| |
| <source> |
| interface Tuple |
| { |
| /** |
| * Gets the zero based index into a PDU where the first byte of this |
| * Tuple's tag resides. |
| * |
| * @return zero based index of Tag's first byte in the PDU |
| */ |
| int getTagStartIndex(); |
| |
| /** |
| * Gets this TLV Tuple's Tag (T) as a type safe enumeration. |
| * |
| * @return type safe enumeration for the Tag |
| */ |
| TagEnum getTag(); |
| |
| /** |
| * Gets whether or not this Tuple is constructed. |
| * |
| * @return true if the Tag is constructed false if it is primitive. |
| */ |
| boolean isConstructed(); |
| } |
| </source> |
| |
| <p> |
| These interfaces give the minimum information needed for a Tuple |
| that is not specific to another specialized type of Tuple. Meaning |
| all Tuples share these methods. We can also go a step further and |
| implement an AbstractTuple where protected members are used to |
| implement these methods. Note that isConstructed() will probably be |
| left abstract so subclasses can just return true or false. For |
| brevity this code is not shown but other classes in the section below |
| will extend from AbstractTuple. |
| </p> |
| |
| <subsection name="Primitive Vs. Constructed Tuples"> |
| <p> |
| We need to go a step further and start differentiating between Tuples |
| that are primitive and those that are constructed. In this step we |
| introduce two new abstract classes PrimitiveTuple and |
| ConstructedTuple. |
| </p> |
| |
| <p> |
| These two classes will be described below but one might ask why both |
| are still abstract. This is because we need to differentiate further |
| for buffered verses streamed Tuples in the case of primitive Tuples. |
| For constructed Tuples we need to differentiate between definate |
| length verses indefinite length Tuples. With our approach, only the |
| leaf nodes of the inheritance hierarchy will be concrete. Below is |
| the definition for the PrimitiveTuple. |
| </p> |
| |
| <source> |
| public abstract class PrimitiveTuple extends AbstractTuple |
| { |
| /** the number of bytes used to compose the Tuple's length field */ |
| protected int lengthFieldSz = 0; |
| /** the number of bytes used to compose the Tuple's value field */ |
| protected int valueFieldSz = 0; |
| |
| ... |
| |
| public final boolean isConstructed() |
| { |
| return false; |
| } |
| |
| /** |
| * Gets whether or not this Tuple's value is buffered in memory or |
| * streamed to disk. |
| * |
| * @return true if the value is buffered in memory, false if it is streamed |
| * to disk |
| */ |
| public abstract boolean isBuffered(); |
| |
| /** |
| * Gets the number of bytes in the length (L) field of this TLV Tuple. |
| * |
| * @return number of bytes for the length |
| */ |
| public final int getLengthFieldSize() |
| { |
| return lengthFieldSz; |
| } |
| |
| /** |
| * Gets the number of bytes in the value (V) field of this TLV Tuple. |
| * |
| * @return number of bytes for the value |
| */ |
| public final int getValueFieldSize(); |
| { |
| return valueFieldSz; |
| } |
| |
| ... |
| } |
| </source> |
| <p> |
| This abstract class adds two new concrete methods for tracking the |
| size of the length and value fields. Constructed Tuples may not |
| necessarily have a length value associated with them if they are |
| of the indeterminate form. Furthermore the value of constructed |
| Tuples are the nested child Tuples subordinate to them. So there |
| is no need to track the value prematurely now for anything other |
| than primitive Tuples. |
| </p> |
| |
| <p> |
| Note that the isBuffered() method is implemented as final and always |
| returns false for this lineage of Tuples. A final modifier on the |
| method makes sense and sometimes helps the compiler inline this |
| method so we don't always pay a price for using it in addition to |
| subclassing. A new abstract method isBuffered() is introduced which |
| is discussed in detail within the Buffered Vs. Streamed section. |
| </p> |
| |
| <p> |
| Now let's take a look at the ConstructedTuple abstract class. |
| </p> |
| |
| <source> |
| public abstract class ConstructedTuple extends AbstractTuple |
| { |
| public final boolean isConstructed() |
| { |
| return true; |
| } |
| |
| /** |
| * Gets whether or not the length of this constructed Tuple is of the |
| * definate form or of the indefinite length form. |
| * |
| * @return true if the length is definate, false if the length is of the |
| * indefinite form |
| */ |
| public abstract boolean isLengthDefinate(); |
| } |
| </source> |
| |
| <p> |
| ConstructedTuple implements the <code>isConstructed()</code> method |
| as final since it will always return false for this lineage of |
| Tuples. Also a new abstract method isLengthDefinate() is introduced |
| to see if the Tuple uses the indefinite length form or not. |
| </p> |
| </subsection> |
| |
| <subsection name="Definate Vs. Indefinite Length"> |
| <p> |
| The ConstructedTuple can be further differentiated into two |
| subclasses to represent definate and indefinite length constructed |
| TLV Tuples. The indefinite form does not have a length value |
| associated with it where as the definate lenght form does. Let's |
| explore the concrete IndefiniteLegthTuple definition. |
| </p> |
| |
| <source> |
| public class IndefiniteLength extends ConstructedTuple |
| { |
| public final boolean isLengthDefinate() |
| { |
| return false; |
| } |
| } |
| </source> |
| |
| <p> |
| Yep this is pretty simple. There is very little to track for this |
| Tuple since most of the tracking is handled by its decendent Tuples. |
| The class also is concrete. What about the DefinateLength |
| implementation ... |
| </p> |
| |
| <source> |
| public class DefinateLength extends ConstructedTuple |
| { |
| /** the number of bytes used to compose the Tuple's length field */ |
| protected int lengthFieldSz = 0; |
| /** the number of bytes used to compose the Tuple's value field */ |
| protected int valueFieldSz = 0; |
| |
| ... |
| |
| public final boolean isLengthDefinate() |
| { |
| return true; |
| } |
| |
| /** |
| * Gets the number of bytes in the length (L) field of this TLV Tuple. |
| * |
| * @return number of bytes for the length |
| */ |
| public final int getLengthFieldSize() |
| { |
| return lengthFieldSz; |
| } |
| |
| /** |
| * Gets the number of bytes in the value (V) field of this TLV Tuple. |
| * |
| * @return number of bytes for the value |
| */ |
| public final int getValueFieldSize(); |
| { |
| return valueFieldSz; |
| } |
| } |
| </source> |
| <p> |
| Now this introduces two new concrete methods for getting the length |
| of the length field and the length of the value field. A determinate |
| length TLV has a valid value within the Length (L) field. The value |
| of the length field is the length of the value field. Hence the |
| reason why we include both these concrete methods. |
| </p> |
| </subsection> |
| |
| <subsection name="Buffered Vs. Streamed PrimitiveTuples"> |
| <p> |
| As we mentioned before, there are two kinds of primitive Tuples. |
| Those that keep there value in a buffer within the TLV Tuple object, |
| in which case it is buffered within memory, and those that stream |
| the value to disk and store a referral to the value on disk. These |
| two beasts are so different it makes sense to differentiate between |
| them using subclasses. Let's take a look at a BufferedTuple which |
| is the simplest one. |
| </p> |
| |
| <source> |
| public class BufferedTuple extends PrimitiveTuple |
| { |
| /** contains ByteBuffers which contain parts of the value */ |
| private final ArrayList value = new ArrayList(); |
| /** pre-fab final unmodifiable wrapper around our modifiable list */ |
| private final List unmodifiable = Collections.unmodifiableList( value ); |
| |
| public final boolean isBuffered() |
| { |
| return true; |
| } |
| |
| /** |
| * Gets the value of this Tuple as a List of ByteBuffers. |
| * |
| * @return a list of ByteBuffers containing parts of the value |
| */ |
| public final List getValue() |
| { |
| return unmodifiable; |
| } |
| } |
| </source> |
| |
| <p> |
| The implementation introduces a final <code>getValue()</code> method |
| which returns an unmodifiable wrapper around a modifiable list of |
| ByteBuffers. The <code>isBuffered()</code> method is made final and |
| implemented to return true all the time. This is easy so let's now |
| take a look at the StreamedTuple implementation. |
| </p> |
| |
| <source> |
| public abstract class StreamedTuple extends PrimitiveTuple |
| { |
| public final boolean isBuffered() |
| { |
| return false; |
| } |
| |
| // might experiment with a getURL to represent the source of |
| // the data stream - we need to discuss this on the list |
| |
| /** |
| * Depending on the backing store used for accessing streamed data there |
| * may need to be multiple subclasses that implement this method. |
| * |
| * @return an InputStream that can be used to read this Tuple's streamed |
| * value data |
| */ |
| public abstract InputStream getValueStream(); |
| |
| // another question is whether or not to offer a readable Channel instead |
| // of an InputStream? This is another topic for discussion. |
| } |
| </source> |
| |
| <p> |
| At this point we know that there could be multiple ways to implement |
| this kind of StreamedTuple. Notice though the value is accessed |
| through a stream provided by the Tuple. This way the large value |
| stored on disk need not all be kept in memory at one time during the |
| decode or encode process. |
| </p> |
| |
| </subsection> |
| |
| <p> |
| Some code will be removed from the Tuple class today during the |
| refactoring and kept in a TupleUtils class. Functionality like |
| the encoding, decoding of Tuple fields and tag cooking can be |
| offloaded to this class. |
| </p> |
| </section> |
| |
| <section name="notes"> |
| <p> |
| By far the largest part of the refactoring effort is in introducing |
| this new hierarchy and introducing some patterns that improve the |
| maintainability of the code like the State pattern. Other minor |
| details for this dev cycle are discussed below. |
| </p> |
| |
| <subsection name="Termination Tuples"> |
| <p> |
| A lot of effort is made to track the position of a Tuple within a |
| PDU. This is why we have methods like getTagStartIndex(). We want |
| to know where the first byte of a Tuple's tag is within a PDU. This |
| positional accounting enables better error reporting when problems |
| result. They also allow us to detect when we start and stop |
| processing a PDU. |
| </p> |
| |
| <p> |
| The minimum amount of information needed to track the position of a |
| Tuple within a PDU or the start and stop points of a PDU is to have |
| the Tuple's tag start index, and the lengths of fields within the |
| Tuple. |
| </p> |
| |
| <p> |
| In a decoder for example we know that we've processed the last |
| topmost Tuple of a PDU when we get a Tuple whose <code> |
| getTagStartIndex()</code> returns 0. <b>WARNING</b>: AbstractTuple |
| should default the value the start tag index to -1 so it cannot |
| be interpretted as a terminator. |
| </p> |
| </subsection> |
| |
| <subsection name="New Coherent Replacement for Stateful Codec API"> |
| <p> |
| There have been many complaints about the codec API being too |
| generic or the callback mechanism being somewhat unintuitive. |
| Perhaps we can work on more specific interfaces which incorporate |
| the concepts of producer and consumer. Plus let's see if we can |
| make these interfaces specific so we don't have ugly codes and casts |
| all over the place. |
| </p> |
| |
| <p> |
| Also in the end we want to do away with this codec API which was |
| originally intended to fuse back into commons. I've abandoned this |
| idea because it is too difficult to make all parties happy. The |
| best thing to do is create our own interface that fit well and |
| enable them to be wrapped for other APIs. Hence going towards custom |
| codec API's is not an issue. The old codec stuff can be pushed into |
| the protocol framework API. |
| </p> |
| |
| <p> |
| Furthermore at the end of the day we want there to be a single runtime |
| jar without any dependencies for the ASN.1 stuff. That means no more |
| codec API as it is with jar today within the ASN.1 project. |
| </p> |
| |
| <p> |
| Some new producer consumer interface ideas are listed below: |
| </p> |
| |
| <ul> |
| <li> |
| BufferConsumer: consumes ByteBuffers. Something like <code>void |
| consume(ByteBuffer bb)</code> comes to mind. Perhaps even with |
| overloads to take a list or array of BBs. |
| </li> |
| |
| <li> |
| TupleProducer: generates Tuples (often is a BufferConsumer). Some |
| thing like <code>void setConsumer(TupleConsumer consumer)</code> |
| comes to mind. |
| </li> |
| |
| <li> |
| TupleConsumer: consumes Tuples generated by a TupleProducer. |
| Something like <code>void consume(Tuple tlv)</code> comes to mind. |
| </li> |
| |
| <li> |
| MessageProducer: produces populated message stubs |
| </li> |
| </ul> |
| </subsection> |
| |
| <subsection name="Possibly Merging TupleNode and Tuple"> |
| <p> |
| Right now to build Tuple trees we use yet another class to wrap |
| Tuples called TupleNodes. This kept the contents of the Tuple |
| class less conjested. The Tuple class will no longer exist and the |
| conjestion issues is no longer valid. The question now is, is it |
| worth keeping parent child methods in TupleNode when creating trees |
| while paying for extra object creation? |
| </p> |
| |
| <p> |
| Note that the TupleNode methods are not required on Tuple to process |
| a byte stream of encoded TLV data in a sax-like fashion. These |
| methods are only required for higher level operations like visitations |
| from visitors during the encoding process. The question really is |
| whether we will make Tuple impure to save a little time so we don't |
| have to create TupleNode objects to wrap Tuples and model the |
| hierarchy? This is something that needs to be discussed. |
| </p> |
| |
| <p> |
| Contrary to the purist approach of keeping Tuple and TupleNode |
| separate one can merge the two. A codec need not honor these methods |
| by building the tree. Meaning these tree node (TupleNode) methods |
| may simply return null. If these methods are honored then it is the |
| intent of the codec to build a tree. If the tree is built the |
| processing is more like DOM and if not then it is more like SAX. We |
| should not tax the DOM like processing use case by forcing the need |
| to create extra wrappers, while accomodating the purist view. |
| </p> |
| </subsection> |
| |
| <subsection name="Removing the Digester Concept"> |
| <p> |
| I don't know what I was thinking when I devised this rule based |
| approach similar to the Digester in commons. This was a big mistake |
| and IMO one of the reasons why we have performance issues. This |
| datastructure can be removed entirely from upper layers that depend |
| on it. |
| </p> |
| |
| <p> |
| Granted this means we are going to have to weave once again our own |
| classes for handling LDAP specific PDU's however I think this will be |
| easy to do. I will essentially rewrite the LDAP provider based on |
| our runtime to hardcode the switching rather than using this rule |
| based triggering approach. The new approach is also going to |
| simplify the code significantly making it more maintainable. |
| Hopefully these changes will also speed up the code since less |
| objects will need to be created every time a decoder is instantiated. |
| </p> |
| </subsection> |
| |
| <subsection name="It's Time For DER and CER"> |
| <p> |
| We need to find a way to make the rules used while decoding and |
| encoding Tuples plugable. This way we can change the rules to |
| encode as generic BER, reduced BER (for increases in performance |
| in the case of specific protocol needs). DER likewise is a reduced |
| set of BER with restrictions on the encoding and range of values |
| that can be interpreted from primitive values. If the plugability |
| is there the runtime is a flexible TLV Tuple codec that can change |
| the rules use to handle the substrate. |
| </p> |
| |
| <p> |
| We could easily have BerDecoder, CerDecoder and even protocol specific |
| decoders with those BER rules used by a protocol such as |
| LdapBerDecoder for those BER decoding rules that only apply to LDAP. |
| </p> |
| </subsection> |
| |
| </section> |
| </body> |
| </document> |