<?xml version="1.0" encoding="UTF-8"?>
<!--
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
-->
<document>
<properties>
<author email="akarasulu@apache.org">Alex Karasulu</author>
<title>Refactoring the ASN.1 Runtime</title>
</properties>
<body>
<section name="Refactoring the ASN.1 Runtime">
<p>
The use of Snacc4J as the runtime ASN.1 BER codec for LDAP imposed an
IP issue for the new Directory Project under incubation. This resulted
in the creation of our own implementation: the Apache ASN.1
Runtime library.
</p>
<p>
Before continuing any further it might be a good idea to read about
the existing architecture to understand the changes that are being
proposed.
</p>
<subsection name="High Level Goals and Changes">
<p>
The internal 0.2 release was the first successful attempt to produce a
replacement for Snacc4J. As of release 0.8 of ApacheDS it provides
BER encoders and decoders for LDAP requests and responses. The library
was designed with performance in mind. Some very good ideas were
introduced and really put to the test. However the library does have
performance problems. The designs to make this into a high performance
library were not totally followed through. Furthermore the code base
is very difficult to maintain and needs reorganization. We hope to
refactor the library so it is more efficient, and easier to maintain
while reducing the number of dependencies it has. In the process we
would like to introduce some new features and improvements which are
listed below:
</p>
<ul>
<li>
Better ByteBuffer utilization by splicing buffers instead of copying
them.
</li>
<li>
Replace the current Tuple class with well defined Tuple interfaces:
specifically we need to remove TLV field processing from a Tuple
as well as tag cooking functionality. Tag cooking refers to the
application of transformations that turn tag bytes into a 4 byte
Java primitive integer. These functions need to be localized
within utility classes.
</li>
<li>
Some BER based protocols only use a subset of the encoding rules.
For example LDAP only uses definite length encodings for constructed
tuples. A reduced set of rules is much easier to code and maintain,
and often performs significantly better than codecs designed for
the entire rule set. The key here however is to make sure that
the core of the codec can be replaced transparently without imposing
code changes.
</li>
<li>
The Tuples of primitives like binary values store the Tag, Length
and Value of the primitive TLV Tuple in memory. Sometimes primitive
values can be dangerously large for a server to encode or decode.
Primitive tuples could be blobs of large binaries like images. If
tuple values are larger than some application defined limit they
ought to be streamed to disk rather than kept in main memory.
Streaming to disk makes the server more efficient overall since it
can maintain a constant sized decoding footprint. However switching
to disk based storage will rightfully slow down the current operation
which involves a large primitive. This is a tradeoff that should
be configurable by API users and ultimately ApacheDS administrators.
</li>
<li>
Better logging and error handling for codecs with perhaps some
management interfaces to control the properties of codecs.
</li>
<li>
A single deployable artifact where the ber and codec jars are fused.
</li>
<li>
Make the code easier to maintain while improving its structure.
</li>
</ul>
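<p>
As an illustration of the first item above, here is a minimal sketch
of splicing a value field out of a PDU buffer with the NIO API, so
the splice shares the PDU's backing storage instead of copying bytes.
The class and method names are hypothetical.
</p>

```java
import java.nio.ByteBuffer;

// Hypothetical helper: carve a zero-copy window over a TLV value field.
class SpliceDemo
{
    /**
     * Returns a view of [offset, offset + length) within pdu that shares
     * the same backing storage -- no bytes are copied.
     */
    static ByteBuffer splice( ByteBuffer pdu, int offset, int length )
    {
        ByteBuffer dup = pdu.duplicate();   // independent position and limit
        dup.position( offset );
        dup.limit( offset + length );
        return dup.slice();                 // zero-copy window on the value
    }

    public static void main( String[] args )
    {
        // a tiny PDU: SEQUENCE (0x30) of length 3 wrapping INTEGER 5
        ByteBuffer pdu = ByteBuffer.wrap( new byte[] { 0x30, 0x03, 0x02, 0x01, 0x05 } );
        ByteBuffer value = splice( pdu, 2, 3 );
        System.out.println( value.remaining() );   // 3
    }
}
```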
</subsection>
</section>
<section name="Tuple Interface/Class Hierarchies">
<p>
Presently Tuples contain the functionality to decode and encode
fields. Tuples can even encode themselves to a buffer as BER or
DER. A Tuple should be a simple bean and nothing more.
Hence one of our goals is to factor out this additional functionality.
</p>
<p>
A Tuple is a single class that acts more like a union of different
types rather than using inheritance to differentiate. There are
distinct types of tuples, constructed versus primitive for example.
Instead of using complex logic to determine what kind of Tuple an
instance is, it is much better to differentiate the Tuple into
subclasses. Hence we propose a new interface and implementation
hierarchy for Tuples.
</p>
<p>
Let's start by proposing a minimal Tuple interface.
</p>
<source>
interface Tuple
{
    /**
     * Gets the zero based index into a PDU where the first byte of this
     * Tuple's tag resides.
     *
     * @return zero based index of Tag's first byte in the PDU
     */
    int getTagStartIndex();

    /**
     * Gets this TLV Tuple's Tag (T) as a type safe enumeration.
     *
     * @return type safe enumeration for the Tag
     */
    TagEnum getTag();

    /**
     * Gets whether or not this Tuple is constructed.
     *
     * @return true if the Tag is constructed, false if it is primitive
     */
    boolean isConstructed();
}
</source>
<p>
This interface gives the minimum information needed for a Tuple
that is not specific to another specialized type of Tuple. Meaning
all Tuples share these methods. We can also go a step further and
implement an AbstractTuple where protected members are used to
implement these methods. Note that isConstructed() will probably be
left abstract so subclasses can just return true or false. For
brevity this code is not shown but other classes in the section below
will extend from AbstractTuple.
</p>
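<p>
For the curious, the AbstractTuple mentioned above might look roughly
like the sketch below. The member names are assumptions, and the
Tuple and TagEnum declarations are repeated as minimal stubs only so
the fragment stands alone.
</p>

```java
// Minimal stub stand-ins so the sketch compiles on its own.
interface TagEnum {}

interface Tuple
{
    int getTagStartIndex();
    TagEnum getTag();
    boolean isConstructed();
}

// A possible shape for AbstractTuple; member names are assumptions.
abstract class AbstractTuple implements Tuple
{
    /** zero based index of the Tag's first byte; -1 until decoded */
    protected int tagStartIndex = -1;
    /** this TLV Tuple's Tag as a type safe enumeration */
    protected TagEnum tag = null;

    public int getTagStartIndex()
    {
        return tagStartIndex;
    }

    public TagEnum getTag()
    {
        return tag;
    }

    // left abstract so each subclass simply returns true or false
    public abstract boolean isConstructed();
}
```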
<subsection name="Primitive Vs. Constructed Tuples">
<p>
We need to go a step further and start differentiating between Tuples
that are primitive and those that are constructed. In this step we
introduce two new abstract classes PrimitiveTuple and
ConstructedTuple.
</p>
<p>
These two classes will be described below but one might ask why both
are still abstract. This is because we need to differentiate further
for buffered versus streamed Tuples in the case of primitive Tuples.
For constructed Tuples we need to differentiate between definite
length versus indefinite length Tuples. With our approach, only the
leaf nodes of the inheritance hierarchy will be concrete. Below is
the definition for the PrimitiveTuple.
</p>
<source>
public abstract class PrimitiveTuple extends AbstractTuple
{
    /** the number of bytes used to compose the Tuple's length field */
    protected int lengthFieldSz = 0;
    /** the number of bytes used to compose the Tuple's value field */
    protected int valueFieldSz = 0;

    ...

    public final boolean isConstructed()
    {
        return false;
    }

    /**
     * Gets whether or not this Tuple's value is buffered in memory or
     * streamed to disk.
     *
     * @return true if the value is buffered in memory, false if it is
     * streamed to disk
     */
    public abstract boolean isBuffered();

    /**
     * Gets the number of bytes in the length (L) field of this TLV Tuple.
     *
     * @return number of bytes for the length
     */
    public final int getLengthFieldSize()
    {
        return lengthFieldSz;
    }

    /**
     * Gets the number of bytes in the value (V) field of this TLV Tuple.
     *
     * @return number of bytes for the value
     */
    public final int getValueFieldSize()
    {
        return valueFieldSz;
    }

    ...
}
</source>
<p>
This abstract class adds two new concrete methods for tracking the
size of the length and value fields. Constructed Tuples may not
necessarily have a length value associated with them if they are
of the indefinite form. Furthermore the value of a constructed
Tuple is the nested child Tuples subordinate to it. So there
is no need to track the value now for anything other
than primitive Tuples.
</p>
<p>
Note that the isConstructed() method is implemented as final and
always returns false for this lineage of Tuples. A final modifier on
the method makes sense and sometimes helps the compiler inline this
method so we don't always pay a price for using it in addition to
subclassing. A new abstract method isBuffered() is introduced which
is discussed in detail within the Buffered Vs. Streamed section.
</p>
<p>
Now let's take a look at the ConstructedTuple abstract class.
</p>
<source>
public abstract class ConstructedTuple extends AbstractTuple
{
    public final boolean isConstructed()
    {
        return true;
    }

    /**
     * Gets whether or not the length of this constructed Tuple is of the
     * definite form or of the indefinite length form.
     *
     * @return true if the length is definite, false if the length is of
     * the indefinite form
     */
    public abstract boolean isLengthDefinate();
}
</source>
<p>
ConstructedTuple implements the <code>isConstructed()</code> method
as final since it will always return true for this lineage of
Tuples. Also a new abstract method isLengthDefinate() is introduced
to tell whether or not the Tuple uses the indefinite length form.
</p>
</subsection>
<subsection name="Definite Vs. Indefinite Length">
<p>
The ConstructedTuple can be further differentiated into two
subclasses to represent definite and indefinite length constructed
TLV Tuples. The indefinite form does not have a length value
associated with it whereas the definite length form does. Let's
explore the concrete IndefiniteLength definition.
</p>
<source>
public class IndefiniteLength extends ConstructedTuple
{
    public final boolean isLengthDefinate()
    {
        return false;
    }
}
</source>
<p>
Yep this is pretty simple. There is very little to track for this
Tuple since most of the tracking is handled by its descendant Tuples.
The class also is concrete. What about the DefinateLength
implementation ...
</p>
<source>
public class DefinateLength extends ConstructedTuple
{
    /** the number of bytes used to compose the Tuple's length field */
    protected int lengthFieldSz = 0;
    /** the number of bytes used to compose the Tuple's value field */
    protected int valueFieldSz = 0;

    ...

    public final boolean isLengthDefinate()
    {
        return true;
    }

    /**
     * Gets the number of bytes in the length (L) field of this TLV Tuple.
     *
     * @return number of bytes for the length
     */
    public final int getLengthFieldSize()
    {
        return lengthFieldSz;
    }

    /**
     * Gets the number of bytes in the value (V) field of this TLV Tuple.
     *
     * @return number of bytes for the value
     */
    public final int getValueFieldSize()
    {
        return valueFieldSz;
    }
}
</source>
<p>
Now this introduces two new concrete methods for getting the length
of the length field and the length of the value field. A definite
length TLV has a valid value within the Length (L) field. The value
of the length field is the length of the value field. Hence the
reason why we include both these concrete methods.
</p>
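<p>
To make the definite length forms concrete, here is a sketch of
decoding the length (L) field itself: the short form fits in one
octet, while the long form's first octet carries the count of length
octets that follow. The class and method names are illustrative.
</p>

```java
// Illustrative reader for the definite length (L) field of a BER TLV.
class LengthReader
{
    /**
     * Decodes a definite length starting at offset within pdu.
     * Short form: first octet below 0x80 is the length itself.
     * Long form: first octet 0x81..0x84 gives the count of octets
     * that follow, holding the length big-endian.
     */
    static int readLength( byte[] pdu, int offset )
    {
        int first = pdu[offset] & 0xff;
        if ( first < 0x80 )
        {
            return first;                 // short form
        }
        int numOctets = first & 0x7f;     // long form octet count
        int length = 0;
        for ( int ii = 1; ii <= numOctets; ii++ )
        {
            length <<= 8;
            length |= pdu[offset + ii] & 0xff;
        }
        return length;
    }
}
```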
</subsection>
<subsection name="Buffered Vs. Streamed PrimitiveTuples">
<p>
As we mentioned before, there are two kinds of primitive Tuples:
those that keep their value in a buffer within the TLV Tuple object,
in which case it is buffered within memory, and those that stream
the value to disk and store a reference to the value on disk. These
two beasts are so different it makes sense to differentiate between
them using subclasses. Let's take a look at a BufferedTuple which
is the simplest one.
</p>
<source>
public class BufferedTuple extends PrimitiveTuple
{
    /** contains ByteBuffers which contain parts of the value */
    private final ArrayList value = new ArrayList();
    /** pre-fab final unmodifiable wrapper around our modifiable list */
    private final List unmodifiable = Collections.unmodifiableList( value );

    public final boolean isBuffered()
    {
        return true;
    }

    /**
     * Gets the value of this Tuple as a List of ByteBuffers.
     *
     * @return a list of ByteBuffers containing parts of the value
     */
    public final List getValue()
    {
        return unmodifiable;
    }
}
</source>
<p>
The implementation introduces a final <code>getValue()</code> method
which returns an unmodifiable wrapper around a modifiable list of
ByteBuffers. The <code>isBuffered()</code> method is made final and
implemented to return true all the time. This is easy so let's now
take a look at the StreamedTuple implementation.
</p>
<source>
public abstract class StreamedTuple extends PrimitiveTuple
{
    public final boolean isBuffered()
    {
        return false;
    }

    // might experiment with a getURL to represent the source of
    // the data stream - we need to discuss this on the list

    /**
     * Depending on the backing store used for accessing streamed data there
     * may need to be multiple subclasses that implement this method.
     *
     * @return an InputStream that can be used to read this Tuple's streamed
     * value data
     */
    public abstract InputStream getValueStream();

    // another question is whether or not to offer a readable Channel instead
    // of an InputStream? This is another topic for discussion.
}
</source>
<p>
At this point we know that there could be multiple ways to implement
this kind of StreamedTuple. Notice though the value is accessed
through a stream provided by the Tuple. This way the large value
stored on disk need not all be kept in memory at one time during the
decode or encode process.
</p>
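<p>
One conceivable concrete subclass, sketched below, assumes the
decoder spooled the oversized value into a temporary file. The class
name, the File member and the checked exception are illustrative
choices, not settled design.
</p>

```java
import java.io.BufferedInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical file-backed streamed Tuple; shown standalone here so
// the fragment compiles without the rest of the hierarchy.
class FileStreamedTuple /* extends StreamedTuple */
{
    /** temporary file holding the streamed value */
    private final File valueFile;

    FileStreamedTuple( File valueFile )
    {
        this.valueFile = valueFile;
    }

    public boolean isBuffered()
    {
        return false;   // the value lives on disk, not in main memory
    }

    public InputStream getValueStream() throws IOException
    {
        return new BufferedInputStream( new FileInputStream( valueFile ) );
    }
}
```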
</subsection>
<p>
Some code will be removed from the Tuple class during the
refactoring and kept in a TupleUtils class. Functionality like
the encoding and decoding of Tuple fields and tag cooking can be
offloaded to this class.
</p>
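<p>
For example, the tag cooking function destined for TupleUtils might
look like the sketch below, packing up to four raw tag octets into a
single Java int with the most significant octet first. The method
name is an assumption.
</p>

```java
// Sketch of "tag cooking": turning raw tag bytes into a 4 byte int.
class TupleUtils
{
    /**
     * Packs up to four tag octets into an int, most significant first.
     */
    static int cookTag( byte[] tagOctets )
    {
        if ( tagOctets.length > 4 )
        {
            throw new IllegalArgumentException( "tag too long to cook into an int" );
        }
        int cooked = 0;
        for ( int ii = 0; ii < tagOctets.length; ii++ )
        {
            cooked <<= 8;
            cooked |= tagOctets[ii] & 0xff;
        }
        return cooked;
    }
}
```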
</section>
<section name="Notes">
<p>
By far the largest part of the refactoring effort is in introducing
this new hierarchy and introducing some patterns that improve the
maintainability of the code like the State pattern. Other minor
details for this dev cycle are discussed below.
</p>
<subsection name="Termination Tuples">
<p>
A lot of effort is made to track the position of a Tuple within a
PDU. This is why we have methods like getTagStartIndex(). We want
to know where the first byte of a Tuple's tag is within a PDU. This
positional accounting enables better error reporting when problems
arise. It also lets us detect when we start and stop
processing a PDU.
</p>
<p>
The minimum amount of information needed to track the position of a
Tuple within a PDU or the start and stop points of a PDU is to have
the Tuple's tag start index, and the lengths of fields within the
Tuple.
</p>
<p>
In a decoder for example we know that we've processed the last
topmost Tuple of a PDU when we get a Tuple whose <code>
getTagStartIndex()</code> returns 0. <b>WARNING</b>: AbstractTuple
should default the tag start index to -1 so it cannot
be interpreted as a terminator.
</p>
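<p>
The check and the recommended default can be sketched as follows;
the names are illustrative only.
</p>

```java
// Sketch of the positional accounting above: the tag start index
// defaults to -1 (unset) so an uninitialized Tuple can never be
// mistaken for the final topmost Tuple, whose tag starts at index 0.
class PositionDemo
{
    static class Tuple
    {
        int tagStartIndex = -1;   // default recommended by the warning above
    }

    static boolean terminatesPdu( Tuple t )
    {
        return t.tagStartIndex == 0;
    }
}
```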
</subsection>
<subsection name="New Coherent Replacement for Stateful Codec API">
<p>
There have been many complaints about the codec API being too
generic or the callback mechanism being somewhat unintuitive.
Perhaps we can work on more specific interfaces which incorporate
the concepts of producer and consumer. Plus let's see if we can
make these interfaces specific so we don't have ugly casts
all over the place.
</p>
<p>
Also in the end we want to do away with this codec API which was
originally intended to fuse back into commons. I've abandoned this
idea because it is too difficult to make all parties happy. The
best thing to do is create our own interfaces that fit well and
enable them to be wrapped for other APIs. Hence going towards custom
codec APIs is not an issue. The old codec stuff can be pushed into
the protocol framework API.
</p>
<p>
Furthermore at the end of the day we want there to be a single runtime
jar without any dependencies for the ASN.1 stuff. That means the
codec API can no longer ship as its own jar within the ASN.1 project
as it does today.
</p>
<p>
Some new producer consumer interface ideas are listed below:
</p>
<ul>
<li>
BufferConsumer: consumes ByteBuffers. Something like <code>void
consume(ByteBuffer bb)</code> comes to mind. Perhaps even with
overloads to take a list or array of BBs.
</li>
<li>
TupleProducer: generates Tuples (often is a BufferConsumer). Something
like <code>void setConsumer(TupleConsumer consumer)</code>
comes to mind.
</li>
<li>
TupleConsumer: consumes Tuples generated by a TupleProducer.
Something like <code>void consume(Tuple tlv)</code> comes to mind.
</li>
<li>
MessageProducer: produces populated message stubs
</li>
</ul>
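<p>
Pulling the list items together, the interfaces might be declared as
below, with a trivial wiring example. These signatures are proposals
only, not an existing API.
</p>

```java
import java.nio.ByteBuffer;

// proposed interfaces from the list above
interface Tuple {}

interface BufferConsumer
{
    void consume( ByteBuffer bb );
}

interface TupleConsumer
{
    void consume( Tuple tlv );
}

interface TupleProducer extends BufferConsumer
{
    void setConsumer( TupleConsumer consumer );
}

// trivial demo wiring: this toy consumer just counts Tuples
class CountingConsumer implements TupleConsumer
{
    int count = 0;

    public void consume( Tuple tlv )
    {
        count++;
    }
}

// toy producer that emits one Tuple for every buffer it consumes
class OneTuplePerBufferProducer implements TupleProducer
{
    private TupleConsumer consumer;

    public void setConsumer( TupleConsumer consumer )
    {
        this.consumer = consumer;
    }

    public void consume( ByteBuffer bb )
    {
        consumer.consume( new Tuple() {} );
    }
}
```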
</subsection>
<subsection name="Possibly Merging TupleNode and Tuple">
<p>
Right now to build Tuple trees we use yet another class to wrap
Tuples called TupleNode. This kept the contents of the Tuple
class less congested. The monolithic Tuple class will no longer exist
so the congestion issue is no longer valid. The question now is: is it
worth keeping parent child methods in TupleNode when creating trees
while paying for extra object creation?
<p>
Note that the TupleNode methods are not required on Tuple to process
a byte stream of encoded TLV data in a SAX-like fashion. These
methods are only required for higher level operations like visitations
from visitors during the encoding process. The question really is
whether we will make Tuple impure to save a little time so we don't
have to create TupleNode objects to wrap Tuples and model the
hierarchy. This is something that needs to be discussed.
</p>
<p>
Contrary to the purist approach of keeping Tuple and TupleNode
separate, one can merge the two. A codec need not honor these methods
by building the tree, meaning these tree node (TupleNode) methods
may simply return null. If these methods are honored then it is the
intent of the codec to build a tree. If the tree is built the
processing is more like DOM and if not then it is more like SAX. We
should not tax the DOM-like processing use case by forcing the need
to create extra wrappers just to accommodate the purist view.
</p>
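<p>
A rough sketch of the merged idea: tree methods live directly on the
Tuple, and a SAX-like codec simply never populates them so they
return null. All names here are illustrative.
</p>

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative merged Tuple/TupleNode: tree accessors return null
// unless a DOM-like codec chose to build the tree.
class MergedTuple
{
    private MergedTuple parent;     // null in SAX-like processing
    private List children;          // null until a codec builds the tree

    MergedTuple getParent()
    {
        return parent;
    }

    List getChildren()
    {
        return children;
    }

    // only a DOM-like codec calls this while building the tree
    void addChild( MergedTuple child )
    {
        if ( children == null )
        {
            children = new ArrayList();
        }
        children.add( child );
        child.parent = this;
    }
}
```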
</subsection>
<subsection name="Removing the Digester Concept">
<p>
I don't know what I was thinking when I devised this rule based
approach similar to the Digester in commons. This was a big mistake
and IMO one of the reasons why we have performance issues. This
data structure can be removed entirely from the upper layers that
depend on it.
</p>
<p>
Granted this means we are going to have to weave once again our own
classes for handling LDAP specific PDUs, however I think this will be
easy to do. I will essentially rewrite the LDAP provider based on
our runtime to hardcode the switching rather than using this rule
based triggering approach. The new approach is also going to
simplify the code significantly, making it more maintainable.
Hopefully these changes will also speed up the code since fewer
objects will need to be created every time a decoder is instantiated.
</p>
</subsection>
<subsection name="It's Time For DER and CER">
<p>
We need to find a way to make the rules used while decoding and
encoding Tuples pluggable. This way we can change the rules to
encode as generic BER, or as reduced BER for performance gains
when a specific protocol needs only a subset. DER likewise is a
reduced set of BER with restrictions on the encoding and range of
values that can be interpreted from primitive values. If the
pluggability is there the runtime is a flexible TLV Tuple codec
that can change the rules used to handle the substrate.
</p>
<p>
We could easily have a BerDecoder, a CerDecoder and even protocol
specific decoders such as an LdapBerDecoder for those BER decoding
rules that only apply to LDAP.
</p>
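<p>
One hedged sketch of what such pluggability could look like: a common
decoder interface with rule-set specific implementations, where the
LDAP variant rejects the indefinite length form LDAP never uses. The
interface and method names are assumptions, not the project's actual
API.
</p>

```java
// Hypothetical pluggable rule-set interface for TLV decoders.
interface TlvDecoder
{
    /**
     * Whether this rule set accepts the given first octet of a
     * length (L) field.
     */
    boolean acceptsLength( int firstLengthOctet );
}

// full BER rule set: definite and indefinite forms are both legal
class BerDecoder implements TlvDecoder
{
    public boolean acceptsLength( int firstLengthOctet )
    {
        return true;
    }
}

// reduced rule set: 0x80 marks the indefinite form, which LDAP never uses
class LdapBerDecoder implements TlvDecoder
{
    public boolean acceptsLength( int firstLengthOctet )
    {
        return firstLengthOctet != 0x80;
    }
}
```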
</subsection>
</section>
</body>
</document>