 <!-- $Id$ -->
 <html>
  <head>
   <title>Xerces 2 | Crimson</title>
   <link rel='stylesheet' type='text/css' href='css/site.css'>
  </head>
  <body>
   <span class='netscape'>
   <a name='TOP'></a>
   <h1>Evaluation of Crimson Code</h1>
   <a name='Overview'></a>
   <h2>Overview</h2>
   <p>
    The Crimson code donated to the <a href='http://xml.apache.org/'>XML
    Apache Project</a> by <a href='http://www.sun.com/'>Sun Microsystems</a>
    is a relatively clean and straightforward implementation of a
    conforming <a href='http://www.w3.org/XML/'>XML</a> parser. However,
    there are some serious drawbacks to its design that hamper its use
    in the Xerces2 effort. This page highlights some of the problems
    that I see with the Crimson code. That doesn't mean, however, that
    there aren't good ideas in Crimson! I'll also highlight some of the
    things that I like about it.
   </p>
   <a name='TheGood'></a>
   <h2>The Good</h2>
   <p>
    <table border='0'>
     <tr>
      <th>Size:</th>
      <td>Crimson has a small code footprint.</td>
     </tr>
     <tr>
      <th>Simplicity:</th>
      <td>
       The code is very straightforward and easy to grok. I especially
       like the simple approach to reading the input streams. The advanced
       reader code in Xerces has been a continual source of bugs and
       developer confusion. See my <a href='xerces.html'>evaluation of
       Xerces</a> for more detail.
      </td>
     </tr>
    </table>
   </p>
   <a name='TheBad'></a>
   <h2>The Bad</h2>
   <p>
    <table border='0'>
     <tr>
      <th>Standards:</th>
      <td>
       Crimson lacks implementations of important standards, such as
       DOM Level 2 and XML Schema.
      </td>
     </tr>
     <tr>
      <th>Modularity:</th>
      <td>
       The design of Crimson is not modular enough to be of general
       use in a wide variety of applications. For example, the document
       and DTD scanning code is hard-coded into the parser. Also, a lot
       of the classes used by the parser rely on package visibility of
       members. (Yuck!)
      </td>
     </tr>
     <tr>
      <th>Validation:</th>
      <td>
       The validation engine is rather simplistic and not very fast.
       Plus, it doesn't seem to be able to handle the advanced
       validation requirements of XML Schema.
      </td>
     </tr>
     <tr>
      <th>Performance:</th>
      <td>
       The general performance of the Crimson code is good, but there
       are some areas where it can (and should) be tuned. First,
       validation is not as fast as it could be, although there are
       comments in the code that suggest "compiling" the content model
       into a deterministic finite automaton (DFA) for faster validation.
       Also, the DOM implementation wastes a lot of memory when
       traversing the document.
      </td>
     </tr>
    </table>
   </p>
   </span>
   <a name='BOTTOM'></a>
   <hr>
   <span class='netscape'>
    Author: Andy Clark <br>
    Last modified: $Date$
   </span>
  </body>
 </html>