 <!-- $Id$ -->
 <html>
  <head>
   <title>Xerces 2 | Xerces</title>
   <link rel='stylesheet' type='text/css' href='css/site.css'>
  </head>
  <body>
   <span class='netscape'>
   <a name='TOP'></a>
   <h1>Evaluation of Xerces Code</h1>
   <a name='Overview'></a>
   <h2>Overview</h2>
   <p>
    Historically, the <a href='http://xml.apache.org/xerces-j/'>Xerces</a>
    code was developed to be the fastest
    <a href='http://www.w3.org/XML/'>XML</a> parser on the planet.
    This goal drove the design decisions and caused the parser to be
    written from the inside out. While this produced an extremely
    fast XML parser with DTD validation, the overall design
    suffered. Xerces developers have found the code difficult to
    understand, debug, and extend. Hence the Xerces 2 effort.
   </p>
   <a name='TheGood'></a>
   <h2>The Good</h2>
   <p>
    <table border='0'>
     <tr>
      <th>Standards:</th>
      <td>
       The Xerces parser is an extremely complete XML parser. Besides
       conforming to the XML and Namespace specifications, it offers
       support for SAX 1 and 2; DOM Level 1 and 2; and most of
       the latest working draft of XML Schema.
      </td>
     </tr>
     <tr>
      <th>Modularity:</th>
      <td>
       The current Xerces code made a decent attempt at modularity by
       defining a set of interfaces between components of the parser,
       such as the scanner and validator. The parser is designed as
       a pipeline of components. This is a good idea, but its implementation
       was complicated by performance considerations and feature creep.
      </td>
     </tr>
     <tr>
      <th>Validation:</th>
      <td>
       Xerces is able to validate documents with grammars specified
       in DTD and XML Schema syntax. All validation is performed by
       a universal validator that can validate the union of features
       found in both syntaxes. This enables the parser to handle
       current and future grammars in a consistent way.
      </td>
     </tr>
     <tr>
      <th>Performance:</th>
      <td>
       The Xerces parser has always performed well. Implementation of
       XML Schema has caused the performance to slip but this is to
       be expected -- you can't do a lot more work per element without
       incurring a performance penalty.
      </td>
     </tr>
    </table>
   </p>
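   <p>
    The pipeline idea described under Modularity can be sketched roughly
    as follows. This is a minimal illustration only: the names
    <code>DocumentHandler</code>, <code>ToyScanner</code>,
    <code>Validator</code>, and <code>Recorder</code> are hypothetical and
    are not Xerces interfaces, and the scanner replays a fixed event
    sequence instead of reading real XML.
   </p>

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of a parser pipeline; none of these types are Xerces classes.
class PipelineSketch {

    /** Events passed from one pipeline stage to the next. */
    interface DocumentHandler {
        void startElement(String name);
        void characters(String text);
        void endElement(String name);
    }

    /** Toy scanner stage: emits a fixed event sequence rather than scanning XML. */
    static final class ToyScanner {
        private DocumentHandler next;

        void setNext(DocumentHandler next) { this.next = next; }

        void scan() {
            next.startElement("greeting");
            next.characters("hello");
            next.endElement("greeting");
        }
    }

    /** Toy validator stage: checks element names, records errors, forwards every event. */
    static final class Validator implements DocumentHandler {
        private final Set<String> allowed;
        private DocumentHandler next;
        final List<String> errors = new ArrayList<>();

        Validator(Set<String> allowed) { this.allowed = allowed; }

        void setNext(DocumentHandler next) { this.next = next; }

        @Override public void startElement(String name) {
            if (!allowed.contains(name)) {
                errors.add("undeclared element: " + name);
            }
            next.startElement(name);
        }

        @Override public void characters(String text) { next.characters(text); }

        @Override public void endElement(String name) { next.endElement(name); }
    }

    /** Final stage: stands in for the application and records what it receives. */
    static final class Recorder implements DocumentHandler {
        final List<String> events = new ArrayList<>();

        @Override public void startElement(String name) { events.add("start:" + name); }

        @Override public void characters(String text) { events.add("text:" + text); }

        @Override public void endElement(String name) { events.add("end:" + name); }
    }

    public static void main(String[] args) {
        ToyScanner scanner = new ToyScanner();
        Validator validator = new Validator(Collections.singleton("greeting"));
        Recorder recorder = new Recorder();

        // Wire the stages: scanner -> validator -> recorder.
        scanner.setNext(validator);
        validator.setNext(recorder);
        scanner.scan();

        System.out.println(recorder.events);
        System.out.println(validator.errors);
    }
}
```

   <p>
    Because each stage sees only the handler interface, a stage such as the
    validator could in principle be removed or replaced without touching
    the scanner.
   </p>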
   <a name='TheBad'></a>
   <h2>The Bad</h2>
   <p>
    <table border='0'>
     <tr>
      <th>Size:</th>
      <td>
       The parser is too big, but not all of the size comes from the code
       required to parse XML files. A lot of contributed features
       have been rolled into the Xerces jar file, for example
       HTML and WML DOM implementations and document serializers.
       It would be nice to find a way to package these features into
       separately distributable jars.
      </td>
     </tr>
     <tr>
      <th>Simplicity:</th>
      <td>
       The code needs to be simplified. Much of the complexity of the
       Xerces parser lies in the entity readers and in the use
       of the string pool throughout the system.
      </td>
     </tr>
     <tr>
      <th>Documentation:</th>
      <td>
       There is little to no documentation of the Xerces code, and
       the javadoc comments are frequently missing or incorrect.
       More effort must be made in Xerces 2 to ensure
       that everything is well documented.
      </td>
     </tr>
    </table>
   </p>
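   <p>
    To make the string pool remark under Simplicity more concrete, here is
    a hypothetical sketch, assuming a pool that hands out integer handles
    in place of strings. The class <code>StringPoolSketch</code> and its
    methods <code>addSymbol</code> and <code>lookup</code> are invented for
    illustration and are not the Xerces implementation.
   </p>

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of a string pool; not the Xerces string pool class.
class StringPoolSketch {
    private final Map<String, Integer> handles = new HashMap<>();
    private final List<String> symbols = new ArrayList<>();

    /** Returns the handle for s, adding s to the pool the first time it is seen. */
    int addSymbol(String s) {
        Integer handle = handles.get(s);
        if (handle == null) {
            handle = symbols.size();
            symbols.add(s);
            handles.put(s, handle);
        }
        return handle;
    }

    /** Converts a handle back to the string it stands for. */
    String lookup(int handle) {
        return symbols.get(handle);
    }

    public static void main(String[] args) {
        StringPoolSketch pool = new StringPoolSketch();
        int title = pool.addSymbol("title");
        int again = pool.addSymbol("title");

        // Equal names share one handle, so comparing names is an int comparison.
        System.out.println(title == again);
        System.out.println(pool.lookup(title));
    }
}
```

   <p>
    In a design like this, any component that receives a handle needs
    access to the same pool to interpret it, which is one plausible way
    the pool ends up threaded through the whole system.
   </p>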
   </span>
   <a name='BOTTOM'></a>
   <hr>
   <span class='netscape'>
    Last modified: $Date$
   </span>
  </body>
 </html>