design/architecture.html - xerces2-j - Git at Google

 <!-- $Id$ -->
 <html>
  <head>
   <title>Xerces 2 | Architecture</title>
   <link rel='stylesheet' type='text/css' href='css/site.css'>
   <link rel='stylesheet' type='text/css' href='css/diagram.css'>
   <style type='text/css'>
    .note { font-size: smaller }
    .pipeline { color: black; background: white;
                border-style: solid; border-color: black; border-width: 1;
                font-weight: normal  }
   </style>
  </head>
  <body>
   <span class='netscape'>
   <a name='TOP'></a>
   <h1>Xerces2 Architecture</h1>
   <h2>Table of Contents</h2>
   <p>
    <ul>
     <li><a href='#Overview'>Overview</a></li>
     <li><a href='#DocumentInformation'>Document Information</a></li>
     <li>
      <a href='#ParserConfiguration'>Parser Configuration</a>
      <ul>
       <li><a href='#Configuration.FeaturesAndProperties'>Features &amp; Properties</a></li>
       <li><a href='#Configuration.SettingsManagement'>Settings Management</a></li>
      </ul>
     </li>
    </ul>
   </p>
   <hr>
   <a name='Overview'></a>
   <h2>Overview</h2>
   <p>
    The Xerces Native Interface (XNI) is a framework for communicating
    a "streaming" document information set and constructing generic parser
    configurations. XNI is part of the Xerces2 development but it is
    important to note that the Xerces2 parser is just a standards compliant
    reference implementation of the Xerces Native Interface. Other parsers
    can be written that conform to XNI without conforming to any particular
    standards.
   </p>
   <a name='DocumentInformation'></a>
   <h2>Document Information</h2>
   <p>
    An XML parser can be viewed as a pipeline in which information flows
    from a scanner to a validator to the parser. In this pipeline, one
    component (the scanner) acts as a source of events; the final component
    (the parser) is the final target of the events; and any components
    between the source and target are known as filters. Filter components
    are both targets for the information sent by the previous component in
    the pipeline and sources for the information that the filter chooses to
    propagate to the next component in the pipeline. The following diagram
    illustrates the layout of the pipeline in this kind of parser.
   </p>
   <p>
    <table border='2' cellpadding='10' cellspacing='0'>
     <tr class='diagram'>
      <td>
       <table cellpadding='7' cellspacing='0'>
        <tr class='diagram'>
         <td class='diagram'>XML<br>Document</td>
         <td><img alt='--&gt;' src='images/arrow-right.gif'></td>
         <td class='component'>Scanner</td>
         <td><img alt='--&gt;' src='images/arrow-right.gif'></td>
         <td class='component'>Validator</td>
         <td><img alt='--&gt;' src='images/arrow-right.gif'></td>
         <td class='component'>Parser</td>
         <td><img alt='--&gt;' src='images/arrow-right.gif'></td>
         <td class='diagram'>Application<br>API</td>
        </tr>
       </table>
      </td>
     </tr>
    </table>
   </p>
   <p>
    Parsing of DTDs can also be viewed as a pipeline. Since the
    DTD is referenced in the document instance by XML syntax
    (the DOCTYPE declaration), the DTD pipeline is triggered by
    the document scanner. This contrasts with XML Schema because
    there is no XML syntax that associates a Schema grammar with
    a document; a special attribute in the document instance is
    used as a <em>hint</em> to the location of the grammar. The
    following diagram illustrates the layout of the DTD pipeline.
   </p>
   <p>
    <table border='2' cellpadding='10' cellspacing='0'>
     <tr class='diagram'>
      <td>
       <table cellpadding='7' cellspacing='0'>
        <tr class='diagram'>
         <td class='diagram'>DTD<br>Document</td>
         <td><img alt='--&gt;' src='images/arrow-right.gif'></td>
         <td class='component'>DTD<br>Scanner</td>
         <td><img alt='--&gt;' src='images/arrow-right.gif'></td>
         <td class='component' rowspan='3'>Validator</td>
         <td><img alt='--&gt;' src='images/arrow-right.gif'></td>
         <td class='component'>Parser</td>
         <td><img alt='--&gt;' src='images/arrow-right.gif'></td>
         <td class='diagram'>Application<br>API</td>
        </tr>
        <tr><td>&nbsp;</td></tr>
        <tr class='diagram'>
         <td></td>
         <td></td>
         <td></td>
         <td></td>
         <td><img alt='--&gt;' src='images/arrow-right.gif'></td>
         <td class='component'>DTD<br>Grammar</td>
        </tr>
       </table>
      </td>
     </tr>
    </table>
   </p>
   <p>
    Note that the DTD scanner communicates directly with the validator.
    The validator receives the callbacks from the DTD scanner in order
    to create and populate the DTD grammar object. In this way, the
    validator acts as a "tee", propogating the DTD events to both
    the next stage in the pipeline and the DTD grammar object. This
    allows the validation stage in the pipeline to be completely
    removed from the parser configuration, if needed.
   </p>
   <p>
    The XML document information is defined by the
    <code><a href='design.html#XMLDocumentHandler'>XMLDocumentHandler</a></code>
    interface and the DTD information is defined by the
    <code><a href='design.html#XMLDTDHandler'>XMLDTDHandler</a></code>
    and
    <code><a href='design.html#XMLDTDContentModelHandler'>XMLDTDContentModelHandler</a></code>
    interfaces.
    (Note: As of 10 Apr 2001, the DTD interfaces are subject to change
     based on user feedback.)
    This set of interfaces and supporting interfaces and classes
    comprise the XNI Core. However, whereas the XNI Core defines what
    information document and DTD is communicated but does not define
    the semantics for configuring the parser pipeline.
   </p>

   <a name='ParserConfiguration'></a>
   <h2>Parser Configuration</h2>
   <p>
    In the XNI world, a parser object used by an application is merely an
    API generator (e.g. building DOM trees or calling SAX handlers). The
    components and configuration information for that parser is defined
    within a parser configuration object. With this approach, different
    parser configurations can be used with the existing parser instances
    without duplicating code.
   </p>
   <p>
    The parser configuration object, defined by the
    <code><a href='design.html#XMLParserConfiguration'>XMLParserConfiguration</a></code>
    interface, that is used by the application is comprised of a series of
    components. The parser configuration assembles the parsing pipeline
    components, transmits settings to each component, and controls their
    actions. The following diagram shows a general parser configuration
    and its components. (No ordering or direct connection between
    components should be implied.)
   </p>
   <p>
    <table border='2' cellspacing='0' cellpadding='7'>
     <tr class='diagram'>
      <td>
       <table border='0' cellspacing='5' cellpadding='5'>
        <tr align='center' valign='middle'>
         <th class='manager' colspan='9'>Parser Configuration</th>
        </tr>
        <tr align='center' valign='middle'>
         <td class='non-config-component'>Symbol<br>Table</td>
         <td class='non-config-component'>Grammar<br>Pool</td>
         <td class='non-config-component'>Datatype<br>Validator<br>Factory</td>
         <td class='config-component'>Error<br>Reporter</td>
         <td class='config-component'>Entity<br>Manager</td>
         <td class='config-component'>Document<br>Scanner</td>
         <td class='config-component'>DTD<br>Scanner</td>
         <td class='config-component'>Validator</td>
        </tr>
       </table>
      </td>
     </tr>
    </table>
   </p>
   <p>
    The workings of the parser configuration object are unknown to
    the parser. The parser is only able to set features and properties
    on the configuration, set the XNI handlers to receive the document
    information, and initiate a parse. Typically the parser object
    itself will be registered as the target of XNI events produced
    from the parser configuration when a document is parsed, but it
    doesn't have to be. The following diagram illustrates this
    situation.
   </p>
   <p>
    <table border='2' cellspacing='0' cellpadding='7'>
     <tr class='diagram'>
      <td>
       <table border='0' cellspacing='5' cellpadding='5'>
        <tr align='center' valign='top'>
         <th class='parser' colspan='9' rowspan='2'>
          Parser
          <table border='0' cellspacing='5' cellpadding='5'>
           <th class='pipeline'>
            <em>Parser Configuration Pipeline</em>
            <table border='0' cellspacing='0' cellpadding='5'>
             <tr>
              <td class='config-component'>Scanner</td>
              <td valign='center'><img alt='--&gt;' src='images/arrow-right.gif'></td>
              <td class='config-component'>Validator</td>
              <td valign='center'><img alt='--&gt;' src='images/arrow-right.gif'></td>
             </tr>
            </table>
           </th>
          </table>
         </th>
         <td valign='center'><img alt='--&gt;' src='images/arrow-right.gif'></td>
         <th class='parser'>DOM<br>Parser</th>
        </tr>
        <tr align='center' valign='top'>
         <td valign='center'><img alt='--&gt;' src='images/arrow-right.gif'></td>
         <th class='parser'>SAX<br>Parser</th>
        </tr>
       </table>
      </td>
     </tr>
    </table>

   <a name='Configuration.FeaturesAndProperties'></a>
   <h3>Features &amp; Properties</h3>
   <p>
    Features and properties are provided via the extensible mechanism
    found in SAX2. Features are boolean settings on the parser
    configuration while properties are object settings. There are a
    number of SAX2 core features and properties but XNI parser components
    are free to define new ones. All of the features and properties are
    managed by the parser configuration, though.
   </p>
   <p>
    <em>TODO:</em> Expand on how features and properties are set, when,
    and by who.
   </p>

   <a name='Configuration.SettingsManagement'></a>
   <h3>Settings Management</h3>
   <p>
    The parser configuration implements the
    <code><a href='design.html#XMLComponentManager'>XMLComponentManager</a></code>
    interface and each component implements the
    <code><a href='design.html#XMLComponent'>XMLComponent</a></code>
    interface. For this configuration system to work, the parser
    configuration must adhere to the following guidelines:
    <span class='netscape'>
    <ul>
     <li>
      Before each parse, the parser configuration <strong>must</strong>
      call the <code>reset</code> method on each configurable component.
      This call allows each component to query the state of only
      those features and properties that are important to the operation
      of the component.
     </li>
     <li>
      Any time that the application sets a feature or property on the
      parser <em>during a parse</em>, the parser configuration
      <strong>must</strong> pass those settings to each configurable
      component. This is important because configuration settings can
      change while parsing an XML document and those settings may
      directly affect the operation of components. But this does
      <em>not</em> need to be done before or after a parse because
      each component will query settings during the call to
      <code>reset</code>.
     </li>
    </ul>
    </span>
   </p>

   </span>
   <a name='BOTTOM'></a>
   <hr>
   <span class='netscape'>
    Author: Andy Clark <br>
    Last modified: $Date$
   </span>
  </body>
 </html>
	<!-- $Id$ -->
	<html>
	<head>
	<title>Xerces 2 \| Architecture</title>
	<link rel='stylesheet' type='text/css' href='css/site.css'>
	<link rel='stylesheet' type='text/css' href='css/diagram.css'>
	<style type='text/css'>
	.note { font-size: smaller }
	.pipeline { color: black; background: white;
	border-style: solid; border-color: black; border-width: 1;
	font-weight: normal }
	</style>
	</head>
	<body>
	<span class='netscape'>
	<a name='TOP'></a>
	<h1>Xerces2 Architecture</h1>
	<h2>Table of Contents</h2>
	<p>
	<ul>
	<li><a href='#Overview'>Overview</a></li>
	<li><a href='#DocumentInformation'>Document Information</a></li>
	<li>
	<a href='#ParserConfiguration'>Parser Configuration</a>
	<ul>
	<li><a href='#Configuration.FeaturesAndProperties'>Features & Properties</a></li>
	<li><a href='#Configuration.SettingsManagement'>Settings Management</a></li>
	</ul>
	</li>
	</ul>
	</p>
	<hr>
	<a name='Overview'></a>
	<h2>Overview</h2>
	<p>
	The Xerces Native Interface (XNI) is a framework for communicating
	a "streaming" document information set and constructing generic parser
	configurations. XNI is part of the Xerces2 development but it is
	important to note that the Xerces2 parser is just a standards compliant
	reference implementation of the Xerces Native Interface. Other parsers
	can be written that conform to XNI without conforming to any particular
	standards.
	</p>
	<a name='DocumentInformation'></a>
	<h2>Document Information</h2>
	<p>
	An XML parser can be viewed as a pipeline in which information flows
	from a scanner to a validator to the parser. In this pipeline, one
	component (the scanner) acts as a source of events; the final component
	(the parser) is the final target of the events; and any components
	between the source and target are known as filters. Filter components
	are both targets for the information sent by the previous component in
	the pipeline and sources for the information that the filter chooses to
	propagate to the next component in the pipeline. The following diagram
	illustrates the layout of the pipeline in this kind of parser.
	</p>
	<p>
	<table border='2' cellpadding='10' cellspacing='0'>
	<tr class='diagram'>
	<td>
	<table cellpadding='7' cellspacing='0'>
	<tr class='diagram'>
	<td class='diagram'>XML<br>Document</td>
	<td><img alt='-->' src='images/arrow-right.gif'></td>
	<td class='component'>Scanner</td>
	<td><img alt='-->' src='images/arrow-right.gif'></td>
	<td class='component'>Validator</td>
	<td><img alt='-->' src='images/arrow-right.gif'></td>
	<td class='component'>Parser</td>
	<td><img alt='-->' src='images/arrow-right.gif'></td>
	<td class='diagram'>Application<br>API</td>
	</tr>
	</table>
	</td>
	</tr>
	</table>
	</p>
	<p>
	Parsing of DTDs can also be viewed as a pipeline. Since the
	DTD is referenced in the document instance by XML syntax
	(the DOCTYPE declaration), the DTD pipeline is triggered by
	the document scanner. This contrasts with XML Schema because
	there is no XML syntax that associates a Schema grammar with
	a document; a special attribute in the document instance is
	used as a <em>hint</em> to the location of the grammar. The
	following diagram illustrates the layout of the DTD pipeline.
	</p>
	<p>
	<table border='2' cellpadding='10' cellspacing='0'>
	<tr class='diagram'>
	<td>
	<table cellpadding='7' cellspacing='0'>
	<tr class='diagram'>
	<td class='diagram'>DTD<br>Document</td>
	<td><img alt='-->' src='images/arrow-right.gif'></td>
	<td class='component'>DTD<br>Scanner</td>
	<td><img alt='-->' src='images/arrow-right.gif'></td>
	<td class='component' rowspan='3'>Validator</td>
	<td><img alt='-->' src='images/arrow-right.gif'></td>
	<td class='component'>Parser</td>
	<td><img alt='-->' src='images/arrow-right.gif'></td>
	<td class='diagram'>Application<br>API</td>
	</tr>
	<tr><td> </td></tr>
	<tr class='diagram'>
	<td></td>
	<td></td>
	<td></td>
	<td></td>
	<td><img alt='-->' src='images/arrow-right.gif'></td>
	<td class='component'>DTD<br>Grammar</td>
	</tr>
	</table>
	</td>
	</tr>
	</table>
	</p>
	<p>
	Note that the DTD scanner communicates directly with the validator.
	The validator receives the callbacks from the DTD scanner in order
	to create and populate the DTD grammar object. In this way, the
	validator acts as a "tee", propogating the DTD events to both
	the next stage in the pipeline and the DTD grammar object. This
	allows the validation stage in the pipeline to be completely
	removed from the parser configuration, if needed.
	</p>
	<p>
	The XML document information is defined by the
	<code><a href='design.html#XMLDocumentHandler'>XMLDocumentHandler</a></code>
	interface and the DTD information is defined by the
	<code><a href='design.html#XMLDTDHandler'>XMLDTDHandler</a></code>
	and
	<code><a href='design.html#XMLDTDContentModelHandler'>XMLDTDContentModelHandler</a></code>
	interfaces.
	(Note: As of 10 Apr 2001, the DTD interfaces are subject to change
	based on user feedback.)
	This set of interfaces and supporting interfaces and classes
	comprise the XNI Core. However, whereas the XNI Core defines what
	information document and DTD is communicated but does not define
	the semantics for configuring the parser pipeline.
	</p>

	<a name='ParserConfiguration'></a>
	<h2>Parser Configuration</h2>
	<p>
	In the XNI world, a parser object used by an application is merely an
	API generator (e.g. building DOM trees or calling SAX handlers). The
	components and configuration information for that parser is defined
	within a parser configuration object. With this approach, different
	parser configurations can be used with the existing parser instances
	without duplicating code.
	</p>
	<p>
	The parser configuration object, defined by the
	<code><a href='design.html#XMLParserConfiguration'>XMLParserConfiguration</a></code>
	interface, that is used by the application is comprised of a series of
	components. The parser configuration assembles the parsing pipeline
	components, transmits settings to each component, and controls their
	actions. The following diagram shows a general parser configuration
	and its components. (No ordering or direct connection between
	components should be implied.)
	</p>
	<p>
	<table border='2' cellspacing='0' cellpadding='7'>
	<tr class='diagram'>
	<td>
	<table border='0' cellspacing='5' cellpadding='5'>
	<tr align='center' valign='middle'>
	<th class='manager' colspan='9'>Parser Configuration</th>
	</tr>
	<tr align='center' valign='middle'>
	<td class='non-config-component'>Symbol<br>Table</td>
	<td class='non-config-component'>Grammar<br>Pool</td>
	<td class='non-config-component'>Datatype<br>Validator<br>Factory</td>
	<td class='config-component'>Error<br>Reporter</td>
	<td class='config-component'>Entity<br>Manager</td>
	<td class='config-component'>Document<br>Scanner</td>
	<td class='config-component'>DTD<br>Scanner</td>
	<td class='config-component'>Validator</td>
	</tr>
	</table>
	</td>
	</tr>
	</table>
	</p>
	<p>
	The workings of the parser configuration object are unknown to
	the parser. The parser is only able to set features and properties
	on the configuration, set the XNI handlers to receive the document
	information, and initiate a parse. Typically the parser object
	itself will be registered as the target of XNI events produced
	from the parser configuration when a document is parsed, but it
	doesn't have to be. The following diagram illustrates this
	situation.
	</p>
	<p>
	<table border='2' cellspacing='0' cellpadding='7'>
	<tr class='diagram'>
	<td>
	<table border='0' cellspacing='5' cellpadding='5'>
	<tr align='center' valign='top'>
	<th class='parser' colspan='9' rowspan='2'>
	Parser
	<table border='0' cellspacing='5' cellpadding='5'>
	<th class='pipeline'>
	<em>Parser Configuration Pipeline</em>
	<table border='0' cellspacing='0' cellpadding='5'>
	<tr>
	<td class='config-component'>Scanner</td>
	<td valign='center'><img alt='-->' src='images/arrow-right.gif'></td>
	<td class='config-component'>Validator</td>
	<td valign='center'><img alt='-->' src='images/arrow-right.gif'></td>
	</tr>
	</table>
	</th>
	</table>
	</th>
	<td valign='center'><img alt='-->' src='images/arrow-right.gif'></td>
	<th class='parser'>DOM<br>Parser</th>
	</tr>
	<tr align='center' valign='top'>
	<td valign='center'><img alt='-->' src='images/arrow-right.gif'></td>
	<th class='parser'>SAX<br>Parser</th>
	</tr>
	</table>
	</td>
	</tr>
	</table>

	<a name='Configuration.FeaturesAndProperties'></a>
	<h3>Features & Properties</h3>
	<p>
	Features and properties are provided via the extensible mechanism
	found in SAX2. Features are boolean settings on the parser
	configuration while properties are object settings. There are a
	number of SAX2 core features and properties but XNI parser components
	are free to define new ones. All of the features and properties are
	managed by the parser configuration, though.
	</p>
	<p>
	<em>TODO:</em> Expand on how features and properties are set, when,
	and by who.
	</p>

	<a name='Configuration.SettingsManagement'></a>
	<h3>Settings Management</h3>
	<p>
	The parser configuration implements the
	<code><a href='design.html#XMLComponentManager'>XMLComponentManager</a></code>
	interface and each component implements the
	<code><a href='design.html#XMLComponent'>XMLComponent</a></code>
	interface. For this configuration system to work, the parser
	configuration must adhere to the following guidelines:
	<span class='netscape'>
	<ul>
	<li>
	Before each parse, the parser configuration <strong>must</strong>
	call the <code>reset</code> method on each configurable component.
	This call allows each component to query the state of only
	those features and properties that are important to the operation
	of the component.
	</li>
	<li>
	Any time that the application sets a feature or property on the
	parser <em>during a parse</em>, the parser configuration
	<strong>must</strong> pass those settings to each configurable
	component. This is important because configuration settings can
	change while parsing an XML document and those settings may
	directly affect the operation of components. But this does
	<em>not</em> need to be done before or after a parse because
	each component will query settings during the call to
	<code>reset</code>.
	</li>
	</ul>
	</span>
	</p>

	</span>
	<a name='BOTTOM'></a>
	<hr>
	<span class='netscape'>
	Author: Andy Clark <br>
	Last modified: $Date$
	</span>
	</body>
	</html>