docs/xni.xml - xerces2-j - Git at Google

 <?xml version='1.0' encoding='UTF-8'?>
 <!--
  * Licensed to the Apache Software Foundation (ASF) under one or more
  * contributor license agreements.  See the NOTICE file distributed with
  * this work for additional information regarding copyright ownership.
  * The ASF licenses this file to You under the Apache License, Version 2.0
  * (the "License"); you may not use this file except in compliance with
  * the License.  You may obtain a copy of the License at
  *
  *      http://www.apache.org/licenses/LICENSE-2.0
  *
  * Unless required by applicable law or agreed to in writing, software
  * distributed under the License is distributed on an "AS IS" BASIS,
  * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  * See the License for the specific language governing permissions and
  * limitations under the License.
 -->
 <!DOCTYPE s1 SYSTEM 'dtd/document.dtd'>
 <s1 title='Xerces Native Interface'>
  <s2 title='Overview'>
   <p>
    The Xerces Native Interface (XNI) is a framework for communicating
    a "streaming" document information set and constructing generic
    parser configurations. XNI is part of the Xerces2 development but
    the Xerces2 parser is just a standards compliant reference
    implementation of the Xerces Native Interface. Other parsers can be
    written that conform to XNI without conforming to any particular
    standards or using any code from the reference implementation.
   </p>
   <p>
    The Xerces Native Interface is used to implement the Xerces2 parser
    from a set of modular components in a standard configuration. This
    configuration is then used to drive the DOM and SAX parser
    implementations provided with Xerces2. However, XNI is merely an
    <em>internal</em> set of interfaces. There is no need for an XML
    application programmer to learn XNI if they only intend to interface
    to the Xerces2 parser using standard interfaces like JAXP, DOM, and
    SAX. Xerces developers and application developers that need more
    power and flexibility than that provided by the standard interfaces
    should read and understand XNI.
   </p>
   <p>Overview information:</p>
   <ul>
    <li>
     <jump href='#streaming-info-set'>"Streaming" Information
     Set</jump>
    </li>
    <li>
     <jump href='#generic-parser-configurations'>Generic Parser
     Configurations</jump>
    </li>
   </ul>
   <p>Design and implementation information:</p>
   <ul>
    <li><link idref='xni-design'>Design Details</link></li>
    <li><link idref='xni-core'>Core Interfaces</link></li>
    <li><link idref='xni-config'>Parser Configuration</link></li>
    <li><link idref='xni-xerces2'>Xerces2 Parser Components</link></li>
   </ul>
  </s2>
  <anchor name='streaming-info-set'/>
  <s2 title='"Streaming" Information Set'>
   <p>
    What is meant by a "streaming" information set? Quite simply,
    the streaming information set is the document information that can
    be communicated by parsing the document in a serial manner. In
    other words, it is the information received as-you-see-it. An XNI
    parser provides this streaming info set to a registered document
    handler. The XNI document handler is similar to the standard
    SAX <code>ContentHandler</code> interface but is different in
    several important ways:
   </p>
   <ul>
    <li>
     XNI attempts to provide lossless communication of the streaming
     information set. Therefore, XNI passes the encodings of external
     parsed entities and other information that is lost when using SAX.
    </li>
    <li>
     The XNI document handler interface is also designed to build a
     pipeline of parser components where the streaming information set
     can be fully modified and augmented by each stage in the pipeline.
     SAX, however, is primarily a read-only set of interfaces.
    </li>
   </ul>
   <p>
    The Xerces Native Interface breaks the document's streaming
    information set into several more manageable interfaces:
   </p>
   <table>
    <tr><th>Interface</th><th>Description</th></tr>
    <tr>
     <td><code>XMLDocumentHandler</code></td>
     <td>Communicates document structure and content information.</td>
    </tr>
    <tr>
     <td><code>XMLDTDHandler</code></td>
     <td>
      Communicates basic DTD information such as element and attribute
      declarations.
     </td>
    </tr>
    <tr>
     <td><code>XMLDTDContentModelHandler</code></td>
     <td>
      Breaks down each element declaration's content model into a
      set of separate methods so that handlers don't have to reparse
      the content model string given in the
      <code>XMLDTDHandler#elementDecl(String,String)</code> method.
      This separation also helps those applications that want to
      know boundaries of entities when used as part of an element's
      content model.
     </td>
    </tr>
   </table>
   <p>
    And an additional handler is provided for convenience in defining
    document fragments:
   </p>
   <table>
    <tr><th>Interface</th><th>Description</th></tr>
    <tr>
     <td><code>XMLDocumentFragmentHandler</code></td>
     <td>Communicates information about a document fragment.</td>
    </tr>
   </table>
   <p>
    For complete details of the Xerces Native Interface, refer to
    the <link idref='xni-core'>Core Interfaces</link> documentation.
   </p>
  </s2>
  <anchor name='generic-parser-configurations'/>
  <s2 title='Generic Parser Configurations'>
   <p>
    The Xerces Native Interface document handler interfaces define a
    document's streaming information set but XNI also contains a set
    of interfaces that define parser components and configurations.
    These interfaces provide a framework for a library of parser parts
    that can be used interchangeably or completely replaced at the
    programmer's option. This framework allows an unparalleled level
    of configuration and implementation choices to implement XML
    applications.
   </p>
   <p>
    The following list details some possible examples of parsers and
    configurations that can be written using the XNI parser configuration
    framework:
   </p>
   <ul>
    <li>
     <strong>HTML Parser</strong><br/>
     An HTML scanner can be written that breaks an HTML document into
     a series of XNI callbacks. Using a configuration that swaps the
     default XML scanner with the HTML scanner, you can create DOM and
     SAX parsers for HTML documents.
    </li>
    <li>
     <strong>Optimized Parser</strong><br/>
     For improved XML performance, a minimal XML scanner can be written
     and swapped for the default, fully compliant XML scanner. In
     addition, the validator component can be removed from the parser
     pipeline to reduce the amount of work required to parse XML
     documents.
    </li>
    <li>
     <strong>XInclude Processor</strong><br/>
     An XNI parser component can be written to handle XInclude by
     analyzing the streaming information set and automatically
     inserting the contents of referenced links into the event stream.
     By adding this component to the parser pipeline before the
     validator, included content would appear transparent to the
     validator as if that content was in the original document.
    </li>
   </ul>
   <p>
    This is just a small sample of what is possible when using the
    XNI parser configuration framework. For complete details of the
    XNI parser configurations, refer to the
    <link idref='xni-config'>Parser Configuration</link>
    documentation.
   </p>
  </s2>
 </s1>
	<?xml version='1.0' encoding='UTF-8'?>
	<!--
	* Licensed to the Apache Software Foundation (ASF) under one or more
	* contributor license agreements. See the NOTICE file distributed with
	* this work for additional information regarding copyright ownership.
	* The ASF licenses this file to You under the Apache License, Version 2.0
	* (the "License"); you may not use this file except in compliance with
	* the License. You may obtain a copy of the License at
	*
	* http://www.apache.org/licenses/LICENSE-2.0
	*
	* Unless required by applicable law or agreed to in writing, software
	* distributed under the License is distributed on an "AS IS" BASIS,
	* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
	* See the License for the specific language governing permissions and
	* limitations under the License.
	-->
	<!DOCTYPE s1 SYSTEM 'dtd/document.dtd'>
	<s1 title='Xerces Native Interface'>
	<s2 title='Overview'>
	<p>
	The Xerces Native Interface (XNI) is a framework for communicating
	a "streaming" document information set and constructing generic
	parser configurations. XNI is part of the Xerces2 development but
	the Xerces2 parser is just a standards compliant reference
	implementation of the Xerces Native Interface. Other parsers can be
	written that conform to XNI without conforming to any particular
	standards or using any code from the reference implementation.
	</p>
	<p>
	The Xerces Native Interface is used to implement the Xerces2 parser
	from a set of modular components in a standard configuration. This
	configuration is then used to drive the DOM and SAX parser
	implementations provided with Xerces2. However, XNI is merely an
	<em>internal</em> set of interfaces. There is no need for an XML
	application programmer to learn XNI if they only intend to interface
	to the Xerces2 parser using standard interfaces like JAXP, DOM, and
	SAX. Xerces developers and application developers that need more
	power and flexibility than that provided by the standard interfaces
	should read and understand XNI.
	</p>
	<p>Overview information:</p>
	<ul>
	<li>
	<jump href='#streaming-info-set'>"Streaming" Information
	Set</jump>
	</li>
	<li>
	<jump href='#generic-parser-configurations'>Generic Parser
	Configurations</jump>
	</li>
	</ul>
	<p>Design and implementation information:</p>
	<ul>
	<li><link idref='xni-design'>Design Details</link></li>
	<li><link idref='xni-core'>Core Interfaces</link></li>
	<li><link idref='xni-config'>Parser Configuration</link></li>
	<li><link idref='xni-xerces2'>Xerces2 Parser Components</link></li>
	</ul>
	</s2>
	<anchor name='streaming-info-set'/>
	<s2 title='"Streaming" Information Set'>
	<p>
	What is meant by a "streaming" information set? Quite simply,
	the streaming information set is the document information that can
	be communicated by parsing the document in a serial manner. In
	other words, it is the information received as-you-see-it. An XNI
	parser provides this streaming info set to a registered document
	handler. The XNI document handler is similar to the standard
	SAX <code>ContentHandler</code> interface but is different in
	several important ways:
	</p>
	<ul>
	<li>
	XNI attempts to provide lossless communication of the streaming
	information set. Therefore, XNI passes the encodings of external
	parsed entities and other information that is lost when using SAX.
	</li>
	<li>
	The XNI document handler interface is also designed to build a
	pipeline of parser components where the streaming information set
	can be fully modified and augmented by each stage in the pipeline.
	SAX, however, is primarily a read-only set of interfaces.
	</li>
	</ul>
	<p>
	The Xerces Native Interface breaks the document's streaming
	information set into several more manageable interfaces:
	</p>
	<table>
	<tr><th>Interface</th><th>Description</th></tr>
	<tr>
	<td><code>XMLDocumentHandler</code></td>
	<td>Communicates document structure and content information.</td>
	</tr>
	<tr>
	<td><code>XMLDTDHandler</code></td>
	<td>
	Communicates basic DTD information such as element and attribute
	declarations.
	</td>
	</tr>
	<tr>
	<td><code>XMLDTDContentModelHandler</code></td>
	<td>
	Breaks down each element declaration's content model into a
	set of separate methods so that handlers don't have to reparse
	the content model string given in the
	<code>XMLDTDHandler#elementDecl(String,String)</code> method.
	This separation also helps those applications that want to
	know boundaries of entities when used as part of an element's
	content model.
	</td>
	</tr>
	</table>
	<p>
	And an additional handler is provided for convenience in defining
	document fragments:
	</p>
	<table>
	<tr><th>Interface</th><th>Description</th></tr>
	<tr>
	<td><code>XMLDocumentFragmentHandler</code></td>
	<td>Communicates information about a document fragment.</td>
	</tr>
	</table>
	<p>
	For complete details of the Xerces Native Interface, refer to
	the <link idref='xni-core'>Core Interfaces</link> documentation.
	</p>
	</s2>
	<anchor name='generic-parser-configurations'/>
	<s2 title='Generic Parser Configurations'>
	<p>
	The Xerces Native Interface document handler interfaces define a
	document's streaming information set but XNI also contains a set
	of interfaces that define parser components and configurations.
	These interfaces provide a framework for a library of parser parts
	that can be used interchangeably or completely replaced at the
	programmer's option. This framework allows an unparalleled level
	of configuration and implementation choices to implement XML
	applications.
	</p>
	<p>
	The following list details some possible examples of parsers and
	configurations that can be written using the XNI parser configuration
	framework:
	</p>
	<ul>
	<li>
	<strong>HTML Parser</strong><br/>
	An HTML scanner can be written that breaks an HTML document into
	a series of XNI callbacks. Using a configuration that swaps the
	default XML scanner with the HTML scanner, you can create DOM and
	SAX parsers for HTML documents.
	</li>
	<li>
	<strong>Optimized Parser</strong><br/>
	For improved XML performance, a minimal XML scanner can be written
	and swapped for the default, fully compliant XML scanner. In
	addition, the validator component can be removed from the parser
	pipeline to reduce the amount of work required to parse XML
	documents.
	</li>
	<li>
	<strong>XInclude Processor</strong><br/>
	An XNI parser component can be written to handle XInclude by
	analyzing the streaming information set and automatically
	inserting the contents of referenced links into the event stream.
	By adding this component to the parser pipeline before the
	validator, included content would appear transparent to the
	validator as if that content was in the original document.
	</li>
	</ul>
	<p>
	This is just a small sample of what is possible when using the
	XNI parser configuration framework. For complete details of the
	XNI parser configurations, refer to the
	<link idref='xni-config'>Parser Configuration</link>
	documentation.
	</p>
	</s2>
	</s1>