<?xml version='1.0' encoding='UTF-8'?>
 <!--
  * Licensed to the Apache Software Foundation (ASF) under one or more
  * contributor license agreements.  See the NOTICE file distributed with
  * this work for additional information regarding copyright ownership.
  * The ASF licenses this file to You under the Apache License, Version 2.0
  * (the "License"); you may not use this file except in compliance with
  * the License.  You may obtain a copy of the License at
  *
  *      http://www.apache.org/licenses/LICENSE-2.0
  *
  * Unless required by applicable law or agreed to in writing, software
  * distributed under the License is distributed on an "AS IS" BASIS,
  * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  * See the License for the specific language governing permissions and
  * limitations under the License.
 -->
 <!DOCTYPE s1 SYSTEM 'dtd/document.dtd'>
 <s1 title='XNI Design Details'>
  <s2 title='Overview'>
   <p>
    A parser written to conform to the Xerces Native Interface (XNI)
    framework is configured as a pipeline of parser components. The
    document's "streaming" information set flows through this pipeline
    of components to produce some sort of programming interface as the
    output. For example, the pipeline could produce a W3C Document
    Object Model (DOM) or a series of Simple API for XML (SAX) events.
   </p>
   <p>
    The core XNI interfaces provide a mechanism for the document
    information to flow from component to component. However, beyond
    the basic information interfaces, XNI also defines a framework for
    constructing these pipelines and parser configurations. This
    document gives an overview of that framework and of what a parser
    written to conform to the Xerces Native Interface looks like.
    The two parts of the framework are described below:
   </p>
   <ul>
    <li><link anchor='pipeline'>Pipeline</link></li>
    <li><link anchor='configuration'>Configuration</link></li>
   </ul>
   <p>
    For more detailed information, refer to the following documents:
   </p>
   <ul>
    <li><link idref='xni-core'>Core Interfaces</link></li>
    <li><link idref='xni-config'>Parser Configuration</link></li>
    <li><link idref='xni-xerces2'>Xerces2 Parser Components</link></li>
   </ul>
  </s2>
  <anchor name='pipeline'/>
  <s2 title='Pipeline'>
   <p>
    The XNI parser pipeline is any combination of components that
    are either capable of producing XNI events, consuming XNI events,
    or both. All pipelines consist of a source, zero or more filters,
    and a target. The source is typically the XML scanner; common
    filters include DTD and XML Schema validators and a namespace
    binder; and the target is the parser that consumes the XNI events
    and produces a common programming interface such as DOM or SAX.
    The following diagram illustrates the basic pipeline configuration.
   </p>
   <p>
    <img alt='Basic Pipeline Configuration' src='xni-pipeline-basic.gif'/>
   </p>
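   <p>
    The source, filter, and target roles described above can be
    sketched in a few lines of Java. This is only a minimal
    illustration: the names EventHandler, EventSource, EventFilter,
    ToyScanner, TrimFilter, and StringTarget are hypothetical
    stand-ins invented for this example, not the actual XNI
    interfaces, which carry much richer document information.
   </p>
   <source>
```java
// Hypothetical, simplified stand-ins for illustration only;
// these are NOT the real org.apache.xerces.xni types.
interface EventHandler {
    void startElement(String name);
    void characters(String text);
    void endElement(String name);
}

// A source produces events and pushes them to the attached handler.
interface EventSource {
    void setHandler(EventHandler handler);
}

// A filter both consumes and produces events.
interface EventFilter extends EventHandler, EventSource {
}

// Source: a toy "scanner" that emits a fixed sequence of events.
class ToyScanner implements EventSource {
    private EventHandler handler;
    public void setHandler(EventHandler handler) { this.handler = handler; }
    void scan() {
        handler.startElement("greeting");
        handler.characters("  hello  ");
        handler.endElement("greeting");
    }
}

// Filter: trims character data, then forwards every event downstream.
class TrimFilter implements EventFilter {
    private EventHandler next;
    public void setHandler(EventHandler handler) { this.next = handler; }
    public void startElement(String name) { next.startElement(name); }
    public void characters(String text) { next.characters(text.trim()); }
    public void endElement(String name) { next.endElement(name); }
}

// Target: consumes events and builds its output (here, a string).
class StringTarget implements EventHandler {
    final StringBuilder out = new StringBuilder();
    public void startElement(String name) { out.append('[').append(name).append(']'); }
    public void characters(String text) { out.append(text); }
    public void endElement(String name) { out.append("[/").append(name).append(']'); }
}

class PipelineDemo {
    // Wires source -> filter -> target, runs the scan, returns the result.
    static String run() {
        ToyScanner scanner = new ToyScanner();
        TrimFilter filter = new TrimFilter();
        StringTarget target = new StringTarget();
        scanner.setHandler(filter);
        filter.setHandler(target);
        scanner.scan();
        return target.out.toString();
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints [greeting]hello[/greeting]
    }
}
```
   </source>
   <p>
    Because a filter is both a handler and a source, any number of
    filters can be chained between the scanner and the target
    without either end knowing how long the chain is.
   </p>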
   <p>
    However, this is a simplified view of the pipeline configuration.
    The Xerces Native Interface actually defines two different pipelines,
    one for document information and one for DTD information, served by
    three interfaces: one for document information and two for DTD
    information.
   </p>
   <p>
    The Xerces2 parser, the reference implementation of XNI,
    contains more components than the basic pipeline configuration
    diagram shows. The following diagram shows the Xerces2 pipeline
    configuration. The arrow going from left to right on the top of the
    image represents the flow of document information and the arrows on
    the bottom of the image represent the DTD information flowing through
    the parser pipeline.
   </p>
   <p>
    <img alt='Xerces2 Pipeline Configuration' src='xni-pipeline-detailed.gif'/>
   </p>
   <p>
    As the diagram shows, the "Document Scanner" is the source for
    document information and the "DTD Scanner" is the source for DTD
    information. Both document and DTD information generated by the
    scanners flow into the "DTD Validator", where structure and content
    are validated according to the DTD grammar, if present. From here,
    the validated document information, with possible augmentations such
    as default attribute values and normalized attribute values, flows
    to the "Namespace Binder", which applies the namespace information to
    elements and attributes. The newly namespace-bound document
    information then flows to the "Schema Validator" for
    validation based on the XML Schema, if present. Finally, the
    document and DTD information flow to the "Parser", which generates
    a programming interface such as DOM or SAX.
   </p>
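   <p>
    To make the two information flows more concrete, the following
    sketch shows a single component that receives both DTD and
    document information and augments elements with a declared
    default attribute value before passing them on. All names
    (DocHandler, DtdHandler, ToyDtdValidator, Recorder) are
    hypothetical, and the component tracks only one attribute
    declaration; it is an assumption-laden illustration, not how
    the Xerces2 DTD validator is implemented.
   </p>
   <source>
```java
// Hypothetical, simplified interfaces; not the real XNI signatures.
interface DocHandler {
    // attrName and attrValue are null when the element has no attribute.
    void element(String name, String attrName, String attrValue);
}

interface DtdHandler {
    void attributeDefault(String element, String attrName, String defaultValue);
}

// Consumes both DTD and document information. A missing attribute is
// filled in from the declared default before the element is forwarded.
class ToyDtdValidator implements DocHandler, DtdHandler {
    private final DocHandler next;
    private String declElement;
    private String declAttr;
    private String declDefault;

    ToyDtdValidator(DocHandler next) { this.next = next; }

    // DTD information arrives here (from a DTD scanner in a real parser).
    public void attributeDefault(String element, String attrName, String defaultValue) {
        declElement = element;
        declAttr = attrName;
        declDefault = defaultValue;
    }

    // Document information arrives here and leaves augmented.
    public void element(String name, String attrName, String attrValue) {
        if (attrName == null) {
            if (name.equals(declElement)) {
                attrName = declAttr;
                attrValue = declDefault;
            }
        }
        next.element(name, attrName, attrValue);
    }
}

// Stands in for the final "Parser" stage: records what it receives.
class Recorder implements DocHandler {
    final StringBuilder out = new StringBuilder();
    public void element(String name, String attrName, String attrValue) {
        out.append(name);
        if (attrName != null) {
            out.append(' ').append(attrName).append('=').append(attrValue);
        }
        out.append(';');
    }
}

class DtdFlowDemo {
    static String run() {
        Recorder parser = new Recorder();
        ToyDtdValidator validator = new ToyDtdValidator(parser);
        validator.attributeDefault("note", "lang", "en"); // DTD flow
        validator.element("note", null, null);            // document flow
        validator.element("note", "lang", "fr");          // explicit value wins
        return parser.out.toString();
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints note lang=en;note lang=fr;
    }
}
```
   </source>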
   <p>
    XNI defines the document information using a number of core
    interfaces. (These interfaces are described in more detail in the
    <link idref='xni-api-core'>Core API</link> documentation.) But XNI
    also defines a set of interfaces to build parser configurations
    that assemble the pipelines in order to parse documents. The next
    section gives a general overview of the parser configuration
    framework provided by XNI.
   </p>
  </s2>
  <anchor name='configuration'/>
  <s2 title='Configuration'>
   <p>
    A parser implementation written using the Xerces Native Interface
    can be seen as a collection of components, some of which are
    connected together to form the pipelines for document and DTD
    information. All of the components in the parser are managed by
    a "Component Manager" that does the following:
   </p>
   <ul>
    <li>Keeps track of parser settings and options,</li>
    <li>
     Instantiates and configures the various components in the parser, and
    </li>
    <li>Assembles the parsing pipeline and initiates parsing of documents.</li>
   </ul>
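   <p>
    The three responsibilities listed above can be sketched as
    follows. The types Settings, Component, ToyValidator, and
    ToyConfiguration are hypothetical and exist only for this
    example; the feature identifier is the standard SAX validation
    feature, used here simply as a familiar setting name.
   </p>
   <source>
```java
// Hypothetical stand-ins; the real XNI interfaces are richer than this.
interface Settings {
    boolean getFeature(String featureId);
}

// A configurable component re-reads its settings whenever it is reset.
interface Component {
    void reset(Settings settings);
}

class ToyValidator implements Component {
    static final String VALIDATION = "http://xml.org/sax/features/validation";
    boolean validating;

    public void reset(Settings settings) {
        validating = settings.getFeature(VALIDATION);
    }
}

// Plays the role of the component manager described above.
class ToyConfiguration implements Settings {
    // 1. Keeps track of parser settings.
    private final java.util.Properties features = new java.util.Properties();
    // 2. Instantiates the components it manages.
    final ToyValidator validator = new ToyValidator();
    private final Component[] components = { validator };

    void setFeature(String featureId, boolean state) {
        features.setProperty(featureId, String.valueOf(state));
    }

    public boolean getFeature(String featureId) {
        // Unknown features default to false in this sketch.
        return Boolean.parseBoolean(features.getProperty(featureId, "false"));
    }

    // 3. Configures every component before parsing starts.
    void parse() {
        for (Component component : components) {
            component.reset(this);
        }
        // Pipeline assembly and scanning are omitted from this sketch.
    }
}

class ConfigurationDemo {
    public static void main(String[] args) {
        ToyConfiguration config = new ToyConfiguration();
        config.setFeature(ToyValidator.VALIDATION, true);
        config.parse();
        System.out.println(config.validator.validating); // prints true
    }
}
```
   </source>
   <p>
    Resetting components at the start of each parse means a setting
    changed between two parses takes effect without rebuilding the
    components themselves.
   </p>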
   <p>
    The following diagram represents a typical parser configuration
    that has a component manager and various components such as a
    "Symbol Table", "Scanner", etc.
   </p>
   <p>
    <img alt='Generic Parser Configuration' src='xni-components-overview.gif'/>
   </p>
   <p>
    Some of the components in a configuration are configurable and others
    are not. The details of component configuration can be found in the
    <link idref='xni-config'>XNI Parser Configuration</link>
    document. For now, it is sufficient to understand the basic structure
    of parser configurations.
   </p>
   <p>
    The XNI parser configuration framework provides an easy and
    convenient way to construct different kinds of parser configurations.
    By separating the configuration from the API generation (in each
    specific parser object), different parser configurations can be used to
    build a DOM tree or emit SAX events without re-implementing the DOM or
    SAX code. The following diagram shows this separation. Notice how the
    document information flows through the pipeline in the parser
    configuration and then to the parser object, which generates the
    API (such as DOM or SAX) that it implements.
   </p>
   <p>
    <img alt='Configuration and Parser Separation' src='xni-parser-configuration.gif'/>
   </p>
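   <p>
    The separation between configuration and API generation can be
    illustrated with a small sketch in which two interchangeable
    configurations feed two different parser objects. Every type
    here (Handler, Configuration, PlainConfiguration,
    UpperCaseConfiguration, ListParser, CountParser) is a
    hypothetical simplification made up for this example, and the
    "APIs" generated are deliberately trivial.
   </p>
   <source>
```java
// Hypothetical stand-ins for illustration; not the real XNI or Xerces2 types.
interface Handler {
    void element(String name);
}

// A parser configuration: owns the pipeline and drives parsing.
interface Configuration {
    void setHandler(Handler handler);
    void parse(String input);
}

// One configuration: every space-separated token becomes an element event.
class PlainConfiguration implements Configuration {
    protected Handler handler;
    public void setHandler(Handler handler) { this.handler = handler; }
    public void parse(String input) {
        for (String name : input.split(" ")) {
            handler.element(name);
        }
    }
}

// Another configuration: the same source plus a filter that upper-cases names.
class UpperCaseConfiguration extends PlainConfiguration {
    public void setHandler(Handler target) {
        this.handler = name -> target.element(name.toUpperCase(java.util.Locale.ROOT));
    }
}

// Parser object that generates one API: a comma-terminated list of names.
class ListParser implements Handler {
    private final Configuration config;
    private final StringBuilder names = new StringBuilder();

    ListParser(Configuration config) {
        this.config = config;
        config.setHandler(this);
    }
    public void element(String name) { names.append(name).append(','); }
    String parse(String input) {
        names.setLength(0);
        config.parse(input);
        return names.toString();
    }
}

// Parser object that generates a different API: an element count.
class CountParser implements Handler {
    private final Configuration config;
    private int count;

    CountParser(Configuration config) {
        this.config = config;
        config.setHandler(this);
    }
    public void element(String name) { count++; }
    int parse(String input) {
        count = 0;
        config.parse(input);
        return count;
    }
}

class SeparationDemo {
    public static void main(String[] args) {
        // Same parser object, different configurations.
        System.out.println(new ListParser(new PlainConfiguration()).parse("a b"));     // prints a,b,
        System.out.println(new ListParser(new UpperCaseConfiguration()).parse("a b")); // prints A,B,
        // Same configuration type, different parser object.
        System.out.println(new CountParser(new UpperCaseConfiguration()).parse("a b c")); // prints 3
    }
}
```
   </source>
   <p>
    Neither parser object contains any pipeline logic, and neither
    configuration knows which API will be produced, so either side
    can be replaced independently.
   </p>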
  </s2>
 </s1>