<?xml version='1.0' encoding='UTF-8'?>
<!--
* Licensed to the Apache Software Foundation (ASF) under one or more
* contributor license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright ownership.
* The ASF licenses this file to You under the Apache License, Version 2.0
* (the "License"); you may not use this file except in compliance with
* the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
-->
<!DOCTYPE s1 SYSTEM 'dtd/document.dtd'>
<s1 title='XNI Design Details'>
<s2 title='Overview'>
<p>
A parser written to conform to the Xerces Native Interface (XNI)
framework is configured as a pipeline of parser components. The
document's "streaming" information set flows through this pipeline
of components to produce some sort of programming interface as the
output. For example, the pipeline could produce a W3C Document
Object Model (DOM) or a series of Simple API for XML (SAX) events.
</p>
<p>
The core XNI interfaces provide a mechanism for the document
information to flow from component to component. However, beyond
the basic information interfaces, XNI also defines a framework for
constructing these pipelines and parser configurations. This
document gives an overview of that framework and of what a parser
written to conform to the Xerces Native Interface looks like. The
two parts of the framework are described below:
</p>
<ul>
<li><link anchor='pipeline'>Pipeline</link></li>
<li><link anchor='configuration'>Configuration</link></li>
</ul>
<p>
For more detailed information, refer to the following documents:
</p>
<ul>
<li><link idref='xni-core'>Core Interfaces</link></li>
<li><link idref='xni-config'>Parser Configuration</link></li>
<li><link idref='xni-xerces2'>Xerces2 Parser Components</link></li>
</ul>
</s2>
<anchor name='pipeline'/>
<s2 title='Pipeline'>
<p>
The XNI parser pipeline is any combination of components that
are either capable of producing XNI events, consuming XNI events,
or both. All pipelines consist of a source, zero or more filters,
and a target. The source is typically the XML scanner; common
filters include DTD and XML Schema validators and a namespace
binder; and the target is the parser that consumes the XNI events
and produces a common programming interface such as DOM or SAX.
The following diagram illustrates the basic pipeline configuration.
</p>
<p>
<img alt='Basic Pipeline Configuration' src='xni-pipeline-basic.gif'/>
</p>
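<p>
The source, filter, and target roles described above can be
sketched in a few lines of Java. This is a minimal illustration
under stated assumptions, not the XNI interfaces themselves: the
names <code>Handler</code>, <code>Source</code>, <code>Filter</code>,
<code>ToyScanner</code>, <code>TrimFilter</code>, and
<code>RecordingTarget</code> are hypothetical, and the "scanner"
emits a fixed sequence of events instead of reading XML.
</p>
<source><![CDATA[
```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a component that consumes events. */
interface Handler {
    void startElement(String name);
    void characters(String text);
    void endElement(String name);
}

/** Hypothetical stand-in for a component that produces events. */
interface Source {
    void setHandler(Handler next);
}

/** A filter both consumes and produces events. */
interface Filter extends Handler, Source {}

/** Source: emits a fixed event sequence instead of scanning real XML. */
class ToyScanner implements Source {
    private Handler next;
    public void setHandler(Handler next) { this.next = next; }
    void scan() {
        next.startElement("greeting");
        next.characters("  hello  ");
        next.endElement("greeting");
    }
}

/** Filter: trims character data, then forwards every event downstream. */
class TrimFilter implements Filter {
    private Handler next;
    public void setHandler(Handler next) { this.next = next; }
    public void startElement(String name) { next.startElement(name); }
    public void characters(String text) { next.characters(text.trim()); }
    public void endElement(String name) { next.endElement(name); }
}

/** Target: records events; a real target would build a DOM or emit SAX. */
class RecordingTarget implements Handler {
    final List<String> log = new ArrayList<>();
    public void startElement(String name) { log.add("start:" + name); }
    public void characters(String text) { log.add("text:" + text); }
    public void endElement(String name) { log.add("end:" + name); }
}

class PipelineSketch {
    /** Connects source, filter, and target, then runs the source. */
    static List<String> run() {
        ToyScanner scanner = new ToyScanner();
        TrimFilter filter = new TrimFilter();
        RecordingTarget target = new RecordingTarget();
        scanner.setHandler(filter);
        filter.setHandler(target);
        scanner.scan();
        return target.log;
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```
]]></source>
<p>
Because each stage only knows the handler that follows it, a filter
can be added or removed by changing the two <code>setHandler</code>
calls, without touching the source or the target.
</p>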
<p>
However, this is a simplified view of the pipeline configuration.
The Xerces Native Interface actually defines two different pipelines
with three interfaces: one for document information and two for DTD
information.
</p>
<p>
The Xerces2 parser, the reference implementation of XNI,
contains more components than the basic pipeline configuration
diagram shows. The following diagram shows the Xerces2 pipeline
configuration. The arrow going from left to right on the top of the
image represents the flow of document information and the arrows on
the bottom of the image represent the DTD information flowing through
the parser pipeline.
</p>
<p>
<img alt='Xerces2 Pipeline Configuration' src='xni-pipeline-detailed.gif'/>
</p>
<p>
As the diagram shows, the "Document Scanner" is the source for
document information and the "DTD Scanner" is the source for DTD
information. Both document and DTD information generated by the
scanners flow into the "DTD Validator", where structure and content
are validated according to the DTD grammar, if present. From here,
the validated document information, with possible augmentations such
as default attribute values and attribute value normalization, flows
to the "Namespace Binder", which applies the namespace information to
elements and attributes. The newly namespace-bound
document information then flows to the "Schema Validator" for
validation based on the XML Schema, if present. Finally, the
document and DTD information flow to the "Parser" which generates
a programming interface such as DOM or SAX.
</p>
<p>
XNI defines the document information using a number of core
interfaces. (These interfaces are described in more detail in the
<link idref='xni-api-core'>Core API</link> documentation.) XNI
also defines a set of interfaces for building parser configurations
that assemble the pipelines in order to parse documents. The next
section gives a general overview of the parser configuration
framework provided by XNI.
</p>
</s2>
<anchor name='configuration'/>
<s2 title='Configuration'>
<p>
A parser implementation written using the Xerces Native Interface
can be seen as a collection of components, some of which are
connected together to form the pipelines for document and DTD
information. All of the components in the parser are managed by
a "Component Manager" that does the following:
</p>
<ul>
<li>Keeps track of parser settings and options,</li>
<li>
Instantiates and configures the various components in the parser, and
</li>
<li>Assembles the parsing pipeline and initiates parsing of documents.</li>
</ul>
<p>
The following diagram represents a typical parser configuration
that has a component manager and various components such as a
"Symbol Table", "Scanner", etc.
</p>
<p>
<img alt='Generic Parser Configuration' src='xni-components-overview.gif'/>
</p>
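<p>
The three responsibilities of the component manager listed above
can be sketched as follows. This is a simplified illustration, not
the XNI configuration API: <code>ToyComponent</code>,
<code>ToyValidator</code>, <code>ToyComponentManager</code>, and the
<code>"validation"</code> setting name are all hypothetical, and
"parsing" is reduced to returning a status string.
</p>
<source><![CDATA[
```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical component contract: read settings from the manager before a parse. */
interface ToyComponent {
    void reset(ToyComponentManager manager);
}

/** Hypothetical component whose behavior depends on a parser setting. */
class ToyValidator implements ToyComponent {
    boolean enabled;
    public void reset(ToyComponentManager manager) {
        enabled = manager.getFeature("validation");
    }
}

/** Hypothetical manager: tracks settings, owns components, starts a parse. */
class ToyComponentManager {
    private final Map<String, Boolean> features = new HashMap<>();
    private final List<ToyComponent> components = new ArrayList<>();
    final ToyValidator validator = new ToyValidator();

    ToyComponentManager() {
        // Instantiate and register the components this configuration uses.
        components.add(validator);
    }

    void setFeature(String id, boolean state) { features.put(id, state); }

    /** Unknown settings default to false in this sketch. */
    boolean getFeature(String id) { return features.getOrDefault(id, false); }

    /** Configures every component from the current settings, then "parses". */
    String parse(String documentName) {
        for (ToyComponent component : components) {
            component.reset(this);
        }
        // A real configuration would connect the pipeline and run the scanner here.
        return documentName
                + (validator.enabled ? " parsed with validation" : " parsed without validation");
    }
}

class ConfigurationSketch {
    public static void main(String[] args) {
        ToyComponentManager manager = new ToyComponentManager();
        manager.setFeature("validation", true);
        System.out.println(manager.parse("doc.xml"));
    }
}
```
]]></source>
<p>
In this sketch the components never store settings permanently;
they re-read them from the manager at the start of each parse, so
a setting changed between two parses takes effect on the next one.
</p>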
<p>
Some of the components in a configuration are configurable and others
are not. The details of component configuration
can be found in the <link idref='xni-config'>XNI Parser Configuration</link>
document; for now, it is sufficient to understand the basic structure
of a parser configuration.
</p>
<p>
The XNI parser configuration framework provides an easy and
convenient way to construct different kinds of parser configurations.
By separating the configuration from the API generation (in each
specific parser object), different parser configurations can be used to
build a DOM tree or emit SAX events without re-implementing the DOM or
SAX code. The following diagram shows this separation. Notice how the
document information flows through the pipeline in the parser
configuration and then to the parser object which generates different
APIs.
</p>
<p>
<img alt='Configuration and Parser Separation' src='xni-parser-configuration.gif'/>
</p>
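<p>
The separation between configuration and parser object can be
illustrated with a small sketch. Everything here is hypothetical
and greatly simplified: <code>ToyConfiguration</code> stands in for
a parser configuration that pushes element names to a handler, and
<code>TreeStyleParser</code> and <code>EventStyleParser</code> stand
in for parser objects that generate a DOM-like or SAX-like result.
</p>
<source><![CDATA[
```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical configuration: owns a pipeline and pushes element names to a handler. */
interface ToyConfiguration {
    void parse(Consumer<String> handler);
}

/** One configuration: forwards every element name unchanged. */
class PlainConfiguration implements ToyConfiguration {
    public void parse(Consumer<String> handler) {
        handler.accept("a");
        handler.accept("b");
    }
}

/** Another configuration: same source plus a stage that prefixes each name. */
class PrefixingConfiguration implements ToyConfiguration {
    public void parse(Consumer<String> handler) {
        new PlainConfiguration().parse(name -> handler.accept("ns:" + name));
    }
}

/** Parser object producing a tree-style result (a path string in this sketch). */
class TreeStyleParser {
    String parse(ToyConfiguration config) {
        StringBuilder tree = new StringBuilder("root");
        config.parse(name -> tree.append("/").append(name));
        return tree.toString();
    }
}

/** Parser object producing an event-style result (a list of callbacks). */
class EventStyleParser {
    List<String> parse(ToyConfiguration config) {
        List<String> events = new ArrayList<>();
        config.parse(name -> events.add("element:" + name));
        return events;
    }
}

class SeparationSketch {
    public static void main(String[] args) {
        // Either parser object works with either configuration.
        System.out.println(new TreeStyleParser().parse(new PlainConfiguration()));
        System.out.println(new EventStyleParser().parse(new PrefixingConfiguration()));
    }
}
```
]]></source>
<p>
Neither parser object knows which stages a configuration contains,
and neither configuration knows which API will be generated, so
the two can be combined freely.
</p>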
</s2>
</s1>