| <?xml version='1.0' encoding='UTF-8'?> |
| <!-- |
| * Licensed to the Apache Software Foundation (ASF) under one or more |
| * contributor license agreements. See the NOTICE file distributed with |
| * this work for additional information regarding copyright ownership. |
| * The ASF licenses this file to You under the Apache License, Version 2.0 |
| * (the "License"); you may not use this file except in compliance with |
| * the License. You may obtain a copy of the License at |
| * |
| * http://www.apache.org/licenses/LICENSE-2.0 |
| * |
| * Unless required by applicable law or agreed to in writing, software |
| * distributed under the License is distributed on an "AS IS" BASIS, |
| * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. |
| * See the License for the specific language governing permissions and |
| * limitations under the License. |
| --> |
| <!DOCTYPE s1 SYSTEM 'dtd/document.dtd'> |
| <s1 title='XNI Design Details'> |
| <s2 title='Overview'> |
| <p> |
| A parser written to conform to the Xerces Native Interface (XNI) |
| framework is configured as a pipeline of parser components. The |
| document's "streaming" information set flows through this pipeline |
| of components to produce some sort of programming interface as the |
| output. For example, the pipeline could produce a W3C Document |
| Object Model (DOM) or a series of Simple API for XML (SAX) events. |
| </p> |
| <p> |
| The core XNI interfaces provide a mechanism for the document |
| information to flow from component to component. However, beyond |
| the basic information interfaces, XNI also defines a framework for |
| constructing these pipelines and parser configurations. This |
| document gives you an overview of this framework and of what a |
| parser written to conform to the Xerces Native Interface looks |
| like. The two parts of the framework are described below: |
| </p> |
| <ul> |
| <li><link anchor='pipeline'>Pipeline</link></li> |
| <li><link anchor='configuration'>Configuration</link></li> |
| </ul> |
| <p> |
| For more detailed information, refer to the following documents: |
| </p> |
| <ul> |
| <li><link idref='xni-core'>Core Interfaces</link></li> |
| <li><link idref='xni-config'>Parser Configuration</link></li> |
| <li><link idref='xni-xerces2'>Xerces2 Parser Components</link></li> |
| </ul> |
| </s2> |
| <anchor name='pipeline'/> |
| <s2 title='Pipeline'> |
| <p> |
| The XNI parser pipeline is any combination of components that |
| are either capable of producing XNI events, consuming XNI events, |
| or both. All pipelines consist of a source, zero or more filters, |
| and a target. The source is typically the XML scanner; common |
| filters are DTD and XML Schema validators, a namespace binder, |
| and so on; and the target is the parser that consumes the XNI events |
| and produces a common programming interface such as DOM or SAX. |
| The following diagram illustrates the basic pipeline configuration. |
| </p> |
| <p> |
| <img alt='Basic Pipeline Configuration' src='xni-pipeline-basic.gif'/> |
| </p> |
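| <p> |
| The source, filter, and target roles described above can be sketched in |
| a few lines of Java. This is only an illustration under stated |
| assumptions: the types <code>DocumentHandler</code>, |
| <code>DocumentSource</code>, <code>FixedSource</code>, |
| <code>TrimFilter</code>, and <code>StringTarget</code> are hypothetical |
| and are not the actual XNI interfaces, which carry much richer |
| information. The sketch only shows how events can flow from one |
| component to the next. |
| </p> |

```java
// Hypothetical sketch of a source -> filter -> target pipeline.
// None of these types are the real XNI interfaces.
public class PipelineSketch {

    /** Anything that can consume document events (filters and targets). */
    interface DocumentHandler {
        void startElement(String name);
        void characters(String text);
        void endElement(String name);
    }

    /** Anything that can produce document events (sources and filters). */
    interface DocumentSource {
        void setDocumentHandler(DocumentHandler next);
    }

    /** Source: stands in for a scanner and emits a fixed event sequence. */
    static class FixedSource implements DocumentSource {
        private DocumentHandler next;
        public void setDocumentHandler(DocumentHandler next) { this.next = next; }
        void scan() {
            next.startElement("greeting");
            next.characters("  hello  ");
            next.endElement("greeting");
        }
    }

    /** Filter: consumes and produces events; here it trims character data. */
    static class TrimFilter implements DocumentHandler, DocumentSource {
        private DocumentHandler next;
        public void setDocumentHandler(DocumentHandler next) { this.next = next; }
        public void startElement(String name) { next.startElement(name); }
        public void characters(String text) { next.characters(text.trim()); }
        public void endElement(String name) { next.endElement(name); }
    }

    /** Target: consumes events and builds an output, here a plain string. */
    static class StringTarget implements DocumentHandler {
        final StringBuilder out = new StringBuilder();
        public void startElement(String name) { out.append('[').append(name).append(']'); }
        public void characters(String text) { out.append(text); }
        public void endElement(String name) { out.append("[/").append(name).append(']'); }
    }

    /** Wires source -> filter -> target, runs the source, returns the result. */
    public static String run() {
        FixedSource source = new FixedSource();
        TrimFilter filter = new TrimFilter();
        StringTarget target = new StringTarget();
        source.setDocumentHandler(filter);
        filter.setDocumentHandler(target);
        source.scan();
        return target.out.toString();
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints [greeting]hello[/greeting]
    }
}
```

| <p> |
| Because the filter implements both roles, any number of filters could |
| be chained between the source and the target without changing either |
| end of the pipeline. |
| </p> |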
| <p> |
| However, this is a simplified view of the pipeline configuration. |
| The Xerces Native Interface actually defines two different pipelines |
| with three interfaces: one for document information and two for DTD |
| information. |
| </p> |
| <p> |
| The Xerces2 parser, the reference implementation of XNI, |
| contains more components than the basic pipeline configuration |
| diagram shows. The following diagram shows the Xerces2 pipeline |
| configuration. The arrow going from left to right on the top of the |
| image represents the flow of document information and the arrows on |
| the bottom of the image represent the DTD information flowing through |
| the parser pipeline. |
| </p> |
| <p> |
| <img alt='Xerces2 Pipeline Configuration' src='xni-pipeline-detailed.gif'/> |
| </p> |
| <p> |
| As the diagram shows, the "Document Scanner" is the source for |
| document information and the "DTD Scanner" is the source for DTD |
| information. Both document and DTD information generated by the |
| scanners flow into the "DTD Validator" where structure and content |
| are validated according to the DTD grammar, if present. From here, |
| the validated document information, with possible augmentations such |
| as default attribute values and attribute value normalization, flows |
| to the "Namespace Binder" which applies the namespace information to |
| elements and attributes. The newly namespace-bound document |
| information then flows to the "Schema Validator" for |
| validation based on the XML Schema, if present. Finally, the |
| document and DTD information flow to the "Parser" which generates |
| a programming interface such as DOM or SAX. |
| </p> |
| <p> |
| XNI defines the document information using a number of core |
| interfaces. (These interfaces are described in more detail in the |
| <link idref='xni-api-core'>Core API</link> documentation.) But XNI |
| also defines a set of interfaces to build parser configurations |
| that assemble the pipelines in order to parse documents. The next |
| section gives a general overview of the parser configuration |
| framework provided by XNI. |
| </p> |
| </s2> |
| <anchor name='configuration'/> |
| <s2 title='Configuration'> |
| <p> |
| A parser implementation written using the Xerces Native Interface |
| can be seen as a collection of components, some of which are |
| connected together to form the pipelines for document and DTD |
| information. All of the components in the parser are managed by |
| a "Component Manager" that does the following: |
| </p> |
| <ul> |
| <li>Keeps track of parser settings and options,</li> |
| <li> |
| Instantiates and configures the various components in the parser, and |
| </li> |
| <li>Assembles the parsing pipeline and initiates parsing of documents.</li> |
| </ul> |
| <p> |
| The following diagram represents a typical parser configuration |
| that has a component manager and various components such as a |
| "Symbol Table", "Scanner", etc. |
| </p> |
| <p> |
| <img alt='Generic Parser Configuration' src='xni-components-overview.gif'/> |
| </p> |
| <p> |
| Some of the components in a configuration are configurable and others |
| are not. The details of component configuration are covered in the |
| <link idref='xni-config'>XNI Parser Configuration</link> document; |
| for now, it is sufficient to understand the basic structure of a |
| parser configuration. |
| </p> |
| <p> |
| The XNI parser configuration framework provides an easy and |
| convenient way to construct different kinds of parser configurations. |
| By separating the configuration from the API generation (in each |
| specific parser object), different parser configurations can be used to |
| build a DOM tree or emit SAX events without re-implementing the DOM or |
| SAX code. The following diagram shows this separation. Notice how the |
| document information flows through the pipeline in the parser |
| configuration and then to the parser object which generates different |
| APIs. |
| </p> |
| <p> |
| <img alt='Configuration and Parser Separation' src='xni-parser-configuration.gif'/> |
| </p> |
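| <p> |
| The separation between configuration and parser object can be |
| illustrated with a small Java sketch. Everything in it is an |
| assumption made for illustration: <code>ParserConfiguration</code>, |
| <code>ToyConfiguration</code>, <code>EventParser</code>, |
| <code>TreeParser</code>, and the "upper-case" feature are hypothetical |
| and far simpler than the real XNI types. The toy configuration treats |
| its input as a space-separated list of nested element names. |
| </p> |

```java
// Hypothetical sketch: one configuration, two parser objects producing
// different outputs. None of these types are the real XNI interfaces.
public class ConfigurationSketch {

    /** Events that the configuration's pipeline delivers to a parser object. */
    interface DocumentHandler {
        void startElement(String name);
        void endElement(String name);
    }

    /** Owns settings and the pipeline; knows nothing about the output API. */
    interface ParserConfiguration {
        void setFeature(String featureId, boolean state);
        void setDocumentHandler(DocumentHandler handler);
        void parse(String input);
    }

    /** Toy configuration: "a b" means element b nested inside element a. */
    static class ToyConfiguration implements ParserConfiguration {
        private boolean upperCase;
        private DocumentHandler handler;

        public void setFeature(String featureId, boolean state) {
            if (!"upper-case".equals(featureId)) {
                throw new IllegalArgumentException("unknown feature: " + featureId);
            }
            upperCase = state;
        }
        public void setDocumentHandler(DocumentHandler handler) { this.handler = handler; }
        public void parse(String input) {
            String[] names = input.trim().split(" ");
            for (String name : names) {
                handler.startElement(upperCase ? name.toUpperCase() : name);
            }
            // Close the elements in reverse order so that they nest properly.
            for (int i = names.length - 1; i >= 0; i--) {
                handler.endElement(upperCase ? names[i].toUpperCase() : names[i]);
            }
        }
    }

    /** Parser object that reports a flat list of events (loosely SAX-like). */
    static class EventParser implements DocumentHandler {
        private final ParserConfiguration config;
        private final StringBuilder log = new StringBuilder();
        EventParser(ParserConfiguration config) {
            this.config = config;
            config.setDocumentHandler(this);
        }
        String parse(String input) {
            log.setLength(0);
            config.parse(input);
            return log.toString().trim();
        }
        public void startElement(String name) { log.append("start:").append(name).append(' '); }
        public void endElement(String name) { log.append("end:").append(name).append(' '); }
    }

    /** Parser object that builds a nested representation (loosely DOM-like). */
    static class TreeParser implements DocumentHandler {
        private final ParserConfiguration config;
        private final StringBuilder tree = new StringBuilder();
        TreeParser(ParserConfiguration config) {
            this.config = config;
            config.setDocumentHandler(this);
        }
        String parse(String input) {
            tree.setLength(0);
            config.parse(input);
            return tree.toString();
        }
        public void startElement(String name) { tree.append(name).append('('); }
        public void endElement(String name) { tree.append(')'); }
    }

    public static String events(String input, boolean upperCase) {
        ParserConfiguration config = new ToyConfiguration();
        config.setFeature("upper-case", upperCase);
        return new EventParser(config).parse(input);
    }

    public static String tree(String input, boolean upperCase) {
        ParserConfiguration config = new ToyConfiguration();
        config.setFeature("upper-case", upperCase);
        return new TreeParser(config).parse(input);
    }

    public static void main(String[] args) {
        System.out.println(events("a b", false)); // prints start:a start:b end:b end:a
        System.out.println(tree("a b", true));    // prints A(B())
    }
}
```

| <p> |
| Neither parser object contains any scanning or settings logic, so a |
| different configuration could be substituted without touching the |
| code that generates the output. |
| </p> |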
| </s2> |
| </s1> |