| <?xml version="1.0" encoding="UTF-8" standalone="no"?> |
| <!-- |
| Licensed to the Apache Software Foundation (ASF) under one or more |
| contributor license agreements. See the NOTICE file distributed with |
| this work for additional information regarding copyright ownership. |
| The ASF licenses this file to You under the Apache License, Version 2.0 |
| (the "License"); you may not use this file except in compliance with |
| the License. You may obtain a copy of the License at |
| |
| http://www.apache.org/licenses/LICENSE-2.0 |
| |
| Unless required by applicable law or agreed to in writing, software |
| distributed under the License is distributed on an "AS IS" BASIS, |
| WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. |
| See the License for the specific language governing permissions and |
| limitations under the License. |
| --> |
| <!-- $Id$ --> |
| <!DOCTYPE document PUBLIC "-//APACHE//DTD Documentation V1.3//EN" "http://forrest.apache.org/dtd/document-v13.dtd"> |
| <document> |
| <header> |
| <title>Apache™ FOP Design: Input Parsing</title> |
| <version>$Revision$</version> |
| </header> |
| <body> |
| <section id="intro"> |
| <title>Introduction</title> |
| <p>Parsing is the process of reading the XSL-FO input and making the information in it available to Apache™ FOP.</p> |
| </section> |
| <section id="input"> |
| <title>SAX for Input</title> |
| <p>The two standard ways of dealing with XML input are SAX and DOM. |
| SAX basically creates events as it parses an XML document in a serial fashion; a program using SAX (and not storing anything internally) will only see a small window of the document at any point in time, and can never look forward in the document. |
| DOM creates and stores a tree representation of the document, allowing a view of the entire document as an integrated whole. |
| One issue that may seem counter-intuitive to some new FOP developers, and which has from time to time been contentious, is that FOP uses SAX for input. |
| (DOM can be used as input as well, but it is converted into SAX events before entering FOP, effectively negating its advantages).</p> |
| <p>Since FOP essentially needs a tree representation of the FO input, at first glance it seems to make sense to use DOM. |
| Instead, FOP takes SAX events and builds its own tree-like structure. Why?</p> |
| <ul> |
| <li>DOM has a relatively large memory footprint. FOP's FO Tree is a lighter-weight structure.</li> |
| <li>DOM contains an entire document. FOP is able to process individual fo:page-sequence objects discretely, without the need to have the entire document in memory. For documents that have only one fo:page-sequence object, FOP's approach is no advantage, but in other cases it is a huge advantage. A 500-page book that is broken into 100 5-page chapters, each in its own fo:page-sequence, essentially needs only 1% of the document memory that would be required if using DOM as input.</li> |
| </ul> |
| <p>See the <link href="../../trunk/embedding.html#input">Input Section of the User Embedding Document</link> for a discussion of input usage patterns and some implementation details.</p> |
| <p>FOP's <link href="fotree.html">FO Tree Mechanism</link> is responsible for catching the SAX events and processing them.</p> |
| </section> |
| <section id="validation"> |
| <title>Validation</title> |
| <p>If the input XML is not well-formed, that will be reported.</p> |
| <p>There is no DTD for XSL-FO, so no formal validation is possible at the parser level.</p> |
| <p>The SAX handler will report an error for unrecognized <link href="#namespaces">namespaces</link>.</p> |
| </section> |
| <section id="namespaces"> |
| <title>Namespaces</title> |
| <p>To allow for extensions to the XSL-FO language, FOP provides a mechanism for handling foreign namespaces.</p> |
| <p>See <link href="../../trunk/extensions.html">User Extensions</link> for a discussion of standard extensions shipped with FOP, and their related namespaces.</p> |
| <p>See <link href="../extensions.html">Developer Extensions</link> for a discussion of the mechanisms in place to allow developers to add their own extensions, including how to tell FOP about the foreign namespace.</p> |
| </section> |
| <section id="status"> |
| <title>Status</title> |
| <section id="status-todo"> |
| <title>To Do</title> |
| </section> |
| <section id="status-wip"> |
| <title>Work In Progress</title> |
| </section> |
| <section id="status-complete"> |
| <title>Completed</title> |
| <ul> |
| <li>better handling of unknown xml and xml from an unknown namespace</li> |
| <li>Changed extensions to allow for external xml</li> |
| <li>Can have a default element mapping for extensions</li> |
| </ul> |
| </section> |
| </section> |
| </body> |
| </document> |