| <?xml version="1.0" standalone="no"?> |
| <!DOCTYPE faqs SYSTEM "./dtd/faqs.dtd"> |
| |
| <faqs title="Writing Application FAQs"> |
| <faq title="Creating a DOM Parser"> |
| <q>How do I create a DOM parser?</q> |
| <a> |
| <source>import org.apache.xerces.parsers.DOMParser; |
| import org.w3c.dom.Document; |
| import org.xml.sax.SAXException; |
| import java.io.IOException; |
| |
| ... |
| |
| String xmlFile = "file:///&javaparserdirectory;/data/personal.xml"; |
| |
| DOMParser parser = new DOMParser(); |
| |
| try { |
| parser.parse(xmlFile); |
| |
| } catch (SAXException se) { |
| se.printStackTrace(); |
| } catch (IOException ioe) { |
| ioe.printStackTrace(); |
| } |
| |
| Document document = parser.getDocument();</source> |
| </a> |
| </faq> |
| |
| <faq title="Creating a SAX Parser"> |
| <q>How do I create a SAX parser?</q> |
| <a> |
| <source>import org.apache.xerces.parsers.SAXParser; |
| import org.xml.sax.Parser; |
| import org.xml.sax.ParserFactory; |
| import org.xml.sax.SAXException; |
| import java.io.IOException; |
| |
| ... |
| |
| String xmlFile = "file:///&javaparserdirectory;/data/personal.xml"; |
| |
| String parserClass = "org.apache.xerces.parsers.SAXParser"; |
| Parser parser = ParserFactory.makeParser(parserClass); |
| |
| try { |
| parser.parse(xmlFile); |
| } catch (SAXException se) { |
| se.printStackTrace(); |
| } catch (IOException ioe) { |
| ioe.printStackTrace(); |
| }</source> |
| </a> |
| </faq> |
| |
| <faq title="Controlling parser options"> |
| <q>How do I control the various parser options?</q> |
| <a><p>For this release, all of the parser control API's have |
| been switched over to the SAX2 Configurable interface. This |
| provide a uniform and extensible mechanism for setting and |
| querying parser options. Here are guides to the set of |
| available <link idref="features">features</link> and |
| <link idref="properties">properties</link>.</p> |
| </a> |
| </faq> |
| <faq title="Using lazy DOM"> |
| <q>How do I use the lazy evaluating DOM implementation?</q> |
| <a><p>The DOM parser class <code>org.apache.xerces.parsers.DOMParser</code> uses a |
| DOM implementation that can take advantage of lazy evaluation to |
| improve performance. There is also a mode where the parser creates |
| all of the nodes as the document is parsed. By default, the parser |
| uses the lazy evaluation DOM implementation.</p> |
| <p>Nodes in the DOM tree are only created when they are accessed. |
| The initial call to <code>getDocument()</code> will return a DOM tree that |
| consists only of the Document node. All of the immediate children |
| of a Node are created when any of that Node's children are accessed. |
| This shortens the time it takes to parse an XML file and create a DOM |
| tree at the expense of requiring more memory during parsing and |
| traversing the document.</p> |
| <p>The lazy evaluation feature is set using the SAX2 Configurable |
| interface. To turn off this feature, use the following code:</p> |
| <source>DOMParser parser = new DOMParser(); |
| parser.setFeature("http://apache.org/xml/features/dom/defer-node-expansion", false);</source> |
| <p>To turn the lazy evaluation feature back on, use the following code:</p> |
| |
| <source>parser.setFeature("http://apache.org/xml/features/dom/defer-node-expansion", true);</source> |
| </a> |
| </faq> |
| |
| <faq title="Handling Errors"> |
| <q>How do handle errors?</q> |
| <a><p>When you create a parser instance, the default error handler does nothing. |
| This means that your program will fail silently when it encounters an error. |
| You should register an error handler with the parser by supplying a class |
| which implements the <code>org.xml.sax.ErrorHandler</code> |
| interface. This is true regardless of whether your parser is a |
| DOM based or SAX based parser.</p> |
| </a> |
| </faq> |
| |
| <faq title="Controlling Entity Representation"> |
| <q>How can I control the way that entities are represented in |
| the DOM?</q> |
| <a><p> |
| The feature <code>http://apache.org/xml/features/dom/create-entity-ref-nodes</code> |
| controls how entities appear in the DOM tree. When this |
| feature is set to true (the default), an occurance of an |
| entity reference in the XML document will be represented by a |
| subtree with an EntityReference node at the root whose |
| children represent the entity expansion.</p> |
| |
| <p>If the property is false, an entity reference in the XML |
| document is represented by only the nodes that represent the |
| entity expansion.</p> |
| |
| <p>In either case, the entity expansion will be a DOM tree |
| representing the structure of the entity expansion, not a text |
| node containing the entity expansion as text.</p> |
| </a> |
| </faq> |
| |
| <faq title="What does "non-validating" mean?"> |
| <q>Why does "non-validating" not mean "well-formedness checking only"?</q> |
| <a><p>Using a "non-validating" parser does not mean that only well-formedness |
| checking is done! There are still many things that the XML specification |
| requires of the parser, including entity substitution, defaulting of |
| attribute values, and attribute normalization.</p> |
| <p>This table describes what "non-validating" really means for &javaparsername; parsers. |
| In this table, "no DTD" means no internal or external DTD subset is present.</p> |
| |
| <table> |
| <tr><tn/> |
| <th colspan="2">non-validating parsers</th> |
| <th colspan="2">validating parsers</th> |
| </tr> |
| <tr><tn/> <th>DTD present</th> <th>no DTD</th> <th>DTD present</th> <th>no DTD</th></tr> |
| <tr><th>DTD is read</th> <td>Yes</td> <td>No</td> <td>Yes</td> <td>Error</td></tr> |
| <tr><th>entity substitution</th> <td>Yes</td> <td>No</td> <td>Yest</td> <td>Error</td></tr> |
| <tr><th>defaulting of attributes</th> <td>Yes</td> <td>No</td> <td>Yes</td> <td>Error</td></tr> |
| <tr><th>attribute normalization</th> <td>Yes</td> <td>No</td> <td>Yes</td> <td>Error</td></tr> |
| <tr><th>checking against model</th> <td>No</td> <td>No</td> <td>Yes</td> <td>Error</td></tr> |
| </table> |
| </a> |
| </faq> |
| |
| <faq title="Associating Data with a Node"> |
| <q>How do I associate my own data with a node in the DOM tree?</q> |
| <a><p>The class <code>org.apache.xerces.dom.NodeImpl</code> provides a |
| <code>void setUserData(Object o)</code> and an <code>Object getUserData()</code> |
| method that you can use to attach any object to a node in the DOM tree.</p> |
| <p>Beware that you should try and remove references to your |
| data on nodes you no longer use (by calling |
| <code>setUserData(null)</code>, or these nodes will not be |
| garbage collected until the whole document is.</p> |
| </a> |
| </faq> |
| |
| <faq title="Parsing Several Documents"> |
| <q>How do I more efficiently parse several documents sharing a common DTD?</q> |
| <a> <p>DTDs are not currently cached by the parser. The common DTD, since it is |
| specified in each XML document, will be re-parsed once for each document.</p> |
| <p>However, there are things that you can do now, to make the process of |
| reading DTD's more efficient:</p> |
| <ul> |
| <li>keep your DTD and DTD references local</li> |
| <li>use internal DTD subsets, if possible</li> |
| <li>load files from server to local client before parsing</li> |
| <li>Cache document files into a local client cache. You should do an |
| HTTP header request to check whether the document has changed, |
| before accessing it over the network.</li> |
| <li>Do not reference an external DTD or internal DTD subset at all. |
| In this case, no DTD will be read.</li> |
| <li>Use a custom <code>EntityResolver</code> and keep common |
| DTDs in a memory buffer.</li> |
| </ul> |
| </a> |
| </faq> |
| |
| <faq title="How do I access the DOM Level 2 functionality?"> |
| <q>How do I access the DOM Level 2 functionality?</q> |
| <a><p>The <jump href="http://www.w3.org/TR/DOM-Level-2/">DOM Level 2</jump> |
| specification is at the stage of |
| "Candidate Recommendation" (CR), which allows feedback from implementors |
| before it becomes a "Recommedation". It is comprised of "core" |
| functionality, which is mainly the DOM |
| <jump href="http://www.w3.org/TR/REC-xml-names/">Namespaces</jump> implementation, |
| and a number of optional modules (called Chapters in the spec).</p> |
| <p>Please refer to:</p> |
| <p><jump href="http://www.w3.org/TR/DOM-Level-2/"> |
| http://www.w3.org/TR/DOM-Level-2/</jump> for the |
| latest DOM Level 2 specification.</p> |
| <p>The following DOM Level 2 modules are fully implemented in &javaparsername;: </p> |
| <ul> |
| <li><jump href="http://www.w3.org/TR/DOM-Level-2/core.html"> |
| Chapter 1: Core</jump> - most of these enhancements are for |
| Namespaces, and can be acessed through additional functions which |
| have been added directly to the <ref>org.w3c.dom.*</ref> classes.</li> |
| |
| <li><jump href="http://www.w3.org/TR/DOM-Level-2/events.html"> |
| Chapter 6: Events</jump> - The <ref>org.w3c.dom.events.EventTarget</ref> |
| interface is implemented by all <code>Nodes</code> of the DOM. |
| The &javaparsername; DOM implementation handles all of the event |
| triggering, capture and flow.</li> |
| |
| <li><jump href="http://www.w3.org/TR/DOM-Level-2/traversal.html"> |
| Chapter 7: Traversal</jump> - The Traversal module interfaces |
| are located in <ref>org.w3c.dom.traversal</ref>. |
| The <code>NodeIterator</code> and <code>TreeWalker</code>, and |
| <code>NodeFilter</code> interfaces have been supplied to allow |
| traversal of the DOM at a higher-level. Our DOM Document |
| implementation class, <code>DocumentImpl</code> class now |
| implements <code>DocumentTraversal</code>, which supplies the |
| factory methods to create the iterators and treewalkers.</li> |
| |
| <li><jump href="http://www.w3.org/TR/DOM-Level-2/range.html"> |
| Chapter 8. Range</jump> - The Range module interfaces are |
| located in <ref>org.w3c.dom.range</ref>. The Range interface |
| allows you to specify ranges or selections using boundary |
| points in the DOM, along with functions (like delete, |
| clone, extract..) that can be performed on these ranges. |
| Our DOM Document implementation class, <code>DocumentImpl</code> |
| class now implements <code>DocumentRange</code>, that supplies |
| the factory method to create a <code>Range</code>.</li> |
| </ul> |
| <note>Since the DOM Level 2 is still in the CR phase, some changes |
| to these specs are still possible. The purpose of this phase is to |
| provide feedback to the W3C, so that the specs can be clarified and |
| implementation concerns can be addressed.</note> |
| </a> |
| </faq> |
| |
| <faq title="How do I read data from a stream as it arrives?"> |
| <q>How do I read data from a stream as it arrives?</q> |
| <a><p>For performance reasons, all the standard Xerces processing |
| uses readers which buffer the input. In order to read data |
| from a stream as it arrives, you need to instruct Xerces to |
| use the <code>StreamingCharReader</code> class as its reader. |
| To do this, create a subclass of |
| <code>org.apache.xerces.readers.DefaultReaderFactory</code> |
| and override <code>createCharReader</code> and |
| <code>createUTF8Reader</code> as shown below. |
| </p> |
| <source>public class StreamingCharFactory extends org.apache.xerces.readers.DefaultReaderFactory { |
| public XMLEntityHandler.EntityReader createCharReader(XMLEntityHandler entityHandler, |
| XMLErrorReporter errorReporter, |
| boolean sendCharDataAsCharArray, |
| Reader reader, |
| StringPool stringPool) throws Exception |
| { |
| return new org.apache.xerces.readers.StreamingCharReader(entityHandler, |
| errorReporter, sendCharDataAsCharArray, reader, stringPool); |
| } |
| |
| public XMLEntityHandler.EntityReader createUTF8Reader(XMLEntityHandler entityHandler, |
| XMLErrorReporter errorReporter, |
| boolean sendCharDataAsCharArray, |
| InputStream data, |
| StringPool stringPool) throws Exception |
| { |
| XMLEntityHandler.EntityReader reader; |
| reader = new org.apache.xerces.readers.StreamingCharReader(entityHandler, |
| errorReporter, sendCharDataAsCharArray, |
| new InputStreamReader(data, "UTF8"), stringPool); |
| return reader; |
| } |
| |
| }</source> |
| <p>In your program, after you instantiate a parser class, replace |
| the <code>DefaultReaderFactory</code> with <code>StreamingCharFactory</code>, and be |
| sure to wrap the <code>InputStream</code> that you are reading |
| from with an <code>InputStreamReader</code>.</p> |
| <source>InputStream in = ... ; |
| SAXParser p = new SAXParser(); |
| DocumentHandler h = ... ; |
| // set the correct reader factory |
| p.setReaderFactory(((StreamingSAXClient)h).new StreamingCharFactory()); |
| p.setDocumentHandler(h); |
| |
| // be sure to wrap the input stream in an InputStreamReader. |
| p.parse(new InputSource(new InputStreamReader(in)));</source> |
| </a> |
| </faq> |
| </faqs> |