| <?xml version='1.0' encoding='UTF-8'?> |
| <!DOCTYPE faqs SYSTEM 'dtd/faqs.dtd'> |
| <faqs title='Writing Application FAQs'> |
| <faq title='Creating a DOM Parser'> |
| <q>How do I create a DOM parser?</q> |
| <a> |
| <p> |
| You can create a DOM parser by using the Java APIs for |
| XML Processing (JAXP). The following source code shows |
| how: |
| </p> |
| <source>import java.io.IOException; |
| import javax.xml.parsers.DocumentBuilder; |
| import javax.xml.parsers.DocumentBuilderFactory; |
| import javax.xml.parsers.FactoryConfigurationError; |
| import javax.xml.parsers.ParserConfigurationException; |
| import org.w3c.dom.Document; |
| import org.xml.sax.SAXException; |
| |
| ... |
| |
| String xmlFile = "file:///&parserdir;/data/personal.xml"; |
| try { |
| DocumentBuilderFactory factory = |
| DocumentBuilderFactory.newInstance(); |
| DocumentBuilder builder = factory.newDocumentBuilder(); |
| Document document = builder.parse(xmlFile); |
| } |
| catch (FactoryConfigurationError e) { |
| // unable to get a document builder factory |
| } |
| catch (ParserConfigurationException e) { |
| // parser was unable to be configured |
| } |
| catch (SAXException e) { |
| // parsing error |
| } |
| catch (IOException e) { |
| // i/o error |
| }</source> |
| </a> |
| </faq> |
| <faq title="Creating a SAX Parser"> |
| <q>How do I create a SAX parser?</q> |
| <a> |
| <p> |
| You can create a SAX parser by using the Java APIs for |
| XML Processing (JAXP). The following source code shows |
| how: |
| </p> |
| <source> |
| import java.io.IOException; |
| import javax.xml.parsers.FactoryConfigurationError; |
| import javax.xml.parsers.ParserConfigurationException; |
| import javax.xml.parsers.SAXParser; |
| import javax.xml.parsers.SAXParserFactory; |
| import org.xml.sax.SAXException; |
| import org.xml.sax.helpers.DefaultHandler; |
| |
| ... |
| |
| String xmlFile = "file:///&parserdir;/data/personal.xml"; |
| try { |
| SAXParserFactory factory = SAXParserFactory.newInstance(); |
| SAXParser parser = factory.newSAXParser(); |
| DefaultHandler handler = /* custom handler class */; |
| parser.parse(xmlFile, handler); |
| } |
| catch (FactoryConfigurationError e) { |
| // unable to get a SAX parser factory |
| } |
| catch (ParserConfigurationException e) { |
| // parser was unable to be configured |
| } |
| catch (SAXException e) { |
| // parsing error |
| } |
| catch (IOException e) { |
| // i/o error |
| }</source> |
| </a> |
| </faq> |
| <!-- |
| - REVISIT: make sure that JAXP implementation can handle |
| - passing features and properties through to the |
| - parser implementation. Then complete this section. |
| <faq title='Controlling parser options'> |
| <q>How do I control the various parser options?</q> |
| <a>TBD</a> |
| </faq> |
| --> |
| <faq title='Handling Errors'> |
| <q>How do I handle errors?</q> |
| <a> |
| <p> |
| Register an error handler with the parser by supplying an |
| object that implements the <code>org.xml.sax.ErrorHandler</code> |
| interface. This applies whether your parser is DOM-based or |
| SAX-based. |
| </p> |
| <p> |
| You can register an error handler on a <code>DocumentBuilder</code> |
| created using JAXP like this: |
| </p> |
| <source>import javax.xml.parsers.DocumentBuilder; |
| import org.xml.sax.ErrorHandler; |
| import org.xml.sax.SAXException; |
| import org.xml.sax.SAXParseException; |
| |
| ErrorHandler handler = new ErrorHandler() { |
| public void warning(SAXParseException e) throws SAXException { |
| System.err.println("[warning] "+e.getMessage()); |
| } |
| public void error(SAXParseException e) throws SAXException { |
| System.err.println("[error] "+e.getMessage()); |
| } |
| public void fatalError(SAXParseException e) throws SAXException { |
| System.err.println("[fatal error] "+e.getMessage()); |
| throw e; |
| } |
| }; |
| |
| DocumentBuilder builder = /* builder instance */; |
| builder.setErrorHandler(handler);</source> |
| <p> |
| You can also register an error handler on a SAXParser using JAXP |
| like this: |
| </p> |
| <source>import javax.xml.parsers.SAXParser; |
| import org.xml.sax.ErrorHandler; |
| import org.xml.sax.SAXException; |
| import org.xml.sax.SAXParseException; |
| |
| ErrorHandler handler = new ErrorHandler() { |
| public void warning(SAXParseException e) throws SAXException { |
| System.err.println("[warning] "+e.getMessage()); |
| } |
| public void error(SAXParseException e) throws SAXException { |
| System.err.println("[error] "+e.getMessage()); |
| } |
| public void fatalError(SAXParseException e) throws SAXException { |
| System.err.println("[fatal error] "+e.getMessage()); |
| throw e; |
| } |
| }; |
| |
| SAXParser parser = /* parser instance */; |
| parser.getXMLReader().setErrorHandler(handler);</source> |
| </a> |
| </faq> |
| <faq title='Controlling Entity Representation'> |
| <q> |
| How can I control the way that entities are represented in the DOM? |
| </q> |
| <a> |
| <p> |
| The feature |
| <code>http://apache.org/xml/features/dom/create-entity-ref-nodes</code> |
| controls how entities appear in the DOM tree. When this feature |
| is set to true (the default), an occurrence of an entity reference |
| in the XML document will be represented by a subtree with an |
| EntityReference node at the root whose children represent the |
| entity expansion. |
| </p> |
| <p> |
| If the feature is set to false, an entity reference in the XML document |
| is represented by only the nodes that represent the entity |
| expansion. |
| </p> |
| <p> |
| In either case, the entity expansion will be a DOM tree |
| representing the structure of the entity expansion, not a text |
| node containing the entity expansion as text. |
| </p> |
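| <p> |
| The following is a minimal sketch of the two settings, not a |
| definitive recipe. It assumes a JAXP <code>DocumentBuilderFactory</code>, |
| where <code>setExpandEntityReferences(false)</code> is taken to |
| correspond to turning the feature on; note that JAXP factories |
| expand entity references unless told otherwise. The class name, |
| method names, and sample document are hypothetical. |
| </p> |
| <source><![CDATA[ |
```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

final class EntityRefDemo {
    // Hypothetical sample: one internal entity referenced in element content.
    static final String XML =
        "<!DOCTYPE doc [<!ENTITY who 'World'>]><doc>Hello &who;</doc>";

    // Parses the sample and returns its root element.
    static Element parseRoot(boolean createEntityRefNodes) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // Assumption: the JAXP switch is the inverse of the Xerces feature.
            factory.setExpandEntityReferences(!createEntityRefNodes);
            return factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(XML)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
    }

    // True if any direct child of the root is an EntityReference node.
    static boolean rootHasEntityReference(boolean createEntityRefNodes) {
        Node child = parseRoot(createEntityRefNodes).getFirstChild();
        for (; child != null; child = child.getNextSibling()) {
            if (child.getNodeType() == Node.ENTITY_REFERENCE_NODE) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println("entity-ref nodes requested: " + rootHasEntityReference(true));
        System.out.println("entity-ref nodes disabled:  " + rootHasEntityReference(false));
    }
}
```
| ]]></source> |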
| </a> |
| </faq> |
| <faq title='What does "non-validating" mean?'> |
| <q> |
| Why does "non-validating" not mean "well-formedness |
| checking only"? |
| </q> |
| <a> |
| <p> |
| Using a "non-validating" parser does not mean that |
| only well-formedness checking is done! There are still many |
| things that the XML specification requires of the parser, |
| including entity substitution, defaulting of attribute values, |
| and attribute normalization. |
| </p> |
| <p> |
| This table describes what "non-validating" really |
| means for &ParserName; parsers. In this table, "no DTD" |
| means no internal or external DTD subset is present. |
| </p> |
| <table> |
| <tr> |
| <tn/> |
| <th colspan="2">non-validating parsers</th> |
| <th colspan="2">validating parsers</th> |
| </tr> |
| <tr> |
| <tn/> |
| <th>DTD present</th> |
| <th>no DTD</th> |
| <th>DTD present</th> |
| <th>no DTD</th> |
| </tr> |
| <tr> |
| <th>DTD is read</th> |
| <td>Yes</td> |
| <td>No</td> |
| <td>Yes</td> |
| <td>Error</td> |
| </tr> |
| <tr> |
| <th>entity substitution</th> |
| <td>Yes</td> |
| <td>No</td> |
| <td>Yes</td> |
| <td>Error</td> |
| </tr> |
| <tr> |
| <th>defaulting of attributes</th> |
| <td>Yes</td> |
| <td>No</td> |
| <td>Yes</td> |
| <td>Error</td> |
| </tr> |
| <tr> |
| <th>attribute normalization</th> |
| <td>Yes</td> |
| <td>No</td> |
| <td>Yes</td> |
| <td>Error</td> |
| </tr> |
| <tr> |
| <th>checking against model</th> |
| <td>No</td> |
| <td>No</td> |
| <td>Yes</td> |
| <td>Error</td> |
| </tr> |
| </table> |
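| <p> |
| As a rough illustration of the "DTD present" column for |
| non-validating parsers, the sketch below parses a hypothetical |
| document whose internal DTD subset declares a default attribute |
| and an entity, and whose content deliberately violates the |
| declared content model. It assumes a default JAXP |
| <code>DocumentBuilderFactory</code>; the class and method names |
| are made up for this example. |
| </p> |
| <source><![CDATA[ |
```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

final class NonValidatingDemo {
    // Hypothetical sample: "note" is declared EMPTY but has text content,
    // which a validating parser would report as a validity error.
    static final String XML =
          "<!DOCTYPE note [\n"
        + "  <!ELEMENT note EMPTY>\n"
        + "  <!ATTLIST note lang CDATA 'en'>\n"
        + "  <!ENTITY who 'World'>\n"
        + "]>\n"
        + "<note>Hello &who;</note>";

    // Parses the sample without validation and returns the root element.
    static Element parseRoot() {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(false); // already the JAXP default; shown for clarity
            return factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(XML)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
    }

    public static void main(String[] args) {
        Element root = parseRoot();
        // The DTD was read: the default attribute and the entity are both applied.
        System.out.println("lang = " + root.getAttribute("lang"));
        System.out.println("text = " + root.getTextContent());
    }
}
```
| ]]></source> |
| <p> |
| The content model violation goes unreported here because checking |
| against the model is the one row of the table that requires |
| validation. |
| </p> |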
| </a> |
| </faq> |
| <faq title='Associating Data with a Node'> |
| <q>How do I associate my own data with a node in the DOM tree?</q> |
| <a> |
| <p> |
| The class <code>org.apache.xerces.dom.NodeImpl</code> provides a |
| <code>void setUserData(Object o)</code> and an <code>Object |
| getUserData()</code> method that you can use to attach any object |
| to a node in the DOM tree. |
| </p> |
| <p> |
| Beware that you should try to remove references to your data on |
| nodes you no longer use (by calling <code>setUserData(null)</code>); |
| otherwise these nodes will not be garbage collected until the entire |
| document is garbage collected. |
| </p> |
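| <p> |
| The sketch below shows the attach, read, and clear cycle. So that |
| it runs against any JAXP parser, it swaps in the keyed DOM Level 3 |
| methods <code>Node.setUserData(String, Object, UserDataHandler)</code> |
| and <code>Node.getUserData(String)</code> rather than the |
| single-argument <code>NodeImpl</code> methods described above. The |
| key string, class name, and helper names are hypothetical. |
| </p> |
| <source><![CDATA[ |
```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

final class UserDataDemo {
    // Hypothetical key; any application-chosen string works.
    static final String KEY = "app.payload";

    // Creates a fresh one-element document and returns that element.
    static Element newElement() {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .newDocument();
            Element element = doc.createElement("item");
            doc.appendChild(element);
            return element;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException("no DOM builder available", e);
        }
    }

    // Attaches the payload to a node and reads it back.
    static Object attachAndRead(Object payload) {
        Element node = newElement();
        node.setUserData(KEY, payload, null); // null handler: no callbacks wanted
        return node.getUserData(KEY);
    }

    // Attaches the payload, clears it again, and returns what is left.
    static Object readAfterClearing(Object payload) {
        Element node = newElement();
        node.setUserData(KEY, payload, null);
        node.setUserData(KEY, null, null); // drops the reference to the payload
        return node.getUserData(KEY);
    }

    public static void main(String[] args) {
        System.out.println("attached: " + attachAndRead("order-42"));
        System.out.println("cleared:  " + readAfterClearing("order-42"));
    }
}
```
| ]]></source> |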
| </a> |
| </faq> |
| <faq title='Parsing Several Documents'> |
| <q> |
| How do I more efficiently parse several documents sharing a |
| common DTD? |
| </q> |
| <a> |
| <p> |
| DTDs are not currently cached by the parser. The common DTD, |
| since it is specified in each XML document, will be re-parsed |
| once for each document. |
| </p> |
| <p> |
| However, there are things you can do now to make reading |
| DTDs more efficient: |
| </p> |
| <ul> |
| <li>Keep your DTD and DTD references local.</li> |
| <li>Use internal DTD subsets, if possible.</li> |
| <li>Load files from the server to the local client before parsing.</li> |
| <li> |
| Cache document files into a local client cache. You should do an |
| HTTP header request to check whether the document has changed, |
| before accessing it over the network. |
| </li> |
| <li> |
| Do not reference an external DTD or internal DTD subset at all. |
| In this case, no DTD will be read. |
| </li> |
| <li> |
| Use a custom <code>EntityResolver</code> and keep common |
| DTDs in a memory buffer. |
| </li> |
| </ul> |
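| <p> |
| The last suggestion can be sketched roughly as follows. This is |
| one possible shape under stated assumptions, not a definitive |
| implementation: the class name, the system identifier, and the DTD |
| text are all hypothetical. The parser still re-parses the DTD text |
| for each document; the resolver only avoids fetching it again. |
| </p> |
| <source><![CDATA[ |
```java
import java.io.ByteArrayInputStream;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;

final class CachingDtdResolver implements EntityResolver {
    // Hypothetical system identifier shared by all documents.
    static final String DTD_ID = "http://example.invalid/dtd/note.dtd";

    private final Map<String, byte[]> cache = new ConcurrentHashMap<>();
    private int hits;

    // Stores the DTD text in memory under its system identifier.
    void put(String systemId, String dtdText) {
        cache.put(systemId, dtdText.getBytes(StandardCharsets.UTF_8));
    }

    @Override
    public InputSource resolveEntity(String publicId, String systemId) {
        byte[] bytes = (systemId == null) ? null : cache.get(systemId);
        if (bytes == null) {
            return null; // not cached: let the parser resolve it normally
        }
        hits++;
        InputSource source = new InputSource(new ByteArrayInputStream(bytes));
        source.setPublicId(publicId);
        source.setSystemId(systemId);
        return source;
    }

    // Parses the same document twice; returns "<cache hits>:<lang attribute>".
    static String parseTwice() {
        CachingDtdResolver resolver = new CachingDtdResolver();
        resolver.put(DTD_ID,
            "<!ELEMENT note (#PCDATA)>\n<!ATTLIST note lang CDATA 'en'>");
        String xml = "<!DOCTYPE note SYSTEM '" + DTD_ID + "'><note>hi</note>";
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            builder.setEntityResolver(resolver);
            String lang = "";
            for (int i = 0; i < 2; i++) {
                Document doc = builder.parse(new InputSource(new StringReader(xml)));
                lang = doc.getDocumentElement().getAttribute("lang");
            }
            return resolver.hits + ":" + lang;
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(parseTwice());
    }
}
```
| ]]></source> |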
| </a> |
| </faq> |
| <!-- |
| - REVISIT: Rewrite this section with better information. |
| <faq title='How do I read data from a stream as it arrives?'> |
| <q>How do I read data from a stream as it arrives?</q> |
| <a> |
| <p>There are 2 problems you have to deal with:</p> |
| <ol> |
| <li> |
| The Apache parsers terminate when they reach end-of-file; with |
| a data stream, unless the sender drops the socket, you have no |
| end-of-file, so you need to terminate in some other way |
| </li> |
| <li> |
| The Apache parsers close the input stream on termination, and |
| this closes the socket; you normally don't want this, because |
| you'll want to send an ack to the data stream source, and you |
| may want to have further exchanges on the socket anyway. |
| </li> |
| </ol> |
| <p>Terminating the parse</p> |
| <p> |
| One way that works for SAX is to throw an exception when you |
| detect the logical end-of-document. |
| </p> |
| <p> |
| For instance, in your class extending DefaultHandler, you can |
| have: |
| </p> |
| <source>public class DocProcessor extends DefaultHandler { |
| private int level; |
| . |
| . |
| public void startElement(String uri, |
| String localName, |
| String raw, |
| Attributes attrs) throws SAXException |
| { |
| ++level; |
| } |
| |
| public void endElement (String namespaceURI, |
| String localName, |
| String qName) throws SAXException |
| { |
| level = level - 1; |
| if (level == 0) { |
| throw new SAXException ("Finished"); |
| } |
| } |
| }</source> |
| <p>Preventing the parser from closing the socket</p> |
| <p> |
| One way is to subclass BufferedReader to provide an empty close |
| method. So, invoke the parser as follows: |
| </p> |
| <source>Socket socket; |
| |
| // code to set the socket |
| |
| parser.parse(new InputSource(new MyBufferedReader(new InputStreamReader(socket.getInputStream())))); |
| . |
| . |
| class MyBufferedReader extends BufferedReader |
| { |
| public MyBufferedReader(InputStreamReader i) { |
| super(i); |
| } |
| |
| public void close() { |
| } |
| }</source> |
| </a> |
| </faq> |
| --> |
| </faqs> |