| <?xml version="1.0" standalone="no"?> |
| <!DOCTYPE faqs SYSTEM "./dtd/faqs.dtd"> |
| |
| <faqs title="Common Problems"> |
| <faq title="Parsing HTML Generated an Error."> |
| <q>I tried to use &javaparsername; to parse an HTML file and it generated an error. What did I do wrong?</q> |
| <a><p>Unfortunately, HTML does not, in general, follow the XML grammar rules. |
| Most HTML files do not meet the XML style quidelines. Therefore, the XML parser |
| generates XML well-formedness errors.</p> |
| <p>Typical errors include:</p> |
| <ul> |
| <li>Missing end tags, e.g. <P> with no </P> (end tags are not required in HTML)</li> |
| <li>Missing closing slash on <IMG HREF="foo" <em>/</em>> (not required in HTML)</li> |
| <li>Missing quotes on attribute values, e.g. <IMG width="600"> (not generally required in HTML)</li> |
| </ul> |
| <p>HTML must match the XHTML standard for well-formedness before it |
| can be parsed by Xerces-J or any other XML parser. You can find the |
| <jump href="http://www.w3c.org/TR/1999/PR-xhtml1-19991210">XHTML standard</jump> |
| on the <jump href="http://www.w3c.org">W3C web site</jump>.</p> |
| </a> |
| </faq> |
| <faq title="UTF-8 Character Error"> |
| <q>I get an "invalid UTF-8 character" error.</q> |
| <a><p>There are many Unicode characters that are not allowed in an XML document, |
| according to the XML spec. Typical disallowed characters are control |
| characters, even if you escape them using the Character Reference form: |
| &#xxxx; . See the XML spec, sections |
| <jump href="http://www.w3.org/TR/REC-xml#charsets">2.2</jump> and |
| <jump href="http://www.w3.org/TR/REC-xml#sec-references">4.1</jump> |
| for details. If the parser is generating this error, it is very likely |
| that there is a character in the file that you can not see. |
| You can generally use a UNIX command like "od -hc" to find it.</p> |
| |
| </a> |
| </faq> |
| <faq title="Error Accessing EBCDIC XML Files"> |
| <q>I get an error when I access EBCDIC XML files, what is happening?</q> |
| <a><p>If an XML document/file is not UTF-8, then you MUST specify the encoding. |
| When transcoding a UTF8 document to EBCDIC, remember to change this:</p> |
| <ul> |
| <li><?xml version="1.0" encoding="UTF-8"?> |
| <br/>to something like this: |
| <br/><?xml version="1.0" encoding="ebcdic-cp-us"?> |
| </li> |
| </ul> |
| </a> |
| </faq> |
| <faq title="EOF Character Error"> |
| <q>I get an error on the EOF character (0x1A) -- what is happening?</q> |
| <a><p>You are probably using the <em>LPEX</em> |
| editor, which automatically inserts an End-of-file character (0x1A) at the end of your |
| XML document (other editors might do this as well). Unfortunately, the |
| EOF character (0x1A) is an illegal character according to the XML specification, |
| and &javaparsername; correctly generates an error.</p> |
| </a> |
| </faq> |
| <faq title='DOS Filenames No Longer Work'> |
| <q>I used to be able to use DOS filenames with the parser and now |
| they don't work. Why not?</q> |
| <a><p>DOS filenames are not legal URIs as required by the XML 1.0 |
| specification. Therefore, it was an error for the parser to accept |
| DOS filenames. This bug is now fixed.</p> |
| <p>DOS filenames can be converted to legal URIs, however. For |
| example, the DOS filename "c:\xerces\data\personal.xml" would become |
| "file:///c:/xerces/data/personal.xml", which is a legal URI.</p> |
| </a> |
| </faq> |
| </faqs> |