| <?xml version='1.0' encoding='UTF-8'?> |
| <!DOCTYPE faqs SYSTEM 'dtd/faqs.dtd'> |
| <faqs title='Common Problems FAQs'> |
| <faq title='Parsing HTML Generated an Error.'> |
| <q> |
| I tried to use &ParserName; to parse an HTML file and it |
| generated an error. What did I do wrong? |
| </q> |
| <a> |
| <p> |
| Unfortunately, HTML does not, in general, follow the XML |
| grammar rules, so most HTML files are not well-formed XML. |
| When given such a file, an XML parser reports |
| well-formedness errors. |
| </p> |
| <p>Typical errors include:</p> |
| <ul> |
| <li> |
| Missing end tags, e.g. <P> with no </P> (end |
| tags are not required in HTML) |
| </li> |
| <li> |
| Missing closing slash on empty elements, e.g. <IMG SRC="foo" <em>/</em>> |
| (not required in HTML) |
| </li> |
| <li> |
| Missing quotes on attribute values, e.g. <IMG width=600> |
| (not generally required in HTML) |
| </li> |
| </ul> |
| <p> |
| HTML must match the XHTML standard for well-formedness before it |
| can be parsed by &ParserName; or any other XML parser. You can |
| find the |
| <jump href="http://www.w3c.org/TR/1999/PR-xhtml1-19991210">XHTML |
| standard</jump> on the |
| <jump href="http://www.w3c.org">W3C web site</jump>. |
| </p> |
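| <p> |
| The difference between HTML-style and XHTML-style markup can be |
| checked with any XML parser. The sketch below is only an |
| illustration: it uses Python's standard xml.etree.ElementTree |
| module rather than &ParserName;, and the helper name |
| is_well_formed and both sample strings are hypothetical. |
| </p> |

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup: str) -> bool:
    """Return True if the markup parses as well-formed XML."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


# HTML-style: unquoted attribute values, no closing slash, missing end tags.
html_style = "<p>An image: <img src=foo.png width=600><p>Next paragraph"

# XHTML-style: quoted attributes, closed empty element, matched end tags.
xhtml_style = (
    '<div><p>An image: <img src="foo.png" width="600" /></p>'
    "<p>Next paragraph</p></div>"
)

if __name__ == "__main__":
    print("HTML-style well-formed: ", is_well_formed(html_style))
    print("XHTML-style well-formed:", is_well_formed(xhtml_style))
```

| <p> |
| Only the second sample is accepted, because it fixes each of the |
| typical errors listed above. |
| </p> |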
| </a> |
| </faq> |
| <faq title='UTF-8 Character Error'> |
| <q>I get an "invalid UTF-8 character" error.</q> |
| <a> |
| <p> |
| Many Unicode characters are not allowed in an XML document, |
| according to the XML spec. Control characters are typical |
| examples, and they remain disallowed even if you escape them |
| using the character reference form: &#xxxx; . See the XML |
| spec, sections |
| <jump href="http://www.w3.org/TR/REC-xml#charsets">2.2</jump> |
| and |
| <jump href="http://www.w3.org/TR/REC-xml#sec-references">4.1</jump> |
| for details. If the parser is generating this error, it is very |
| likely that the file contains a character that you cannot see. |
| You can generally use a UNIX command like "od -hc" to |
| find it. |
| </p> |
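| <p> |
| Where "od" is not available, a short script can locate the |
| offending byte. The following is a minimal sketch, assuming the |
| file is meant to be UTF-8 and using the character ranges from the |
| Char production in section 2.2 of the XML 1.0 spec. The function |
| names is_xml_char and find_disallowed are hypothetical and are not |
| part of &ParserName;. |
| </p> |

```python
from typing import List, Tuple


def is_xml_char(cp: int) -> bool:
    """True if the code point matches the XML 1.0 Char production."""
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def find_disallowed(data: bytes) -> List[Tuple[int, str]]:
    """Return (byte offset, description) pairs for problem bytes.

    If the data is not valid UTF-8, only the first bad byte is reported,
    because decoding cannot continue reliably past it.
    """
    try:
        text = data.decode("utf-8")
    except UnicodeDecodeError as err:
        bad = data[err.start]
        return [(err.start, "invalid UTF-8 byte 0x%02X" % bad)]

    problems = []
    offset = 0
    for ch in text:
        if not is_xml_char(ord(ch)):
            problems.append((offset, "disallowed character U+%04X" % ord(ch)))
        # Advance by the encoded width so offsets are byte offsets.
        offset += len(ch.encode("utf-8"))
    return problems


if __name__ == "__main__":
    import sys

    with open(sys.argv[1], "rb") as handle:
        for position, message in find_disallowed(handle.read()):
            print("byte %d: %s" % (position, message))
```

| <p> |
| The reported offsets are byte offsets from the start of the file, |
| so they can be compared directly with the output of a hex dump. |
| </p> |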
| </a> |
| </faq> |
| <faq title='Error Accessing EBCDIC XML Files'> |
| <q> |
| I get an error when I access EBCDIC XML files. What is happening? |
| </q> |
| <a> |
| <p> |
| If an XML document is not encoded in UTF-8 or UTF-16, then you MUST |
| specify the encoding in its XML declaration. When transcoding a |
| UTF-8 document to EBCDIC, remember to change this: |
| </p> |
| <ul> |
| <li> |
| <?xml version="1.0" encoding="UTF-8"?> |
| <br/> |
| to something like this: |
| <br/> |
| <?xml version="1.0" encoding="ebcdic-cp-us"?> |
| </li> |
| </ul> |
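| <p> |
| The transcoding step and the declaration change can be done |
| together. The sketch below is one possible approach, not a feature |
| of &ParserName;. It assumes that Python's "cp037" codec is a |
| suitable match for the "ebcdic-cp-us" label; verify this against |
| the code page your system actually uses. The function name |
| transcode_utf8_to_ebcdic is hypothetical. |
| </p> |

```python
import re

# Matches the encoding pseudo-attribute inside a leading XML declaration.
_ENCODING_DECL = re.compile(r'''^(<\?xml[^>]*?encoding=)(["'])[^"']*\2''')


def transcode_utf8_to_ebcdic(
    data: bytes,
    declared: str = "ebcdic-cp-us",  # label written into the declaration
    codec: str = "cp037",            # assumed Python codec for that label
) -> bytes:
    """Re-encode a UTF-8 XML document and update its encoding declaration."""
    # "utf-8-sig" also strips a leading byte order mark, if present.
    text = data.decode("utf-8-sig")

    def replace(match):
        quote = match.group(2)
        return match.group(1) + quote + declared + quote

    text, count = _ENCODING_DECL.subn(replace, text, count=1)
    if count == 0:
        # Without a declaration the parser cannot know the file is EBCDIC.
        raise ValueError("no encoding declaration found; add one first")
    return text.encode(codec)


if __name__ == "__main__":
    source = b'<?xml version="1.0" encoding="UTF-8"?>\n<doc>hello</doc>\n'
    converted = transcode_utf8_to_ebcdic(source)
    print(converted.decode("cp037"))
```

| <p> |
| Raising an error when no declaration is present is a deliberate |
| choice in this sketch: silently producing an EBCDIC file without |
| an encoding declaration would reproduce the problem described |
| above. |
| </p> |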
| </a> |
| </faq> |
| <faq title='EOF Character Error'> |
| <q> |
| I get an error on the EOF character (0x1A) -- what is happening? |
| </q> |
| <a> |
| <p> |
| You are probably using the <em>LPEX</em> editor, which |
| automatically inserts an end-of-file character (0x1A) at the end |
| of your XML document (other editors might do this as well). |
| Unfortunately, the EOF character (0x1A) is an illegal character |
| according to the XML specification, and &ParserName; |
| correctly generates an error. |
| </p> |
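| <p> |
| If the editor cannot be configured to stop adding the character, |
| the file can be cleaned before parsing. The following is a minimal |
| sketch, assuming an ASCII-compatible encoding such as UTF-8, where |
| the EOF character is the single byte 0x1A. The function name |
| strip_trailing_eof is hypothetical. |
| </p> |

```python
EOF_MARKER = b"\x1a"


def strip_trailing_eof(data: bytes) -> bytes:
    """Remove any 0x1A bytes from the very end of the document."""
    return data.rstrip(EOF_MARKER)


if __name__ == "__main__":
    import sys

    path = sys.argv[1]
    with open(path, "rb") as handle:
        original = handle.read()
    cleaned = strip_trailing_eof(original)
    if cleaned != original:
        with open(path, "wb") as handle:
            handle.write(cleaned)
        print("removed %d trailing EOF byte(s)" % (len(original) - len(cleaned)))
```

| <p> |
| Only bytes at the end of the file are removed. A 0x1A byte in the |
| middle of the document is left in place and still causes an error. |
| </p> |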
| </a> |
| </faq> |
| </faqs> |