 <?xml version="1.0" standalone="no"?>
 <!DOCTYPE faqs SYSTEM "./dtd/faqs.dtd">

 <faqs title="Common Problems">
 	<faq title="Parsing HTML Generated an Error.">
 		<q>I tried to use &javaparsername; to parse an HTML file and it generated an error. What did I do wrong?</q>
 		<a><p>Unfortunately, HTML does not, in general, follow the XML grammar rules.
               Most HTML files are not well-formed XML. Therefore, the XML parser
 			  reports XML well-formedness errors.</p>
 			  <p>Typical errors include:</p>
               <ul>
                 <li>Missing end tags, e.g. &lt;P&gt; with no &lt;/P&gt; (end tags are not required in HTML)</li>
                 <li>Missing closing slash on empty elements, e.g. &lt;IMG SRC="foo" <em>/</em>&gt; (not required in HTML)</li>
                 <li>Missing quotes on attribute values, e.g. &lt;IMG width=600&gt; (not generally required in HTML)</li>
               </ul>
 			 <p>HTML must match the XHTML standard for well-formedness before it
 			 can be parsed by Xerces-J or any other XML parser. You can find the
 			 <jump href="http://www.w3c.org/TR/1999/PR-xhtml1-19991210">XHTML standard</jump>
 			 on the <jump href="http://www.w3c.org">W3C web site</jump>.</p>
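 			 <p>As a minimal sketch of how these errors surface, the following Java example
 			 feeds HTML-style markup to a parser through the standard JAXP API. The choice of
 			 JAXP is an assumption made for illustration, and the class name
 			 <code>WellFormedCheck</code> and the method <code>firstError</code> are hypothetical.</p>
<source><![CDATA[
```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

// Hypothetical helper: reports whether a piece of markup is well-formed XML.
class WellFormedCheck {

    /** Returns null if the text is well-formed XML, otherwise the parser's error message. */
    static String firstError(String text) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        // Replace the default handler so that nothing is printed to stderr.
        builder.setErrorHandler(new ErrorHandler() {
            public void warning(SAXParseException e) { }
            public void error(SAXParseException e) { }
            public void fatalError(SAXParseException e) throws SAXException { throw e; }
        });
        try {
            builder.parse(new InputSource(new StringReader(text)));
            return null;
        } catch (SAXParseException e) {
            // Well-formedness violations are reported as fatal errors.
            return e.getMessage();
        }
    }

    public static void main(String[] args) throws Exception {
        // Missing </P> end tag: legal in HTML, not well-formed XML.
        System.out.println(firstError("<html><body><P>unclosed</body></html>"));
        // Unquoted attribute value: legal in HTML, not well-formed XML.
        System.out.println(firstError("<IMG width=600/>"));
        // XHTML-style markup: well-formed, so no error is returned.
        System.out.println(firstError("<html><body><p>closed</p></body></html>"));
    }
}
```
]]></source>
 			 <p>The exact wording of the error messages depends on the parser implementation,
 			 so the sketch only distinguishes between an error and no error.</p>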
 		</a>
 	</faq>
 	<faq title="UTF-8 Character Error">
 		<q>I get an &quot;invalid UTF-8 character&quot; error.</q>
 		<a><p>Many Unicode characters are not allowed in an XML document,
               according to the XML spec. The most common offenders are control
 			  characters (most of them are disallowed), and they remain illegal even
 			  when written as a character reference of the form &amp;#xxxx;. See the XML spec, sections
 			  <jump href="http://www.w3.org/TR/REC-xml#charsets">2.2</jump> and
 			  <jump href="http://www.w3.org/TR/REC-xml#sec-references">4.1</jump>
 			  for details. If the parser reports this error, the file very likely
 			  contains a character that you cannot see.
               You can generally use a UNIX command like &quot;od -hc&quot; to find it.</p>
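 		<p>Where a hex dump is inconvenient, the same search can be sketched in Java.
 		The example below checks each code point against the character ranges given in
 		section 2.2 of the XML 1.0 spec. The class name <code>XmlCharScan</code> and its
 		methods are hypothetical, and the file is assumed to be UTF-8.</p>
<source><![CDATA[
```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper: locates the first character that XML 1.0 does not allow.
class XmlCharScan {

    /** True if the code point matches the Char production of XML 1.0, section 2.2. */
    static boolean isXmlChar(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
            || (cp >= 0x20 && cp <= 0xD7FF)
            || (cp >= 0xE000 && cp <= 0xFFFD)
            || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    /** Returns the index (in UTF-16 code units) of the first disallowed character, or -1. */
    static int firstIllegal(String text) {
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            if (!isXmlChar(cp)) {
                return i;
            }
            i += Character.charCount(cp);
        }
        return -1;
    }

    public static void main(String[] args) throws Exception {
        if (args.length == 0) {
            System.out.println("usage: java XmlCharScan <file>");
            return;
        }
        // Assumption: the file is encoded in UTF-8.
        String text = new String(Files.readAllBytes(Paths.get(args[0])), StandardCharsets.UTF_8);
        int i = firstIllegal(text);
        if (i < 0) {
            System.out.println("no disallowed characters found");
        } else {
            System.out.println("disallowed character U+"
                + Integer.toHexString(text.codePointAt(i)).toUpperCase()
                + " at index " + i);
        }
    }
}
```
]]></source>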

 		</a>
 	</faq>
 	<faq title="Error Accessing EBCDIC XML Files">
 		<q>I get an error when I access EBCDIC XML files. What is happening?</q>
 		<a><p>If an XML document is not encoded in UTF-8 or UTF-16, then you MUST specify the
               encoding in its XML declaration. When transcoding a UTF-8 document to EBCDIC, remember to change this:</p>
               <ul>
                 <li>&lt;?xml version=&quot;1.0&quot; encoding=&quot;UTF-8&quot;?&gt;
 				<br/>to something like this:
 				<br/>&lt;?xml version=&quot;1.0&quot; encoding=&quot;ebcdic-cp-us&quot;?&gt;
 				</li>
               </ul>
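               <p>If the transcoding is scripted, the declaration can be updated in the same step.
               The following Java sketch rewrites only the encoding name in the XML declaration;
               the class name <code>EncodingDecl</code> and the method <code>withEncoding</code> are
               hypothetical, and converting the bytes themselves is left to the transcoding tool.</p>
<source><![CDATA[
```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: changes the encoding name in an XML declaration.
class EncodingDecl {

    // Matches the XML declaration up to and including the quoted encoding name.
    // Group 1 is everything before the quote, group 2 is the quote character.
    private static final Pattern ENCODING =
        Pattern.compile("^(<\\?xml\\b[^>]*?\\bencoding\\s*=\\s*)([\"'])[^\"']*\\2");

    /** Returns the document text with the declared encoding replaced, or unchanged if none is declared. */
    static String withEncoding(String xml, String newEncoding) {
        Matcher m = ENCODING.matcher(xml);
        if (!m.find()) {
            return xml;
        }
        return m.group(1) + m.group(2) + newEncoding + m.group(2) + xml.substring(m.end());
    }

    public static void main(String[] args) {
        String doc = "<?xml version=\"1.0\" encoding=\"UTF-8\"?><doc/>";
        System.out.println(withEncoding(doc, "ebcdic-cp-us"));
    }
}
```
]]></source>
               <p>A document with no encoding declaration is returned unchanged, so a declaration
               would still have to be added by hand in that case.</p>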
 		</a>
 	</faq>
 	<faq title="EOF Character Error">
 		<q>I get an error on the EOF character (0x1A) -- what is happening?</q>
 		<a><p>You are probably using the <em>LPEX</em>
 		editor, which automatically inserts an End-of-file character (0x1A) at the end of your
 		XML document (other editors might do this as well). Unfortunately, the
 		EOF character (0x1A) is an illegal character according to the XML specification,
 		and &javaparsername; correctly generates an error.</p>
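 		<p>One possible workaround is to remove the character before the document reaches
 		the parser. The Java sketch below drops trailing 0x1A bytes; it assumes an
 		ASCII-compatible encoding such as UTF-8, and the class name <code>CtrlZ</code> and
 		the method <code>stripTrailingEof</code> are hypothetical.</p>
<source><![CDATA[
```java
import java.util.Arrays;

// Hypothetical helper: removes end-of-file markers appended by an editor.
class CtrlZ {

    /** Returns a copy of the data without any trailing 0x1A bytes. */
    static byte[] stripTrailingEof(byte[] data) {
        int end = data.length;
        // Assumption: in the document's encoding, byte 0x1A can only be the EOF character.
        while (end > 0 && data[end - 1] == 0x1A) {
            end--;
        }
        return Arrays.copyOf(data, end);
    }

    public static void main(String[] args) {
        byte[] withEof = { '<', 'a', '/', '>', 0x1A };
        System.out.println(stripTrailingEof(withEof).length); // prints 4
    }
}
```
]]></source>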
 		</a>
 	</faq>
 	<faq title="DOS Filenames No Longer Work">
 		<q>I used to be able to use DOS filenames with the parser and now
 		they don't work. Why not?</q>
 		<a><p>DOS filenames are not legal URIs as required by the XML 1.0
 		specification. Therefore, it was an error for the parser to accept
 		DOS filenames. This bug is now fixed.</p>
 		<p>DOS filenames can be converted to legal URIs, however. For
 		example, the DOS filename "c:\xerces\data\personal.xml" would become
 		"file:///c:/xerces/data/personal.xml", which is a legal URI.</p>
 		</a>
 	</faq>
 </faqs>