docs/faq-common.xml - xerces2-j - Git at Google

 <?xml version='1.0' encoding='UTF-8'?>
 <!DOCTYPE faqs SYSTEM 'dtd/faqs.dtd'>
 <faqs title='Common Problems FAQs'>
  <faq title='Parsing HTML Generated an Error.'>
   <q>
    I tried to use &ParserName; to parse an HTML file and it
    generated an error. What did I do wrong?
   </q>
   <a>
    <p>
     Unfortunately, HTML does not, in general, follow the XML
     grammar rules. Most HTML files do not meet the XML style
     quidelines. Therefore, the XML parser generates XML
     well-formedness errors.
    </p>
    <p>Typical errors include:</p>
    <ul>
     <li>
      Missing end tags, e.g. &lt;P&gt; with no &lt;/P&gt; (end
      tags are not required in HTML)
     </li>
     <li>
      Missing closing slash on &lt;IMG HREF="foo" <em>/</em>&gt;
      (not required in HTML)
     </li>
     <li>
      Missing quotes on attribute values, e.g. &lt;IMG width="600"&gt;
      (not generally required in HTML)
     </li>
    </ul>
    <p>
     HTML must match the XHTML standard for well-formedness before it
     can be parsed by &ParserName; or any other XML parser. You can
     find the
     <jump href="http://www.w3c.org/TR/1999/PR-xhtml1-19991210">XHTML
     standard</jump> on the
     <jump href="http://www.w3c.org">W3C web site</jump>.
    </p>
   </a>
  </faq>
  <faq title='UTF-8 Character Error'>
   <q>I get an &quot;invalid UTF-8 character&quot; error.</q>
   <a>
    <p>
     There are many Unicode characters that are not allowed in an
     XML document, according to the XML spec. Typical disallowed
     characters are control characters, even if you escape them
     using the Character Reference form: &amp;#xxxx; . See the XML
     spec, sections
     <jump href="http://www.w3.org/TR/REC-xml#charsets">2.2</jump>
     and
     <jump href="http://www.w3.org/TR/REC-xml#sec-references">4.1</jump>
     for details. If the parser is generating this error, it is very
     likely that there is a character in the file that you can not see.
     You can generally use a UNIX command like &quot;od -hc&quot; to
     find it.
    </p>
   </a>
  </faq>
  <faq title='Error Accessing EBCDIC XML Files'>
   <q>
    I get an error when I access EBCDIC XML files, what is happening?
   </q>
   <a>
    <p>
     If an XML document/file is not UTF-8, then you MUST specify the
     encoding. When transcoding a UTF8 document to EBCDIC, remember
     to change this:
    </p>
    <ul>
     <li>
      &lt;?xml version=&quot;1.0&quot; encoding=&quot;UTF-8&quot;?&gt;
      <br/>
      to something like this:
      <br/>
      &lt;?xml version=&quot;1.0&quot; encoding=&quot;ebcdic-cp-us&quot;?&gt;
     </li>
    </ul>
   </a>
  </faq>
  <faq title='EOF Character Error'>
   <q>
    I get an error on the EOF character (0x1A) -- what is happening?
   </q>
   <a>
    <p>
     You are probably using the <em>LPEX</em> editor, which
     automatically inserts an End-of-file character (0x1A) at the end
     of your XML document (other editors might do this as well).
     Unfortunately, the EOF character (0x1A) is an illegal character
     according to the XML specification, and &ParserName;
     correctly generates an error.
    </p>
   </a>
  </faq>
 </faqs>
	<?xml version='1.0' encoding='UTF-8'?>
	<!DOCTYPE faqs SYSTEM 'dtd/faqs.dtd'>
	<faqs title='Common Problems FAQs'>
	<faq title='Parsing HTML Generated an Error.'>
	<q>
	I tried to use &ParserName; to parse an HTML file and it
	generated an error. What did I do wrong?
	</q>
	<a>
	<p>
	Unfortunately, HTML does not, in general, follow the XML
	grammar rules. Most HTML files do not meet the XML style
	quidelines. Therefore, the XML parser generates XML
	well-formedness errors.
	</p>
	<p>Typical errors include:</p>
	<ul>
	<li>
	Missing end tags, e.g. <P> with no </P> (end
	tags are not required in HTML)
	</li>
	<li>
	Missing closing slash on <IMG HREF="foo" <em>/</em>>
	(not required in HTML)
	</li>
	<li>
	Missing quotes on attribute values, e.g. <IMG width="600">
	(not generally required in HTML)
	</li>
	</ul>
	<p>
	HTML must match the XHTML standard for well-formedness before it
	can be parsed by &ParserName; or any other XML parser. You can
	find the
	<jump href="http://www.w3c.org/TR/1999/PR-xhtml1-19991210">XHTML
	standard</jump> on the
	<jump href="http://www.w3c.org">W3C web site</jump>.
	</p>
	</a>
	</faq>
	<faq title='UTF-8 Character Error'>
	<q>I get an "invalid UTF-8 character" error.</q>
	<a>
	<p>
	There are many Unicode characters that are not allowed in an
	XML document, according to the XML spec. Typical disallowed
	characters are control characters, even if you escape them
	using the Character Reference form: &#xxxx; . See the XML
	spec, sections
	<jump href="http://www.w3.org/TR/REC-xml#charsets">2.2</jump>
	and
	<jump href="http://www.w3.org/TR/REC-xml#sec-references">4.1</jump>
	for details. If the parser is generating this error, it is very
	likely that there is a character in the file that you can not see.
	You can generally use a UNIX command like "od -hc" to
	find it.
	</p>
	</a>
	</faq>
	<faq title='Error Accessing EBCDIC XML Files'>
	<q>
	I get an error when I access EBCDIC XML files, what is happening?
	</q>
	<a>
	<p>
	If an XML document/file is not UTF-8, then you MUST specify the
	encoding. When transcoding a UTF8 document to EBCDIC, remember
	to change this:
	</p>
	<ul>
	<li>
	<?xml version="1.0" encoding="UTF-8"?>
	<br/>
	to something like this:
	<br/>
	<?xml version="1.0" encoding="ebcdic-cp-us"?>
	</li>
	</ul>
	</a>
	</faq>
	<faq title='EOF Character Error'>
	<q>
	I get an error on the EOF character (0x1A) -- what is happening?
	</q>
	<a>
	<p>
	You are probably using the <em>LPEX</em> editor, which
	automatically inserts an End-of-file character (0x1A) at the end
	of your XML document (other editors might do this as well).
	Unfortunately, the EOF character (0x1A) is an illegal character
	according to the XML specification, and &ParserName;
	correctly generates an error.
	</p>
	</a>
	</faq>
	</faqs>