| <?xml version='1.0' encoding='UTF-8'?> |
| <!DOCTYPE faqs SYSTEM 'dtd/faqs.dtd'> |
| <faqs title='General FAQs'> |
| <faq title="Jar file changes"> |
| <q>What happened to xerces.jar</q> |
| <a> |
| <p>In order to take advantage of the fact that this parser is |
| very often used in conjunction with other XML technologies, |
| such as XSLT processors, which also rely on standard |
| API's like DOM and SAX, xerces.jar was split into two |
| jarfiles: |
| </p> |
| <ul> |
| <li><code>xmlParserAPIs.jar</code> contains the DOM level 2, |
| SAX 2.0 and JAXP 1.1 API's;</li> |
| <li><code>xercesImpl.jar</code> contains the implementation of |
| these API's as well as the XNI API. |
| </li> |
| </ul> |
| <p>For backwards compatibility, we have retained the ability |
| to generate old-style jarfiles. For instructions, see <link |
| idref="install">the installation documentation</link>. |
| </p> |
| </a> |
| </faq> |
| <faq title='Validation against DTD'> |
| <q>How do I turn on DTD validation?</q> |
| <a> |
| <p> |
| You can turn validation on and off via methods available |
| on the SAX2 <code>XMLReader</code> interface. While only the |
| <code>SAXParser</code> implements the <code>XMLReader</code> |
| interface, the methods required for turning on validation |
| are available to both parser classes, DOM and SAX. |
| <br/> |
| The code snippet below shows how to turn validation on -- assume |
| that <ref>parser</ref> is an instance of either |
| <code>org.apache.xerces.parsers.SAXParser</code> or |
| <code>org.apache.xerces.parsers.DOMParser</code>. |
| <br/><br/> |
| <code>parser.setFeature("http://xml.org/sax/features/validation", true);</code> |
| </p> |
| </a> |
| </faq> |
| <faq title='IDs and XML Schemas'> |
| <q>Why does getElementById() not always work for documents validated against XML Schemas?</q> |
| <a> |
| <p>According to the XML Schema specification, an instance document might have |
| more than one <jump href="http://www.w3.org/TR/xmlschema-1/#key-vr">validation root</jump> and |
| <jump href="http://www.w3.org/TR/xmlschema-1/#cvc-id">ID/IDREFS</jump> must be |
| unique only within the context of a particular validation root, meaning that a |
| document may potentially contain multiple identical ids. In this case, the output |
| of getElementById() is unspecified. On the other hand, if the document root is |
| a validation root of the document, getElementById() should work as expected. |
| </p> |
| </a> |
| </faq> |
| |
| <faq title='PSVI'> |
| <q>How do I get access to the PSVI?</q> |
| <a> |
| <p>Xerces provides a sample component PSVIWriter that intercepts document |
| handler events and collects PSVI information. For more information read <link |
| idref="samples-xni">samples documentation</link> on how to use xni.parser.PSVIParser |
| and xni.parser.PSVIConfiguration. |
| </p> |
| <note>Xerces only produces light-weight PSVI.</note> |
| </a> |
| </faq> |
| |
| |
| <faq title='International Encodings'> |
| <q>What international encodings are supported by &ParserName;?</q> |
| <a> |
| <ul> |
| <li>UTF-8</li> |
| <li>UTF-16 Big Endian, UTF-16 Little Endian</li> |
| <li>IBM-1208</li> |
| <li>ISO Latin-1 (ISO-8859-1)</li> |
| <li> |
| ISO Latin-2 (ISO-8859-2) [Bosnian, Croatian, Czech, |
| Hungarian, Polish, Romanian, Serbian (in Latin transcription), |
| Serbocroatian, Slovak, Slovenian, Upper and Lower Sorbian] |
| </li> |
| <li>ISO Latin-3 (ISO-8859-3) [Maltese, Esperanto]</li> |
| <li>ISO Latin-4 (ISO-8859-4)</li> |
| <li>ISO Latin Cyrillic (ISO-8859-5)</li> |
| <li>ISO Latin Arabic (ISO-8859-6)</li> |
| <li>ISO Latin Greek (ISO-8859-7)</li> |
| <li>ISO Latin Hebrew (ISO-8859-8)</li> |
| <li>ISO Latin-5 (ISO-8859-9) [Turkish]</li> |
| <li>Extended Unix Code, packed for Japanese (euc-jp, eucjis)</li> |
| <li>Japanese Shift JIS (shift-jis)</li> |
| <li>Chinese (big5)</li> |
| <li>Chinese for PRC (mixed 1/2 byte) (gb2312)</li> |
| <li>Japanese ISO-2022-JP (iso-2022-jp)</li> |
| <li>Cyrillic (koi8-r)</li> |
| <li>Extended Unix Code, packed for Korean (euc-kr)</li> |
| <li>Russian Unix, Cyrillic (koi8-r)</li> |
| <li>Windows Thai (cp874)</li> |
| <li>Latin 1 Windows (cp1252) (and all other cp125? encodings recognized by IANA)</li> |
| <li>cp858</li> |
| <li>EBCDIC encodings:</li> |
| <ul> |
| <li>EBCDIC US (ebcdic-cp-us)</li> |
| <li>EBCDIC Canada (ebcdic-cp-ca)</li> |
| <li>EBCDIC Netherland (ebcdic-cp-nl)</li> |
| <li>EBCDIC Denmark (ebcdic-cp-dk)</li> |
| <li>EBCDIC Norway (ebcdic-cp-no)</li> |
| <li>EBCDIC Finland (ebcdic-cp-fi)</li> |
| <li>EBCDIC Sweden (ebcdic-cp-se)</li> |
| <li>EBCDIC Italy (ebcdic-cp-it)</li> |
| <li>EBCDIC Spain, Latin America (ebcdic-cp-es)</li> |
| <li>EBCDIC Great Britain (ebcdic-cp-gb)</li> |
| <li>EBCDIC France (ebcdic-cp-fr)</li> |
| <li>EBCDIC Hebrew (ebcdic-cp-he)</li> |
| <li>EBCDIC Switzerland (ebcdic-cp-ch)</li> |
| <li>EBCDIC Roece (ebcdic-cp-roece)</li> |
| <li>EBCDIC Yugoslavia (ebcdic-cp-yu)</li> |
| <li>EBCDIC Iceland (ebcdic-cp-is)</li> |
| <li>EBCDIC Urdu (ebcdic-cp-ar2)</li> |
| <li>Latin 0 EBCDIC</li> |
| <li>EBCDIC Arabic (ebcdic-cp-ar1)</li> |
| </ul> |
| </ul> |
| <note>UCS-4 is not yet supported, but it is hoped that support will be available soon.</note> |
| </a> |
| </faq> |
| </faqs> |