| <?xml version='1.0' encoding='UTF-8'?> |
| <!DOCTYPE faqs SYSTEM 'dtd/faqs.dtd'> |
| <faqs title='General FAQs'> |
| <faq title="Querying Xerces Version"> |
| <q>How do I find out which Xerces version I am using?</q> |
| <a> <p>To find out the release version of Xerces, execute the following: |
| <code>java org.apache.xerces.impl.Version</code>. |
| </p> |
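<p>The version can also be read from application code. The following is a
minimal sketch rather than an official recipe. It assumes that
<code>org.apache.xerces.impl.Version</code> exposes a public static
<code>getVersion()</code> method, which you should confirm for your release,
and it uses reflection so that it still compiles and runs when Xerces is not on
the classpath. The class name <code>XercesVersionSketch</code> is hypothetical.</p>

```java
import java.lang.reflect.Method;

class XercesVersionSketch {

    /** Returns the Xerces version string, or null if it cannot be determined. */
    static String xercesVersion() {
        try {
            // Load the class by name so this compiles without Xerces present.
            Class versionClass = Class.forName("org.apache.xerces.impl.Version");
            // Assumption: a public static getVersion() method exists.
            Method getVersion = versionClass.getMethod("getVersion", new Class[0]);
            return (String) getVersion.invoke(null, new Object[0]);
        } catch (Exception e) {
            // Xerces is not on the classpath, or this release lacks getVersion().
            return null;
        }
    }

    public static void main(String[] args) {
        String version = xercesVersion();
        System.out.println(version != null ? version : "Xerces version not available");
    }
}
```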
| </a> |
| </faq> |
| <faq title="Bugzilla"> |
| <q>How do I use Bugzilla to report bugs?</q> |
| <a> <p> |
| Please refer to <link idref="bugzilla">Reporting bugs in Bugzilla</link>. |
| </p> |
| </a> |
| </faq> |
| |
| <faq title="Jar file changes"> |
| <q>What happened to xerces.jar?</q> |
| <a> |
| <p>Because this parser is very often used together with other |
| XML technologies, such as XSLT processors, that also rely on |
| standard APIs like DOM and SAX, xerces.jar was split into two |
| jar files: |
| </p> |
| <ul> |
| <li><code>xml-apis.jar</code> contains the DOM Level 2, |
| SAX 2.0 and the parsing component of the JAXP 1.2 APIs;</li> |
| <li><code>xercesImpl.jar</code> contains the implementation of |
| these APIs as well as the XNI API. |
| </li> |
| </ul> |
| <p>For backwards compatibility, we have retained the ability |
| to generate xerces.jar. For instructions, see <link |
| idref="install">the installation documentation</link>. |
| </p> |
| </a> |
| </faq> |
| <faq title="Obtaining smaller jars"> |
| <q>I don't need all the features Xerces provides, but I'm |
| running in an environment where space is at a premium. Is there |
| anything I can do? |
| </q> |
| <a> |
| <p> |
| Partly to address this issue, we have recently begun to |
| distribute compressed jar files instead of the uncompressed |
| files we traditionally shipped. If you still need a smaller |
| jar and do not need features such as XML Schema support or the |
| WML/HTML DOM implementations that Xerces provides, look at the |
| <code>dtdjars</code> target in our |
| buildfile. |
| </p> |
| </a> |
| </faq> |
| <faq title='Validation against DTD'> |
| <q>How do I turn on DTD validation?</q> |
| <a> |
| <p> |
| You can turn validation on and off via methods available |
| on the SAX2 <code>XMLReader</code> interface. While only the |
| <code>SAXParser</code> implements the <code>XMLReader</code> |
| interface, the methods required for turning on validation |
| are available to both parser classes, DOM and SAX. |
| <br/> |
| The code snippet below shows how to turn validation on; it assumes |
| that <ref>parser</ref> is an instance of either |
| <code>org.apache.xerces.parsers.SAXParser</code> or |
| <code>org.apache.xerces.parsers.DOMParser</code>. |
| <br/><br/> |
| <code>parser.setFeature("http://xml.org/sax/features/validation", true);</code> |
| </p> |
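<p>For context, the call above might sit in a small program like the following
sketch. It makes two assumptions that are not part of the original answer: the
<code>XMLReader</code> is obtained through JAXP, so the actual parser class is
whichever one JAXP selects, and an <code>ErrorHandler</code> is registered
because SAX reports validity errors to that handler and silently ignores them if
none is set. The class name <code>ValidationSketch</code> and the file name in
the comment are hypothetical.</p>

```java
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

class ValidationSketch {

    static final String VALIDATION = "http://xml.org/sax/features/validation";

    /** Creates an XMLReader with DTD validation switched on. */
    static XMLReader newValidatingReader() throws Exception {
        // Assumption: JAXP chooses the parser implementation.
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        reader.setFeature(VALIDATION, true);
        // Validity errors are recoverable, so SAX passes them to the
        // ErrorHandler instead of throwing; rethrow to make them visible.
        reader.setErrorHandler(new DefaultHandler() {
            public void error(SAXParseException e) throws SAXParseException {
                throw e;
            }
        });
        return reader;
    }

    public static void main(String[] args) throws Exception {
        XMLReader reader = newValidatingReader();
        System.out.println("validation on: " + reader.getFeature(VALIDATION));
        // reader.parse("document.xml"); // hypothetical file name
    }
}
```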
| </a> |
| </faq> |
| |
| <!-- |
| <faq title='PSVI'> |
| <q>How do I get access to the PSVI?</q> |
| <a> |
| <p>Xerces provides a sample component PSVIWriter that intercepts document |
| handler events and collects PSVI information. For more information read <link |
| idref="samples-xni">samples documentation</link> on how to use xni.parser.PSVIParser |
| and xni.parser.PSVIConfiguration. |
| </p> |
| <note>Xerces only produces light-weight PSVI.</note> |
| </a> |
| </faq> |
| --> |
| |
| |
| <faq title='International Encodings'> |
| <q>What international encodings are supported by &ParserName;?</q> |
| <a> |
| <ul> |
| <li>UTF-8</li> |
| <li>UTF-16 Big Endian and Little Endian</li> |
| <li>UCS-2 (ISO-10646-UCS-2) Big Endian and Little Endian</li> |
| <li>UCS-4 (ISO-10646-UCS-4) Big Endian and Little Endian</li> |
| <li>IBM-1208</li> |
| <li>ISO Latin-1 (ISO-8859-1)</li> |
| <li> |
| ISO Latin-2 (ISO-8859-2) [Bosnian, Croatian, Czech, |
| Hungarian, Polish, Romanian, Serbian (in Latin transcription), |
| Serbo-Croatian, Slovak, Slovenian, Upper and Lower Sorbian] |
| </li> |
| <li>ISO Latin-3 (ISO-8859-3) [Maltese, Esperanto]</li> |
| <li>ISO Latin-4 (ISO-8859-4)</li> |
| <li>ISO Latin Cyrillic (ISO-8859-5)</li> |
| <li>ISO Latin Arabic (ISO-8859-6)</li> |
| <li>ISO Latin Greek (ISO-8859-7)</li> |
| <li>ISO Latin Hebrew (ISO-8859-8)</li> |
| <li>ISO Latin-5 (ISO-8859-9) [Turkish]</li> |
| <li>ISO Latin-7 (ISO-8859-13)</li> |
| <li>ISO Latin-9 (ISO-8859-15)</li> |
| <li>Extended Unix Code, packed for Japanese (euc-jp, eucjis)</li> |
| <li>Japanese Shift JIS (shift-jis)</li> |
| <li>Chinese (big5)</li> |
| <li>Chinese for PRC (mixed 1/2 byte) (gb2312)</li> |
| <li>Japanese ISO-2022-JP (iso-2022-jp)</li> |
| <li>Extended Unix Code, packed for Korean (euc-kr)</li> |
| <li>Russian Unix, Cyrillic (koi8-r)</li> |
| <li>Windows Thai (cp874)</li> |
| <li>Latin 1 Windows (cp1252) (and all other cp125? encodings recognized by IANA)</li> |
| <li>cp858</li> |
| <li>EBCDIC encodings:</li> |
| <ul> |
| <li>EBCDIC US (ebcdic-cp-us)</li> |
| <li>EBCDIC Canada (ebcdic-cp-ca)</li> |
| <li>EBCDIC Netherlands (ebcdic-cp-nl)</li> |
| <li>EBCDIC Denmark (ebcdic-cp-dk)</li> |
| <li>EBCDIC Norway (ebcdic-cp-no)</li> |
| <li>EBCDIC Finland (ebcdic-cp-fi)</li> |
| <li>EBCDIC Sweden (ebcdic-cp-se)</li> |
| <li>EBCDIC Italy (ebcdic-cp-it)</li> |
| <li>EBCDIC Spain, Latin America (ebcdic-cp-es)</li> |
| <li>EBCDIC Great Britain (ebcdic-cp-gb)</li> |
| <li>EBCDIC France (ebcdic-cp-fr)</li> |
| <li>EBCDIC Hebrew (ebcdic-cp-he)</li> |
| <li>EBCDIC Switzerland (ebcdic-cp-ch)</li> |
| <li>EBCDIC ROECE (ebcdic-cp-roece)</li> |
| <li>EBCDIC Yugoslavia (ebcdic-cp-yu)</li> |
| <li>EBCDIC Iceland (ebcdic-cp-is)</li> |
| <li>EBCDIC Urdu (ebcdic-cp-ar2)</li> |
| <li>Latin 0 EBCDIC</li> |
| <li>EBCDIC Arabic (ebcdic-cp-ar1)</li> |
| </ul> |
| </ul> |
| </a> |
| </faq> |
| |
| <faq title='Accessing Documents on the Internet'> |
| <q>Why is the parser unable to access schema documents or external entities available on the Internet?</q> |
| <a> |
| <p> |
| The parser may be unable to access external entities or schema documents |
| (imported, included, etc.) that are available on the Internet, such as the Schema for Schemas |
| "http://www.w3.org/2001/XMLSchema.xsd", the schema that defines attributes such as xml:base and xml:lang, |
| "http://www.w3.org/2001/xml.xsd", or any other external entity hosted on the Internet. There |
| are several possible reasons for such a problem. |
| <br/> |
| <br/> |
| One possible reason is that your proxy settings do not allow the parser to make |
| URL connections through a proxy server. To solve this problem, the application must set |
| the two system properties "http.proxyHost" and "http.proxyPort" before parsing a document. |
| Another possible reason is strict firewall settings that do not allow any URL connection |
| to be made to the outside web. The problem may also be caused by a server that is offline or |
| inaccessible on the network, which prevents documents hosted on that server from being accessed. |
| </p> |
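<p>As an illustration of the proxy case, the two system properties could be set
as in the sketch below. The host name and port are placeholders, and the class
and method names are hypothetical. The properties are JVM-wide, so they affect
every HTTP connection the application makes, not only those made by the parser.</p>

```java
class ProxySetupSketch {

    /** Sets the JVM-wide HTTP proxy properties; call this before parsing. */
    static void useHttpProxy(String host, int port) {
        System.setProperty("http.proxyHost", host);
        System.setProperty("http.proxyPort", String.valueOf(port));
    }

    public static void main(String[] args) {
        // Placeholder values; substitute the address of your own proxy server.
        useHttpProxy("proxy.example.com", 8080);
        System.out.println(System.getProperty("http.proxyHost") + ":"
                + System.getProperty("http.proxyPort"));
        // ... create the parser and parse the document here ...
    }
}
```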
| </a> |
| </faq> |
| |
| <faq title='Incomplete character data is received via SAX'> |
| <q>Why does the SAX parser lose some character data or why is the data split |
| into several chunks?</q> |
| <a> |
| <p>If you read the <jump href='http://www.saxproject.org/apidoc/org/xml/sax/ContentHandler.html#characters(char[],%20int,%20int)'>SAX</jump> |
| documentation, you will find that SAX may deliver contiguous text as multiple calls to |
| characters(), for reasons having to do with parser efficiency and input |
| buffering. It is the programmer's responsibility to deal with that |
| appropriately, e.g. by accumulating text until the next non-characters() |
| event. |
| </p> |
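<p>One way to accumulate the text is sketched below. This is a simplified
handler, not part of Xerces: the class name
<code>TextAccumulatingHandler</code> is hypothetical, and it only handles
elements whose content is plain text (it resets its buffer at every start tag,
so mixed content would need more bookkeeping). The <code>main</code> method
calls the handler directly to simulate a parser that delivers one text run in
two chunks.</p>

```java
import org.xml.sax.Attributes;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;

class TextAccumulatingHandler extends DefaultHandler {

    private final StringBuilder text = new StringBuilder();
    private String lastText = "";

    public void startElement(String uri, String localName, String qName, Attributes atts) {
        // A new element starts: discard any text collected so far.
        text.setLength(0);
    }

    public void characters(char[] ch, int start, int length) {
        // May be called several times for one run of text, so only append.
        text.append(ch, start, length);
    }

    public void endElement(String uri, String localName, String qName) {
        // The text run is known to be complete only at this point.
        lastText = text.toString();
        text.setLength(0);
    }

    /** Returns the complete text of the most recently closed element. */
    String getLastText() {
        return lastText;
    }

    public static void main(String[] args) {
        // Simulate a parser delivering "Hello" as two characters() calls.
        TextAccumulatingHandler handler = new TextAccumulatingHandler();
        handler.startElement("", "greeting", "greeting", new AttributesImpl());
        handler.characters("Hel".toCharArray(), 0, 3);
        handler.characters("lo".toCharArray(), 0, 2);
        handler.endElement("", "greeting", "greeting");
        System.out.println(handler.getLastText()); // prints "Hello"
    }
}
```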
| </a> |
| </faq> |
| <faq title="Encodings and XML Version Via SAX"> |
| <q>Is there any way I can determine what encoding an entity was |
| written in, or what XML version the document conformed to, if I'm |
| using SAX? |
| </q> |
| <a> |
| <p>Yes, there is a way, but it is not particularly elegant. |
| There is no way in SAX 2.0.0 or 2.0.1 to get hold of these |
| pieces of information. The SAX Locator2 interface from the |
| 1.1 extensions, still in alpha at the time of writing, does |
| provide methods to accomplish this, but since Xerces is |
| required to support precisely SAX 2.0.0 by Sun TCK rules, we |
| cannot ship this interface. However, we can still support |
| the appropriate methods on the objects we provide to |
| implement the SAX Locator interface. Therefore, assuming |
| <code>locator</code> is an instance of the SAX Locator |
| interface that Xerces has passed back in a |
| <code>setDocumentLocator</code> call, |
| you can use a method like this to determine the encoding of |
| the entity currently being parsed: |
| </p> |
| <source> |
| import java.lang.reflect.Method; |
| import org.xml.sax.Locator; |
| |
| // Returns the encoding reported by the locator, or null if unavailable. |
| private String getEncoding(Locator locator) { |
|     String encoding = null; |
|     try { |
|         // getMethod() throws NoSuchMethodException instead of returning null |
|         Method getEncoding = locator.getClass().getMethod("getEncoding", new Class[0]); |
|         encoding = (String) getEncoding.invoke(locator, new Object[0]); |
|     } catch (Exception e) { |
|         // either this locator object doesn't have this |
|         // method, or we're on an old JDK |
|     } |
|     return encoding; |
| } |
| </source> |
| <p>This code has the advantage that it will compile on JDK |
| 1.1.8, though it will only produce non-null results on 1.2.x |
| JDKs and later. Substituting <code>getXMLVersion</code> for |
| <code>getEncoding</code> will enable you to determine the |
| version of XML to which the instance document conforms. |
| </p> |
| </a> |
| </faq> |
| |
| </faqs> |