docs/faq-general.xml - xerces2-j - Git at Google

 <?xml version='1.0' encoding='UTF-8'?>
 <!DOCTYPE faqs SYSTEM 'dtd/faqs.dtd'>
 <faqs title='General FAQs'>
   <faq title="Querying Xerces Version">
     <q>How do I find out which Xerces version I am using?</q>
     <a> <p>To find out the release version of Xerces, execute the following:
          <code>java org.apache.xerces.impl.Version</code>.
     </p>
     </a>
   </faq>
   <faq title="Bugzilla">
     <q>How do I use Bugzilla to report bugs?</q>
     <a> <p>
     Please refer to the <link idref="bugzilla">Reporting bugs in bugzilla</link>
     </p>
     </a>
   </faq>

     <faq title="Jar file changes">
     <q>What happened to xerces.jar?</q>
     <a>
         <p>In order to take advantage of the fact that this parser is
         very often used in conjunction with other XML technologies,
         such as XSLT processors, which also rely on standard
         API&apos;s like DOM and SAX, xerces.jar was split into two
         jarfiles:
         </p>
         <ul>
         <li><code>xml-apis.jar</code> contains the DOM level 2,
         SAX 2.0 and the parsing component of JAXP 1.2 API&apos;s;</li>
         <li><code>xercesImpl.jar</code> contains the implementation of
         these API&apos;s as well as the XNI API.
         </li>
         </ul>
         <p>For backwards compatibility, we have retained the ability
         to generate xerces.jar.  For instructions, see <link
         idref="install">the installation documentation</link>.
         </p>
     </a>
  </faq>
  <faq title="Obtaining smaller jars">
    <q>I don&apos;t need all the features Xerces provides, but I&apos;m
    running in an environment where space is at a premium.  Is there
    anything I can do?
  </q>
  <a>
     <p>
     Partially to address this issue, we've recently begun to
     distribute compressed jarfiles instead of our traditionally
     uncompressed files.  But if you still need a smaller jar, and
     don&apos;t need things like support for XML Schema or the WML/HTML
     DOM implementations that Xerces provides, then look at the
     <code>dtdjars</code> target in our
     buildfile.
     </p>
   </a>
  </faq>
  <faq title='Validation against DTD'>
   <q>How do I turn on DTD validation?</q>
   <a>
    <p>
     You can turn validation on and off via methods available
     on the SAX2 <code>XMLReader</code> interface. While only the
     <code>SAXParser</code> implements the <code>XMLReader</code>
     interface, the methods required for turning on validation
     are available to both parser classes, DOM and SAX.
     <br/>
     The code snippet below shows how to turn validation on -- assume
     that <ref>parser</ref> is an instance of either
     <code>org.apache.xerces.parsers.SAXParser</code> or
     <code>org.apache.xerces.parsers.DOMParser</code>.
     <br/><br/>
     <code>parser.setFeature("http://xml.org/sax/features/validation", true);</code>
    </p>
   </a>
  </faq>

 <!--
 <faq title='PSVI'>
   <q>How do I get access to the PSVI?</q>
   <a>
    <p>Xerces provides a sample component PSVIWriter that intercepts document
 handler events and collects PSVI information. For more information read <link
 idref="samples-xni">samples documentation</link> on how to use xni.parser.PSVIParser
 and xni.parser.PSVIConfiguration.
     </p>
 <note>Xerces only produces light-weight  PSVI.</note>
   </a>
  </faq>
  -->


  <faq title='International Encodings'>
   <q>What international encodings are supported by &ParserName;?</q>
   <a>
    <ul>
     <li>UTF-8</li>
     <li>UTF-16 Big Endian and Little Endian</li>
     <li>UCS-2 (ISO-10646-UCS-2) Big Endian and Little Endian</li>
     <li>UCS-4 (ISO-10646-UCS-4) Big Endian and Little Endian</li>
     <li>IBM-1208</li>
     <li>ISO Latin-1 (ISO-8859-1)</li>
     <li>
      ISO Latin-2 (ISO-8859-2) [Bosnian, Croatian, Czech,
      Hungarian, Polish, Romanian, Serbian (in Latin transcription),
      Serbocroatian, Slovak, Slovenian, Upper and Lower Sorbian]
     </li>
     <li>ISO Latin-3 (ISO-8859-3) [Maltese, Esperanto]</li>
     <li>ISO Latin-4 (ISO-8859-4)</li>
     <li>ISO Latin Cyrillic (ISO-8859-5)</li>
     <li>ISO Latin Arabic (ISO-8859-6)</li>
     <li>ISO Latin Greek (ISO-8859-7)</li>
     <li>ISO Latin Hebrew (ISO-8859-8)</li>
     <li>ISO Latin-5 (ISO-8859-9) [Turkish]</li>
     <li>ISO Latin-7 (ISO-8859-13)</li>
     <li>ISO Latin-9 (ISO-8859-15)</li>
     <li>Extended Unix Code, packed for Japanese (euc-jp, eucjis)</li>
     <li>Japanese Shift JIS (shift-jis)</li>
     <li>Chinese (big5)</li>
     <li>Chinese for PRC (mixed 1/2 byte) (gb2312)</li>
     <li>Japanese ISO-2022-JP (iso-2022-jp)</li>
     <li>Cyrillic (koi8-r)</li>
     <li>Extended Unix Code, packed for Korean (euc-kr)</li>
     <li>Russian Unix, Cyrillic (koi8-r)</li>
     <li>Windows Thai (cp874)</li>
     <li>Latin 1 Windows (cp1252) (and all other cp125? encodings recognized by IANA)</li>
     <li>cp858</li>
     <li>EBCDIC encodings:</li>
      <ul>
       <li>EBCDIC US (ebcdic-cp-us)</li>
       <li>EBCDIC Canada (ebcdic-cp-ca)</li>
       <li>EBCDIC Netherland (ebcdic-cp-nl)</li>
       <li>EBCDIC Denmark (ebcdic-cp-dk)</li>
       <li>EBCDIC Norway (ebcdic-cp-no)</li>
       <li>EBCDIC Finland (ebcdic-cp-fi)</li>
       <li>EBCDIC Sweden (ebcdic-cp-se)</li>
       <li>EBCDIC Italy (ebcdic-cp-it)</li>
       <li>EBCDIC Spain, Latin America (ebcdic-cp-es)</li>
       <li>EBCDIC Great Britain (ebcdic-cp-gb)</li>
       <li>EBCDIC France (ebcdic-cp-fr)</li>
       <li>EBCDIC Hebrew (ebcdic-cp-he)</li>
       <li>EBCDIC Switzerland (ebcdic-cp-ch)</li>
       <li>EBCDIC Roece (ebcdic-cp-roece)</li>
       <li>EBCDIC Yugoslavia (ebcdic-cp-yu)</li>
       <li>EBCDIC Iceland (ebcdic-cp-is)</li>
       <li>EBCDIC Urdu (ebcdic-cp-ar2)</li>
       <li>Latin 0 EBCDIC</li>
       <li>EBCDIC Arabic (ebcdic-cp-ar1)</li>
      </ul>
    </ul>
   </a>
  </faq>

  <faq title='Accessing Documents on the Internet'>
   <q>Why is the parser unable to access schema documents or external entities available on the Internet?</q>
   <a>
    <p>
    The parser may not be able to access various external entities or schema documents
    (imported, included etc...) available on the Internet, such as the Schema for Schemas
    "http://www.w3.org/2001/XMLSchema.xsd" or the schema defining xml:base, xml:lang attributes etc...
    "http://www.w3.org/2001/xml.xsd" or any other external entity available on the Internet. There
    are various reasons one could experience such a problem.
    <br/>
    <br/>
    One of the reasons could be that your proxy settings do not allow the parser to make
    URL connections through a proxy server. To solve this problem, before parsing a document,
    the application must set the two system properties: "http.proxyHost" and "http.proxyPort".
    Another reason could be due to strict firewall settings that do not allow any URL connection
    to be made to the outside web. The problem may also be caused by a server that is offline or
    inaccessible on the network, preventing documents hosted by the server from being accessed.
    </p>
   </a>
  </faq>

   <faq title='Incomplete character data is received via SAX'>
   <q>Why does the SAX parser lose some character data or why is the data split
   into several chunks?</q>
   <a>
   <p>If you read the <jump href='http://www.saxproject.org/apidoc/org/xml/sax/ContentHandler.html#characters(char[],%20int,%20int)'>SAX</jump>
   documentation, you will find that SAX may deliver contiguous text as multiple calls to
 characters(), for reasons having to do with parser efficiency and input
 buffering. It is the programmer's responsibility to deal with that
 appropriately, e.g. by accumulating text until the next non-characters()
 event.
  </p>
   </a>
   </faq>
   <faq title="Encodings and XML Version Via SAX">
     <q>Is there any way I can determine what encoding an entity was
     written in, or what XML version the document conformed to, if I'm
     using SAX?
     </q>
     <a>
         <p>The answer to this question is that, yes there is a way, but it's
         not particularly beautiful.  There is no way in SAX 2.0.0 or
         2.0.1 to get hold of these pieces of information; the SAX
         Locator2 interface from the 1.1 extensions--still in Alpha at
         the time of writing--does provide methods to accomplish this,
         but since Xerces is required to support precisely SAX 2.0.0 by
         Sun TCK rules, we cannot ship this interface.  However, we can
         still support the appropriate methods on the objects we
         provide to implement the SAX Locator interface.  Therefore,
         assuming <code>Locator</code> is an instance of the SAX
         Locator interface that Xerces has passed back in a
         <code>setDocumentLocator</code> call,
         you can use a method like this to determine the encoding of
         the entity currently being parsed:
         </p>
         <source>
     import java.lang.reflect.Method;
     private String getEncoding(Locator locator) {
         String encoding = null;
         Method getEncoding = null;
         try {
             getEncoding = locator.getClass().getMethod("getEncoding", new Class[]{});
             if(getEncoding != null) {
                 encoding = (String)getEncoding.invoke(locator, null);
             }
         } catch (Exception e) {
             // either this locator object doesn't have this
             // method, or we're on an old JDK
         }
         return encoding;
     }
         </source>
         <p>This code has the advantage that it will compile on JDK
         1.1.8, though it will only produce non-null results on 1.2.x
         JDK's and later.  Substituting <code>getXMLVersion</code> for
         <code>getEncoding</code> will enable you to determine the
         version of XML to which the instance document conforms.
         </p>
     </a>
   </faq>

 </faqs>
	<?xml version='1.0' encoding='UTF-8'?>
	<!DOCTYPE faqs SYSTEM 'dtd/faqs.dtd'>
	<faqs title='General FAQs'>
	<faq title="Querying Xerces Version">
	<q>How do I find out which Xerces version I am using?</q>
	<a> <p>To find out the release version of Xerces, execute the following:
	<code>java org.apache.xerces.impl.Version</code>.
	</p>
	</a>
	</faq>
	<faq title="Bugzilla">
	<q>How do I use Bugzilla to report bugs?</q>
	<a> <p>
	Please refer to the <link idref="bugzilla">Reporting bugs in bugzilla</link>
	</p>
	</a>
	</faq>

	<faq title="Jar file changes">
	<q>What happened to xerces.jar?</q>
	<a>
	<p>In order to take advantage of the fact that this parser is
	very often used in conjunction with other XML technologies,
	such as XSLT processors, which also rely on standard
	API's like DOM and SAX, xerces.jar was split into two
	jarfiles:
	</p>
	<ul>
	<li><code>xml-apis.jar</code> contains the DOM level 2,
	SAX 2.0 and the parsing component of JAXP 1.2 API's;</li>
	<li><code>xercesImpl.jar</code> contains the implementation of
	these API's as well as the XNI API.
	</li>
	</ul>
	<p>For backwards compatibility, we have retained the ability
	to generate xerces.jar. For instructions, see <link
	idref="install">the installation documentation</link>.
	</p>
	</a>
	</faq>
	<faq title="Obtaining smaller jars">
	<q>I don't need all the features Xerces provides, but I'm
	running in an environment where space is at a premium. Is there
	anything I can do?
	</q>
	<a>
	<p>
	Partially to address this issue, we've recently begun to
	distribute compressed jarfiles instead of our traditionally
	uncompressed files. But if you still need a smaller jar, and
	don't need things like support for XML Schema or the WML/HTML
	DOM implementations that Xerces provides, then look at the
	<code>dtdjars</code> target in our
	buildfile.
	</p>
	</a>
	</faq>
	<faq title='Validation against DTD'>
	<q>How do I turn on DTD validation?</q>
	<a>
	<p>
	You can turn validation on and off via methods available
	on the SAX2 <code>XMLReader</code> interface. While only the
	<code>SAXParser</code> implements the <code>XMLReader</code>
	interface, the methods required for turning on validation
	are available to both parser classes, DOM and SAX.
	<br/>
	The code snippet below shows how to turn validation on -- assume
	that <ref>parser</ref> is an instance of either
	<code>org.apache.xerces.parsers.SAXParser</code> or
	<code>org.apache.xerces.parsers.DOMParser</code>.
	<br/><br/>
	<code>parser.setFeature("http://xml.org/sax/features/validation", true);</code>
	</p>
	</a>
	</faq>

	<!--
	<faq title='PSVI'>
	<q>How do I get access to the PSVI?</q>
	<a>
	<p>Xerces provides a sample component PSVIWriter that intercepts document
	handler events and collects PSVI information. For more information read <link
	idref="samples-xni">samples documentation</link> on how to use xni.parser.PSVIParser
	and xni.parser.PSVIConfiguration.
	</p>
	<note>Xerces only produces light-weight PSVI.</note>
	</a>
	</faq>
	-->


	<faq title='International Encodings'>
	<q>What international encodings are supported by &ParserName;?</q>
	<a>
	<ul>
	<li>UTF-8</li>
	<li>UTF-16 Big Endian and Little Endian</li>
	<li>UCS-2 (ISO-10646-UCS-2) Big Endian and Little Endian</li>
	<li>UCS-4 (ISO-10646-UCS-4) Big Endian and Little Endian</li>
	<li>IBM-1208</li>
	<li>ISO Latin-1 (ISO-8859-1)</li>
	<li>
	ISO Latin-2 (ISO-8859-2) [Bosnian, Croatian, Czech,
	Hungarian, Polish, Romanian, Serbian (in Latin transcription),
	Serbocroatian, Slovak, Slovenian, Upper and Lower Sorbian]
	</li>
	<li>ISO Latin-3 (ISO-8859-3) [Maltese, Esperanto]</li>
	<li>ISO Latin-4 (ISO-8859-4)</li>
	<li>ISO Latin Cyrillic (ISO-8859-5)</li>
	<li>ISO Latin Arabic (ISO-8859-6)</li>
	<li>ISO Latin Greek (ISO-8859-7)</li>
	<li>ISO Latin Hebrew (ISO-8859-8)</li>
	<li>ISO Latin-5 (ISO-8859-9) [Turkish]</li>
	<li>ISO Latin-7 (ISO-8859-13)</li>
	<li>ISO Latin-9 (ISO-8859-15)</li>
	<li>Extended Unix Code, packed for Japanese (euc-jp, eucjis)</li>
	<li>Japanese Shift JIS (shift-jis)</li>
	<li>Chinese (big5)</li>
	<li>Chinese for PRC (mixed 1/2 byte) (gb2312)</li>
	<li>Japanese ISO-2022-JP (iso-2022-jp)</li>
	<li>Cyrillic (koi8-r)</li>
	<li>Extended Unix Code, packed for Korean (euc-kr)</li>
	<li>Russian Unix, Cyrillic (koi8-r)</li>
	<li>Windows Thai (cp874)</li>
	<li>Latin 1 Windows (cp1252) (and all other cp125? encodings recognized by IANA)</li>
	<li>cp858</li>
	<li>EBCDIC encodings:</li>
	<ul>
	<li>EBCDIC US (ebcdic-cp-us)</li>
	<li>EBCDIC Canada (ebcdic-cp-ca)</li>
	<li>EBCDIC Netherland (ebcdic-cp-nl)</li>
	<li>EBCDIC Denmark (ebcdic-cp-dk)</li>
	<li>EBCDIC Norway (ebcdic-cp-no)</li>
	<li>EBCDIC Finland (ebcdic-cp-fi)</li>
	<li>EBCDIC Sweden (ebcdic-cp-se)</li>
	<li>EBCDIC Italy (ebcdic-cp-it)</li>
	<li>EBCDIC Spain, Latin America (ebcdic-cp-es)</li>
	<li>EBCDIC Great Britain (ebcdic-cp-gb)</li>
	<li>EBCDIC France (ebcdic-cp-fr)</li>
	<li>EBCDIC Hebrew (ebcdic-cp-he)</li>
	<li>EBCDIC Switzerland (ebcdic-cp-ch)</li>
	<li>EBCDIC Roece (ebcdic-cp-roece)</li>
	<li>EBCDIC Yugoslavia (ebcdic-cp-yu)</li>
	<li>EBCDIC Iceland (ebcdic-cp-is)</li>
	<li>EBCDIC Urdu (ebcdic-cp-ar2)</li>
	<li>Latin 0 EBCDIC</li>
	<li>EBCDIC Arabic (ebcdic-cp-ar1)</li>
	</ul>
	</ul>
	</a>
	</faq>

	<faq title='Accessing Documents on the Internet'>
	<q>Why is the parser unable to access schema documents or external entities available on the Internet?</q>
	<a>
	<p>
	The parser may not be able to access various external entities or schema documents
	(imported, included etc...) available on the Internet, such as the Schema for Schemas
	"http://www.w3.org/2001/XMLSchema.xsd" or the schema defining xml:base, xml:lang attributes etc...
	"http://www.w3.org/2001/xml.xsd" or any other external entity available on the Internet. There
	are various reasons one could experience such a problem.
	<br/>
	<br/>
	One of the reasons could be that your proxy settings do not allow the parser to make
	URL connections through a proxy server. To solve this problem, before parsing a document,
	the application must set the two system properties: "http.proxyHost" and "http.proxyPort".
	Another reason could be due to strict firewall settings that do not allow any URL connection
	to be made to the outside web. The problem may also be caused by a server that is offline or
	inaccessible on the network, preventing documents hosted by the server from being accessed.
	</p>
	</a>
	</faq>

	<faq title='Incomplete character data is received via SAX'>
	<q>Why does the SAX parser lose some character data or why is the data split
	into several chunks?</q>
	<a>
	<p>If you read the <jump href='http://www.saxproject.org/apidoc/org/xml/sax/ContentHandler.html#characters(char[],%20int,%20int)'>SAX</jump>
	documentation, you will find that SAX may deliver contiguous text as multiple calls to
	characters(), for reasons having to do with parser efficiency and input
	buffering. It is the programmer's responsibility to deal with that
	appropriately, e.g. by accumulating text until the next non-characters()
	event.
	</p>
	</a>
	</faq>
	<faq title="Encodings and XML Version Via SAX">
	<q>Is there any way I can determine what encoding an entity was
	written in, or what XML version the document conformed to, if I'm
	using SAX?
	</q>
	<a>
	<p>The answer to this question is that, yes there is a way, but it's
	not particularly beautiful. There is no way in SAX 2.0.0 or
	2.0.1 to get hold of these pieces of information; the SAX
	Locator2 interface from the 1.1 extensions--still in Alpha at
	the time of writing--does provide methods to accomplish this,
	but since Xerces is required to support precisely SAX 2.0.0 by
	Sun TCK rules, we cannot ship this interface. However, we can
	still support the appropriate methods on the objects we
	provide to implement the SAX Locator interface. Therefore,
	assuming <code>Locator</code> is an instance of the SAX
	Locator interface that Xerces has passed back in a
	<code>setDocumentLocator</code> call,
	you can use a method like this to determine the encoding of
	the entity currently being parsed:
	</p>
	<source>
	import java.lang.reflect.Method;
	private String getEncoding(Locator locator) {
	String encoding = null;
	Method getEncoding = null;
	try {
	getEncoding = locator.getClass().getMethod("getEncoding", new Class[]{});
	if(getEncoding != null) {
	encoding = (String)getEncoding.invoke(locator, null);
	}
	} catch (Exception e) {
	// either this locator object doesn't have this
	// method, or we're on an old JDK
	}
	return encoding;
	}
	</source>
	<p>This code has the advantage that it will compile on JDK
	1.1.8, though it will only produce non-null results on 1.2.x
	JDK's and later. Substituting <code>getXMLVersion</code> for
	<code>getEncoding</code> will enable you to determine the
	version of XML to which the instance document conforms.
	</p>
	</a>
	</faq>

	</faqs>