| <?xml version="1.0" encoding = "iso-8859-1" standalone="no"?> |
| <!DOCTYPE faqs SYSTEM "./dtd/faqs.dtd"> |
| |
| <faqs title="Parsing with &XercesCName;"> |
| <faq title="Why does my application crash on AIX when I run it under a |
| multi-threaded environment?"> |
| |
| <q>Why does my application crash on AIX when I run it under a |
| multi-threaded environment?</q> |
| |
| <a> |
| <p>AIX maintains two kinds of libraries on the system: |
| thread-safe and non-thread-safe. Multi-threaded libraries on |
| AIX follow a different naming convention. Usually the |
| multi-threaded library names end with "_r". For |
| example, libc.a is single-threaded whereas libc_r.a is |
| multi-threaded.</p> |
| |
| <p>To make your multi-threaded application run on AIX, you |
| MUST ensure that you do not have a 'system library path' in |
| your LIBPATH environment variable when you run the |
| application. The appropriate libraries (threaded or |
| non-threaded) are automatically picked up at runtime. An |
| application usually crashes when you build your application |
| for multi-threaded operation but don't point to the |
| thread-safe version of the system libraries. For example, |
| LIBPATH can be simply set as:</p> |
| |
| <source>LIBPATH=$HOME/<&XercesCProjectName;>/lib</source> |
| |
| <p>Here, <&XercesCProjectName;> is the directory where the |
| &XercesCProjectName; distribution resides.</p> |
| |
| <p>If for any reason, unrelated to &XercesCProjectName;, you need to |
| keep a 'system library path' in your LIBPATH environment |
| variable, you must make sure that you have placed the |
| thread-safe path before you specify the normal system |
| path. For example, you must place <ref>/lib/threads</ref> before |
| <ref>/lib</ref> in your LIBPATH variable. That is to say your |
| LIBPATH may look like this:</p> |
| |
| <source>export LIBPATH=$HOME/<&XercesCProjectName;>/lib:/usr/lib/threads:/usr/lib</source> |
| |
| <p>Here, /usr/lib is the directory that contains your system libraries.</p> |
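| |
| <p>The ordering rule above can also be checked programmatically. The following is a |
| minimal C++ sketch and is not part of &XercesCProjectName;: the helper names |
| <code>splitSearchPath</code> and <code>precedesInPath</code> are hypothetical, and the |
| code assumes a colon-separated value such as the LIBPATH shown above.</p> |

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Split a colon-separated search path (such as LIBPATH) into its entries.
// Hypothetical helper for illustration only.
std::vector<std::string> splitSearchPath(const std::string& path) {
    std::vector<std::string> entries;
    std::size_t start = 0;
    while (true) {
        const std::size_t colon = path.find(':', start);
        if (colon == std::string::npos) {
            entries.push_back(path.substr(start));
            break;
        }
        entries.push_back(path.substr(start, colon - start));
        start = colon + 1;
    }
    return entries;
}

// Return false only when `second` is reached before `first` while scanning
// the path from left to right. If `second` is never listed, nothing can
// shadow `first`, so the result is true.
bool precedesInPath(const std::string& path,
                    const std::string& first,
                    const std::string& second) {
    for (const std::string& entry : splitSearchPath(path)) {
        if (entry == first) {
            return true;
        }
        if (entry == second) {
            return false;
        }
    }
    return true;  // neither directory appears in the path
}
```

| <p>For example, calling <code>precedesInPath</code> with the value of LIBPATH and the |
| directories /usr/lib/threads and /usr/lib reports whether the thread-safe directory is |
| searched first.</p> |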
| </a> |
| </faq> |
| |
| <faq title="What compilers are being used on the supported platforms?"> |
| |
| <q>What compilers are being used on the supported platforms?</q> |
| |
| <a> |
| <p>&XercesCProjectName; has been built on the following platforms with these |
| compilers:</p> |
| |
| <table> |
| <tr><td><em>Operating System</em></td><td><em>Compiler</em></td></tr> |
| <tr><td>Windows NT 4.0 SP5/98</td><td>MSVC 6.0 SP3</td></tr> |
| <tr><td>Red Hat Linux 6.1</td><td>egcs-2.91.66 and glibc-2.1.2-11</td></tr> |
| <tr><td>AIX 4.2.1 and higher</td><td>xlC 3.6.4</td></tr> |
| <tr><td>Solaris 2.6</td><td>CC Workshop 4.2</td></tr> |
| <tr><td>HP-UX 10.2</td><td>CC A.10.36</td></tr> |
| <tr><td>HP-UX 11.0</td><td>aCC A.03.13 with pthreads</td></tr> |
| </table> |
| </a> |
| </faq> |
| |
| <faq title="I cannot run my sample applications. What is wrong?"> |
| |
| <q>I cannot run my sample applications. What is wrong?</q> |
| <a> |
| <p>In order to run an application built using &XercesCProjectName; you |
| must set up your path and library search path properly. In the |
| standalone version from Apache, you must have the &XercesCName; runtime library |
| available from your path settings. On Windows this library is called |
| <code>&XercesCWindowsLib;.dll</code>, which must be available from your <code>PATH</code> |
| settings. (Note that there are now separate debug and release DLLs for Windows. |
| If the release DLL is named <code>&XercesCWindowsLib;.dll</code>, then the debug DLL is named |
| <code>&XercesCWindowsLib;d.dll</code>.) |
| On UNIX platforms the library is called <code>&XercesCUnixLib;.so</code> |
| (or <code>.a</code> or <code>.sl</code>) which must be available from your |
| <code>LD_LIBRARY_PATH</code> (or <code>LIBPATH</code> or <code>SHLIB_PATH</code>) |
| environment variable.</p> |
| |
| <p>Thus, if you installed your binaries under <code>$HOME/fastxmlparser</code>, |
| you need to point your library path to that directory. |
| </p> |
| |
| <source>export LIBPATH=$LIBPATH:$HOME/fastxmlparser/lib # (AIX) |
| export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$HOME/fastxmlparser/lib # (Solaris, Linux) |
| export SHLIB_PATH=$SHLIB_PATH:$HOME/fastxmlparser/lib # (HP-UX)</source> |
| |
| <p>If you are using the enhanced version of this parser from IBM, you will also need |
| two additional libraries. In the Windows build these are <code>icuuc.dll</code> and |
| <code>icudata.dll</code>, which must be available from your <code>PATH</code> settings. On UNIX, |
| these libraries are called <code>libicu-uc.so</code> and <code>libicudata.so</code> |
| (or <code>.sl</code> for HP-UX or <code>.a</code> for AIX), which must be available from |
| your library search path.</p> |
| </a> |
| </faq> |
| |
| <faq title="I just built my own application using the &XercesCProjectName; parser. Why does it |
| crash?"> |
| |
| <q>I just built my own application using the &XercesCProjectName; parser. Why does it |
| crash?</q> |
| <a> |
| <p>In order to work with the &XercesCProjectName; parser, you have to |
| first initialize the XML subsystem. The most common mistake is |
| to forget this initialization. Before you make any calls to |
| &XercesCProjectName; APIs, you must call <code>XMLPlatformUtils::Initialize()</code>:</p> |
| |
| <source>try { |
|     XMLPlatformUtils::Initialize(); |
| } |
| catch (const XMLException& toCatch) { |
|     // Do your failure processing here |
| }</source> |
| |
| <p>This initializes the &XercesCProjectName; system and sets its |
| internal variables. Note that you must include the |
| <code>util/PlatformUtils.hpp</code> file for this to work.</p> |
| </a> |
| </faq> |
| |
| <faq title="Is &XercesCProjectName; thread-safe?"> |
| |
| <q>Is &XercesCProjectName; thread-safe?</q> |
| |
| <a> |
| <p>This is not a question that has a simple yes/no answer. Here are |
| the rules for using &XercesCProjectName; in a multi-threaded environment:</p> |
| |
| <p>Within an address space, an instance of the parser may be used |
| without restriction from a single thread, or an instance of the |
| parser can be accessed from multiple threads, provided the |
| application guarantees that only one thread has entered a method |
| of the parser at any one time.</p> |
| |
| <p>When two or more parser instances exist in a process, the |
| instances can be used concurrently, and without external |
| synchronization. That is, in an application containing two |
| parsers and two threads, one parser can be running within the |
| first thread concurrently with the second parser running |
| within the second thread.</p> |
| |
| <p>The same rules apply to &XercesCProjectName; DOM documents - |
| multiple document instances may be concurrently accessed from |
| different threads, but any given document instance can only be |
| accessed by one thread at a time.</p> |
| |
| <p>DOMStrings allow multiple concurrent readers. All DOMString |
| const methods are thread safe, and can be concurrently entered |
| by multiple threads. Non-const DOMString methods, such as |
| appendData(), are not thread safe and the application must |
| guarantee that no other methods (including const methods) are |
| executed concurrently with them.</p> |
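| |
| <p>The two usage patterns permitted by these rules can be sketched as follows. This is |
| an illustration only: <code>StubParser</code> is a hypothetical stand-in, not a |
| &XercesCProjectName; class, and it merely models an object whose methods must not be |
| entered by two threads at once. The function names are likewise assumptions made for |
| this example.</p> |

```cpp
#include <cstddef>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

// Hypothetical stand-in for a parser instance; it has unsynchronized
// internal state, so concurrent calls on ONE instance would be a data race.
class StubParser {
public:
    void parse(const std::string& document) {
        lastDocument_ = document;
        ++parseCount_;
    }
    std::size_t parseCount() const { return parseCount_; }

private:
    std::string lastDocument_;
    std::size_t parseCount_ = 0;
};

// Pattern 1: one parser instance per thread. Each thread touches only its
// own instance, so no external synchronization is required.
std::size_t parseWithParserPerThread(const std::vector<std::string>& docs) {
    std::vector<StubParser> parsers(docs.size());
    std::vector<std::thread> threads;
    for (std::size_t i = 0; i < docs.size(); ++i) {
        threads.emplace_back([&parsers, &docs, i] { parsers[i].parse(docs[i]); });
    }
    for (std::thread& t : threads) {
        t.join();
    }
    std::size_t total = 0;
    for (const StubParser& p : parsers) {
        total += p.parseCount();
    }
    return total;
}

// Pattern 2: one shared parser instance. The application guarantees that
// only one thread is inside a parser method at a time by holding a mutex.
std::size_t parseWithSharedParser(const std::vector<std::string>& docs) {
    StubParser shared;
    std::mutex guard;
    std::vector<std::thread> threads;
    for (std::size_t i = 0; i < docs.size(); ++i) {
        threads.emplace_back([&shared, &guard, &docs, i] {
            std::lock_guard<std::mutex> lock(guard);
            shared.parse(docs[i]);
        });
    }
    for (std::thread& t : threads) {
        t.join();
    }
    return shared.parseCount();
}
```

| <p>The first pattern usually scales better because the threads never wait on each other; |
| the second keeps a single instance alive at the cost of serializing every call.</p> |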
| </a> |
| </faq> |
| |
| |
| <faq title="Can I validate the data contained in a DOM tree?"> |
| <q>Can I validate the data contained in a DOM tree?</q> |
| <a><p>Given that I have built a DOM tree, is there a facility |
| in Xerces-C that will then validate the document contained in that |
| DOM tree? That is, can I walk the tree and perform validation |
| without having to re-parse the source document?</p> |
| |
| <p>No. This is a frequently requested feature, but at this time |
| it is not possible to feed xml data from the DOM directly back to |
| the DTD validator. The best option for now is to generate xml |
| source from the DOM and feed that back into the parser.</p> |
| </a> |
| </faq> |
| |
| |
| <faq title="Why does my multi-threaded application crash on Solaris?"> |
| <q>Why does my multi-threaded application crash on Solaris?</q> |
| <a> |
| <p>The problem occurs because the <code>throw</code> call on Solaris 2.6 |
| is not thread-safe. Sun Microsystems provides a patch that |
| solves this problem. To get the latest patch, |
| go to <jump href="http://sunsolve.sun.com">SunSolve.sun.com</jump> |
| and get the appropriate patch for your operating system. |
| For Intel machines running Solaris, you need Patch ID 104678. |
| For SPARC machines, you need Patch ID 105591.</p> |
| </a> |
| </faq> |
| |
| <faq title="Why does my application give unresolved linking errors on Solaris?"> |
| <q>Why does my application give unresolved linking errors on Solaris?</q> |
| |
| <a> |
| <p>On Solaris there are a couple of things that need to be taken care of before |
| you run your application using Xerces / XML4C. If you're |
| using the binary build of Xerces / XML4C, make sure that your OS and |
| compiler are the same versions as those on which the binary was built. |
| A mismatch can cause unresolved linking problems or compilation errors. |
| In that case, rebuild the source on your system before building your application |
| with it. If you're using ICU (which is packaged with XML4C), you need to |
| rebuild the compatible version of ICU first.</p> |
| |
| <p>Also make sure the library path is set properly and that you have the correct versions of |
| <code>gmake</code> and <code>autoconf</code> on your system.</p> |
| </a> |
| </faq> |
| |
| |
| <faq title="How do I find out what version of &XercesCProjectName; I am using?"> |
| <q>How do I find out what version of &XercesCProjectName; I am using?</q> |
| <a> |
| <p>The version string for &XercesCProjectName; is defined in one of |
| the source files. Look inside the file |
| <code>src/util/XML4CDefs.hpp</code> and find the value of the |
| static variable <code>gXML4CFullVersionStr</code>. |
| (It is usually of the form 3.0.0 or something |
| similar.) This is the version of &XercesCProjectName; you are using.</p> |
| |
| <p>If you don't have the source code, you have to find the version |
| information from the shared library name. On Windows NT/95/98, |
| right-click the DLL name &XercesCWindowsLib;.dll in the bin directory |
| and open its properties. The version information may be found on |
| the Version tab.</p> |
| |
| <p>On AIX, just look for the library name &XercesCUnixLib;.a (or |
| &XercesCUnixLib;.so on Solaris/Linux and &XercesCUnixLib;.sl on |
| HP-UX). The version number is coded in the name of the |
| library.</p> |
| </a> |
| </faq> |
| |
| <faq title="How do I uninstall &XercesCProjectName;?"> |
| <q>How do I uninstall &XercesCProjectName;?</q> |
| <a> |
| <p>&XercesCProjectName; only installs itself in a single directory and |
| does not set any registry entries. Thus, to un-install, you |
| only need to remove the directory where you installed it, and |
| all &XercesCProjectName; related files will be removed.</p> |
| </a> |
| </faq> |
| |
| <faq title="How are entity reference nodes handled in DOM?"> |
| <q>How are entity reference nodes handled in DOM?</q> |
| <a> |
| <p>If you are using the native DOM classes, the function |
| <code>setExpandEntityReferences</code> controls how entities appear in the |
| DOM tree. When <code>setExpandEntityReferences</code> is set to false (the |
| default), an occurrence of an entity reference in the XML |
| document is represented by a subtree with an |
| EntityReference node at the root whose children represent the |
| entity expansion. The entity expansion is a DOM tree |
| representing the structure of the entity, not a text |
| node containing the entity expansion as text.</p> |
| |
| <p>If <code>setExpandEntityReferences</code> is true, an entity reference in the |
| XML document is represented by only the nodes that represent the |
| entity expansion. The DOM tree will not contain any |
| EntityReference nodes.</p> |
| </a> |
| </faq> |
| |
| <faq title="What kinds of URLs are currently supported in &XercesCProjectName;?"> |
| <q>What kinds of URLs are currently supported in &XercesCProjectName;?</q> |
| <a> |
| |
| <p>The <code>XMLURL</code> class provides limited URL support. It understands |
| the <code>file://</code>, <code>http://</code>, and <code>ftp://</code> URL types, and is |
| capable of parsing them into their constituent components and normalizing |
| them. It also supports the commonly required action of combining a |
| base and relative URL into a single URL. In other words, it performs the |
| limited set of functions required by an XML parser.</p> |
| |
| <p>Another common use of a URL is to create an input stream that |
| provides access to the entity it references. The parser, as shipped, only |
| supports this functionality for URLs of the form <code>file:///</code> and |
| <code>file://localhost/</code>, i.e. only when the URL refers to a local file.</p> |
| |
| <p>You may enable support for HTTP and FTP URLs by implementing and installing |
| a NetAccessor object. When a NetAccessor object is installed, the URL class |
| will use it to create input streams for the remote entities referred to by such URLs.</p> |
| </a> |
| </faq> |
| |
| <faq title="How can I add support for URLs with HTTP/FTP protocols?"> |
| <q>How can I add support for URLs with HTTP/FTP protocols?</q> |
| <a> |
| <p>Support for the http: protocol is now included by default on all |
| platforms.</p> |
| <p>To address the need to make remote connections to resources |
| specified using additional protocols, ftp for example, Xerces-C |
| provides the <code>NetAccessor</code> interface. The header |
| file is <code>src/util/XMLNetAccessor.hpp</code>. This interface |
| allows you to plug in your own implementation of URL networking |
| code into the Xerces-C parser.</p> |
| </a> |
| </faq> |
| |
| |
| <faq title="Can I use &XercesCProjectName; to parse HTML?"> |
| <q>Can I use &XercesCProjectName; to parse HTML?</q> |
| <a> |
| <p>Yes, if it follows the XML spec rules. Most HTML, however, |
| does not follow the XML rules, and will therefore generate XML |
| well-formedness errors.</p> |
| </a> |
| </faq> |
| |
| <faq title="I keep getting an error: 'invalid UTF-8 character'. What's wrong?"> |
| <q>I keep getting an error: "invalid UTF-8 character". What's wrong?</q> |
| <a> |
| <p>Most commonly, the XML <code>encoding=</code> declaration is |
| either incorrect or missing. Without a declaration, XML defaults |
| to the UTF-8 character encoding, which is not compatible with |
| the default text file encoding on most systems.</p> |
| <p>The XML declaration should look something like this:</p> |
| <p><code><?xml version="1.0" encoding="iso-8859-1"?></code></p> |
| <p>Make sure to specify the encoding that is actually used by the file. |
| The encoding for "plain" text files depends both on the operating system |
| and the locale (country and language) in use.</p> |
| |
| <p>Another common source of problems is that some characters are not allowed in |
| XML documents, according to the XML spec. Typical |
| disallowed characters are control characters, even if you |
| escape them using the Character Reference form. See the |
| <jump href="http://www.w3.org/TR/REC-xml#charsets">XML spec</jump>, |
| sections 2.2 and 4.1 for details. If the parser is |
| generating an <code>Invalid character (Unicode: 0x???)</code> error, |
| it is very likely that there's a |
| character in there that you can't see. You can generally use |
| a UNIX command like "od -hc" to find it.</p> |
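| |
| <p>The search for an invisible character can also be sketched in code. The following |
| C++ function is an illustrative helper, not a &XercesCProjectName; API (the name |
| <code>findDisallowedControlByte</code> is hypothetical). It assumes the data is in an |
| ASCII-compatible encoding such as UTF-8 or ISO-8859-1 and looks only for control bytes |
| below 0x20, of which XML 1.0 permits just tab, line feed, and carriage return.</p> |

```cpp
#include <cstddef>
#include <string>

// Return the offset of the first control byte that XML 1.0 does not allow
// (any byte below 0x20 other than tab, line feed, or carriage return).
// Returns std::string::npos when no such byte is present.
// Assumption: the input uses an ASCII-compatible encoding; other disallowed
// code points (for example U+FFFE) are outside the scope of this sketch.
std::size_t findDisallowedControlByte(const std::string& data) {
    for (std::size_t i = 0; i < data.size(); ++i) {
        const unsigned char byte = static_cast<unsigned char>(data[i]);
        const bool allowedWhitespace =
            (byte == 0x09 || byte == 0x0A || byte == 0x0D);
        if (byte < 0x20 && !allowedWhitespace) {
            return i;
        }
    }
    return std::string::npos;
}
```

| <p>The returned offset can then be used to inspect the surrounding bytes in an editor or |
| a hex dump.</p> |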
| </a> |
| </faq> |
| |
| <faq title="What encodings are supported by Xerces-C / XML4C?"> |
| <q>What encodings are supported by Xerces-C / XML4C?</q> |
| <a> |
| |
| <p>Xerces-C has intrinsic support for ASCII, UTF-8, UTF-16 |
| (big/little endian), UCS4 (big/little endian), the EBCDIC code pages IBM037 and |
| IBM1140, ISO-8859-1 (aka Latin1), and Windows-1252. This means that it can parse |
| input XML files in any of these encodings.</p> |
| |
| <p>XML4C - the version of Xerces-C available from IBM - extends |
| this set to include the encodings listed in the table below.</p> |
| |
| <table> |
| <tr><td><em>Common Name</em></td><td><em>Use this name in XML</em></td></tr> |
| <tr><td>8 bit Unicode</td> <td>UTF-8</td></tr> |
| <tr><td>ISO Latin 1</td> <td>ISO-8859-1</td></tr> |
| <tr><td>ISO Latin 2</td> <td>ISO-8859-2</td></tr> |
| <tr><td>ISO Latin 3</td> <td>ISO-8859-3</td></tr> |
| <tr><td>ISO Latin 4</td> <td>ISO-8859-4</td></tr> |
| <tr><td>ISO Latin Cyrillic</td> <td>ISO-8859-5</td></tr> |
| <tr><td>ISO Latin Arabic</td> <td>ISO-8859-6</td></tr> |
| <tr><td>ISO Latin Greek</td> <td>ISO-8859-7</td></tr> |
| <tr><td>ISO Latin Hebrew</td> <td>ISO-8859-8</td></tr> |
| <tr><td>ISO Latin 5</td> <td>ISO-8859-9</td></tr> |
| <tr><td>EBCDIC US</td> <td>ebcdic-cp-us</td></tr> |
| <tr><td>EBCDIC with Euro symbol</td> <td>ibm1140</td></tr> |
| <tr><td>Chinese, PRC</td> <td>gb2312</td></tr> |
| <tr><td>Chinese, Big5</td> <td>Big5</td></tr> |
| <tr><td>Cyrillic</td> <td>koi8-r</td></tr> |
| <tr><td>Japanese, Shift JIS</td> <td>Shift_JIS</td></tr> |
| <tr><td>Korean, Extended UNIX code</td> <td>euc-kr</td></tr> |
| </table> |
| |
| <p>Some implementations or ports of Xerces-C provide support for |
| additional encodings. The exact set will depend on the supplier |
| of the parser and on the character set transcoding services in use.</p> |
| </a> |
| </faq> |
| |
| <faq title="What character encoding should I use when creating XML documents?"> |
| <q>What character encoding should I use when creating XML documents?</q> |
| <a> |
| |
| <p>The best choice in most cases is either utf-8 or utf-16. |
| Advantages of these encodings include:</p> |
| |
| <ul> |
| <li>The best portability. These encodings are more widely |
| supported by XML processors than any others, meaning that |
| your documents will have the best possible chance of being |
| read correctly, no matter where they end up. </li> |
| |
| <li>Full international character support. Both utf-8 and |
| utf-16 cover the full Unicode character set, which |
| includes all of the characters from all major national, |
| international and industry character sets. </li> |
| |
| <li>Efficiency. utf-8 has the smaller storage requirements |
| for documents that are primarily composed of characters |
| from the Latin alphabet. utf-16 is more efficient for |
| encoding Asian languages. Both encodings cover |
| all languages without loss.</li> |
| </ul> |
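| |
| <p>The storage trade-off in the last point can be made concrete with a small |
| calculation. The C++ sketch below counts the bytes each encoding needs for a sequence |
| of Unicode code points. The names <code>utf8Length</code>, <code>utf16Length</code>, |
| <code>EncodedSize</code>, and <code>measure</code> are hypothetical, the input is assumed |
| to contain only valid code points, and byte order marks are ignored.</p> |

```cpp
#include <cstddef>
#include <string>

// Bytes needed to encode one Unicode code point in UTF-8.
std::size_t utf8Length(char32_t cp) {
    if (cp < 0x80) return 1;
    if (cp < 0x800) return 2;
    if (cp < 0x10000) return 3;
    return 4;
}

// Bytes needed in UTF-16: 2 for the Basic Multilingual Plane,
// 4 (a surrogate pair) for code points above U+FFFF.
std::size_t utf16Length(char32_t cp) {
    return cp < 0x10000 ? 2 : 4;
}

// Total encoded size of a text in both encodings.
struct EncodedSize {
    std::size_t utf8;
    std::size_t utf16;
};

EncodedSize measure(const std::u32string& text) {
    EncodedSize size{0, 0};
    for (char32_t cp : text) {
        size.utf8 += utf8Length(cp);
        size.utf16 += utf16Length(cp);
    }
    return size;
}
```

| <p>With this sketch, six Latin letters need 6 bytes in utf-8 and 12 in utf-16, whereas |
| two CJK ideographs such as U+89E3 and U+6790 need 6 bytes in utf-8 and 4 in utf-16.</p> |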
| |
| <p>The only drawback of utf-8 or utf-16 is that they are not |
| the native text file format for most systems, meaning that |
| common text file editors and viewers cannot be used directly.</p> |
| |
| <p>A second choice of encoding would be any of the others listed in |
| the table above. This works best when the xml encoding is the same |
| as the default system encoding on the machine where the |
| XML document is being prepared, because the document will then |
| display correctly as a plain text file. For UNIX systems |
| in countries speaking Western European languages, the encoding |
| will usually be iso-8859-1.</p> |
| |
| <p>The versions of Xerces, both C and Java, distributed |
| by IBM as XML4C and XML4J, include all of the encodings |
| listed in the above table, on all platforms. </p> |
| |
| <p>A word of caution for Windows users: The default character set |
| on Windows systems is windows-1252, not iso-8859-1. While Xerces-C |
| does recognize this Windows encoding, it is a poor choice for portable |
| XML data because it is not widely recognized by other XML processing |
| tools. If you are using a Windows based editing tool to generate |
| XML, check which character set it generates, and make sure that the |
| resulting XML specifies the correct name in the encoding="..." declaration.</p> |
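| |
| <p>One rough way to check what an editor produced is to look for bytes in the range |
| 0x80 to 0x9F. In iso-8859-1 these are C1 control characters that rarely occur in text, |
| whereas windows-1252 assigns printable characters such as the Euro sign (0x80) and |
| curly quotes (0x93, 0x94) to most of them. The C++ sketch below is a heuristic only: the |
| name <code>countC1RangeBytes</code> is hypothetical, and the check assumes single-byte |
| text (it is not meaningful for utf-8, whose continuation bytes overlap this range).</p> |

```cpp
#include <cstddef>
#include <string>

// Count the bytes that fall in 0x80-0x9F. A non-zero result in a file
// labelled iso-8859-1 suggests the text was really written as windows-1252.
// Heuristic for single-byte encodings only; not a Xerces-C API.
std::size_t countC1RangeBytes(const std::string& data) {
    std::size_t count = 0;
    for (char c : data) {
        const unsigned char byte = static_cast<unsigned char>(c);
        if (byte >= 0x80 && byte <= 0x9F) {
            ++count;
        }
    }
    return count;
}
```

| <p>A result of zero does not prove that the file is iso-8859-1; it only means that none |
| of the characters that differ between the two encodings are present.</p> |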
| </a> |
| </faq> |
| |
| <faq title="I find memory leaks in Xerces-C / XML4C. How do I eliminate them?"> |
| <q>I find memory leaks in Xerces-C / XML4C. How do I eliminate them?</q> |
| <a> |
| |
| <p>The "leaks" reported by leak detectors or heap-analysis tools |
| aren't really leaks in most applications, in that the memory usage does not grow over |
| time as the XML parser is used and re-used.</p> |
| |
| <p>What you are seeing as leaks is actually lazily evaluated data allocated into |
| static variables. It is released when the application ends. You can also call |
| <code>XMLPlatformUtils::Terminate()</code> to release all the lazily allocated |
| variables before you exit your program.</p> |
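| |
| <p>The pattern that leak detectors flag can be illustrated with a small, self-contained |
| C++ sketch. It does not use or reproduce the parser's internals: the namespace |
| <code>demo</code> and every name in it are hypothetical, and the code is deliberately |
| single-threaded. It shows a static pointer that is allocated once on first use, stays |
| constant in size no matter how often it is used, and is freed by an explicit cleanup |
| call.</p> |

```cpp
#include <string>

namespace demo {

// Lazily created, process-wide data; a stand-in for a library's internal
// statics. All names here are hypothetical and for illustration only.
static std::string* gMessageTable = nullptr;
static int gAllocations = 0;

// Allocate the table on first use and reuse it on every later call.
// Not thread-safe; a real library would guard this initialization.
const std::string& messageTable() {
    if (gMessageTable == nullptr) {
        gMessageTable = new std::string("message catalog");
        ++gAllocations;
    }
    return *gMessageTable;
}

// Release the lazily allocated data, similar in spirit to calling a
// library-wide termination function before the program exits.
void releaseStatics() {
    delete gMessageTable;
    gMessageTable = nullptr;
}

int allocationCount() { return gAllocations; }
bool isAllocated() { return gMessageTable != nullptr; }

}  // namespace demo
```

| <p>Without the call to <code>demo::releaseStatics()</code>, a heap-analysis tool would |
| report the single allocation as still reachable at exit even though it never grows.</p> |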
| </a> |
| </faq> |
| |
| |
| <faq title="Is EBCDIC supported?"> |
| <q>Is EBCDIC supported?</q> |
| |
| <a> |
| <p>Yes, &XercesCName; supports EBCDIC. When creating EBCDIC encoded XML data, |
| the preferred encoding is ibm1140. Also supported is ibm037 (and its alternate name, |
| ebcdic-cp-us); this encoding is almost the same as ibm1140, but it lacks the Euro |
| symbol.</p> |
| |
| <p>These two encodings, ibm1140 and ibm037, are available on both Xerces-C and |
| IBM XML4C, on all platforms. </p> |
| |
| <p>On IBM System 390, XML4C also supports two alternative forms, ibm037-s390 |
| and ibm1140-s390. These are similar to the base ibm037 and ibm1140 encodings, |
| but with alternate mappings of the EBCDIC new-line character, which allows |
| them to appear as normal text files on System 390s. These encodings are not |
| supported on other platforms, and should not be used for portable data.</p> |
| |
| <p>XML4C on System 390 and AS/400 also provides additional EBCDIC encodings, including |
| those for the character sets of different countries. The exact set supported |
| will be platform dependent, and these encodings are not recommended for |
| portable XML data. </p> |
| </a> |
| </faq> |
| |
| </faqs> |
| |