| <?xml version="1.0" standalone="no"?> |
| <!DOCTYPE faqs SYSTEM "sbk:/style/dtd/faqs.dtd"> |
| |
| <faqs title="Parsing with &XercesCName;"> |
| <faq title="Why does my application crash on AIX when I run it under a |
| multi-threaded environment?"> |
| |
| <q>Why does my application crash on AIX when I run it under a |
| multi-threaded environment?</q> |
| |
| <a> |
| <p>AIX maintains two kinds of libraries on the system, |
| thread-safe and non-thread-safe. Multi-threaded libraries on |
| AIX follow a different naming convention: usually the |
| multi-threaded library names end with "_r". For |
| example, libc.a is single-threaded whereas libc_r.a is |
| multi-threaded.</p> |
| |
| <p>To make your multi-threaded application run on AIX, you |
| MUST ensure that you do not have a 'system library path' in |
| your LIBPATH environment variable when you run the |
| application. The appropriate libraries (threaded or |
| non-threaded) are automatically picked up at runtime. An |
| application built for multi-threaded operation usually |
| crashes when it picks up the non-thread-safe version of the |
| system libraries. For example, LIBPATH can simply be set |
| as:</p> |
| |
| <source>LIBPATH=$HOME/&lt;&XercesCProjectName;&gt;/lib</source> |
| |
| <p>Where &lt;&XercesCProjectName;&gt; is the directory where |
| &XercesCProjectName; resides.</p> |
| |
| <p>If for any reason, unrelated to &XercesCProjectName;, you need to |
| keep a 'system library path' in your LIBPATH environment |
| variable, you must make sure that you have placed the |
| thread-safe path before you specify the normal system |
| path. For example, you must place <ref>/usr/lib/threads</ref> before |
| <ref>/usr/lib</ref> in your LIBPATH variable. That is, your |
| LIBPATH may look like this:</p> |
| |
| <source>export LIBPATH=$HOME/&lt;&XercesCProjectName;&gt;/lib:/usr/lib/threads:/usr/lib</source> |
| |
| <p>Where /usr/lib is where your system libraries are.</p> |
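| |
| <p>The ordering rule above can be checked mechanically. The following is a |
| minimal sketch, not part of &XercesCProjectName;; the function names |
| <code>splitSearchPath</code> and <code>threadDirComesFirst</code> are |
| hypothetical, and the directories are passed in rather than assumed.</p> |
| |
```cpp
#include <cstddef>
#include <string>
#include <vector>

// Splits a colon-separated search path into its entries.
std::vector<std::string> splitSearchPath(const std::string& value) {
    std::vector<std::string> entries;
    std::string::size_type start = 0;
    while (true) {
        const std::string::size_type colon = value.find(':', start);
        if (colon == std::string::npos) {
            entries.push_back(value.substr(start));
            return entries;
        }
        entries.push_back(value.substr(start, colon - start));
        start = colon + 1;
    }
}

// Returns true when the ordering rule holds: either systemDir is not
// listed, or threadDir appears earlier in the path than systemDir.
bool threadDirComesFirst(const std::string& libpath,
                         const std::string& threadDir,
                         const std::string& systemDir) {
    const std::vector<std::string> entries = splitSearchPath(libpath);
    for (std::size_t i = 0; i < entries.size(); ++i) {
        if (entries[i] == threadDir) return true;   // thread-safe dir seen first
        if (entries[i] == systemDir) return false;  // system dir seen first
    }
    return true;  // system dir not listed at all
}
```
| |
| <p>An application could pass the value of its LIBPATH environment variable |
| (for example, as read with <code>std::getenv</code>) together with |
| <code>/usr/lib/threads</code> and <code>/usr/lib</code>, and warn at |
| startup when the result is false.</p> |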
| </a> |
| </faq> |
| |
| <faq title="What compilers are being used on the supported platforms?"> |
| |
| <q>What compilers are being used on the supported platforms?</q> |
| |
| <a> |
| <p>&XercesCProjectName; has been built on the following platforms with these |
| compilers:</p> |
| |
| <table> |
| <tr><td><em>Operating System</em></td><td><em>Compiler</em></td></tr> |
| <tr><td>Windows NT SP5/98</td><td>MSVC 6.0</td></tr> |
| <tr><td>Red Hat Linux 6.0</td><td>gcc</td></tr> |
| <tr><td>AIX 4.1.4 and higher</td><td>xlC 3.1</td></tr> |
| <tr><td>Solaris 2.6</td><td>CC version 4.2</td></tr> |
| <tr><td>HP-UX B10.2</td><td>aCC and CC</td></tr> |
| <tr><td>HP-UX B11</td><td>aCC and CC</td></tr> |
| </table> |
| </a> |
| </faq> |
| |
| <faq title="I cannot run my sample applications. What is wrong?"> |
| |
| <q>I cannot run my sample applications. What is wrong?</q> |
| <a> |
| <p>In order to run an application built using &XercesCProjectName; you |
| must set up your path and library search path properly. In the |
| standalone version from Apache, you must have the &XercesCName; runtime library |
| available from your path settings. On Windows this library is called |
| <code>&XercesCWindowsLib;.dll</code> which must be available from your <code>PATH</code> |
| settings. On UNIX platforms the library is called <code>&XercesCUnixLib;.so</code> |
| (or <code>.a</code> or <code>.sl</code>) which must be available from your |
| <code>LD_LIBRARY_PATH</code> (or <code>SHLIB_PATH</code> or <code>LIBPATH</code>) |
| environment variable.</p> |
| |
| <p>Thus, if you installed your binaries under <code>$HOME/fastxmlparser</code>, |
| you need to point your library path to that directory. |
| </p> |
| |
| <source>export LIBPATH=$LIBPATH:$HOME/fastxmlparser/lib # (AIX) |
| export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$HOME/fastxmlparser/lib # (Solaris, Linux) |
| export SHLIB_PATH=$SHLIB_PATH:$HOME/fastxmlparser/lib # (HP-UX)</source> |
| |
| <p>If you are using the enhanced version of this parser from IBM, you will need |
| two additional libraries. In the Windows build these are <code>icuuc.dll</code> and |
| <code>icudata.dll</code>, which must be available from your <code>PATH</code> settings. On UNIX, |
| these libraries are called <code>libicu-uc.so</code> and <code>libicudata.so</code> |
| (or <code>.sl</code> for HP-UX or <code>.a</code> for AIX), which must be available from |
| your library search path.</p> |
| </a> |
| </faq> |
| |
| <faq title="I just built my own application using the &XercesCProjectName; parser. Why does it |
| crash?"> |
| |
| <q>I just built my own application using the &XercesCProjectName; parser. Why does it |
| crash?</q> |
| <a> |
| <p>In order to work with the &XercesCProjectName; parser, you have to |
| first initialize the XML subsystem. The most common mistake is |
| to forget this initialization. Before you make any calls to |
| &XercesCProjectName; APIs, you must call |
| <code>XMLPlatformUtils::Initialize()</code>:</p> |
| |
| <source>#include &lt;util/PlatformUtils.hpp&gt; |
| |
| try { |
|     XMLPlatformUtils::Initialize(); |
| } |
| catch (const XMLException&amp; toCatch) { |
|     // Do your failure processing here |
| }</source> |
| |
| <p>This initializes the &XercesCProjectName; system and sets its |
| internal variables. Note that you must include the |
| <code>util/PlatformUtils.hpp</code> file for this to work.</p> |
| </a> |
| </faq> |
| |
| <faq title="Is &XercesCProjectName; thread-safe?"> |
| |
| <q>Is &XercesCProjectName; thread-safe?</q> |
| |
| <a> |
| <p>This is not a question that has a simple yes/no answer. Here are |
| the rules for using &XercesCProjectName; in a multi-threaded environment:</p> |
| |
| <p>Within an address space, an instance of the parser may be used |
| without restriction from a single thread, or an instance of the |
| parser can be accessed from multiple threads, provided the |
| application guarantees that only one thread has entered a method |
| of the parser at any one time.</p> |
| |
| <p>When two or more parser instances exist in a process, the |
| instances can be used concurrently, and without external |
| synchronization. That is, in an application containing two |
| parsers and two threads, one parser can be running within the |
| first thread concurrently with the second parser running |
| within the second thread.</p> |
| |
| <p>The same rules apply to &XercesCProjectName; DOM documents - |
| multiple document instances may be concurrently accessed from |
| different threads, but any given document instance can only be |
| accessed by one thread at a time.</p> |
| |
| <p>DOMStrings allow multiple concurrent readers. All DOMString |
| const methods are thread safe, and can be concurrently entered |
| by multiple threads. Non-const DOMString methods, such as |
| appendData(), are not thread safe and the application must |
| guarantee that no other methods (including const methods) are |
| executed concurrently with them.</p> |
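| |
| <p>The first two rules can be sketched as follows. <code>ToyParser</code> is a |
| hypothetical stand-in with unsynchronized internal state, not a |
| &XercesCProjectName; class; the sketch only illustrates where the application |
| must supply locking and where it need not.</p> |
| |
```cpp
#include <cstddef>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

// Hypothetical stand-in for a parser instance; NOT a Xerces-C class.
// Its state is unsynchronized, so concurrent calls on one instance race.
class ToyParser {
public:
    void parse(const std::string& doc) { charsSeen_ += doc.size(); }
    std::size_t charsSeen() const { return charsSeen_; }
private:
    std::size_t charsSeen_ = 0;
};

// One instance shared by several threads: the application guarantees
// that only one thread is inside parse() at any one time.
std::size_t parseShared(const std::vector<std::string>& docs) {
    ToyParser parser;
    std::mutex guard;
    std::vector<std::thread> workers;
    for (const std::string& doc : docs) {
        workers.emplace_back([&parser, &guard, &doc] {
            std::lock_guard<std::mutex> lock(guard);
            parser.parse(doc);
        });
    }
    for (std::thread& t : workers) t.join();
    return parser.charsSeen();
}

// One instance per thread: no external synchronization is needed.
std::size_t parsePerThread(const std::vector<std::string>& docs) {
    std::vector<ToyParser> parsers(docs.size());
    std::vector<std::thread> workers;
    for (std::size_t i = 0; i < docs.size(); ++i) {
        workers.emplace_back([&parsers, &docs, i] { parsers[i].parse(docs[i]); });
    }
    for (std::thread& t : workers) t.join();
    std::size_t total = 0;
    for (const ToyParser& p : parsers) total += p.charsSeen();
    return total;
}
```
| |
| <p>Both functions produce the same total; the per-thread variant simply avoids |
| the lock. On some toolchains, building this sketch requires enabling thread |
| support (for example with <code>-pthread</code>).</p> |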
| </a> |
| </faq> |
| |
| <faq title="Why does my multi-threaded application crash on Solaris?"> |
| <q>Why does my multi-threaded application crash on Solaris?</q> |
| <a> |
| <p>The problem appears because the throw call on Solaris 2.6 |
| is not thread-safe. Sun Microsystems provides a patch that |
| solves this problem. To get the latest patch, go to |
| <jump href="http://sunsolve.sun.com">SunSolve.sun.com</jump> |
| and get the appropriate patch for your operating system. |
| For Intel machines running Solaris, you need Patch ID 104678. |
| For SPARC machines, you need Patch ID 105591.</p> |
| </a> |
| </faq> |
| |
| <faq title="How do I find out what version of &XercesCProjectName; I am using?"> |
| <q>How do I find out what version of &XercesCProjectName; I am using?</q> |
| <a> |
| <p>The version string for &XercesCProjectName; is defined in one of |
| the source files. Look inside the file |
| <code>src/util/XML4CDefs.hpp</code> and find out what the |
| static variable <code>gXML4CFullVersionStr</code> is defined |
| to be. (It is usually of the form 3.0.0 or something |
| similar.) This is the version of &XercesCProjectName; you are using.</p> |
| |
| <p>If you don't have the source code, you have to find the version |
| information from the shared library name. On Windows NT/95/98 |
| right click on the DLL name &XercesCWindowsLib;.dll in the bin directory |
| and look up properties. The version information may be found on |
| the Version tab.</p> |
| |
| <p>On AIX, just look for the library name &XercesCUnixLib;.a (or |
| &XercesCUnixLib;.so on Solaris/Linux and &XercesCUnixLib;.sl on |
| HP-UX). The version number is coded in the name of the |
| library.</p> |
| </a> |
| </faq> |
| |
| <faq title="How do I uninstall &XercesCProjectName;?"> |
| <q>How do I uninstall &XercesCProjectName;?</q> |
| <a> |
| <p>&XercesCProjectName; only installs itself in a single directory and |
| does not set any registry entries. Thus, to uninstall, you |
| only need to remove the directory where you installed it, and |
| all &XercesCProjectName; related files will be removed.</p> |
| </a> |
| </faq> |
| |
| <faq title="How are entity reference nodes handled in DOM?"> |
| <q>How are entity reference nodes handled in DOM?</q> |
| <a> |
| <p>If you are using the native DOM classes, the function |
| <code>setExpandEntityReferences</code> controls how entities appear in the |
| DOM tree. When <code>setExpandEntityReferences</code> is set to false (the |
| default), an occurrence of an entity reference in the XML |
| document will be represented by a subtree with an |
| EntityReference node at the root whose children represent the |
| entity expansion. The entity expansion will be a DOM tree |
| representing the structure of the entity expansion, not a text |
| node containing the entity expansion as text.</p> |
| |
| <p>If setExpandEntityReferences is true, an entity reference in the |
| XML document is represented by only the nodes that represent the |
| entity expansion. The DOM tree will not contain any |
| EntityReference nodes.</p> |
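| |
| <p>The two tree shapes can be pictured with a toy model. The <code>Node</code> |
| struct and <code>buildParagraph</code> function below are hypothetical and are |
| not the &XercesCProjectName; DOM API; the sketch assumes a document fragment |
| <code>&lt;p&gt;&amp;who;&lt;/p&gt;</code> in which the entity <code>who</code> |
| expands to the text "World".</p> |
| |
```cpp
#include <string>
#include <vector>

// Toy node type for illustration only; NOT the Xerces-C DOM API.
struct Node {
    std::string kind;   // "Element", "Text" or "EntityReference"
    std::string value;  // tag name, text content, or entity name
    std::vector<Node> children;
};

// Builds the subtree for <p>&who;</p>, where the entity "who" expands
// to the text "World", under either setting of the flag.
Node buildParagraph(bool expandEntityReferences) {
    Node expansion{"Text", "World", {}};
    Node para{"Element", "p", {}};
    if (expandEntityReferences) {
        // Only the nodes of the expansion appear in the tree.
        para.children.push_back(expansion);
    } else {
        // An EntityReference node is the root of the expansion subtree.
        Node ref{"EntityReference", "who", {expansion}};
        para.children.push_back(ref);
    }
    return para;
}
```
| |
| <p>With the flag set to false, the paragraph has one EntityReference child |
| whose own child is the text node; with the flag set to true, the text node is |
| a direct child of the paragraph.</p> |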
| </a> |
| </faq> |
| |
| <faq title="What kinds of URLs are currently supported in &XercesCProjectName;?"> |
| <q>What kinds of URLs are currently supported in &XercesCProjectName;?</q> |
| <a> |
| |
| <p>The <code>XMLURL</code> class provides limited URL support. It understands |
| the <code>file://</code>, <code>http://</code>, and <code>ftp://</code> URL types, and is |
| capable of parsing them into their constituent components and normalizing |
| them. It also supports the commonly required action of combining a |
| base and relative URL into a single URL. In other words, it performs the |
| limited set of functions required by an XML parser.</p> |
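| |
| <p>To make "parsing into components" and "combining a base and relative URL" |
| concrete, here is a deliberately simplified sketch. It is not the |
| <code>XMLURL</code> class; the names <code>UrlParts</code>, |
| <code>splitUrl</code> and <code>resolve</code> are hypothetical, and the code |
| ignores "..", queries, fragments and ports.</p> |
| |
```cpp
#include <string>

// Illustrative only; NOT the XMLURL class and far less complete.
struct UrlParts {
    std::string scheme;  // e.g. "http"
    std::string host;    // may be empty, as in file:///tmp/a.xml
    std::string path;    // always starts with '/'
};

// Splits "scheme://host/path". Returns false if "://" is missing.
bool splitUrl(const std::string& url, UrlParts& out) {
    const std::string::size_type sep = url.find("://");
    if (sep == std::string::npos) return false;
    out.scheme = url.substr(0, sep);
    const std::string::size_type hostStart = sep + 3;
    const std::string::size_type pathStart = url.find('/', hostStart);
    if (pathStart == std::string::npos) {
        out.host = url.substr(hostStart);
        out.path = "/";
    } else {
        out.host = url.substr(hostStart, pathStart - hostStart);
        out.path = url.substr(pathStart);
    }
    return true;
}

// Combines a base URL with a relative reference. Handles only absolute
// URLs, absolute paths and plain relative paths.
std::string resolve(const std::string& base, const std::string& relative) {
    UrlParts parts;
    if (splitUrl(relative, parts)) return relative;  // already absolute
    if (!splitUrl(base, parts)) return relative;     // unusable base
    if (!relative.empty() && relative[0] == '/') {
        parts.path = relative;  // absolute path replaces the base path
    } else {
        // Replace everything after the last '/' of the base path.
        parts.path = parts.path.substr(0, parts.path.rfind('/') + 1) + relative;
    }
    return parts.scheme + "://" + parts.host + parts.path;
}
```
| |
| <p>For instance, resolving <code>chapter1.xml</code> against |
| <code>http://example.org/docs/book.xml</code> replaces the last path segment of |
| the base, which is the typical case when a document refers to an external |
| entity by a relative system identifier.</p> |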
| |
| <p>Another common use of a URL is to create an input stream that |
| provides access to the entity referenced. The parser, as shipped, only |
| supports this functionality on URLs of the form <code>file:///</code> and |
| <code>file://localhost/</code>, i.e. only when the URL refers to a local file.</p> |
| |
| <p>You may enable support for HTTP and FTP URLs by implementing and installing |
| a NetAccessor object. When a NetAccessor object is installed, the URL class |
| will use it to create input streams for the remote entities referred to by such URLs.</p> |
| |
| |
| </a> |
| </faq> |
| |
| <faq title="How can I add support for URLs with HTTP/FTP protocols?"> |
| <q>How can I add support for URLs with HTTP/FTP protocols?</q> |
| <a> |
| <p>To address the need to make remote connections to resources |
| specified using other protocols such as HTTP and FTP, Xerces-C |
| now provides the <code>NetAccessor</code> interface. The header |
| file is <code>src/util/XMLNetAccessor.hpp</code>. This interface |
| allows you to plug your own implementation of URL networking |
| code into the Xerces-C parser.</p> |
| |
| <p>One such implementation <em>(tested minimally under WinNT |
| only)</em> is already provided in &XercesCName; source code |
| drop, using <jump href="http://www.w3.org/Library/">W3C's Libwww |
| library</jump>. Libwww is available for free and has been ported |
| to various platforms. Click <jump |
| href="build.html#BuildUsingLibwww">here</jump> to read how you |
| can rebuild Xerces-C binaries with this implementation.</p> |
| |
| <p>Some more notes about the NetAccessor implementation using |
| Libwww:</p> |
| |
| <ul> |
| <li>This implementation only supports HTTP and does not return |
| adequate error messages when connections cannot be made to the |
| remote resources. It does, however, illustrate how you can add support |
| for HTTP and FTP URLs.</li> |
| |
| <li>The Xerces-C team will <em>NOT</em> be able to address any |
| questions related to how things work in Libwww. You can get some |
| help with Libwww by subscribing to the &lt;<jump |
| href="mailto:www-lib-request@w3.org?subject=subscribe">www-lib@w3.org</jump>&gt; |
| public mailing list.</li> |
| |
| <li>However, we will welcome any feedback on the design of the |
| NetAccessor interface. Please send all such feedback to <jump |
| href="mailto:xerces-dev@xml.apache.org">xerces-dev@xml.apache.org</jump>.</li> |
| |
| <li>You do not have to recompile &XercesCName; to plug in your |
| NetAccessor implementation. You can simply point the static |
| pointer variable <code>XMLPlatformUtils::fgNetAccessor</code> to |
| an instance of your NetAccessor implementation. Please refer to |
| the files <code>src/util/PlatformUtils.cpp</code> and |
| <code>src/util/Platforms/Win32/Win32PlatformUtils.cpp</code> to |
| see how we have done this simple illustrative |
| implementation.</li> |
| |
| </ul> |
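| |
| <p>The static-pointer plug-in mechanism described in the last item can be |
| sketched as follows. All names here (<code>ToyNetAccessor</code>, |
| <code>ToyPlatform</code>, <code>openRemote</code>, <code>CannedAccessor</code>) |
| are hypothetical stand-ins; the real interface is the one declared in |
| <code>src/util/XMLNetAccessor.hpp</code>, and its methods may differ from this |
| sketch.</p> |
| |
```cpp
#include <string>

// Hypothetical interface for illustration; the real NetAccessor
// interface is declared in src/util/XMLNetAccessor.hpp.
class ToyNetAccessor {
public:
    virtual ~ToyNetAccessor() {}
    // Returns the content behind the URL (a real accessor would
    // return an input stream instead of a string).
    virtual std::string fetch(const std::string& url) = 0;
};

// Stand-in for the library-side static pointer; null means that no
// accessor has been installed.
struct ToyPlatform {
    static ToyNetAccessor* fgNetAccessor;
};
ToyNetAccessor* ToyPlatform::fgNetAccessor = nullptr;

// Library-side lookup: remote URLs work only when an accessor is installed.
bool openRemote(const std::string& url, std::string& content) {
    if (ToyPlatform::fgNetAccessor == nullptr) return false;
    content = ToyPlatform::fgNetAccessor->fetch(url);
    return true;
}

// Application-side implementation; a real one would do network I/O.
class CannedAccessor : public ToyNetAccessor {
public:
    std::string fetch(const std::string& url) override {
        return "<doc src='" + url + "'/>";
    }
};
```
| |
| <p>The application creates its accessor once, assigns its address to the static |
| pointer before parsing, and keeps the object alive for as long as the pointer |
| refers to it. No library code has to be recompiled for this.</p> |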
| </a> |
| </faq> |
| |
| |
| <faq title="Can I use &XercesCProjectName; to parse HTML?"> |
| <q>Can I use &XercesCProjectName; to parse HTML?</q> |
| <a> |
| <p>Yes, if it follows the XML spec rules. Most HTML, however, |
| does not follow the XML rules, and will therefore generate XML |
| well-formedness errors.</p> |
| </a> |
| </faq> |
| |
| <faq title="I keep getting an error: &quot;invalid UTF-8 character&quot;. What's wrong?"> |
| <q>I keep getting an error: "invalid UTF-8 character". What's wrong?</q> |
| <a> |
| <p>There are many Unicode characters that are not allowed in |
| your XML document, according to the XML spec. Typical |
| disallowed characters are control characters, even if you |
| escape them using the Character Reference form. See the XML |
| spec, sections 2.2 and 4.1, for details. If the parser is |
| generating this error, it is very likely that there's a |
| character in the document that you can't see. You can generally use |
| a UNIX command like "od -hc" to find it.</p> |
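| |
| <p>A small scanner can locate such characters as well. The sketch below uses a |
| hypothetical helper, <code>findDisallowedControls</code>, and checks only for |
| the control characters excluded by section 2.2 of the XML 1.0 spec (bytes |
| below 0x20 other than tab, line feed and carriage return). It does not |
| validate multi-byte UTF-8 sequences.</p> |
| |
```cpp
#include <cstddef>
#include <string>
#include <vector>

// Returns the byte offsets of control characters that XML 1.0 does not
// allow: anything below 0x20 except tab (0x09), LF (0x0A) and CR (0x0D).
// This checks single bytes only; it does not validate UTF-8 sequences.
std::vector<std::size_t> findDisallowedControls(const std::string& data) {
    std::vector<std::size_t> offsets;
    for (std::size_t i = 0; i < data.size(); ++i) {
        const unsigned char byte = static_cast<unsigned char>(data[i]);
        if (byte < 0x20 && byte != 0x09 && byte != 0x0A && byte != 0x0D) {
            offsets.push_back(i);
        }
    }
    return offsets;
}
```
| |
| <p>Reading the file in binary mode into a <code>std::string</code> and passing |
| it to this function yields the offsets to inspect in an editor or hex |
| dump.</p> |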
| |
| <p>Another reason for this error is that your file is in some |
| encoding other than UTF or ASCII, but it has no encoding="..." |
| declaration to tell the parser what its real encoding is.</p> |
| </a> |
| </faq> |
| |
| <faq title="What encodings are supported by Xerces-C / XML4C?"> |
| <q>What encodings are supported by Xerces-C / XML4C?</q> |
| <a> |
| |
| <p>Xerces-C has intrinsic support for ASCII, UTF-8, UTF-16 |
| (Big/Little Endian), UCS4 (Big/Little Endian), the EBCDIC code pages IBM037 and |
| IBM1140, ISO-8859-1 (aka Latin1) and Windows-1252. This means that it can parse |
| input XML files in any of these encodings.</p> |
| |
| <p>XML4C - the version of Xerces-C available from IBM - extends |
| this set to include the encodings listed in the table below.</p> |
| |
| <table> |
| <tr><td><em>Common Name</em></td><td><em>Use this name in XML</em></td></tr> |
| <tr><td>8 bit Unicode</td> <td>UTF-8</td></tr> |
| <tr><td>ISO Latin 1</td> <td>ISO-8859-1</td></tr> |
| <tr><td>ISO Latin 2</td> <td>ISO-8859-2</td></tr> |
| <tr><td>ISO Latin 3</td> <td>ISO-8859-3</td></tr> |
| <tr><td>ISO Latin 4</td> <td>ISO-8859-4</td></tr> |
| <tr><td>ISO Latin Cyrillic</td> <td>ISO-8859-5</td></tr> |
| <tr><td>ISO Latin Arabic</td> <td>ISO-8859-6</td></tr> |
| <tr><td>ISO Latin Greek</td> <td>ISO-8859-7</td></tr> |
| <tr><td>ISO Latin Hebrew</td> <td>ISO-8859-8</td></tr> |
| <tr><td>ISO Latin 5</td> <td>ISO-8859-9</td></tr> |
| <tr><td>EBCDIC US</td> <td>ebcdic-cp-us</td></tr> |
| <tr><td>EBCDIC with Euro symbol</td> <td>ibm1140</td></tr> |
| <tr><td>Chinese, PRC</td> <td>gb2312</td></tr> |
| <tr><td>Chinese, Big5</td> <td>Big5</td></tr> |
| <tr><td>Cyrillic</td> <td>koi8-r</td></tr> |
| <tr><td>Japanese, Shift JIS</td> <td>Shift_JIS</td></tr> |
| <tr><td>Korean, Extended UNIX code</td> <td>euc-kr</td></tr> |
| </table> |
| |
| <p>Some implementations or ports of Xerces-C provide support for |
| additional encodings. The exact set will depend on the supplier |
| of the parser and on the character set transcoding services in use. |
| </p> |
| |
| </a> |
| </faq> |
| |
| <faq title="What character encoding should I use when creating XML documents?"> |
| <q>What character encoding should I use when creating XML documents?</q> |
| <a> |
| |
| <p>The best choice in most cases is either utf-8 or utf-16. |
| Advantages of these encodings include:</p> |
| |
| <ul> |
| <li>The best portability. These encodings are more widely |
| supported by XML processors than any others, meaning that |
| your documents will have the best possible chance of being |
| read correctly, no matter where they end up. </li> |
| |
| <li>Full international character support. Both utf-8 and |
| utf-16 cover the full Unicode character set, which |
| includes all of the characters from all major national, |
| international and industry character sets. </li> |
| |
| <li>Efficiency. utf-8 has the smaller storage requirements |
| for documents that are primarily composed of characters |
| from the Latin alphabet. utf-16 is more efficient for |
| encoding Asian languages. Both encodings cover |
| all languages without loss.</li> |
| </ul> |
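| |
| <p>The storage trade-off in the last item follows directly from how the two |
| encodings size each code point. The helper names <code>utf8Bytes</code> and |
| <code>utf16Bytes</code> below are hypothetical; the sketch assumes its input |
| holds valid Unicode code points and does not count a byte order mark.</p> |
| |
```cpp
#include <cstddef>
#include <string>

// Bytes needed to encode the given Unicode code points in UTF-8.
std::size_t utf8Bytes(const std::u32string& text) {
    std::size_t total = 0;
    for (char32_t cp : text) {
        if (cp < 0x80) total += 1;         // ASCII
        else if (cp < 0x800) total += 2;   // e.g. accented Latin, Greek
        else if (cp < 0x10000) total += 3; // rest of the BMP, e.g. CJK
        else total += 4;                   // supplementary planes
    }
    return total;
}

// Bytes needed in UTF-16 (no byte order mark): 2 per code point in the
// Basic Multilingual Plane, 4 (a surrogate pair) above it.
std::size_t utf16Bytes(const std::u32string& text) {
    std::size_t total = 0;
    for (char32_t cp : text) total += (cp < 0x10000) ? 2 : 4;
    return total;
}
```
| |
| <p>For a five-letter ASCII word, utf-8 needs 5 bytes and utf-16 needs 10; for |
| three CJK ideographs, utf-8 needs 9 bytes and utf-16 needs 6.</p> |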
| |
| <p>The only drawback of utf-8 or utf-16 is that they are not |
| the native text file format for most systems, meaning that |
| common text file editors and viewers cannot be used directly.</p> |
| |
| <p>A second choice of encoding would be any of the others listed in |
| the table above. This works best when the XML encoding is the same |
| as the default system encoding on the machine where the |
| XML document is being prepared, because the document will then |
| display correctly as a plain text file. For UNIX systems |
| in countries speaking Western European languages, the encoding |
| will usually be iso-8859-1.</p> |
| |
| <p>The versions of Xerces, both C and Java, distributed |
| by IBM as XML4C and XML4J, include all of the encodings |
| listed in the above table, on all platforms. </p> |
| |
| <p>A word of caution for Windows users: The default character set |
| on Windows systems is windows-1252, not iso-8859-1. While Xerces-C |
| does recognize this Windows encoding, it is a poor choice for portable |
| XML data because it is not widely recognized by other XML processing |
| tools. If you are using a Windows-based editing tool to generate |
| XML, check which character set it generates, and make sure that the |
| resulting XML specifies the correct name in the encoding="..." declaration.</p> |
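| |
| <p>One rough way to spot a mislabeled file is to look for bytes in the range |
| 0x80-0x9F. ISO-8859-1 maps these to control characters, while windows-1252 |
| assigns printable characters to most of them (0x80, for example, is the Euro |
| sign). The helper <code>countC1RangeBytes</code> below is a hypothetical |
| sketch of such a check; a non-zero count in a file declared as iso-8859-1 |
| may indicate that it was really written as windows-1252.</p> |
| |
```cpp
#include <cstddef>
#include <string>

// Counts bytes in the range 0x80-0x9F. ISO-8859-1 maps these to C1
// control characters, while windows-1252 assigns printable characters
// to most of them (for example, 0x80 is the Euro sign).
std::size_t countC1RangeBytes(const std::string& data) {
    std::size_t count = 0;
    for (char c : data) {
        const unsigned char byte = static_cast<unsigned char>(c);
        if (byte >= 0x80 && byte <= 0x9F) ++count;
    }
    return count;
}
```
| |
| <p>This is a heuristic only: it applies to single-byte encodings and says |
| nothing about utf-8 input, where bytes in this range occur as parts of |
| multi-byte sequences.</p> |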
| |
| </a> |
| </faq> |
| |
| <faq title="Is EBCDIC supported?"> |
| <q>Is EBCDIC supported?</q> |
| |
| <a> |
| <p>Yes, &XercesCName; supports EBCDIC. When creating EBCDIC encoded XML data, |
| the preferred encoding is ibm1140. Also supported is ibm037 (and its alternate name, |
| ebcdic-cp-us); this encoding is almost the same as ibm1140, but it lacks the Euro |
| symbol.</p> |
| |
| <p>These two encodings, ibm1140 and ibm037, are available on both Xerces-C and |
| IBM XML4C, on all platforms. </p> |
| |
| <p>On IBM System 390, XML4C also supports two alternative forms, ibm037-s390 |
| and ibm1140-s390. These are similar to the base ibm037 and ibm1140 encodings, |
| but with alternate mappings of the EBCDIC new-line character, which allows |
| them to appear as normal text files on System 390s. These encodings are not |
| supported on other platforms, and should not be used for portable data.</p> |
| |
| <p>XML4C on System 390 and AS/400 also provides additional EBCDIC encodings, including |
| those for the character sets of different countries. The exact set supported |
| will be platform dependent, and these encodings are not recommended for |
| portable XML data. </p> |
| |
| </a> |
| </faq> |
| |
| </faqs> |
| |