<?xml version="1.0" encoding = "iso-8859-1" standalone="no"?>
<!DOCTYPE faqs SYSTEM "./dtd/faqs.dtd">
<faqs title="Parsing with &XercesCName;">
<faq title="Why does my application crash on AIX when I run it under a
multi-threaded environment?">
<q>Why does my application crash on AIX when I run it under a
multi-threaded environment?</q>
<a>
<p>AIX maintains two kinds of libraries on the system:
thread-safe and non-thread-safe. Multi-threaded libraries on
AIX follow a different naming convention: the
multi-threaded library names usually end with "_r". For
example, libc.a is single-threaded whereas libc_r.a is
multi-threaded.</p>
<p>To make your multi-threaded application run on AIX, you
MUST ensure that you do not have a 'system library path' in
your LIBPATH environment variable when you run the
application. The appropriate libraries (threaded or
non-threaded) are automatically picked up at runtime. An
application usually crashes when you build your application
for multi-threaded operation but don't point to the
thread-safe version of the system libraries. For example,
LIBPATH can be simply set as:</p>
<source>LIBPATH=$HOME/&lt;&XercesCProjectName;&gt;/lib</source>
<p>Where &lt;&XercesCProjectName;&gt; is the directory where the
&XercesCProjectName; application resides.</p>
<p>If, for any reason unrelated to &XercesCProjectName;, you need to
keep a 'system library path' in your LIBPATH environment
variable, you must place the
thread-safe path before the normal system
path. For example, you must place <ref>/usr/lib/threads</ref> before
<ref>/usr/lib</ref> in your LIBPATH variable, so that your
LIBPATH looks like this:</p>
<source>export LIBPATH=$HOME/&lt;&XercesCProjectName;&gt;/lib:/usr/lib/threads:/usr/lib</source>
<p>Here, /usr/lib is the directory that holds your system libraries.</p>
</a>
</faq>
<faq title="What compilers are being used on the supported platforms?">
<q>What compilers are being used on the supported platforms?</q>
<a>
<p>&XercesCProjectName; has been built on the following platforms with these
compilers:</p>
<table>
<tr><td><em>Operating System</em></td><td><em>Compiler</em></td></tr>
<tr><td>Windows NT 4.0 SP5/98</td><td>MSVC 6.0 SP3</td></tr>
<tr><td>Red Hat Linux 6.1</td><td>egcs-2.91.66 and glibc-2.1.2-11</td></tr>
<tr><td>AIX 4.2.1 and higher</td><td>xlC 3.6.4</td></tr>
<tr><td>Solaris 2.6</td><td>CC Workshop 4.2</td></tr>
<tr><td>HP-UX 10.2</td><td>CC A.10.36</td></tr>
<tr><td>HP-UX 11.0</td><td>aCC A.03.13 with pthreads</td></tr>
</table>
</a>
</faq>
<faq title="I cannot run my sample applications. What is wrong?">
<q>I cannot run my sample applications. What is wrong?</q>
<a>
<p>In order to run an application built using &XercesCProjectName; you
must set up your path and library search path properly. In the
standalone version from Apache, you must have the &XercesCName; runtime library
available from your path settings. On Windows this library is called
<code>&XercesCWindowsLib;.dll</code> which must be available from your <code>PATH</code>
settings. (Note that there are now separate debug and release DLLs for Windows.
If the release DLL is named <code>&XercesCWindowsLib;.dll</code>, then the debug DLL is named
<code>&XercesCWindowsLib;d.dll</code>.)
On UNIX platforms the library is called <code>&XercesCUnixLib;.so</code>
(or <code>.a</code> or <code>.sl</code>) which must be available from your
<code>LD_LIBRARY_PATH</code> (or <code>LIBPATH</code> or <code>SHLIB_PATH</code>)
environment variable.</p>
<p>Thus, if you installed your binaries under <code>$HOME/fastxmlparser</code>,
you need to point your library path to that directory.
</p>
<source>export LIBPATH=$LIBPATH:$HOME/fastxmlparser/lib # (AIX)
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$HOME/fastxmlparser/lib # (Solaris, Linux)
export SHLIB_PATH=$SHLIB_PATH:$HOME/fastxmlparser/lib # (HP-UX)</source>
<p>If you are using the enhanced version of this parser from IBM, you also need
two additional libraries. In the Windows build these are <code>icuuc.dll</code> and
<code>icudata.dll</code> which must be available from your PATH settings. On UNIX,
these libraries are called <code>libicu-uc.so</code> and <code>libicudata.so</code>
(or <code>.sl</code> for HP-UX or <code>.a</code> for AIX) which must be available from
your library search path.
</p>
</a>
</faq>
<faq title="I just built my own application using the &XercesCProjectName; parser. Why does it
crash?">
<q>I just built my own application using the &XercesCProjectName; parser. Why does it
crash?</q>
<a>
<p>In order to work with the &XercesCProjectName; parser, you have to
first initialize the XML subsystem. The most common mistake is
to forget this initialization. Before you make any calls to
&XercesCProjectName; APIs, you must call <code>XMLPlatformUtils::Initialize()</code>:</p>
<source>try {
    XMLPlatformUtils::Initialize();
}
catch (const XMLException&amp; toCatch) {
    // Do your failure processing here
}</source>
<p>This initializes the &XercesCProjectName; system and sets its
internal variables. Note that you must include the
<code>util/PlatformUtils.hpp</code> file for this to work.</p>
</a>
</faq>
<faq title="Is &XercesCProjectName; thread-safe?">
<q>Is &XercesCProjectName; thread-safe?</q>
<a>
<p>This is not a question that has a simple yes/no answer. Here are
the rules for using &XercesCProjectName; in a multi-threaded environment:</p>
<p>Within an address space, an instance of the parser may be used
without restriction from a single thread, or an instance of the
parser can be accessed from multiple threads, provided the
application guarantees that only one thread has entered a method
of the parser at any one time.</p>
<p>When two or more parser instances exist in a process, the
instances can be used concurrently, and without external
synchronization. That is, in an application containing two
parsers and two threads, one parser can be running within the
first thread concurrently with the second parser running
within the second thread.</p>
<p>The same rules apply to &XercesCProjectName; DOM documents -
multiple document instances may be concurrently accessed from
different threads, but any given document instance can only be
accessed by one thread at a time.</p>
<p>DOMStrings allow multiple concurrent readers. All DOMString
const methods are thread safe, and can be concurrently entered
by multiple threads. Non-const DOMString methods, such as
appendData(), are not thread safe and the application must
guarantee that no other methods (including const methods) are
executed concurrently with them.</p>
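<p>The first two rules above can be sketched as follows. This is a minimal
illustration only: <code>StubParser</code>, <code>runShared</code> and
<code>runPerThread</code> are hypothetical names that stand in for a parser
instance and are not part of &XercesCProjectName;. The sketch also assumes a
C++11 compiler for <code>std::thread</code> and <code>std::mutex</code>.</p>
<source><![CDATA[
```cpp
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for a parser instance; NOT a Xerces class.
class StubParser {
public:
    StubParser() : fParsed(0) {}
    void parse() { ++fParsed; }   // not safe to enter from two threads at once
    int parsedCount() const { return fParsed; }
private:
    int fParsed;
};

// Rule 1: one instance shared by several threads. The application must
// guarantee that only one thread is inside the parser at any one time.
int runShared(int threads, int callsPerThread)
{
    StubParser parser;
    std::mutex guard;
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&parser, &guard, callsPerThread]() {
            for (int i = 0; i < callsPerThread; ++i) {
                std::lock_guard<std::mutex> lock(guard);  // serialize entry
                parser.parse();
            }
        });
    }
    for (std::thread& th : pool) th.join();
    return parser.parsedCount();
}

// Rule 2: one instance per thread. No external synchronization is needed.
int runPerThread(int threads, int callsPerThread)
{
    std::vector<int> counts(threads, 0);
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&counts, t, callsPerThread]() {
            StubParser parser;                 // private to this thread
            for (int i = 0; i < callsPerThread; ++i) parser.parse();
            counts[t] = parser.parsedCount();  // each thread writes its own slot
        });
    }
    for (std::thread& th : pool) th.join();
    int total = 0;
    for (int c : counts) total += c;
    return total;
}
```
]]></source>
<p>Giving each thread its own instance is usually the simpler choice, because
it removes the lock entirely. Sharing one instance behind a lock only makes
sense when creating a parser per thread is not practical.</p>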
</a>
</faq>
<faq title="Can I validate the data contained in a DOM tree?">
<q>Can I validate the data contained in a DOM tree?</q>
<a><p>Given that I have built a DOM tree, is there a facility
in Xerces-C that will then validate the document contained in that
DOM tree? That is, can I walk the tree and perform validation
without having to re-parse the source document?</p>
<p>No. This is a frequently requested feature, but at this time
it is not possible to feed XML data from the DOM directly back to
the DTD validator. The best option for now is to generate XML
source from the DOM and feed that back into the parser.</p>
</a>
</faq>
<faq title="Why does my multi-threaded application crash on Solaris?">
<q>Why does my multi-threaded application crash on Solaris?</q>
<a>
<p>The problem appears because the throw call on Solaris 2.6
is not multi-thread safe. Sun Microsystems provides a patch that
solves this problem. To get the latest patch,
go to <jump href="http://sunsolve.sun.com">SunSolve.sun.com</jump>
and download the appropriate patch for your operating system.
For Intel machines running Solaris, you need Patch ID 104678.
For SPARC machines, you need Patch ID 105591.</p>
</a>
</faq>
<faq title="Why does my application give unresolved linking errors on Solaris?">
<q>Why does my application give unresolved linking errors on Solaris?</q>
<a>
<p>On Solaris there are a couple of things that need to be taken care of before
you run your application with Xerces / XML4C. If you are
using the binary build of Xerces / XML4C, make sure that your OS and
compiler are the same versions as the ones on which the binary was built.
A mismatch can cause unresolved linking problems or compilation errors.
In that case, rebuild the library from source on your system before building your application
with it. If you are using ICU (which is packaged with XML4C), you need to
rebuild the compatible version of ICU first.</p>
<p>Also make sure that the library path is set properly and that you have the correct versions of
<code>gmake</code> and <code>autoconf</code> on your system.</p>
</a>
</faq>
<faq title="How do I find out what version of &XercesCProjectName; I am using?">
<q>How do I find out what version of &XercesCProjectName; I am using?</q>
<a>
<p>The version string for &XercesCProjectName; is defined in one of
the source files. Look inside the file
<code>src/util/XML4CDefs.hpp</code> and find the value of the
static variable <code>gXML4CFullVersionStr</code>.
(It usually has the form 3.0.0 or something
similar.) This is the version of the parser you are using.</p>
<p>If you don't have the source code, you have to find the version
information from the shared library name. On Windows NT/95/98,
right-click the DLL &XercesCWindowsLib;.dll in the bin directory
and open its properties. The version information can be found on
the Version tab.</p>
<p>On AIX, just look for the library name &XercesCUnixLib;.a (or
&XercesCUnixLib;.so on Solaris/Linux and &XercesCUnixLib;.sl on
HP-UX). The version number is coded in the name of the
library.</p>
</a>
</faq>
<faq title="How do I uninstall &XercesCProjectName;?">
<q>How do I uninstall &XercesCProjectName;?</q>
<a>
<p>&XercesCProjectName; only installs itself in a single directory and
does not set any registry entries. Thus, to un-install, you
only need to remove the directory where you installed it, and
all &XercesCProjectName; related files will be removed.</p>
</a>
</faq>
<faq title="How are entity reference nodes handled in DOM?">
<q>How are entity reference nodes handled in DOM?</q>
<a>
<p>If you are using the native DOM classes, the function
<code>setExpandEntityReferences</code> controls how entities appear in the
DOM tree. When <code>setExpandEntityReferences</code> is set to false (the
default), an occurrence of an entity reference in the XML
document is represented by a subtree with an
EntityReference node at the root whose children represent the
entity expansion. The entity expansion is a DOM tree
representing the structure of the entity expansion, not a text
node containing the entity expansion as text.</p>
<p>If <code>setExpandEntityReferences</code> is true, an entity reference in the
XML document is represented by only the nodes that represent the
entity expansion. The DOM tree will not contain any
EntityReference nodes.</p>
</a>
</faq>
<faq title="What kinds of URLs are currently supported in &XercesCProjectName;?">
<q>What kinds of URLs are currently supported in &XercesCProjectName;?</q>
<a>
<p>The <code>XMLURL</code> class provides limited URL support. It understands
the <code>file://</code>, <code>http://</code>, and <code>ftp://</code> URL types, and is
capable of parsing them into their constituent components and normalizing
them. It also supports the commonly required action of combining a
base and relative URL into a single URL. In other words, it performs the
limited set of functions required by an XML parser.</p>
<p>Another thing that URLs commonly do is to create an input stream that
provides access to the entity referenced. The parser, as shipped, only
supports this functionality on URLs in the form <code>file:///</code> and
<code>file://localhost/</code>, i.e. only when the URL refers to a local file.</p>
<p>You may enable support for HTTP and FTP URLs by implementing and installing
a NetAccessor object. When a NetAccessor object is installed, the URL class
will use it to create input streams for the remote entities referred to by such URLs.</p>
</a>
</faq>
<faq title="How can I add support for URLs with HTTP/FTP protocols?">
<q>How can I add support for URLs with HTTP/FTP protocols?</q>
<a>
<p>Support for the http: protocol is now included by default on all
platforms.</p>
<p>To address the need to make remote connections to resources
specified using additional protocols, ftp for example, Xerces-C
provides the <code>NetAccessor</code> interface. The header
file is <code>src/util/XMLNetAccessor.hpp</code>. This interface
allows you to plug in your own implementation of URL networking
code into the Xerces-C parser.</p>
</a>
</faq>
<faq title="Can I use &XercesCProjectName; to parse HTML?">
<q>Can I use &XercesCProjectName; to parse HTML?</q>
<a>
<p>Yes, if it follows the XML spec rules. Most HTML, however,
does not follow the XML rules, and will therefore generate XML
well-formedness errors.</p>
</a>
</faq>
<faq title="I keep getting an error: &quot;invalid UTF-8 character&quot;. What's wrong?">
<q>I keep getting an error: "invalid UTF-8 character". What's wrong?</q>
<a>
<p>Most commonly, the XML <code>encoding =</code> declaration is
either incorrect or missing. Without a declaration, XML defaults
to the UTF-8 character encoding, which is not compatible with
the default text file encoding on most systems.</p>
<p>The XML declaration should look something like this:</p>
<p><code>&lt;?xml version="1.0" encoding="iso-8859-1"?&gt;</code></p>
<p>Make sure to specify the encoding that is actually used by the file.
The encoding for "plain" text files depends both on the operating system
and the locale (country and language) in use.</p>
<p>Another common source of problems is that some characters are not allowed in
XML documents, according to the XML spec. Typical
disallowed characters are control characters, even if you
escape them using the Character Reference form. See the
<jump href="http://www.w3.org/TR/REC-xml#charsets">XML spec</jump>,
sections 2.2 and 4.1 for details. If the parser is
generating an <code>Invalid character (Unicode: 0x???)</code> error,
it is very likely that there's a
character in there that you can't see. You can generally use
a UNIX command like "od -hc" to find it.</p>
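<p>The same search can also be sketched in a few lines of C++. This is an
illustration under stated assumptions, not part of the parser:
<code>firstSuspectByte</code> is a hypothetical helper that returns the offset
of the first byte that is either not part of a structurally valid UTF-8
sequence or is a control character that XML 1.0 does not allow. It checks lead
and continuation bytes only, so it does not detect overlong forms, surrogates,
or the disallowed code points U+FFFE and U+FFFF.</p>
<source><![CDATA[
```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not a Xerces API). Returns the offset of the first
// suspect byte in 'data', or std::string::npos if none is found.
std::size_t firstSuspectByte(const std::string& data)
{
    std::size_t i = 0;
    while (i < data.size()) {
        const unsigned char b = static_cast<unsigned char>(data[i]);
        std::size_t extra = 0;            // continuation bytes expected
        if (b < 0x80) {
            // Below 0x20, XML 1.0 allows only tab, line feed and carriage return.
            if (b < 0x20 && b != 0x09 && b != 0x0A && b != 0x0D)
                return i;
        } else if (b >= 0xC2 && b <= 0xDF) {
            extra = 1;
        } else if (b >= 0xE0 && b <= 0xEF) {
            extra = 2;
        } else if (b >= 0xF0 && b <= 0xF4) {
            extra = 3;
        } else {
            return i;                     // stray continuation or invalid lead byte
        }
        if (extra > data.size() - i - 1)
            return i;                     // sequence is cut off at end of input
        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char c = static_cast<unsigned char>(data[i + k]);
            if ((c & 0xC0) != 0x80)
                return i;                 // expected a 10xxxxxx continuation byte
        }
        i += extra + 1;
    }
    return std::string::npos;
}
```
]]></source>
<p>A file saved as iso-8859-1 but parsed as utf-8 typically fails this check at
the first accented character, because a single byte such as 0xE9 looks like the
start of a multi-byte utf-8 sequence that never completes.</p>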
</a>
</faq>
<faq title="What encodings are supported by Xerces-C / XML4C?">
<q>What encodings are supported by Xerces-C / XML4C?</q>
<a>
<p>Xerces-C has intrinsic support for ASCII, UTF-8, UTF-16
(Big/Little Endian), UCS4 (Big/Little Endian), the EBCDIC code pages IBM037 and
IBM1140, ISO-8859-1 (aka Latin1) and Windows-1252. This means that it can parse
input XML files in any of these encodings.</p>
<p>XML4C - the version of Xerces-C available from IBM - extends
this set to include the encodings listed in the table below.</p>
<table>
<tr><td><em>Common Name</em></td><td><em>Use this name in XML</em></td></tr>
<tr><td>8 bit Unicode</td> <td>UTF-8</td></tr>
<tr><td>ISO Latin 1</td> <td>ISO-8859-1</td></tr>
<tr><td>ISO Latin 2</td> <td>ISO-8859-2</td></tr>
<tr><td>ISO Latin 3</td> <td>ISO-8859-3</td></tr>
<tr><td>ISO Latin 4</td> <td>ISO-8859-4</td></tr>
<tr><td>ISO Latin Cyrillic</td> <td>ISO-8859-5</td></tr>
<tr><td>ISO Latin Arabic</td> <td>ISO-8859-6</td></tr>
<tr><td>ISO Latin Greek</td> <td>ISO-8859-7</td></tr>
<tr><td>ISO Latin Hebrew</td> <td>ISO-8859-8</td></tr>
<tr><td>ISO Latin 5</td> <td>ISO-8859-9</td></tr>
<tr><td>EBCDIC US</td> <td>ebcdic-cp-us</td></tr>
<tr><td>EBCDIC with Euro symbol</td> <td>ibm1140</td></tr>
<tr><td>Chinese, PRC</td> <td>gb2312</td></tr>
<tr><td>Chinese, Big5</td> <td>Big5</td></tr>
<tr><td>Cyrillic</td> <td>koi8-r</td></tr>
<tr><td>Japanese, Shift JIS</td> <td>Shift_JIS</td></tr>
<tr><td>Korean, Extended UNIX code</td> <td>euc-kr</td></tr>
</table>
<p>Some implementations or ports of Xerces-C provide support for
additional encodings. The exact set will depend on the supplier
of the parser and on the character set transcoding services in use.</p>
</a>
</faq>
<faq title="What character encoding should I use when creating XML documents?">
<q>What character encoding should I use when creating XML documents?</q>
<a>
<p>The best choice in most cases is either utf-8 or utf-16.
Advantages of these encodings include </p>
<ul>
<li>The best portability. These encodings are more widely
supported by XML processors than any others, meaning that
your documents will have the best possible chance of being
read correctly, no matter where they end up. </li>
<li>Full international character support. Both utf-8 and
utf-16 cover the full Unicode character set, which
includes all of the characters from all major national,
international and industry character sets. </li>
<li>Efficiency. utf-8 has the smaller storage requirements
for documents that are primarily composed of characters
from the Latin alphabet. utf-16 is more efficient for
encoding Asian languages. But both encodings cover
all languages without loss.</li>
</ul>
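<p>The storage trade-off in the last item can be checked with a little
arithmetic. The helper names <code>utf8Bytes</code> and <code>utf16Bytes</code>
below are hypothetical and only compute how many bytes a single Unicode code
point occupies in each encoding. They assume the code point is valid, that is,
not above U+10FFFF and not a surrogate.</p>
<source><![CDATA[
```cpp
#include <cstddef>

// Hypothetical helpers for illustration; not part of any parser API.

// Number of bytes needed to store one code point in utf-8.
std::size_t utf8Bytes(unsigned long codePoint)
{
    if (codePoint < 0x80)    return 1;   // ASCII
    if (codePoint < 0x800)   return 2;   // e.g. accented Latin letters
    if (codePoint < 0x10000) return 3;   // rest of the Basic Multilingual Plane
    return 4;                            // supplementary planes
}

// Number of bytes needed to store one code point in utf-16.
std::size_t utf16Bytes(unsigned long codePoint)
{
    return codePoint < 0x10000 ? 2 : 4;  // one 16-bit unit, or a surrogate pair
}
```
]]></source>
<p>For U+0041 (Latin capital A) this gives 1 byte in utf-8 and 2 bytes in
utf-16. For U+4E2D (a CJK ideograph) it gives 3 bytes in utf-8 and 2 bytes in
utf-16.</p>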
<p>The only drawback of utf-8 or utf-16 is that they are not
the native text file format for most systems, meaning that
common text file editors and viewers cannot be used directly.</p>
<p>A second choice of encoding would be any of the others listed in
the table above. This works best when the xml encoding is the same
as the default system encoding on the machine where the
XML document is being prepared, because the document will then
display correctly as a plain text file. For UNIX systems
in countries speaking Western European languages, the encoding
will usually be iso-8859-1.</p>
<p>The versions of Xerces, both C and Java, distributed
by IBM as XML4C and XML4J, include all of the encodings
listed in the above table, on all platforms. </p>
<p>A word of caution for Windows users: The default character set
on Windows systems is windows-1252, not iso-8859-1. While Xerces-C
does recognize this Windows encoding, it is a poor choice for portable
XML data because it is not widely recognized by other XML processing
tools. If you are using a Windows-based editing tool to generate
XML, check which character set it generates, and make sure that the
resulting XML specifies the correct name in the encoding="..." declaration.</p>
</a>
</faq>
<faq title="I find memory leaks in Xerces-C / XML4C. How do I eliminate them?">
<q>I find memory leaks in Xerces-C / XML4C. How do I eliminate them?</q>
<a>
<p>The "leaks" reported by leak-detection or heap-analysis tools
are not really leaks in most applications, in that the memory usage does not grow over
time as the XML parser is used and re-used.</p>
<p>What you are seeing as leaks is actually lazily evaluated data allocated into
static variables. It is released when the application ends. You can also call
<code>XMLPlatformUtils::Terminate()</code> to release all the lazily allocated
variables before you exit your program.</p>
</a>
</faq>
<faq title="Is EBCDIC supported?">
<q>Is EBCDIC supported?</q>
<a>
<p>Yes, &XercesCName; supports EBCDIC. When creating EBCDIC encoded XML data,
the preferred encoding is ibm1140. Also supported is ibm037 (and its alternate name,
ebcdic-cp-us); this encoding is almost the same as ibm1140, but it lacks the Euro
symbol.</p>
<p>These two encodings, ibm1140 and ibm037, are available on both Xerces-C and
IBM XML4C, on all platforms. </p>
<p>On IBM System 390, XML4C also supports two alternative forms, ibm037-s390
and ibm1140-s390. These are similar to the base ibm037 and ibm1140 encodings,
but with alternate mappings of the EBCDIC new-line character, which allows
them to appear as normal text files on System 390s. These encodings are not
supported on other platforms, and should not be used for portable data.</p>
<p>XML4C on System 390 and AS/400 also provides additional EBCDIC encodings, including
those for the character sets of different countries. The exact set supported
will be platform dependent, and these encodings are not recommended for
portable XML data. </p>
</a>
</faq>
</faqs>