<?xml version="1.0" standalone="no"?>
<!DOCTYPE faqs SYSTEM "sbk:/style/dtd/faqs.dtd">
<faqs title="Parsing with &XercesCName;">
<faq title="Why does my application crash on AIX when I run it under a
multi-threaded environment?">
<q>Why does my application crash on AIX when I run it under a
multi-threaded environment?</q>
<a>
<p>AIX maintains two kinds of libraries on the system,
thread-safe and non-thread-safe. Multi-threaded libraries on
AIX follow a different naming convention: usually the
multi-threaded library names end in "_r". For
example, libc.a is single-threaded whereas libc_r.a is
multi-threaded.</p>
<p>To make your multi-threaded application run on AIX, you
MUST ensure that you do not have a 'system library path' in
your LIBPATH environment variable when you run the
application. The appropriate libraries (threaded or
non-threaded) are automatically picked up at runtime. An
application usually crashes when you build your application
for multi-threaded operation but don't point to the
thread-safe version of the system libraries. For example,
LIBPATH can be simply set as:</p>
<source>LIBPATH=$HOME/&lt;&XercesCProjectName;&gt;/lib</source>
<p>Where &lt;&XercesCProjectName;&gt; points to the directory where the
&XercesCProjectName; application resides.</p>
<p>If, for any reason unrelated to &XercesCProjectName;, you need to
keep a 'system library path' in your LIBPATH environment
variable, you must make sure that the
thread-safe path comes before the normal system
path. For example, you must place <ref>/usr/lib/threads</ref> before
<ref>/usr/lib</ref> in your LIBPATH variable. That is to say, your
LIBPATH may look like this:</p>
<source>export LIBPATH=$HOME/&lt;&XercesCProjectName;&gt;/lib:/usr/lib/threads:/usr/lib</source>
<p>Where /usr/lib is where your system libraries are.</p>
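<p>As a rough illustration of the ordering rule above, the following C++
sketch checks whether the thread-safe directory is listed ahead of the
normal system directory in a colon-separated search path. The function
names <code>splitPath</code> and <code>threadSafeDirComesFirst</code> are
invented for this example and are not part of &XercesCProjectName;; the
directory names are supplied by the caller rather than assumed.</p>
<source><![CDATA[
```cpp
#include <sstream>
#include <string>
#include <vector>

// Splits a colon-separated search path into its directories,
// skipping empty entries. (Hypothetical helper for this sketch.)
std::vector<std::string> splitPath(const std::string& value) {
    std::vector<std::string> dirs;
    std::istringstream in(value);
    std::string dir;
    while (std::getline(in, dir, ':')) {
        if (!dir.empty())
            dirs.push_back(dir);
    }
    return dirs;
}

// Returns true when threadDir is listed before systemDir, or when
// systemDir does not appear in the path at all. Returns false when
// systemDir is reached first. (Hypothetical helper for this sketch.)
bool threadSafeDirComesFirst(const std::string& libpath,
                             const std::string& threadDir,
                             const std::string& systemDir) {
    for (const std::string& dir : splitPath(libpath)) {
        if (dir == threadDir) return true;
        if (dir == systemDir) return false;
    }
    return true;  // no system library path listed, so nothing to reorder
}
```
]]></source>
<p>A caller might pass the value of <code>getenv("LIBPATH")</code> together
with <code>/usr/lib/threads</code> and <code>/usr/lib</code>, and warn the
user when the result is false.</p>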
</a>
</faq>
<faq title="What compilers are being used on the supported platforms?">
<q>What compilers are being used on the supported platforms?</q>
<a>
<p>&XercesCProjectName; has been built on the following platforms with these
compilers:</p>
<table>
<tr><td><em>Operating System</em></td><td><em>Compiler</em></td></tr>
<tr><td>Windows NT SP5/98</td><td>MSVC 6.0</td></tr>
<tr><td>Redhat Linux 6.0</td><td>gcc</td></tr>
<tr><td>AIX 4.1.4 and higher</td><td>xlC 3.1</td></tr>
<tr><td>Solaris 2.6</td><td>CC version 4.2</td></tr>
<tr><td>HP-UX B10.2</td><td>aCC and CC</td></tr>
<tr><td>HP-UX B11</td><td>aCC and CC</td></tr>
</table>
</a>
</faq>
<faq title="I cannot run my sample applications. What is wrong?">
<q>I cannot run my sample applications. What is wrong?</q>
<a>
<p>In order to run an application built using &XercesCProjectName; you
must set up your path and library search path properly. In the
standalone version from Apache, you must have the &XercesCName; runtime library
available from your path settings. On Windows this library is called
<code>&XercesCWindowsLib;.dll</code> which must be available from your <code>PATH</code>
settings. On UNIX platforms the library is called <code>&XercesCUnixLib;.so</code>
(or <code>.a</code> or <code>.sl</code>) which must be available from your
<code>LD_LIBRARY_PATH</code> (or <code>SHLIB_PATH</code> or <code>LIBPATH</code>)
environment variable.</p>
<p>Thus, if you installed your binaries under <code>$HOME/fastxmlparser</code>,
you need to point your library path to that directory.
</p>
<source>export LIBPATH=$LIBPATH:$HOME/fastxmlparser/lib # (AIX)
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$HOME/fastxmlparser/lib # (Solaris, Linux)
export SHLIB_PATH=$SHLIB_PATH:$HOME/fastxmlparser/lib # (HP-UX)</source>
<p>If you are using the enhanced version of this parser from IBM, you will need to
put in two additional DLLs. In the Windows build these are <code>icuuc.dll</code> and
<code>icudata.dll</code> which must be available from your PATH settings. On UNIX,
these libraries are called <code>libicu-uc.so</code> and <code>libicudata.so</code>
(or <code>.sl</code> for HP-UX or <code>.a</code> for AIX) which must be available from
your library search path.
</p>
</a>
</faq>
<faq title="I just built my own application using the &XercesCProjectName; parser. Why does it
crash?">
<q>I just built my own application using the &XercesCProjectName; parser. Why does it
crash?</q>
<a>
<p>In order to work with the &XercesCProjectName; parser, you have to
first initialize the XML subsystem. The most common mistake is
to forget this initialization. Before you make any calls to
&XercesCProjectName; APIs, you must call
<code>XMLPlatformUtils::Initialize()</code>:</p>
<source>#include &lt;util/PlatformUtils.hpp&gt;

try {
    // Must run before any other call into the parser
    XMLPlatformUtils::Initialize();
}
catch (const XMLException&amp; toCatch) {
    // Do your failure processing here
}</source>
<p>This initializes the &XercesCProjectName; system and sets its
internal variables. Note that you must include the
<code>util/PlatformUtils.hpp</code> file for this to work.</p>
</a>
</faq>
<faq title="Is &XercesCProjectName; thread-safe?">
<q>Is &XercesCProjectName; thread-safe?</q>
<a>
<p>This is not a question that has a simple yes/no answer. Here are
the rules for using &XercesCProjectName; in a multi-threaded environment:</p>
<p>Within an address space, an instance of the parser may be used
without restriction from a single thread, or an instance of the
parser can be accessed from multiple threads, provided the
application guarantees that only one thread has entered a method
of the parser at any one time.</p>
<p>When two or more parser instances exist in a process, the
instances can be used concurrently, and without external
synchronization. That is, in an application containing two
parsers and two threads, one parser can be running within the
first thread concurrently with the second parser running
within the second thread.</p>
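<p>The two rules above can be sketched with standard C++ threads. In this
illustration <code>FakeParser</code> is a hypothetical stand-in with
unsynchronized per-instance state; it is not a &XercesCProjectName; class,
and the function names are invented for the example. The first function
shares one instance and therefore serializes access with a mutex; the
second gives each thread its own instance and needs no locking.</p>
<source><![CDATA[
```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

// Hypothetical stand-in for a parser instance; NOT a Xerces-C class.
// It keeps unsynchronized per-instance state, so it is not re-entrant.
class FakeParser {
public:
    void parse(const std::string& doc) {
        lastDoc_ = doc;
        ++parsed_;
    }
    int parsedCount() const { return parsed_; }
private:
    std::string lastDoc_;
    int parsed_ = 0;
};

// Rule 1: one instance shared by several threads. The application must
// guarantee that only one thread is inside the parser at any one time.
int sharedParserDemo(std::size_t threads, int docsPerThread) {
    assert(threads > 0 && docsPerThread > 0);
    FakeParser shared;
    std::mutex guard;
    std::vector<std::thread> pool;
    for (std::size_t t = 0; t < threads; ++t) {
        pool.emplace_back([&shared, &guard, docsPerThread] {
            for (int i = 0; i < docsPerThread; ++i) {
                std::lock_guard<std::mutex> lock(guard);
                shared.parse("<doc/>");
            }
        });
    }
    for (std::thread& th : pool) th.join();
    return shared.parsedCount();
}

// Rule 2: one instance per thread. No external synchronization is needed
// because no instance is ever touched by more than one thread.
int perThreadParserDemo(std::size_t threads, int docsPerThread) {
    assert(threads > 0 && docsPerThread > 0);
    std::vector<FakeParser> parsers(threads);
    std::vector<std::thread> pool;
    for (std::size_t t = 0; t < threads; ++t) {
        pool.emplace_back([&parsers, t, docsPerThread] {
            for (int i = 0; i < docsPerThread; ++i)
                parsers[t].parse("<doc/>");
        });
    }
    for (std::thread& th : pool) th.join();
    int total = 0;
    for (const FakeParser& p : parsers) total += p.parsedCount();
    return total;
}
```
]]></source>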
<p>The same rules apply to &XercesCProjectName; DOM documents -
multiple document instances may be concurrently accessed from
different threads, but any given document instance can only be
accessed by one thread at a time.</p>
<p>DOMStrings allow multiple concurrent readers. All DOMString
const methods are thread safe, and can be concurrently entered
by multiple threads. Non-const DOMString methods, such as
appendData(), are not thread safe and the application must
guarantee that no other methods (including const methods) are
executed concurrently with them.</p>
</a>
</faq>
<faq title="Why does my multi-threaded application crash on Solaris?">
<q>Why does my multi-threaded application crash on Solaris?</q>
<a>
<p>The problem appears because the throw call on Solaris 2.6
is not multi-thread safe. Sun Microsystems provides a patch to
solve this problem. To get the latest patch,
go to <jump href="http://sunsolve.sun.com">SunSolve.sun.com</jump>
and get the appropriate patch for your operating system.
For Intel machines running Solaris, you need to get Patch ID 104678.
For SPARC machines you need to get Patch ID 105591.</p>
</a>
</faq>
<faq title="How do I find out what version of &XercesCProjectName; I am using?">
<q>How do I find out what version of &XercesCProjectName; I am using?</q>
<a>
<p>The version string for &XercesCProjectName; is defined in one of
the source files. Look inside the file
<code>src/util/XML4CDefs.hpp</code> and find out what the
static variable <code>gXML4CFullVersionStr</code> is defined
to be. (It is usually of the form 3.0.0 or something
similar.) This is the version of &XercesCProjectName; you are using.</p>
<p>If you don't have the source code, you have to find the version
information from the shared library name. On Windows NT/95/98,
right-click on the DLL name &XercesCWindowsLib;.dll in the bin directory
and look up its properties. The version information may be found on
the Version tab.</p>
<p>On AIX, just look for the library name &XercesCUnixLib;.a (or
&XercesCUnixLib;.so on Solaris/Linux and &XercesCUnixLib;.sl on
HP-UX). The version number is coded in the name of the
library.</p>
</a>
</faq>
<faq title="How do I uninstall &XercesCProjectName;?">
<q>How do I uninstall &XercesCProjectName;?</q>
<a>
<p>&XercesCProjectName; installs itself in a single directory and
does not set any registry entries. Thus, to uninstall, you
only need to remove the directory where you installed it, and
all &XercesCProjectName; related files will be removed.</p>
</a>
</faq>
<faq title="How are entity reference nodes handled in DOM?">
<q>How are entity reference nodes handled in DOM?</q>
<a>
<p>If you are using the native DOM classes, the function
<code>setExpandEntityReferences</code> controls how entities appear in the
DOM tree. When setExpandEntityReferences is set to false (the
default), an occurrence of an entity reference in the XML
document will be represented by a subtree with an
EntityReference node at the root whose children represent the
entity expansion. The entity expansion will be a DOM tree
representing the structure of the entity expansion, not a text
node containing the entity expansion as text.</p>
<p>If setExpandEntityReferences is true, an entity reference in the
XML document is represented by only the nodes that represent the
entity expansion. The DOM tree will not contain any
EntityReference nodes.</p>
</a>
</faq>
<faq title="What kinds of URLs are currently supported in &XercesCProjectName;?">
<q>What kinds of URLs are currently supported in &XercesCProjectName;?</q>
<a>
<p>The <code>XMLURL</code> class provides limited URL support. It understands
the <code>file://</code>, <code>http://</code>, and <code>ftp://</code> URL types, and is
capable of parsing them into their constituent components and normalizing
them. It also supports the commonly required action of combining a
base and relative URL into a single URL. In other words, it performs the
limited set of functions required by an XML parser.</p>
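<p>To make the idea of combining a base and a relative URL concrete, here is
a deliberately simplified C++ sketch. It is not the <code>XMLURL</code>
implementation and does not use its API; <code>combineUrl</code> is a name
invented for this example. It assumes the base contains
<code>://</code> and ignores <code>.</code> and <code>..</code> segments,
queries and fragments.</p>
<source><![CDATA[
```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Simplified sketch of resolving `relative` against `base`.
// Hypothetical helper; not part of Xerces-C.
std::string combineUrl(const std::string& base, const std::string& relative) {
    // A "relative" part that carries its own scheme is already absolute.
    if (relative.find("://") != std::string::npos)
        return relative;

    // Assumption of this sketch: the base is an absolute URL.
    const std::size_t schemeEnd = base.find("://");
    assert(schemeEnd != std::string::npos);

    // Everything up to the first '/' after "scheme://" is scheme plus host.
    const std::size_t pathStart = base.find('/', schemeEnd + 3);
    const std::string schemeAndHost =
        (pathStart == std::string::npos) ? base : base.substr(0, pathStart);

    // A leading '/' replaces the whole path of the base.
    if (!relative.empty() && relative[0] == '/')
        return schemeAndHost + relative;

    // The base has no path at all: start one.
    if (pathStart == std::string::npos)
        return base + "/" + relative;

    // Otherwise replace the last path segment of the base.
    return base.substr(0, base.rfind('/') + 1) + relative;
}
```
]]></source>
<p>With this sketch, a base of <code>http://example.org/dtd/doc.xml</code>
and a relative part of <code>chapter1.ent</code> yield
<code>http://example.org/dtd/chapter1.ent</code>.</p>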
<p>Another thing that URLs commonly do is create an input stream that
provides access to the entity referenced. The parser, as shipped, only
supports this functionality on URLs of the form <code>file:///</code> and
<code>file://localhost/</code>, i.e. only when the URL refers to a local file.</p>
<p>You may enable support for HTTP and FTP URLs by implementing and installing
a NetAccessor object. When a NetAccessor object is installed, the URL class
will use it to create input streams for the remote entities referred to by such URLs.</p>
</a>
</faq>
<faq title="How can I add support for URLs with HTTP/FTP protocols?">
<q>How can I add support for URLs with HTTP/FTP protocols?</q>
<a>
<p>To address the need to make remote connections to resources
specified using other protocols such as HTTP and FTP, Xerces-C
now provides the <code>NetAccessor</code> interface. The header
file is <code>src/util/XMLNetAccessor.hpp</code>. This interface
allows you to plug your own implementation of URL networking
code into the Xerces-C parser.</p>
<p>One such implementation <em>(tested minimally under WinNT
only)</em> is already provided in the &XercesCName; source code
drop, using <jump href="http://www.w3.org/Library/">W3C's Libwww
library</jump>. Libwww is available for free and has been ported
to various platforms. Click <jump
href="build.html#BuildUsingLibwww">here</jump> to read how you
can rebuild Xerces-C binaries with this implementation.</p>
<p>Some more notes about the NetAccessor implementation using
Libwww:</p>
<ul>
<li>This implementation only supports HTTP and does not return
adequate error messages when connections cannot be made to the
remote resources. It does, however, illustrate how you can add support
for HTTP and FTP URLs.</li>
<li>The Xerces-C team will <em>NOT</em> be able to address any
questions related to how things work in Libwww. You can get some
help with Libwww by subscribing to the &lt;<jump
href="mailto:www-lib-request@w3.org?subject=subscribe">www-lib@w3.org</jump>&gt;
public mailing list.</li>
<li>However, we will welcome any feedback on the design of the
NetAccessor interface. Please send all such feedback to <jump
href="mailto:xerces-dev@xml.apache.org">xerces-dev@xml.apache.org</jump>.</li>
<li>You do not have to recompile &XercesCName; to plug in your
NetAccessor implementation. You can simply point the static
pointer variable <code>XMLPlatformUtils::fgNetAccessor</code> to
an instance of your NetAccessor implementation. Please refer to
the files <code>src/util/PlatformUtils.cpp</code> and
<code>src/util/Platforms/Win32/Win32PlatformUtils.cpp</code> to
see how we have done this in the simple illustrative
implementation.</li>
</ul>
</a>
</faq>
<faq title="Can I use &XercesCProjectName; to parse HTML?">
<q>Can I use &XercesCProjectName; to parse HTML?</q>
<a>
<p>Yes, if it follows the XML spec rules. Most HTML, however,
does not follow the XML rules, and will therefore generate XML
well-formedness errors.</p>
</a>
</faq>
<faq title="I keep getting an error: &quot;invalid UTF-8 character&quot;. What's wrong?">
<q>I keep getting an error: "invalid UTF-8 character". What's wrong?</q>
<a>
<p>There are many Unicode characters that are not allowed in
your XML document, according to the XML spec. Typical
disallowed characters are control characters, even if you
escape them using the Character Reference form. See the XML
spec, sections 2.2 and 4.1, for details. If the parser is
generating this error, it is very likely that there's a
character in there that you can't see. You can generally use
a UNIX command like "od -hc" to find it.</p>
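<p>The hunt for an invisible character can also be done in code. The C++
sketch below scans a buffer for the control characters that the XML 1.0
<code>Char</code> production excludes in the ASCII range, that is, anything
below 0x20 other than tab, line feed and carriage return.
<code>findDisallowedControl</code> is a name invented for this example, and
the sketch makes no attempt to validate multi-byte UTF-8 sequences.</p>
<source><![CDATA[
```cpp
#include <cstddef>
#include <string>

// Returns the byte offset of the first control character below 0x20 that
// XML 1.0 does not allow (everything except tab, LF and CR), or
// std::string::npos if the buffer contains none.
// Simplification: multi-byte UTF-8 sequences are not validated here.
std::size_t findDisallowedControl(const std::string& bytes) {
    for (std::size_t i = 0; i < bytes.size(); ++i) {
        const unsigned char c = static_cast<unsigned char>(bytes[i]);
        if (c < 0x20 && c != 0x09 && c != 0x0A && c != 0x0D)
            return i;
    }
    return std::string::npos;
}
```
]]></source>
<p>Reading the file into a <code>std::string</code> and printing the
returned offset points at the same byte that "od -hc" would reveal.</p>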
<p>Another reason for this error is that your file is in some
non-UTF, non-ASCII encoding but contains no encoding="..." declaration
to tell the parser what its real encoding is.</p>
</a>
</faq>
<faq title="What encodings are supported by Xerces-C / XML4C?">
<q>What encodings are supported by Xerces-C / XML4C?</q>
<a>
<p>Xerces-C has intrinsic support for ASCII, UTF-8, UTF-16
(Big/Little Endian), UCS4 (Big/Little Endian), the EBCDIC code pages IBM037 and
IBM1140, ISO-8859-1 (aka Latin1) and Windows-1252. This means that it can parse
input XML files in any of these encodings.</p>
<p>XML4C - the version of Xerces-C available from IBM - extends
this set to include the encodings listed in the table below.</p>
<table>
<tr><td><em>Common Name</em></td><td><em>Use this name in XML</em></td></tr>
<tr><td>8 bit Unicode</td> <td>UTF-8</td></tr>
<tr><td>ISO Latin 1</td> <td>ISO-8859-1</td></tr>
<tr><td>ISO Latin 2</td> <td>ISO-8859-2</td></tr>
<tr><td>ISO Latin 3</td> <td>ISO-8859-3</td></tr>
<tr><td>ISO Latin 4</td> <td>ISO-8859-4</td></tr>
<tr><td>ISO Latin Cyrillic</td> <td>ISO-8859-5</td></tr>
<tr><td>ISO Latin Arabic</td> <td>ISO-8859-6</td></tr>
<tr><td>ISO Latin Greek</td> <td>ISO-8859-7</td></tr>
<tr><td>ISO Latin Hebrew</td> <td>ISO-8859-8</td></tr>
<tr><td>ISO Latin 5</td> <td>ISO-8859-9</td></tr>
<tr><td>EBCDIC US</td> <td>ebcdic-cp-us</td></tr>
<tr><td>EBCDIC with Euro symbol</td> <td>ibm1140</td></tr>
<tr><td>Chinese, PRC</td> <td>gb2312</td></tr>
<tr><td>Chinese, Big5</td> <td>Big5</td></tr>
<tr><td>Cyrillic</td> <td>koi8-r</td></tr>
<tr><td>Japanese, Shift JIS</td> <td>Shift_JIS</td></tr>
<tr><td>Korean, Extended UNIX code</td> <td>euc-kr</td></tr>
</table>
<p>Some implementations or ports of Xerces-C provide support for
additional encodings. The exact set will depend on the supplier
of the parser and on the character set transcoding services in use.
</p>
</a>
</faq>
<faq title="What character encoding should I use when creating XML documents?">
<q>What character encoding should I use when creating XML documents?</q>
<a>
<p>The best choice in most cases is either utf-8 or utf-16.
Advantages of these encodings include:</p>
<ul>
<li>The best portability. These encodings are more widely
supported by XML processors than any others, meaning that
your documents will have the best possible chance of being
read correctly, no matter where they end up. </li>
<li>Full international character support. Both utf-8 and
utf-16 cover the full Unicode character set, which
includes all of the characters from all major national,
international and industry character sets. </li>
<li>Efficiency. utf-8 has the smaller storage requirements
for documents that are primarily composed of characters
from the Latin alphabet. utf-16 is more efficient for
encoding Asian languages. But both encodings cover
all languages without loss.</li>
</ul>
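<p>The storage trade-off mentioned in the list above follows directly from
how many bytes each encoding needs per character. The C++ sketch below
computes that size for a single Unicode code point; the function names
<code>utf8Length</code> and <code>utf16Length</code> are invented for this
example and are not part of the parser.</p>
<source><![CDATA[
```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Bytes needed to encode one Unicode code point in UTF-8.
std::size_t utf8Length(std::uint32_t cp) {
    assert(cp <= 0x10FFFF);
    if (cp < 0x80)    return 1;   // ASCII, including the basic Latin alphabet
    if (cp < 0x800)   return 2;   // accented Latin, Greek, Cyrillic, ...
    if (cp < 0x10000) return 3;   // rest of the BMP, including most CJK
    return 4;                     // supplementary planes
}

// Bytes needed to encode one Unicode code point in UTF-16.
std::size_t utf16Length(std::uint32_t cp) {
    assert(cp <= 0x10FFFF);
    return cp < 0x10000 ? 2 : 4;  // one 16-bit unit, or a surrogate pair
}
```
]]></source>
<p>For example, the letter A (U+0041) takes 1 byte in utf-8 and 2 in utf-16,
while the CJK ideograph U+4E2D takes 3 bytes in utf-8 and 2 in utf-16.</p>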
<p>The only drawback of utf-8 or utf-16 is that they are not
the native text file format for most systems, meaning that
common text file editors and viewers cannot be used directly.</p>
<p>A second choice of encoding would be any of the others listed in
the table above. This works best when the XML encoding is the same
as the default system encoding on the machine where the
XML document is being prepared, because the document will then
display correctly as a plain text file. For UNIX systems
in countries speaking Western European languages, the encoding
will usually be iso-8859-1.</p>
<p>The versions of Xerces, both C and Java, distributed
by IBM as XML4C and XML4J, include all of the encodings
listed in the above table, on all platforms. </p>
<p>A word of caution for Windows users: The default character set
on Windows systems is windows-1252, not iso-8859-1. While Xerces-C
does recognize this Windows encoding, it is a poor choice for portable
XML data because it is not widely recognized by other XML processing
tools. If you are using a Windows-based editing tool to generate
XML, check which character set it generates, and make sure that the
resulting XML specifies the correct name in the encoding="..." declaration.</p>
</a>
</faq>
<faq title="Is EBCDIC supported?">
<q>Is EBCDIC supported?</q>
<a>
<p>Yes, &XercesCName; supports EBCDIC. When creating EBCDIC encoded XML data,
the preferred encoding is ibm1140. Also supported is ibm037 (and its alternate name,
ebcdic-cp-us); this encoding is almost the same as ibm1140, but it lacks the Euro
symbol.</p>
<p>These two encodings, ibm1140 and ibm037, are available on both Xerces-C and
IBM XML4C, on all platforms. </p>
<p>On IBM System 390, XML4C also supports two alternative forms, ibm037-s390
and ibm1140-s390. These are similar to the base ibm037 and ibm1140 encodings,
but with alternate mappings of the EBCDIC new-line character, which allows
them to appear as normal text files on System 390s. These encodings are not
supported on other platforms, and should not be used for portable data.</p>
<p>XML4C on System 390 and AS/400 also provides additional EBCDIC encodings, including
those for the character sets of different countries. The exact set supported
will be platform dependent, and these encodings are not recommended for
portable XML data. </p>
</a>
</faq>
</faqs>