| <?xml version="1.0" encoding = "iso-8859-1" standalone="no"?> |
| <!DOCTYPE faqs SYSTEM "./dtd/faqs.dtd"> |
| |
| <faqs title="Parsing with &XercesCName;"> |
| |
| <faq title="Does &XercesCName; support Schema?"> |
| |
| <q> Does &XercesCName; support Schema?</q> |
| |
| <a> |
| |
| <p>&XercesCName; &XercesCVersion; contains an implementation |
| of a subset of the W3C XML Schema Language as specified |
| in the 2 May 2001 Recommendation for <jump |
| href="http://www.w3.org/TR/xmlschema-1/">Structures</jump> |
| and <jump href="http://www.w3.org/TR/xmlschema-2/"> |
| Datatypes</jump>. See <jump href="schema.html">the Schema |
| page</jump> for details.</p> |
| |
| </a> |
| </faq> |
| |
| <faq title="Why does &XercesCName; not support this particular Schema feature?"> |
| |
| <q>Why does &XercesCName; not support this particular Schema feature?</q> |
| |
| <a> |
| |
| <p>&XercesCName; &XercesCVersion; contains an implementation |
| of a subset of the W3C XML Schema Language as specified |
| in the 2 May 2001 Recommendation for <jump |
| href="http://www.w3.org/TR/xmlschema-1/">Structures</jump> |
| and <jump href="http://www.w3.org/TR/xmlschema-2/"> |
| Datatypes</jump>. You should not consider this implementation |
| complete or correct. Please refer to <jump href="schema.html#limitation">the |
| Schema Limitations</jump> for further details.</p> |
| |
| </a> |
| </faq> |
| |
| <faq title="Why does my application crash on AIX when I run it under a |
| multi-threaded environment?"> |
| |
| <q>Why does my application crash on AIX when I run it under a |
| multi-threaded environment?</q> |
| |
| <a> |
| |
| <p>AIX maintains two kinds of libraries on the system, thread-safe and |
| non-thread-safe. Multi-threaded libraries on AIX follow a different naming |
| convention: usually the multi-threaded library names end with "_r". |
| For example, libc.a is single-threaded whereas libc_r.a is multi-threaded.</p> |
| |
| <p>To make your multi-threaded application run on AIX, you <em>must</em> |
| ensure that you do not have a "system library path" in your <code>LIBPATH</code> |
| environment variable when you run the application. The appropriate |
| libraries (threaded or non-threaded) are automatically picked up at runtime. An |
| application usually crashes when you build your application for multi-threaded |
| operation but don't point to the thread-safe version of the system libraries. |
| For example, LIBPATH can be simply set as:</p> |
| |
| <source>LIBPATH=$HOME/<&XercesCProjectName;>/lib</source> |
| |
| <p>Where <&XercesCProjectName;> points to the directory where the |
| &XercesCProjectName; application resides.</p> |
| |
| <p>If, for any reason unrelated to &XercesCProjectName;, you need to keep a |
| "system library path" in your LIBPATH environment variable, you must make sure |
| that you have placed the thread-safe path before you specify the normal system |
| path. For example, you must place <ref>/usr/lib/threads</ref> before |
| <ref>/usr/lib</ref> in your LIBPATH variable, so that your LIBPATH looks |
| like this:</p> |
| |
| <source>export LIBPATH=$HOME/<&XercesCProjectName;>/lib:/usr/lib/threads:/usr/lib</source> |
| |
| <p>Where /usr/lib is where your system libraries are.</p> |
| |
| </a> |
| </faq> |
| |
| <faq title="I cannot run the sample applications. What is wrong?"> |
| |
| <q>I cannot run the sample applications. What is wrong?</q> |
| |
| <a> |
| |
| <p>In order to run an application built using &XercesCProjectName; you must |
| set up your path and library search path properly. In the stand-alone version |
| from Apache, you must have the &XercesCName; runtime library available from |
| your path settings. On Windows this library is called <code>&XercesCWindowsDLL;.dll</code>, which must be available from your <code>PATH</code> settings. (Note that there are now separate debug and release DLLs for |
| Windows. If the release DLL is named <code>&XercesCWindowsDLL;.dll</code> then the debug DLL is named <code>&XercesCWindowsDLL;d.dll</code>.) On UNIX platforms the library is called <code>&XercesCUnixLib;.so</code> (or <code>.a</code> or <code>.sl</code>), which must be available from your <code>LD_LIBRARY_PATH</code> (or <code>LIBPATH</code> or <code>SHLIB_PATH</code>) environment variable.</p> |
| |
| <p>Thus, if you installed your binaries under <code>$HOME/fastxmlparser</code>, you need to point your library path to that directory.</p> |
| |
| <source>export LIBPATH=$LIBPATH:$HOME/fastxmlparser/lib # (AIX) |
| export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$HOME/fastxmlparser/lib # (Solaris, Linux) |
| export SHLIB_PATH=$SHLIB_PATH:$HOME/fastxmlparser/lib # (HP-UX)</source> |
| |
| <p>If you are using the enhanced version of this parser from IBM, you will |
| need to put in two additional DLLs. In the Windows build these are <code>icuuc.dll</code> and <code>icudata.dll</code> which must be available from your PATH settings. On UNIX, these |
| libraries are called <code>libicuuc.so</code> and <code>libicudata.so</code> (or <code>.sl</code> for HP-UX or <code>.a</code> for AIX) which must be available from your library search path.</p> |
| |
| </a> |
| </faq> |
| |
| <faq title="I just built my own application using the &XercesCName; parser. Why does it crash?"> |
| |
| <q>I just built my own application using the &XercesCName; parser. Why does |
| it crash?</q> |
| |
| <a> |
| |
| <p>In order to work with the &XercesCName; parser, you have to first |
| initialize the XML subsystem. The most common mistake is to forget this |
| initialization. Before you make any calls to &XercesCName; APIs, you must |
| call XMLPlatformUtils::Initialize(): </p> |
| |
| <source> |
| try { |
| XMLPlatformUtils::Initialize(); |
| } |
| catch (const XMLException& toCatch) { |
| // Do your failure processing here |
| }</source> |
| |
| <p>This initializes the &XercesCProjectName; system and sets its internal |
| variables. Note that you must include the <code>util/PlatformUtils.hpp</code> file for this to work.</p> |
| |
| </a> |
| </faq> |
| |
| <faq title="Is &XercesCName; thread-safe?"> |
| |
| <q>Is &XercesCName; thread-safe?</q> |
| |
| <a> |
| |
| <p>This is not a question that has a simple yes/no answer. Here are the |
| rules for using &XercesCName; in a multi-threaded environment:</p> |
| |
| <p>Within an address space, an instance of the parser may be used without |
| restriction from a single thread, or an instance of the parser can be accessed |
| from multiple threads, provided the application guarantees that only one thread |
| has entered a method of the parser at any one time.</p> |
| |
| <p>When two or more parser instances exist in a process, the instances can |
| be used concurrently, without external synchronization. That is, in an |
| application containing two parsers and two threads, one parser can be running |
| within the first thread concurrently with the second parser running within the |
| second thread.</p> |
| |
| <p>The same rules apply to &XercesCName; DOM documents. Multiple document |
| instances may be concurrently accessed from different threads, but any given |
| document instance can only be accessed by one thread at a time.</p> |
| |
| <p>DOMStrings allow multiple concurrent readers. All DOMString const |
| methods are thread safe, and can be concurrently entered by multiple threads. |
| Non-const DOMString methods, such as <code>appendData()</code>, are not thread safe and the application must guarantee that no other |
| methods (including const methods) are executed concurrently with them.</p> |
| |
| <p>The application also needs to guarantee that only one thread has entered the |
| method XMLPlatformUtils::Initialize() at any one time. Similarly, only one |
| thread may be inside the method XMLPlatformUtils::Terminate() at any one time.</p> |
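| <p>When one parser instance has to be shared between threads, the "only one |
| thread inside a method at a time" rule can be enforced with a small wrapper. |
| The following is a minimal sketch under stated assumptions: <code>StandInParser</code> |
| and <code>SharedParser</code> are hypothetical names introduced for illustration, and |
| the stand-in class only counts calls so that the example compiles without the |
| &XercesCName; headers.</p> |

```cpp
#include <cassert>
#include <mutex>
#include <string>

// Hypothetical stand-in for a parser instance; it is NOT the real
// SAXParser class and only counts calls so the sketch compiles alone.
class StandInParser {
public:
    void parse(const std::string& /*systemId*/) { ++parseCount_; }
    int parseCount() const { return parseCount_; }
private:
    int parseCount_ = 0;
};

// Wraps one parser instance and serializes every call into it, so that
// only one thread is ever inside a parser method at a time.
class SharedParser {
public:
    void parse(const std::string& systemId) {
        std::lock_guard<std::mutex> lock(mutex_);
        parser_.parse(systemId);
    }
    int parseCount() {
        std::lock_guard<std::mutex> lock(mutex_);
        return parser_.parseCount();
    }
private:
    std::mutex mutex_;
    StandInParser parser_;
};
```

| <p>Giving each thread its own parser instance avoids the lock entirely and is |
| usually the simpler design when memory allows.</p> |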
| |
| </a> |
| </faq> |
| |
| <faq title="Can't debug into the &XercesCName; DLL with the MSVC debugger"> |
| |
| <q> The libs/dll's I downloaded keep me from using the debugger in VC6.0. I |
| am using the 'D', debug versions of them. "no symbolic information found" is |
| what it says. Do I have to compile everything from source to make it work?</q> |
| |
| <a> |
| |
| <p>Unless you have the .pdb files, all the debug library gives you is the |
| debug heap manager, so that you can safely build your own code in debug |
| mode. If you want full symbolic info for |
| the &XercesCName; library, you'll need the .pdb files, and to get those, you'll |
| need to rebuild the &XercesCName; library.</p> |
| |
| </a> |
| </faq> |
| |
| <faq title="First-chance exception in Microsoft debugger"> |
| |
| <q>"First-chance exception in DOMPrint.exe (KERNEL32.DLL): 0xE06D7363: |
| Microsoft C++ Exception." I am always getting this message when I am using the |
| parser. My programs are terminating abnormally. Even the samples are giving |
| this exception. I am using Visual C++ 6.0 with latest service pack |
| installed.</q> |
| |
| <a> |
| |
| <p>&XercesCName; uses C++ exceptions internally, as part of its normal |
| operation. By default, the MSVC debugger will stop on each of these with the |
| "First-chance exception ..." message.</p> |
| |
| <p>To stop this from happening do this:</p> |
| |
| <ul> |
| <li>start debugging (so the debug menu appears)</li> |
| <li>from the debug menu select "Exceptions"</li> |
| <li>from the box that opens select "Microsoft C++ Exception" and set it |
| to "Stop if not handled" instead of "Stop always".</li> |
| </ul> |
| |
| <p>You'll still land in the debugger if your program is terminating |
| abnormally, but it will stop at your problem, not at the internal &XercesCName; |
| exceptions.</p> |
| |
| </a> |
| </faq> |
| |
| <faq title="I am seeing memory leaks in &XercesCName;. Are they real?"> |
| |
| <q>I am seeing memory leaks in &XercesCName;. Are they real?</q> |
| |
| <a> |
| |
| <p>The &XercesCName; library allocates and caches some commonly reused |
| items. The storage for these may be reported as memory leaks by some heap |
| analysis tools; to avoid the problem, call the function <code>XMLPlatformUtils::Terminate()</code> before your application exits. This will free all memory that was being |
| held by the library.</p> |
| |
| <p>For most applications, the use of <code>Terminate()</code> is optional. The system will recover all memory when the application |
| process shuts down. The exception to this is the use of &XercesCName; from DLLs |
| that will be repeatedly loaded and unloaded from within the same process. To |
| avoid memory leaks with this kind of use, <code>Terminate()</code> must be called before unloading the &XercesCName; library.</p> |
| |
| <p>To ensure that all the memory held by the parser is freed, the number of XMLPlatformUtils::Terminate() calls |
| should match the number of XMLPlatformUtils::Initialize() calls. |
| </p> |
| </a> |
| </faq> |
| |
| <faq title="Can I validate the data contained in a DOM tree?"> |
| |
| <q>Is there a facility in &XercesCName; to validate the data contained in a |
| DOM tree? That is, without saving and re-parsing the source document?</q> |
| |
| <a> |
| |
| <p>No. This is a frequently requested feature, but at this time it is not |
| possible to feed XML data from the DOM directly back to the DTD validator. The |
| best option for now is to generate XML source from the DOM and feed that back |
| into the parser.</p> |
| |
| </a> |
| </faq> |
| |
| <faq title="Can I use Xerces to perform write validation?"> |
| |
| <q>Can I use Xerces to perform "write validation" (which is having an |
| appropriate DTD and being able to add elements to the DOM whilst validating |
| against the DTD)? Is there a function that I have totally missed that creates |
| an XML file from a DTD, (obviously with the values missing, a skeleton, as it |
| were.)</q> |
| |
| <a> |
| |
| <p>The answers are: "No" and "No." Write Validation is a commonly requested |
| feature, but &XercesCName; does not have it yet.</p> |
| |
| <p>The best you can do for now is to create the DOM document, write it back |
| as XML and re-parse it.</p> |
| |
| </a> |
| </faq> |
| |
| <faq title="Why does my multi-threaded application crash on Solaris?"> |
| |
| <q>Why does my multi-threaded application crash on Solaris?</q> |
| |
| <a> |
| |
| <p>The problem appears because the throw call on Solaris 2.6 is not |
| multi-thread safe. Sun Microsystems provides a patch to solve this problem. To |
| get the latest patch for solving this problem, go to |
| <jump href="http://sunsolve.sun.com">SunSolve.sun.com</jump> and get the |
| appropriate patch for your operating system. For Intel machines running |
| Solaris, you need to get Patch ID 104678. For SPARC machines you need to get |
| Patch ID 105591.</p> |
| |
| </a> |
| </faq> |
| |
| <faq title="Why does my application give unresolved linking errors on Solaris?"> |
| |
| <q>Why does my application give unresolved linking errors on Solaris?</q> |
| |
| <a> |
| |
| <p>On Solaris there are a few things that need to be done before you |
| execute your application using &XercesCName;. In case you're using the |
| binary build of &XercesCName; make sure that the OS and compiler are |
| the same version as the ones used to build the binary. Different OS and |
| compiler versions might cause unresolved linking problems or compilation |
| errors. If the versions are different, rebuild the &XercesCName; library on |
| your system before building your application. If you're using ICU (which is |
| packaged with XML4C) you need to rebuild the compatible version of ICU |
| first.</p> |
| |
| <p>Also check that the library path is set properly and that the correct |
| versions of <code>gmake</code> and <code>autoconf</code> are on your system.</p> |
| |
| </a> |
| </faq> |
| |
| <faq title="Why do I get an Internal Compiler Error when compiling &XercesCName; for a 64-bit target with gcc?"> |
| <q>Why do I get an Internal Compiler Error when compiling &XercesCName; for a 64-bit target with gcc?</q> |
| <a> |
| <p>This is a compiler problem. Try turning off optimization to bypass the problem.</p> |
| </a> |
| </faq> |
| |
| |
| <faq title="How are entity reference nodes handled in DOM?"> |
| |
| <q>How are entity reference nodes handled in DOM?</q> |
| |
| <a> |
| |
| <p>If you are using the native DOM classes, the function <code>setExpandEntityReferences</code> controls how entities appear in the DOM tree. When |
| setExpandEntityReferences is set to false (the default), an occurrence of an |
| entity reference in the XML document will be represented by a subtree with an |
| EntityReference node at the root whose children represent the entity expansion. |
| Entity expansion will be a DOM tree representing the structure of the entity |
| expansion, not a text node containing the entity expansion as text.</p> |
| |
| <p>If setExpandEntityReferences is true, an entity reference in the XML |
| document is represented by only the nodes that represent the entity expansion. |
| The DOM tree will not contain any EntityReference nodes.</p> |
| |
| </a> |
| </faq> |
| |
| <faq title="What kinds of URLs are currently supported in &XercesCName;?"> |
| |
| <q>What kinds of URLs are currently supported in &XercesCName;?</q> |
| |
| <a> |
| |
| <p>The <code>XMLURL</code> class provides limited URL support. It understands the <code>file://</code>, <code>http://</code>, and <code>ftp://</code> URL types, and is capable of parsing them into their constituent |
| components, and normalizing them. It also supports the commonly required action |
| of combining a base and relative URL into a single URL. In other words, it |
| performs the limited set of functions required by an XML parser.</p> |
| |
| <p>Another thing that URLs are commonly used for is creating an input stream that |
| provides access to the entity referenced. The parser, as shipped, only supports |
| this functionality on URLs in the form <code>file:///</code> and <code>file://localhost/</code>, i.e. only when the URL refers to a local file.</p> |
| |
| <p>You may enable support for HTTP and FTP URLs by implementing and |
| installing a NetAccessor object. When a NetAccessor object is installed, the |
| URL class will use it to create input streams for the remote entities referred |
| to by such URLs.</p> |
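| <p>An application that wants to know in advance whether a system identifier can |
| be opened without a NetAccessor could test it against the two local-file forms |
| mentioned above. This is only a rough sketch: <code>isLocalFileUrl</code> is a |
| hypothetical helper, not part of the <code>XMLURL</code> class, and it does a |
| plain case-sensitive prefix comparison.</p> |

```cpp
#include <cassert>
#include <string>

// Returns true when url starts with one of the two local-file forms named
// above. The prefix comparison is case-sensitive, which is a simplification.
bool isLocalFileUrl(const std::string& url) {
    static const std::string kForms[] = { "file:///", "file://localhost/" };
    for (const std::string& form : kForms) {
        if (url.compare(0, form.size(), form) == 0) return true;
    }
    return false;
}
```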
| |
| </a> |
| </faq> |
| |
| <faq title="How can I add support for URLs with HTTP/FTP protocols?"> |
| |
| <q>How can I add support for URLs with HTTP/FTP protocols?</q> |
| |
| <a> |
| |
| <p>Support for the http: protocol is now included by default on all |
| platforms.</p> |
| |
| <p>To address the need to make remote connections to resources specified |
| using additional protocols, ftp for example, &XercesCName; provides the <code>NetAccessor</code> interface. The header file is <code>src/util/XMLNetAccessor.hpp</code>. This interface allows you to plug in your own implementation of URL |
| networking code into the &XercesCName; parser.</p> |
| |
| </a> |
| </faq> |
| |
| <faq title="Can I use &XercesCName; to parse HTML?"> |
| |
| <q>Can I use &XercesCName; to parse HTML?</q> |
| |
| <a> |
| |
| <p>Yes, but only if the HTML follows the rules given in the |
| <jump href="http://www.w3.org/TR/REC-xml">XML specification</jump>. Most HTML, |
| however, does not follow the XML rules, and will generate XML well-formedness |
| errors.</p> |
| |
| </a> |
| </faq> |
| |
| <faq title="I keep getting an error: &quot;invalid UTF-8 character&quot;. What's wrong?"> |
| |
| <q>I keep getting an error: "invalid UTF-8 character". What's wrong?</q> |
| |
| <a> |
| |
| <p>Most commonly, the XML <code>encoding=</code> declaration is either incorrect or missing. Without a declaration, XML |
| defaults to the UTF-8 character encoding, which is not compatible with the |
| default text file encoding on most systems.</p> |
| |
| <p>The XML declaration should look something like this:</p> |
| |
| <p><code><?xml version="1.0" encoding="iso-8859-1"?></code></p> |
| |
| <p>Make sure to specify the encoding that is actually used by the file. The |
| encoding for "plain" text files depends both on the operating system and the |
| locale (country and language) in use.</p> |
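| <p>If you are unsure whether a file really is UTF-8, a quick well-formedness |
| check on its bytes usually settles the question. The sketch below is an |
| illustration only and is not part of &XercesCName;; the function name |
| <code>firstInvalidUtf8</code> is an assumption. It reports the offset of the first |
| byte sequence that cannot be UTF-8, which is typically where an ISO-8859-1 or |
| windows-1252 accented character sits.</p> |

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Returns the byte offset of the first sequence that is not well-formed
// UTF-8, or std::string::npos when the whole buffer is well-formed.
std::size_t firstInvalidUtf8(const std::string& bytes) {
    const std::size_t n = bytes.size();
    std::size_t i = 0;
    while (i < n) {
        const unsigned char lead = static_cast<unsigned char>(bytes[i]);
        std::size_t len = 0;
        unsigned long cp = 0;
        unsigned long smallest = 0;  // smallest code point allowed for this length
        if (lead < 0x80)                { ++i; continue; }
        else if ((lead & 0xE0) == 0xC0) { len = 2; cp = lead & 0x1F; smallest = 0x80; }
        else if ((lead & 0xF0) == 0xE0) { len = 3; cp = lead & 0x0F; smallest = 0x800; }
        else if ((lead & 0xF8) == 0xF0) { len = 4; cp = lead & 0x07; smallest = 0x10000; }
        else return i;                  // stray continuation byte or invalid lead byte
        if (len > n - i) return i;      // sequence is cut off by the end of the data
        for (std::size_t k = 1; k < len; ++k) {
            const unsigned char c = static_cast<unsigned char>(bytes[i + k]);
            if ((c & 0xC0) != 0x80) return i;
            cp = (cp << 6) | (c & 0x3F);
        }
        // Reject overlong forms, UTF-16 surrogates, and values past U+10FFFF.
        if (cp < smallest || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) return i;
        i += len;
    }
    return std::string::npos;
}
```

| <p>A non-npos result on a file without an encoding declaration suggests that |
| the file is in some other encoding and needs a matching declaration.</p> |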
| |
| <p>Another common source of problems is that some characters are not |
| allowed in XML documents, according to the XML spec. Typical disallowed |
| characters are control characters, even if you escape them using the Character |
| Reference form. See the <jump href="http://www.w3.org/TR/REC-xml#charsets">XML |
| spec</jump>, sections 2.2 and 4.1 for details. If the parser is generating an <code>Invalid character (Unicode: 0x???)</code> error, it is very likely that there's a character in there that you |
| can't see. You can generally use a UNIX command like "od -hc" to find it.</p> |
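| <p>As an alternative to inspecting a dump by eye, the search for an invisible |
| control character can be automated. The following sketch assumes an |
| ASCII-compatible encoding such as UTF-8 or ISO-8859-1 (it is not suitable for |
| UTF-16 data), and the function name <code>firstDisallowedControl</code> is |
| hypothetical. It looks for bytes below 0x20 other than tab, line feed and |
| carriage return, which are the only characters in that range that XML 1.0 |
| allows.</p> |

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Returns the offset of the first byte below 0x20 other than tab (0x09),
// line feed (0x0A) and carriage return (0x0D), or npos if there is none.
std::size_t firstDisallowedControl(const std::string& bytes) {
    for (std::size_t i = 0; i < bytes.size(); ++i) {
        const unsigned char c = static_cast<unsigned char>(bytes[i]);
        if (c < 0x20 && c != 0x09 && c != 0x0A && c != 0x0D) return i;
    }
    return std::string::npos;
}
```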
| |
| </a> |
| </faq> |
| |
| <faq title="What encodings are supported by Xerces-C / XML4C?"> |
| |
| <q>What encodings are supported by Xerces-C / XML4C?</q> |
| |
| <a> |
| |
| <p>Xerces-C has intrinsic support for ASCII, UTF-8, UTF-16 (Big/Little |
| Endian), UCS4 (Big/Little Endian), EBCDIC code pages IBM037 and IBM1140 |
| encodings, ISO-8859-1 (aka Latin1) and Windows-1252. This means that it can |
| parse input XML files in any of these encodings.</p> |
| |
| <p>XML4C -- the version of Xerces-C available from IBM -- extends this |
| set to include the encodings listed in the table below.</p> |
| |
| <table> |
| <tr> |
| <td><em>Common Name</em></td> |
| <td><em>Use this name in XML</em></td> |
| </tr> |
| <tr> |
| <td>8 bit Unicode</td> |
| <td>UTF-8</td> |
| </tr> |
| <tr> |
| <td>ISO Latin 1</td> |
| <td>ISO-8859-1</td> |
| </tr> |
| <tr> |
| <td>ISO Latin 2</td> |
| <td>ISO-8859-2</td> |
| </tr> |
| <tr> |
| <td>ISO Latin 3</td> |
| <td>ISO-8859-3</td> |
| </tr> |
| <tr> |
| <td>ISO Latin 4</td> |
| <td>ISO-8859-4</td> |
| </tr> |
| <tr> |
| <td>ISO Latin Cyrillic</td> |
| <td>ISO-8859-5</td> |
| </tr> |
| <tr> |
| <td>ISO Latin Arabic</td> |
| <td>ISO-8859-6</td> |
| </tr> |
| <tr> |
| <td>ISO Latin Greek</td> |
| <td>ISO-8859-7</td> |
| </tr> |
| <tr> |
| <td>ISO Latin Hebrew</td> |
| <td>ISO-8859-8</td> |
| </tr> |
| <tr> |
| <td>ISO Latin 5</td> |
| <td>ISO-8859-9</td> |
| </tr> |
| <tr> |
| <td>EBCDIC US</td> |
| <td>ebcdic-cp-us</td> |
| </tr> |
| <tr> |
| <td>EBCDIC with Euro symbol</td> |
| <td>ibm1140</td> |
| </tr> |
| <tr> |
| <td>Chinese, PRC</td> |
| <td>gb2312</td> |
| </tr> |
| <tr> |
| <td>Chinese, Big5</td> |
| <td>Big5</td> |
| </tr> |
| <tr> |
| <td>Cyrillic</td> |
| <td>koi8-r</td> |
| </tr> |
| <tr> |
| <td>Japanese, Shift JIS</td> |
| <td>Shift_JIS</td> |
| </tr> |
| <tr> |
| <td>Korean, Extended UNIX code</td> |
| <td>euc-kr</td> |
| </tr> |
| </table> |
| |
| <p>Some implementations or ports of Xerces-C provide support for |
| additional encodings. The exact set will depend on the supplier of the parser |
| and on the character set transcoding services in use.</p> |
| |
| </a> |
| </faq> |
| |
| <faq |
| title="What character encoding should I use when creating XML documents?"> |
| |
| <q>What character encoding should I use when creating XML documents?</q> |
| |
| <a> |
| |
| <p>The best choice in most cases is either utf-8 or utf-16. Advantages of |
| these encodings include:</p> |
| |
| <ul> |
| <li>The best portability. These encodings are more widely supported by |
| XML processors than any others, meaning that your documents will have the best |
| possible chance of being read correctly, no matter where they end up.</li> |
| <li>Full international character support. Both utf-8 and utf-16 cover the |
| full Unicode character set, which includes all of the characters from all major |
| national, international and industry character sets.</li> |
| <li>Efficiency. utf-8 has the smaller storage requirements for documents |
| that are primarily composed of characters from the Latin alphabet. utf-16 is |
| more efficient for encoding Asian languages. But both encodings cover all |
| languages without loss.</li> |
| </ul> |
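| <p>The storage trade-off between the two encodings can be made concrete with a |
| little arithmetic. The helper names below are illustrative assumptions; the |
| byte counts follow directly from the definitions of UTF-8 and UTF-16.</p> |

```cpp
#include <cassert>
#include <cstddef>

// Bytes needed to store one Unicode code point (U+0000..U+10FFFF) in UTF-8.
std::size_t utf8Bytes(unsigned long cp) {
    if (cp < 0x80)    return 1;
    if (cp < 0x800)   return 2;
    if (cp < 0x10000) return 3;
    return 4;
}

// Bytes needed to store one code point in UTF-16 (one or two 16-bit units).
std::size_t utf16Bytes(unsigned long cp) {
    return cp < 0x10000 ? 2 : 4;
}
```

| <p>An ASCII letter therefore costs one byte in utf-8 and two in utf-16, while |
| a typical CJK ideograph such as U+4E2D costs three bytes in utf-8 and two in |
| utf-16.</p> |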
| |
| <p>The only drawback of utf-8 or utf-16 is that they are not the native |
| text file format for most systems, meaning that common text file editors and |
| viewers cannot be directly used.</p> |
| |
| <p>A second choice of encoding would be any of the others listed in the |
| table above. This works best when the xml encoding is the same as the default |
| system encoding on the machine where the XML document is being prepared, |
| because the document will then display correctly as a plain text file. For UNIX |
| systems in countries speaking Western European languages, the encoding will |
| usually be iso-8859-1.</p> |
| |
| <p>The versions of Xerces distributed by IBM, both C and Java (known |
| respectively as XML4C and XML4J), include all of the encodings listed in the |
| above table, on all platforms.</p> |
| |
| <p>A word of caution for Windows users: The default character set on |
| Windows systems is windows-1252, not iso-8859-1. While &XercesCName; does |
| recognize this Windows encoding, it is a poor choice for portable XML data |
| because it is not widely recognized by other XML processing tools. If you are |
| using a Windows-based editing tool to generate XML, check which character set |
| it generates, and make sure that the resulting XML specifies the correct name |
| in the <code>encoding="..."</code> declaration.</p> |
| |
| </a> |
| </faq> |
| |
| <faq |
| title="I find memory leaks in &XercesCName;. How do I eliminate them?"> |
| |
| <q>I find memory leaks in &XercesCName;. How do I eliminate them?</q> |
| |
| <a> |
| |
| <p>The "leaks" that are reported by leak detectors or heap-analysis |
| tools aren't really leaks in most applications, in that the memory usage does |
| not grow over time as the XML parser is used and re-used.</p> |
| |
| <p>What you are seeing as leaks is actually lazily evaluated data |
| allocated into static variables. This data gets released when the application |
| ends. You can make a call to <code>XMLPlatformUtils::Terminate()</code> to release all the lazily allocated variables before you exit your |
| program.</p> |
| |
| <p>To ensure that all the memory held by the parser is freed, the number of XMLPlatformUtils::Terminate() calls |
| should match the number of XMLPlatformUtils::Initialize() calls. |
| </p> |
| </a> |
| </faq> |
| |
| <faq title="Is EBCDIC supported?"> |
| |
| <q>Is EBCDIC supported?</q> |
| |
| <a> |
| |
| <p>Yes, &XercesCName; supports EBCDIC. When creating EBCDIC encoded XML |
| data, the preferred encoding is ibm1140. Also supported is ibm037 (and its |
| alternate name, ebcdic-cp-us); this encoding is almost the same as ibm1140, but |
| it lacks the Euro symbol.</p> |
| |
| <p>These two encodings, ibm1140 and ibm037, are available on both |
| Xerces-C and IBM XML4C, on all platforms.</p> |
| |
| <p>On IBM System 390, XML4C also supports two alternative forms, |
| ibm037-s390 and ibm1140-s390. These are similar to the base ibm037 and ibm1140 |
| encodings, but with alternate mappings of the EBCDIC new-line character, which |
| allows them to appear as normal text files on System 390s. These encodings are |
| not supported on other platforms, and should not be used for portable data.</p> |
| |
| <p>XML4C on System 390 and AS/400 also provides additional EBCDIC |
| encodings, including those for the character sets of different countries. The |
| exact set supported will be platform dependent, and these encodings are not |
| recommended for portable XML data.</p> |
| |
| </a> |
| </faq> |
| |
| <faq title="How do I write out a DOM tree into an XML file?"> |
| <q>How do I write out a DOM tree into an XML file?</q> |
| <a> |
| <p>This feature is not yet available in the parser. Take a look at the DOMPrint sample |
| for an example of parsing an XML file and then writing it back out to the screen. You |
| can reuse that code.</p> |
| </a> |
| </faq> |
| |
| <faq title="Is it OK to call the XMLPlatformUtils::Initialize/Terminate pair of routines multiple times in one program?"> |
| <q>Is it OK to call the XMLPlatformUtils::Initialize/Terminate pair of routines multiple times in one program?</q> |
| <a> |
| <p>Yes. The code has been enhanced so that calling the XMLPlatformUtils::Initialize/Terminate pair of routines |
| multiple times in one process is now allowed. |
| </p> |
| |
| <p>But the application needs to guarantee that only one thread has entered the |
| method XMLPlatformUtils::Initialize() at any one time. Similarly, only one |
| thread may be inside the method XMLPlatformUtils::Terminate() at any one time. |
| </p> |
| |
| <p>If you call XMLPlatformUtils::Initialize() a number of times, and then follow with |
| XMLPlatformUtils::Terminate() the same number of times, only the first XMLPlatformUtils::Initialize() |
| will do the initialization, and only the last XMLPlatformUtils::Terminate() will clean up |
| the memory. The other calls are ignored. |
| </p> |
| <p>To ensure that all the memory held by the parser is freed, the number of XMLPlatformUtils::Terminate() calls |
| should match the number of XMLPlatformUtils::Initialize() calls. |
| </p> |
| <p> |
| Consider the following code snippets (for simplicity, the sample code |
| omits the try/catch clause): |
| </p> |
| |
| <source> |
| // The XMLPlatformUtils::Initialize/Terminate calls are paired. |
| { |
|     // Initialize the parser |
|     XMLPlatformUtils::Initialize(); |
|     { |
|         // Inner scope: avoids declaring "parser" twice in one scope and |
|         // destroys the parser before Terminate() is called |
|         SAXParser parser; |
|         parser.parse(xmlFile); |
|     } |
|     // Free all memory that was being held by the parser |
|     XMLPlatformUtils::Terminate(); |
| |
|     // Initialize the parser again |
|     XMLPlatformUtils::Initialize(); |
|     { |
|         SAXParser parser; |
|         parser.parse(xmlFile); |
|     } |
|     // Free all memory that was being held by the parser |
|     XMLPlatformUtils::Terminate(); |
| } |
| </source> |
| |
| <source> |
| // calls XMLPlatformUtils::Initialize() three times |
| // then calls XMLPlatformUtils::Terminate() numerous times |
| { |
| // Initialize the parser |
| XMLPlatformUtils::Initialize(); |
| |
| // The next two calls are no-op |
| XMLPlatformUtils::Initialize(); |
| XMLPlatformUtils::Initialize(); |
| |
| SAXParser parser; |
| parser.parse(xmlFile); |
| |
| // The first two XMLPlatformUtils::Terminate() calls are no-op |
| XMLPlatformUtils::Terminate(); |
| XMLPlatformUtils::Terminate(); |
| |
| // This third XMLPlatformUtils::Terminate() will free all memory that was being held by the parser |
| XMLPlatformUtils::Terminate(); |
| |
| // This extra fourth XMLPlatformUtils::Terminate() call is no-op. |
| // However calling XMLPlatformUtils::Terminate() without a matching XMLPlatformUtils::Initialize() |
| // is dangerous and should be avoided. |
| XMLPlatformUtils::Terminate(); |
| } |
| </source> |
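| <p>One way to keep the calls paired automatically is a small guard object whose |
| constructor initializes and whose destructor terminates. The sketch below is an |
| illustration under stated assumptions: <code>StandInPlatform</code> is a simplified |
| model of the counting behaviour described above, not the real |
| <code>XMLPlatformUtils</code>, and <code>PlatformGuard</code> is a hypothetical name. |
| In a real application the guard would call XMLPlatformUtils::Initialize() and |
| XMLPlatformUtils::Terminate() instead, and an exception thrown by Initialize() |
| would propagate out of the constructor.</p> |

```cpp
#include <cassert>

// Stand-in for the library: a simplified model of the counting behaviour
// described above, NOT the real XMLPlatformUtils implementation.
struct StandInPlatform {
    static int& count()      { static int c = 0; return c; }
    static bool& active()    { static bool a = false; return a; }
    static void Initialize() { if (count()++ == 0) active() = true; }
    static void Terminate() {
        if (count() == 0) return;              // unmatched call: ignored in this model
        if (--count() == 0) active() = false;  // last matching call cleans up
    }
};

// RAII guard: the constructor initializes and the destructor terminates,
// so every Initialize() gets exactly one matching Terminate().
class PlatformGuard {
public:
    PlatformGuard()  { StandInPlatform::Initialize(); }
    ~PlatformGuard() { StandInPlatform::Terminate(); }
    PlatformGuard(const PlatformGuard&) = delete;
    PlatformGuard& operator=(const PlatformGuard&) = delete;
};
```

| <p>Declaring the guard before any parser in the same scope also means the |
| parser is destroyed first, because local objects are destroyed in reverse |
| order of construction.</p> |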
| </a> |
| </faq> |
| |
| <faq title="Why does deleting a transcoded string result in an assertion on Windows?"> |
| <q>Why does deleting a transcoded string result in an assertion on Windows?</q> |
| <a> |
| <p>Both your application program and the Xerces DLL must use the same *DLL* version of the |
| runtime library. If either statically links to the runtime library, the |
| problem will still occur.</p> |
| |
| <p>For example, for a Win32/VC6 build, the runtime library build setting MUST |
| be "Multithreaded DLL" for release builds and "Debug Multithreaded DLL" for |
| debug builds.</p> |
| </a> |
| </faq> |
| |
| <faq title="How do I transcode to/from something besides the local code page?"> |
| <q>How do I transcode to/from something besides the local code page?</q> |
| <a> |
| <p>XMLString::transcode() will transcode from XMLCh to the local code page, and |
| other APIs which take a char* assume that the source text is in the local |
| code page. If this is not true, you must transcode the text yourself. You |
| can do this using local transcoding support on your OS, such as Iconv on |
| Unix or IBM's ICU package. However, if your transcoding needs are simple, |
| you can achieve some better portability by using the Xerces parser's |
| transcoder wrappers. You get a transcoder like this: |
| </p> |
| <ul> |
| <li> |
| 1. Call XMLPlatformUtils::fgTransService->makeNewTranscoderFor() and provide |
| the name of the encoding you wish to create a transcoder for. This will |
| return a transcoder to you, which you own and must delete when you are |
| through with it. |
| |
| NOTE: You must provide a maximum block size that you will pass to the transcoder |
| at one time, and you must pass blocks of characters of this count or smaller when |
| you do your transcoding. The reason for this is that this is really an |
| internal API and is used by the parser itself to do transcoding. The parser |
| always does transcoding in known block sizes, and this allows transcoders to |
| be much more efficient for internal use since it knows the max size it will |
| ever have to deal with and can set itself up for that internally. In |
| general, you should stick to block sizes in the 4 to 64K range. |
| </li> |
| <li> |
| 2. The returned transcoder is something derived from XMLTranscoder, so they |
| are all returned to you via that interface. |
| </li> |
| <li> |
| 3. This object is really just a wrapper around the underlying transcoding |
| system actually in use by your version of Xerces, and does whatever is |
| necessary to handle differences between the XMLCh representation and the |
| representation used by that underlying transcoding system. |
| </li> |
| <li> |
| 4. The transcoder object has two primary APIs, transcodeFrom() and |
| transcodeTo(). These transcode between the XMLCh format and the encoding you |
| indicated. |
| </li> |
| <li> |
| 5. These APIs will transcode as much of the source data as will fit into the |
| outgoing buffer you provide. They will tell you how much of the source they |
| ate and how much of the target they filled. You can use this information to |
| continue the process until all source is consumed. |
| </li> |
| <li> |
| 6. char* data is always dealt with in terms of bytes, and XMLCh data is |
| always dealt with in terms of characters. Don't mix up which you are dealing |
| with or you will not get the correct results, since many encodings don't |
| have a one to one relationship of characters to bytes. |
| </li> |
| <li> |
| 7. When transcoding from XMLCh to the target encoding, the transcodeTo() |
| method provides an 'unrepresentable flag' parameter, which tells the |
| transcoder how to deal with an XMLCh code point that cannot be converted |
| legally to the target encoding, which can easily happen since XMLCh is |
| Unicode and can represent thousands of code points. The options are to use a |
| default replacement character (which the underlying transcoding service will |
| choose, and which is guaranteed to be legal for the target encoding), or to |
| throw an exception. |
| </li> |
| </ul> |
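| <p>The "transcode as much as fits, then continue" pattern from steps 1 and 5 |
| can be sketched as a simple loop. Everything named here is an assumption made |
| for illustration: <code>StandInLatin1Transcoder</code> only mimics the shape of the |
| contract (it reports how many source bytes it ate and how many characters it |
| produced) and is not the real XMLTranscoder interface, whose exact signatures |
| should be taken from the headers of your &XercesCName; version.</p> |

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-in transcoder (ISO-8859-1 bytes to UTF-16 code units).
// It mimics the shape of the contract above, not the real XMLTranscoder API:
// it converts as much as fits and reports bytes eaten and units produced.
struct StandInLatin1Transcoder {
    std::size_t transcodeFrom(const unsigned char* src, std::size_t srcCount,
                              char16_t* out, std::size_t outCapacity,
                              std::size_t& bytesEaten) const {
        const std::size_t n = std::min(srcCount, outCapacity);
        for (std::size_t i = 0; i < n; ++i) out[i] = src[i];  // Latin-1 maps 1:1
        bytesEaten = n;
        return n;
    }
};

// Feeds the source to the transcoder in blocks no larger than blockSize and
// keeps going until every source byte has been consumed.
std::u16string transcodeAll(const StandInLatin1Transcoder& t,
                            const std::string& source, std::size_t blockSize) {
    std::u16string result;
    if (blockSize == 0) return result;  // a zero block size cannot make progress
    std::u16string buffer(blockSize, u'\0');
    const unsigned char* src =
        reinterpret_cast<const unsigned char*>(source.data());
    std::size_t offset = 0;
    while (offset < source.size()) {
        const std::size_t chunk = std::min(blockSize, source.size() - offset);
        std::size_t eaten = 0;
        const std::size_t made = t.transcodeFrom(src + offset, chunk,
                                                 &buffer[0], buffer.size(), eaten);
        if (eaten == 0) break;  // no progress: stop rather than loop forever
        result.append(buffer, 0, made);
        offset += eaten;
    }
    return result;
}
```

| <p>Note how the source side is counted in bytes and the target side in |
| characters, as step 6 requires; for a multi-byte encoding the two counts |
| would differ.</p> |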
| </a> |
| </faq> |
| |
| <faq title="Why does DOM_Node::cloneNode() not clone the pointer assigned to a DOM_Node via DOM_Node::setUserData()?"> |
| <q>Why does DOM_Node::cloneNode() not clone the pointer assigned to a DOM_Node via DOM_Node::setUserData()?</q> |
| <a> |
| <p>There are several possible options for how cloneNode should handle userData: |
| </p> |
| <ul> |
| <li> |
| 1) Copy the pointer. May be a Very Bad Idea if you really wanted the data |
| associated with a particular node object. |
| </li> |
| <li> |
| 2) Clone the object being pointed at. May be a Very Bad Idea if that object, |
| in turn, wasn't designed to be cloned at this time. |
| </li> |
| <li> |
| 3) A complex call-back API has been proposed which would allow the userData |
| object to tell the DOM which of these three options should be taken, but |
| that would require that only objects implementing that API be registered as |
| userData. That doesn't seem to be a good option. |
| </li> |
| <li> |
| 4) Do nothing. This is by far the lowest-overhead and safest choice. And |
| since cloneNode is a DOM operation, and userData is _not_ defined by the |
| standard DOM API, one can make a very strong case for this being the "most |
| correct" option. |
| </li> |
| </ul> |
| <p> |
| We chose (4), very deliberately. If you want one of the others, you can |
| implement it by creating your own wrapper operation for cloneNode() and |
| calling that. |
| </p> |
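| <p>Such a wrapper might look like the following sketch, which implements |
| option (1), copying the pointer. <code>StandInNode</code> is a hypothetical |
| stand-in used so that the example is self-contained; it is not the Xerces |
| <code>DOM_Node</code> class, and its <code>cloneNode()</code> ignores the deep/shallow |
| distinction. With the real classes the wrapper would call cloneNode() and then |
| setUserData() on the result.</p> |

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for a DOM node, NOT the Xerces DOM_Node class.
struct StandInNode {
    std::string name;
    void* userData = nullptr;

    // Mirrors choice (4): the clone never inherits userData.
    StandInNode cloneNode() const {
        StandInNode copy;
        copy.name = name;
        return copy;
    }
};

// Application-level wrapper implementing option (1): clone the node, then
// copy the user-data pointer, so both nodes share the same data.
StandInNode cloneNodeSharingUserData(const StandInNode& source) {
    StandInNode copy = source.cloneNode();
    copy.userData = source.userData;
    return copy;
}
```

| <p>Because both nodes then point at the same object, the application remains |
| responsible for deciding who owns it and when it is released.</p> |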
| <p> |
| NOTE that userData should be considered a nonportable, experimental feature |
| of the Xerces DOM. It may evaporate entirely in favor of a scheme based on |
| the DOM Level 3 "node key" mechanism, when that becomes officially |
| available. |
| </p> |
| </a> |
| </faq> |
| |
| </faqs> |