| <?xml version="1.0" encoding = "iso-8859-1" standalone="no"?> |
| <!-- |
| * Licensed to the Apache Software Foundation (ASF) under one or more |
| * contributor license agreements. See the NOTICE file distributed with |
| * this work for additional information regarding copyright ownership. |
| * The ASF licenses this file to You under the Apache License, Version 2.0 |
| * (the "License"); you may not use this file except in compliance with |
| * the License. You may obtain a copy of the License at |
| * |
| * http://www.apache.org/licenses/LICENSE-2.0 |
| * |
| * Unless required by applicable law or agreed to in writing, software |
| * distributed under the License is distributed on an "AS IS" BASIS, |
| * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. |
| * See the License for the specific language governing permissions and |
| * limitations under the License. |
| --> |
| |
| <!DOCTYPE faqs SYSTEM "sbk:/style/dtd/faqs.dtd"> |
| |
| <faqs title="Programming &XercesCName;"> |
| |
| <faq title="Does &XercesCName; support XML Schema?"> |
| |
| <q> Does &XercesCName; support Schema?</q> |
| |
| <a> |
| <p>Yes, &XercesCName; &XercesC3Version; contains an implementation |
| of the W3C XML Schema Language, a recommendation of the World Wide Web Consortium |
| available in three parts: |
| <jump href="http://www.w3.org/TR/xmlschema-0/">XML Schema: Primer</jump>, |
| <jump href="http://www.w3.org/TR/xmlschema-1/">XML Schema: Structures</jump>, and |
| <jump href="http://www.w3.org/TR/xmlschema-2/">XML Schema: Datatypes</jump>. |
| We consider this implementation complete. See the |
| <jump href="schema-&XercesC3Series;.html#limitation">XML Schema Support</jump> page for limitations.</p> |
| |
| </a> |
| </faq> |
| |
| <faq title="Does &XercesCName; support XPath?"> |
| |
| <q> Does &XercesCName; support XPath?</q> |
| |
| <a> |
| |
| <p>&XercesCName; &XercesC3Version; provides a partial XPath 1 implementation |
| for the purposes of handling XML Schema identity constraints. |
| The same engine is made available through the DOMDocument::evaluate API to |
| let the user perform simple XPath queries involving DOMElement nodes only, |
| with no predicate testing and allowing the "//" operator only as the initial |
| step. For full XPath 1 and 2 support refer to the |
| <jump href="http://xqilla.sourceforge.net">XQilla</jump> and |
| <jump href="http://xml.apache.org/xalan-c/overview.html">Apache Xalan C++</jump> |
| open source projects. |
| </p> |
| |
| </a> |
| </faq> |
| |
| <faq title="Why does my application crash when instantiating the parser?"> |
| |
| <q>Why does my application crash when instantiating the parser?</q> |
| |
| <a> |
| |
| <p>In order to work with the &XercesCName; parser, you have to first |
| initialize the XML subsystem. The most common mistake is to forget this |
| initialization. Before you make any calls to &XercesCName; APIs, you must |
| call XMLPlatformUtils::Initialize(): </p> |
| |
| <source> |
| try { |
| XMLPlatformUtils::Initialize(); |
| } |
| catch (const XMLException& toCatch) { |
| // Do your failure processing here |
| }</source> |
| |
| <p>This initializes the &XercesCProjectName; system and sets its internal |
| variables. Note that you must include the <code>xercesc/util/PlatformUtils.hpp</code> file for this to work.</p> |
| |
| </a> |
| </faq> |
| |
| <faq title="Is it OK to call the XMLPlatformUtils::Initialize/Terminate pair of routines multiple times in one program?"> |
| <q>Is it OK to call the XMLPlatformUtils::Initialize/Terminate pair of routines multiple times in one program?</q> |
| <a> |
| <p>Yes. Note, however, that the application needs to guarantee that the |
| XMLPlatformUtils::Initialize() and XMLPlatformUtils::Terminate() |
| methods are called from the same thread (usually the initial |
| thread executing main()) or proper synchronization is performed |
| by the application if multiple threads call |
| XMLPlatformUtils::Initialize() and XMLPlatformUtils::Terminate() |
| concurrently.</p> |
| |
| <p>If you call XMLPlatformUtils::Initialize() a number of times and then call |
| XMLPlatformUtils::Terminate() the same number of times, only the first XMLPlatformUtils::Initialize() |
| performs the initialization, and only the last XMLPlatformUtils::Terminate() cleans up |
| the memory. The calls in between are ignored. |
| </p> |
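<p>The counting rule can be pictured with a small stand-in class. This is only a sketch of the behavior described above, not the &XercesCName; implementation; the class name <code>CountedSubsystem</code> and its members are hypothetical. Like the real pair of calls, it does no locking of its own, so callers are assumed to serialize access.</p>

```cpp
// Hypothetical model of a counted initialize/terminate pair.
// It illustrates the rule only; it is NOT the Xerces-C++ implementation.
class CountedSubsystem {
public:
    // Returns true only for the call that actually performs the set-up.
    bool initialize() {
        if (count_++ > 0) {
            return false;   // nested call: already initialized, nothing to do
        }
        ready_ = true;      // first call: the real initialization would happen here
        return true;
    }

    // Returns true only for the call that actually performs the clean-up.
    bool terminate() {
        if (count_ == 0) {
            return false;   // unmatched call: ignored
        }
        if (--count_ > 0) {
            return false;   // an outer initialize() is still outstanding
        }
        ready_ = false;     // last call: cached memory would be released here
        return true;
    }

    bool ready() const { return ready_; }

private:
    int count_ = 0;         // number of initialize() calls not yet matched
    bool ready_ = false;
};
```

<p>In this model the subsystem stays usable until the last matching <code>terminate()</code> call, which is why the number of calls on each side should agree.</p>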
| </a> |
| </faq> |
| |
| <faq title="Why does my application crash after calling XMLPlatformUtils::Terminate()?"> |
| |
| <q>Why does my application crash after calling XMLPlatformUtils::Terminate()?</q> |
| |
| <a> |
| |
| <p>Make sure that XMLPlatformUtils::Terminate() is the last &XercesCName; function called |
| in your program. No &XercesCName; destructor, whether explicit or implicit (for example, that of a local |
| object destroyed when it goes out of scope), may run after XMLPlatformUtils::Terminate(). |
| </p> |
| <p> |
| For example, consider the following code snippet, which is incorrect: |
| </p> |
| |
| <source> |
| 1: { |
| 2: XMLPlatformUtils::Initialize(); |
| 3: XercesDOMParser parser; |
| 4: XMLPlatformUtils::Terminate(); |
| 5: } |
| </source> |
| |
| <p>The XercesDOMParser object "parser" is destroyed when it goes out of scope at the closing |
| brace on line 5. As a result, the XercesDOMParser destructor runs at line 5, after |
| XMLPlatformUtils::Terminate(), which is incorrect. The correct code is: |
| </p> |
| |
| <source> |
| 1: { |
| 2: XMLPlatformUtils::Initialize(); |
| 2a: { |
| 3: XercesDOMParser parser; |
| 3a: } |
| 4: XMLPlatformUtils::Terminate(); |
| 5: } |
| </source> |
| |
| <p>The extra pair of braces (lines 2a and 3a) ensures that all implicit destructors are called |
| before terminating &XercesCName;.</p> |
| |
| <p>Note also that the application needs to guarantee that the |
| XMLPlatformUtils::Initialize() and XMLPlatformUtils::Terminate() |
| methods are called from the same thread (usually the initial |
| thread executing main()) or proper synchronization is performed |
| by the application if multiple threads call |
| XMLPlatformUtils::Initialize() and XMLPlatformUtils::Terminate() |
| concurrently.</p> |
| |
| </a> |
| </faq> |
| |
| <faq title="Is &XercesCName; thread-safe?"> |
| |
| <q>Is &XercesCName; thread-safe?</q> |
| |
| <a> |
| <p>The answer is yes if you observe the following rules for using |
| &XercesCName; in a multi-threaded environment:</p> |
| |
| <p>Within an address space, an instance of the parser may be used without |
| restriction from a single thread, or an instance of the parser can be accessed |
| from multiple threads, provided the application guarantees that only one thread |
| has entered a method of the parser at any one time.</p> |
| |
| <p>When two or more parser instances exist in a process, the instances can |
| be used concurrently, without external synchronization. That is, in an |
| application containing two parsers and two threads, one parser can be running |
| within the first thread concurrently with the second parser running within the |
| second thread.</p> |
| |
| <p>The same rules apply to &XercesCName; DOM documents. Multiple document |
| instances may be concurrently accessed from different threads, but any given |
| document instance can only be accessed by one thread at a time.</p> |
| |
| <p>The application also needs to guarantee that the |
| XMLPlatformUtils::Initialize() and XMLPlatformUtils::Terminate() |
| methods are called from the same thread (usually the initial |
| thread executing main()) or proper synchronization is performed |
| by the application if multiple threads call |
| XMLPlatformUtils::Initialize() and XMLPlatformUtils::Terminate() |
| concurrently.</p> |
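<p>When one parser or document instance really must be shared between threads, the "one thread at a time" rule can be enforced with a lock around every access. The following is a minimal sketch using only the C++ standard library; <code>Serialized</code>, <code>FakeParser</code> and <code>parseFromManyThreads</code> are hypothetical names, and <code>FakeParser</code> merely stands in for a real parser object.</p>

```cpp
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Serialized<T> owns one T and exposes it only while a lock is held,
// so at most one thread is ever inside a method of the wrapped object.
template <typename T>
class Serialized {
public:
    // Runs fn(object) under the lock and returns whatever fn returns.
    template <typename Fn>
    auto with(Fn&& fn) -> decltype(fn(std::declval<T&>())) {
        std::lock_guard<std::mutex> lock(mutex_);
        return fn(object_);
    }

private:
    std::mutex mutex_;
    T object_;
};

// Hypothetical stand-in for a parser: not thread-safe on its own.
struct FakeParser {
    int documentsParsed = 0;
    void parse() { ++documentsParsed; }
};

// Starts `threads` workers that each call parse() `callsPerThread` times
// on ONE shared instance, and returns the total number of calls recorded.
inline int parseFromManyThreads(int threads, int callsPerThread) {
    Serialized<FakeParser> shared;
    std::vector<std::thread> workers;
    for (int i = 0; i < threads; ++i) {
        workers.emplace_back([&shared, callsPerThread] {
            for (int j = 0; j < callsPerThread; ++j) {
                shared.with([](FakeParser& p) { p.parse(); });
            }
        });
    }
    for (std::thread& w : workers) {
        w.join();
    }
    return shared.with([](FakeParser& p) { return p.documentsParsed; });
}
```

<p>Giving each thread its own parser instance, as described above, avoids the lock entirely and is usually the simpler design.</p>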
| </a> |
| </faq> |
| |
| <faq title="I am seeing memory leaks in &XercesCName;. Are they real?"> |
| |
| <q>I am seeing memory leaks in &XercesCName;. Are they real?</q> |
| |
| <a> |
| |
| <p>The &XercesCName; library allocates and caches some commonly reused |
| items. The storage for these may be reported as memory leaks by some heap |
| analysis tools; to avoid the problem, call the function <code>XMLPlatformUtils::Terminate()</code> before your application exits. This will free all memory that was being |
| held by the library.</p> |
| |
| <p>For most applications, the use of <code>Terminate()</code> is optional. The system will recover all memory when the application |
| process shuts down. The exception to this is the use of &XercesCName; from DLLs |
| that will be repeatedly loaded and unloaded from within the same process. To |
| avoid memory leaks with this kind of use, <code>Terminate()</code> must be called before unloading the &XercesCName; library.</p> |
| |
| <p>To ensure that all the memory held by the parser is freed, the number of XMLPlatformUtils::Terminate() calls |
| should match the number of XMLPlatformUtils::Initialize() calls. |
| </p> |
| |
| <p>If you have built &XercesCName; with dependency on ICU then you may |
| want to call the u_cleanup() ICU function to clean up |
| ICU static data. Refer to the ICU documentation for details. |
| </p> |
| </a> |
| </faq> |
| |
| <faq title="Can &XercesCName; create an XML skeleton based on a DTD?"> |
| |
| <q>Is there a function that creates an XML file from a DTD (obviously |
| with the values missing, a skeleton)?</q> |
| <a> |
| |
| <p>No, there is no such functionality.</p> |
| |
| </a> |
| </faq> |
| |
| <faq title="Can I use &XercesCName; to perform write validation?"> |
| |
| <q>Can I use &XercesCName; to perform "write validation"? That is, having an |
| appropriate Grammar and being able to add elements to the DOM whilst validating |
| against the grammar?</q> |
| |
| <a> |
| |
| <p>No, there is no such functionality.</p> |
| |
| <p>The best you can do for now is to create the DOM document, write it back |
| as XML and re-parse it with validation turned on.</p> |
| |
| </a> |
| </faq> |
| |
| <faq title="Can I validate the data contained in a DOM tree?"> |
| |
| <q>Is there a facility in &XercesCName; to validate the data contained in a |
| DOM tree? That is, without saving and re-parsing the source document?</q> |
| |
| <a> |
| <p>No, there is no such functionality. The best you can do for now is to create the DOM document, write it back |
| as XML and re-parse it with validation turned on.</p> |
| |
| </a> |
| </faq> |
| |
| <faq title="How to write out a DOM tree into a string or an XML file?"> |
| <q>How to write out a DOM tree into a string or an XML file?</q> |
| <a> |
| <p>You can use DOMLSSerializer::writeToString to serialize a DOM tree into a string, or |
| DOMLSSerializer::writeNode to write it to an output destination such as a file. |
| Refer to the DOMPrint sample or the API documentation for more details on |
| DOMLSSerializer.</p> |
| </a> |
| </faq> |
| |
| <faq title="Why doesn't DOMNode::cloneNode() clone the pointer assigned to a DOMNode via DOMNode::setUserData()?"> |
| <q>Why doesn't DOMNode::cloneNode() clone the pointer assigned to a DOMNode via DOMNode::setUserData()?</q> |
| <a> |
| <p>&XercesCName; supports the DOMNode::userData specified |
| in <jump href="http://www.w3.org/TR/2004/REC-DOM-Level-3-Core-20040407/core.html#ID-3A0ED0A4"> |
| the DOM level 3 Node interface</jump>. As |
| is made clear in the description of the behavior of |
| <code>cloneNode()</code>, userData that has been set on the |
| Node is not cloned. Thus, if the userData is to be copied |
| to the new Node, this copy must be effected manually. |
| Note further that the operation of <code>importNode()</code> |
| is specified similarly. |
| </p> |
| </a> |
| </faq> |
| |
| <faq title="How are entity reference nodes handled in DOM?"> |
| |
| <q>How are entity reference nodes handled in DOM?</q> |
| |
| <a> |
| |
| <p>If you are using the native DOM classes, the function <code>setCreateEntityReferenceNodes</code> |
| controls how entities appear in the DOM tree. When |
| setCreateEntityReferenceNodes is set to true (the default), an occurrence of an |
| entity reference in the XML document will be represented by a subtree with an |
| EntityReference node at the root whose children represent the entity expansion. |
| Entity expansion will be a DOM tree representing the structure of the entity |
| expansion, not a text node containing the entity expansion as text.</p> |
| |
| <p>If setCreateEntityReferenceNodes is false, an entity reference in the XML |
| document is represented by only the nodes that represent the entity expansion. |
| The DOM tree will not contain any EntityReference nodes.</p> |
| |
| </a> |
| </faq> |
| |
| <faq title="Can I use &XercesCName; to parse HTML?"> |
| |
| <q>Can I use &XercesCName; to parse HTML?</q> |
| |
| <a> |
| |
| <p>Yes, but only if the HTML follows the rules given in the |
| <jump href="http://www.w3.org/TR/REC-xml">XML specification</jump>. Most HTML, |
| however, does not follow the XML rules, and will generate XML well-formedness |
| errors.</p> |
| |
| </a> |
| </faq> |
| |
| <faq title="I keep getting an error: &quot;invalid UTF-8 character&quot;. What's wrong?"> |
| |
| <q>I keep getting an error: "invalid UTF-8 character". What's wrong?</q> |
| |
| <a> |
| |
| <p>Most commonly, the XML <code>encoding =</code> declaration is either incorrect or missing. Without a declaration, XML |
| defaults to the UTF-8 character encoding, which is not compatible with the |
| default text file encoding on most systems.</p> |
| |
| <p>The XML declaration should look something like this:</p> |
| |
| <p><code>&lt;?xml version="1.0" encoding="iso-8859-1"?&gt;</code></p> |
| |
| <p>Make sure to specify the encoding that is actually used by the file. The |
| encoding for "plain" text files depends both on the operating system and the |
| locale (country and language) in use.</p> |
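<p>To check whether a file without an encoding declaration really is UTF-8, its bytes can be tested for well-formedness before parsing. The function below is a stand-alone sketch that does not use &XercesCName;; the name <code>firstInvalidUtf8</code> is hypothetical. A byte such as 0xE9 (the letter "é" in iso-8859-1) followed by plain ASCII text is a typical cause of a failure.</p>

```cpp
#include <cstddef>
#include <string>

// Returns the offset of the first byte sequence that is not well-formed UTF-8,
// or std::string::npos when the whole buffer is valid UTF-8.
inline std::size_t firstInvalidUtf8(const std::string& bytes) {
    const std::size_t n = bytes.size();
    std::size_t i = 0;
    while (i < n) {
        const unsigned char lead = static_cast<unsigned char>(bytes[i]);
        std::size_t extra = 0;      // continuation bytes expected after the lead byte
        unsigned long cp = 0;       // code point being decoded
        unsigned long minimum = 0;  // smallest code point allowed for this length
        if (lead < 0x80)                { extra = 0; cp = lead;        minimum = 0; }
        else if ((lead & 0xE0) == 0xC0) { extra = 1; cp = lead & 0x1F; minimum = 0x80; }
        else if ((lead & 0xF0) == 0xE0) { extra = 2; cp = lead & 0x0F; minimum = 0x800; }
        else if ((lead & 0xF8) == 0xF0) { extra = 3; cp = lead & 0x07; minimum = 0x10000; }
        else return i;              // stray continuation byte or invalid lead byte

        if (extra > n - i - 1) return i;             // sequence cut off by end of input
        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char c = static_cast<unsigned char>(bytes[i + k]);
            if ((c & 0xC0) != 0x80) return i;        // not a continuation byte
            cp = (cp << 6) | (c & 0x3F);
        }
        if (cp < minimum) return i;                  // overlong encoding
        if (cp >= 0xD800 && cp <= 0xDFFF) return i;  // UTF-16 surrogate
        if (cp > 0x10FFFF) return i;                 // beyond the Unicode range
        i += extra + 1;
    }
    return std::string::npos;
}
```

<p>If the function reports an offset, the file is most likely in some other encoding and needs a matching <code>encoding</code> declaration.</p>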
| |
| <p>Another common source of problems is characters that are not |
| allowed in XML documents, according to the XML spec. Typical disallowed |
| characters are control characters, even if you escape them using the Character |
| Reference form. See the XML specification, sections 2.2 and 4.1 for details. |
| If the parser is generating an <code>Invalid character (Unicode: 0x???)</code> error, it is very likely that there's a character in there that you |
| can't see. You can generally use a UNIX command like "od -hc" to find it.</p> |
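<p>Where a hex dump is not convenient, the same search can be done in code. The sketch below assumes an ASCII-compatible encoding such as UTF-8 or iso-8859-1 and reports only the control bytes below 0x20 that XML 1.0 does not allow (everything except tab, line feed and carriage return). It does not check other disallowed code points, and the function names are hypothetical.</p>

```cpp
#include <cstddef>
#include <string>
#include <vector>

// True for bytes below 0x20 other than tab (0x09), line feed (0x0A)
// and carriage return (0x0D), which XML 1.0 does not permit.
inline bool isForbiddenControlByte(unsigned char b) {
    return b < 0x20 && b != 0x09 && b != 0x0A && b != 0x0D;
}

// Returns the offsets of all forbidden control bytes in `data`.
inline std::vector<std::size_t> findForbiddenControlBytes(const std::string& data) {
    std::vector<std::size_t> offsets;
    for (std::size_t i = 0; i < data.size(); ++i) {
        if (isForbiddenControlByte(static_cast<unsigned char>(data[i]))) {
            offsets.push_back(i);
        }
    }
    return offsets;
}
```

<p>The returned offsets point directly at the bytes to remove or replace in the input.</p>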
| |
| </a> |
| </faq> |
| |
| <faq title="What encodings are supported by &XercesCName;?"> |
| |
| <q>What encodings are supported by &XercesCName;?</q> |
| |
| <a> |
| <p>&XercesCName; has intrinsic support for ASCII, UTF-8, UTF-16 (Big/Little |
| Endian), UCS4 (Big/Little Endian), EBCDIC code pages IBM037, IBM1047 and IBM1140 |
| encodings, ISO-8859-1 (aka Latin1) and Windows-1252. This means that it can |
| always parse input XML files in any of the encodings mentioned above.</p> |
| |
| <p>Furthermore, if you build &XercesCName; with the International Components |
| for Unicode (ICU) as a transcoder then the list of supported encodings |
| extends to over 100 different encodings that are supported by |
| ICU. In particular, all the encodings registered with the |
| Internet Assigned Numbers Authority (IANA) are supported |
| in this configuration.</p> |
| </a> |
| </faq> |
| |
| <faq title="What character encoding should I use when creating XML documents?"> |
| |
| <q>What character encoding should I use when creating XML documents?</q> |
| |
| <a> |
| |
| <p>The best choice in most cases is either utf-8 or utf-16. Advantages of |
| these encodings include:</p> |
| |
| <ul> |
| <li>The best portability. These encodings are more widely supported by |
| XML processors than any others, meaning that your documents will have the best |
| possible chance of being read correctly, no matter where they end up.</li> |
| <li>Full international character support. Both utf-8 and utf-16 cover the |
| full Unicode character set, which includes all of the characters from all major |
| national, international and industry character sets.</li> |
| <li>Efficient. utf-8 has the smaller storage requirements for documents |
| that are primarily composed of characters from the Latin alphabet. utf-16 is |
| more efficient for encoding Asian languages. But both encodings cover all |
| languages without loss.</li> |
| </ul> |
| |
| <p>The only drawback of utf-8 or utf-16 is that they are not the native |
| text file format for most systems, meaning that some text file editors |
| and viewers cannot be directly used.</p> |
| |
| <p>A second choice of encoding would be any of the others listed in the |
| previous answer. This works best when the XML encoding is the same as the default |
| system encoding on the machine where the XML document is being prepared, |
| because the document will then display correctly as a plain text file. For UNIX |
| systems in countries speaking Western European languages, the encoding will |
| usually be iso-8859-1.</p> |
| |
| <p>A word of caution for Windows users: The default character set on |
| Windows systems is windows-1252, not iso-8859-1. While &XercesCName; does |
| recognize this Windows encoding, it is a poor choice for portable XML data |
| because it is not widely recognized by other XML processing tools. If you are |
| using a Windows-based editing tool to generate XML, check which character set |
| it generates, and make sure that the resulting XML specifies the correct name |
| in the <code>encoding="..."</code> declaration.</p> |
| |
| </a> |
| </faq> |
| |
| <faq title="Why does deleting a transcoded string result in an assertion on Windows?"> |
| <q>Why does deleting a transcoded string result in an assertion on Windows?</q> |
| <a> |
| <p>Both your application program and the &XercesCName; DLL must use the same DLL version of the |
| runtime library. If either one links statically to the runtime library, the |
| problem will occur.</p> |
| |
| <p>For a Visual Studio build the runtime library setting MUST |
| be "Multithreaded DLL" for release builds and "Debug Multithreaded DLL" for |
| debug builds.</p> |
| |
| <p>To avoid the problem, instead of calling operator delete[] directly, use the |
| provided function XMLString::release to delete any string that was allocated by the parser. |
| This ensures that the string is allocated and deleted by the same DLL, which should resolve |
| the assertion.</p> |
| </a> |
| </faq> |
| |
| <faq title="How do I transcode to/from something besides the local code page?"> |
| <q>How do I transcode to/from something besides the local code page?</q> |
| <a> |
| <p>XMLString::transcode() will transcode from XMLCh to the local code page, and |
| other APIs which take a char* assume that the source text is in the local |
| code page. If this is not true, you must transcode the text yourself. You |
| can do this using local transcoding support on your OS, such as Iconv on |
| Unix or IBM's ICU package. However, if your transcoding needs are simple, |
| you can achieve better portability by using the &XercesCName; parser's |
| transcoder wrappers. You get a transcoder like this: |
| </p> |
| <ul> |
| <li> |
| Call XMLPlatformUtils::fgTransService->makeNewTranscoderFor() and provide |
| the name of the encoding you wish to create a transcoder for. This will |
| return a transcoder to you, which you own and must delete when you are |
| through with it. |
| |
| NOTE: You must provide the maximum block size that you will pass to the transcoder |
| at one time, and you must pass blocks of this many characters or fewer when |
| you do your transcoding. The reason is that this is really an |
| internal API used by the parser itself to do transcoding. The parser |
| always transcodes in known block sizes, which allows a transcoder to |
| be much more efficient for internal use, since it knows the maximum size it will |
| ever have to deal with and can set itself up for that internally. In |
| general, you should stick to block sizes in the 4K to 64K range. |
| </li> |
| <li> |
| The returned transcoder is something derived from XMLTranscoder, so they |
| are all returned to you via that interface. |
| </li> |
| <li> |
| This object is really just a wrapper around the underlying transcoding |
| system actually in use by your version of &XercesCName;, and does whatever is |
| necessary to handle differences between the XMLCh representation and the |
| representation used by that underlying transcoding system. |
| </li> |
| <li> |
| The transcoder object has two primary APIs, transcodeFrom() and |
| transcodeTo(). These transcode between the XMLCh format and the encoding you |
| indicated. |
| </li> |
| <li> |
| These APIs will transcode as much of the source data as will fit into the |
| outgoing buffer you provide. They will tell you how much of the source they |
| ate and how much of the target they filled. You can use this information to |
| continue the process until all source is consumed. |
| </li> |
| <li> |
| char* data is always dealt with in terms of bytes, and XMLCh data is |
| always dealt with in terms of characters. Don't mix up which you are dealing |
| with or you will not get the correct results, since many encodings don't |
| have a one to one relationship of characters to bytes. |
| </li> |
| <li> |
| When transcoding from XMLCh to the target encoding, the transcodeTo() |
| method provides an 'unrepresentable flag' parameter, which tells the |
| transcoder how to deal with an XMLCh code point that cannot be converted |
| legally to the target encoding, which can easily happen since XMLCh is |
| Unicode and can represent thousands of code points. The options are to use a |
| default replacement character (which the underlying transcoding service will |
| choose, and which is guaranteed to be legal for the target encoding), or to |
| throw an exception. |
| </li> |
| </ul> |
| <p>Here is an example:</p> |
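| <source> |
| // Create an XMLTranscoder that is able to transcode between |
| // Unicode and UTF-8. The last argument is the maximum block size. |
| // |
| XMLTransService::Codes failReason; |
| XMLTranscoder* t = XMLPlatformUtils::fgTransService->makeNewTranscoderFor( |
| "UTF-8", failReason, 16*1024); |
| |
| // Source string is in Unicode, want to transcode to UTF-8. |
| // The source length is in XMLCh characters, the target size in bytes. |
| t->transcodeTo(source_unicode, |
| length, |
| result_utf8, |
| length, |
| charsEaten, |
| XMLTranscoder::UnRep_Throw); |
| |
| // Source string in UTF-8, want to transcode to Unicode. |
| // The source length is in bytes, the target size in XMLCh characters. |
| t->transcodeFrom(source_utf8, |
| length, |
| result_unicode, |
| length, |
| bytesEaten, |
| (unsigned char*)charSz); |
| |
| // The caller owns the transcoder and must delete it when done. |
| delete t; |
| </source> |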
| |
| <p>An even simpler way to transcode to a different encoding is |
| to use the TranscodeToStr and TranscodeFromStr wrapper classes |
| which represent a one-time transcoding and encapsulate all the |
| memory management. Refer to the API Reference for more information.</p> |
| </a> |
| </faq> |
| |
| <faq title="Why does the parser still try to locate the DTD even when validation is turned off, |
| and how do I ignore external DTD references?"> |
| |
| <q>Why does the parser still try to locate the DTD even when validation is turned off, |
| and how do I ignore external DTD references?</q> |
| |
| <a> |
| |
| <p>When a DTD is referenced, the parser will try to read it, because a DTD |
| provides much more than the information needed for validation. It defines entities and |
| notations, external unparsed entities, default attributes, character |
| entities, etc. Therefore the parser will always try to read the DTD if one is present, even if |
| validation is turned off. |
| </p> |
| |
| <p>To ignore external DTDs completely you can call |
| <code>setLoadExternalDTD(false)</code> (or |
| <code>setFeature(XMLUni::fgXercesLoadExternalDTD, false)</code>) |
| to disable the loading of the external DTD. The parser will then ignore |
| any external DTD completely if the validationScheme is set to Val_Never. |
| </p> |
| |
| <p>Note: This flag is ignored if the validationScheme is set to Val_Always or Val_Auto. |
| </p> |
| </a> |
| </faq> |
| |
| <faq title="Why does the XML data generated by the DOMLSSerializer not match my original XML input?"> |
| |
| <q>Why does the XML data generated by the DOMLSSerializer not match my original XML input?</q> |
| |
| <a> |
| |
| <p>If you parse an XML document using XercesDOMParser or DOMLSParser and pass the resulting DOMNode |
| to DOMLSSerializer for serialization, you may not get something that is exactly the same |
| as the original XML data. The parser may have performed normalization or end-of-line conversion, |
| or may have expanded entity references as per the XML 1.0 specification, 4.4 XML Processor Treatment of |
| Entities and References. The DOMLSSerializer does not know what the original |
| string was; all it sees is a processed DOMNode generated by the parser. |
| And since the DOMLSSerializer is supposed to generate something that is parsable if sent |
| back to the parser, it will not print the DOMNode node value as is. The DOMLSSerializer |
| may do some "touch up" to the output data to make it parsable.</p> |
| |
| <p>See <jump href="program-dom-&XercesC3Series;.html#DOMLSSerializerEntityRef">How does DOMLSSerializer handle built-in entity |
| Reference in node value?</jump> to understand further how DOMLSSerializer touches up the entity reference. |
| </p> |
| </a> |
| </faq> |
| |
| <faq title="Why does my application crash when deleting the parser after releasing a document?"> |
| |
| <q>Why does my application crash when deleting the parser after releasing a document?</q> |
| |
| <a> |
| |
| <p>In most cases, the parser deletes its documents when the parser itself is deleted. However, if an application |
| needs to release a document, it must adopt the document before releasing it, so that the parser |
| knows that the ownership of this particular document has been transferred to the application and will not |
| try to delete it once the parser gets deleted. |
| </p> |
| |
| <source> |
| XercesDOMParser *parser = new XercesDOMParser; |
| ... |
| try |
| { |
| parser->parse(xml_file); |
| } |
| catch (...) |
| { |
| ... |
| } |
| DOMNode *doc = parser->getDocument(); |
| ... |
| parser->adoptDocument(); |
| doc->release(); |
| ... |
| delete parser; |
| </source> |
| |
| <p>The alternative to releasing a single document is to call the parser's resetDocumentPool(), which releases |
| all the documents it has parsed. |
| </p> |
| |
| </a> |
| </faq> |
| |
| </faqs> |