| <?xml version="1.0" standalone="no"?> |
| <!DOCTYPE s1 SYSTEM "sbk:/style/dtd/document.dtd"> |
| |
| <s1 title="Programming Guide"> |
| <anchor name="Macro"/> |
| <s2 title="Version Macro"> |
| <p>&XercesCName; defines a numeric preprocessor macro, _XERCES_VERSION, that you can |
| use for conditional compilation: by testing the Xerces version, your code can enable or |
| disable version-specific capabilities. For example: |
| </p> |
| <source> |
| #if _XERCES_VERSION >= 20304 |
| // code that requires Xerces-C++ 2.3.4 or later |
| #else |
| // old code here... |
| #endif |
| </source> |
| <p>The minor and revision (patch level) numbers each have two digits of resolution, |
| so version 2.3.4 is encoded as 20304: '3' becomes '03' and '4' becomes '04'. |
| </p> |
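<p>As a rough illustration of that encoding, the stand-alone C++ sketch below packs and unpacks a version number and performs the same kind of conditional compilation. It assumes the major number is scaled by 10000, which is consistent with 20304 for version 2.3.4. The helper functions and the DEMO_XERCES_VERSION macro are hypothetical stand-ins so the sketch compiles without the Xerces-C++ headers; in real code you would test _XERCES_VERSION instead.</p>

```cpp
// Hypothetical helpers, not part of Xerces-C++: pack and unpack a version
// using two decimal digits each for the minor and revision numbers.
constexpr int encodeVersion(int major, int minor, int revision) {
    return major * 10000 + minor * 100 + revision;
}
constexpr int versionMajor(int v)    { return v / 10000; }
constexpr int versionMinor(int v)    { return (v / 100) % 100; }
constexpr int versionRevision(int v) { return v % 100; }

// 2.3.4 packs to 20304: '3' becomes "03" and '4' becomes "04".
static_assert(encodeVersion(2, 3, 4) == 20304, "two-digit minor/revision");

// Hypothetical stand-in for _XERCES_VERSION so the sketch compiles
// without the Xerces-C++ headers.
#ifndef DEMO_XERCES_VERSION
#define DEMO_XERCES_VERSION 20304
#endif

#if DEMO_XERCES_VERSION >= 20304
constexpr bool kUseNewCodePath = true;   // code that needs 2.3.4 or later
#else
constexpr bool kUseNewCodePath = false;  // old code path
#endif
```

<p>Because minor and revision each get exactly two digits, a version such as 2.10.1 would pack to 21001 without colliding with 2.1.x values.</p>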
| <p>There are also other macros and string constants that represent the Xerces-C++ version. |
| See the header xercesc/util/XercesVersion.hpp for further details. |
| </p> |
| </s2> |
| |
| |
| <anchor name="Schema"/> |
| <s2 title="Schema Support"> |
| <p>&XercesCName; contains an implementation of the W3C XML Schema |
| Language. See <jump href="schema.html">the Schema page</jump> for details. |
| </p> |
| </s2> |
| |
| <anchor name="Progressive"/> |
| <s2 title="Progressive Parsing"> |
| |
| <p>In addition to using the <ref>parse()</ref> method to parse an XML file, |
| you can use the other two parsing methods, <ref>parseFirst()</ref> and <ref>parseNext()</ref>, |
| to do 'progressive parsing', so that you don't |
| have to depend upon throwing an exception to terminate the |
| parsing operation. |
| </p> |
| <p> |
| Calling parseFirst() causes the DTD (both internal and |
| external subsets) and any pre-content, i.e. everything up to |
| but not including the root element, to be parsed. Each subsequent call to |
| parseNext() causes one more piece of markup to be parsed |
| and passed from the core scanning code to the parser (and |
| hence either on to you if using SAX, or into the DOM tree if |
| using DOM). |
| </p> |
| <p> |
| You can quit the parse at any time by simply not |
| calling parseNext() again and breaking out of the loop. When |
| you call parseNext() and the end of the root element is the |
| next piece of markup, the parser will continue on to the end |
| of the file and return false, to let you know that the parse |
| is done. So a typical progressive parse loop will look like |
| this:</p> |
| |
| <source>// Create a progressive scan token |
| XMLPScanToken token; |
| |
| // Parse everything up to (but not including) the root element |
| if (!parser.parseFirst(xmlFile, token)) |
| { |
|     cerr << "parseFirst() failed" << endl; |
|     return 1; |
| } |
| |
| // |
| // We started OK, so let's call parseNext() |
| // until we find what we want or hit the end. |
| // |
| bool gotMore = true; |
| while (gotMore && !handler.getDone()) |
|     gotMore = parser.parseNext(token);</source> |
| |
| <p>In this case, our event handler object (named 'handler' |
| surprisingly enough) is watching for some criteria and will |
| return a status from its getDone() method. Since the handler |
| sees the SAX events coming out of the SAXParser, it can tell |
| when it finds what it wants. So we loop until we get no more |
| data or our handler indicates that it saw what it wanted to |
| see.</p> |
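<p>The control flow of that loop can be tried out without Xerces-C++ at all. The sketch below is a hypothetical simulation: DemoProgressiveParser and DemoHandler are invented stand-ins that only model how parseFirst() and parseNext() hand markup to a handler one piece at a time, and how the loop stops either when parseNext() returns false or when the handler reports that it is done.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical handler: watches for one element name and records when seen.
struct DemoHandler {
    std::string target;   // element name we are looking for
    bool done = false;    // set once the target has been seen
    void startElement(const std::string& name) {
        if (name == target) done = true;
    }
    bool getDone() const { return done; }
};

// Hypothetical parser: models only the parseFirst()/parseNext() control flow.
class DemoProgressiveParser {
public:
    DemoProgressiveParser(std::vector<std::string> markup, DemoHandler& h)
        : markup_(std::move(markup)), handler_(h) {}

    // Mirrors parseFirst(): set up the parse (prolog handling omitted).
    bool parseFirst() { pos_ = 0; return true; }

    // Mirrors parseNext(): deliver one piece of markup; false at the end.
    bool parseNext() {
        if (pos_ >= markup_.size()) return false;
        handler_.startElement(markup_[pos_++]);
        return true;
    }

    std::size_t consumed() const { return pos_; }

private:
    std::vector<std::string> markup_;
    DemoHandler& handler_;
    std::size_t pos_ = 0;
};

// The loop from the text: stop when data runs out or the handler is done.
// Returns how many pieces of markup were delivered.
inline std::size_t runProgressive(DemoProgressiveParser& parser,
                                  const DemoHandler& handler) {
    if (!parser.parseFirst()) return 0;
    bool gotMore = true;
    while (gotMore && !handler.getDone())
        gotMore = parser.parseNext();
    return parser.consumed();
}
```

<p>With a handler watching for a hypothetical "price" element in the markup sequence catalog, book, price, title, the loop stops after three pieces and never delivers the fourth.</p>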
| |
| <p>When doing non-progressive parses, the parser can easily |
| know when the parse is complete and ensure that any used |
| resources are cleaned up. Even in the case of a fatal parsing |
| error, it can clean up all per-parse resources. However, when |
| progressive parsing is done, the client code doing the parse |
| loop might choose to stop the parse before the end of the |
| primary file is reached. In such cases, the parser will not |
| know that the parse has ended, so any resources will not be |
| reclaimed until the parser is destroyed or another parse is started.</p> |
| |
| <p>This might not seem like such a bad thing; however, in this case, |
| the files and sockets which were opened in order to parse the |
| referenced XML entities will remain open, which can cause |
| serious problems such as running out of file handles. Therefore, you should destroy the parser instance |
| in such cases, or restart another parse immediately. In a future |
| release, a reset method will be provided to do this more cleanly.</p> |
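<p>One way to make sure the parser instance really is destroyed when a progressive loop exits early is to tie its lifetime to a scope, for example with std::unique_ptr. The sketch below assumes a hypothetical DemoParser that, like the behavior described above, holds its open resources until a new parse starts or the object is destroyed; the integer counter merely stands in for open files and sockets.</p>

```cpp
#include <cassert>
#include <memory>

// Hypothetical parser stand-in: it "opens" resources for the entities it
// reads and keeps them until a new parse starts or the object is destroyed.
// openHandles is a plain counter standing in for files and sockets.
class DemoParser {
public:
    explicit DemoParser(int& openHandles) : open_(openHandles) {}
    ~DemoParser() { releaseAll(); }

    // Starting a parse first reclaims whatever the previous parse held,
    // then opens (for example) the document and its external DTD.
    void parseFirst() {
        releaseAll();
        open_ += 2;
        held_ = 2;
    }

    // Pretend there is always more markup to deliver.
    bool parseNext() { return true; }

private:
    void releaseAll() {
        open_ -= held_;
        held_ = 0;
    }

    int& open_;
    int held_ = 0;
};

// Stop after maxSteps pieces of markup, well before the end of the document.
// Destroying the parser with reset() releases its resources immediately.
inline void parseSomeThenStop(int& openHandles, int maxSteps) {
    auto parser = std::make_unique<DemoParser>(openHandles);
    parser->parseFirst();
    for (int i = 0; i < maxSteps && parser->parseNext(); ++i) {
        // ... inspect events here and decide when to quit ...
    }
    assert(openHandles == 2);  // still held: the parser cannot know we quit
    parser.reset();            // destroy the parser instance
    assert(openHandles == 0);  // resources reclaimed
}
```

<p>Calling parseFirst() a second time on the same object also reclaims the earlier resources, which corresponds to the "restart another parse immediately" option.</p>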
| |
| <p>Also note that you must create a scan token and pass it |
| back in on each call. This ensures that things don't get done |
| out of sequence. When you call parseFirst() or parse(), any |
| previous scan tokens are invalidated and will cause an error |
| if used again. This prevents incorrect mixed use of the two |
| different parsing schemes or incorrect calls to |
| parseNext().</p> |
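<p>A common way to implement this kind of check is a sequence number, sketched below under that assumption; this is not necessarily how Xerces-C++ implements XMLPScanToken. The hypothetical parser increments a counter on every parseFirst() or parse(), stamps the token with it, and rejects any token in parseNext() whose stamp no longer matches, which produces the "previous tokens are invalidated" behavior described above.</p>

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical scan token: carries the sequence number of the parse it
// belongs to (0 means it was never handed to parseFirst()).
struct DemoScanToken {
    unsigned long sequence = 0;
};

class DemoSequencedParser {
public:
    // Starting a progressive parse begins a new sequence and stamps the token.
    void parseFirst(DemoScanToken& token) {
        ++sequence_;
        token.sequence = sequence_;
    }

    // A full parse also begins a new sequence, invalidating old tokens.
    void parse() { ++sequence_; }

    // Reject tokens that were never stamped or belong to an older parse.
    bool parseNext(const DemoScanToken& token) {
        if (token.sequence == 0 || token.sequence != sequence_)
            throw std::logic_error("stale or unused scan token");
        return true;  // a real parser would scan the next markup here
    }

private:
    unsigned long sequence_ = 0;
};
```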
| |
| </s2> |
| |
| <anchor name="GrammarCache"/> |
| <s2 title="Preparsing Grammar and Grammar Caching"> |
| <p>&XercesCName; &XercesCVersion; provides a function to pre-parse a grammar so that users |
| can check it for syntax or other errors before using it. Users can also optionally |
| cache these pre-parsed grammars for later use during actual parsing. |
| </p> |
| <p>Here is an example:</p> |
| <source> |
| XercesDOMParser parser; |
| |
| // Enable schema processing |
| parser.setDoSchema(true); |
| parser.setDoNamespaces(true); |
| |
| // Let's preparse the schema grammar (.xsd) and cache it. |
| Grammar* grammar = parser.loadGrammar(xmlFile, Grammar::SchemaGrammarType, true); |
| </source> |
| <p>Besides caching pre-parsed schema grammars, users can also cache any |
| grammars encountered while parsing an XML document. |
| </p> |
| <p>Here is an example:</p> |
| <source> |
| SAXParser parser; |
| |
| // Enable grammar caching by setting cacheGrammarFromParse to true. |
| // The parser will cache any encountered grammars that do not |
| // already exist in the pool. |
| // If the grammar is a DTD, no internal subset is allowed. |
| parser.cacheGrammarFromParse(true); |
| |
| // Let's parse our XML file (DTD grammar) |
| parser.parse(xmlFile); |
| |
| // We can get the grammar where the root element was declared |
| // by calling the parser's method getRootGrammar; |
| // Note: The parser owns the grammar, and the user should not delete it. |
| Grammar* grammar = parser.getRootGrammar(); |
| </source> |
| <p>You can use previously cached grammars when parsing new XML |
| documents. Here are some examples of how to use those cached grammars: |
| </p> |
| <source> |
| /** |
| * Caching and reusing XML Schema (.xsd) grammar |
| * Parse an XML document and cache its grammar set. Then, use the cached |
| * grammar set in subsequent parses. |
| */ |
| |
| XercesDOMParser parser; |
| |
| // Enable schema processing |
| parser.setDoSchema(true); |
| parser.setDoNamespaces(true); |
| |
| // Enable grammar caching |
| parser.cacheGrammarFromParse(true); |
| |
| // Let's parse the XML document. The parser will cache any grammars encountered. |
| parser.parse(xmlFile); |
| |
| // No need to enable re-use by setting useCachedGrammarInParse to true. It is |
| // automatically enabled with grammar caching. |
| for (int i=0; i< 3; i++) |
| parser.parse(xmlFile); |
| |
| // This will flush the grammar pool |
| parser.resetCachedGrammarPool(); |
| </source> |
| |
| <source> |
| /** |
| * Caching and reusing DTD grammar |
| * Preparse a grammar and cache it in the pool. Then, we use the cached grammar |
| * when parsing XML documents. |
| */ |
| |
| SAX2XMLReader* parser = XMLReaderFactory::createXMLReader(); |
| |
| // Load grammar and cache it |
| parser->loadGrammar(dtdFile, Grammar::DTDGrammarType, true); |
| |
| // enable grammar reuse |
| parser->setFeature(XMLUni::fgXercesUseCachedGrammarInParse, true); |
| |
| // Parse xml files |
| parser->parse(xmlFile1); |
| parser->parse(xmlFile2); |
| |
| // The application owns the reader and must delete it |
| delete parser; |
| </source> |
| <p>There are some limitations on caching and using cached grammars:</p> |
| <ul> |
| <li>When caching/reusing DTD grammars, no internal subset is allowed.</li> |
| <li>When preparsing grammars with the caching option enabled, if any grammar in the |
| result set already exists in the pool (same namespace for a schema, or same system |
| ID for a DTD), none of the grammars in the set is cached.</li> |
| <li>When parsing an XML document with the grammar caching option enabled, the |
| reuse option is also automatically enabled: a grammar is parsed only if it |
| does not already exist in the pool.</li> |
| </ul> |
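<p>The last two rules can be modeled with a small pool keyed by namespace (for schemas) or system ID (for DTDs). The sketch below is hypothetical and does not use the Xerces-C++ grammar pool API: cacheAll() adds a pre-parsed set only if none of its keys is already present, and getOrLoad() runs a caller-supplied loader only when a grammar is missing, standing in for parsing it.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical grammar record: key is a namespace URI or DTD system ID,
// text stands in for the parsed grammar object.
struct DemoGrammar {
    std::string key;
    std::string text;
};

class DemoGrammarPool {
public:
    // All-or-nothing: cache the set only if none of its keys is pooled yet.
    bool cacheAll(const std::vector<DemoGrammar>& set) {
        for (const auto& g : set)
            if (pool_.count(g.key)) return false;
        for (const auto& g : set)
            pool_[g.key] = g.text;
        return true;
    }

    // Reuse a pooled grammar; call the loader ("parse") only on a miss.
    template <typename Loader>
    const std::string& getOrLoad(const std::string& key, Loader load) {
        auto it = pool_.find(key);
        if (it == pool_.end())
            it = pool_.emplace(key, load(key)).first;
        return it->second;
    }

    std::size_t size() const { return pool_.size(); }

private:
    std::map<std::string, std::string> pool_;
};
```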
| </s2> |
| |
| <anchor name="LoadableMessageText"/> |
| <s2 title="Loadable Message Text"> |
| |
| <p>&XercesCName; supports loadable message text. Although |
| the current release supports only English, it is capable of supporting other |
| languages. Anyone interested in contributing any translations |
| should contact us. This would be an extremely useful |
| service.</p> |
| |
| <p>In order to support the local message loading services, all the error messages |
| are captured in an XML file in the src/xercesc/NLS/ directory. |
| There is a simple program, in the Tools/NLSXlat/ directory, |
| which can generate that text in various formats. It currently |
| supports a simple 'in memory' format (i.e. an array of |
| strings), the Win32 resource format, and the message catalog |
| format. The 'in memory' format is intended for very simple |
| installations or for use when porting to a new platform, since |
| you can use it until your own local message |
| loading support is done.</p> |
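<p>To make the 'in memory' format concrete, here is a hypothetical sketch of the sort of output such a generator could produce: an enumeration of message IDs and an array of strings indexed by them, plus a lookup that copies the text into a caller-supplied buffer. The IDs, message texts, and demoLoadMsg() function are illustrative only and are not the generated Xerces-C++ tables.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical message IDs; the table below must stay in the same order.
enum DemoMsgId : std::size_t {
    DemoMsg_NoError = 0,
    DemoMsg_ExpectedEndTag,
    DemoMsg_UnterminatedComment,
    DemoMsg_Count
};

// Illustrative English texts, one per ID.
static const char* const gDemoEnglishMsgs[DemoMsg_Count] = {
    "No error",
    "Expected end of tag",
    "Comment was not terminated",
};

// Copy the message into toFill (at most maxChars bytes including the
// terminator), truncating if needed. Returns false for an unknown ID
// or an empty buffer.
inline bool demoLoadMsg(std::size_t id, char* toFill, std::size_t maxChars) {
    if (id >= DemoMsg_Count || maxChars == 0) return false;
    std::strncpy(toFill, gDemoEnglishMsgs[id], maxChars - 1);
    toFill[maxChars - 1] = '\0';
    return true;
}
```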
| |
| <p>In the src/xercesc/util/ directory, there is an XMLMsgLoader |
| class. This is an abstraction from which any number of |
| message loading services can be derived. Your platform driver |
| file can create whichever type of message loader it wants to |
| use on that platform. &XercesCName; currently has versions for the 'in memory' |
| format, the Win32 resource format, and the message |
| catalog format. An ICU version is present but not yet implemented. |
| Some of the platforms can support multiple message |
| loaders, in which case a #define token is used to control |
| which one is used. You can set this in your build projects to |
| control the message loader type used.</p> |
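<p>The abstraction described above is a classic virtual-interface design. In the hypothetical sketch below, DemoMsgLoader plays the role of XMLMsgLoader, two derived classes provide different back ends, and a factory function stands in for the platform driver, with a made-up DEMO_USE_MSG_CATALOG token selecting the loader at build time. None of these names or signatures come from the real Xerces-C++ headers.</p>

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical base class in the spirit of XMLMsgLoader.
class DemoMsgLoader {
public:
    virtual ~DemoMsgLoader() = default;
    virtual bool loadMsg(unsigned id, std::string& out) const = 0;
};

// Back end reading from a compiled-in string table.
class DemoInMemoryLoader : public DemoMsgLoader {
public:
    bool loadMsg(unsigned id, std::string& out) const override {
        static const char* const table[] = { "No error", "Expected end of tag" };
        if (id >= sizeof(table) / sizeof(table[0])) return false;
        out = table[id];
        return true;
    }
};

// Placeholder for a message-catalog back end; a real one would query the
// platform's catalog service instead of formatting a string.
class DemoCatalogLoader : public DemoMsgLoader {
public:
    bool loadMsg(unsigned id, std::string& out) const override {
        out = "catalog message #" + std::to_string(id);
        return true;
    }
};

// Stand-in for the platform driver: a build-time #define picks the loader.
inline std::unique_ptr<DemoMsgLoader> makePlatformMsgLoader() {
#if defined(DEMO_USE_MSG_CATALOG)
    return std::unique_ptr<DemoMsgLoader>(new DemoCatalogLoader);
#else
    return std::unique_ptr<DemoMsgLoader>(new DemoInMemoryLoader);
#endif
}
```

<p>Code that reports errors only ever sees the base class, so switching back ends is a build setting rather than a source change.</p>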
| |
| </s2> |
| |
| <anchor name="PluggableTranscoders"/> |
| <s2 title="Pluggable Transcoders"> |
| |
| <p>&XercesCName; also supports pluggable transcoding services. The |
| XMLTransService class is an abstract API; you can derive from it |
| to support any desired transcoding |
| service. XMLTranscoder is the abstract API for a particular |
| instance of a transcoder for a particular encoding. The |
| platform driver file decides what specific type of transcoder |
| to use, which allows each platform to use its native |
| transcoding services, or the ICU service if desired.</p> |
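<p>The split between a service and per-encoding transcoders can be sketched as two small interfaces. In the hypothetical example below, DemoTransService and DemoTranscoder stand in for XMLTransService and XMLTranscoder (their real signatures differ), and the only concrete transcoder handles ISO-8859-1, where each byte maps directly to the Unicode code point of the same value.</p>

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical transcoder bound to one encoding.
class DemoTranscoder {
public:
    virtual ~DemoTranscoder() = default;
    // Convert bytes in this transcoder's encoding to UTF-16 code units.
    virtual std::u16string transcodeFrom(const std::string& bytes) const = 0;
};

class DemoLatin1Transcoder : public DemoTranscoder {
public:
    // ISO-8859-1: byte value N is code point U+00NN.
    std::u16string transcodeFrom(const std::string& bytes) const override {
        std::u16string out;
        out.reserve(bytes.size());
        for (unsigned char b : bytes)
            out.push_back(static_cast<char16_t>(b));
        return out;
    }
};

// Hypothetical service: hands out transcoders by encoding name.
class DemoTransService {
public:
    virtual ~DemoTransService() = default;
    // Returns nullptr when this service does not support the encoding.
    virtual std::unique_ptr<DemoTranscoder>
    makeTranscoderFor(const std::string& encoding) const = 0;
};

class DemoBasicTransService : public DemoTransService {
public:
    // Exact-match lookup for brevity; XML encoding names are
    // case-insensitive, so a real service would normalize them.
    std::unique_ptr<DemoTranscoder>
    makeTranscoderFor(const std::string& encoding) const override {
        if (encoding == "ISO-8859-1" || encoding == "Latin1")
            return std::unique_ptr<DemoTranscoder>(new DemoLatin1Transcoder);
        return nullptr;
    }
};
```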
| |
| <p>Implementations are provided for Win32 native services, ICU |
| services, and the <ref>iconv</ref> services available on many |
| Unix platforms. The Win32 version only provides native code |
| page services, so it can only handle XML documents in the intrinsic |
| encodings ASCII, UTF-8, UTF-16 (big/little endian), UCS-4 |
| (big/little endian), the EBCDIC code pages IBM037 and |
| IBM1140, ISO-8859-1 (aka Latin1), and Windows-1252. The ICU version |
| provides all of the encodings that ICU supports. The |
| <ref>iconv</ref> version will support the encodings supported |
| by the local system. You can use transcoders we provide or |
| create your own if you feel ours are insufficient in some way, |
| or if your platform requires an implementation that &XercesCName; does not |
| provide.</p> |
| |
| </s2> |
| </s1> |