doc/migration.xml - xerces-c - Git at Google

 <?xml version="1.0" standalone="no"?>
 <!DOCTYPE s1 SYSTEM "./dtd/document.dtd">

 <s1 title="Migrating from XML4C 2.x">
     <p>This document is a discussion of the technical differences between
     XML4C 2.x code base and the new &XercesCName; &XercesCVersion; code base.</p>

     <p>Topics discussed are:</p>
     <ul>
         <li><link anchor="GenImprovements">General Improvements</link></li>
         <ul>
             <li><link anchor="Compliance">Compliance</link></li>
             <li><link anchor="BugFixes">Bug Fixes</link></li>
             <li><link anchor="Speed">Speed</link></li>
         </ul>
         <li><link anchor="Summary">Summary of changes required to migrate from XML4C 2.x to &XercesCName; &XercesCVersion;</link></li>
         <li><link anchor="Samples">The Samples</link></li>
         <li><link anchor="ParserClasses">Parser Classes</link></li>
         <li><link anchor="DOMLevel2">DOM Level 2 support</link></li>
         <li><link anchor="Progressive">Progressive Parsing</link></li>
         <li><link anchor="Namespace">Namespace support</link></li>
         <li><link anchor="MovedToSrcFramework">Moved Classes to src/framework</link></li>
         <li><link anchor="LoadableMessageText">Loadable Message Text</link></li>
         <li><link anchor="PluggableValidators">Pluggable Validators</link></li>
         <li><link anchor="PluggableTranscoders">Pluggable Transcoders</link></li>
         <li><link anchor="UtilReorg">Util directory Reorganization</link></li>
         <ul>
             <li><link anchor="UtilPlatform">util - The platform independent utility stuff</link></li>
         </ul>
     </ul>


     <anchor name="GenImprovements"/>
     <s2 title="General Improvements">

         <p>The new version is improved in many ways. Some general improvements
         are: significantly better conformance to the XML spec, cleaner
         internal architecture, many bug fixes, and faster speed.</p>

         <anchor name="Compliance"/>
         <s3 title="Compliance">
             <p>Except for a couple of the very obscure (mostly related to
             the 'standalone' mode), this version should be quite compliant.
             We have more than a thousand tests, some collected from various
             public sources and some IBM generated, which are used to do
             regression testing. The C++ parser is now passing all but a
             handful of them.</p>
         </s3>

         <anchor name="BugFixes"/>
         <s3 title="Bug Fixes">
             <p>This version has many bug fixes with regard to XML4C version 2.x.
             Some of these were reported by users and some were brought up by
             way of the conformance testing.</p>
         </s3>

         <anchor name="Speed"/>
         <s3 title="Speed">
             <p>Much work was done to speed up this version. Some of the
             new features, such as namespaces, and conformance checks ended
             up eating up some of these gains, but overall the new version
             is significantly faster than previous versions, even while doing
             more.</p>
         </s3>
     </s2>


     <anchor name="Summary"/>
     <s2 title="Summary of changes required to migrate from XML4C 2.x to &XercesCName; &XercesCVersion;">

         <p>As mentioned, there are some major architectural changes
         between the 2.3.x and &XercesCName; &XercesCVersion; releases
         of the parser, and as a result the code has undergone
         significant restructuring. The list below mentions the public
         api's which existed in 2.3.x and no longer exist in
         &XercesCName; &XercesCVersion;. It also mentions the
         &XercesCName; &XercesCVersion; api which will give you the
         same functionality.  Note: This list is not exhaustive. The
         API docs (and ultimately the header files) supplement this
         information.</p>

         <ul>

             <li><code>parsers/[Non]Validating[DOM/SAX]parser.hpp</code><br/>
             These files/classes have all been consolidated in the new
             version to just two files/classes:
             <code>[DOM/SAX]Parser.hpp</code>.  Validation is now a
             property which may be set before invoking the
             <code>parse</code>. Now, the
             <code>setDoValidation()</code> method controls the
             validation processing.</li>

             <li>The <code>framework/XMLDocumentTypeHandler.hpp</code>
             been replaced with
             <code>validators/DTD/DocTypeHandler.hpp</code>.</li>

             <li>The following methods now have different set of
             parameters because the underlying base class methods have
             changed in the 3.x release. These methods belong to one of
             <code>XMLDocumentHandler</code>,
             <code>XMLErrorReporter</code> or
             <code>DocTypeHandler</code> interfaces.</li>
             <ul>
                 <li><code>[Non]Validating[DOM/SAX]Parser::docComment</code></li>
                 <li><code>[Non]Validating[DOM/SAX]Parser::doctypePI</code></li>
                 <li><code>[Non]ValidatingSAXParser::elementDecl</code></li>
                 <li><code>[Non]ValidatingSAXParser::endAttList</code></li>
                 <li><code>[Non]ValidatingSAXParser::entityDecl</code></li>
                 <li><code>[Non]ValidatingSAXParser::notationDecl</code></li>
                 <li><code>[Non]ValidatingSAXParser::startAttList</code></li>
                 <li><code>[Non]ValidatingSAXParser::TextDecl</code></li>
                 <li><code>[Non]ValidatingSAXParser::docComment</code></li>
                 <li><code>[Non]ValidatingSAXParser::docPI</code></li>
                 <li><code>[Non]Validating[DOM/SAX]Parser::endElement</code></li>
                 <li><code>[Non]Validating[DOM/SAX]Parser::startElement</code></li>
                 <li><code>[Non]Validating[DOM/SAX]Parser::XMLDecl</code></li>
                 <li><code>[Non]Validating[DOM/SAX]Parser::error</code></li>
             </ul>

             <li>The following methods/data members changed visibility
             from <code>protected</code> in 2.3.x to
             <code>private</code> (with public setters and getters, as
             appropriate).</li>

             <ul>
                 <li><code>[Non]ValidatingDOMParser::fDocument</code></li>
                 <li><code>[Non]ValidatingDOMParser::fCurrentParent</code></li>
                 <li><code>[Non]ValidatingDOMParser::fCurrentNode</code></li>
                 <li><code>[Non]ValidatingDOMParser::fNodeStack</code></li>
             </ul>


             <li>The following files have moved, possibly requiring
             changes in the <code>#include</code> statements.</li>

             <ul>
                 <li><code>MemBufInputSource.hpp</code></li>
                 <li><code>StdInInputSource.hpp</code></li>
                 <li><code>URLInputSource.hpp</code></li>
             </ul>


             <li>All the DTD validator code was moved from
             <code>internal</code> to separate
             <code>validators/DTD</code> directory.</li>

             <li>The error code definitions which were earlier in
             <code>internal/ErrorCodes.hpp</code> are now splitup into
             the following files:</li>

             <ul>
                 <li><code>framework/XMLErrorCodes.hpp   </code> - Core XML errors</li>
                 <li><code>framework/XMLValidityCodes.hpp</code> - DTD validity errors</li>
                 <li><code>util/XMLExceptMsgs.hpp        </code> - C++ specific exception codes.</li>
             </ul>
         </ul>

     </s2>


     <anchor name="Samples"/>
     <s2 title="The Samples">

         <p>The sample programs no longer use any of the unsupported
         util/xxx classes. They only existed to allow us to write
         portable samples. But, since we feel that the wide character
         APIs are supported on a lot of platforms these days, it was
         decided to go ahead and just write the samples in terms of
         these. If your system does not support these APIs, you will
         not be able to build and run the samples. On some platforms,
         these APIs might perhaps be optional packages or require
         runtime updates or some such action.</p>

         <p>More samples have been added as well. These highlight some
         of the new functionality introduced in the new code base. And
         the existing ones have been cleaned up as well.</p>

         <p>The new samples are:</p>
         <ol>
            <li>PParse - Demonstrates 'progressive parse' (see below)</li>
            <li>StdInParse - Demonstrates use of the standard in input source</li>
            <li>EnumVal - Shows how to enumerate the markup decls in a DTD Validator</li>
         </ol>
     </s2>


     <anchor name="ParserClasses"/>
     <s2 title="Parser Classes">

         <p>In the XML4C 2.x code base, there were the following parser
         classes (in the src/parsers/ source directory):
         NonValidatingSAXParser, ValidatingSAXParser,
         NonValidatingDOMParser, ValidatingDOMParser.  The
         non-validating ones were the base classes and the validating
         ones just derived from them and turned on the validation.
         This was deemed a little bit overblown, considering the tiny
         amount of code required to turn on validation and the fact
         that it makes people use a pointer to the parser in most cases
         (if they needed to support either validating or non-validating
         versions.)</p>

         <p>The new code base just has SAXParer and DOMParser
         classes. These are capable of handling both validating and
         non-validating modes, according to the state of a flag that
         you can set on them. For instance, here is a code snippet that
         shows this in action.</p>

 <source>void ParseThis(const  XMLCh* const fileToParse,
                const bool validate)
 {
   //
   // Create a SAXParser. It can now just be
   // created by value on the stack if we want
   // to parse something within this scope.
   //
   SAXParser myParser;

   // Tell it whether to validate or not
   myParser.setDoValidation(validate);

   // Parse and catch exceptions...
   try
   {
     myParser.parse(fileToParse);
   }
     ...
 };</source>

         <p>We feel that this is a simpler architecture, and that it makes things
         easier for you. In the above example, for instance, the parser will be
         cleaned up for you automatically upon exit since you don't have to
         allocate it anymore.</p>

     </s2>


     <anchor name="DOMLevel2"/>
     <s2 title="DOM Level 2 support">

         <p>Experimental early support for some parts of the DOM level
         2 specification have been added. These address some of the
         shortcomings in our DOM implementation,
         such as a simple, standard mechanism for tree traversal.</p>

     </s2>


     <anchor name="Progressive"/>
     <s2 title="Progressive Parsing">

         <p>The new parser classes support, in addition to the
         <ref>parse()</ref> method, two new parsing methods,
         <ref>parseFirst()</ref> and <ref>parseNext()</ref>.  These are
         designed to support 'progressive parsing', so that you don't
         have to depend upon throwing an exception to terminate the
         parsing operation. Calling parseFirst() will cause the DTD (or
         in the future, Schema) to be parsed (both internal and
         external subsets) and any pre-content, i.e. everything up to
         but not including the root element. Subsequent calls to
         parseNext() will cause one more pieces of markup to be parsed,
         and spit out from the core scanning code to the parser (and
         hence either on to you if using SAX or into the DOM tree if
         using DOM.) You can quit the parse any time by just not
         calling parseNext() anymore and breaking out of the loop. When
         you call parseNext() and the end of the root element is the
         next piece of markup, the parser will continue on to the end
         of the file and return false, to let you know that the parse
         is done. So a typical progressive parse loop will look like
         this:</p>

 <source>// Create a progressive scan token
 XMLPScanToken token;

 if (!parser.parseFirst(xmlFile, token))
 {
   cerr &lt;&lt; "scanFirst() failed\n" &lt;&lt; endl;
   return 1;
 }

 //
 // We started ok, so lets call scanNext()
 // until we find what we want or hit the end.
 //
 bool gotMore = true;
 while (gotMore &amp;&amp; !handler.getDone())
   gotMore = parser.parseNext(token);</source>

         <p>In this case, our event handler object (named 'handler'
         surprisingly enough) is watching form some criteria and will
         return a status from its getDone() method. Since the handler
         sees the SAX events coming out of the SAXParser, it can tell
         when it finds what it wants. So we loop until we get no more
         data or our handler indicates that it saw what it wanted to
         see.</p>

         <p>When doing non-progressive parses, the parser can easily
         know when the parse is complete and insure that any used
         resources are cleaned up. Even in the case of a fatal parsing
         error, it can clean up all per-parse resources. However, when
         progressive parsing is done, the client code doing the parse
         loop might choose to stop the parse before the end of the
         primary file is reached. In such cases, the parser will not
         know that the parse has ended, so any resources will not be
         reclaimed until the parser is destroyed or another parse is started.</p>

         <p>This might not seem like such a bad thing; however, in this case,
         the files and sockets which were opened in order to parse the
         referenced XML entities will remain open. This could cause
         serious problems. Therefore, you should destroy the parser instance
         in such cases, or restart another parse immediately. In a future
         release, a reset method will be provided to do this more cleanly.</p>

         <p>Also note that you must create a scan token and pass it
         back in on each call. This insures that things don't get done
         out of sequence. When you call parseFirst() or parse(), any
         previous scan tokens are invalidated and will cause an error
         if used again. This prevents incorrect mixed use of the two
         different parsing schemes or incorrect calls to
         parseNext().</p>

     </s2>


     <anchor name="Namespace"/>
     <s2 title="Namespace support">

         <p>The C++ parser now supports namespaces. With current XML
         interfaces (SAX/DOM) this doesn't mean very much because these
         APIs are incapable of passing on the namespace information.
         However, if you are using our internal APIs to write your own
         parsers, you can make use of this new information. Since the
         internal event APIs must be able to now support both namespace
         and non-namespace information, they have more
         parameters. These allow namespace information to be passed
         along.</p>

         <p>Most of the samples now have a new command line parameter
         to turn on namespace support. You turn on namespaces like
         this:</p>

 <source>SAXParser myParser;
 // Tell it whether to do namespace
 myParser.setDoNamespaces(true);</source>
     </s2>


     <anchor name="MovedToSrcFramework"/>
     <s2 title="Moved Classes to src/framework">

         <p>Some of the classes previously in the src/internal/
         directory have been moved to their more correct location in
         the src/framework/ directory. These are classes used by the
         outside world and should have been framework classes to begin
         with. Also, to avoid name classes in the absense of C++ namespace
         support, some of these clashes have been renamed to make them
         more XML specific and less likely to clash. More
         classes might end up being moved to framework as well.</p>

         <p>So you might have to change a few include statements to
         find these classes in their new locations. And you might have
         to rename some of the names of the classes, if you used any of
         the ones whose names were changed.</p>

     </s2>


     <anchor name="LoadableMessageText"/>
     <s2 title="Loadable Message Text">

         <p>The system now supoprts loadable message text, instead of
         having it hard coded into the program. The current drop still
         just supports English, but it can now support other
         languages. Anyone interested in contributing any translations
         should contact us. This would be an extremely useful
         service.</p>

         <p>In order to support the local message loading services, we
         have created a pretty flexible framework for supporting
         loadable text. Firstly, there is now an XML file, in the
         src/NLS/ directory, which contains all of the error messages.
         There is a simple program, in the Tools/NLSXlat/ directory,
         which can spit out that text in various formats. It currently
         supports a simple 'in memory' format (i.e. an array of
         strings), the Win32 resource format, and the message catalog
         format.  The 'in memory' format is intended for very simple
         installations or for use when porting to a new platform (since
         you can use it until you can get your own local message
         loading support done.)</p>

         <p>In the src/util/ directory, there is now an XMLMsgLoader
         class.  This is an abstraction from which any number of
         message loading services can be derived. Your platform driver
         file can create whichever type of message loader it wants to
         use on that platform.  We currently have versions for the in
         memory format, the Win32 resource format, and the message
         catalog format. An ICU one is present but not implemented
         yet. Some of the platforms can support multiple message
         loaders, in which case a #define token is used to control
         which one is used. You can set this in your build projects to
         control the message loader type used.</p>

         <p>Both the Java and C++ parsers emit the same messages for an XML error
         since they are being taken from the same message file.</p>

     </s2>


     <anchor name="PluggableValidators"/>
     <s2 title="Pluggable Validators">

         <p>In a preliminary move to support Schemas, and to make them
         first class citizens just like DTDs, the system has been
         reworked internally to make validators completely pluggable.
         So now the DTD validator code is under the src/validators/DTD/
         directory, with a future Schema validator probably going into
         the src/validators. The core scanner architecture now works
         completely in terms of the framework/XMLValidator abstract
         interface and knows almost nothing about DTDs or Schemas. For
         now, if you don't pass in a validator to the parsers, they
         will just create a DTDValidator. This means that,
         theoretically, you could write your own validator. But we
         would not encourage this for a while, until the semantics of
         the XMLValidator interface are completely worked out and
         proven to handle DTD and Schema cleanly.</p>

     </s2>


     <anchor name="PluggableTranscoders"/>
     <s2 title="Pluggable Transcoders">

         <p>Another abstract framework added in the src/util/ directory
         is to support pluggable transcoding services. The
         XMLTransService class is an abtract API that can be derived
         from, to support any desired transcoding
         service. XMLTranscoder is the abstract API for a particular
         instance of a transcoder for a particular encoding. The
         platform driver file decides what specific type of transcoder
         to use, which allows each platform to use its native
         transcoding services, or the ICU service if desired.</p>

         <p>Implementations are provided for Win32 native services, ICU
         services, and the <ref>iconv</ref> services available on many
         Unix platforms. The Win32 version only provides native code
         page services, so it can only handle XML code in the intrinsic
         encodings ASCII, UTF-8, UTF-16 (Big/Small Endian), UCS4
         (Big/Small Endian), EBCDIC code pages IBM037 and
         IBM1140 encodings, ISO-8859-1 (aka Latin1) and Windows-1252. The ICU version
         provides all of the encodings that ICU supports. The
         <ref>iconv</ref> version will support the encodings supported
         by the local system. You can use transcoders we provide or
         create your own if you feel ours are insufficient in some way,
         or if your platform requires an implementation that we do not
         provide.</p>

     </s2>


     <anchor name="UtilReorg"/>
     <s2 title="Util directory Reorganization">

         <p>The src/util directory was becoming somewhat of a dumping
         ground of platform and compiler stuff. So we reworked that
         directory to better spread things out. The new scheme is:
         </p>

         <anchor name="UtilPlatform"/>
         <s3 title="util - The platform independent utility stuff">
             <ul>
                 <li>MsgLoaders - Holds the msg loader implementations</li>
                 <ol>
                     <li>ICU</li>
                     <li>InMemory</li>
                     <li>MsgCatalog</li>
                     <li>Win32</li>
                 </ol>
                 <li>Compilers - All the compiler specific files</li>
                 <li>Transcoders - Holds the transcoder implementations</li>
                 <ol>
                     <li>Iconv</li>
                     <li>ICU</li>
                     <li>Win32</li>
                 </ol>
                 <li>Platforms</li>
                 <ol>
                     <li>AIX</li>
                     <li>HP-UX</li>
                     <li>Linux</li>
                     <li>Solaris</li>
                     <li>....</li>
                     <li>Win32</li>
                 </ol>
             </ul>
         </s3>

         <p>This organization makes things much easier to understand.
         And it makes it easier to find which files you need and which
         are optional. Note that only per-platform files have any hard
         coded references to specific message loaders or
         transcoders. So if you don't include the ICU implementations
         of these services, you don't need to link in ICU or use any
         ICU headers. The rest of the system works only in terms of the
         abstraction APIs.</p>

     </s2>

 </s1>
	<?xml version="1.0" standalone="no"?>
	<!DOCTYPE s1 SYSTEM "./dtd/document.dtd">

	<s1 title="Migrating from XML4C 2.x">
	<p>This document is a discussion of the technical differences between
	XML4C 2.x code base and the new &XercesCName; &XercesCVersion; code base.</p>

	<p>Topics discussed are:</p>
	<ul>
	<li><link anchor="GenImprovements">General Improvements</link></li>
	<ul>
	<li><link anchor="Compliance">Compliance</link></li>
	<li><link anchor="BugFixes">Bug Fixes</link></li>
	<li><link anchor="Speed">Speed</link></li>
	</ul>
	<li><link anchor="Summary">Summary of changes required to migrate from XML4C 2.x to &XercesCName; &XercesCVersion;</link></li>
	<li><link anchor="Samples">The Samples</link></li>
	<li><link anchor="ParserClasses">Parser Classes</link></li>
	<li><link anchor="DOMLevel2">DOM Level 2 support</link></li>
	<li><link anchor="Progressive">Progressive Parsing</link></li>
	<li><link anchor="Namespace">Namespace support</link></li>
	<li><link anchor="MovedToSrcFramework">Moved Classes to src/framework</link></li>
	<li><link anchor="LoadableMessageText">Loadable Message Text</link></li>
	<li><link anchor="PluggableValidators">Pluggable Validators</link></li>
	<li><link anchor="PluggableTranscoders">Pluggable Transcoders</link></li>
	<li><link anchor="UtilReorg">Util directory Reorganization</link></li>
	<ul>
	<li><link anchor="UtilPlatform">util - The platform independent utility stuff</link></li>
	</ul>
	</ul>



	<anchor name="GenImprovements"/>
	<s2 title="General Improvements">

	<p>The new version is improved in many ways. Some general improvements
	are: significantly better conformance to the XML spec, cleaner
	internal architecture, many bug fixes, and faster speed.</p>

	<anchor name="Compliance"/>
	<s3 title="Compliance">
	<p>Except for a couple of the very obscure (mostly related to
	the 'standalone' mode), this version should be quite compliant.
	We have more than a thousand tests, some collected from various
	public sources and some IBM generated, which are used to do
	regression testing. The C++ parser is now passing all but a
	handful of them.</p>
	</s3>

	<anchor name="BugFixes"/>
	<s3 title="Bug Fixes">
	<p>This version has many bug fixes with regard to XML4C version 2.x.
	Some of these were reported by users and some were brought up by
	way of the conformance testing.</p>
	</s3>

	<anchor name="Speed"/>
	<s3 title="Speed">
	<p>Much work was done to speed up this version. Some of the
	new features, such as namespaces, and conformance checks ended
	up eating up some of these gains, but overall the new version
	is significantly faster than previous versions, even while doing
	more.</p>
	</s3>
	</s2>


	<anchor name="Summary"/>
	<s2 title="Summary of changes required to migrate from XML4C 2.x to &XercesCName; &XercesCVersion;">

	<p>As mentioned, there are some major architectural changes
	between the 2.3.x and &XercesCName; &XercesCVersion; releases
	of the parser, and as a result the code has undergone
	significant restructuring. The list below mentions the public
	api's which existed in 2.3.x and no longer exist in
	&XercesCName; &XercesCVersion;. It also mentions the
	&XercesCName; &XercesCVersion; api which will give you the
	same functionality. Note: This list is not exhaustive. The
	API docs (and ultimately the header files) supplement this
	information.</p>

	<ul>

	<li><code>parsers/[Non]Validating[DOM/SAX]parser.hpp</code><br/>
	These files/classes have all been consolidated in the new
	version to just two files/classes:
	<code>[DOM/SAX]Parser.hpp</code>. Validation is now a
	property which may be set before invoking the
	<code>parse</code>. Now, the
	<code>setDoValidation()</code> method controls the
	validation processing.</li>

	<li>The <code>framework/XMLDocumentTypeHandler.hpp</code>
	been replaced with
	<code>validators/DTD/DocTypeHandler.hpp</code>.</li>

	<li>The following methods now have different set of
	parameters because the underlying base class methods have
	changed in the 3.x release. These methods belong to one of
	<code>XMLDocumentHandler</code>,
	<code>XMLErrorReporter</code> or
	<code>DocTypeHandler</code> interfaces.</li>
	<ul>
	<li><code>[Non]Validating[DOM/SAX]Parser::docComment</code></li>
	<li><code>[Non]Validating[DOM/SAX]Parser::doctypePI</code></li>
	<li><code>[Non]ValidatingSAXParser::elementDecl</code></li>
	<li><code>[Non]ValidatingSAXParser::endAttList</code></li>
	<li><code>[Non]ValidatingSAXParser::entityDecl</code></li>
	<li><code>[Non]ValidatingSAXParser::notationDecl</code></li>
	<li><code>[Non]ValidatingSAXParser::startAttList</code></li>
	<li><code>[Non]ValidatingSAXParser::TextDecl</code></li>
	<li><code>[Non]ValidatingSAXParser::docComment</code></li>
	<li><code>[Non]ValidatingSAXParser::docPI</code></li>
	<li><code>[Non]Validating[DOM/SAX]Parser::endElement</code></li>
	<li><code>[Non]Validating[DOM/SAX]Parser::startElement</code></li>
	<li><code>[Non]Validating[DOM/SAX]Parser::XMLDecl</code></li>
	<li><code>[Non]Validating[DOM/SAX]Parser::error</code></li>
	</ul>

	<li>The following methods/data members changed visibility
	from <code>protected</code> in 2.3.x to
	<code>private</code> (with public setters and getters, as
	appropriate).</li>

	<ul>
	<li><code>[Non]ValidatingDOMParser::fDocument</code></li>
	<li><code>[Non]ValidatingDOMParser::fCurrentParent</code></li>
	<li><code>[Non]ValidatingDOMParser::fCurrentNode</code></li>
	<li><code>[Non]ValidatingDOMParser::fNodeStack</code></li>
	</ul>


	<li>The following files have moved, possibly requiring
	changes in the <code>#include</code> statements.</li>

	<ul>
	<li><code>MemBufInputSource.hpp</code></li>
	<li><code>StdInInputSource.hpp</code></li>
	<li><code>URLInputSource.hpp</code></li>
	</ul>


	<li>All the DTD validator code was moved from
	<code>internal</code> to separate
	<code>validators/DTD</code> directory.</li>

	<li>The error code definitions which were earlier in
	<code>internal/ErrorCodes.hpp</code> are now splitup into
	the following files:</li>

	<ul>
	<li><code>framework/XMLErrorCodes.hpp </code> - Core XML errors</li>
	<li><code>framework/XMLValidityCodes.hpp</code> - DTD validity errors</li>
	<li><code>util/XMLExceptMsgs.hpp </code> - C++ specific exception codes.</li>
	</ul>
	</ul>

	</s2>



	<anchor name="Samples"/>
	<s2 title="The Samples">

	<p>The sample programs no longer use any of the unsupported
	util/xxx classes. They only existed to allow us to write
	portable samples. But, since we feel that the wide character
	APIs are supported on a lot of platforms these days, it was
	decided to go ahead and just write the samples in terms of
	these. If your system does not support these APIs, you will
	not be able to build and run the samples. On some platforms,
	these APIs might perhaps be optional packages or require
	runtime updates or some such action.</p>

	<p>More samples have been added as well. These highlight some
	of the new functionality introduced in the new code base. And
	the existing ones have been cleaned up as well.</p>

	<p>The new samples are:</p>
	<ol>
	<li>PParse - Demonstrates 'progressive parse' (see below)</li>
	<li>StdInParse - Demonstrates use of the standard in input source</li>
	<li>EnumVal - Shows how to enumerate the markup decls in a DTD Validator</li>
	</ol>
	</s2>


	<anchor name="ParserClasses"/>
	<s2 title="Parser Classes">

	<p>In the XML4C 2.x code base, there were the following parser
	classes (in the src/parsers/ source directory):
	NonValidatingSAXParser, ValidatingSAXParser,
	NonValidatingDOMParser, ValidatingDOMParser. The
	non-validating ones were the base classes and the validating
	ones just derived from them and turned on the validation.
	This was deemed a little bit overblown, considering the tiny
	amount of code required to turn on validation and the fact
	that it makes people use a pointer to the parser in most cases
	(if they needed to support either validating or non-validating
	versions.)</p>

	<p>The new code base just has SAXParer and DOMParser
	classes. These are capable of handling both validating and
	non-validating modes, according to the state of a flag that
	you can set on them. For instance, here is a code snippet that
	shows this in action.</p>

	<source>void ParseThis(const XMLCh* const fileToParse,
	const bool validate)
	{
	//
	// Create a SAXParser. It can now just be
	// created by value on the stack if we want
	// to parse something within this scope.
	//
	SAXParser myParser;

	// Tell it whether to validate or not
	myParser.setDoValidation(validate);

	// Parse and catch exceptions...
	try
	{
	myParser.parse(fileToParse);
	}
	...
	};</source>

	<p>We feel that this is a simpler architecture, and that it makes things
	easier for you. In the above example, for instance, the parser will be
	cleaned up for you automatically upon exit since you don't have to
	allocate it anymore.</p>

	</s2>


	<anchor name="DOMLevel2"/>
	<s2 title="DOM Level 2 support">

	<p>Experimental early support for some parts of the DOM level
	2 specification have been added. These address some of the
	shortcomings in our DOM implementation,
	such as a simple, standard mechanism for tree traversal.</p>

	</s2>


	<anchor name="Progressive"/>
	<s2 title="Progressive Parsing">

	<p>The new parser classes support, in addition to the
	<ref>parse()</ref> method, two new parsing methods,
	<ref>parseFirst()</ref> and <ref>parseNext()</ref>. These are
	designed to support 'progressive parsing', so that you don't
	have to depend upon throwing an exception to terminate the
	parsing operation. Calling parseFirst() will cause the DTD (or
	in the future, Schema) to be parsed (both internal and
	external subsets) and any pre-content, i.e. everything up to
	but not including the root element. Subsequent calls to
	parseNext() will cause one more pieces of markup to be parsed,
	and spit out from the core scanning code to the parser (and
	hence either on to you if using SAX or into the DOM tree if
	using DOM.) You can quit the parse any time by just not
	calling parseNext() anymore and breaking out of the loop. When
	you call parseNext() and the end of the root element is the
	next piece of markup, the parser will continue on to the end
	of the file and return false, to let you know that the parse
	is done. So a typical progressive parse loop will look like
	this:</p>

	<source>// Create a progressive scan token
	XMLPScanToken token;

	if (!parser.parseFirst(xmlFile, token))
	{
	cerr << "scanFirst() failed\n" << endl;
	return 1;
	}

	//
	// We started ok, so lets call scanNext()
	// until we find what we want or hit the end.
	//
	bool gotMore = true;
	while (gotMore && !handler.getDone())
	gotMore = parser.parseNext(token);</source>

	<p>In this case, our event handler object (named 'handler'
	surprisingly enough) is watching form some criteria and will
	return a status from its getDone() method. Since the handler
	sees the SAX events coming out of the SAXParser, it can tell
	when it finds what it wants. So we loop until we get no more
	data or our handler indicates that it saw what it wanted to
	see.</p>

	<p>When doing non-progressive parses, the parser can easily
	know when the parse is complete and insure that any used
	resources are cleaned up. Even in the case of a fatal parsing
	error, it can clean up all per-parse resources. However, when
	progressive parsing is done, the client code doing the parse
	loop might choose to stop the parse before the end of the
	primary file is reached. In such cases, the parser will not
	know that the parse has ended, so any resources will not be
	reclaimed until the parser is destroyed or another parse is started.</p>

	<p>This might not seem like such a bad thing; however, in this case,
	the files and sockets which were opened in order to parse the
	referenced XML entities will remain open. This could cause
	serious problems. Therefore, you should destroy the parser instance
	in such cases, or restart another parse immediately. In a future
	release, a reset method will be provided to do this more cleanly.</p>

	<p>Also note that you must create a scan token and pass it
	back in on each call. This insures that things don't get done
	out of sequence. When you call parseFirst() or parse(), any
	previous scan tokens are invalidated and will cause an error
	if used again. This prevents incorrect mixed use of the two
	different parsing schemes or incorrect calls to
	parseNext().</p>

	</s2>


	<anchor name="Namespace"/>
	<s2 title="Namespace support">

	<p>The C++ parser now supports namespaces. With current XML
	interfaces (SAX/DOM) this doesn't mean very much because these
	APIs are incapable of passing on the namespace information.
	However, if you are using our internal APIs to write your own
	parsers, you can make use of this new information. Since the
	internal event APIs must be able to now support both namespace
	and non-namespace information, they have more
	parameters. These allow namespace information to be passed
	along.</p>

	<p>Most of the samples now have a new command line parameter
	to turn on namespace support. You turn on namespaces like
	this:</p>

	<source>SAXParser myParser;
	// Tell it whether to do namespace
	myParser.setDoNamespaces(true);</source>
	</s2>



	<anchor name="MovedToSrcFramework"/>
	<s2 title="Moved Classes to src/framework">

	<p>Some of the classes previously in the src/internal/
	directory have been moved to their more correct location in
	the src/framework/ directory. These are classes used by the
	outside world and should have been framework classes to begin
	with. Also, to avoid name classes in the absense of C++ namespace
	support, some of these clashes have been renamed to make them
	more XML specific and less likely to clash. More
	classes might end up being moved to framework as well.</p>

	<p>So you might have to change a few include statements to
	find these classes in their new locations. And you might have
	to rename some of the names of the classes, if you used any of
	the ones whose names were changed.</p>

	</s2>


	<anchor name="LoadableMessageText"/>
	<s2 title="Loadable Message Text">

	<p>The system now supoprts loadable message text, instead of
	having it hard coded into the program. The current drop still
	just supports English, but it can now support other
	languages. Anyone interested in contributing any translations
	should contact us. This would be an extremely useful
	service.</p>

	<p>In order to support the local message loading services, we
	have created a pretty flexible framework for supporting
	loadable text. Firstly, there is now an XML file, in the
	src/NLS/ directory, which contains all of the error messages.
	There is a simple program, in the Tools/NLSXlat/ directory,
	which can spit out that text in various formats. It currently
	supports a simple 'in memory' format (i.e. an array of
	strings), the Win32 resource format, and the message catalog
	format. The 'in memory' format is intended for very simple
	installations or for use when porting to a new platform (since
	you can use it until you can get your own local message
	loading support done.)</p>

	<p>In the src/util/ directory, there is now an XMLMsgLoader
	class. This is an abstraction from which any number of
	message loading services can be derived. Your platform driver
	file can create whichever type of message loader it wants to
	use on that platform. We currently have versions for the in
	memory format, the Win32 resource format, and the message
	catalog format. An ICU one is present but not implemented
	yet. Some of the platforms can support multiple message
	loaders, in which case a #define token is used to control
	which one is used. You can set this in your build projects to
	control the message loader type used.</p>

	<p>Both the Java and C++ parsers emit the same messages for an XML error
	since they are being taken from the same message file.</p>

	</s2>


	<anchor name="PluggableValidators"/>
	<s2 title="Pluggable Validators">

	<p>In a preliminary move to support Schemas, and to make them
	first class citizens just like DTDs, the system has been
	reworked internally to make validators completely pluggable.
	So now the DTD validator code is under the src/validators/DTD/
	directory, with a future Schema validator probably going into
	the src/validators. The core scanner architecture now works
	completely in terms of the framework/XMLValidator abstract
	interface and knows almost nothing about DTDs or Schemas. For
	now, if you don't pass in a validator to the parsers, they
	will just create a DTDValidator. This means that,
	theoretically, you could write your own validator. But we
	would not encourage this for a while, until the semantics of
	the XMLValidator interface are completely worked out and
	proven to handle DTD and Schema cleanly.</p>

	</s2>


	<anchor name="PluggableTranscoders"/>
	<s2 title="Pluggable Transcoders">

	<p>Another abstract framework added in the src/util/ directory
	is to support pluggable transcoding services. The
	XMLTransService class is an abtract API that can be derived
	from, to support any desired transcoding
	service. XMLTranscoder is the abstract API for a particular
	instance of a transcoder for a particular encoding. The
	platform driver file decides what specific type of transcoder
	to use, which allows each platform to use its native
	transcoding services, or the ICU service if desired.</p>

	<p>Implementations are provided for Win32 native services, ICU
	services, and the <ref>iconv</ref> services available on many
	Unix platforms. The Win32 version only provides native code
	page services, so it can only handle XML code in the intrinsic
	encodings ASCII, UTF-8, UTF-16 (Big/Small Endian), UCS4
	(Big/Small Endian), EBCDIC code pages IBM037 and
	IBM1140 encodings, ISO-8859-1 (aka Latin1) and Windows-1252. The ICU version
	provides all of the encodings that ICU supports. The
	<ref>iconv</ref> version will support the encodings supported
	by the local system. You can use transcoders we provide or
	create your own if you feel ours are insufficient in some way,
	or if your platform requires an implementation that we do not
	provide.</p>

	</s2>


	<anchor name="UtilReorg"/>
	<s2 title="Util directory Reorganization">

	<p>The src/util directory was becoming somewhat of a dumping
	ground of platform and compiler stuff. So we reworked that
	directory to better spread things out. The new scheme is:
	</p>

	<anchor name="UtilPlatform"/>
	<s3 title="util - The platform independent utility stuff">
	<ul>
	<li>MsgLoaders - Holds the msg loader implementations</li>
	<ol>
	<li>ICU</li>
	<li>InMemory</li>
	<li>MsgCatalog</li>
	<li>Win32</li>
	</ol>
	<li>Compilers - All the compiler specific files</li>
	<li>Transcoders - Holds the transcoder implementations</li>
	<ol>
	<li>Iconv</li>
	<li>ICU</li>
	<li>Win32</li>
	</ol>
	<li>Platforms</li>
	<ol>
	<li>AIX</li>
	<li>HP-UX</li>
	<li>Linux</li>
	<li>Solaris</li>
	<li>....</li>
	<li>Win32</li>
	</ol>
	</ul>
	</s3>

	<p>This organization makes things much easier to understand.
	And it makes it easier to find which files you need and which
	are optional. Note that only per-platform files have any hard
	coded references to specific message loaders or
	transcoders. So if you don't include the ICU implementations
	of these services, you don't need to link in ICU or use any
	ICU headers. The rest of the system works only in terms of the
	abstraction APIs.</p>

	</s2>

	</s1>