<?xml version="1.0" standalone="no"?>
<!DOCTYPE s1 SYSTEM "sbk:/style/dtd/document.dtd">
<s1 title="Programming Guide">
<p>This page has sections on the following topics:</p>
<ul>
<li><link anchor="SAXProgGuide">SAX Programming Guide</link></li>
<ul>
<li><link anchor="ConstructParser">Constructing a parser</link></li>
<li><link anchor="UsingSAXAPI">Using the SAX API</link></li>
</ul>
<li><link anchor="DOMProgGuide">DOM Programming Guide</link></li>
<ul>
<li><link anchor="JAVAandCPP">Comparison of Java and C++ DOMs</link></li>
<ul>
<li><link anchor="AccessAPI">Accessing the API from application code</link></li>
<li><link anchor="ClassNames">Class Names</link></li>
<li><link anchor="ObjMemMgmt">Objects and Memory Management</link></li>
</ul>
<li><link anchor="DOMString">DOMString</link></li>
<ul>
<li><link anchor="EqualityTesting">Equality Testing</link></li>
</ul>
<li><link anchor="Downcasting">Downcasting</link></li>
<li><link anchor="Subclassing">Subclassing</link></li>
</ul>
</ul>
<anchor name="SAXProgGuide"/>
<s2 title="SAX Programming Guide">
<anchor name="ConstructParser"/>
<s3 title="Constructing a parser">
<p>In order to use &XercesCName; to parse XML files, you will
need to create an instance of the SAXParser class. The example
below shows the code you need in order to create an instance
of SAXParser. The DocumentHandler and ErrorHandler instances
required by the SAX API are provided using the HandlerBase
class supplied with &XercesCName;.</p>
<source>
#include &lt;iostream.h>
#include &lt;util/PlatformUtils.hpp>
#include &lt;parsers/SAXParser.hpp>
#include &lt;sax/HandlerBase.hpp>

int main (int argc, char* args[]) {
    // The XML subsystem must be initialized before any other call.
    try {
        XMLPlatformUtils::Initialize();
    }
    catch (const XMLException&amp; toCatch) {
        cout &lt;&lt; "Error during initialization! :\n"
             &lt;&lt; toCatch.getMessage() &lt;&lt; "\n";
        return 1;
    }

    const char* xmlFile = "x1.xml";
    SAXParser* parser = new SAXParser();
    parser->setDoValidation(true);    // optional
    parser->setDoNamespaces(true);    // optional

    // HandlerBase derives from both DocumentHandler and ErrorHandler,
    // so one instance serves as both and no cast is needed.
    HandlerBase* handler = new HandlerBase();
    parser->setDocumentHandler(handler);
    parser->setErrorHandler(handler);

    int result = 0;
    try {
        parser->parse(xmlFile);
    }
    catch (const XMLException&amp; toCatch) {
        cout &lt;&lt; "\nError while parsing: '" &lt;&lt; xmlFile &lt;&lt; "'\n"
             &lt;&lt; "Exception message is: \n"
             &lt;&lt; toCatch.getMessage() &lt;&lt; "\n" ;
        result = -1;
    }

    delete parser;
    delete handler;
    return result;
}</source>
</s3>
<anchor name="UsingSAXAPI"/>
<s3 title="Using the SAX API">
<p>The SAX API for XML parsers was originally developed for
Java. Please be aware that there is no standard SAX API for
C++, and that use of the &XercesCName; SAX API does not
guarantee client code compatibility with other C++ XML
parsers.</p>
<p>The SAX API presents a callback-based interface to the parser. An
application that uses SAX provides an instance of a handler
class to the parser. When the parser detects XML constructs,
it calls the methods of the handler class, passing them
information about the construct that was detected. The most
commonly used handler classes are DocumentHandler, which is
called when XML constructs are recognized, and ErrorHandler,
which is called when an error occurs. The header files for the
various SAX handler classes are in
'&lt;&XercesCInstallDir;>/include/sax'.</p>
<p>As a convenience, &XercesCName; provides the class
HandlerBase, a single class that is publicly derived
from all the handler classes. HandlerBase's default
implementation of each handler callback method does
nothing. A convenient way to get started with &XercesCName; is
to derive your own handler class from HandlerBase and override
just those methods that you are interested in
customizing. This simple example shows how to create a handler
that prints element names and fatal error
messages. The source code for the sample applications shows
additional examples of how to write handler classes.</p>
<p>This is the header file MySAXHandler.hpp:</p>
<source>
#include &lt;sax/HandlerBase.hpp>

class MySAXHandler : public HandlerBase {
public:
    MySAXHandler();
    void startElement(const XMLCh* const, AttributeList&amp;);
    void fatalError(const SAXParseException&amp;);
};</source>
<p>This is the implementation file MySAXHandler.cpp:</p>
<source>
#include "MySAXHandler.hpp"
#include &lt;iostream.h>

MySAXHandler::MySAXHandler()
{
}

void MySAXHandler::startElement(const XMLCh* const name,
                                AttributeList&amp; attributes)
{
    // transcode() is a user-defined function that converts
    // Unicode strings to ordinary 'char *' strings. See the
    // sample program SAXCount for an example implementation.
    cout &lt;&lt; "I saw element: " &lt;&lt; transcode(name) &lt;&lt; endl;
}

void MySAXHandler::fatalError(const SAXParseException&amp; exception)
{
    cout &lt;&lt; "Fatal Error: " &lt;&lt; transcode(exception.getMessage())
         &lt;&lt; " at line: " &lt;&lt; exception.getLineNumber()
         &lt;&lt; endl;
}</source>
<p>The XMLCh and AttributeList types are supplied by
&XercesCName; and are documented in the include
files. Examples of their usage appear in the source code to
the sample applications.</p>
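<p>A handler receives callbacks only after it has been registered
with a parser, which is what the calls to setDocumentHandler() and
setErrorHandler() do in the parser construction example. The
following self-contained sketch mimics that flow in miniature.
<code>MiniHandler</code>, <code>ElementCollector</code>,
<code>MiniParser</code> and <code>collectElementNames()</code> are
hypothetical stand-ins written for this illustration, not
&XercesCName; classes. The toy parser recognizes start tags only,
and the code assumes a C++11 compiler.</p>
<source>
#include &lt;string>
#include &lt;vector>

// Stand-in for HandlerBase: every callback defaults to doing nothing.
class MiniHandler {
public:
    virtual ~MiniHandler() {}
    virtual void startElement(const std::string&amp;) {}
    virtual void fatalError(const std::string&amp;) {}
};

// Application handler: overrides only the callbacks it cares about.
class ElementCollector : public MiniHandler {
public:
    std::vector&lt;std::string> names;
    std::string lastError;
    void startElement(const std::string&amp; name) override { names.push_back(name); }
    void fatalError(const std::string&amp; message) override { lastError = message; }
};

// Toy parser: reports every start tag to the registered handler.
class MiniParser {
public:
    void setDocumentHandler(MiniHandler* h) { handler = h; }
    void parse(const std::string&amp; text) {
        if (!handler) return;
        for (std::string::size_type i = 0; i &lt; text.size(); ++i) {
            if (text[i] != '&lt;') continue;
            std::string::size_type begin = i + 1;
            if (begin &lt; text.size() &amp;&amp; text[begin] == '/') continue;  // end tag
            std::string::size_type end = text.find_first_of(" />", begin);
            if (end == std::string::npos) {
                handler->fatalError("unterminated tag");
                return;
            }
            handler->startElement(text.substr(begin, end - begin));
        }
    }
private:
    MiniHandler* handler = nullptr;
};

// Registers a handler, runs the parser, and returns what the handler saw.
std::vector&lt;std::string> collectElementNames(const std::string&amp; text) {
    ElementCollector collector;
    MiniParser parser;
    parser.setDocumentHandler(&amp;collector);
    parser.parse(text);
    return collector.names;
}</source>
<p>Calling <code>collectElementNames()</code> on the text
<code>&lt;a>&lt;b/>&lt;/a></code> yields the names a and b, in document
order. The parser never needs to know the concrete handler type; it
sees only the base class interface.</p>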
</s3>
</s2>
<anchor name="DOMProgGuide"/>
<s2 title="DOM Programming Guide">
<anchor name="JAVAandCPP"/>
<s3 title="Java and C++ DOM comparisons">
<p>The C++ DOM API is very similar in design and use to the
Java DOM API bindings. As a consequence, conversion of
existing Java code that makes use of the DOM to C++ is a
straightforward process.
</p>
<p>
This section outlines the differences between Java and C++ bindings.
</p>
</s3>
<anchor name="AccessAPI"/>
<s3 title="Accessing the API from application code">
<source>
// C++
#include &lt;dom/DOM.hpp></source>
<source>
// Java
import org.w3c.dom.*;</source>
<p>The header file &lt;dom/DOM.hpp&gt; includes all the
individual headers for the DOM API classes. </p>
</s3>
<anchor name="ClassNames"/>
<s3 title="Class Names">
<p>The C++ class names are prefixed with "DOM_". The intent is
to prevent conflicts between DOM class names and other names
that may already be in use by an application or other
libraries that a DOM-based application must link with.</p>
<p>The use of C++ namespaces would also have solved this
conflict problem, but for the fact that many compilers do not
yet support them.</p>
<source>
DOM_Document myDocument; // C++
DOM_Node aNode;
DOM_Text someText;</source>
<source>
Document myDocument; // Java
Node aNode;
Text someText;</source>
<p>If you wish to use the Java class names in C++, then you need
to typedef them in C++. This is not advisable for the general
case - conflicts really do occur - but can be very useful when
converting a body of existing Java code to C++.</p>
<source>
typedef DOM_Document Document;
typedef DOM_Node Node;
Document myDocument; // Now C++ usage is
// indistinguishable from Java
Node aNode;</source>
</s3>
<anchor name="ObjMemMgmt"/>
<s3 title="Objects and Memory Management">
<p>The C++ DOM implementation uses automatic memory management,
implemented using reference counting. As a result, the C++
code for most DOM operations is very similar to the equivalent
Java code, right down to the use of factory methods in the DOM
document class for nearly all object creation, and the lack of
any explicit object deletion.</p>
<p>Consider the following code snippets:</p>
<source>
// This is C++
DOM_Node aNode;
aNode = someDocument.createElement("ElementName");
DOM_Node docRootNode = someDocument.getDocumentElement();
docRootNode.appendChild(aNode);</source>
<source>
// This is Java
Node aNode;
aNode = someDocument.createElement("ElementName");
Node docRootNode = someDocument.getDocumentElement();
docRootNode.appendChild(aNode);</source>
<p>The Java and the C++ are identical on the surface, except for
the class names, and this similarity remains true for most DOM
code. </p>
<p>However, Java and C++ handle objects in somewhat different
ways, making it important to understand a little bit of what
is going on beneath the surface.</p>
<p>In Java, the variable <code>aNode</code> is an object reference,
essentially a pointer. It is initially null, and references
an object only after the assignment statement in the second
line of the code.</p>
<p>In C++ the variable <code>aNode</code> is, from the C++ language's
perspective, an actual live object. It is constructed when the
first line of the code executes, and DOM_Node::operator = ()
executes at the second line. The C++ class DOM_Node is
essentially a form of smart pointer; it implements much of
the behavior of a Java object reference variable, and
delegates the DOM behaviors to an implementation class that
lives behind the scenes.</p>
<p>Key points to remember when using the C++ DOM classes:</p>
<ul>
<li>Create them as local variables, or as member variables of
some other class. Never "new" a DOM object into the heap or
make an ordinary C pointer variable to one, as this will
greatly confuse the automatic memory management. </li>
<li>The "real" DOM objects (nodes, attributes, CData
sections, and so on) live on the heap and are created with the
create... methods on class DOM_Document. DOM_Node and the
other DOM classes serve as reference variables to the
underlying heap objects.</li>
<li>The visible DOM classes may be freely copied (assigned),
passed as parameters to functions, or returned by value from
functions.</li>
<li>Memory management of the underlying DOM heap objects is
automatic, implemented by means of reference counting. So long
as some part of a document can be reached, directly or
indirectly, via reference variables that are still alive in
the application program, the corresponding document data will
stay alive in the heap. When all possible paths of access have
been closed off (all of the application's DOM objects have
gone out of scope) the heap data itself will be automatically
deleted. </li>
<li>There are restrictions on the ability to subclass the DOM
classes. </li>
</ul>
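<p>The arrangement of handles and heap objects described above can
be sketched in standard C++. This is an illustration only:
<code>NodeImpl</code>, <code>NodeHandle</code> and
<code>makeNode()</code> are hypothetical names, and
<code>std::shared_ptr</code> (C++11) is used as a stand-in for
whatever reference-counting mechanism the DOM classes use
internally.</p>
<source>
#include &lt;memory>
#include &lt;string>

// Hypothetical heap object standing in for a "real" DOM node.
struct NodeImpl {
    std::string name;
    explicit NodeImpl(const std::string&amp; n) : name(n) {}
};

// Hypothetical handle class playing the role that DOM_Node plays:
// a small value object that refers to a shared heap object.
class NodeHandle {
public:
    // Refers to nothing, like a null reference.
    NodeHandle() {}
    explicit NodeHandle(const std::string&amp; name)
        : impl(std::make_shared&lt;NodeImpl>(name)) {}

    bool isNull() const { return !impl; }

    // Identity comparison: do both handles refer to the same heap object?
    bool operator==(const NodeHandle&amp; other) const { return impl == other.impl; }

    // Number of live handles that currently share the heap object.
    long handleCount() const { return impl.use_count(); }

private:
    std::shared_ptr&lt;NodeImpl> impl;
};

// Handles can be returned by value; the heap object survives the call.
NodeHandle makeNode(const std::string&amp; name) { return NodeHandle(name); }</source>
<p>Copying a <code>NodeHandle</code> raises the count, and letting a
copy go out of scope lowers it again. The heap object is deleted when
the last handle disappears, which is why no explicit delete appears
anywhere in the sketch.</p>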
</s3>
<anchor name="DOMString"/>
<s3 title="DOMString">
<p>Class DOMString provides the mechanism for passing string
data to and from the DOM API. DOMString is not intended to be
a completely general string class, but rather to meet the
specific needs of the DOM API.</p>
<p>The design derives from two primary sources: the DOM's
CharacterData interface and the class java.lang.String.</p>
<p>Main features are:</p>
<ul>
<li>It stores Unicode text.</li>
<li>Automatic memory management, using reference counting.</li>
<li>DOMStrings are mutable - characters can be inserted,
deleted or appended.</li>
</ul>
<p>When a string is passed into a method of the DOM (when
setting the value of a Node, for example), the string is cloned
so that any subsequent alteration or reuse of the string by
the application will not alter the document contents.
Similarly, when strings from the document are returned to an
application via the DOM API, the string is cloned so that the
document cannot be inadvertently altered by subsequent edits
to the string.</p>
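<p>The effect of cloning at the API boundary can be illustrated with
a small stand-alone sketch. <code>SharedText</code> and
<code>TextNode</code> are hypothetical stand-ins for DOMString and a
DOM node, written for this example with <code>std::shared_ptr</code>
(C++11); they are not the &XercesCName; implementation.</p>
<source>
#include &lt;memory>
#include &lt;string>

// Stand-in for DOMString: a handle to shared, mutable character storage.
class SharedText {
public:
    SharedText(const std::string&amp; s = "")
        : data(std::make_shared&lt;std::string>(s)) {}

    // Returns a handle to a fresh copy of the characters.
    SharedText clone() const { return SharedText(*data); }

    // Modifies the shared storage in place.
    void appendData(const std::string&amp; s) { data->append(s); }

    // Content comparison.
    bool equals(const SharedText&amp; other) const { return *data == *other.data; }

    // Identity comparison: same underlying storage?
    bool operator==(const SharedText&amp; other) const { return data == other.data; }

private:
    std::shared_ptr&lt;std::string> data;
};

// Stand-in for a DOM node that clones strings on the way in and out.
class TextNode {
public:
    void setNodeValue(const SharedText&amp; v) { value = v.clone(); }
    SharedText getNodeValue() const { return value.clone(); }

private:
    SharedText value;
};</source>
<p>Because both accessors clone, appending to a string after passing
it to <code>setNodeValue()</code>, or to a string obtained from
<code>getNodeValue()</code>, leaves the value stored in the node
unchanged.</p>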
<note>The ICU classes are a more general solution to Unicode
character handling for C++ applications. ICU is an Open
Source Unicode library, available at the <jump
href="http://www.software.ibm.com/developerworks/opensource/icu/index.html">IBM
DeveloperWorks website</jump>.</note>
</s3>
<anchor name="EqualityTesting"/>
<s3 title="Equality Testing">
<p>The DOMString equality operators (and all of the rest of the
DOM class conventions) are modeled after the Java
equivalents. The equals() method compares the content of the
string, while the == operator checks whether the string
reference variables (the application program variables) refer
to the same underlying string in memory. This is also true of
DOM_Node, DOM_Element, etc., in that operator == tells whether
the variables in the application are referring to the same
actual node or not. It's all very Java-like.</p>
<ul>
<li>bool operator == () is true if the DOMString variables
refer to the same underlying storage. </li>
<li>bool equals() is true if the strings contain the same
characters. </li>
</ul>
<p>Here is an example of how the equality operators work: </p>
<source>
DOMString a = "Hello";
DOMString b = a;
DOMString c = a.clone();
if (b == a)             // This is true
if (a == c)             // This is false
if (a.equals(c))        // This is true
b.appendData(" World"); // Modifies the storage shared by a and b
if (b == a)             // Still true, and the string's
                        // value is "Hello World"
if (a.equals(c))        // false. a is "Hello World";
                        // c is still "Hello".</source>
</s3>
<anchor name="Downcasting"/>
<s3 title="Downcasting">
<p>Application code sometimes must cast an object reference from
DOM_Node to one of the classes derived from it, such as
DOM_Element. The syntax for doing this in C++ is
different from that in Java.</p>
<source>
// This is C++
DOM_Node aNode = someFunctionReturningNode();
DOM_Element el = (DOM_Element &amp;) aNode;</source>
<source>
// This is Java
Node aNode = someFunctionReturningNode();
Element el = (Element) aNode;</source>
<p>The C++ cast is not type-safe; the Java cast is checked for
compatible types at runtime. If necessary, a type-check can
be made in C++ using the node type information: </p>
<source>
// This is C++
DOM_Node aNode = someFunctionReturningNode();
DOM_Element el; // by default, el will == null.
if (aNode.getNodeType() == DOM_Node::ELEMENT_NODE) {
    el = (DOM_Element &amp;) aNode;
}
else {
    // aNode does not refer to an element.
    // Do something to recover here.
}</source>
</s3>
<anchor name="Subclassing"/>
<s3 title="Subclassing">
<p>The C++ DOM classes, DOM_Node, DOM_Attr, DOM_Document, etc.,
are not designed to be subclassed by an application
program. </p>
<p>As an alternative, the DOM_Node class provides a User Data
field for use by applications as a hook for extending nodes by
referencing additional data or objects. See the API
description for DOM_Node for details.</p>
</s3>
</s2>
</s1>