#########################################################################
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#########################################################################
== Overview
The files in this directory are an example of how to use
the Apache Digester to parse "document-markup" style xml. They also serve as an
example of how to subclass the main Digester class in order to extend
its functionality.
By "document-markup" xml, we mean input like XHTML, where the data is valid
xml and where some elements contain interleaved text and child elements.
For example, "<p>Hi, <i>this</i> is some <b>document-style</b> xml.</p>"
Topics covered:
* how to subclass Digester
* how to process "document-markup" style xml
== Compiling and running
* to compile:
mvn compile
* to build the jar artifact:
mvn package
* to run:
mvn verify
Alternatively, you can set up your CLASSPATH appropriately, and
run the example directly.
== Notes
The primary use of the Digester is to process xml configuration files.
Such files do not typically interleave text and child elements in the
style encountered with document markup. The standard Digester behaviour is
therefore to accumulate all text within an xml element's body (of which there is
expected to be only one "segment") and present it to a Rule or user method
as a single string.
While this significantly simplifies the implementation of Rule classes for
the primary Digester goal of parsing configuration files, collapsing all text
within an element into a single string loses information that is needed to
parse "document-markup" xml correctly: the position of each piece of text
relative to the child elements around it.
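
The effect can be illustrated without Digester itself. The following is a
minimal sketch that uses only the JDK's SAX API to mimic the "one string per
element body" behaviour described above. It is not Digester's actual code, and
the names BodyTextDemo, AccumulatingHandler and parse are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class BodyTextDemo {

    /** Records one "name=body" entry per element, in end-tag order. */
    static class AccumulatingHandler extends DefaultHandler {
        final List<String> bodies = new ArrayList<>();
        // One buffer per open element; text goes to the innermost one.
        private final Deque<StringBuilder> stack = new ArrayDeque<>();

        @Override
        public void startElement(String uri, String localName,
                                 String qName, Attributes atts) {
            stack.push(new StringBuilder());
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            if (!stack.isEmpty()) {
                stack.peek().append(ch, start, length);
            }
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            // All text seen directly inside this element, as a single string.
            bodies.add(qName + "=" + stack.pop());
        }
    }

    static List<String> parse(String xml) throws Exception {
        AccumulatingHandler handler = new AccumulatingHandler();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.bodies;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<p>Hi, <i>this</i> is some <b>document-style</b> xml.</p>";
        for (String body : parse(xml)) {
            System.out.println(body);
        }
    }
}
```

For the sample paragraph, the body recorded for "p" is "Hi,  is some  xml.".
Nothing in that string says where the "i" and "b" elements occurred.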
This example shows one method of extending the Digester class to resolve
this issue.
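
The general idea of reporting text as separate segments can be sketched with
the JDK's SAX API alone. This is an illustration under stated assumptions, not
the code shipped in this directory: it flushes any pending text whenever an
element starts or ends, so each contiguous run of text is reported separately
and in document order. The names TextSegmentDemo, SegmentHandler and parse are
hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class TextSegmentDemo {

    /** Records each text segment as "enclosingElement:text". */
    static class SegmentHandler extends DefaultHandler {
        final List<String> segments = new ArrayList<>();
        private final Deque<String> open = new ArrayDeque<>();
        // SAX may deliver one run of text in several chunks, so buffer it.
        private final StringBuilder pending = new StringBuilder();

        /** Emits the buffered text, if any, as one segment. */
        private void flush() {
            if (pending.length() > 0 && !open.isEmpty()) {
                segments.add(open.peek() + ":" + pending);
            }
            pending.setLength(0);
        }

        @Override
        public void startElement(String uri, String localName,
                                 String qName, Attributes atts) {
            flush();            // text before a child element ends here
            open.push(qName);
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            pending.append(ch, start, length);
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            flush();            // text before the end tag ends here
            open.pop();
        }
    }

    static List<String> parse(String xml) throws Exception {
        SegmentHandler handler = new SegmentHandler();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.segments;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<p>Hi, <i>this</i> is some <b>document-style</b> xml.</p>";
        for (String segment : parse(xml)) {
            System.out.println(segment);
        }
    }
}
```

For the sample paragraph this yields five segments, with the text of "p"
split around its child elements rather than merged into one string.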
At some point in the future, the ability to process "document-markup" style
xml may be built into the standard Digester class.