$Id: $
Commons Digester Package
Version 2.0 alpha
Release Notes
INTRODUCTION:
============
The Apache Jakarta Commons Digester
Release 2.0 of the Apache Jakarta Commons Digester package is a significant
rewrite of the original package. All the fundamental concepts remain the same,
but the APIs have been redesigned based on the lessons learnt from the 1.x
series of releases.
IMPORTANT NOTES
===============
Dependencies
------------
The 2.0 Digester release requires:
Commons Logging 1.0.x and Commons BeanUtils 1.7
MAJOR CHANGES SINCE 1.x
=======================
This section is intended for the use of those familiar with the 1.x releases
of this product. There are many changes, but those listed below are the
most significant. Mostly, this information is restricted to listing changes
in *functionality*; only a few implementation-level changes are listed here.
Versioning
----------
At the current time, the new code uses the package name
org.apache.commons.digester2.*
There will no doubt be debate over whether this is a good idea, or whether
the original
org.apache.commons.digester.*
package names should be used.
General principles
------------------
* Protected members are not used for classes in the o.a.c.digester2 package.
Instead, members are private, and protected setter/getter methods are provided
where needed. This makes it easier in future to change classes without
breaking existing subclasses that have been defined by users of the Digester
classes.
* It is still undecided whether concrete Action classes should follow the above
approach or use protected members.
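As a minimal sketch of the first principle above, the classes below keep
state private and expose it to subclasses only through protected accessors.
The class names are hypothetical and are not part of the digester2 package.

```java
// Illustrative only: BaseHandler and CustomHandler are hypothetical names.
class BaseHandler {
    // Private: subclasses cannot depend on the field's name or type.
    private String name = "default";

    // Protected accessors are the only extension point offered to subclasses.
    protected String getName() {
        return name;
    }

    protected void setName(String name) {
        this.name = name;
    }
}

class CustomHandler extends BaseHandler {
    String describe() {
        setName("custom");
        return "handler:" + getName();
    }
}
```

Because CustomHandler never touches the field directly, the base class
could later change how the value is stored without breaking it.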
Renamed/repackaged classes
--------------------------
* Rule --> Action
The term "rule" has confused a number of people over the years. The new
and hopefully clearer term "action" is used instead. The word "rule" is
now used only to refer to a (pattern, action) pair, which is more intuitive.
* Rules --> RuleManager
Using the word "Rules" to mean *not* a collection of Rule objects, but
instead the pattern-matching engine that happens to *contain* a collection
of Rule objects, was always confusing.
* RulesBase --> DefaultRuleManager
This should speak for itself.
* All the basic action classes (formerly Rule classes) now reside in the
o.a.c.digester2.actions package.
* Renamed actions:
NodeCreateRule --> CreateNodeAction
ObjectCreateRule --> CreateObjectAction
FactoryCreateRule --> CreateObjectWithFactoryAction
ObjectCreationFactory --> ObjectFactory
AbstractObjectCreationFactory --> AbstractObjectFactory
Digester class
--------------
* Digester has been split into:
* Digester
* SAXHandler
* Context
* ActionFactory
The old Digester interface had a huge number of methods. Many of these
existed only because Digester also had to:
(a) handle the SAX parser callbacks,
(b) let the Rule (now Action) classes store data on it during the
parse (the object stack etc), and
(c) conveniently create Rule (now Action) instances.
These pieces of functionality have now been split out into separate
classes, so:
* Digester now contains only the basic methods that users of the
library need to interact with.
* SAXHandler handles the callbacks from the parser
* Context holds the object stack, current match path, and related data.
* ActionFactory provides the factory methods to conveniently create,
configure and add Action objects to a Digester or RuleManager. Moving
this functionality out of the Digester object also allows the Digester
class to be distributed with a subset (including none) of the default
Action classes if desired.
Note that because parsing state is stored on the Context object now, it
is easier to implement the often-requested feature of being able to parse
multiple xml documents with the same Digester instance.
Namespace-aware parsing
-----------------------
The Digester now *always* uses a namespace-aware xml parser.
The DefaultRuleManager patterns properly support namespaces, eg
/ns1:foo/ns2:bar/baz
where the URIs that ns1 and ns2 correspond to have been defined via
earlier calls to method DefaultRuleManager.addNamespace(prefix, uri).
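One way to picture namespace-aware patterns is to rewrite each prefixed
segment in terms of its URI, so that matching does not depend on the
prefixes a particular document uses. The helper below is a hypothetical
sketch of that idea, not the DefaultRuleManager implementation; the
"{uri}name" notation and the handling of unprefixed names are assumptions.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper; only the addNamespace(prefix, uri) idea is from the notes.
class NamespacePatternSketch {
    private final Map<String, String> namespaces = new HashMap<>();

    void addNamespace(String prefix, String uri) {
        namespaces.put(prefix, uri);
    }

    // Rewrites "/ns1:foo/baz" as "/{uri-of-ns1}foo/baz".
    String expand(String pattern) {
        boolean absolute = pattern.startsWith("/");
        StringBuilder out = new StringBuilder();
        for (String segment : pattern.split("/")) {
            if (segment.isEmpty()) {
                continue; // the empty piece before a leading slash
            }
            if (absolute || out.length() > 0) {
                out.append('/');
            }
            int colon = segment.indexOf(':');
            if (colon < 0) {
                // Assumption: an unprefixed name is in no namespace.
                out.append(segment);
                continue;
            }
            String prefix = segment.substring(0, colon);
            String uri = namespaces.get(prefix);
            if (uri == null) {
                throw new IllegalArgumentException("Undeclared prefix: " + prefix);
            }
            out.append('{').append(uri).append('}')
               .append(segment.substring(colon + 1));
        }
        return out.toString();
    }
}
```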
Entity Resolution
-----------------
The basic functionality previously provided for entity resolution has been
improved.
* By default any attempt to access an external entity which has not
been explicitly mapped to some (presumably local) resource is regarded as a
fatal error. See setAllowUnknownExternalEntities.
* External DTDs can be ignored. Yes, this has dangers, but sometimes it is
necessary. See setIgnoreExternalDTD.
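The "unmapped entities are fatal" policy described above can be sketched
with the standard SAX EntityResolver interface. The class, its register
method and its allowUnknown flag are hypothetical names chosen for
illustration; they are not the Digester API.

```java
import java.util.HashMap;
import java.util.Map;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Sketch of the policy only; names here are hypothetical.
class StrictEntityResolver implements EntityResolver {
    private final Map<String, String> knownEntities = new HashMap<>();
    private boolean allowUnknown = false;

    // Maps an external system id to a (presumably local) URL.
    void register(String systemId, String localUrl) {
        knownEntities.put(systemId, localUrl);
    }

    void setAllowUnknown(boolean allowUnknown) {
        this.allowUnknown = allowUnknown;
    }

    @Override
    public InputSource resolveEntity(String publicId, String systemId)
            throws SAXException {
        String local = knownEntities.get(systemId);
        if (local != null) {
            return new InputSource(local);
        }
        if (allowUnknown) {
            return null; // null asks the parser to resolve the entity itself
        }
        throw new SAXException("Unmapped external entity: " + systemId);
    }
}
```

Such a resolver would be installed on an XMLReader with setEntityResolver.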
DefaultRuleManager
------------------
The DefaultRuleManager (formerly RulesBase) now uses a more xpath-like
syntax for its patterns. It still isn't full xpath support, just a little
closer for general consistency. In particular, a leading slash is required
on absolute paths. A pattern with no leading slash is a relative path, and
is equivalent to the old "*/" prefix.
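The absolute/relative distinction can be illustrated with a small matcher.
This is a hypothetical sketch, not DefaultRuleManager code: it ignores
namespaces and any precedence between competing patterns, and assumes
that a path is always written with a leading slash.

```java
// Hypothetical matcher illustrating absolute versus relative patterns.
class PatternSketch {
    // path is the absolute path of the current element, e.g. "/a/b/c".
    static boolean matches(String pattern, String path) {
        if (pattern.startsWith("/")) {
            // Absolute pattern: must equal the whole path.
            return path.equals(pattern);
        }
        // Relative pattern: like the old "*/pattern", it matches whenever
        // the trailing segments of the path equal the pattern.
        return path.endsWith("/" + pattern);
    }

    public static void main(String[] args) {
        System.out.println(matches("/a/b", "/a/b"));   // absolute, same path
        System.out.println(matches("/a/b", "/x/a/b")); // absolute, deeper path
        System.out.println(matches("a/b", "/x/a/b"));  // relative
    }
}
```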
Action (formerly Rule) API changes
-----------------------------------
* Action is an interface. The AbstractAction class has been defined and is
the recommended base for all custom actions.
* Action classes no longer have a "digester" member pointing to their "owner".
Instead, the begin/body/end methods are always passed a Context object that
allows them to access the object stack etc.
* Action classes are required to avoid modification of any member variable
during parsing (ie from their begin/body/end methods). All data must instead
be stored on the provided Context object. This effectively makes an Action
instance both re-entrant and thread-safe.
* The two requirements above mean that an Action instance can now be used
concurrently by multiple Digester instances (eg in a pool).
* Deprecated methods have been removed.
* Actions get "bodySegment" callbacks when their content is mixed
text and child elements. This allows Actions to process XHTML-style
markup input more easily.
* Actions get a new "beginParse" callback when startDocument occurs.
* The finish method has been renamed to finishParse.
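The following sketch shows why keeping all parse state on the Context
makes one Action instance safe to share. The Action and Context types here
are simplified stand-ins written for this example; the real digester2
signatures may well differ.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Simplified stand-ins: not the actual digester2 Action and Context types.
class StatelessActionSketch {

    // Holds all per-parse state; one instance is created for each parse.
    static class Context {
        final Deque<StringBuilder> stack = new ArrayDeque<>();
        final List<String> results = new ArrayList<>();
    }

    // Every callback receives the Context rather than using an "owner" field.
    interface Action {
        void begin(Context context);
        void body(Context context, String text);
        void end(Context context);
    }

    // Has no fields, so nothing on the instance changes during a parse.
    static class CollectTextAction implements Action {
        @Override
        public void begin(Context context) {
            context.stack.push(new StringBuilder());
        }

        @Override
        public void body(Context context, String text) {
            context.stack.peek().append(text);
        }

        @Override
        public void end(Context context) {
            context.results.add(context.stack.pop().toString());
        }
    }

    public static void main(String[] args) {
        Action shared = new CollectTextAction();
        Context first = new Context();
        Context second = new Context();

        // Interleave two "parses" through the same Action instance.
        shared.begin(first);
        shared.begin(second);
        shared.body(first, "alpha");
        shared.body(second, "beta");
        shared.end(second);
        shared.end(first);

        System.out.println(first.results + " " + second.results);
    }
}
```

Each Context ends up with only its own text, even though the calls were
interleaved through a single Action object.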
SetPropertiesAction
-------------------
* The option now exists to specify the custom attr->property mapping via a
Map parameter, not just a pair of String arrays. This is much nicer.
* hyphenated xml attribute names are now automatically mapped to camelCase,
eg some-attr="1" causes a call to setSomeAttr("1").
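The attribute-to-property mapping described above can be sketched as
follows. The class and method names are hypothetical; only the behaviour
(an optional Map of overrides, then hyphen-to-camelCase conversion) is
taken from the notes.

```java
import java.util.Map;

// Hypothetical helper; not the SetPropertiesAction implementation.
class AttributeNameSketch {

    // "some-attr" becomes "someAttr": drop each hyphen and upper-case
    // the character that follows it.
    static String toPropertyName(String attributeName) {
        StringBuilder sb = new StringBuilder();
        boolean upperNext = false;
        for (int i = 0; i < attributeName.length(); i++) {
            char c = attributeName.charAt(i);
            if (c == '-') {
                upperNext = true;
            } else if (upperNext) {
                sb.append(Character.toUpperCase(c));
                upperNext = false;
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    // A custom mapping, if present, takes priority over the automatic one.
    static String toSetterName(String attributeName,
                               Map<String, String> customMapping) {
        String property = customMapping.get(attributeName);
        if (property == null) {
            property = toPropertyName(attributeName);
        }
        return "set" + Character.toUpperCase(property.charAt(0))
                + property.substring(1);
    }
}
```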
CreateNodeAction
----------------
* It is now possible to create DOM1 (ie non-namespaced) nodes and attributes
even when the parser being used is namespace-aware.
* Namespace-aware elements and attributes are created by default.
* The implementation has changed; rather than redirecting the xml parser
to itself, the SAXHandler object is requested to forward ContentHandler
calls to itself. This has no externally-visible effect, but makes the
implementation much cleaner (esp. cleanup after a parse failure).
CreateObjectAction
------------------
* The ignoreCreateException functionality has been removed. I'm not sure
what use-cases it supports, or whether anybody actually uses it. The code
is rather complex and nasty, so if someone really needs this functionality
they can complain, and we can add it back in later with sufficient comments
to allow future maintainers to know when the feature is useful...
Exceptions
-----------
A lot more methods are declared to throw explicit Exceptions, which should
result in more reliable and explicit error-handling.
Terminology
------------
The word "pattern" is now used exclusively for a string that is interpreted
by a RuleManager instance.
The word "path" is now used for a string that describes an absolute path
from the root document node to the current xml element. When a pattern
matches the path, the associated Action is executed.
Xml-rules
---------
The xmlrules module has not yet been reimplemented. However, the following
changes are planned:
* A RuleManager instance will be returned rather than a Digester.
Because a RuleManager is thread-safe, this allows a pool of Digester
instances to be configured with this object without having to reparse
the xmlrules input file.
* The xmlrules file will be able to specify what RuleManager subclass
is desired (with the default being the DefaultRuleManager class).
* The rule parser constructor will take a list of Action (formerly Rule)
classes, and will auto-configure itself by using reflection against these
classes rather than the current system where code is written for each
Rule class.
* Because the list of Actions to support is passed in at runtime, the rule
parser class will not have explicit dependencies upon the default actions.
This allows the class to be distributed without the set of default actions
if desired. The ActionFactory class will provide a factory method for
creating a rule parser instance which knows about all the default actions.
* The input xmlrules file will be able to specify custom action classes.
Other notes
-----------
* The Digester class now only deals in XMLReader rather than SAXParser.
This shouldn't remove any functionality, just simplify the code.
* The default errorHandler methods now throw an exception for errors and
fatal-errors reported by the parser rather than the old behaviour of just
logging the error then continuing.
* ParserFeatureSetterFactory and related classes have not been reimplemented,
and will not be reimplemented by me. If they are wanted, someone else will
have to do this.
* I haven't implemented RuleSets. Are they useful to anyone?
* The peek and pop methods on the digester, parameter and named stacks
now throw an exception if misused rather than returning null.
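The stricter error handling mentioned in the list above (errors and fatal
errors throw instead of being logged) can be approximated with standard
SAX and JAXP classes alone. This is a sketch, not the Digester source; the
class name is hypothetical, and treating warnings as log-only is an
assumption.

```java
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;

// Sketch using only standard JAXP/SAX classes.
class StrictErrorHandler implements ErrorHandler {

    @Override
    public void warning(SAXParseException e) {
        // Assumption: warnings are only reported and never stop the parse.
        System.err.println("XML warning: " + e.getMessage());
    }

    @Override
    public void error(SAXParseException e) throws SAXParseException {
        throw e; // abort the parse instead of logging and continuing
    }

    @Override
    public void fatalError(SAXParseException e) throws SAXParseException {
        throw e;
    }

    // Builds a namespace-aware XMLReader that uses the strict handler.
    static XMLReader newStrictReader() throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        XMLReader reader = factory.newSAXParser().getXMLReader();
        reader.setErrorHandler(new StrictErrorHandler());
        return reader;
    }
}
```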
Still TO-DO
------------
* Think about alternative ways of performing logging.
* Think about how to support pattern syntax of "/foo[@attr=value]" style.
This may require a quite different API for RuleManager, so that RuleManager
is passed the actual Elements required, rather than a string representing
just the current path.
* Break up CallParamAction into multiple simpler actions.
* Refactor CallMethodAction to clean up its constructor.
* Fix rules that store data on themselves.
* Think about resolving dependency issues on Beanutils by allowing digester
to use beanutils via a local classloader. That means that it is ok to use
digester even in a situation where another version of beanutils is the
default.
* Sort out the schemaLocation/schemaLanguage mess.
* Support rules to handle processing instructions.
* Look into moving from BeanUtils to Morph, as BeanUtils has a lot of
functionality we don't use.