RuntimeData objects simplified, compile-memory-leak fixed.

All transient params for runtime data objects are gone.

All by-name passing of args to runtime structures is gone.

Compile-time objects no longer retained after compilation is over.

Cyclic graph constructions are more explicit.

Only a select few TransientParam are left, and they have been
converted to use the Delay type instead of using by-name passing. They
are all ones that actually involve circular relationships, so they
cannot be removed without going to non-functional graph construction.

This is achieved by a Delay object type which is enhanced to avoid
retaining any closures or other objects the Scala compiler creates to
implement by-name/lazy evaluation.
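
As a rough illustration of the idea (not Daffodil's actual Delay class),
a delay can wrap the by-name body in a function value and drop that
function once the value has been computed. The names hasValue and
force below are assumptions made for this sketch.

```scala
// Hypothetical sketch of a delay that does not retain its closure after forcing.
final class Delay[T] private (private var thunk: () => T) {
  private var cached: Option[T] = None

  // True once the delayed body has been evaluated.
  def hasValue: Boolean = synchronized(cached.isDefined)

  // Evaluates the body at most once, then releases the closure so that
  // anything it captured (e.g. compile-time objects) can be reclaimed.
  def force(): T = synchronized {
    if (cached.isEmpty) {
      cached = Some(thunk())
      thunk = null
    }
    cached.get
  }
}

object Delay {
  // The by-name argument is wrapped exactly once, here.
  def apply[T](body: => T): Delay[T] = new Delay(() => body)
}
```

After force() returns, the only thing the delay still references is the
computed value itself.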

OOLAG now uses regular lazy val again. Got rid of lvCache.

The lvCache strategy (which was to make objects smaller by keeping LVs
for objects in a centralized hash-table cache) made it look as if there
were a massive number of schema-compiler objects, and this made
figuring out the memory leak (holding onto the compile-time objects)
too hard.

This change set also removes the extension functions and associated tests:
    dfdlx:repTypeValue
    dfdlx:logicalTypeValue
    dfdlx:outputTypeCalcNextSibling
This removal is part of DAFFODIL-2273.

Made diagnosticDebugName more robust.

A large number of stack-overflow issues arise from a diagnostic
message needing the name, but computing the name somehow fails inside
a more complex method that itself is trying to issue a diagnostic.

Now we always catch these nested failures, which will help in some cases.
DSOM objects construct their diagnostic debug names when they are created.

Eliminated memory leak of holding onto all the compiler objects.

The culprit was the validator. Internally it caches the error handlers, and
all objects reachable from them. Hence, one must avoid mixing
ErrorHandler into compiler objects, because then they
and everything reachable from them cannot be reclaimed.
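
A minimal sketch of the safer arrangement, assuming the handler is the
standard org.xml.sax.ErrorHandler interface: the handler is a small
standalone object that holds only the messages it collects, and the
compiler-side object does not extend ErrorHandler. The class names
CollectingErrorHandler and SchemaFileSketch are hypothetical.

```scala
import org.xml.sax.{ErrorHandler, SAXParseException}
import scala.collection.mutable.ListBuffer

// Standalone handler: if a validator caches it, only these strings stay reachable.
final class CollectingErrorHandler extends ErrorHandler {
  private val buf = ListBuffer.empty[String]
  def warning(e: SAXParseException): Unit = buf += "warning: " + e.getMessage
  def error(e: SAXParseException): Unit = buf += "error: " + e.getMessage
  // Accumulates rather than throwing, so earlier diagnostics are not lost.
  def fatalError(e: SAXParseException): Unit = buf += "fatal: " + e.getMessage
  def diagnostics: List[String] = buf.toList
}

// Hypothetical compiler-side object. It does NOT mix in ErrorHandler.
final class SchemaFileSketch(val uri: String) {
  // `run` stands in for whatever invokes the validator with the handler.
  def validateWith(run: ErrorHandler => Unit): List[String] = {
    val handler = new CollectingErrorHandler
    run(handler)
    handler.diagnostics
  }
}
```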

Some errors were being suppressed by throws of the
final fatal SAXParseError before they could be accumulated.

The logic for dealing with these errors/warnings is pretty convoluted.
I created DAFFODIL-2541 for cleaning up this logic.

Remove @TransientParam from SchemaFileLocation

Add forcing of most Delay objects in daffodil-core.runtime1 package
using requiredEvaluations calls.

Fixed: an invalid DFDL schema was not being treated as a fatal error that
prevents Daffodil from proceeding to compile the schema.

Validation now detects more errors, which must have been erroneously
dropped before. One is, e.g., that xs:import statements with no
namespace require both the source and target schemas to have a
namespace. Another is UPA (Unique Particle Attribution) errors that it
wasn't detecting before; two tests had such ambiguities.

Tests that had these errors and numerous other tests had to be
fixed because errors in their DFDL schemas are now detected, preventing
them from working as before.

Need for serialization to force things is gone. Any Delay objects that
have not been forced when serialization happens cause an abort.

Delays are now forced as follows: everything that uses delays
defines an initialize method, which must be called to ensure all
delays are forced.

The forcing is done not by explicitly forcing the delay objects
themselves, but by initializing the cyclic objects.

This ensures we're not in a circular situation, because
it explicitly demands that the creation of the objects be completed
before the delays are required to be evaluable.
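
The protocol described above can be sketched roughly as follows. This is
an illustration under assumptions, not Daffodil code: DelaySketch,
NodeSketch and CycleDemo are hypothetical names, and the real
initialize methods do more than force a single delay.

```scala
// Hypothetical stand-in for a delay; `value` aborts if the delay was never forced.
final class DelaySketch[T](body: => T) {
  private var result: Option[T] = None
  def force(): T = { if (result.isEmpty) result = Some(body); result.get }
  def value: T =
    result.getOrElse(throw new IllegalStateException("delay not forced"))
}

// A node that refers to its peer only through a delay, which allows a cycle.
final class NodeSketch(val name: String, peerDelay: DelaySketch[NodeSketch]) {
  // Called only after every object in the cycle has been constructed.
  def initialize(): Unit = { peerDelay.force(); () }
  def peer: NodeSketch = peerDelay.value
}

object CycleDemo {
  def build(): (NodeSketch, NodeSketch) = {
    // Construction: neither delay body runs yet, so no circular evaluation.
    lazy val a: NodeSketch = new NodeSketch("a", new DelaySketch(b))
    lazy val b: NodeSketch = new NodeSketch("b", new DelaySketch(a))
    // Initialization: forces the delays by initializing the cyclic objects.
    a.initialize()
    b.initialize()
    (a, b)
  }
}
```

In this sketch, reading peer on a node that was never initialized throws,
which mirrors the rule that an unforced delay at serialization time
causes an abort.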

NodeInfo object is now self-initializing.

Daffodil's schema compiler now has coarse "passes".

The first pass is validating the DFDL schema with XML Schema validation.

Next is the initialize method that runs when DSOM objects are constructed.

Next is constructing the DSOM tree/directed-graph. This is driven
by the SchemaSet.allSchemaComponents calculation.

Next pass is evaluating the requiredEvaluations, which are activated
only for the components created in SchemaSet.allSchemaComponents.

Next pass is checkUnusedProperties

Last pass is code-generation (or parser/unparser object creation)
and serialization.
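
One way to picture coarse passes is as an ordered list of named steps,
each reporting the errors it found. The sketch below is purely
illustrative: Pass and runPasses are hypothetical names, and stopping at
the first pass that reports errors is an assumption made here (the text
above only states that for schema validation).

```scala
import scala.annotation.tailrec

object PassesSketch {
  // A pass has a name and returns the error messages it produced.
  final case class Pass(name: String, run: () => List[String])

  // Runs passes in order. Returns the names of the passes that ran and the
  // errors of the pass that stopped the run (empty if all passes succeeded).
  def runPasses(passes: List[Pass]): (List[String], List[String]) = {
    @tailrec
    def loop(rest: List[Pass], ran: List[String]): (List[String], List[String]) =
      rest match {
        case Nil => (ran.reverse, Nil)
        case p :: tail =>
          val errs = p.run()
          if (errs.nonEmpty) ((p.name :: ran).reverse, errs)
          else loop(tail, p.name :: ran)
      }
    loop(passes, Nil)
  }
}
```

For example, a list beginning with a failing "validate" pass would
report only that pass as run, and later passes such as
"requiredEvaluations" would never start.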

Unified component traversal algorithms.

Improve commentary around initialize() protocol for DSOM objects.

Remove Delay of minimizedScope for ERD.

Other small changes to comments and to AV.dfdl.xsd to make it spit
out fewer warnings.

Test TestRefMap.testUnusedGroup1 simulates a failure that was only occurring
in the VMF schema (which is not public).

The tests characterize the situations where checkAllTopLevels
interacts with errors where global elements have up-and-out paths in
them.

Fixes to Namespaces tests, which were failing because errors that went
undetected before are now caught. They had to be split out to use
separate schemas to isolate the error situations.

DAFFODIL-2326, DAFFODIL-1879, DAFFODIL-2273, DAFFODIL-2526
119 files changed
tree: 95dae5e7ab756fe9f668af43bfd87803bb8346cd
README.md

Apache Daffodil is an open-source implementation of the DFDL specification that uses DFDL data descriptions to parse fixed format data into an infoset. This infoset is commonly converted into XML or JSON to enable the use of well-established XML or JSON technologies and libraries to consume, inspect, and manipulate fixed format data in existing solutions. Daffodil is also capable of serializing or “unparsing” data back to the original data format. The DFDL infoset can also be converted directly to/from the data structures carried by data processing frameworks so as to bypass any XML/JSON overheads.

For more information about Daffodil, see https://daffodil.apache.org/.

Build Requirements

  • JDK 8 or higher
  • SBT 0.13.8 or higher
  • C compiler C99 or higher
  • Mini-XML Version 3.0 or higher

See BUILD.md for more details.

Getting Started

SBT is the officially supported tool to build Daffodil. Below are some of the more commonly used commands for Daffodil development.

Compile

Compile source code:

sbt compile

Tests

Run unit tests:

sbt test

Run command line interface tests:

sbt IntegrationTest/test

Command Line Interface

Build the command line interface (Linux and Windows shell scripts in daffodil-cli/target/universal/stage/bin/; see the Command Line Interface documentation for details on their usage):

sbt daffodil-cli/stage

License Check

Run Apache RAT (license audit report in target/rat.txt and error if any unapproved licenses are found):

sbt ratCheck

Test Coverage Report

Run sbt-scoverage (report in target/scala-ver/scoverage-report/):

sbt clean coverage test IntegrationTest/test
sbt coverageAggregate

Getting Help

You can ask questions on the dev@daffodil.apache.org or users@daffodil.apache.org mailing lists. You can report bugs via the Daffodil JIRA.

License

Apache Daffodil is licensed under the Apache License, v2.0.