Allow garbage collection of UStateForSuspensions and DirectOrBufferedDataOutputStreams

The UStateForSuspensions and DirectOrBufferedDataOutputStream classes
have members that effectively create linked lists. In each of these
cases, we unknowingly hold onto the head of these linked lists, which
prevents garbage collection of all UStateForSuspensions and
DirectOrBufferedDataOutputStream instances. This means we essentially
hold on to all unparse state, which quickly leads to out-of-memory
errors for large formats that require many suspensions.

- The first issue is the "prior" member of UStateMain/UStateForSuspensions.
  This member is set so that each UState points to the previous
  UStateForSuspension that has been created, essentially creating a
  linked list of all UStateForSuspensions, with the head in UStateMain.
  This prevents all UStateForSuspensions from being garbage collected,
  as well as all the state they point to (it's a lot).

  Fortunately, this member isn't used anywhere anymore. Presumably it
  was once used for debugging suspensions, but is no longer used or
  needed. We can therefore simply remove it, which allows these
  UStateForSuspensions to be garbage collected once the Suspensions
  that use them are finished and garbage collected.

- The second issue is related to the "following" member in
  DirectOrBufferedDataOutputStreams. This member is used to keep track
  of the buffered DOS that follows this DOS (and transitively, all
  following DOS's). As the direct DOS is finished, we make the following
  DOS direct and update pointers accordingly. However, we create the
  very first direct DOS in the "unparse" function, which means a
  reference to it lives on the stack and it cannot be garbage collected
  until unparse finishes. And because this DOS transitively points to
  all following DOS's via the "following" member, it means we can never
  free any DOS's (and all the buffered data associated with those DOS's)
  until the end of unparse.

  The solution in this case is to not create the initial direct DOS in
  the unparse function on the stack, but instead to create it as part of
  the UState creation when we initialize the "dataOutputStream" var.
  This way there is no pointer to the initial DOS except for those held
  in UState or Suspensions. As the UState mutates or Suspensions
  resolve, we will completely lose references to earlier DOS's and they
  can be garbage collected.
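
To illustrate the first issue, here is a minimal sketch of how a "prior"
style back-pointer keeps every earlier state reachable. SuspensionState,
PriorChainSketch, build and reachable are hypothetical names made up for
this example; they are not the real UState classes, and the sketch only
counts reachable objects rather than measuring actual garbage collection.

```scala
// Hypothetical, simplified stand-in for a suspension state.
// This is NOT Daffodil's UStateForSuspension; it only models the "prior" link.
final class SuspensionState(val id: Int, val prior: Option[SuspensionState])

object PriorChainSketch {
  // Count how many states are reachable by walking "prior" links from head.
  def reachable(head: Option[SuspensionState]): Int = {
    var n = 0
    var cur = head
    while (cur.isDefined) {
      n += 1
      cur = cur.get.prior
    }
    n
  }

  // Create `count` states and return the most recent one.
  // When `link` is true, each new state points at the one created before it,
  // so holding the newest state keeps every earlier state reachable.
  def build(count: Int, link: Boolean): Option[SuspensionState] = {
    var latest: Option[SuspensionState] = None
    for (i <- 1 to count) {
      latest = Some(new SuspensionState(i, if (link) latest else None))
    }
    latest
  }
}
```

With link = true, the newest state keeps all `count` states reachable, so
none of them can be collected. With link = false, only the newest state is
reachable from that reference, which is the effect of removing the member.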

Fixing these two issues allows unparsing very large infosets that
require buffering, without running into out of memory errors.
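
The "following" issue can be modeled with a small sketch that compares
holding the first stream in a local variable against holding it only
through the mutable state. Stream, State, advance and the two
retainedWith... helpers are hypothetical names for illustration only, not
Daffodil's API, and the sketch counts objects still reachable instead of
observing the garbage collector directly.

```scala
// Hypothetical, simplified model of a stream with a "following" pointer.
// This is NOT DirectOrBufferedDataOutputStream.
final class Stream(val id: Int) {
  var following: Option[Stream] = None
}

// Hypothetical state object that tracks the current direct stream.
final class State(var current: Stream) {
  // Finish the current stream and make its follower the current one.
  def advance(): Unit = current.following.foreach(next => current = next)
}

object FollowingChainSketch {
  // Number of streams reachable from `s` via "following" links, including `s`.
  def chainLength(s: Stream): Int = {
    var n = 1
    var cur = s
    while (cur.following.isDefined) {
      n += 1
      cur = cur.following.get
    }
    n
  }

  // Build a chain of `count` streams and return the first one.
  def chain(count: Int): Stream = {
    val first = new Stream(1)
    var last = first
    for (i <- 2 to count) {
      val s = new Stream(i)
      last.following = Some(s)
      last = s
    }
    first
  }

  // Before the fix: a local variable holds the first stream for the whole run,
  // so every stream stays reachable even after the state has moved on.
  def retainedWithLocalRoot(count: Int): Int = {
    val first = chain(count)
    val state = new State(first)
    for (_ <- 1 until count) state.advance()
    chainLength(first)
  }

  // After the fix: only the state refers to a stream, so once it advances,
  // earlier streams are no longer reachable from it.
  def retainedWithStateOnly(count: Int): Int = {
    val state = new State(chain(count))
    for (_ <- 1 until count) state.advance()
    chainLength(state.current)
  }
}
```

In this model the local root keeps all `count` streams alive, while the
state-only variant leaves just the final stream reachable once the state
has advanced past the others.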

DAFFODIL-2468
2 files changed
tree: b78ab44c7bd804f9e189c5209f65eecc4083a7d3
  1. .github/
  2. containers/
  3. daffodil-cli/
  4. daffodil-core/
  5. daffodil-io/
  6. daffodil-japi/
  7. daffodil-lib/
  8. daffodil-macro-lib/
  9. daffodil-propgen/
  10. daffodil-runtime1/
  11. daffodil-runtime1-unparser/
  12. daffodil-sapi/
  13. daffodil-schematron/
  14. daffodil-tdml-lib/
  15. daffodil-tdml-processor/
  16. daffodil-test/
  17. daffodil-test-ibm1/
  18. daffodil-udf/
  19. project/
  20. test-stdLayout/
  21. tutorials/
  22. .codecov.yml
  23. .gitattributes
  24. .gitignore
  25. .sbtopts
  26. build.sbt
  27. DISCLAIMER
  28. KEYS
  29. LICENSE
  30. NOTICE
  31. README.md
  32. sonar-project.properties
README.md

Apache Daffodil (incubating) is the open source implementation of the Data Format Description Language (DFDL), a specification created by the Open Grid Forum. DFDL is capable of describing many data formats, including textual and binary, commercial record-oriented, scientific and numeric, modern and legacy, and many industry standards. It leverages XML technology and concepts, using a subset of the W3C XML Schema type system and annotations to describe such data. Daffodil uses this description to parse data into an infoset represented as XML or JSON, which can then be easily ingested, validated, and transformed.

For more information about Daffodil, see https://daffodil.apache.org/.

Build Requirements

  • JDK 8 or higher
  • SBT 0.13.8 or higher

Getting Started

SBT is the officially supported tool to build Daffodil, run all tests, create packages, and more. Below are some of the more common commands used for Daffodil development.

Compile

$ sbt compile

Tests

Run all unit tests:

$ sbt test

Run all command line interface tests:

$ sbt it:test

Command Line Interface

Create Linux and Windows shell scripts in daffodil-cli/target/universal/stage/bin/. See the Command Line Interface documentation for details on its usage:

$ sbt daffodil-cli/stage

License Check

Generate an Apache RAT license check report located in target/rat.txt and error if any unapproved licenses are found:

$ sbt ratCheck

Test Coverage Report

Generate an sbt-scoverage test coverage report located in target/scala-ver/scoverage-report/:

$ sbt clean coverage test it:test
$ sbt coverageAggregate

Getting Help

For questions, we can be reached at the dev@daffodil.apache.org or users@daffodil.apache.org mailing lists. Bugs can be reported via the Daffodil JIRA.

License

Apache Daffodil is licensed under the Apache License, v2.0.

Disclaimer

Apache Daffodil is an effort undergoing incubation at The Apache Software Foundation (ASF), sponsored by the Incubator PMC. Incubation is required of all newly accepted projects until a further review indicates that the infrastructure, communications, and decision making process have stabilized in a manner consistent with other successful ASF projects. While incubation status is not necessarily a reflection of the completeness or stability of the code, it does indicate that the project has yet to be fully endorsed by the ASF.