commit | 0a390f67a28f1f65b84b52583917beda34de1a88 | |
---|---|---|
author | Steve Lawrence <slawrence@apache.org> | Tue Feb 09 07:51:00 2021 -0500 |
committer | Steve Lawrence <stephen.d.lawrence@gmail.com> | Tue Feb 09 11:59:32 2021 -0500 |
tree | b78ab44c7bd804f9e189c5209f65eecc4083a7d3 | |
parent | 4674dee30716f1b8147e11fb47838872461bf143 | |
Allow garbage collection of UStateForSuspension's and DirectOrBufferedDataOutputStream's

The UStateForSuspensions and DirectOrBufferedDataOutputStream classes have members that effectively create linked lists. In each of these cases, we unknowingly hold onto the head of these linked lists, which prevents garbage collection of all UStateForSuspensions and DirectOrBufferedDataOutputStream instances. This means we essentially hold on to all unparse state, which quickly leads to out of memory errors for large formats that require many suspensions.

- The first issue is the "prior" member of UStateMain/UStateForSuspensions. This member is set so that each UState points to the previous UStateForSuspension that has been created, essentially creating a linked list of all UStateForSuspensions, with the head in UStateMain. This prevents all UStateForSuspensions from being garbage collected, as well as all the state they point to (it's a lot). Fortunately, this member isn't used anywhere anymore. Presumably it was once used for debugging suspensions, but is no longer used or needed. So we can simply remove this member so these UStateForSuspensions can be garbage collected once the Suspensions that use them are finished and garbage collected.

- The second issue is related to the "following" member in DirectOrBufferedDataOutputStream's. This member is used to keep track of the buffered DOS that follows this DOS (and iteratively, all following DOS's). As the direct DOS is finished, we make the following DOS direct and update pointers correctly. However, we create the very first direct DOS in the "unparse" function, which means it lives on the stack and cannot be garbage collected until unparse finishes. And because this DOS iteratively points to all following DOS's via the "following" member, it means we can never free any DOS's (and all the buffered data associated with those DOS's) until the end of unparse.

  The solution in this case is to not create the initial direct DOS in the unparse function on the stack, but instead to create it as part of the UState creation when we initialize the "dataOutputStream" var. This way there is no pointer to the initial DOS except for those held in UState or Suspensions. As the UState mutates or Suspensions resolve, we will completely lose references to earlier DOS's and they can be garbage collected.

Fixing these two issues allows unparsing very large infosets that require buffering, without running into out of memory errors.

DAFFODIL-2468
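The retention pattern behind the second issue can be illustrated with a small, self-contained sketch. This is not Daffodil's code (Daffodil is written in Scala). The `Stream`, `State`, and `RetentionSketch` names are hypothetical stand-ins for a DOS, a UState, and the unparse call. Rather than depending on actual garbage collector timing, the sketch counts how many objects are still reachable from the reference the caller holds, which is what decides whether the collector may free them. The "prior" member causes the same kind of retention, with the links pointing backwards instead of forwards.

```java
// Hypothetical stand-ins for illustration only; these are NOT Daffodil's classes.
final class RetentionSketch {

    /** Models a DOS-like object that points at the stream following it. */
    static final class Stream {
        final int id;
        Stream following; // next stream in the chain, or null for the last one

        Stream(int id) {
            this.id = id;
        }
    }

    /** Models mutable unparse state that only points at the current stream. */
    static final class State {
        Stream current;

        State(Stream initial) {
            this.current = initial;
        }
    }

    /** Links a new stream after the current one, then advances the state to it. */
    static void advance(State state, int id) {
        Stream next = new Stream(id);
        state.current.following = next;
        state.current = next;
    }

    /** Counts streams reachable from a root by walking the "following" links. */
    static int reachableFrom(Stream root) {
        int count = 0;
        for (Stream s = root; s != null; s = s.following) {
            count++;
        }
        return count;
    }

    /** Before: the head is held in a local variable for the whole call. */
    static int retainedWithHeadOnStack(int streams) {
        Stream head = new Stream(0); // this local keeps the entire chain reachable
        State state = new State(head);
        for (int i = 1; i < streams; i++) {
            advance(state, i);
        }
        return reachableFrom(head);
    }

    /** After: the head is created as part of the state, with no other reference. */
    static int retainedWithHeadInState(int streams) {
        State state = new State(new Stream(0));
        for (int i = 1; i < streams; i++) {
            advance(state, i);
        }
        // Earlier streams are no longer referenced by anything we hold.
        return reachableFrom(state.current);
    }

    public static void main(String[] args) {
        System.out.println(retainedWithHeadOnStack(1000)); // prints 1000
        System.out.println(retainedWithHeadInState(1000)); // prints 1
    }
}
```

In the first variant every stream ever created stays reachable through the local `head`, so none of them could be collected before the method returns. In the second variant only the current stream is reachable, so earlier streams and any data buffered in them become eligible for collection as the state advances.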
Apache Daffodil (incubating) is the open source implementation of the Data Format Description Language (DFDL), a specification created by the Open Grid Forum. DFDL is capable of describing many data formats, including textual and binary, commercial record-oriented, scientific and numeric, modern and legacy, and many industry standards. It leverages XML technology and concepts, using a subset of the W3C XML Schema type system and annotations to describe such data. Daffodil uses this description to parse data into an infoset represented as XML or JSON, easily capable of ingestion, validation, and transformation.
For more information about Daffodil, see https://daffodil.apache.org/.
SBT is the officially supported tool to build Daffodil, run all tests, create packages, and more. Below are some of the more common commands used for Daffodil development.
Compile source code:
$ sbt compile
Run all unit tests:
$ sbt test
Run all command line interface tests:
$ sbt it:test
Create Linux and Windows shell scripts in daffodil-cli/target/universal/stage/bin/. See the Command Line Interface documentation for details on its usage:
$ sbt daffodil-cli/stage
Generate an Apache RAT license check report located in target/rat.txt and error if any unapproved licenses are found:
$ sbt ratCheck
Generate an sbt-scoverage test coverage report located in target/scala-ver/scoverage-report/:
$ sbt clean coverage test it:test
$ sbt coverageAggregate
For questions, we can be reached on the dev@daffodil.apache.org or users@daffodil.apache.org mailing lists. Bugs can be reported via the Daffodil JIRA.
Apache Daffodil is licensed under the Apache License, v2.0.
Apache Daffodil is an effort undergoing incubation at The Apache Software Foundation (ASF), sponsored by the Incubator PMC. Incubation is required of all newly accepted projects until a further review indicates that the infrastructure, communications, and decision making process have stabilized in a manner consistent with other successful ASF projects. While incubation status is not necessarily a reflection of the completeness or stability of the code, it does indicate that the project has yet to be fully endorsed by the ASF.