commit | eb603e7f4a342e1a63d1e07f59c714cf531724bc | |
---|---|---|
author | Michael Beckerle <mbeckerle@tresys.com> | Tue Jun 29 18:19:34 2021 -0400 |
committer | Mike Beckerle <mbeckerle@tresys.com> | Wed Jul 28 15:16:41 2021 -0400 |
tree | 95dae5e7ab756fe9f668af43bfd87803bb8346cd | |
parent | 032347dfcfce47573f81b3eaac3c782d0b3b7d9f | |
RuntimeData objects simplified, compile-memory-leak fixed.

All transient params for runtime data objects are gone. All by-name passing of args to runtime structures is gone. Compile-time objects are no longer retained after compilation is over. Cyclic graph constructions are more explicit.

Only a select few TransientParam are left, and they have been converted to use the Delay type instead of by-name passing. They are all ones that actually involve circular relationships, so they cannot be removed without going to non-functional graph construction. This is achieved by a Delay object type which is enhanced to avoid retaining any closures or other objects the Scala compiler creates to implement by-name/lazy evaluation.

OOLAG now uses regular lazy val again. Got rid of lvCache. The lvCache strategy (which was to make objects smaller by keeping LVs for objects in a centralized hash-table cache) was causing a perceived massive number of schema-compiler objects, and this made figuring out the memory leak (holding onto the compile-time objects) too hard.

This change set also removes the following extension functions and their associated tests, which is part of DAFFODIL-2273:

dfdlx:repTypeValue
dfdlx:logicalTypeValue
dfdlx:outputTypeCalcNextSibling

Made diagnosticDebugName more robust. A large number of stack-overflow issues arise from a diagnostic message needing the name, but computing the name somehow fails inside a more complex method that itself is trying to issue a diagnostic. Now we always catch these nests, which will help in some cases. DSOM objects construct their diagnostic debug names when they are created.

Eliminated memory leak of holding onto all the compiler objects. The culprit was the validator. Internally it caches the error handlers, and all objects reachable from them. Hence, one must avoid mixing ErrorHandler into compiler objects, because then they and everything reachable from them cannot be reclaimed.

Some errors were being suppressed by throws of the final fatal SAXParseError before they could be accumulated. The logic for dealing with these errors/warnings is pretty convoluted. I created DAFFODIL-2541 for cleaning up this logic.

Remove @TransientParam from SchemaFileLocation.

Add forcing of most Delay objects in daffodil-core.runtime1 package using requiredEvaluations calls.

Fixed that an invalid DFDL schema was not being treated as a fatal error that prevents Daffodil from proceeding to compile the schema.

Validation detects more errors now which must have been erroneously dropped before. One is, e.g., that xs:import statements with no namespace require both the source and target schemas to have a namespace. Another is that it detects UPA errors it wasn't detecting before. 2 tests had ambiguity. Tests that had these errors, and numerous other tests, had to be fixed because errors in their DFDL schemas are now detected, preventing them from working as before.

Need for serialization to force things is gone. Any Delay objects that have not been forced when serialization happens cause an abort.

The way things force delays now is that things that use delays all define an initialize method, which must be called to ensure all delays are forced. The forcing is done not by explicitly forcing the delay objects themselves, but by initializing the cyclic objects. This ensures we're not in a circular situation, because it explicitly demands that the creation of the objects be completed first before requesting that the delays be evaluated.

NodeInfo object is now self-initializing.

Daffodil's schema compiler now has coarse "passes". The first pass is validating the DFDL schema with XML Schema validation. Next is the initialize method that runs when DSOM objects are constructed. Next is constructing the DSOM tree/directed-graph. This is driven by the SchemaSet.allSchemaComponents calculation. Next pass is evaluating the requiredEvaluations, which are activated only for the components created in SchemaSet.allSchemaComponents. Next pass is checkUnusedProperties. Last pass is code-generation (or parser/unparser object creation) and serialization.

Unified component traversal algorithms.

Improve commentary around initialize() protocol for DSOM objects.

Remove Delay of minimizedScope for ERD.

Other small changes to comments and to AV.dfdl.xsd to make it spit out fewer warnings.

Test TestRefMap.testUnusedGroup1 simulates a failure that was only occurring in the VMF schema (which is not public). The tests characterize the situations where checkAllTopLevels interacts with errors where global elements have up-and-out paths in them.

Fixes to Namespaces tests which were failing because errors were going undetected before, and had to be split out to use separate schemas to isolate the error situations.

DAFFODIL-2326, DAFFODIL-1879, DAFFODIL-2273, DAFFODIL-2526
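
The Delay and initialize ideas described in the commit message can be illustrated with a minimal Scala sketch. This is not Daffodil's actual Delay class: the method names force, value, and hasValue, and the Parent/Child/DelayDemo types, are hypothetical and chosen only for illustration. It assumes a delay that evaluates its body once and then drops the closure, plus an initialize method on the cyclic object that forces the delay only after both objects exist.

```scala
// Hypothetical sketch; Daffodil's real Delay type may differ in API and detail.
final class Delay[T <: AnyRef] private (private var thunk: () => T) {
  private var cached: Option[T] = None

  /** Evaluates the body at most once, then drops the closure. */
  def force(): T = cached match {
    case Some(v) => v
    case None =>
      val v = thunk()
      cached = Some(v)
      thunk = null // release the closure and whatever it captured
      v
  }

  def hasValue: Boolean = cached.isDefined

  /** Fails if the delay was never forced, similar in spirit to the abort at serialization. */
  def value: T = cached.getOrElse(
    throw new IllegalStateException("Delay was not forced before use"))
}

object Delay {
  // By-name passing is confined to this one construction point.
  def apply[T <: AnyRef](body: => T): Delay[T] = new Delay(() => body)
}

// Two objects with a circular relationship: parent -> child -> parent.
final class Parent(val name: String, childDelay: Delay[Child]) {
  /** Forces the delay; call only after construction of the graph is complete. */
  def initialize(): Unit = { childDelay.force(); () }
  def child: Child = childDelay.value
}

final class Child(val name: String, val parent: Parent)

object DelayDemo {
  def build(): Parent = {
    lazy val parent: Parent = new Parent("p", Delay(child))
    lazy val child: Child = new Child("c", parent)
    parent.initialize() // parent is fully constructed before the delay is forced
    parent
  }
}
```

In this sketch, reading child before initialize() has been called throws, which makes a forgotten initialization visible early instead of leaving an unevaluated closure behind.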
Apache Daffodil is an open-source implementation of the DFDL specification that uses DFDL data descriptions to parse fixed format data into an infoset. This infoset is commonly converted into XML or JSON to enable the use of well-established XML or JSON technologies and libraries to consume, inspect, and manipulate fixed format data in existing solutions. Daffodil is also capable of serializing or “unparsing” data back to the original data format. The DFDL infoset can also be converted directly to/from the data structures carried by data processing frameworks so as to bypass any XML/JSON overheads.
For more information about Daffodil, see https://daffodil.apache.org/.
See BUILD.md for more details.
SBT is the officially supported tool to build Daffodil. Below are some of the more commonly used commands for Daffodil development.
Compile source code:
sbt compile
Run unit tests:
sbt test
Run command line interface tests:
sbt IntegrationTest/test
Build the command line interface (Linux and Windows shell scripts in daffodil-cli/target/universal/stage/bin/; see the Command Line Interface documentation for details on their usage):
sbt daffodil-cli/stage
Run Apache RAT (license audit report in target/rat.txt and error if any unapproved licenses are found):
sbt ratCheck
Run sbt-scoverage (report in target/scala-ver/scoverage-report/):
sbt clean coverage test IntegrationTest/test
sbt coverageAggregate
You can ask questions on the dev@daffodil.apache.org or users@daffodil.apache.org mailing lists. You can report bugs via the Daffodil JIRA.
Apache Daffodil is licensed under the Apache License, v2.0.