released: true
apache: true
title: 2.5.0
date: 2020-01-12
summary: >
  User defined functions, BLOBs, unordered sequences, 2GB+ files,
  preparation for decreasing schema compile time
artifact-root: "https://archive.apache.org/dist/incubator/daffodil/2.5.0/"
checksum-root: "https://archive.apache.org/dist/incubator/daffodil/2.5.0/"
key-file: "https://downloads.apache.org/daffodil/KEYS"
source-dist:
  - "apache-daffodil-2.5.0-incubating-src.zip"
binary-dist:
  - "apache-daffodil-2.5.0-incubating-bin.tgz"
  - "apache-daffodil-2.5.0-incubating-bin.zip"
  - "apache-daffodil-2.5.0-incubating-bin.msi"
  - "apache-daffodil-2.5.0.incubating-1.noarch.rpm"
scala-version: 2.12
A new extension is added to support custom DFDL expression functions written in Java or Scala. To add new functions, the `UserDefinedFunctionProvider` and `UserDefinedFunction` interfaces must be implemented, compiled into a jar, and added to the classpath. Once defined, the function can then be called just like a normal DFDL expression function, such as `pre:myUserDefinedFunction(args)`. For more information on usage, see the User Defined Function page.
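As a rough illustration of the shape of this extension point, the sketch below defines simplified stand-ins for the provider and function interfaces. These are hypothetical, self-contained types, not the real Daffodil API (which uses annotations and service-provider registration, as described on the User Defined Function page); the function name mirrors the `pre:myUserDefinedFunction` example above.

```java
import java.util.Collections;
import java.util.List;

public class UdfSketch {
    // Simplified stand-in for Daffodil's UserDefinedFunction interface.
    interface UserDefinedFunction {}

    // Simplified stand-in for UserDefinedFunctionProvider: supplies the
    // functions that a jar on the classpath contributes.
    interface UserDefinedFunctionProvider {
        List<UserDefinedFunction> functions();
    }

    // A hypothetical function, callable from a DFDL expression as
    // pre:myUserDefinedFunction(arg). In the real API, Daffodil locates
    // the evaluate method on the registered class by reflection.
    public static class MyUserDefinedFunction implements UserDefinedFunction {
        public int evaluate(int arg) {
            return arg * 2;
        }
    }

    public static class MyProvider implements UserDefinedFunctionProvider {
        public List<UserDefinedFunction> functions() {
            return Collections.singletonList(new MyUserDefinedFunction());
        }
    }

    public static void main(String[] args) {
        MyUserDefinedFunction f = new MyUserDefinedFunction();
        System.out.println(f.evaluate(21)); // what pre:myUserDefinedFunction(21) would yield: 42
    }
}
```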
A new extension is added to support Binary Large Objects (BLOBs). Setting an element type to `xs:anyURI` and setting the DFDL property `dfdlx:objectKind="bytes"` causes Daffodil to write the bytes associated with the element to a file rather than into the infoset; the infoset stores only the URI of that file. The function `setBlobAttributes` is added to the `InfosetOutputter` to support changing the directory and naming of these BLOB files.
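The behavior can be pictured with a small self-contained sketch: the element's bytes go to a file, and the infoset records only the file's URI. The `writeBlob` helper and its parameters are illustrative, not Daffodil API; the directory and name choices stand in for what `setBlobAttributes` lets callers control.

```java
import java.io.IOException;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class BlobSketch {
    // Illustrative helper (not Daffodil API): write an element's bytes to a
    // file in a chosen directory with a chosen name prefix/suffix, and return
    // the URI that the infoset would hold in place of the bytes.
    public static URI writeBlob(byte[] elementBytes, Path blobDir,
                                String prefix, String suffix) throws IOException {
        Files.createDirectories(blobDir);
        Path blobFile = Files.createTempFile(blobDir, prefix, suffix);
        Files.write(blobFile, elementBytes);
        return blobFile.toUri();
    }

    public static void main(String[] args) throws IOException {
        byte[] payload = {0x01, 0x02, 0x03};
        URI uri = writeBlob(payload, Paths.get("blobs"), "blob-", ".bin");
        // An xs:anyURI element in the infoset would hold uri.toString().
        System.out.println(uri.getScheme()); // file
    }
}
```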
Add support for unordered sequences via the `dfdl:sequenceKind="unordered"` property. See Section 14.3 of the DFDL specification for more information on the behavior of unordered sequences.
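As a sketch of how the property might appear in a schema (the element names, separators, and initiators below are illustrative, and a complete schema would also need the usual DFDL representation defaults), an unordered sequence could look like:

```xml
<!-- The fields may appear in the data in any order; each is identified
     by its initiator, since members of an unordered sequence must be
     distinguishable from one another. -->
<xs:complexType name="record">
  <xs:sequence dfdl:sequenceKind="unordered"
               dfdl:separator="," dfdl:separatorPosition="infix">
    <xs:element name="width"  type="xs:int" dfdl:initiator="w="/>
    <xs:element name="height" type="xs:int" dfdl:initiator="h="/>
  </xs:sequence>
</xs:complexType>
```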
During parsing, Daffodil stores the data stream in a cache to allow for backtracking. However, when streaming very large files that might need to backtrack long distances, this cache could grow larger than the heap, resulting in out of memory errors. In most such cases a parse, although it could, would never actually need to backtrack that far, so memory is held unnecessarily. New parameters are added to the input stream to limit the maximum size of this cache. Daffodil discards old data when this limit is reached, and errors only if a parse attempts to backtrack into the discarded data.
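The idea can be sketched as a bounded cache of recently parsed data (a hypothetical structure, not Daffodil's actual implementation): the oldest data is discarded once the configured limit is exceeded, and only a backtrack into discarded data fails.

```java
import java.util.ArrayDeque;

public class BacktrackCacheSketch {
    // Hypothetical bounded backtracking cache, not Daffodil's real code.
    public static class BoundedCache {
        private final int maxCachedBytes;
        private final ArrayDeque<byte[]> buckets = new ArrayDeque<>();
        private long discardedBytes = 0; // stream position of the oldest cached byte
        private long totalBytes = 0;     // stream position just past the newest byte

        public BoundedCache(int maxCachedBytes) {
            this.maxCachedBytes = maxCachedBytes;
        }

        // Cache newly read data, discarding the oldest buckets once the
        // cached span exceeds the limit.
        public void append(byte[] bucket) {
            buckets.addLast(bucket);
            totalBytes += bucket.length;
            while (totalBytes - discardedBytes > maxCachedBytes && buckets.size() > 1) {
                discardedBytes += buckets.removeFirst().length;
            }
        }

        // Backtracking succeeds only while the target position is still cached.
        public boolean canBacktrackTo(long position) {
            return position >= discardedBytes && position <= totalBytes;
        }
    }

    public static void main(String[] args) {
        BoundedCache cache = new BoundedCache(8);
        cache.append(new byte[4]);
        cache.append(new byte[4]);
        cache.append(new byte[4]); // pushes the cache past 8 bytes; oldest bucket goes
        System.out.println(cache.canBacktrackTo(0)); // false: that data was discarded
        System.out.println(cache.canBacktrackTo(4)); // true: still cached
    }
}
```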
During unparsing, in some circumstances Daffodil could buffer unparsed data until it exceeded 2GB, causing out of memory errors. To prevent this, each unparse buffer now stores a maximum amount in memory (defined by the `maxByteArrayOutputStreamBufferSizeInBytes` tunable). When this maximum is reached, Daffodil switches to writing to a temporary file (in the directory defined by the `tempFilePath` tunable). Eventually, the contents of the file are written to the unparse data stream in chunks (sized by the `outputStreamChunkSizeInBytes` tunable).
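The three tunables cooperate roughly as in this self-contained sketch; the constants and the `writeWithSpill` method are illustrative stand-ins, not Daffodil internals.

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

public class SpillSketch {
    static final int MAX_IN_MEMORY = 16; // stand-in for maxByteArrayOutputStreamBufferSizeInBytes
    static final int CHUNK_SIZE = 8;     // stand-in for outputStreamChunkSizeInBytes

    // Buffer data in memory up to MAX_IN_MEMORY bytes, spill to a temp file
    // beyond that (the file's directory is what tempFilePath controls), then
    // deliver everything to the final stream in CHUNK_SIZE pieces.
    public static byte[] writeWithSpill(byte[] data) throws IOException {
        ByteArrayOutputStream memory = new ByteArrayOutputStream();
        Path spillFile = null;
        OutputStream current = memory;
        for (byte b : data) {
            if (current == memory && memory.size() >= MAX_IN_MEMORY) {
                spillFile = Files.createTempFile("unparse-", ".spill");
                OutputStream f = new BufferedOutputStream(Files.newOutputStream(spillFile));
                memory.writeTo(f); // move the in-memory prefix into the file
                current = f;
            }
            current.write(b);
        }
        current.close();

        ByteArrayOutputStream unparseStream = new ByteArrayOutputStream();
        if (spillFile == null) {
            memory.writeTo(unparseStream); // small case: never spilled
        } else {
            try (InputStream in = Files.newInputStream(spillFile)) {
                byte[] chunk = new byte[CHUNK_SIZE];
                int n;
                while ((n = in.read(chunk)) > 0) {
                    unparseStream.write(chunk, 0, n);
                }
            }
            Files.delete(spillFile);
        }
        return unparseStream.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] data = new byte[40]; // larger than the in-memory cap, so it spills
        for (int i = 0; i < data.length; i++) data[i] = (byte) i;
        System.out.println(writeWithSpill(data).length); // 40
    }
}
```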
Together with BLOB support, these changes make it possible to handle large files, including those greater than 2GB.
Improvements were made to the TDML runner, including API updates to support use in Java, and improved output on failures.
Multiple infrastructure changes were made, including support for GitHub Actions continuous integration, Windows CI tests, a new container-based system for creating releases, and website updates.
Many non-functional changes were made, including improved internal type-safety, improved separation of runtime objects to support different runtimes in the future, initial changes to improve schema compilation speed, and other miscellaneous improvements.
The following changes may affect compatibility with past releases: