release: final apache: true title: 2.5.0 date: 2020-01-12 summary: > User defined functions, BLOBs, unordered sequences, 2GB+ files, preparation for decreasing schema compile time

source-dist: - “apache-daffodil-2.5.0-incubating-src.zip”

binary-dist: - “apache-daffodil-2.5.0-incubating-bin.tgz” - “apache-daffodil-2.5.0-incubating-bin.zip” - “apache-daffodil-2.5.0-incubating-bin.msi” - “apache-daffodil-2.5.0.incubating-1.noarch.rpm”

scala-version: 2.12


Extension: User Defined Functions [Proposal]

A new extension is added to support custom DFDL expression functions written in Java or Scala. To add new functions, the UserDefinedFunctionProvider and UserDefinedFunction interfaces must be implemented, compiled into a jar, and added to the classpath. Once defined, the function can then be called just like a normal DFDL expression function, such as pre:myUserDefinedFunction(args). For more information on usage, see the User Defined Function page.

  • {% jira 2186 %} User defined functions for DPath expressions
  • {% jira 2228 %} Javadoc warnings in UDF Provider
  • {% jira 2235 %} Exception when UDF object contains nonSerializable field

Extension: Binary Large Objects [Proposal]

A new extension is added to support Binary Large Objects. Setting an element type to xs:anyURI and setting the DFDL property dfdlx:objectKind="bytes" will cause Daffodil to write the bytes associated with the element to a file rather than to the infoset. The URI to the file is stored in the infoset. The function setBlobAttributes is added to the InfosetOutputter to support changing the directory and name of these BLOB files.

  • {% jira 1735 %} BLOB/CLOB - large object handles

Unordered Sequences

Add support for unordered sequences via the dfdl:sequenceKind="unordered" property. See section 14.3 of the DFDL specification for more information on the behavior of unordered sequences.

  • {% jira 918 %} Unordered sequence cannot have empty content.
  • {% jira 1010 %} Re-enable checks for unordered seq and choice branch violations
  • {% jira 1091 %} NumberFormatException when Unbounded element in unordered sequence
  • {% jira 1100 %} Unordered sequence with dfdl:format value of occursCountKind != “parsed” fails to compile
  • {% jira 1120 %} DPath: fn:exists() Abort: Invariant Broken (unordered sequence)
  • {% jira 1151 %} Test failing due to a slotIndexInParent error. (unordered sequences)
  • {% jira 1159 %} Reimplement sequenceKind unordered feature
  • {% jira 2261 %} Optional first element in unordered sequence fails to unparse

Memory Limitations

During parsing, Daffodil stores the data stream in a cache to allow for backtracking. However, when streaming very large files that might need to backtrack long distances it is possible that the cache could outgrow the size of the heap, resulting in out of memory errors. In the majority of these cases, although possible, a parse would never actually need to backtrack that far, so we hold on to memory unnecessarily. New parameters are added to the input stream to support the ability to limit the maximum size of this cache. Daffodil will throw away old data when this limit is reached, and will only error if trying to backtrack to the discarded data.

During unparsing, in some circumstances it was possible for Daffodil to cache unparsed data that could reach a 2GB+ limit and cause an out of memory error. To prevent this error, each unparse cache will store a maximum amount in memory (defined by the maxByteArrayOutputStreamBufferSizeInBytes tunable). When this maximum value is reached, Daffodil will switch to writing to a temporary file (defined by the tempFilePath tunable). Eventually, the contents of the file will be written to the unparse data stream in chunks (defined by the outputStreamChunkSizeInBytes tunable).

With these changes and BLOB support, handling large files, including those greater than 2GB is possible.

  • {% jira 2194 %} buffered data output stream has a chunk limit of 2GB

TDML

Improvements were made to the TDML runner, including API updates to support use in Java, and improved output on failures.

  • {% jira 2204 %} unparserTestCase should dump hex data not just iso-8859-1 when mixed text/binary TDML doesn't compare
  • {% jira 2209 %} TDML Improperly handles MAC Style (CR) line endings
  • {% jira 2229 %} TDML Runner Class doesn't have access to simple constructors
  • {% jira 2241 %} Java Junit Test TDML not working

Infrastructure

Multiple infrastructure changes were made, including support for GitHub Actions continuous integration, Windows CI tests, a new container based system for creating releases, and website updates.

  • {% jira 498 %} Add automated build for Windows
  • {% jira 2175 %} link to calabash is to nifi page
  • {% jira 2181 %} Prepare for 2.5.0
  • {% jira 2215 %} GitHub Actions not showing failure
  • {% jira 2223 %} Testing that JIRA emails go to commits@
  • {% jira 2225 %} Github Actions builds fails do to missing mklink
  • {% jira 2227 %} Develop a container for release candidates
  • {% jira 2251 %} add newer draft of DFDL spec as html to site for limited/users review
  • {% jira 2253 %} Remove Github Actions Linux dependency step
  • {% jira 2258 %} Prepare for 2.5.0 Release
  • {% jira 2264 %} Add more automation to the release candidate container

Code Refactoring

Many non-functional changes were made, including improved internal type-safety, improved separation of runtime objects to support different runtimes in the future, initial changes to improve schema compilation speed, and other miscellaneous improvements.

  • {% jira 2169 %} Add type safety to DPath variables
  • {% jira 2233 %} refactor to move runtime1-specific calculations out of dsom and grammar
  • {% jira 2242 %} Remove DaffodilTunables object from DPathCompileInfo and RuntimeData objects.
  • {% jira 2244 %} Refactor ResolvesProperties into Scoped and Local variants
  • {% jira 2250 %} remove excess dependency on RuntimeData classes.
  • {% jira 2256 %} Purge scala.math.{BigInt, BigDecimal} from code

Miscellaneous Changes and Bug Fixes

  • {% jira 1034 %} IBM Compatibility - VCard Schemas cause abort
  • {% jira 1526 %} unparser: preserve/set bitPosition so unparse can be repeatedly called to unparse to a stream
  • {% jira 1908 %} Three daffodil-test sbt tests intermittently fail
  • {% jira 2168 %} Provide support for Non-spacing mark/Combining Characters in Debugger
  • {% jira 2173 %} Data dumps should be bit-aware
  • {% jira 2184 %} No line number in dfdl:assert parse error message.
  • {% jira 2185 %} Add more charset encoding for obscure 2, 3, 4, 5 bit encodings
  • {% jira 2188 %} inputTypeCalc sometimes fails due to BigInt vs. JBigInt
  • {% jira 2192 %} Incorrect warning: Neighboring QNames differ only by namespaces
  • {% jira 2197 %} abort: There are no references to this component
  • {% jira 2200 %} xs:import problems with xsd files provided in daffodil-lib
  • {% jira 2207 %} Tests needed to reassure that choices nested in fact work.
  • {% jira 2224 %} Unable to unparse bitmap schema
  • {% jira 2230 %} Various cleanups - non-functional changes
  • {% jira 2259 %} unparse of choices not working properly - eventUnparserMap
  • {% jira 2262 %} occursCountKind ‘expression’ should avoid separator suppression, but does not.
  • {% jira 2263 %} separatorSuppressionPolicy=“never” in nested array causes abort
  • {% jira 2265 %} Update copyright dates to 2020

Deprecation/Compatibility

The following changes have been made which may affect compatibility with past releases:

  • None