release: final apache: true title: 2.2.0 date: 2018-09-04 summary: > Message-Streaming API, numerous bug fixes, and features to support additional DFDL schemas.

source-dist: - “apache-daffodil-2.2.0-incubating-src.zip”

binary-dist: - “apache-daffodil-2.2.0-incubating-bin.tgz” - “apache-daffodil-2.2.0-incubating-bin.zip” - “apache-daffodil-2.2.0.incubating-1.noarch.rpm”

scala-version: 2.12


Daffodil 2.2.0 is the second release of Daffodil as an Apache incubator project. This release includes numerous bug fixes and DFDL feature additions to support more DFDL schemas.

New Features

Layering

A experimental DFDL language feature known as layering has been added to Daffodil. This feature has been discussed for future inclusion in the DFDL standard by the DFDL working group. The syntax of the properties is subject to some change in the future. The layering feature is described at Data Layering.

  • {% jira 1734 %} base64 encoding and other preprocessing
  • {% jira 1805 %} Support AIS format ascii armoring and 6-bit encoding
  • {% jira 1888 %} byte swap layer capability - was: byteOrder bigEndian with bitOrder leastSignificantBitFirst combination
  • {% jira 1933 %} Performance regression with new Base64/Layering changes
  • {% jira 1934 %} Inability to read data following a layer
  • {% jira 1935 %} Debugger/trace broken with new layering

Message Streaming API

Daffodil's parser API has been enhanced to allow parsing of unbounded streams of messages, symmetric with the Daffodil unparser. The API enables calling the parse method repeatedly on the same data stream so as to parse one data element (typically a message, but could be any DFDL-described element) at a time from the stream. In prior releases, Daffodil had only a parse API that expected data for the parse to consume the entire data stream.

The Command Line Interface (CLI) now has options (--stream) for driving the parser in streaming mode.

Documentation for this API is in the SAPI - Scala API package, or for Java users in the JAPI - Java API package overview.

  • {% jira 1799 %} Enable data streaming in the CLI
  • {% jira 1967 %} Support --stream option for CLI unparse subcommands
  • {% jira 1065 %} parser: API needs to enable repeated calls to parser - not treat unconsumed data as ‘left over’
  • {% jira 1985 %} beforeState in sequences prevents streaming behavior in I/O layer
  • {% jira 1987 %} Buckets not released with small bucket sizes

Scala 2.12 Support and Library Dependencies Updated

Daffodil now supports both Scala 2.11 and Scala 2.12. Most testing is now done on Scala 2.12 with only occasional regression checking done on Scala 2.11. Support for Scala 2.11 will eventually be elminated in a future release.

In addition, Daffodil depends on a number of other libraries. These have been updated to using the latest available versions of those libraries.

  • {% jira 1652 %} Scala 2.12 Upgrade
  • {% jira 1973 %} Dependencies on old libraries should be updated
  • {% jira 1430 %} macroLib should not be needed after scala compilation, but seems to be required

Zoned Decimal Text Numbers and Packed Decimal Calendars

Support for the dfdl:textNumberRep of zoned has been added along with the related DFDL properties that specify zoned-number format.

Also, for the dfdl:binaryCalendarRep property, in addition to standard, values of bcd, packed, and ibm4690Packed to support Binary Coded Decimal, IBM 390 Packed Decimal, and IBM 4690 Packed Decimal, respectively are supported.

  • {% jira 1738 %} Implement textNumberRep=‘zoned’
  • {% jira 1882 %} Implement support for the various formats of binaryCalendarRep
  • {% jira 1941 %} Date, Time, DateTime - Binary - binaryCalendarRep=‘bcd’
  • {% jira 1942 %} Date, Time, DateTime - Binary - binaryCalendarRep=‘packed’
  • {% jira 1943 %} Date, Time, DateTime - Binary - binaryCalendarRep=‘ibm4690Packed’
  • {% jira 99 %} Date, Time, DateTime - Binary - binaryCalendarRep=‘binary’

Deprecation/Compatability

As of 2.2.0, the following changes have been made which affect compatibility with past releases:

DFDLGeneralFormat.dfdl.xsd changes

This DFDL schema is used as a starting point by many schemas. Some changes to it can lead to incompatibilities.

Property dfdl:calendarTimeZone - was "UTC" now "" (empty string, meaning unknown time zone). This change often results in Infoset data that does not have the UTC time zone suffix +00:00 appended to it unless a time-zone was specified in the parsed data. In prior releases this suffix would have generally been appended, which was surprising, and deemed incorrect by users.

  • {% jira 1929 %} Parse of date results in date+time-zone specifier - should be just date.
  • {% jira 1930 %} DFDLGeneralFormat defines calendarTimeZone as UTC - should define as ""

Property dfdl:occursCountKind - was parsed now implicit. With implicit the number of element occurrences that will be parsed ends when maxOccurs is reached. This is generally desirable, especially for optional elements (minOccurs="0, maxOccurs="1) where users found it unintuitive that such an element could end up as an array of more than one occurrance. In prior releases the dfdl:occursCountKind was parsed, which instructs Daffodil to continue parsing as many instances as it can find, and to stop accumulating them only when it is unable to successfully parse another.

  • {% jira 1948 %} DFDLGeneralFormat has dfdl:occursCountKind ‘parsed’. Should be ‘implicit’

Property dfdl:textNumberRoundingMode - was roundUnnecessary now roundHalfEven. This change should be compatible, and was necessary due to an update of the underlying ICU libraries used by Daffodil. The behavior of that library with respect to rounding modes was fixed, allowing us to change the rounding mode to the more reasonable roundHalfEven behavior.

The maxOccursBound limit

Prior Daffodil releases had a very small limit on the number of repeating occurrences an element could have, 1024, which was not being checked in some cases. It is now checked in all cases; however, with this checking it was discovered that many schemas break on this small limit. The limit was changed to Int.MaxValue (maximum positive value of a 32-bit integer), and must be tuned downward by users who wish to catch situations where there are an excessive number of repeats much larger than is reasonable for the data described by their DFDL schema. In addition, rather than a Parse Error, which causes backtracking, exceeding this tuned value will result in a runtime Schema Definition Error - which halts processing. If desired, in a DFDL Schema, a Parse Error can still be obtained by way of a dfdl:assert statement on the recurring element which tests that the dfdl:occursIndex() is less than the desired maximum bound.

  • {% jira 1519 %} occursCount that is greater than maxOccursBounds causes PE/UE rather than SDE

The suppressSchemaDefinitionWarnings Tunable

Some schemas produce large number of schema definition warnings, and it is desirable to tolerate, and suppress these. The name of the tunable is now suppressSchemaDefinitionWarnings. Prior releases used suppressWarnings which was inconsistent with the API-level access names/symbols for the same concept.

  • {% jira 1980 %} suppressWarnings vs. suppressSchemaDefinitionWarnings in config files

TDML Runner Enhancements

The TDML Runner was enhanced to eliminate some false-positive test situations. Specifically it now requires controlled selections on round-trip testing. The TDML defaultRoundTrip and roundTrip attributes can now take on the values none, onePass, twoPass, or threePass. The values false and true are also accepted for compatibility and correspond to none and onePass respectively. This change reduces the number of false-positive tests which should be failing, but which pass because multiple-passes were used automatically. Please see the Test Data Markup Language (TDML) page.

  • {% jira 1961 %} TDML Runner - enhance round-trip to distinguish simple parse-unparse from multi-trip cases
  • {% jira 1947 %} TDMLRunner - incorrect error on left-over data: TDMLException: Left over data. Consumed 48 bit(s) with 0 bit(s) remaining.

Additional Changes

Infrastructure

  • {% jira 1962 %} TravisCI builds fail with error 137 out of memory
  • {% jira 1938 %} RPM/tar/zips contain jars with different hashes
  • {% jira 1937 %} Make the Apache RAT check more automated
  • {% jira 1936 %} Prepare for Daffodil 2.2.0 development
  • {% jira 1966 %} Excessive memory usage in CLI performance command
  • {% jira 1982 %} Update Eclipse Paths for new library versions.
  • {% jira 1925 %} Remove testWSPStar.dfdl.xsd and json5.dfdl.xsd

Bugs Fixed

  • {% jira 1923 %} Excel export CSV escaping - can't be handled with standard block escape
  • {% jira 1984 %} Separator prefix of terminator - not getting longest match
  • {% jira 1714 %} ULong modulus operator broken
  • {% jira 1736 %} Very large BigInt unparses to zero.
  • {% jira 1964 %} Unparsing ArrayCombinator assertion failure
  • {% jira 1977 %} Misspellings in Java/Scala API doc
  • {% jira 1979 %} UTF8 decoder doesn't handle 3-byte and 4-byte correctly
  • {% jira 785 %} Assert execution timing for sequence is incorrect.
  • {% jira 851 %} textNumberProps: raw byte entity should not be allowed
  • {% jira 931 %} Variable-width charset with ‘replace’ can result in wrong length calculations
  • {% jira 1969 %} Exception on unparse of string with explicit length and truncateSpecifiedLengthString=“yes”
  • {% jira 1970 %} Exception instead of error message - expression variable on wrong level

Other Miscellaneous

  • {% jira 1918 %} Update JSON schema in tests to use Boolean type
  • {% jira 1960 %} RepUnboundedParser incorrectly handling points of uncertainty
  • {% jira 1978 %} Change runtime to use vectors not lists/seq
  • {% jira 991 %} DFDL-Spec: Update spec to reflect change to Schema Definition Warning for alignment ambiguity
  • {% jira 1924 %} DaffodilXMLLoader needs to explicitly specify which instance to use
  • {% jira 1939 %} Errata 5.32 to 5.38 status is not on the Unimplemented Features/Errata page