Ensure all primitives use textNumberPattern and infinity/NaN correctly

Currently, if the primitive type is an integer then text number parsing
disallows parsing decimal points, even if the pattern contains a decimal
point. Instead, when parsing integers, we should allow decimals as long
as the fractional part is zero. And when unparsing, we should unparse a
decimal point with a zero fractional part according to the pattern.

This changes the behavior so integer parsing uses the same DecimalFormat
configuration as non-integer parsing (i.e. decimals are allowed), but we
throw a parse error if the fractional part that was parsed is non-zero.
This also means that unparsing integers now outputs decimal points
according to the pattern.
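
The integer rule above can be sketched as follows. This is only an
illustration in Python using the standard decimal module, not the
ICU/Scala code changed by this commit; ParseError and
parse_integer_text are hypothetical names.

```python
from decimal import Decimal, InvalidOperation


class ParseError(Exception):
    """Hypothetical stand-in for a Daffodil parse error."""


def parse_integer_text(text: str) -> int:
    """Parse text as an integer, tolerating a zero fractional part."""
    try:
        value = Decimal(text)
    except InvalidOperation:
        raise ParseError(f"not a number: {text!r}")
    # Infinity and NaN are not valid integer values.
    if not value.is_finite():
        raise ParseError(f"not a finite number: {text!r}")
    # A decimal point is accepted, but only if nothing non-zero follows it.
    if value != value.to_integral_value():
        raise ParseError(f"non-zero fractional part: {text!r}")
    return int(value)
```

Under these assumptions, "12" and "12.0" both yield the integer 12,
while "12.5" raises ParseError.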

Additionally, if textNumberCheckPolicy is strict, we set ICU
setDecimalPatternMatchRequired to true so that decimal points in the
data are allowed or disallowed depending on whether the pattern has a
decimal point. Note that lax parsing always allows decimal points
regardless of the pattern. For this reason, we now always require the
grouping/decimal separator DFDL properties in lax mode.
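
A minimal sketch of the strict versus lax rule, written in Python for
illustration only. The function name and parameters are hypothetical,
and the sketch assumes the pattern marks its decimal point with '.'
and contains no quoted literals; the real check is done by ICU.

```python
def decimal_point_accepted(pattern: str, text: str, strict: bool,
                           decimal_sep: str = ".") -> bool:
    """Return True if the presence of a decimal point in text is acceptable."""
    if not strict:
        # Lax mode: a decimal point is always allowed, whatever the pattern.
        return True
    # Strict mode: the data must have a decimal point exactly when the
    # pattern has one (assumed here to be written as '.').
    return ("." in pattern) == (decimal_sep in text)
```

For example, with the pattern "#0" the text "1.5" would be rejected in
strict mode but accepted in lax mode.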

One bug was discovered in ICU (ICU-22303) where if we require the
decimal point because strict mode is enabled, then ICU never parses the
infinity/NaN representation. A workaround is added to manually check for
these representations until this bug is fixed. ICU unit tests are also
added which should fail if ICU fixes this bug so we can remove this
workaround.
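
The idea of a manual check can be sketched like this. It is a
hypothetical Python illustration, not the workaround as implemented:
match_special, inf_rep and nan_rep are invented names, and treating a
leading "-" as negative infinity is an assumption of the sketch.

```python
import math
from typing import Optional


def match_special(text: str, inf_rep: str, nan_rep: str) -> Optional[float]:
    """Return inf/-inf/NaN if text is one of the representations, else None."""
    if text == nan_rep:
        return math.nan
    if text == inf_rep:
        return math.inf
    # Assumption: negative infinity is the representation prefixed with "-".
    if text == "-" + inf_rep:
        return -math.inf
    # Not a special value; the caller would fall through to normal parsing.
    return None
```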

Make sure we always specify infinity and NaN representations from the
DFDL schema for all primitives, not just for xs:double/xs:float. There
is no way to disable infinity/NaN ICU parsing, so if we do not
specify these values ICU just uses the locale values, which could lead
to unwanted locale specific behavior. Related, this modifies NodeInfo
types so that fromNumber fails for types that do not support
infinity/NaN (i.e. everything except Double/Float) and creates a parse
error.
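
A rough sketch of the type check, in Python for illustration. The
names from_number, SUPPORTS_INF_NAN and ParseError here are
hypothetical and only mirror the behavior described above; they are
not the NodeInfo code.

```python
import math


class ParseError(Exception):
    """Hypothetical stand-in for a Daffodil parse error."""


# Assumption: only these type names may carry infinity or NaN.
SUPPORTS_INF_NAN = {"double", "float"}


def from_number(value: float, type_name: str) -> float:
    """Accept value for type_name, rejecting infinity/NaN where unsupported."""
    if type_name in SUPPORTS_INF_NAN:
        return float(value)
    if math.isnan(value) or math.isinf(value):
        raise ParseError(f"{type_name} cannot represent {value}")
    return value
```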

Modifies virtual decimal logic to ensure we handle cases for numbers
that do not fit in a Long (should work) or contain decimal points
(should be a parse error).
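
The two virtual decimal cases can be illustrated with a short Python
sketch. The function apply_virtual_decimal and its scale parameter
(the number of implied fractional digits) are hypothetical names, and
the sketch uses Python's arbitrary-precision Decimal rather than the
types used in the actual code.

```python
from decimal import Decimal


class ParseError(Exception):
    """Hypothetical stand-in for a Daffodil parse error."""


def apply_virtual_decimal(text: str, scale: int) -> Decimal:
    """Interpret the digits in text with an implied decimal point."""
    sign = 1 if text.startswith("-") else 0
    body = text[1:] if text[:1] in "+-" else text
    # Anything other than ASCII digits, including an explicit decimal
    # point, is rejected.
    if not (body.isascii() and body.isdigit()):
        raise ParseError(f"invalid virtual decimal text: {text!r}")
    # Building from a (sign, digits, exponent) tuple is exact, so values
    # wider than a 64-bit long are not rounded or truncated.
    return Decimal((sign, tuple(int(d) for d in body), -scale))
```

With scale 2, "12345" becomes 123.45, a 20-digit input keeps all of
its digits, and "1.5" raises ParseError.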

Tests are updated so that if they want to differentiate between int and
decimal depending on whether a decimal point exists in the data, they
must specify a pattern with or without a decimal point and enable strict
mode. Lax mode allows a decimal point regardless of type, so it cannot
differentiate the types.

DAFFODIL-2158
15 files changed
tree: 70ea2e30819e376c35531250eabfdffde1ef8fef
  1. .github/
  2. containers/
  3. daffodil-cli/
  4. daffodil-codegen-c/
  5. daffodil-core/
  6. daffodil-io/
  7. daffodil-japi/
  8. daffodil-lib/
  9. daffodil-macro-lib/
  10. daffodil-propgen/
  11. daffodil-runtime1/
  12. daffodil-runtime1-layers/
  13. daffodil-runtime1-unparser/
  14. daffodil-sapi/
  15. daffodil-schematron/
  16. daffodil-slf4j-logger/
  17. daffodil-tdml-lib/
  18. daffodil-tdml-processor/
  19. daffodil-test/
  20. daffodil-test-ibm1/
  21. daffodil-udf/
  22. project/
  23. scripts/
  24. test-stdLayout/
  25. tutorials/
  26. .asf.yaml
  27. .codecov.yml
  28. .gitattributes
  29. .gitignore
  30. .sbtopts
  31. .scalafmt.conf
  32. .sonar-project.properties
  33. BUILD.md
  34. build.sbt
  35. DEVELOP.md
  36. KEYS
  37. LICENSE
  38. NOTICE
  39. README.md
README.md

Apache Daffodil is an open-source implementation of the DFDL specification that uses DFDL data descriptions to parse fixed format data into an infoset. This infoset is commonly converted into XML or JSON to enable the use of well-established XML or JSON technologies and libraries to consume, inspect, and manipulate fixed format data in existing solutions. Daffodil is also capable of serializing or “unparsing” data back to the original data format. The DFDL infoset can also be converted directly to/from the data structures carried by data processing frameworks so as to bypass any XML/JSON overheads.

For more information about Daffodil, see https://daffodil.apache.org/.

Build Requirements

  • Java 8 or higher
  • sbt 0.13.8 or higher
  • C compiler C99 or higher
  • Mini-XML Version 3.0 or higher

See BUILD.md for more details and DEVELOP.md for a developer guide.

Getting Started

sbt is the officially supported tool to build Daffodil. Below are some of the more commonly used commands for Daffodil development.

Compile

Compile source code:

sbt compile

Tests

Run unit tests:

sbt test

Run command line interface tests:

sbt IntegrationTest/test

Command Line Interface

Build the command line interface (Linux and Windows shell scripts in daffodil-cli/target/universal/stage/bin/; see the Command Line Interface documentation for details on their usage):

sbt daffodil-cli/stage

License Check

Run Apache RAT (license audit report in target/rat.txt and error if any unapproved licenses are found):

sbt ratCheck

Test Coverage Report

Run sbt-scoverage (report in target/scala-ver/scoverage-report/):

sbt clean coverage test IntegrationTest/test
sbt coverageAggregate

Getting Help

You can ask questions on the dev@daffodil.apache.org or users@daffodil.apache.org mailing lists. You can report bugs via the Daffodil JIRA.

License

Apache Daffodil is licensed under the Apache License, v2.0.