| commit | f8a786734237cfb6469e76dcd3c1a0d015ba114c | [log] [tgz] |
|---|---|---|
| author | Steve Lawrence <slawrence@apache.org> | Wed Mar 08 11:40:28 2023 -0500 |
| committer | Steve Lawrence <slawrence@apache.org> | Mon Mar 27 13:38:36 2023 -0400 |
| tree | 70ea2e30819e376c35531250eabfdffde1ef8fef | |
| parent | 6590127afe09aaa5a8bcffa4296fa6ec852d0ebe | [diff] |
Ensure all primitives use textNumberPattern and infinity/NaN correctly

Currently, if the primitive type is an integer, text number parsing disallows decimal points, even if the pattern contains a decimal point. Instead, when parsing integers, we should allow decimals as long as the fractional part is zero. And when unparsing, we should unparse a decimal point with a zero fractional part according to the pattern.

This changes the behavior so integer parsing uses the same DecimalFormat configuration as non-integer parsing (i.e. decimals are allowed), but we throw a parse error if the fractional part that was parsed is non-zero. This also means that unparsing integers now outputs decimal points according to the pattern.

Additionally, if textNumberCheckPolicy is strict, we set ICU setDecimalPatternMatchRequired to true so that decimal points in the data are allowed or disallowed depending on whether the pattern does or does not have a decimal point. Note that lax parsing always allows decimal points regardless of the pattern. For this reason, we now always require the grouping/decimal separator DFDL properties in lax mode.

One bug was discovered in ICU (ICU-22303): if we require the decimal point because strict mode is enabled, ICU never parses the infinity/NaN representations. A workaround is added to manually check for these representations until this bug is fixed. ICU unit tests are also added which should fail once ICU fixes this bug, so we know when to remove the workaround.

Make sure we always specify infinity and NaN representations from the DFDL schema for all primitives, not just for xs:double/xs:float. There is no way to disable infinity/NaN ICU parsing, so if we do not specify these values ICU just uses the locale values, which could lead to unwanted locale-specific behavior. Related to this, NodeInfo types are modified so that fromNumber fails for types that do not support infinity/NaN (i.e. everything except Double/Float) and creates a parse error.

Modifies virtual decimal logic to ensure we handle numbers that do not fit in a Long (should work) or that contain decimal points (should be a parse error).

Tests are updated so that if they want to differentiate between int and decimal based on whether a decimal point exists in the data, they must specify a pattern with or without a decimal point and enable strict mode. Lax mode allows a decimal point regardless of type, so it cannot differentiate the types.

DAFFODIL-2158
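The integer behavior described in this commit message (parse with a decimal-capable pattern, then reject a non-zero fractional part; unparse a zero fractional part according to the pattern) can be illustrated with a minimal sketch. This is not Daffodil's implementation: it assumes the JDK's java.text.DecimalFormat as a stand-in for ICU's DecimalFormat, and the class and method names (IntegerTextNumberSketch, parseInteger, unparseInteger) are hypothetical. The JDK class has no equivalent of ICU's setDecimalPatternMatchRequired, so strict-mode checking is not shown.

```java
import java.math.BigDecimal;
import java.math.BigInteger;
import java.text.DecimalFormat;
import java.text.DecimalFormatSymbols;
import java.text.ParsePosition;
import java.util.Locale;

// Illustrative only: JDK DecimalFormat stands in for ICU, and all names
// in this class are hypothetical rather than taken from Daffodil.
public class IntegerTextNumberSketch {

    // Fixed symbols ('.' decimal, ',' grouping) so the result does not
    // depend on the default locale of the machine running the sketch.
    private static final DecimalFormatSymbols SYMBOLS =
        DecimalFormatSymbols.getInstance(Locale.US);

    // Parse text using a pattern that may contain a decimal point, then
    // fail if the parsed value has a non-zero fractional part.
    public static BigInteger parseInteger(String pattern, String text) {
        DecimalFormat df = new DecimalFormat(pattern, SYMBOLS);
        df.setParseBigDecimal(true); // keep full precision, no Long limit
        ParsePosition pos = new ParsePosition(0);
        Number parsed = df.parse(text, pos);
        if (parsed == null || pos.getIndex() != text.length()) {
            throw new IllegalArgumentException("Parse error: cannot parse '" + text + "'");
        }
        if (!(parsed instanceof BigDecimal)) {
            // With setParseBigDecimal(true), infinity and NaN come back as
            // Double; an integer type cannot represent them.
            throw new IllegalArgumentException("Parse error: infinity/NaN not allowed for integers");
        }
        BigDecimal value = (BigDecimal) parsed;
        // A positive scale after stripping trailing zeros means there is a
        // non-zero fractional part.
        if (value.stripTrailingZeros().scale() > 0) {
            throw new IllegalArgumentException("Parse error: non-zero fractional part in '" + text + "'");
        }
        return value.toBigIntegerExact();
    }

    // Unparse an integer with the same pattern; a pattern such as
    // "#,##0.00" emits a decimal point and a zero fractional part.
    public static String unparseInteger(String pattern, BigInteger value) {
        DecimalFormat df = new DecimalFormat(pattern, SYMBOLS);
        return df.format(value);
    }

    public static void main(String[] args) {
        System.out.println(parseInteger("#,##0.00", "1,234.00"));
        System.out.println(unparseInteger("#,##0.00", BigInteger.valueOf(1234)));
        try {
            parseInteger("#,##0.00", "1,234.50");
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

In this sketch, the fractional-part check happens after the number formatter has accepted the text, which mirrors the order described in the commit message: the formatter is configured the same way for integer and non-integer types, and the type-specific restriction is applied to the parsed value.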
Apache Daffodil is an open-source implementation of the DFDL specification that uses DFDL data descriptions to parse fixed format data into an infoset. This infoset is commonly converted into XML or JSON to enable the use of well-established XML or JSON technologies and libraries to consume, inspect, and manipulate fixed format data in existing solutions. Daffodil is also capable of serializing or “unparsing” data back to the original data format. The DFDL infoset can also be converted directly to/from the data structures carried by data processing frameworks so as to bypass any XML/JSON overheads.
For more information about Daffodil, see https://daffodil.apache.org/.
See BUILD.md for more details and DEVELOP.md for a developer guide.
sbt is the officially supported tool to build Daffodil. Below are some of the more commonly used commands for Daffodil development.
Compile source code:
sbt compile
Run unit tests:
sbt test
Run command line interface tests:
sbt IntegrationTest/test
Build the command line interface (Linux and Windows shell scripts in daffodil-cli/target/universal/stage/bin/; see the Command Line Interface documentation for details on their usage):
sbt daffodil-cli/stage
Run Apache RAT (license audit report in target/rat.txt and error if any unapproved licenses are found):
sbt ratCheck
Run sbt-scoverage (report in target/scala-ver/scoverage-report/):
sbt clean coverage test IntegrationTest/test
sbt coverageAggregate
You can ask questions on the dev@daffodil.apache.org or users@daffodil.apache.org mailing lists. You can report bugs via the Daffodil JIRA.
Apache Daffodil is licensed under the Apache License, v2.0.