Add schema validation code to C generator

Modify C generator to compare floating point and integer numbers with
enumerations and ranges specified in DFDL schemas.  Test validation
code with additional simple type root elements in simple.dfdl.xsd
schema and additional TDML tests in simple_errors.tdml.
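
As a rough illustration of the enumeration and range checks described
above, here is a minimal sketch.  The sketch_* names and signatures
are hypothetical and are not the functions in libruntime; they only
show the kind of comparison the generated code needs to make.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical helper (not the libruntime API): true if `value` equals
// any of the `count` allowed integer enumeration values.
bool sketch_int_in_enum(int64_t value, const int64_t *enums, size_t count)
{
    assert(enums != NULL || count == 0);
    for (size_t i = 0; i < count; i++)
    {
        if (value == enums[i])
        {
            return true;
        }
    }
    return false;
}

// Hypothetical helper: same check for floating point enumerations,
// using exact equality against the listed values.
bool sketch_float_in_enum(double value, const double *enums, size_t count)
{
    assert(enums != NULL || count == 0);
    for (size_t i = 0; i < count; i++)
    {
        if (value == enums[i])
        {
            return true;
        }
    }
    return false;
}

// Hypothetical helper: true if min <= value <= max (inclusive bounds).
bool sketch_int_in_range(int64_t value, int64_t min, int64_t max)
{
    assert(min <= max);
    return value >= min && value <= max;
}

// Hypothetical helper: inclusive range check for floating point values.
bool sketch_float_in_range(double value, double min, double max)
{
    assert(min <= max);
    return value >= min && value <= max;
}
```

The sketch assumes inclusive bounds only; exclusive bounds would need
strict comparisons instead.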

Also allow C generator to compare hexBinary elements to values in
enumerations since Daffodil does the same thing and C generator
already compares hexBinary elements to values in fixed attributes.

Allow C generator to ignore dfdl:assert expressions in DFDL schemas
and generate code anyway instead of throwing an exception.

Allow C generator to get schema version from DFDL schemas and put it
into the generated C code to version the generated code as well.

Found out that Daffodil's default alignment properties (alignment="1"
and alignmentUnits="bytes") make Daffodil append extra fill bits to
elements whose sizes are not a multiple of 8 bits.  Because
compatibility between Daffodil and DaffodilC depends on defining both
dfdl:alignment="1" (default) and dfdl:alignmentUnits="bits"
(non-default), define a known good binary data format in one place
(network/format.dfdl.xsd) and include it in the rest of
daffodil-codegen-c's test schemas.
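
The fill-bit arithmetic behind this alignment behavior can be sketched
as follows.  This is only an illustration of the calculation, with a
hypothetical function name, and not code taken from Daffodil or
DaffodilC.

```c
#include <assert.h>
#include <stddef.h>

// Hypothetical helper: number of fill bits needed to advance
// `bit_position` to the next multiple of `alignment_in_bits`.
// alignment="1" with alignmentUnits="bytes" means 8 bits here;
// alignment="1" with alignmentUnits="bits" means 1 bit.
size_t sketch_fill_bits(size_t bit_position, size_t alignment_in_bits)
{
    assert(alignment_in_bits > 0);
    size_t remainder = bit_position % alignment_in_bits;
    return remainder == 0 ? 0 : alignment_in_bits - remainder;
}
```

For a 5-bit element starting at bit 0, sketch_fill_bits(5, 8) is 3
(byte alignment adds 3 fill bits) while sketch_fill_bits(5, 1) is 0
(bit alignment adds none).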

Finally, make changes requested by PR review.

DAFFODIL-2853

BUILD.md: Document how to install iwyu and libcriterion-dev (used only
for Daffodil C code generator development and maintenance).

README.md: Document how to check or fix the formatting of Daffodil's
source and sbt files.  Also document how to publish the Daffodil jars
to the Maven and Ivy caches.

c/files/.clang-format: Update to clang-format 14.  Format declarations
more concisely by removing AlignConsecutiveDeclarations (this is why
some reformatted C files lose some extra whitespace).

c/files/Makefile: Shorten iwyu's maximum line length from 999 to 111.

c/files/**/**.[ch]: Add "auto-maintained by iwyu" comment to headers.

c/files/libcli/cli_errors.[ch]: Rename CLI_ZZZ, ERR_ZZZ, and FIELD_ZZZ
to CLI__NUM_CODES, ERR__NUM_CODES, and FIELD__NO_ARGS.

c/files/libcli/daffodil_getopt.[ch]: Merge
daffodil_parse_cli and daffodil_unparse_cli structs and objects into
daffodil_pu_cli and daffodil_pu.  Skip and ignore -r and -s options
so one can run "c/daffodil parse" with the same options as "daffodil
parse -r root -s schema -o outfile infile" without having to remove
these options.
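
One way to accept and ignore such options is sketched below using
POSIX getopt as a stand-in; the actual option parser in
daffodil_getopt.c may work differently.  The SketchCli struct and
sketch_parse_options function are hypothetical, and the sketch assumes
argv starts at the "parse" subcommand.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L // expose getopt() under strict -std=c99
#endif
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

// Hypothetical struct holding only the options this sketch keeps.
typedef struct SketchCli
{
    const char *outfile;
    const char *infile;
} SketchCli;

// Hypothetical helper: keep -o and the input file, skip -r and -s.
SketchCli sketch_parse_options(int argc, char *argv[])
{
    SketchCli cli = {NULL, NULL};
    int       opt;

    assert(argc >= 1 && argv != NULL);
    optind = 1; // restart scanning in case getopt was used before
    while ((opt = getopt(argc, argv, "o:r:s:")) != -1)
    {
        switch (opt)
        {
        case 'o':
            cli.outfile = optarg;
            break;
        case 'r': // root element name: accepted, then ignored
        case 's': // schema file name: accepted, then ignored
            break;
        default: // unknown option: getopt already printed a message
            break;
        }
    }
    if (optind < argc)
    {
        cli.infile = argv[optind];
    }
    return cli;
}
```

Declaring -r and -s with a trailing colon makes getopt consume their
arguments too, so "root" and "schema" are not mistaken for the input
file.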

c/files/libcli/daffodil_main.c: Merge daffodil_parse_cli and
daffodil_unparse_cli structs and objects into daffodil_pu_cli and
daffodil_pu.  Initialize ParserOrUnparserState fields as well as
PState/UState fields.  Make sure any diagnostics will fail unparse as
well as parse if validate mode is on.

c/files/libruntime/errors.[ch]: Add ERR_ENUM_MATCH and
ERR_OUTSIDE_RANGE error messages to diagnose elements not matching any
of their enumerations or having values outside their allowed ranges.
Rename ERR_ZZZ and FIELD_ZZZ to ERR__NUM_CODES and FIELD__NO_ARGS.
Rename ERR_ENUM_MATCH, ERR_FIXED_VALUE, and ERR_OUTSIDE_RANGE to
ERR_RESTR_ENUM, ERR_RESTR_FIXED, and ERR_RESTR_RANGE. Remove
`daffodil_program_version` since we use `daffodil_version` in
daffodil_version.h instead.

c/files/libruntime/infoset.h: Move common PState/UState fields to
ParserOrUnparserState to allow parse and unparse functions to call
common validate functions.

c/files/libruntime/parsers.[ch]: Use ParserOrUnparserState fields as
well as PState/UState fields.  Replace parse_check_bounds and
parse_validate_fixed functions with validate_array_bounds and
validate_fixed_attribute functions in validators.[ch].  Rename
parse_align and parse_fill_bits to parse_align_to and
parse_alignment_bits.

c/files/libruntime/unparsers.[ch]: Use ParserOrUnparserState fields as
well as UState fields.  Replace unparse_check_bounds and
unparse_validate_fixed functions with validate_array_bounds and
validate_fixed_attribute functions in validators.[ch].  Rename
unparse_align and unparse_fill_bits to unparse_align_to and
unparse_alignment_bits.

c/files/libruntime/validators.[ch]: Move common validation functions
here from parsers.[ch] and unparsers.[ch].  Add new float, hexBinary,
int, and universal validation functions to check that elements match
their allowed enumerations (requires passing array of floating point
numbers, hexBinary structs, or integer numbers) and fit within their
allowed ranges.  Validation functions create diagnostics instead of
errors; the CLI or a caller is responsible for printing the
diagnostics and exiting with the appropriate status code.
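
The split between recording diagnostics and deciding the exit status
can be sketched like this.  All names below are hypothetical and the
struct layouts are simplified assumptions; they are not the
definitions in infoset.h or validators.[ch].

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

enum { SKETCH_MAX_DIAGNOSTICS = 8 };

// Hypothetical common state shared by parse and unparse.
typedef struct SketchCommonState
{
    const char *diagnostics[SKETCH_MAX_DIAGNOSTICS]; // failed element names
    size_t      num_diagnostics;
} SketchCommonState;

// Hypothetical parse and unparse states embedding the common state,
// so one validation function can serve both directions.
typedef struct SketchPState { SketchCommonState pu; } SketchPState;
typedef struct SketchUState { SketchCommonState pu; } SketchUState;

// Hypothetical validator: records a diagnostic instead of stopping.
void sketch_validate_int_range(int64_t value, int64_t min, int64_t max,
                               const char *name, SketchCommonState *pu)
{
    assert(pu != NULL && name != NULL);
    bool in_range = value >= min && value <= max;
    if (!in_range && pu->num_diagnostics < SKETCH_MAX_DIAGNOSTICS)
    {
        pu->diagnostics[pu->num_diagnostics++] = name;
    }
}

// Hypothetical caller-side policy: diagnostics fail the run only when
// validate mode is on.
int sketch_exit_status(const SketchCommonState *pu, bool validate_mode)
{
    assert(pu != NULL);
    return (validate_mode && pu->num_diagnostics > 0) ? EXIT_FAILURE
                                                      : EXIT_SUCCESS;
}
```

Keeping the exit decision in the caller lets a library user collect
every diagnostic from one run before choosing how to report them.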

c/files/tests/bits.c: Use ParserOrUnparserState fields as well as
PState/UState fields.

c/files/tests/extras.c: Regenerate iwyu comments.

c/DaffodilCCodeGenerator.scala: Accept but ignore dfdl:assert statements
(for now).

c/DaffodilCExamplesGenerator.scala: Generate code from simple's
all-in-one root element in order to show generated code for all simple
types, enums, and ranges.

c/generators/AlignmentFillCodeGenerator.scala: Rename parse_align and
unparse_align to parse_align_to and unparse_align_to.  Use
ParserOrUnparserState fields as well as PState/UState fields.

c/generators/BinaryBooleanCodeGenerator.scala: Use
ParserOrUnparserState fields as well as PState/UState fields.

c/generators/BinaryValueCodeGenerator.scala: Generate C code to
validate enumerations and ranges of primitive elements.  Get raw
enumeration values into a Seq[String].  Avoid unsigned >= 0
comparisons to prevent gcc warnings.  Use ParserOrUnparserState fields
as well as PState/UState fields.  Call correct function to validate
enums depending on element's primType.  Generate extra C
initialization code for hexBinary enumerations to define arrays of
hexBinary structs and pass them to validate_hexbinary_enumeration.
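
A hexBinary enumeration check of this shape might look like the sketch
below.  The SketchHexBinary struct and sketch_hexbinary_in_enum
function are hypothetical stand-ins; the real struct fields and the
signature of validate_hexbinary_enumeration are not shown in this
commit message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

// Hypothetical hexBinary value: a byte array plus its length.
typedef struct SketchHexBinary
{
    const uint8_t *array;
    size_t         lengthInBytes;
} SketchHexBinary;

// Hypothetical helper: true if `value` has the same length and bytes
// as any of the `count` allowed enumeration values.
bool sketch_hexbinary_in_enum(const SketchHexBinary *value,
                              const SketchHexBinary *enums, size_t count)
{
    assert(value != NULL);
    assert(enums != NULL || count == 0);
    for (size_t i = 0; i < count; i++)
    {
        if (value->lengthInBytes != enums[i].lengthInBytes)
        {
            continue;
        }
        if (value->lengthInBytes == 0 ||
            memcmp(value->array, enums[i].array, value->lengthInBytes) == 0)
        {
            return true;
        }
    }
    return false;
}
```

Generated initialization code would then only need to define one byte
array per enumeration value and an array of structs pointing at them.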

c/generators/CodeGeneratorState.scala: Remove unnecessary immutable
(built-in Set is immutable).  Use ParserOrUnparserState fields as well
as PState/UState fields.  Rename parse_fill_bits and unparse_fill_bits
to parse_alignment_bits and unparse_alignment_bits.  Get actual schema
version from Daffodil and assign its value to `schema_version` in
generated_code.c.  Declare `schema_version` in generated_code.h.  Call
validate_array_bounds instead of parse/unparse_check_bounds.  Make
cStructFieldAccess work correctly for DFDL expressions like
"/rr:ReqstReply/..." used by NFS schemas.

c/generators/HexBinaryCodeGenerator.scala: Use ParserOrUnparserState
fields as well as PState/UState fields.  Call validate_fixed_attribute
instead of parse/unparse_validate_fixed.

examples/**/generated_code.[ch]: Regenerate to show changes in C
generator such as adding new "auto-maintained by iwyu" comment, calls
to renamed functions such as parse_align_to and unparse_align_to,
defining schema_version, using pu fields, and calling validation
functions.

c/data/simple*.dat: Remove (contents now inside simple.tdml).

c/ex_nums.dfdl.xsd: Define schema version to be "1.0.2".  Include
network format instead of defining own binary format.  Adjust
elements' own format properties as needed to match original data file
(explicitness is better than using defaults).

c/infosets/simple*.xml: Remove (contents now inside simple.tdml).

c/nested.dfdl.xsd: Include network format instead of defining own
binary format.

c/network/format.dfdl.xsd: Define network format in only one place,
making sure to use vitally needed alignment properties but keep the
format as minimal as possible to make it easier to include in a wide
variety of schemas.  Change "Network order binary format" comment to
"Network order big endian format" as requested.  Define bitOrder and
byteOrder explicitly in format schema instead of inheriting them from
DFDLGeneralFormat schema.

c/padtest.dfdl.xsd: Include network format instead of defining own
binary format.

c/simple.dfdl.xsd: Define schema version to be "1.0.0".  Include
network format instead of defining own binary format.  Add new
elements to test enumerations and ranges of primitive types.  Use
custom simple types in order to define simple type root elements,
enumerations of simple type root elements, ranges of simple type
elements, and all-in-one root element as compactly as possible.

c/simple.tdml: Make comments clearer about how to run tests and
include all data and infosets in test cases instead of using files.

c/simple_errors.tdml: Make comments clearer about how to run tests and
add new test cases to validate enumerations and ranges of primitive
types.

c/variablelen.dfdl.xsd: Include network format instead of defining own
binary format.

core/dsom/SchemaDocument.scala: Modify SchemaDocument to capture
schema versions from XML schema definitions and provide to codegen-c.

tdml/TestDaffodilC.scala: Add future-proofing test to check that the
test schema compiles without any warnings (would catch "relative
location deprecated" warning if DFDLGeneralFormat URL was still
relative).  Fix inconsistent use of "dp"/"tdp" and
"isError"/"isProcessingError".

c/TestSimpleErrors.scala: Call new test cases in simple_errors.tdml.
85 files changed
tree: d0a507d02d5761db65263ccdc5f28a2c88e1c495
README.md

Apache Daffodil is an open-source implementation of the DFDL specification that uses DFDL data descriptions to parse fixed format data into an infoset. This infoset is commonly converted into XML or JSON to enable the use of well-established XML or JSON technologies and libraries to consume, inspect, and manipulate fixed format data in existing solutions. Daffodil is also capable of serializing or “unparsing” data back to the original data format. The DFDL infoset can also be converted directly to/from the data structures carried by data processing frameworks so as to bypass any XML/JSON overheads.

For more information about Daffodil, see https://daffodil.apache.org/.

Build Requirements

  • Java 8 or higher
  • sbt 0.13.8 or higher
  • C compiler C99 or higher
  • Mini-XML Version 3.0 or higher

See BUILD.md for more details and DEVELOP.md for a developer guide.

Getting Started

sbt is the officially supported tool to build Daffodil. Below are some of the more commonly used commands for Daffodil development.

Compile

Compile source code:

sbt compile

Test

Check all unit tests pass:

sbt test

Check all integration tests pass:

sbt daffodil-test-integration/test

Format

Check format of source and sbt files:

sbt scalafmtCheckAll scalafmtSbtCheck

Reformat source and sbt files if necessary:

sbt scalafmtAll scalafmtSbt

Build

Build the Daffodil command line interface (Linux and Windows shell scripts in daffodil-cli/target/universal/stage/bin/; see the Command Line Interface documentation for details on their usage):

sbt daffodil-cli/stage

Publish the Daffodil jars to a Maven repository (for Java projects) or Ivy repository (for Scala or schema projects).

Maven (for Java or mvn):

sbt publishM2

Ivy (for Scala or sbt):

sbt publishLocal

Check Licenses

Run Apache RAT (writes a license audit report to target/rat.txt and fails if any unapproved licenses are found):

sbt ratCheck

Check Coverage

Run sbt-scoverage (report in target/scala-ver/scoverage-report/):

sbt clean coverage test daffodil-test-integration/test
sbt coverageAggregate

Getting Help

You can ask questions on the dev@daffodil.apache.org or users@daffodil.apache.org mailing lists. You can report bugs via the Daffodil JIRA.

License

Apache Daffodil is licensed under the Apache License, v2.0.