Fix some runtime2 todos

Parse C executable's command line arguments with getopt instead of
argp (callers now have to put all options before the first non-option
argument and can no longer say "daffodil parse in.dat -o out.xml", but
the CLI was meant for the TDML runner rather than users anyway).

Update build instructions and CI workflow to build with clang instead
of gcc.  Fix MSYS2 portability problems exposed by change (e.g., no
error() function).
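
A minimal sketch of a portable stand-in for GNU error(), using only C99 stdio/stdarg/string calls.  The names report_error and progname are hypothetical; the commit itself replaces error calls with plain fprintf/exit calls.  The behavior only roughly mirrors the documented GNU behavior: flush stdout, print the program name, the message, and strerror(errnum) when errnum is nonzero, then exit only when status is nonzero.  The stream parameter is an addition (the real error() always writes to stderr) so the output can be checked.

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical program name; GNU error() uses program_invocation_name,
   which is itself a glibc extension */
static const char *const progname = "daffodil";

/* Portable stand-in for GNU error(): writes "progname: message", plus
   ": strerror(errnum)" when errnum is nonzero, then a newline, to stream,
   and exits with status only when status is nonzero */
static void report_error(FILE *stream, int status, int errnum, const char *format, ...)
{
    va_list args;

    fflush(stdout); /* flush pending stdout output first to keep ordering */
    fprintf(stream, "%s: ", progname);
    va_start(args, format);
    vfprintf(stream, format, args);
    va_end(args);
    if (errnum != 0)
    {
        fprintf(stream, ": %s", strerror(errnum));
    }
    fputc('\n', stream);
    fflush(stream);
    if (status != 0)
    {
        exit(status);
    }
}
```

For example, report_error(stderr, EXIT_FAILURE, errno, "cannot open %s", path) after a failed fopen would print the reason and exit, while a status of 0 just reports and returns.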

Lower minimum required Mini-XML version from 3.2 to 3.0.  Add
missing LICENSE and NOTICE files to daffodil-runtime2 jar.

Move everything CLI-related in libruntime out to libcli.  Make error
lookup mechanism pluggable with a hook to let libcli insert its own
error lookup routine.  Also make changes and move files to help
automate clang-format, include-what-you-use, and generated_code.[ch]
updates later.

DAFFODIL-2500, DAFFODIL-2505, DAFFODIL-2508
---

main.yml: Build C files with clang instead of gcc, but build mxml with
gcc on MSYS2 since runtime2 tests break when mxml is compiled by
clang.  Also install diffutils on MSYS2 to keep the mxml makefile from
calling Cygwin's cmp, which prints an unnecessary error message.

BUILD.md: Lower Mini-XML version to 3.0.  Remove argp and gcc
instructions.  Add clang instructions.  Show how to set env vars on
one line.  On MSYS2, install diffutils to let mxml makefile call cmp.

README.md: Lower Mini-XML version to 3.0.

.clang-format: Add StatementMacros to allow clang-format to format
parsers.c and unparsers.c without unexpected indentation.  Also make
all C files tell clang-format to leave include lines alone to prevent
clang-format from interfering with include-what-you-use (SortIncludes:
false isn't sufficient since clang-format still re-indents comments):

  // clang-format off
  #include ...  // for ...
  // clang-format on

LICENSE: Add to daffodil-runtime2 jar.

NOTICE: Add to daffodil-runtime2 jar.

Makefile: Add comments explaining how to use targets.  Rename tests
target to check for consistency with GNU make's standard targets.  Put
options before non-option arguments in daffodil commands.

All C files: Tell clang-format to leave includes alone so it won't
interfere with include-what-you-use.

cli_errors.[ch]: Move all error codes and messages used by libcli
here.  Use lookup tables and pluggable mechanism to allow libruntime
to look up and format libcli messages.  Move/rename those CLI-related
enum constants here from libruntime:

  - ERR_FILE_CLOSE (all ERR_* renamed to CLI_*)
  - ERR_FILE_OPEN
  - ERR_INFOSET_READ/WRITE (-> CLI_INVALID_INFOSET)
  - ERR_STACK_EMPTY
  - ERR_STACK_OVERFLOW
  - ERR_STACK_UNDERFLOW
  - ERR_STRTOBOOL
  - ERR_STRTOD_ERRNO
  - ERR_STRTOI_ERRNO
  - ERR_STRTONUM_EMPTY
  - ERR_STRTONUM_NOT
  - ERR_STRTONUM_RANGE
  - ERR_XML_DECL
  - ERR_XML_ELEMENT
  - ERR_XML_ERD
  - ERR_XML_GONE
  - ERR_XML_INPUT
  - ERR_XML_LEFT
  - ERR_XML_MISMATCH
  - ERR_XML_WRITE
  - LIMIT_XML_NESTING

daffodil_argp.[ch]: Rename to daffodil_getopt.[ch].

daffodil_getopt.c: Remove all argp structs and handlers.  Simplify to
just a single parse_daffodil_cli function calling getopt and returning
a pointer to Error if any error happens.  Note getopt has no portable
way to parse "daffodil [options] command [more options] arguments" so
callers now have to put all options before first non-option argument
("daffodil [options] command arguments").
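
A minimal sketch of an options-first parser built on POSIX getopt, assuming a hypothetical "daffodil [-o outfile] command [infile]" syntax.  parse_cli, CliArgs, and CliError are illustrative stand-ins, not libcli's actual parse_daffodil_cli or Error.  The leading ':' in the option string makes getopt return ':' for a missing option argument instead of printing its own message, so every problem comes back to the caller as a pointer to an error.  Note that glibc's getopt permutes argv by default (unless POSIXLY_CORRECT is set), so options after the command happen to work there, but portable getopt stops at the first non-option argument.

```c
#define _POSIX_C_SOURCE 200809L /* expose getopt() on strict C compilers */
#include <string.h>
#include <unistd.h> /* POSIX getopt(), optarg, optind, optopt, opterr */

/* Hypothetical stand-ins for libcli's option struct and Error */
typedef struct CliArgs
{
    const char *outfile; /* -o FILE; "-" means stdout */
    const char *command; /* "parse" or "unparse" */
    const char *infile;  /* optional; "-" means stdin */
} CliArgs;

typedef struct CliError
{
    const char *msg; /* what went wrong */
    int c;           /* offending option character, or 0 */
} CliError;

static CliError cli_error;

/* Record an error and return a pointer to it */
static const CliError *cli_fail(const char *msg, int c)
{
    cli_error.msg = msg;
    cli_error.c = c;
    return &cli_error;
}

/* Parse "daffodil [-o outfile] command [infile]" with all options first.
   Returns NULL on success or a pointer to a static CliError. */
static const CliError *parse_cli(int argc, char *argv[], CliArgs *args)
{
    int opt;

    args->outfile = "-";
    args->command = NULL;
    args->infile = "-";
    opterr = 0; /* don't let getopt print; report errors to the caller */
    optind = 1; /* restart scanning for repeated calls (BSD may also need optreset) */

    /* Leading ':' makes getopt return ':' for a missing option argument */
    while ((opt = getopt(argc, argv, ":o:")) != -1)
    {
        switch (opt)
        {
        case 'o':
            args->outfile = optarg;
            break;
        case ':':
            return cli_fail("option requires an argument", optopt);
        default: /* '?' for an unrecognized option */
            return cli_fail("invalid option", optopt);
        }
    }

    /* Everything from optind on is the command and its arguments */
    if (optind >= argc)
    {
        return cli_fail("missing command", 0);
    }
    args->command = argv[optind++];
    if (strcmp(args->command, "parse") != 0 && strcmp(args->command, "unparse") != 0)
    {
        return cli_fail("invalid command", 0);
    }
    if (optind < argc)
    {
        args->infile = argv[optind++];
    }
    if (optind < argc)
    {
        return cli_fail("too many arguments", 0);
    }
    return NULL;
}
```

With argv = {"daffodil", "-o", "out.xml", "parse", "in.dat"}, parse_cli returns NULL and fills in outfile, command, and infile, while "daffodil -o" returns an error whose c field holds 'o'.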

daffodil_getopt.h: Include "errors.h" and make parse_daffodil_cli
return a pointer to Error so we can use continue_or_exit(error) for
all errors/messages.

daffodil_main.c: Remove fflush_continue_or_exit (no really good reason
to flush a stream right before closing it).  Call continue_or_exit
after parse_daffodil_cli to handle any CLI error.  Simplify rest of
code in main function.  Rename error enumerations (ERR -> CLI) and
initialize c field with 0 instead of s field with NULL since we now
sort Error fields alphabetically (also needed in stack.c,
xml_reader.c, xml_writer.c).

errors.c: Remove <error.h> and replace error calls with fprintf/exit
calls since error() is a GNU extension which MSYS2 clang doesn't provide.
Move eof_or_error, get_diagnostics, add_diagnostics up so they come
first in file.  Replace error_message function containing switch
statement with error_lookup function indexing lookup table and
returning ErrorLookup structs.  Change print_maybe_stop function to
call both error_lookup and cli_error_lookup (pluggable mechanism used
to look up CLI errors) and switch on ErrorField enumerations instead
of ErrorCode enumerations to print formatted messages with appropriate
Error fields.  Add check_error_lookup function (after you rearrange
codes or messages or get an assertion failure in error_lookup, call
check_error_lookup from the debugger to check all codes map to
expected messages).
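
The lookup-table approach might look like the following minimal sketch, where the error codes, messages, and table contents are hypothetical stand-ins for libruntime's real ones (and error_lookup here returns a pointer to the row rather than a struct).  Indexing a table replaces the switch statement, an assert in error_lookup catches a row that has drifted out of order with the enum, and check_error_lookup walks every row without asserting so a single debugger call can report all mismatches at once.

```c
#include <assert.h>
#include <inttypes.h> /* PRId64; also provides the <stdint.h> types */
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical libruntime error codes; ERR_ZZZ marks the end */
enum ErrorCode { ERR_CHOICE_KEY, ERR_PARSE_BOOL, ERR_RESTR_ENUM, ERR_ZZZ };

/* Which Error field, if any, a message's format string consumes */
enum ErrorField { FIELD_C, FIELD_D64, FIELD_S, FIELD_ZZZ };

typedef struct ErrorLookup
{
    uint8_t code;
    const char *message;
    enum ErrorField field;
} ErrorLookup;

/* Rows must stay in the same order as enum ErrorCode */
static const ErrorLookup error_table[ERR_ZZZ] = {
    {ERR_CHOICE_KEY, "no match for choice dispatch key %" PRId64 "\n", FIELD_D64},
    {ERR_PARSE_BOOL, "cannot parse %" PRId64 " as true or false\n", FIELD_D64},
    {ERR_RESTR_ENUM, "value '%s' does not match any enumeration\n", FIELD_S},
};

/* Index the table instead of switching on code; the second assert
   fires if a row has drifted out of order with the enum */
static const ErrorLookup *error_lookup(uint8_t code)
{
    assert(code < ERR_ZZZ);
    const ErrorLookup *lookup = &error_table[code];
    assert(lookup->code == code);
    return lookup;
}

/* Check every row without asserting, printing each mismatch to stderr;
   returns true if every row holds its own code and a message */
static bool check_error_lookup(void)
{
    bool ok = true;
    for (unsigned code = 0; code < (unsigned)ERR_ZZZ; code++)
    {
        if (error_table[code].code != code || error_table[code].message == NULL)
        {
            fprintf(stderr, "error_table[%u] holds code %u\n", code,
                    (unsigned)error_table[code].code);
            ok = false;
        }
    }
    return ok;
}
```

Calling check_error_lookup() from a debugger (for example, "call check_error_lookup()" in gdb) prints any mismatched rows and returns false if there are any.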

errors.h: Move CLI-related errors to cli_errors.h.  Add ERR_ZZZ
enumeration to allow libcli's first error code to be numbered
consecutively following libruntime's last error code without a gap
between them.  Add "int c" member to Error's anonymous union for
getopt-related errors.  Add ErrorField enumerations and ErrorLookup
struct to allow print_maybe_stop to print libcli's messages without
hardcoding any knowledge about them.  Sort Error's fields
alphabetically.  Move PState and UState structs back to infoset.h
where they really belong.  Declare daffodil_program_version (we were
using argp_program_version before which is no longer available) and
move eof_or_error up before get_diagnostics.  Declare cli_error_lookup
pluggable mechanism to allow print_maybe_stop to look up libcli's
messages without knowing anything about them.
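
A condensed sketch of how ERR_ZZZ numbering and a cli_error_lookup hook can fit together, assuming simplified versions of Error, ErrorField, and ErrorLookup; lookup_any, print_error, cli_lookup, and all codes and messages are hypothetical.  libcli's enum starts at ERR_ZZZ, so its codes follow libruntime's without a gap, and the printer switches only on the ErrorField named by the row it finds, so it needs no knowledge of libcli's codes.  The anonymous union needs C11 or a compiler extension.

```c
#include <inttypes.h> /* int64_t, uint8_t, PRId64 */
#include <stdio.h>

/* ---- libruntime side (simplified, hypothetical codes) ---- */
enum ErrorCode { ERR_CHOICE_KEY, ERR_RESTR_ENUM, ERR_ZZZ /* one past last */ };
enum ErrorField { FIELD_C, FIELD_D64, FIELD_S, FIELD_ZZZ /* no argument */ };

typedef struct ErrorLookup
{
    uint8_t code;
    const char *message;
    enum ErrorField field;
} ErrorLookup;

typedef struct Error
{
    uint8_t code;
    union /* members sorted alphabetically */
    {
        int c;
        int64_t d64;
        const char *s;
    };
} Error;

static const ErrorLookup runtime_table[ERR_ZZZ] = {
    {ERR_CHOICE_KEY, "no match for choice dispatch key %" PRId64 "\n", FIELD_D64},
    {ERR_RESTR_ENUM, "value '%s' does not match any enumeration\n", FIELD_S},
};

/* Pluggable hook: libcli sets it; libruntime only calls through it */
static const ErrorLookup *(*cli_error_lookup)(uint8_t code) = NULL;

static const ErrorLookup *lookup_any(uint8_t code)
{
    if (code < ERR_ZZZ)
    {
        return &runtime_table[code];
    }
    return cli_error_lookup ? cli_error_lookup(code) : NULL;
}

/* Format an error by switching on its row's field, never on specific
   codes; returns the number of characters written */
static int print_error(FILE *stream, const Error *error)
{
    const ErrorLookup *lookup = lookup_any(error->code);
    if (lookup == NULL)
    {
        return fprintf(stream, "unknown error code %u\n", (unsigned)error->code);
    }
    switch (lookup->field)
    {
    case FIELD_C: return fprintf(stream, lookup->message, error->c);
    case FIELD_D64: return fprintf(stream, lookup->message, error->d64);
    case FIELD_S: return fprintf(stream, lookup->message, error->s);
    default: return fprintf(stream, "%s", lookup->message);
    }
}

/* ---- libcli side: codes continue right after ERR_ZZZ ---- */
enum CliErrorCode { CLI_FILE_OPEN = ERR_ZZZ, CLI_INVALID_OPTION, CLI_ZZZ };

static const ErrorLookup cli_table[CLI_ZZZ - ERR_ZZZ] = {
    {CLI_FILE_OPEN, "error opening file %s\n", FIELD_S},
    {CLI_INVALID_OPTION, "invalid option -%c\n", FIELD_C},
};

static const ErrorLookup *cli_lookup(uint8_t code)
{
    if (code >= ERR_ZZZ && code < CLI_ZZZ)
    {
        return &cli_table[code - ERR_ZZZ];
    }
    return NULL;
}
```

libcli would install its routine once at startup (cli_error_lookup = cli_lookup;), after which print_error(stderr, &error) can format errors from either library.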

infoset.h: Move PState and UState structs back here.

parsers.c, unparsers.c: Initialize c field with 0 and use explicit .s
identifier to initialize s field since we now sort Error fields
alphabetically.
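
Why the field order matters, as a minimal sketch with a simplified Error whose anonymous union members are sorted alphabetically (c, d64, s) and hypothetical error codes: a positional initializer such as {0} always targets the union's first member, so after sorting it sets c, and a string value must use an explicit .s designator.  Anonymous unions need C11 or a compiler extension.

```c
#include <stdint.h>

/* Hypothetical error codes for illustration only */
enum { ERR_EXAMPLE_COUNT, ERR_EXAMPLE_NAME };

/* Simplified Error: union members sorted alphabetically (c, d64, s) */
typedef struct Error
{
    uint8_t code;
    union
    {
        int c;         /* first member: receives positional initializers */
        int64_t d64;
        const char *s;
    };
} Error;

/* {0} initializes the union's first member, which is now c */
static const Error error_count = {ERR_EXAMPLE_COUNT, {0}};

/* A string needs an explicit .s designator; a positional {"text"} would
   try to initialize int c from a pointer, which compilers warn about or reject */
static const Error error_name = {ERR_EXAMPLE_NAME, {.s = "element name"}};
```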

unparsers.h: Format from 80 to 100 columns with clang-format (missed
this one before).

CodeGenerator.scala: Pass relative/*.c filenames instead of absolute
filenames on Windows to avoid a problem running the MSYS2 clang
compiler.  Remove "-largp" since we no longer need it on MSYS2.
Reorder pickCompiler's list of compilers to prefer ${CC}, cc, clang,
gcc in that order.

Runtime2DataProcessor.scala: Call executable with -o outfile before
parse or unparse in CLI command lines.

CodeGeneratorState.scala: Initialize c field with 0 since we now sort
Error fields alphabetically.  Replace argp_program_version with
daffodil_program_version since we no longer use "argp.h".

NestedUnion.[ch] -> generated_code.[ch]: Move and regenerate with code
generator without renaming or manual editing to enable automated
update of generated_code.[ch] examples with daffodil's C code
generator in future.

ex_nums.[ch] -> generated_code.[ch]: Move and regenerate with code
generator without renaming or manual editing to enable automated
update of generated_code.[ch] examples with daffodil's C code
generator in future.

Rat.scala: Ignore generated_code.[ch] examples since daffodil's C code
generator doesn't include Apache license when generating them.

36 files changed
tree: 19b1c00d9e2f0b472c00fab54a60f77805060f9f
  1. .github/
  2. containers/
  3. daffodil-cli/
  4. daffodil-core/
  5. daffodil-io/
  6. daffodil-japi/
  7. daffodil-lib/
  8. daffodil-macro-lib/
  9. daffodil-propgen/
  10. daffodil-runtime1/
  11. daffodil-runtime1-unparser/
  12. daffodil-runtime2/
  13. daffodil-sapi/
  14. daffodil-schematron/
  15. daffodil-tdml-lib/
  16. daffodil-tdml-processor/
  17. daffodil-test/
  18. daffodil-test-ibm1/
  19. daffodil-udf/
  20. project/
  21. test-stdLayout/
  22. tutorials/
  23. .asf.yaml
  24. .codecov.yml
  25. .gitattributes
  26. .gitignore
  27. .sbtopts
  28. BUILD.md
  29. build.sbt
  30. KEYS
  31. LICENSE
  32. NOTICE
  33. README.md
  34. sonar-project.properties
README.md

Apache Daffodil is an open-source implementation of the DFDL specification that uses DFDL data descriptions to parse fixed format data into an infoset. This infoset is commonly converted into XML or JSON to enable the use of well-established XML or JSON technologies and libraries to consume, inspect, and manipulate fixed format data in existing solutions. Daffodil is also capable of serializing or “unparsing” data back to the original data format. The DFDL infoset can also be converted directly to/from the data structures carried by data processing frameworks so as to bypass any XML/JSON overheads.

For more information about Daffodil, see https://daffodil.apache.org/.

Build Requirements

  • JDK 8 or higher
  • SBT 0.13.8 or higher
  • C compiler C99 or higher
  • Mini-XML Version 3.0 or higher

See BUILD.md for more details.

Getting Started

SBT is the officially supported tool to build Daffodil. Below are some of the more commonly used commands for Daffodil development.

Compile

Compile source code:

sbt compile

Tests

Run unit tests:

sbt test

Run command line interface tests:

sbt IntegrationTest/test

Command Line Interface

Build the command line interface (Linux and Windows shell scripts in daffodil-cli/target/universal/stage/bin/; see the Command Line Interface documentation for details on their usage):

sbt daffodil-cli/stage

License Check

Run Apache RAT (license audit report in target/rat.txt and error if any unapproved licenses are found):

sbt ratCheck

Test Coverage Report

Run sbt-scoverage (report in target/scala-ver/scoverage-report/):

sbt clean coverage test IntegrationTest/test
sbt coverageAggregate

Getting Help

You can ask questions on the dev@daffodil.apache.org or users@daffodil.apache.org mailing lists. You can report bugs via the Daffodil JIRA.

License

Apache Daffodil is licensed under the Apache License, v2.0.