| :page-layout: page |
| :keywords: dfdl-to-c backend code-generator runtime2 |
| // /////////////////////////////////////////////////////////////////////////// |
| // |
| // This file is written in https://asciidoctor.org/docs/what-is-asciidoc/[AsciiDoc] |
| // with https://rhodesmill.org/brandon/2012/one-sentence-per-line/[semantic linefeeds]. |
| // |
| // When editing, please start each sentence on a new line. |
| // This makes textual diffs of this file useful |
| // in a similar way to the way they work for code. |
| // |
| // ////////////////////////////////////////////////////////////////////////// |
| |
| == Runtime2 ToDos |
| |
| === Overview |
| |
| We have built an initial DFDL-to-C backend |
| and code generator for Apache Daffodil. |
| Currently the C code generator can support |
| binary boolean, integer, and real numbers, |
| arrays of simple and complex elements, |
| choice groups using dispatch/branch keys, |
| validation of "fixed" attributes, |
| and padding of explicit length complex elements with fill bytes. |
| We plan to continue building out the C code generator |
| until it supports a minimal subset of the DFDL 1.0 specification |
| for embedded devices. |
| |
| We are using this document |
| to keep track of changes |
| requested by reviewers |
| so we don't forget to make them. |
| If you want to help |
| (which would be appreciated), |
| please let the mailto:dev@daffodil.apache.org[dev] list know |
| in order to avoid duplicated effort. |
| |
| === Reporting errors using structs, not strings |
| |
| We have replaced error message strings |
| with error structs everywhere now. |
| However, we may need to expand the error struct |
| to include a pointer (pstate/ustate for data position) |
| and another pointer (ERD or static context object |
| for schema filename/line number). |
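
One possible shape for such an expanded error struct is sketched here.
All names (`SketchError`, `SketchPosition`, `SketchSchemaLoc`, `sketch_make_error`) are hypothetical
and are not runtime2's actual definitions;
the sketch only assumes that the data position lives in a pstate/ustate-like object
and that the schema location lives in an ERD-like static object.

```c
#include <stddef.h>
#include <stdint.h>

// Hypothetical names throughout; this is not runtime2's actual error struct.

// Data position, as it might be held in a pstate/ustate
typedef struct SketchPosition
{
    size_t bitPos0b; // zero-based bit position in the data stream
} SketchPosition;

// Static schema context, as it might be held in an ERD
typedef struct SketchSchemaLoc
{
    const char *schemaFile; // schema filename
    int         lineNumber; // line number within the schema
} SketchSchemaLoc;

// Error struct carrying a code plus the two pointers discussed above
typedef struct SketchError
{
    int                    code;     // numeric error code
    int64_t                arg;      // optional detail, e.g. the bad value read
    const SketchPosition  *position; // where in the data the error occurred
    const SketchSchemaLoc *schema;   // which schema component was being processed
} SketchError;

// Build an error; the caller keeps ownership of both pointed-to objects
static SketchError
sketch_make_error(int code, int64_t arg, const SketchPosition *position,
                  const SketchSchemaLoc *schema)
{
    SketchError error = {code, arg, position, schema};
    return error;
}
```

Storing pointers rather than copies keeps the struct small,
at the cost of requiring the pointed-to objects to outlive the error.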
| |
| We also may want to implement error logging variants |
| that both do and don't humanize the errors, |
| e.g., a hardware/FPGA-type implementation might just output numbers |
| and an external tool might have to "humanize" these numbers |
| using knowledge of the schema and runtime data objects, |
| like an offline log processor does. |
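
The two logging variants could look roughly like this sketch.
The record layout, the error codes, and the function names are all assumptions made for illustration;
the "humanized" variant stands in for what an offline tool might do
with a table of element names derived from the schema.

```c
#include <assert.h>
#include <stdio.h>

// Hypothetical error codes and record layout; not runtime2's actual definitions
enum SketchCode
{
    SKETCH_EOF = 1,
    SKETCH_BAD_BOOL = 2
};

typedef struct SketchRecord
{
    int  code;     // numeric error code
    long bytePos;  // byte offset in the data
    int  erdIndex; // index into a table of schema components
} SketchRecord;

// Raw variant: numbers only, cheap enough for a hardware-like implementation
static int sketch_log_raw(char *out, size_t size, const SketchRecord *r)
{
    assert(out != NULL && r != NULL);
    return snprintf(out, size, "%d %ld %d", r->code, r->bytePos, r->erdIndex);
}

// Humanized variant: turns the same numbers into a readable message
static int sketch_log_human(char *out, size_t size, const SketchRecord *r,
                            const char *const *erdNames)
{
    assert(out != NULL && r != NULL && erdNames != NULL);
    const char *what = "unknown error";
    if (r->code == SKETCH_EOF) what = "unexpected end of data";
    if (r->code == SKETCH_BAD_BOOL) what = "value is neither true nor false";
    return snprintf(out, size, "%s at byte %ld in element '%s'", what,
                    r->bytePos, erdNames[r->erdIndex]);
}
```

Both functions take the same record,
so the raw output can be logged on the device and humanized later elsewhere.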
| |
| === Recovering after errors |
| |
| As we continue to build out runtime2, |
| we may need to distinguish more types of errors |
| and allow backtracking and retrying. |
| Right now we handle only parse/unparse and |
| validation errors in limited ways. |
| Parse/unparse errors abort the parsing/unparsing |
| and return to the caller immediately |
| without resetting the stream's position. |
| Validation errors are collected in an array |
| and printed after parsing or unparsing. |
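
Collecting validation errors in an array and printing them afterwards
can be sketched as follows.
The capacity, struct layouts, and function names are hypothetical
and only illustrate the "collect now, print later" pattern described above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

#define SKETCH_LIMIT 4 // hypothetical capacity; fixed so no heap is needed

// Hypothetical validation error record
typedef struct SketchValidationError
{
    int         code;        // numeric error code
    const char *elementName; // element that failed validation
} SketchValidationError;

typedef struct SketchDiagnostics
{
    SketchValidationError errors[SKETCH_LIMIT];
    size_t                length; // number of errors collected so far
} SketchDiagnostics;

// Record a validation error and let processing continue;
// returns 1 if stored, 0 if the array is already full
static int sketch_add_error(SketchDiagnostics *d, int code, const char *elementName)
{
    assert(d != NULL);
    if (d->length >= SKETCH_LIMIT) return 0;
    d->errors[d->length].code = code;
    d->errors[d->length].elementName = elementName;
    d->length++;
    return 1;
}

// Print everything collected once parsing or unparsing has finished;
// returns the number of errors printed
static size_t sketch_print_errors(const SketchDiagnostics *d, FILE *out)
{
    assert(d != NULL && out != NULL);
    for (size_t i = 0; i < d->length; i++)
    {
        fprintf(out, "validation error %d in %s\n", d->errors[i].code,
                d->errors[i].elementName);
    }
    return d->length;
}
```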
| The only places where there are calls to stop the program |
| are in daffodil_main.c (top-level error handling) |
| and stack.c (empty, overflow, underflow errors which should never happen). |
| |
| Most of the parse functions set pstate->error |
| only if they couldn't read data into their buffer |
| due to an I/O error or EOF, |
| which doesn't seem recoverable to me. |
| Likewise, the unparse functions set ustate->error |
| only if they couldn't write data from their buffer |
| due to an I/O error, which doesn't seem recoverable to me. |
| |
| The parse_endian_bool functions are the only ones |
| which also set pstate->error for a different reason: |
| they read an integer which doesn't match either true_rep or false_rep |
| when an exact match to either is required. |
| If we decide to implement backtracking and retrying, |
| they should call fseek to reset the stream's position |
| back to where they started reading the integer |
| before they return to their callers. |
| Right now all parse calls are followed by |
| if statements to check for error and return immediately. |
| The code generator would have to generate code |
| which can advance the stream's position by one or more bytes |
| and try the parse call again |
| in an attempt to resynchronize with a correct data stream |
| after repeated failures. |
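
A minimal sketch of both ideas, assuming a stdio `FILE` stream and a one-byte boolean:
the parse function rewinds with `fseek` on a mismatch,
and a retry loop skips one byte at a time until a parse succeeds.
The function names and the `maxSkip` limit are hypothetical,
and this is not what the code generator emits today.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

// Hypothetical: parse a one-byte boolean requiring an exact match.
// On a mismatch (or EOF), restore the stream position to where reading began.
static bool sketch_parse_bool8(FILE *stream, uint8_t true_rep, uint8_t false_rep,
                               bool *value)
{
    long start = ftell(stream);
    int  byte = fgetc(stream);
    if (byte == true_rep)
    {
        *value = true;
        return true;
    }
    if (byte == false_rep)
    {
        *value = false;
        return true;
    }
    if (start >= 0) fseek(stream, start, SEEK_SET); // backtrack
    return false;
}

// Hypothetical resynchronization: retry the parse, skipping one byte after
// each failure, for at most maxSkip skipped bytes.
// Returns the number of bytes skipped, or -1 if no parse succeeded.
static long sketch_resync_bool8(FILE *stream, uint8_t true_rep, uint8_t false_rep,
                                long maxSkip, bool *value)
{
    for (long skipped = 0; skipped <= maxSkip; skipped++)
    {
        if (sketch_parse_bool8(stream, true_rep, false_rep, value)) return skipped;
        if (fgetc(stream) == EOF) break; // advance past the bad byte
    }
    return -1;
}
```

Because the failed parse leaves the position unchanged,
the caller is free to decide how far to advance before retrying.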
| |
| Note that we actually run the generated code on an embedded processor |
| and call our own fread/fwrite functions |
| which replace the stdio fread/fwrite functions |
| since the C code runs on bare metal without OS functions. |
| We can implement fseek too, but we should have a good use case first. |
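
For illustration only, a memory-backed replacement for `fread`
plus a minimal absolute seek might look like the sketch below.
`SketchStream`, `sketch_fread`, and `sketch_fseek_set` are hypothetical names;
the real replacement functions on the embedded target are not shown in this document.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

// Hypothetical memory-backed stream standing in for FILE on bare metal
typedef struct SketchStream
{
    const unsigned char *data; // start of the input buffer
    size_t               size; // total bytes available
    size_t               pos;  // current read position
} SketchStream;

// Same calling shape as fread: returns the number of complete items read.
// Unlike stdio, a trailing partial item is not consumed.
static size_t sketch_fread(void *ptr, size_t size, size_t nitems, SketchStream *s)
{
    assert(ptr != NULL && s != NULL && s->pos <= s->size);
    if (size == 0 || nitems == 0) return 0;
    size_t available = (s->size - s->pos) / size;
    size_t count = nitems < available ? nitems : available;
    memcpy(ptr, s->data + s->pos, count * size);
    s->pos += count * size;
    return count;
}

// Minimal seek to an absolute offset; returns 0 on success, -1 on failure
static int sketch_fseek_set(SketchStream *s, size_t offset)
{
    assert(s != NULL);
    if (offset > s->size) return -1;
    s->pos = offset;
    return 0;
}
```

With a stream like this, seeking is just an assignment,
so the cost of supporting backtracking would mostly be in the generated code rather than in the I/O layer.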
| |
| === Javadoc-like tool for C code |
| |
| We should consider adopting |
| one of the javadoc-like tools for C code |
| and structuring our comments that way. |
| |
| === DSOM "fixed" getter |
| |
| Note: If we change runtime1 to validate "fixed" values |
| like runtime2 does, then we can resolve {% jira 117 %}. |
| |
| === Update to TDML Runner |
| |
| We want to update the TDML Runner |
| to make it easier to run TDML tests |
| with both runtime1 and runtime2. |
| We want to eliminate the need |
| to configure a `daf:tdmlImplementation` tunable |
| in the TDML test, which takes 12 lines of code. |
| The TDML Runner should configure itself |
| to run runtime1, runtime2, or both |
| just from seeing a root attribute |
| saying `defaultImplementations="daffodil runtime2"` |
| or a parserTestCase/unparserTestCase attribute saying `implementations="runtime2"`. |
| Maybe we also want to add an implementation attribute |
| to tdml:errors/warnings elements |
| saying which implementation they are for. |
| If we do that, |
| we should tell the TDML Runner |
| that runtime2 tests are not cross tests |
| so it will check their errors/warnings. |
| |
| === C struct/field name collisions |
| |
| To avoid possible name collisions, |
| we should prepend struct names and field names with namespace prefixes |
| if their infoset elements have non-null namespace prefixes. |
| Alternatively, we may need to use enclosing elements' names |
| as prefixes to avoid name collisions without namespaces. |
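
As an illustration of the first option,
suppose two elements are both named `header` but carry the hypothetical namespace prefixes `a` and `b`.
Generated code with prefixed struct and field names might then look like this sketch
(the element names, prefixes, and fields are invented for the example):

```c
#include <stdint.h>

// Hypothetical generated code: both elements are named "header" in the schema,
// and the namespace prefixes "a" and "b" keep their C names distinct.
typedef struct a_header
{
    int32_t a_length;
} a_header;

typedef struct b_header
{
    int16_t b_version;
} b_header;

// Parent element holding both children without a name collision
typedef struct message
{
    a_header a_header;
    b_header b_header;
} message;
```

Without the prefixes, both typedefs and both member fields would be named `header`
and the generated code would not compile.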
| |
| === Anonymous/multiple choice groups |
| |
| We already handle elements having xs:choice complex types. |
| In addition, we should support anonymous/multiple choice groups. |
| We may need to refine the choice runtime structure |
| in order to allow multiple choice groups |
| to be inlined into parent elements. |
| Here is an example schema |
| and corresponding C code to demonstrate: |
| |
| [source,xml] |
| ---- |
| <xs:complexType name="NestedUnionType"> |
|   <xs:sequence> |
|     <xs:element name="first_tag" type="idl:int32"/> |
|     <xs:choice dfdl:choiceDispatchKey="{xs:string(./first_tag)}"> |
|       <xs:element name="foo" type="idl:FooType" dfdl:choiceBranchKey="1 2"/> |
|       <xs:element name="bar" type="idl:BarType" dfdl:choiceBranchKey="3 4"/> |
|     </xs:choice> |
|     <xs:element name="second_tag" type="idl:int32"/> |
|     <xs:choice dfdl:choiceDispatchKey="{xs:string(./second_tag)}"> |
|       <xs:element name="fie" type="idl:FieType" dfdl:choiceBranchKey="1"/> |
|       <xs:element name="fum" type="idl:FumType" dfdl:choiceBranchKey="2"/> |
|     </xs:choice> |
|   </xs:sequence> |
| </xs:complexType> |
| ---- |
| |
| [source,c] |
| ---- |
| typedef struct NestedUnion |
| { |
|     InfosetBase _base; |
|     int32_t     first_tag; |
|     size_t      _choice_1; // choice of which union field to use |
|     union |
|     { |
|         foo foo; |
|         bar bar; |
|     }; |
|     int32_t     second_tag; |
|     size_t      _choice_2; // choice of which union field to use |
|     union |
|     { |
|         fie fie; |
|         fum fum; |
|     }; |
| } NestedUnion; |
| ---- |
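
To make the role of `_choice_1` and `_choice_2` concrete,
here is a hedged sketch of how each tag could be mapped to a union field index
using the branch keys from the schema above.
The function names and the `SKETCH_NO_CHOICE` marker are hypothetical;
the indices simply follow the order of the fields in each union.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_NO_CHOICE ((size_t)-1) // hypothetical "no branch matched" marker

// First choice group: branch keys "1 2" select foo, "3 4" select bar
static size_t sketch_choice_1(int32_t first_tag)
{
    switch (first_tag)
    {
    case 1:
    case 2:
        return 0; // foo
    case 3:
    case 4:
        return 1; // bar
    default:
        return SKETCH_NO_CHOICE;
    }
}

// Second choice group: branch key "1" selects fie, "2" selects fum
static size_t sketch_choice_2(int32_t second_tag)
{
    switch (second_tag)
    {
    case 1:
        return 0; // fie
    case 2:
        return 1; // fum
    default:
        return SKETCH_NO_CHOICE;
    }
}
```

Keeping one index per choice group is what lets two groups coexist inline in the same parent struct.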
| |
| === Choice dispatch key expressions |
| |
| We currently support only a very restricted |
| and simple subset of choice dispatch key expressions. |
| We would like to refactor the DPath expression compiler |
| and make it generate C code |
| in order to support arbitrary choice dispatch key expressions. |
| |
| === No match between choice dispatch key and choice branch keys |
| |
| Right now c-daffodil is stricter than scala-daffodil |
| when unparsing infoset XML files with no matches (or mismatches) |
| between choice dispatch keys and branch keys. |
| Perhaps c-daffodil should load such an XML file |
| and unparse the infoset to a binary data file |
| without raising a no match processing error at either step. |
| We would have to code and call a choice branch resolver in C |
| which peeks at the next XML element, |
| figures out which branch inside the choice group |
| that element indicates, |
| and initializes the choice and element runtime data |
| (_choice and childNode->erd member fields) accordingly. |
| We probably would replace the initChoice() call in walkInfosetNode() |
| with a call to that choice branch resolver |
| and we might not need to call initChoice() in unparseSelf(). |
| When I called initChoice() in all these parse, walk, and unparse places, |
| I was pondering removing the _choice member field |
| and calling initChoice() as a function |
| to tell us which element to visit next, |
| but we probably should have a mutable choice runtime data structure |
| that applications can override if they want to. |
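
The core of such a choice branch resolver could be as small as the sketch below,
which assumes the name of the next XML element has already been peeked
and that the branch element names are available as an array.
`sketch_resolve_branch` and `SKETCH_NO_BRANCH` are hypothetical names,
and wiring the result into `_choice` and `childNode->erd` is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_NO_BRANCH ((size_t)-1) // hypothetical "no branch matched" marker

// Hypothetical resolver: returns the index of the branch whose element name
// matches the name of the next XML element, or SKETCH_NO_BRANCH if none does
static size_t sketch_resolve_branch(const char *const *branchNames,
                                    size_t numBranches,
                                    const char *nextElementName)
{
    assert(branchNames != NULL && nextElementName != NULL);
    for (size_t i = 0; i < numBranches; i++)
    {
        if (strcmp(branchNames[i], nextElementName) == 0) return i;
    }
    return SKETCH_NO_BRANCH;
}
```

Resolving by element name rather than by dispatch key is what would let the unparser accept an infoset
whose dispatch key does not match any branch key.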
| |
| === Floating point numbers |
| |
| Right now runtime2 prints floating point numbers |
| in XML infosets slightly differently than runtime1 does. |
| This means we may need to use different XML infosets |
| in TDML tests depending on the runtime implementation. |
| In order to use the same XML infoset in TDML tests, |
| we should make the TDML Runner |
| compare floating point numbers numerically, not textually, |
| as discussed in https://issues.apache.org/jira/browse/DAFFODIL-2402[DAFFODIL-2402]. |
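
The TDML Runner is not written in C,
but the idea of comparing numerically rather than textually can be sketched in C
to stay consistent with the other examples on this page.
This sketch assumes both values are `xs:float` text and uses the standard `strtof`;
the function name is hypothetical.

```c
#include <stdbool.h>
#include <stdlib.h>

// Hypothetical comparison of two infoset text values as 32-bit floats.
// Returns true if both parse completely and denote the same float value.
static bool sketch_same_float(const char *expected, const char *actual)
{
    char *endExpected = NULL;
    char *endActual = NULL;
    float x = strtof(expected, &endExpected);
    float y = strtof(actual, &endActual);

    // Reject text that is empty or has trailing non-numeric characters
    if (endExpected == expected || *endExpected != '\0') return false;
    if (endActual == actual || *endActual != '\0') return false;

    // NaN never compares equal to itself, so treat two NaNs as a match
    bool xIsNaN = (x != x);
    bool yIsNaN = (y != y);
    if (xIsNaN || yIsNaN) return xIsNaN && yIsNaN;

    return x == y;
}
```

With this approach `1.0E10` and `1e+10` compare as equal even though their text differs.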
| |
| === Arrays |
| |
| Instead of expanding arrays inline within childrenERDs, |
| we may want to store a single entry |
| for an array in childrenERDs |
| giving the array's offset and the total size of its elements. |
| We would have to write code |
| for special case treatment of array member fields |
| versus scalar member fields |
| but we could save space/memory in childrenERDs |
| for use cases with very large arrays. |
| An array element's ERD should have minOccurs and maxOccurs |
| where minOccurs is unsigned |
| and maxOccurs is signed with -1 meaning "unbounded". |
| The actual number of children in an array instance |
| would have to be stored with the array instance |
| in the C struct or the ERD. |
| An array node has to be a different kind of infoset node |
| with a place for this number of actual children to be stored. |
| Probably all ERDs should just get minOccurs and maxOccurs |
| and a scalar is just one with 1, 1 as those values, |
| an optional element is 0, 1, |
| and an array is all other legal combinations |
| like N, -1 and N, M with N<=M. |
| A restriction that minOccurs is 0, 1, |
| or equal to maxOccurs (which is not -1) |
| is acceptable. |
| A restriction that maxOccurs is 1, -1, |
| or equal to minOccurs |
| is also fine |
| (it means variable-length arrays always have an unbounded number of elements). |
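
The minOccurs/maxOccurs scheme described above can be sketched as follows,
with minOccurs unsigned and maxOccurs signed with -1 meaning "unbounded".
The type and function names are hypothetical,
and the sketch does not enforce the optional restrictions mentioned at the end of this section.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_UNBOUNDED (-1) // maxOccurs value meaning "unbounded"

// Hypothetical occurrence fields that every ERD could carry
typedef struct SketchOccurs
{
    uint32_t minOccurs;
    int32_t  maxOccurs;
} SketchOccurs;

typedef enum SketchKind
{
    SKETCH_SCALAR,   // 1, 1
    SKETCH_OPTIONAL, // 0, 1
    SKETCH_ARRAY,    // any other consistent combination
    SKETCH_INVALID   // maxOccurs < -1, or minOccurs > maxOccurs
} SketchKind;

// Classify an element from its occurrence bounds
static SketchKind sketch_kind(SketchOccurs o)
{
    if (o.maxOccurs < SKETCH_UNBOUNDED) return SKETCH_INVALID;
    if (o.maxOccurs != SKETCH_UNBOUNDED && (int64_t)o.minOccurs > o.maxOccurs)
    {
        return SKETCH_INVALID;
    }
    if (o.minOccurs == 1 && o.maxOccurs == 1) return SKETCH_SCALAR;
    if (o.minOccurs == 0 && o.maxOccurs == 1) return SKETCH_OPTIONAL;
    return SKETCH_ARRAY;
}

// Check the actual number of children stored with an array instance
static bool sketch_count_ok(SketchOccurs o, size_t count)
{
    if (count < o.minOccurs) return false;
    if (o.maxOccurs == SKETCH_UNBOUNDED) return true;
    return o.maxOccurs >= 0 && count <= (size_t)o.maxOccurs;
}
```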
| |
| === Daffodil module/subdirectory names |
| |
| When Daffodil is ready to move from a 3.x to a 4.x release, |
| we should rename the modules to have shorter and easier-to-understand names |
| as discussed in https://issues.apache.org/jira/browse/DAFFODIL-2406[DAFFODIL-2406]. |