| :page-layout: page |
| :keywords: unparser InfosetInputter NextElementResolver Streaming |
| // /////////////////////////////////////////////////////////////////////////// |
| // |
| // This file is written in AsciiDoc. |
| // |
| // If you can read this comment, your browser is not rendering asciidoc automatically. |
| // |
| // You need to install the asciidoc plugin to Chrome or Firefox |
| // so that this page will be properly rendered for your viewing pleasure. |
| // |
| // You can get the plugins by searching the web for 'asciidoc plugin' |
| // |
| // You will want to change plugin settings to enable diagrams (they're off by default.) |
| // |
| // You need to view this page with Chrome or Firefox. |
| // |
| // /////////////////////////////////////////////////////////////////////////// |
| // |
| // When editing, please start each sentence on a new line. |
| // See https://asciidoctor.org/docs/asciidoc-recommended-practices/#one-sentence-per-line[one sentence-per-line writing technique.] |
| // This makes textual diffs of this file useful in a similar way to the way they work for code. |
| // |
| // ////////////////////////////////////////////////////////////////////////// |
| |
| == Infoset Inputters and Runtime 1 Streaming Unparser & NextElementResolver |
| |
| == Introduction |
| |
| Describes the way the streaming unparser in Runtime 1 works, in particular how |
| the unparser resolves the next element - identifying which ERD (Element Runtime Data) |
| corresponds to the incoming infoset event. |
| |
| == Design Principles |
| |
| Daffodil Runtime 1 keeps track of the nest of dynamic context terms at runtime on a stack which |
| is maintained by the Unparsers. |
| |
| === Schema Compilation |
| |
| At schema compilation time, we compute, for each Term, the set of possible next |
| elements that can follow it, but only as far as the end of the _lexically |
| enclosing_ model group. |
| |
| This is done in the possibleNextLexicalSiblingElementsInInfoset member of |
| the Term class. |
| |
| From this information, we compute a element resolver that handles only this |
| partial resolution, which is an instance of class PartialNextElementResolver. |
| This is associated with every Term and carried fro runtime on the TermRuntimeData |
| objects. |
| |
| This is done in the partialNextElementResolver member of the TermRuntime1Mixin, |
| as the PartialNextElementResolver is a Runtime 1 data structure/object. |
| This object is serialized as part of the TermRuntimeData. |
| |
| The fact that this is computed using only information going out to the end of the |
| single lexically enclosing model-group avoids a combinatorial |
| explosion in the schema compiler. |
| |
| === Unparser Runtime |
| |
| At runtime for the Unparser, element resolving is done by the InfosetInputter. |
| |
| Complete/total element name resolution is done by combining the partial |
| next element resolvers from the dynamic nest of terms. |
| This is done with a stack in the UState. |
| Specifically, the stack, called `trdStack` is state of the InfosetInputter. |
| Unparsers are burdened with having to push/pop TermRuntimeData objects to maintain this stack. |
| |
| The algorithm which iterates down the stack entries to carry out a next element |
| resolution for the InfosetInputter, is carried on the NextElementResolver trait, |
| which is mixed into the InfosetInputter trait, so is shared by all InfosetInputter |
| derived classes. |
| Note that no state is shared. |
| The stack is private to a given instance of an InfosetInputter class, |
| so is private per-thread state just like the rest of the UState. |
| |
| This design incurs runtime overhead to push/pop TRDs on the stack, |
| and next element resolution is no longer just a single hash-table lookup, but |
| potentially an iteration through multiple such lookups. |
| |
| === Schema Compiler Support for Dynamic Context Stack |
| |
| Now we focus in on just Term, ModelGroup, and ElementBase. |
| The diagram below shows the Term class, with the possibleNextLexicalSiblingElementsInInfoset |
| member. |
| All Terms have as a member their (1-based) position within the enclosing |
| lexical model group. |
| |
| The set of possible elements that can follow a given Term can be closed, |
| as when a required element is found downstream. If all downstream elements |
| in the enclosing lexical model group are optional, then the set of possible |
| elements is open ended. The Open/Closed status of the list is |
| signified by a PossibleNextElements case class object correspondingly. |
| |
| One final detail is that when compiling the set of possible next elements for |
| a choice, there can be elements on branches that are required, but in |
| the context of that choice, if any branch has all optional content, then |
| all the possible next elements for all branches are optional. |
| |
| The Term itself, if it is an ElementBase, has a isRequiredInInfoset member, which is determined |
| based on properties. |
| When combined in a choice, we need a way to override |
| this, and we encapsulate the ElementBase in a Sibling case class object which |
| carries its own isRequiredInInfoset member which can be set to false to |
| override the underlying ElementBase's isRequiredInInfoset due to that |
| corresponding element appearing in a choice branch. |
| |
| [plantuml, format="png"] |
| .... |
| hide empty members |
| |
| package core <<rectangle>> { |
| |
| class TermRuntime1Mixin << trait >> { |
| partialNextElementResolver : PartialNextElementResolver |
| termRuntimeData : TermRuntimeData |
| } |
| |
| class TermGrammarMixin << trait >> |
| |
| TermGrammarMixin -up-|> TermRuntime1Mixin |
| |
| abstract class Term { |
| {field} position : Int |
| {field} possibleNextLexicalSiblingElementsInInfoset : PossibleNextElements |
| } |
| Term -up-|> TermGrammarMixin |
| |
| abstract class ModelGroup |
| |
| Term <|-- ElementBase |
| Term <|-- ModelGroup |
| |
| |
| ModelGroup "1" --> "*" Term : group members |
| ModelGroup "0..1" <-- "1" ElementBase : model group |
| |
| class Sibling { |
| e: ElementBase |
| isRequiredInInfoset : Boolean |
| } |
| |
| abstract class PossibleNextElements { |
| sibs : Seq[Sibling] |
| } |
| |
| Term "1" *-- "1" PossibleNextElements : possibles > |
| PossibleNextElements <|-- Closed |
| PossibleNextElements <|-- Open |
| Sibling "*" <-right- "1" PossibleNextElements : sibs < |
| |
| |
| ElementBase "1" <-left- "1" Sibling : e < |
| |
| } |
| |
| package runtime_1 <<rectangle>> { |
| |
| abstract class TermRuntimeData{ |
| partialNextElementResolver : PartialNextElementResolver |
| } |
| |
| abstract class PartialNextElementResolver { |
| maybeNextElement : Maybe[ElementRuntimeData] |
| } |
| TermRuntimeData -down-> PartialNextElementResolver |
| |
| TermRuntimeData <-right- TermRuntime1Mixin : creates < |
| |
| PossibleNextElements .right. PartialNextElementResolver : compiles into > |
| |
| } |
| |
| .... |
| |
| The set of possible subsequent ElementBase objects that can |
| follow another is computed inductively starting from the last element |
| or group within a sequence. |
| It is limited to the length of the current sequence, since sequences can be |
| shared. |
| |
| (Note that unshared i.e., local sequences could in principle be |
| special cased - collapsed together - for purposes of this analysis. |
| We assume here that we will treat all sequence groups as if they were |
| shared via group refs.) |
| |
| === Runtime Support for Dynamic Context Stack |
| |
| Below we show the runtime 1 objects that implement the next element resolution |
| for the streaming unparser. |
| |
| The Unparser has a UState, which has an InfosetInputter. |
| The InfosetInputter trait mixes in the implementation of NextElementResolver |
| which provides the push/pop API for maintaining the TRD stack, and it |
| provides the nextElement method which does complete next element resolving, chaining |
| through the PartialNextElementResolvers obtained from the stack of TRDs. |
| |
| [plantuml, format="png"] |
| .... |
| hide empty members |
| |
| package runtime_1 <<rectangle>> { |
| |
| abstract class PartialNextElementResolver { |
| maybeNextElement : Maybe[ElementRuntimeData] |
| } |
| |
| abstract class NextElementResolver <<trait>> { |
| def nextElement(name, ns, useNS) : ElementRuntimeData |
| def pushTRD(trd) |
| def popTRD() |
| } |
| |
| NextElementResolver -right-> TermRuntimeData : has stack of |
| |
| abstract class InfosetInputter |
| |
| |
| abstract class TermRuntimeData |
| |
| TermRuntimeData *-down- "1" PartialNextElementResolver : has |
| |
| abstract class UStateMain{ |
| pushTRD(trd) |
| popTRD() |
| } |
| |
| abstract class Unparser { |
| pushTRD(trd) |
| popTRD() |
| } |
| |
| Unparser -down-> UStateMain : push/pop TRDs > |
| |
| UStateMain -right-> InfosetInputter : push/pop TRDs > |
| |
| InfosetInputter -up-|> NextElementResolver : is a |
| |
| |
| } |
| .... |
| |
| Any given PartialNextElementResolver can resolve a name (+ namespace when used) |
| to an ERD, or it can not be able to perform the resolution, which is not |
| an error. |
| |
| As a whole the next-element resoluiton algorithm requires that the unparser |
| maintain the stack of TermRuntimeData objects and the resolution |
| algorithm works down the stack using the partial resolver from the most- |
| recently pushed runtime data object first. |
| If that does not resolve the element it either fails - because the partial |
| resolver knows that the incoming element event *must* be one of the possibilities |
| it represents (Closed set of possibles), or it moves to the next deeper runtime-data |
| object on the stack having a partial next element resolver (for Open set of possibles), |
| and tries that. |
| This continues until it succeeds, or until an ERD is found on the stack, at |
| which point the resolution fails and an unparse error (fatal) is issued. |
| |
| Note that the context of next-element-resolution cannot span the boundary of a |
| complex element. |
| This is because an end-element event must be received before any subsequent |
| element start events. |
| |
| This dynamic-context TRD stack need not be copied to UStates for Suspensions as |
| those only occur after Infoset elements have been created. |
| |
| === Some interactions, or non-interactions with other aspects of the Unparser |
| |
| * Orthogonal to suspensions - next element resolution is over before a |
| suspension is constructed for the element. |
| |
| ==== Interaction with ChoiceBranchMap |
| |
| Next element resolving for the infoset inputter is orthogonal to |
| choiceBranchMap - next element resolution must be done |
| first, and the result of it is used by the choiceBranchMap to choose |
| which branch of the choice is implied by arrival of that element. |
| |
| === Testing and Design for Test |
| |
| * Schema Compiler |
| ** Unit tests in scala code for schema compiler methods that compute the _possibles_ object. |
| These should cover various combinations of sequences/choices with required |
| and optional elements and groups shared using group references. |
| ** Instrumentation in the schema compiler to measure the amount of work going |
| on so that combinatorial explosions are detected earlier. |
| This should output a report of the metrics at the end of compilation, or perhaps |
| incrementally as compilation proceeds. |
| |
| * Runtime 1 |
| ** Faking TRDs so as to test XML InfosetOutputters in unit testing. Only the |
| partialNextElementResolver member of the TRD is needed. |
| It may be possible to use |
| the schema compiler to create the appropriate scenario of TRDs and ERDs and |
| then the InfosetOutputter push(trd) method can be called to create a stack |
| that matches a test scenario. This allows testing the nextElement resolver |
| algorithm in isolation from the push/pop logic that Unparsers must maintain. |
| ** Unparsers must push/pop appropriately. Possibly built in checking can be |
| put in place that detects when a pop is missing? This is TBD as the push/pop |
| may or may not be able to be centralized in the code base. |
| |
| |
| === Summary of Features |
| |
| * Dynamic context stack (stack of RuntimeData) in UState |
| * Unparsers maintain dynamic context stack, pushing and popping TRDs as model-groups are processed. |
| * Stack algorithm for NextElementResolver runs down stack trying partial resolutions |
| * Algorithms on Term to determine sets of possibly following infoset events. |
| |
| == Appendix: Review of DSOM Design |
| |
| === DSOM Term Classes |
| |
| In Daffodil, DSOM now enables sharing of ModelGroup members, which are instances |
| of classes derived from Term. |
| |
| The Term class is central to Daffodil because Terms are the entities that can |
| can resolve scoped properties by combining those from an element ref or group ref, to |
| and element decl or group def, from element decl to type def, etc. |
| Term is the start point for the chains of non-default property providers, |
| and for the chains of default property providers, from which scoped properties are resolved. |
| |
| For review, a DSOM class diagram showing the Term class hierarchy is below. |
| |
| [plantuml, format="png"] |
| .... |
| hide empty members |
| abstract class Term |
| |
| together { |
| abstract class SequenceTermBase |
| abstract class ChoiceTermBase |
| abstract class ElementBase |
| } |
| |
| abstract class ModelGroup |
| |
| Term <|-- ElementBase |
| Term <|-- ModelGroup |
| ModelGroup <|-- SequenceTermBase |
| ModelGroup <|-- ChoiceTermBase |
| |
| abstract class LocalElementDeclBase |
| ElementBase <|-- LocalElementDeclBase |
| LocalElementDeclBase <|-- LocalElementDecl |
| abstract class AbstractElementRef |
| ElementBase <|-- AbstractElementRef |
| AbstractElementRef <|-- Root |
| AbstractElementRef <|-- ElementRef |
| |
| together { |
| class LocalElementDecl |
| class ElementRef |
| |
| class Sequence |
| class SequenceGroupRef |
| class Choice |
| class ChoiceGroupRef |
| } |
| together { |
| class Root |
| class PrefixLengthQuasiElementDecl |
| class RepTypeQuasiElementDecl |
| class ChoiceBranchImpliedSequence |
| } |
| abstract class QuasiElementDeclBase |
| LocalElementDeclBase <|-- QuasiElementDeclBase |
| QuasiElementDeclBase <|-- PrefixLengthQuasiElementDecl |
| QuasiElementDeclBase <|-- RepTypeQuasiElementDecl |
| |
| abstract class SequenceGroupTermBase |
| SequenceTermBase <|-- SequenceGroupTermBase |
| SequenceTermBase <|-- ChoiceBranchImpliedSequence |
| SequenceGroupTermBase <|-- Sequence |
| SequenceGroupTermBase <|-- SequenceGroupRef |
| |
| abstract class ChoiceGroupTermBase |
| ChoiceTermBase <|-- ChoiceGroupTermBase |
| ChoiceGroupTermBase <|-- Choice |
| ChoiceGroupTermBase <|-- ChoiceGroupRef |
| |
| ModelGroup "1" ..> "*" Term : group members |
| ModelGroup "0..1" <-- "1" ElementBase : model group |
| |
| .... |
| |
| The above reflects the status quo of objects. |
| The concrete classes are all at the bottom (marked with C). |
| |
| Most of organizing principles depend on just the core abstract classes. |
| Here's the same diagram, but focusing in on just the core abstract classes. |
| |
| [plantuml, format="png"] |
| .... |
| hide empty members |
| abstract class Term |
| |
| together { |
| abstract class SequenceTermBase |
| abstract class ChoiceTermBase |
| abstract class ElementBase |
| } |
| |
| abstract class ModelGroup |
| abstract class GroupDefLike <<trait>> |
| abstract class GroupRef <<trait>> |
| |
| Term <|-- ElementBase |
| Term <|-up- ModelGroup |
| ModelGroup <|-- SequenceTermBase |
| ModelGroup <|-- ChoiceTermBase |
| GroupDefLike <|.. ModelGroup : some are |
| GroupRef <|.. ModelGroup : some are |
| GroupRef -left-> GroupDefLike : group ref |
| |
| GroupDefLike "1" -up-> "*" Term : group members |
| ModelGroup "0..1" <-- "1" ElementBase : model group |
| |
| .... |
| |
| The dashed lines from ModelGroup are to illustrate that some ModelGroup classes |
| inherit from GroupDefLike, as they define lexically, a surround of their group |
| members. |
| Other ModelGroup classes inherit GroupRef (SequenceGroupRef and ChoiceGroupRef) |
| which share the group definition. |
| Ignoring the Root element for a moment, every Term has a lexically enclosing |
| ModelGroup which inherits from GroupDefLike. This is the mechanism by which |
| ModelGroups and their contents can be shared in multiple distinct contexts. |
| |
| With those classes in mind, we can look into the changes to Daffodil |
| to support a _dynamic context stack_. |
| |
| CAUTION: Some code in the schema compiler has moved around to prepare for an era where |
| Daffodil supports more than one backend/runtime system. |
| See the <<Appendix: Schema Compiler grammar and runtime1 Packages,Appendix: Schema Compiler grammar and runtime1 Packages>> |
| |
| |
| == Appendix: Schema Compiler grammar and runtime1 Packages |
| |
| The basic DSOM object classes are composed by mixin with components coming |
| from different packages that separate concerns. |
| |
| Some basic non-function-changing refactoring begins the process of breaking out |
| runtime1-specific code and separating that from code that will be in common |
| to all runtime backends. |
| |
| The primary packages in the schema compiler (daffodil-core) are: |
| |
| * dsom - The Daffodil Schema Object Model - object corresponding to the parts |
| of a DFDL schema. Handles properties, property scoping, include/import, |
| namespaces, and all XML-schema-related functionality. |
| * grammar - Introduces combinators and primitives of the data syntax grammar. |
| Supposed to contain all backend/runtime independent logic. |
| * runtime1 - Logic (mixins, methods, members, classes, etc.) specific to the |
| _Runtime 1_ backend. |
| For example, this includes calculations that create the data structures |
| needed to support streaming unparsing, a complex behavior that other Daffodil |
| runtime backends are unlikely to implement. |
| |
| Some primary classes and mixins from these packages are shown here: |
| |
| [plantuml, format="png"] |
| .... |
| hide empty members |
| package core <<rectangle>> { |
| |
| package grammar { |
| |
| abstract class Gram <<trait>> { |
| } |
| |
| abstract class TermGrammarMixin <<trait>> { |
| def gram : Gram |
| } |
| abstract class ModelGroupGrammarMixin <<trait>> |
| abstract class ElementBaseGrammarMixin <<trait>> |
| abstract class SequenceGrammarMixin <<trait>> |
| abstract class ChoiceGrammarMixin <<trait>> |
| } |
| |
| note bottom of grammar |
| Contains Combinators and Primitives that |
| represent the grammar regions that make up |
| the data syntax grammar. |
| |
| Optimizes this in a runtime-backend independent |
| manner. |
| end note |
| |
| package runtime1 { |
| |
| abstract class GramRuntime1Mixin <<trait>> { |
| def parser(): Parser |
| def unparser(): Unparser |
| } |
| Gram -up-|> GramRuntime1Mixin |
| abstract class TermRuntime1Mixin <<trait>> { |
| def termRuntimeData |
| } |
| TermGrammarMixin -up-> TermRuntime1Mixin |
| abstract class ModelGroupRuntime1Mixin <<trait>> { |
| def termRuntimeData = modelGroupRuntimeData |
| def modelGroupRuntimeData |
| } |
| ModelGroupGrammarMixin -up-> ModelGroupRuntime1Mixin |
| abstract class ElementBaseRuntime1Mixin <<trait>> { |
| def termRuntimeData = elementRuntimeData |
| lazy val elementRuntimeData |
| } |
| ElementBaseGrammarMixin -up-> ElementBaseRuntime1Mixin |
| abstract class SequenceRuntime1Mixin <<trait>> { |
| def modelGroupRuntimeData = sequenceRuntimeData |
| lazy val sequenceRuntimeData |
| } |
| SequenceGrammarMixin -up-> SequenceRuntime1Mixin |
| abstract class ChoiceRuntime1Mixin <<trait>> { |
| def modelGroupRuntimeData = choiceRuntimeData |
| lazy val choiceRuntimeData |
| } |
| ChoiceGrammarMixin -up-> ChoiceRuntime1Mixin |
| |
| } |
| |
| note bottom of runtime1 |
| Contains mixins with methods and members |
| specific to the Runtime 1 backend. |
| end note |
| |
| package dsom { |
| abstract class Term |
| Term -up-|> TermGrammarMixin |
| Term -up-> Gram : gram |
| |
| together { |
| abstract class SequenceTermBase |
| abstract class ChoiceTermBase |
| abstract class ElementBase |
| } |
| |
| abstract class ModelGroup |
| ModelGroup -up-|> ModelGroupGrammarMixin |
| ElementBase -up-|> ElementBaseGrammarMixin |
| SequenceTermBase -up-> SequenceGrammarMixin |
| ChoiceTermBase -up-|> ChoiceGrammarMixin |
| |
| Term <|-- ElementBase |
| Term <|-- ModelGroup |
| ModelGroup <|-- SequenceTermBase |
| ModelGroup <|-- ChoiceTermBase |
| |
| |
| ModelGroup "1" --> "*" Term : group members |
| ModelGroup "0..1" <-- "1" ElementBase : model group |
| } |
| } |
| .... |
| |
| CAUTION: The grammar.primitives package has not yet been refactored to break |
| runtime1-specific methods out into runtime1-package mixins. |
| Each Primitive or |
| Combinator should mixin a trait for each runtime. |
| |
| CAUTION: DPath expressions are another whole area that are not as yet |
| refactored to separate specific runtime 1 functionality from general |
| functionality. |
| |
| CAUTION: Eventually this mixin of runtime should be unmixed and converted |
| into a delegation structure so that one can choose a runtime instead of |
| having all runtimes mixed in. E.g., today every runtime has to use |
| different method names. |
| |
| == Appendix: An Aside about a Recursive Future |
| |
| In a future version of DFDL we plan to allow recursive definitions. |
| It is theoretically possible to define a recursive structure using |
| only reusable groups: |
| |
| <xs:element name="r"> |
| <xs:complexType> |
| <xs:group ref="g"/> |
| </xs:complexType> |
| </xs:eleemnt> |
| |
| <xs:group name="g"> |
| <xs:choice> |
| <xs:sequence> |
| <xs:sequence> |
| <xs:annotation><xs:appinfo source="http://www.ogf.org/dfdl/"> |
| <dfdl:assert testKind="pattern" testPattern="."/><!-- there is more data --> |
| </xs:appinfo></xs:annotation> |
| </xs:sequence> |
| <xs:element name="x" type="xs:string" dfdl:length="1" dfdl:lengthKind="explicit"/> |
| <xs:group ref="g"/> |
| </xs:sequence> |
| <xs:sequence/> |
| </xs:choice> |
| <xs:group> |
| |
| The above example simulates an array using group recursion. |
| |
| CAUTION: It is not clear that even if DFDL allows recursion it would allow it on |
| groups alone. |
| Requiring recursion to utilize elements would seem to be consistent with DFDL's |
| current restriction which requires repeating/optional things to be elements. |
| |
| CAUTION: Furthermore, it is not clear if the above is allowed in XML Schema. |
| XML Schema's UPA rules may in fact require elements to be used in the |
| formulation of recursive structure. |
| |
| Recursion requires evolution of the current Daffodil architecture to one that |
| enables sharing of definitions of groups, elements and types. |
| |
| == Notes on Rationale and Evolution from Earlier Design Points |
| |
| This section describes the implementation prior to the fix to Daffodil-2192, and |
| how it evolved. The changes from the prior design were significant. |
| |
| The code design is somewhat simpler in that prior code has two different kinds of |
| next element resolvers, one for next-siblings, and one for first children of |
| an element of complex type. |
| This is simplified in the current design. There is only one PartialNextElementResolver |
| per TermRuntimeData object. |
| |
| Note that the prior NextElementResolver class went away and |
| its functionality appears in the PartialNextElementResolver to |
| reflect that the class itself implements only |
| part of the algorithm. |
| |
| === Prior Behavior (Flawed) Part 1 |
| |
| Currently, the InfosetInputter architecture is dependent on being able to |
| take an element name (+ namespace when used), have as a context a current |
| element ElementRuntimeData object, and then given a hash-table which is |
| stored on the element ERD, lookup the name and namespace (when used) to |
| get the next element ERD. |
| This lookup hash table is encapsulated in the NextElementResolver classes. |
| At schema compile time, given each element ERD, it is precomputed what the |
| possible valid set of subsequent following ERDs can be (including the current |
| one again, in case of an array). |
| This set of possibilities is captured in an instance of the NextElementResolver |
| class, and is stored on each ERD. |
| |
| This process makes unparsing a self-validating kind of processing. |
| As an external representation of the Infoset is converted into an |
| actual stream of Infoset events, each incoming element event is |
| scrutinized against the schema for whether it is allowed. |
| This scrutiny is built into the fact that the incoming element |
| name+namespace is being looked up in the NextElementResolver, and then |
| once each element is created, a new NextElementResolver object is |
| taken from its ERD for the next infoset event lookup. |
| |
| An invariant (which remains true in current behavior and in proposed new |
| changes) is that the next element resolution process cannot cross a complex |
| -type element boundary, as an end-element event ends the scope of the possible |
| legal subsequent elements that can arrive next. |
| However, within a complex type, there can be many groups and group references. |
| |
| This architecture depends on the fact that the ElementRuntimeData (ERD) objects |
| are unique per element, and per element context - that is, they are depending |
| on the fact that element ERDs are not shared and so they uniquely identify both |
| the element, and the context of that element (nest of all enclosing groups). |
| |
| As part of the changes to fix our schema compiler speed/space issues |
| (DAFFODIL-1444), this assumption is no longer valid. |
| Elements (specifically DSOM ElementBase objects) can be shared. Groups |
| can be shared. |
| The expected improvement in schema compiler performance and the space |
| reduction is fundamentally driven from this sharing. |
| This sharing is also needed to implement a desirable future feature |
| which is to enable recursive definitions in DFDL. |
| |
| Ultimately, the required change is from one where an ERD is a unique |
| identifier of an element and all its dynamic context, to one where |
| because ERDs are shared, the Unparser runtime must maintain a stack of |
| the dynamic context information sufficient to perform next-element |
| resolution as each infoset event must be synthesized. |
| |
| See also <<Appendix: An Aside about a Recursive Future,Appendix: An Aside about a Recursive Future>> |
| |
| === Prior Behavior (Flawed) Part 2 |
| |
| As part of changes to _eventually_ improve schema compiler space and speed usage, |
| (DAFFODIL-1444) we introduced sharing information to the DSOM objects in the schema. |
| |
| The sharing information is present - every global schema component |
| object can examine every context in the schema from which it is shared. |
| |
| This unfortunately creates a situation where we get a combinatorial explosion |
| of computation. |
| The sharable objects, such as a group definition, are still today NOT being |
| shared, they are being copied. |
| Yet within each copy, each element in it behaves as if it was being shared. |
| So, for example, if you ask the last element of a group definition what possible |
| elements can follow it, it responds with a list of every element that can |
| follow for every context where the group is referenced. |
| What's worse, this same thing is calculated repeatedly since the sharable |
| group is, in fact, not being shared currently so there are many copies of |
| this final element to do this computation on. |
| |
| The result is something which grows in algorithmic complexity very fast. |
| If you have a very small schema or wait long enough, then compilation of the |
| schema should complete and be correct. |
| Hence, most all small tests work fine. |
| Larger schemas fail to finish compiling in any reasonable amount of time or |
| memory space. |