AVRO-2906: Traversal validation (#936)

* AVRO-2906: Convert validation to a traversal-based approach

Use schema-type specific iterators and validators to allow a
breadth-first traversal of a full schema, validating each node
as you go.

The benefit of this approach is that it lets us pinpoint the
specific part of the schema that failed validation. Previously,
the error message for a large schema printed the entire datum and
the full schema and said "this is not that". The new approach
prints only the specific sub-schema that failed, which makes the
errors more informative.

A second improvement is that by traversing the schema instead of
processing it recursively, the algorithm uses system resources
more efficiently. This matters most for schemas that have many
nested parts.
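
The traversal idea can be sketched roughly as follows. This is a
minimal illustration, not the code in this commit: the `Node` tuple,
the `find_invalid_node` function, the path strings, and the reduced
set of supported types are all assumptions made for the example, and
it is written for Python 3 only.

```python
from collections import deque, namedtuple

# Hypothetical node type: a sub-schema, the datum it must match,
# and a path string used to report where validation failed.
Node = namedtuple("Node", ["schema", "datum", "path"])


def find_invalid_node(schema, datum):
    """Breadth-first walk; return the first failing Node, or None."""
    queue = deque([Node(schema, datum, "$")])
    while queue:
        node = queue.popleft()
        # Accept both "int" and {"type": "int"} spellings.
        sub = node.schema
        if isinstance(sub, str):
            sub = {"type": sub}
        kind = sub["type"]
        if kind == "int":
            # bool is a subclass of int, so exclude it explicitly.
            if not isinstance(node.datum, int) or isinstance(node.datum, bool):
                return node
        elif kind == "string":
            if not isinstance(node.datum, str):
                return node
        elif kind == "array":
            if not isinstance(node.datum, list):
                return node
            # Queue the children instead of recursing into them.
            for i, item in enumerate(node.datum):
                queue.append(Node(sub["items"], item, "%s[%d]" % (node.path, i)))
        elif kind == "record":
            if not isinstance(node.datum, dict):
                return node
            for field in sub["fields"]:
                queue.append(Node(field["type"],
                                  node.datum.get(field["name"]),
                                  node.path + "." + field["name"]))
        else:
            raise ValueError("type not covered by this sketch: %r" % (kind,))
    return None
```

Because each node carries its own sub-schema and path, a failure can
be reported against just that node rather than the whole schema and
datum.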

Make the required changes to pass tests in all supported Python versions.

This commit removes type hints present in the first commit in order to
allow using the code in older Python versions.

In addition:
  * the use of `str` has been replaced by the compatible `unicode`.
  * the ValidationNode namedtuple has been re-expressed in syntax available
    in all supported Python versions.
  * the use of a custom InvalidEvent exception has been replaced by using
    AvroTypeException.
  * all specific single-type validators have been replaced by partials of
    _validate_type with a tuple of one or more type objects.

Fix typos and raise StopIteration as suggested in code review

Move the responsibility for validation to the Schema class.

Each schema subclass will be responsible for its own validation. This
simplifies the structure of io.py, removes the dict lookup of validators,
and reduces somewhat the repetition that was in io.py.
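
The shape of that change might look something like the sketch below.
The class names, the `validate` signature, and the boolean return
value are assumptions chosen for the example and are not taken from
the Avro code base. Each `validate` checks only its own node; visiting
children is left to the traversal.

```python
class Schema(object):
    """Hypothetical base class: subclasses supply their own check."""

    def validate(self, datum):
        raise NotImplementedError("subclasses must implement validate")


class IntSchema(Schema):
    def validate(self, datum):
        # bool is a subclass of int, so exclude it explicitly.
        return isinstance(datum, int) and not isinstance(datum, bool)


class ArraySchema(Schema):
    def __init__(self, items):
        self.items = items  # schema that each element must satisfy

    def validate(self, datum):
        # Only the container is checked here, not its elements.
        return isinstance(datum, list)
```

With this layout the caller invokes `schema.validate(datum)` directly
and no longer needs a dict that maps schema types to validator
functions.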

Move validators to a class attribute and update method code.

This makes things look a little bit cleaner than having the validators right in the midst of the method.

Add arg spec docs to docstring for base Schema class.

Clean up mistakes.

* Fix a docstring to be a more accurate statement of reality.
* Remove an unused import.
* Remove extra blank lines.
2 files changed
tree: d6123b121ba7135f0bae697e37e59acda8a4566e
  1. .github/
  2. .travis/
  3. doc/
  4. lang/
  5. share/
  6. .asf.yaml
  7. .editorconfig
  8. .gitignore
  9. .travis.yml
  10. .yamllint.yml
  11. BUILD.md
  12. build.sh
  13. composer.json
  14. DIST_README.txt
  15. LICENSE.txt
  16. NOTICE.txt
  17. pom.xml
  18. README.md
README.md

Apache Avro™

Apache Avro™ is a data serialization system.

To learn more about Avro, please visit our website at:

https://avro.apache.org/

To contribute to Avro, please read:

https://cwiki.apache.org/confluence/display/AVRO/How+To+Contribute