The Beam YAML API provides a simple declarative syntax for describing pipelines that does not require coding experience or learning how to use an SDK; any text editor will do. Some installation may be required to actually execute a pipeline, but we envision various services (such as Dataflow) accepting YAML pipelines directly, obviating the need for even that in the future. We also anticipate the ability to generate code directly from these higher-level YAML descriptions, should one want to graduate to a full Beam SDK (and possibly the other direction as well, as far as possible).
Though we intend this syntax to be easily authored (and read) directly by humans, it may also prove a useful intermediate representation for tools, either as output (e.g. from a pipeline authoring GUI) or as input (e.g. to a lineage analysis tool). We expect it to be more easily manipulated and more semantically meaningful than the Beam protos themselves (which concern themselves more with execution).
User-facing documentation for Beam YAML has moved to the main Beam site at https://beam.apache.org/documentation/sdks/yaml/
For information about contributing to Beam YAML see https://docs.google.com/document/d/19zswPXxxBxlAUmswYPUtSc-IVAu1qWvpjo1ZSDMRbu0
The integration_tests.py module dynamically creates test methods based on the YAML files provided in the tests and extended_tests directories and runs the corresponding pipelines. It also contains context managers for setting up test environments for both precommit tests (the tests folder) and postcommit tests (the extended_tests folder).
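As a rough illustration of the dynamic test creation described above, the sketch below builds one unittest.TestCase subclass per YAML file in a directory, named after the file with a Test suffix. It is a minimal sketch under stated assumptions, not the actual code in integration_tests.py: the helper names make_test and create_test_classes are hypothetical, and the test body only checks that the file is non-empty where a real harness would run the pipeline.

```python
import glob
import os
import unittest


def make_test(yaml_path):
    """Return a test method bound to one YAML file (hypothetical helper)."""
    def test(self):
        # Assumption: a real harness would parse the file and run the
        # pipeline here. This sketch only checks the file is non-empty.
        with open(yaml_path) as f:
            self.assertTrue(f.read().strip())
    return test


def create_test_classes(test_files_dir):
    """Build one TestCase subclass per *.yaml file in test_files_dir."""
    classes = {}
    for path in sorted(glob.glob(os.path.join(test_files_dir, '*.yaml'))):
        # File name without extension, plus a "Test" suffix.
        stem = os.path.splitext(os.path.basename(path))[0]
        class_name = stem + 'Test'
        classes[class_name] = type(
            class_name,
            (unittest.TestCase,),
            {'test_pipeline': make_test(path)})
    return classes
```

Calling globals().update(create_test_classes('tests')) at module level would expose the generated classes to a test runner such as pytest, so that each one can be selected by its class name.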
To run the precommit tests:
pytest -v integration_tests.py
or, for the tests generated from a single YAML file:
pytest -v integration_tests.py::<yaml_file_name_without_extension>Test
To run a subset of the postcommit tests, for example those under extended_tests/messaging:
pytest -v integration_tests.py --test_files_dir="extended_tests/messaging"