
Beam YAML API

The Beam YAML API provides a simple declarative syntax for describing pipelines that requires neither coding experience nor learning an SDK; any text editor will do. Some installation may be required to actually execute a pipeline, but we envision various services (such as Dataflow) accepting YAML pipelines directly, obviating the need for even that in the future. We also anticipate the ability to generate code directly from these higher-level YAML descriptions, should one want to graduate to a full Beam SDK (and, as far as possible, to convert in the other direction as well).

Though we intend this syntax to be easily authored (and read) directly by humans, it may also prove a useful intermediate representation for tools, either as output (e.g. a pipeline authoring GUI) or as input (e.g. a lineage analysis tool). We expect it to be more easily manipulated and more semantically meaningful than the Beam protos themselves (which concern themselves more with execution).

More details

User-facing documentation for Beam YAML has moved to the main Beam site at https://beam.apache.org/documentation/sdks/yaml/

For information about contributing to Beam YAML see https://docs.google.com/document/d/19zswPXxxBxlAUmswYPUtSc-IVAu1qWvpjo1ZSDMRbu0

Integration Tests

The integration_tests.py module dynamically creates test methods from the YAML files in the tests and extended_tests directories; each generated test runs the corresponding pipeline. It also contains context managers for setting up test environments for both precommit tests (the tests folder) and postcommit tests (the extended_tests folder).
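As a rough illustration of the dynamic test creation described above, the following standard-library sketch builds one unittest.TestCase subclass per YAML file in a directory. It is not the actual integration_tests.py implementation: the helper names make_test_class and load_test_classes are hypothetical, the class-naming rule (file name without extension plus "Test") is assumed from the pytest selector shown in this README, and the test body is a placeholder that only checks the file is non-empty instead of running a pipeline.

```python
import pathlib
import unittest


def make_test_class(yaml_path):
    """Build a TestCase subclass with a single test method for yaml_path.

    Hypothetical helper; the real module may structure this differently.
    """
    def test_pipeline(self):
        # Placeholder check only. A real suite would parse the YAML file
        # and run the pipeline it describes.
        self.assertTrue(yaml_path.read_text().strip())

    # Assumed naming rule: <yaml file name without extension> + "Test".
    class_name = yaml_path.stem + "Test"
    return type(
        class_name, (unittest.TestCase,), {"test_pipeline": test_pipeline})


def load_test_classes(directory):
    """Return {class_name: TestCase subclass} for each *.yaml in directory."""
    classes = {}
    for yaml_path in sorted(pathlib.Path(directory).glob("*.yaml")):
        cls = make_test_class(yaml_path)
        classes[cls.__name__] = cls
    return classes
```

In a test module, a call such as globals().update(load_test_classes("tests")) at import time would expose the generated classes at module level, where pytest collects unittest.TestCase subclasses and lets them be selected by class name.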

To run the precommit tests:

pytest -v integration_tests.py

or

pytest -v integration_tests.py::<yaml_file_name_without_extension>Test

To run some of the postcommit tests, for example:

pytest -v integration_tests.py --test_files_dir="extended_tests/messaging"