blob: e17a08538364ecba30b91b2a140d236985d2db33 [file] [log] [blame]
:py:mod:`airflow.providers.apache.beam.hooks.beam`
==================================================
.. py:module:: airflow.providers.apache.beam.hooks.beam
.. autoapi-nested-parse::
This module contains a Apache Beam Hook.
Module Contents
---------------
Classes
~~~~~~~
.. autoapisummary::
airflow.providers.apache.beam.hooks.beam.BeamRunnerType
airflow.providers.apache.beam.hooks.beam.BeamCommandRunner
airflow.providers.apache.beam.hooks.beam.BeamHook
Functions
~~~~~~~~~
.. autoapisummary::
airflow.providers.apache.beam.hooks.beam.beam_options_to_args
.. py:class:: BeamRunnerType
Helper class for listing runner types.
For more information about runners see:
https://beam.apache.org/documentation/
.. py:attribute:: DataflowRunner
:annotation: = DataflowRunner
.. py:attribute:: DirectRunner
:annotation: = DirectRunner
.. py:attribute:: SparkRunner
:annotation: = SparkRunner
.. py:attribute:: FlinkRunner
:annotation: = FlinkRunner
.. py:attribute:: SamzaRunner
:annotation: = SamzaRunner
.. py:attribute:: NemoRunner
:annotation: = NemoRunner
.. py:attribute:: JetRunner
:annotation: = JetRunner
.. py:attribute:: Twister2Runner
:annotation: = Twister2Runner
.. py:function:: beam_options_to_args(options)
Returns a formatted pipeline options from a dictionary of arguments
The logic of this method should be compatible with Apache Beam:
https://github.com/apache/beam/blob/b56740f0e8cd80c2873412847d0b336837429fb9/sdks/python/
apache_beam/options/pipeline_options.py#L230-L251
:param options: Dictionary with options
:return: List of arguments
:rtype: List[str]
.. py:class:: BeamCommandRunner(cmd, process_line_callback = None, working_directory = None)
Bases: :py:obj:`airflow.utils.log.logging_mixin.LoggingMixin`
Class responsible for running pipeline command in subprocess
:param cmd: Parts of the command to be run in subprocess
:param process_line_callback: Optional callback which can be used to process
stdout and stderr to detect job id
:param working_directory: Working directory
.. py:method:: wait_for_done()
Waits for Apache Beam pipeline to complete.
.. py:class:: BeamHook(runner)
Bases: :py:obj:`airflow.hooks.base.BaseHook`
Hook for Apache Beam.
All the methods in the hook where project_id is used must be called with
keyword arguments rather than positional.
:param runner: Runner type
.. py:method:: start_python_pipeline(variables, py_file, py_options, py_interpreter = 'python3', py_requirements = None, py_system_site_packages = False, process_line_callback = None)
Starts Apache Beam python pipeline.
:param variables: Variables passed to the pipeline.
:param py_file: Path to the python file to execute.
:param py_options: Additional options.
:param py_interpreter: Python version of the Apache Beam pipeline.
If None, this defaults to the python3.
To track python versions supported by beam and related
issues check: https://issues.apache.org/jira/browse/BEAM-1251
:param py_requirements: Additional python package(s) to install.
If a value is passed to this parameter, a new virtual environment has been created with
additional packages installed.
You could also install the apache-beam package if it is not installed on your system or you want
to use a different version.
:param py_system_site_packages: Whether to include system_site_packages in your virtualenv.
See virtualenv documentation for more information.
This option is only relevant if the ``py_requirements`` parameter is not None.
:param process_line_callback: (optional) Callback that can be used to process each line of
the stdout and stderr file descriptors.
.. py:method:: start_java_pipeline(variables, jar, job_class = None, process_line_callback = None)
Starts Apache Beam Java pipeline.
:param variables: Variables passed to the job.
:param jar: Name of the jar for the pipeline
:param job_class: Name of the java class for the pipeline.
:param process_line_callback: (optional) Callback that can be used to process each line of
the stdout and stderr file descriptors.
.. py:method:: start_go_pipeline(variables, go_file, process_line_callback = None, should_init_module = False)
Starts Apache Beam Go pipeline.
:param variables: Variables passed to the job.
:param go_file: Path to the Go file with your beam pipeline.
:param go_file:
:param process_line_callback: (optional) Callback that can be used to process each line of
the stdout and stderr file descriptors.
:param should_init_module: If False (default), will just execute a `go run` command. If True, will
init a module and dependencies with a ``go mod init`` and ``go mod tidy``, useful when pulling
source with GCSHook.
:return: