blob: ce9b538d9d55a503a362d6aed780144aba930b48 [file] [log] [blame]
:mod:`airflow.providers.apache.beam.hooks.beam`
===============================================
.. py:module:: airflow.providers.apache.beam.hooks.beam
.. autoapi-nested-parse::
This module contains a Apache Beam Hook.
Module Contents
---------------
.. py:class:: BeamRunnerType
Helper class for listing runner types.
For more information about runners see:
https://beam.apache.org/documentation/
.. attribute:: DataflowRunner
:annotation: = DataflowRunner
.. attribute:: DirectRunner
:annotation: = DirectRunner
.. attribute:: SparkRunner
:annotation: = SparkRunner
.. attribute:: FlinkRunner
:annotation: = FlinkRunner
.. attribute:: SamzaRunner
:annotation: = SamzaRunner
.. attribute:: NemoRunner
:annotation: = NemoRunner
.. attribute:: JetRunner
:annotation: = JetRunner
.. attribute:: Twister2Runner
:annotation: = Twister2Runner
.. function:: beam_options_to_args(options: dict) -> List[str]
Returns a formatted pipeline options from a dictionary of arguments
The logic of this method should be compatible with Apache Beam:
https://github.com/apache/beam/blob/b56740f0e8cd80c2873412847d0b336837429fb9/sdks/python/
apache_beam/options/pipeline_options.py#L230-L251
:param options: Dictionary with options
:type options: dict
:return: List of arguments
:rtype: List[str]
.. py:class:: BeamCommandRunner(cmd: List[str], process_line_callback: Optional[Callable[[str], None]] = None)
Bases: :class:`airflow.utils.log.logging_mixin.LoggingMixin`
Class responsible for running pipeline command in subprocess
:param cmd: Parts of the command to be run in subprocess
:type cmd: List[str]
:param process_line_callback: Optional callback which can be used to process
stdout and stderr to detect job id
:type process_line_callback: Optional[Callable[[str], None]]
.. method:: _process_fd(self, fd)
Prints output to logs.
:param fd: File descriptor.
.. method:: wait_for_done(self)
Waits for Apache Beam pipeline to complete.
.. py:class:: BeamHook(runner: str)
Bases: :class:`airflow.hooks.base.BaseHook`
Hook for Apache Beam.
All the methods in the hook where project_id is used must be called with
keyword arguments rather than positional.
:param runner: Runner type
:type runner: str
.. method:: _start_pipeline(self, variables: dict, command_prefix: List[str], process_line_callback: Optional[Callable[[str], None]] = None)
.. method:: start_python_pipeline(self, variables: dict, py_file: str, py_options: List[str], py_interpreter: str = 'python3', py_requirements: Optional[List[str]] = None, py_system_site_packages: bool = False, process_line_callback: Optional[Callable[[str], None]] = None)
Starts Apache Beam python pipeline.
:param variables: Variables passed to the pipeline.
:type variables: Dict
:param py_options: Additional options.
:type py_options: List[str]
:param py_interpreter: Python version of the Apache Beam pipeline.
If None, this defaults to the python3.
To track python versions supported by beam and related
issues check: https://issues.apache.org/jira/browse/BEAM-1251
:type py_interpreter: str
:param py_requirements: Additional python package(s) to install.
If a value is passed to this parameter, a new virtual environment has been created with
additional packages installed.
You could also install the apache-beam package if it is not installed on your system or you want
to use a different version.
:type py_requirements: List[str]
:param py_system_site_packages: Whether to include system_site_packages in your virtualenv.
See virtualenv documentation for more information.
This option is only relevant if the ``py_requirements`` parameter is not None.
:type py_system_site_packages: bool
:param on_new_job_id_callback: Callback called when the job ID is known.
:type on_new_job_id_callback: callable
.. method:: start_java_pipeline(self, variables: dict, jar: str, job_class: Optional[str] = None, process_line_callback: Optional[Callable[[str], None]] = None)
Starts Apache Beam Java pipeline.
:param variables: Variables passed to the job.
:type variables: dict
:param jar: Name of the jar for the pipeline
:type job_class: str
:param job_class: Name of the java class for the pipeline.
:type job_class: str