| :py:mod:`airflow.providers.apache.beam.hooks.beam` |
| ================================================== |
| |
| .. py:module:: airflow.providers.apache.beam.hooks.beam |
| |
| .. autoapi-nested-parse:: |
| |
| This module contains a Apache Beam Hook. |
| |
| |
| |
| Module Contents |
| --------------- |
| |
| Classes |
| ~~~~~~~ |
| |
| .. autoapisummary:: |
| |
| airflow.providers.apache.beam.hooks.beam.BeamRunnerType |
| airflow.providers.apache.beam.hooks.beam.BeamCommandRunner |
| airflow.providers.apache.beam.hooks.beam.BeamHook |
| |
| |
| |
| Functions |
| ~~~~~~~~~ |
| |
| .. autoapisummary:: |
| |
| airflow.providers.apache.beam.hooks.beam.beam_options_to_args |
| |
| |
| |
| .. py:class:: BeamRunnerType |
| |
| Helper class for listing runner types. |
| For more information about runners see: |
| https://beam.apache.org/documentation/ |
| |
| .. py:attribute:: DataflowRunner |
| :annotation: = DataflowRunner |
| |
| |
| |
| .. py:attribute:: DirectRunner |
| :annotation: = DirectRunner |
| |
| |
| |
| .. py:attribute:: SparkRunner |
| :annotation: = SparkRunner |
| |
| |
| |
| .. py:attribute:: FlinkRunner |
| :annotation: = FlinkRunner |
| |
| |
| |
| .. py:attribute:: SamzaRunner |
| :annotation: = SamzaRunner |
| |
| |
| |
| .. py:attribute:: NemoRunner |
| :annotation: = NemoRunner |
| |
| |
| |
| .. py:attribute:: JetRunner |
| :annotation: = JetRunner |
| |
| |
| |
| .. py:attribute:: Twister2Runner |
| :annotation: = Twister2Runner |
| |
| |
| |
| |
| .. py:function:: beam_options_to_args(options) |
| |
| Returns a formatted pipeline options from a dictionary of arguments |
| |
| The logic of this method should be compatible with Apache Beam: |
| https://github.com/apache/beam/blob/b56740f0e8cd80c2873412847d0b336837429fb9/sdks/python/ |
| apache_beam/options/pipeline_options.py#L230-L251 |
| |
| :param options: Dictionary with options |
| :return: List of arguments |
| :rtype: List[str] |
| |
| |
| .. py:class:: BeamCommandRunner(cmd, process_line_callback = None, working_directory = None) |
| |
| Bases: :py:obj:`airflow.utils.log.logging_mixin.LoggingMixin` |
| |
| Class responsible for running pipeline command in subprocess |
| |
| :param cmd: Parts of the command to be run in subprocess |
| :param process_line_callback: Optional callback which can be used to process |
| stdout and stderr to detect job id |
| :param working_directory: Working directory |
| |
| .. py:method:: wait_for_done() |
| |
| Waits for Apache Beam pipeline to complete. |
| |
| |
| |
| .. py:class:: BeamHook(runner) |
| |
| Bases: :py:obj:`airflow.hooks.base.BaseHook` |
| |
| Hook for Apache Beam. |
| |
| All the methods in the hook where project_id is used must be called with |
| keyword arguments rather than positional. |
| |
| :param runner: Runner type |
| |
| .. py:method:: start_python_pipeline(variables, py_file, py_options, py_interpreter = 'python3', py_requirements = None, py_system_site_packages = False, process_line_callback = None) |
| |
| Starts Apache Beam python pipeline. |
| |
| :param variables: Variables passed to the pipeline. |
| :param py_file: Path to the python file to execute. |
| :param py_options: Additional options. |
| :param py_interpreter: Python version of the Apache Beam pipeline. |
| If None, this defaults to the python3. |
| To track python versions supported by beam and related |
| issues check: https://issues.apache.org/jira/browse/BEAM-1251 |
| :param py_requirements: Additional python package(s) to install. |
| If a value is passed to this parameter, a new virtual environment has been created with |
| additional packages installed. |
| |
| You could also install the apache-beam package if it is not installed on your system or you want |
| to use a different version. |
| :param py_system_site_packages: Whether to include system_site_packages in your virtualenv. |
| See virtualenv documentation for more information. |
| |
| This option is only relevant if the ``py_requirements`` parameter is not None. |
| :param process_line_callback: (optional) Callback that can be used to process each line of |
| the stdout and stderr file descriptors. |
| |
| |
| .. py:method:: start_java_pipeline(variables, jar, job_class = None, process_line_callback = None) |
| |
| Starts Apache Beam Java pipeline. |
| |
| :param variables: Variables passed to the job. |
| :param jar: Name of the jar for the pipeline |
| :param job_class: Name of the java class for the pipeline. |
| :param process_line_callback: (optional) Callback that can be used to process each line of |
| the stdout and stderr file descriptors. |
| |
| |
| .. py:method:: start_go_pipeline(variables, go_file, process_line_callback = None, should_init_module = False) |
| |
| Starts Apache Beam Go pipeline. |
| |
| :param variables: Variables passed to the job. |
| :param go_file: Path to the Go file with your beam pipeline. |
| :param go_file: |
| :param process_line_callback: (optional) Callback that can be used to process each line of |
| the stdout and stderr file descriptors. |
| :param should_init_module: If False (default), will just execute a `go run` command. If True, will |
| init a module and dependencies with a ``go mod init`` and ``go mod tidy``, useful when pulling |
| source with GCSHook. |
| :return: |
| |
| |
| |