:mod:`airflow.providers.apache.beam.operators.beam`
===================================================
.. py:module:: airflow.providers.apache.beam.operators.beam
.. autoapi-nested-parse::
This module contains Apache Beam operators.
Module Contents
---------------
.. py:class:: BeamDataflowMixin
Helper class to store common, Dataflow-specific logic for both
:class:`~airflow.providers.apache.beam.operators.beam.BeamRunPythonPipelineOperator` and
:class:`~airflow.providers.apache.beam.operators.beam.BeamRunJavaPipelineOperator`.
.. attribute:: dataflow_hook
:annotation: :Optional[DataflowHook]
.. attribute:: dataflow_config
:annotation: :Optional[DataflowConfiguration]
.. method:: _set_dataflow(self, pipeline_options: dict, job_name_variable_key: Optional[str] = None)
.. method:: __set_dataflow_hook(self)
.. method:: __get_dataflow_job_name(self)
.. method:: __get_dataflow_pipeline_options(self, pipeline_options: dict, job_name: str, job_name_key: Optional[str] = None)
.. method:: __get_dataflow_process_callback(self)
.. py:class:: BeamRunPythonPipelineOperator(*, py_file: str, runner: str = 'DirectRunner', default_pipeline_options: Optional[dict] = None, pipeline_options: Optional[dict] = None, py_interpreter: str = 'python3', py_options: Optional[List[str]] = None, py_requirements: Optional[List[str]] = None, py_system_site_packages: bool = False, gcp_conn_id: str = 'google_cloud_default', delegate_to: Optional[str] = None, dataflow_config: Optional[Union[DataflowConfiguration, dict]] = None, **kwargs)
Bases: :class:`airflow.models.BaseOperator`, :class:`airflow.providers.apache.beam.operators.beam.BeamDataflowMixin`
Launching Apache Beam pipelines written in Python. Note that both
``default_pipeline_options`` and ``pipeline_options`` will be merged to specify pipeline
execution parameters, and ``default_pipeline_options`` is expected to hold
high-level options, for instance, project and zone information, which
apply to all Beam operators in the DAG.
.. seealso::
For more information on how to use this operator, take a look at the guide:
:ref:`howto/operator:BeamRunPythonPipelineOperator`
.. seealso::
For more detail on Apache Beam have a look at the reference:
https://beam.apache.org/documentation/
:param py_file: Reference to the python Apache Beam pipeline file.py, e.g.,
/some/local/file/path/to/your/python/pipeline/file. (templated)
:type py_file: str
:param runner: Runner on which the pipeline will be run. By default "DirectRunner" is used.
Other possible options: DataflowRunner, SparkRunner, FlinkRunner.
See: :class:`~providers.apache.beam.hooks.beam.BeamRunnerType`
See: https://beam.apache.org/documentation/runners/capability-matrix/
:type runner: str
:param py_options: Additional python options, e.g., ["-m", "-v"].
:type py_options: list[str]
:param default_pipeline_options: Map of default pipeline options.
:type default_pipeline_options: dict
:param pipeline_options: Map of pipeline options. The parameter must be a dictionary.
The values can be of different types (see the sketch after this parameter list):
* If the value is None, the single option ``--key`` (without a value) will be added.
* If the value is False, this option will be skipped.
* If the value is True, the single option ``--key`` (without a value) will be added.
* If the value is a list, the option will be repeated for each element.
If the value is ``['A', 'B']`` and the key is ``key`` then the options ``--key=A --key=B``
will be passed.
* Other value types will be replaced with their Python textual representation.
When defining labels (``labels`` option), you can also provide a dictionary.
:type pipeline_options: dict
:param py_interpreter: Python version of the beam pipeline.
If None, this defaults to ``python3``.
To track python versions supported by beam and related
issues check: https://issues.apache.org/jira/browse/BEAM-1251
:type py_interpreter: str
:param py_requirements: Additional python package(s) to install.
If a value is passed to this parameter, a new virtual environment will be created with the
additional packages installed.
You can also install the apache-beam package if it is not installed on your system, or if you
want to use a different version.
:type py_requirements: List[str]
:param py_system_site_packages: Whether to include system_site_packages in your virtualenv.
See virtualenv documentation for more information.
This option is only relevant if the ``py_requirements`` parameter is not None.
:type py_system_site_packages: bool
:param gcp_conn_id: Optional.
The connection ID to use when connecting to Google Cloud Storage if the Python file is on GCS.
:type gcp_conn_id: str
:param delegate_to: Optional.
The account to impersonate using domain-wide delegation of authority,
if any. For this to work, the service account making the request must have
domain-wide delegation enabled.
:type delegate_to: str
:param dataflow_config: Dataflow configuration, used when runner type is set to DataflowRunner
:type dataflow_config: Union[dict, providers.google.cloud.operators.dataflow.DataflowConfiguration]
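The ``pipeline_options`` mapping described above can be illustrated with a plain dictionary; the option names and values below are placeholders, and the comments show the command-line flags the operator would be expected to build from them:

.. code-block:: python

    # Illustrative only: placeholder option names/values and the flags they
    # translate to, following the rules listed above.
    pipeline_options = {
        "streaming": True,                    # -> --streaming (flag only)
        "no_use_public_ips": None,            # -> --no_use_public_ips (flag only)
        "save_main_session": False,           # -> skipped entirely
        "experiments": ["opt_a", "opt_b"],    # -> --experiments=opt_a --experiments=opt_b
        "temp_location": "gs://bucket/tmp/",  # -> --temp_location=gs://bucket/tmp/
        "labels": {"team": "data"},           # labels may also be given as a dictionary
    }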
.. attribute:: template_fields
:annotation: = ['py_file', 'runner', 'pipeline_options', 'default_pipeline_options', 'dataflow_config']
.. attribute:: template_fields_renderers
.. method:: execute(self, context)
Execute the Apache Beam Pipeline.
.. method:: on_kill(self)
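A minimal usage sketch of ``BeamRunPythonPipelineOperator`` inside a DAG; the file path, option values, and package pin are placeholders, not values required by the operator:

.. code-block:: python

    from airflow.providers.apache.beam.operators.beam import BeamRunPythonPipelineOperator

    # Placeholder paths and options; runs the pipeline locally with the DirectRunner.
    start_python_pipeline = BeamRunPythonPipelineOperator(
        task_id="start_python_pipeline",
        py_file="/files/pipelines/wordcount.py",  # local path or gs:// URI
        runner="DirectRunner",
        pipeline_options={"output": "/tmp/wordcount_output"},
        py_requirements=["apache-beam[gcp]==2.26.0"],  # triggers creation of a fresh virtualenv
        py_interpreter="python3",
        py_system_site_packages=False,
    )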
.. py:class:: BeamRunJavaPipelineOperator(*, jar: str, runner: str = 'DirectRunner', job_class: Optional[str] = None, default_pipeline_options: Optional[dict] = None, pipeline_options: Optional[dict] = None, gcp_conn_id: str = 'google_cloud_default', delegate_to: Optional[str] = None, dataflow_config: Optional[Union[DataflowConfiguration, dict]] = None, **kwargs)
Bases: :class:`airflow.models.BaseOperator`, :class:`airflow.providers.apache.beam.operators.beam.BeamDataflowMixin`
Launching Apache Beam pipelines written in Java.
Note that both
``default_pipeline_options`` and ``pipeline_options`` will be merged to specify pipeline
execution parameters, and ``default_pipeline_options`` is expected to hold
high-level pipeline options, for instance, project and zone information, which
apply to all Apache Beam operators in the DAG.
.. seealso::
For more information on how to use this operator, take a look at the guide:
:ref:`howto/operator:BeamRunJavaPipelineOperator`
.. seealso::
For more detail on Apache Beam have a look at the reference:
https://beam.apache.org/documentation/
You need to pass the path to your jar file as a file reference with the ``jar``
parameter; the jar needs to be a self-executing jar (see documentation here:
https://beam.apache.org/documentation/runners/dataflow/#self-executing-jar).
Use ``pipeline_options`` to pass pipeline options to your job.
:param jar: The reference to a self-executing Apache Beam jar (templated).
:type jar: str
:param runner: Runner on which the pipeline will be run. By default "DirectRunner" is used.
See:
https://beam.apache.org/documentation/runners/capability-matrix/
:type runner: str
:param job_class: The name of the Apache Beam pipeline class to be executed; it
is often not the main class configured in the pipeline jar file.
:type job_class: str
:param default_pipeline_options: Map of default pipeline options.
:type default_pipeline_options: dict
:param pipeline_options: Map of job-specific pipeline options. The parameter must be a dictionary.
The values can be of different types:
* If the value is None, the single option ``--key`` (without a value) will be added.
* If the value is False, this option will be skipped.
* If the value is True, the single option ``--key`` (without a value) will be added.
* If the value is a list, the option will be repeated for each element.
If the value is ``['A', 'B']`` and the key is ``key`` then the options ``--key=A --key=B``
will be passed.
* Other value types will be replaced with their Python textual representation.
When defining labels (``labels`` option), you can also provide a dictionary.
:type pipeline_options: dict
:param gcp_conn_id: The connection ID to use when connecting to Google Cloud Storage if the jar is on GCS.
:type gcp_conn_id: str
:param delegate_to: The account to impersonate using domain-wide delegation of authority,
if any. For this to work, the service account making the request must have
domain-wide delegation enabled.
:type delegate_to: str
:param dataflow_config: Dataflow configuration, used when runner type is set to DataflowRunner
:type dataflow_config: Union[dict, providers.google.cloud.operators.dataflow.DataflowConfiguration]
.. attribute:: template_fields
:annotation: = ['jar', 'runner', 'job_class', 'pipeline_options', 'default_pipeline_options', 'dataflow_config']
.. attribute:: template_fields_renderers
.. attribute:: ui_color
:annotation: = #0273d4
.. method:: execute(self, context)
Execute the Apache Beam Pipeline.
.. method:: on_kill(self)
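A minimal usage sketch of ``BeamRunJavaPipelineOperator`` targeting Dataflow; the jar URI, job class, and Dataflow settings are placeholders, and ``DataflowConfiguration`` is assumed to come from the Google provider module referenced above:

.. code-block:: python

    from airflow.providers.apache.beam.operators.beam import BeamRunJavaPipelineOperator
    from airflow.providers.google.cloud.operators.dataflow import DataflowConfiguration

    # Placeholder jar, job class, and Dataflow settings; the jar must be self-executing.
    start_java_pipeline = BeamRunJavaPipelineOperator(
        task_id="start_java_pipeline",
        jar="gs://my-bucket/pipelines/wordcount-bundled.jar",
        job_class="org.example.WordCount",
        runner="DataflowRunner",
        pipeline_options={"tempLocation": "gs://my-bucket/tmp/"},
        dataflow_config=DataflowConfiguration(job_name="java_wordcount", location="us-central1"),
    )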