flink-python/docs/user_guide/execution_mode.rst - flink - Git at Google

 .. ################################################################################
      Licensed to the Apache Software Foundation (ASF) under one
      or more contributor license agreements.  See the NOTICE file
      distributed with this work for additional information
      regarding copyright ownership.  The ASF licenses this file
      to you under the Apache License, Version 2.0 (the
      "License"); you may not use this file except in compliance
      with the License.  You may obtain a copy of the License at

          http://www.apache.org/licenses/LICENSE-2.0

      Unless required by applicable law or agreed to in writing, software
      distributed under the License is distributed on an "AS IS" BASIS,
      WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
      See the License for the specific language governing permissions and
     limitations under the License.
    ################################################################################

 ==============
 Execution Mode
 ==============

 The Python API supports different runtime execution modes from which you can choose depending on the
 requirements of your use case and the characteristics of your job. The Python runtime execution mode
 defines how the Python user-defined functions will be executed.

 Prior to release-1.15, there is the only execution mode called ``PROCESS`` execution mode. The ``PROCESS``
 mode means that the Python user-defined functions will be executed in separate Python processes.

 In release-1.15, it has introduced a new execution mode called ``THREAD`` execution mode. The ``THREAD``
 mode means that the Python user-defined functions will be executed in JVM.

 **NOTE:** Multiple Python user-defined functions running in the same JVM are still affected by GIL.

 When can/should I use THREAD execution mode?
 =============================================

 The purpose of the introduction of ``THREAD`` mode is to overcome the overhead of serialization/deserialization
 and network communication introduced of inter-process communication in the ``PROCESS`` mode.
 So if performance is not your concern, or the computing logic of your Python user-defined functions is the performance bottleneck of the job,
 ``PROCESS`` mode will be the best choice as ``PROCESS`` mode provides the best isolation compared to ``THREAD`` mode.

 Configuring Python execution mode
 ==================================

 The execution mode can be configured via the ``python.execution-mode`` setting.
 There are two possible values:

 - ``PROCESS``: The Python user-defined functions will be executed in separate Python process. (default)
 - ``THREAD``: The Python user-defined functions will be executed in JVM.

 You could specify the execution mode in Python Table API or Python DataStream API jobs as following:

 .. code-block:: python

     ## Python Table API
     # Specify `PROCESS` mode
     table_env.get_config().set("python.execution-mode", "process")

     # Specify `THREAD` mode
     table_env.get_config().set("python.execution-mode", "thread")


     ## Python DataStream API

     config = Configuration()

     # Specify `PROCESS` mode
     config.set_string("python.execution-mode", "process")

     # Specify `THREAD` mode
     config.set_string("python.execution-mode", "thread")

     # Create the corresponding StreamExecutionEnvironment
     env = StreamExecutionEnvironment.get_execution_environment(config)

 Supported Cases
 ===============

 Python Table API
 ----------------

 The following table shows where the ``THREAD`` execution mode is supported in Python Table API.

 .. list-table::
     :header-rows: 1

     * - UDFs
       - ``PROCESS``
       - ``THREAD``
     * - Python UDF
       - Yes
       - Yes
     * - Python UDTF
       - Yes
       - Yes
     * - Python UDAF
       - Yes
       - No
     * - Pandas UDF & Pandas UDAF
       - Yes
       - No

 Python DataStream API
 ---------------------

 The following table shows where the ``PROCESS`` execution mode and the ``THREAD`` execution mode are supported in Python DataStream API.

 .. list-table::
     :header-rows: 1

     * - Operators
       - ``PROCESS``
       - ``THREAD``
     * - Map
       - Yes
       - Yes
     * - FlatMap
       - Yes
       - Yes
     * - Filter
       - Yes
       - Yes
     * - Reduce
       - Yes
       - Yes
     * - Union
       - Yes
       - Yes
     * - Connect
       - Yes
       - Yes
     * - CoMap
       - Yes
       - Yes
     * - CoFlatMap
       - Yes
       - Yes
     * - Process Function
       - Yes
       - Yes
     * - Window Apply
       - Yes
       - Yes
     * - Window Aggregate
       - Yes
       - Yes
     * - Window Reduce
       - Yes
       - Yes
     * - Window Process
       - Yes
       - Yes
     * - Side Output
       - Yes
       - Yes
     * - State
       - Yes
       - Yes
     * - Iterate
       - No
       - No
     * - Window CoGroup
       - No
       - No
     * - Window Join
       - No
       - No
     * - Interval Join
       - No
       - No
     * - Async I/O
       - No
       - No

 .. note::
     Currently, it still doesn't support to execute Python UDFs in ``THREAD`` execution mode in all places.
     It will fall back to ``PROCESS`` execution mode in these cases. So it may happen that you configure a job
     to execute in ``THREAD`` execution mode, however, it's actually executed in ``PROCESS`` execution mode.

 Execution Behavior
 ==================

 This section provides an overview of the execution behavior of ``THREAD`` execution mode and contrasts
 they with ``PROCESS`` execution mode. For more details, please refer to the FLIP that introduced this feature:
 `FLIP-206 <https://cwiki.apache.org/confluence/display/FLINK/FLIP-206%3A+Support+PyFlink+Runtime+Execution+in+Thread+Mode>`_.

 PROCESS Execution Mode
 ----------------------

 In ``PROCESS`` execution mode, the Python user-defined functions will be executed in separate Python Worker process.
 The Java operator process communicates with the Python worker process using various Grpc services.

 THREAD Execution Mode
 ---------------------

 In ``THREAD`` execution mode, the Python user-defined functions will be executed in the same process
 as Java operators. PyFlink takes use of third part library `PEMJA <https://github.com/alibaba/pemja>`_
 to embed Python in Java Application.
	.. ################################################################################
	Licensed to the Apache Software Foundation (ASF) under one
	or more contributor license agreements. See the NOTICE file
	distributed with this work for additional information
	regarding copyright ownership. The ASF licenses this file
	to you under the Apache License, Version 2.0 (the
	"License"); you may not use this file except in compliance
	with the License. You may obtain a copy of the License at

	http://www.apache.org/licenses/LICENSE-2.0

	Unless required by applicable law or agreed to in writing, software
	distributed under the License is distributed on an "AS IS" BASIS,
	WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
	See the License for the specific language governing permissions and
	limitations under the License.
	################################################################################

	==============
	Execution Mode
	==============

	The Python API supports different runtime execution modes from which you can choose depending on the
	requirements of your use case and the characteristics of your job. The Python runtime execution mode
	defines how the Python user-defined functions will be executed.

	Prior to release-1.15, there is the only execution mode called ``PROCESS`` execution mode. The ``PROCESS``
	mode means that the Python user-defined functions will be executed in separate Python processes.

	In release-1.15, it has introduced a new execution mode called ``THREAD`` execution mode. The ``THREAD``
	mode means that the Python user-defined functions will be executed in JVM.

	NOTE: Multiple Python user-defined functions running in the same JVM are still affected by GIL.

	When can/should I use THREAD execution mode?
	=============================================

	The purpose of the introduction of ``THREAD`` mode is to overcome the overhead of serialization/deserialization
	and network communication introduced of inter-process communication in the ``PROCESS`` mode.
	So if performance is not your concern, or the computing logic of your Python user-defined functions is the performance bottleneck of the job,
	``PROCESS`` mode will be the best choice as ``PROCESS`` mode provides the best isolation compared to ``THREAD`` mode.

	Configuring Python execution mode
	==================================

	The execution mode can be configured via the ``python.execution-mode`` setting.
	There are two possible values:

	- ``PROCESS``: The Python user-defined functions will be executed in separate Python process. (default)
	- ``THREAD``: The Python user-defined functions will be executed in JVM.

	You could specify the execution mode in Python Table API or Python DataStream API jobs as following:

	.. code-block:: python

	## Python Table API
	# Specify `PROCESS` mode
	table_env.get_config().set("python.execution-mode", "process")

	# Specify `THREAD` mode
	table_env.get_config().set("python.execution-mode", "thread")


	## Python DataStream API

	config = Configuration()

	# Specify `PROCESS` mode
	config.set_string("python.execution-mode", "process")

	# Specify `THREAD` mode
	config.set_string("python.execution-mode", "thread")

	# Create the corresponding StreamExecutionEnvironment
	env = StreamExecutionEnvironment.get_execution_environment(config)

	Supported Cases
	===============

	Python Table API
	----------------

	The following table shows where the ``THREAD`` execution mode is supported in Python Table API.

	.. list-table::
	:header-rows: 1

	* - UDFs
	- ``PROCESS``
	- ``THREAD``
	* - Python UDF
	- Yes
	- Yes
	* - Python UDTF
	- Yes
	- Yes
	* - Python UDAF
	- Yes
	- No
	* - Pandas UDF & Pandas UDAF
	- Yes
	- No

	Python DataStream API
	---------------------

	The following table shows where the ``PROCESS`` execution mode and the ``THREAD`` execution mode are supported in Python DataStream API.

	.. list-table::
	:header-rows: 1

	* - Operators
	- ``PROCESS``
	- ``THREAD``
	* - Map
	- Yes
	- Yes
	* - FlatMap
	- Yes
	- Yes
	* - Filter
	- Yes
	- Yes
	* - Reduce
	- Yes
	- Yes
	* - Union
	- Yes
	- Yes
	* - Connect
	- Yes
	- Yes
	* - CoMap
	- Yes
	- Yes
	* - CoFlatMap
	- Yes
	- Yes
	* - Process Function
	- Yes
	- Yes
	* - Window Apply
	- Yes
	- Yes
	* - Window Aggregate
	- Yes
	- Yes
	* - Window Reduce
	- Yes
	- Yes
	* - Window Process
	- Yes
	- Yes
	* - Side Output
	- Yes
	- Yes
	* - State
	- Yes
	- Yes
	* - Iterate
	- No
	- No
	* - Window CoGroup
	- No
	- No
	* - Window Join
	- No
	- No
	* - Interval Join
	- No
	- No
	* - Async I/O
	- No
	- No

	.. note::
	Currently, it still doesn't support to execute Python UDFs in ``THREAD`` execution mode in all places.
	It will fall back to ``PROCESS`` execution mode in these cases. So it may happen that you configure a job
	to execute in ``THREAD`` execution mode, however, it's actually executed in ``PROCESS`` execution mode.

	Execution Behavior
	==================

	This section provides an overview of the execution behavior of ``THREAD`` execution mode and contrasts
	they with ``PROCESS`` execution mode. For more details, please refer to the FLIP that introduced this feature:
	`FLIP-206 <https://cwiki.apache.org/confluence/display/FLINK/FLIP-206%3A+Support+PyFlink+Runtime+Execution+in+Thread+Mode>`_.

	PROCESS Execution Mode
	----------------------

	In ``PROCESS`` execution mode, the Python user-defined functions will be executed in separate Python Worker process.
	The Java operator process communicates with the Python worker process using various Grpc services.

	THREAD Execution Mode
	---------------------

	In ``THREAD`` execution mode, the Python user-defined functions will be executed in the same process
	as Java operators. PyFlink takes use of third part library `PEMJA <https://github.com/alibaba/pemja>`_
	to embed Python in Java Application.