Rebuild website

commit: 1e216e8661e121dfd4f57c19c66242699d4736ec [log] [tgz]
author: Dian Fu <dianfu@apache.org> Tue Aug 04 14:22:33 2020 +0800
committer: Dian Fu <dianfu@apache.org> Tue Aug 04 14:22:50 2020 +0800
tree: 698280dcfa1c616f6eb8a8ef09763da74205ad14
parent: c384ea86b4fae6c2df81fe00bcc7d198f1ea9a2d [diff]
diff --git a/content/2020/07/28/pyflink-pandas-udf-support-flink.html b/content/2020/07/28/pyflink-pandas-udf-support-flink.html
new file mode 100644
index 0000000..05b2e6e
--- /dev/null
+++ b/content/2020/07/28/pyflink-pandas-udf-support-flink.html

@@ -0,0 +1,463 @@
+<!DOCTYPE html>
+<html lang="en">
+  <head>
+    <meta charset="utf-8">
+    <meta http-equiv="X-UA-Compatible" content="IE=edge">
+    <meta name="viewport" content="width=device-width, initial-scale=1">
+    <!-- The above 3 meta tags *must* come first in the head; any other head content must come *after* these tags -->
+    <title>Apache Flink: PyFlink: The integration of Pandas into PyFlink</title>
+    <link rel="shortcut icon" href="/favicon.ico" type="image/x-icon">
+    <link rel="icon" href="/favicon.ico" type="image/x-icon">
+
+    <!-- Bootstrap -->
+    <link rel="stylesheet" href="https://maxcdn.bootstrapcdn.com/bootstrap/3.4.1/css/bootstrap.min.css">
+    <link rel="stylesheet" href="/css/flink.css">
+    <link rel="stylesheet" href="/css/syntax.css">
+
+    <!-- Blog RSS feed -->
+    <link href="/blog/feed.xml" rel="alternate" type="application/rss+xml" title="Apache Flink Blog: RSS feed" />
+
+    <!-- jQuery (necessary for Bootstrap's JavaScript plugins) -->
+    <!-- We need to load Jquery in the header for custom google analytics event tracking-->
+    <script src="/js/jquery.min.js"></script>
+
+    <!-- HTML5 shim and Respond.js for IE8 support of HTML5 elements and media queries -->
+    <!-- WARNING: Respond.js doesn't work if you view the page via file:// -->
+    <!--[if lt IE 9]>
+      <script src="https://oss.maxcdn.com/html5shiv/3.7.2/html5shiv.min.js"></script>
+      <script src="https://oss.maxcdn.com/respond/1.4.2/respond.min.js"></script>
+    <![endif]-->
+  </head>
+  <body>  
+    
+
+    <!-- Main content. -->
+    <div class="container">
+    <div class="row">
+
+      
+     <div id="sidebar" class="col-sm-3">
+        
+
+<!-- Top navbar. -->
+    <nav class="navbar navbar-default">
+        <!-- The logo. -->
+        <div class="navbar-header">
+          <button type="button" class="navbar-toggle collapsed" data-toggle="collapse" data-target="#bs-example-navbar-collapse-1">
+            <span class="icon-bar"></span>
+            <span class="icon-bar"></span>
+            <span class="icon-bar"></span>
+          </button>
+          <div class="navbar-logo">
+            <a href="/">
+              <img alt="Apache Flink" src="/img/flink-header-logo.svg" width="147px" height="73px">
+            </a>
+          </div>
+        </div><!-- /.navbar-header -->
+
+        <!-- The navigation links. -->
+        <div class="collapse navbar-collapse" id="bs-example-navbar-collapse-1">
+          <ul class="nav navbar-nav navbar-main">
+
+            <!-- First menu section explains visitors what Flink is -->
+
+            <!-- What is Stream Processing? -->
+            <!--
+            <li><a href="/streamprocessing1.html">What is Stream Processing?</a></li>
+            -->
+
+            <!-- What is Flink? -->
+            <li><a href="/flink-architecture.html">What is Apache Flink?</a></li>
+
+            
+            <ul class="nav navbar-nav navbar-subnav">
+              <li >
+                  <a href="/flink-architecture.html">Architecture</a>
+              </li>
+              <li >
+                  <a href="/flink-applications.html">Applications</a>
+              </li>
+              <li >
+                  <a href="/flink-operations.html">Operations</a>
+              </li>
+            </ul>
+            
+
+            <!-- What is Stateful Functions? -->
+
+            <li><a href="/stateful-functions.html">What is Stateful Functions?</a></li>
+
+            <!-- Use cases -->
+            <li><a href="/usecases.html">Use Cases</a></li>
+
+            <!-- Powered by -->
+            <li><a href="/poweredby.html">Powered By</a></li>
+
+
+            &nbsp;
+            <!-- Second menu section aims to support Flink users -->
+
+            <!-- Downloads -->
+            <li><a href="/downloads.html">Downloads</a></li>
+
+            <!-- Getting Started -->
+            <li class="dropdown">
+              <a class="dropdown-toggle" data-toggle="dropdown" href="#">Getting Started<span class="caret"></span></a>
+              <ul class="dropdown-menu">
+                <li><a href="https://ci.apache.org/projects/flink/flink-docs-release-1.11/getting-started/index.html" target="_blank">With Flink <small><span class="glyphicon glyphicon-new-window"></span></small></a></li>
+                <li><a href="https://ci.apache.org/projects/flink/flink-statefun-docs-release-2.1/getting-started/project-setup.html" target="_blank">With Flink Stateful Functions <small><span class="glyphicon glyphicon-new-window"></span></small></a></li>
+                <li><a href="/training.html">Training Course</a></li>
+              </ul>
+            </li>
+
+            <!-- Documentation -->
+            <li class="dropdown">
+              <a class="dropdown-toggle" data-toggle="dropdown" href="#">Documentation<span class="caret"></span></a>
+              <ul class="dropdown-menu">
+                <li><a href="https://ci.apache.org/projects/flink/flink-docs-release-1.11" target="_blank">Flink 1.11 (Latest stable release) <small><span class="glyphicon glyphicon-new-window"></span></small></a></li>
+                <li><a href="https://ci.apache.org/projects/flink/flink-docs-master" target="_blank">Flink Master (Latest Snapshot) <small><span class="glyphicon glyphicon-new-window"></span></small></a></li>
+                <li><a href="https://ci.apache.org/projects/flink/flink-statefun-docs-release-2.1" target="_blank">Flink Stateful Functions 2.1 (Latest stable release) <small><span class="glyphicon glyphicon-new-window"></span></small></a></li>
+                <li><a href="https://ci.apache.org/projects/flink/flink-statefun-docs-master" target="_blank">Flink Stateful Functions Master (Latest Snapshot) <small><span class="glyphicon glyphicon-new-window"></span></small></a></li>
+              </ul>
+            </li>
+
+            <!-- getting help -->
+            <li><a href="/gettinghelp.html">Getting Help</a></li>
+
+            <!-- Blog -->
+            <li><a href="/blog/"><b>Flink Blog</b></a></li>
+
+
+            <!-- Flink-packages -->
+            <li>
+              <a href="https://flink-packages.org" target="_blank">flink-packages.org <small><span class="glyphicon glyphicon-new-window"></span></small></a>
+            </li>
+            &nbsp;
+
+            <!-- Third menu section aim to support community and contributors -->
+
+            <!-- Community -->
+            <li><a href="/community.html">Community &amp; Project Info</a></li>
+
+            <!-- Roadmap -->
+            <li><a href="/roadmap.html">Roadmap</a></li>
+
+            <!-- Contribute -->
+            <li><a href="/contributing/how-to-contribute.html">How to Contribute</a></li>
+            
+
+            <!-- GitHub -->
+            <li>
+              <a href="https://github.com/apache/flink" target="_blank">Flink on GitHub <small><span class="glyphicon glyphicon-new-window"></span></small></a>
+            </li>
+
+            &nbsp;
+
+            <!-- Language Switcher -->
+            <li>
+              
+                
+                  <a href="/zh/2020/07/28/pyflink-pandas-udf-support-flink.html">中文版</a>
+                
+              
+            </li>
+
+          </ul>
+
+          <ul class="nav navbar-nav navbar-bottom">
+          <hr />
+
+            <!-- Twitter -->
+            <li><a href="https://twitter.com/apacheflink" target="_blank">@ApacheFlink <small><span class="glyphicon glyphicon-new-window"></span></small></a></li>
+
+            <!-- Visualizer -->
+            <li class=" hidden-md hidden-sm"><a href="/visualizer/" target="_blank">Plan Visualizer <small><span class="glyphicon glyphicon-new-window"></span></small></a></li>
+
+          <hr />
+
+            <li><a href="https://apache.org" target="_blank">Apache Software Foundation <small><span class="glyphicon glyphicon-new-window"></span></small></a></li>
+
+            <li>
+              <style>
+                .smalllinks:link {
+                  display: inline-block !important; background: none; padding-top: 0px; padding-bottom: 0px; padding-right: 0px; min-width: 75px;
+                }
+              </style>
+
+              <a class="smalllinks" href="https://www.apache.org/licenses/" target="_blank">License</a> <small><span class="glyphicon glyphicon-new-window"></span></small>
+
+              <a class="smalllinks" href="https://www.apache.org/security/" target="_blank">Security</a> <small><span class="glyphicon glyphicon-new-window"></span></small>
+
+              <a class="smalllinks" href="https://www.apache.org/foundation/sponsorship.html" target="_blank">Donate</a> <small><span class="glyphicon glyphicon-new-window"></span></small>
+
+              <a class="smalllinks" href="https://www.apache.org/foundation/thanks.html" target="_blank">Thanks</a> <small><span class="glyphicon glyphicon-new-window"></span></small>
+            </li>
+
+          </ul>
+        </div><!-- /.navbar-collapse -->
+    </nav>
+
+      </div>
+      <div class="col-sm-9">
+      <div class="row-fluid">
+  <div class="col-sm-12">
+    <div class="row">
+      <h1>PyFlink: The integration of Pandas into PyFlink</h1>
+      <p><i></i></p>
+
+      <article>
+        <p>28 Jul 2020 Jincheng Sun (<a href="https://twitter.com/sunjincheng121">@sunjincheng121</a>) &amp; Markos Sfikas (<a href="https://twitter.com/MarkSfik">@MarkSfik</a>)</p>
+
+<p>Python has evolved into one of the most important programming languages for many fields of data processing. So big has been Python’s popularity, that it has pretty much become the default data processing language for data scientists. On top of that, there is a plethora of Python-based data processing tools such as NumPy, Pandas, and Scikit-learn that have gained additional popularity due to their flexibility or powerful functionalities.</p>
+
+<center>
+<img src="/img/blog/2020-07-28-pyflink-pandas/python-scientific-stack.png" width="450px" alt="Python Scientific Stack" />
+</center>
+<center>
+  <a href="https://speakerdeck.com/jakevdp/the-unexpected-effectiveness-of-python-in-science?slide=52">Pic source: VanderPlas 2017, slide 52.</a>
+</center>
+<p><br /></p>
+
+<p>In an effort to meet the user needs and demands, the Flink community hopes to leverage and make better use of these tools.  Along this direction, the Flink community put some great effort in integrating Pandas into PyFlink with the latest Flink version 1.11. Some of the added features include <strong>support for Pandas UDF</strong> and the <strong>conversion between Pandas DataFrame and Table</strong>. Pandas UDF not only greatly improve the execution performance of Python UDF, but also make it more convenient for users to leverage libraries such as Pandas and NumPy in Python UDF. Additionally, providing support for the conversion between Pandas DataFrame and Table enables users to switch processing engines seamlessly without the need for an intermediate connector. In the remainder of this article, we will introduce how these functionalities work and how to use them with a step-by-step example.</p>
+
+<div class="alert alert-info">
+  <p><span class="label label-info" style="display: inline-block"><span class="glyphicon glyphicon-info-sign" aria-hidden="true"></span> Note</span>
+Currently, only Scalar Pandas UDFs are supported in PyFlink.</p>
+</div>
+
+<h1 id="pandas-udf-in-flink-111">Pandas UDF in Flink 1.11</h1>
+
+<p>Using scalar Python UDF was already possible in Flink 1.10 as described in a <a href="https://flink.apache.org/2020/04/09/pyflink-udf-support-flink.html">previous article on the Flink blog</a>. Scalar Python UDFs work based on three primary steps:</p>
+
+<ul>
+  <li>
+    <p>the Java operator serializes one input row to bytes and sends them to the Python worker;</p>
+  </li>
+  <li>
+    <p>the Python worker deserializes the input row and evaluates the Python UDF with it;</p>
+  </li>
+  <li>
+    <p>the resulting row is serialized and sent back to the Java operator</p>
+  </li>
+</ul>
+
+<p>While providing support for Python UDFs in PyFlink greatly improved the user experience, it had some drawbacks, namely resulting in:</p>
+
+<ul>
+  <li>
+    <p>High serialization/deserialization overhead</p>
+  </li>
+  <li>
+    <p>Difficulty when leveraging popular Python libraries used by data scientists — such as Pandas or NumPy — that provide high-performance data structure and functions.</p>
+  </li>
+</ul>
+
+<p>The introduction of Pandas UDF is used to address these drawbacks. For Pandas UDF, a batch of rows is transferred between the JVM and PVM in a columnar format (<a href="https://arrow.apache.org/docs/format/Columnar.html">Arrow memory format</a>). The batch of rows will be converted into a collection of Pandas Series and will be transferred to the Pandas UDF to then leverage popular Python libraries (such as Pandas, or NumPy) for the Python UDF implementation.</p>
+
+<center>
+<img src="/img/blog/2020-07-28-pyflink-pandas/vm-communication.png" width="550px" alt="VM Communication" />
+</center>
+
+<p>The performance of vectorized UDFs is usually much higher when compared to the normal Python UDF, as the serialization/deserialization overhead is minimized by falling back to <a href="https://arrow.apache.org/">Apache Arrow</a>, while handling <code>pandas.Series</code> as input/output allows us to take full advantage of the Pandas and NumPy libraries, making it a popular solution to parallelize Machine Learning and other large-scale, distributed data science workloads (e.g. feature engineering, distributed model application).</p>
+
+<h1 id="conversion-between-pyflink-table-and-pandas-dataframe">Conversion between PyFlink Table and Pandas DataFrame</h1>
+
+<p>Pandas DataFrame is the de-facto standard for working with tabular data in the Python community while PyFlink Table is Flink’s representation of the tabular data in Python. Enabling the conversion between PyFlink Table and Pandas DataFrame allows switching between PyFlink and Pandas seamlessly when processing data in Python. Users can process data by utilizing one execution engine and switch to a different one effortlessly. For example, in case users already have a Pandas DataFrame at hand and want to perform some expensive transformation, they can easily convert it to a PyFlink Table and leverage the power of the Flink engine. On the other hand, users can also convert a PyFlink Table to a Pandas DataFrame and perform the same transformation with the rich functionalities provided by the Pandas ecosystem.</p>
+
+<h1 id="examples">Examples</h1>
+
+<p>Using Python in Apache Flink requires installing PyFlink, which is available on <a href="https://pypi.org/project/apache-flink/">PyPI</a> and can be easily installed using <code>pip</code>. Before installing PyFlink, check the working version of Python running in your system using:</p>
+
+<div class="highlight"><pre><code class="language-bash"><span class="nv">$ </span>python --version
+Python 3.7.6</code></pre></div>
+
+<div class="alert alert-info">
+  <p><span class="label label-info" style="display: inline-block"><span class="glyphicon glyphicon-info-sign" aria-hidden="true"></span> Note</span>
+Please note that Python 3.5 or higher is required to install and run PyFlink</p>
+</div>
+
+<div class="highlight"><pre><code class="language-bash"><span class="nv">$ </span>python -m pip install apache-flink</code></pre></div>
+
+<h2 id="using-pandas-udf">Using Pandas UDF</h2>
+
+<p>Pandas UDFs take <code>pandas.Series</code> as the input and return a <code>pandas.Series</code> of the same length as the output. Pandas UDFs can be used at the exact same place where non-Pandas functions are currently being utilized. To mark a UDF as a Pandas UDF, you only need to add an extra parameter udf_type=”pandas” in the udf decorator:</p>
+
+<div class="highlight"><pre><code class="language-python"><span class="nd">@udf</span><span class="p">(</span><span class="n">input_types</span><span class="o">=</span><span class="p">[</span><span class="n">DataTypes</span><span class="o">.</span><span class="n">STRING</span><span class="p">(),</span> <span class="n">DataTypes</span><span class="o">.</span><span class="n">FLOAT</span><span class="p">()],</span>
+     <span class="n">result_type</span><span class="o">=</span><span class="n">DataTypes</span><span class="o">.</span><span class="n">FLOAT</span><span class="p">(),</span> <span class="n">udf_type</span><span class="o">=</span><span class="s">&#39;pandas&#39;</span><span class="p">)</span>
+<span class="k">def</span> <span class="nf">interpolate</span><span class="p">(</span><span class="nb">id</span><span class="p">,</span> <span class="n">temperature</span><span class="p">):</span>
+    <span class="c"># takes id: pandas.Series and temperature: pandas.Series as input</span>
+    <span class="n">df</span> <span class="o">=</span> <span class="n">pd</span><span class="o">.</span><span class="n">DataFrame</span><span class="p">({</span><span class="s">&#39;id&#39;</span><span class="p">:</span> <span class="nb">id</span><span class="p">,</span> <span class="s">&#39;temperature&#39;</span><span class="p">:</span> <span class="n">temperature</span><span class="p">})</span>
+
+    <span class="c"># use interpolate() to interpolate the missing temperature</span>
+    <span class="n">interpolated_df</span> <span class="o">=</span> <span class="n">df</span><span class="o">.</span><span class="n">groupby</span><span class="p">(</span><span class="s">&#39;id&#39;</span><span class="p">)</span><span class="o">.</span><span class="n">apply</span><span class="p">(</span>
+        <span class="k">lambda</span> <span class="n">group</span><span class="p">:</span> <span class="n">group</span><span class="o">.</span><span class="n">interpolate</span><span class="p">(</span><span class="n">limit_direction</span><span class="o">=</span><span class="s">&#39;both&#39;</span><span class="p">))</span>
+
+    <span class="c"># output temperature: pandas.Series</span>
+    <span class="k">return</span> <span class="n">interpolated_df</span><span class="p">[</span><span class="s">&#39;temperature&#39;</span><span class="p">]</span></code></pre></div>
+
+<p>The Pandas UDF above uses the Pandas <code>dataframe.interpolate()</code> function to interpolate the missing temperature data for each equipment id. This is a common IoT scenario whereby each equipment/device reports it’s id and temperature to be analyzed, but the temperature field may be null due to various reasons.
+With the function, you can register and use it in the same way as the <a href="https://flink.apache.org/2020/04/09/pyflink-udf-support-flink.html">normal Python UDF</a>. Below is a complete example of how to use the Pandas UDF in PyFlink.</p>
+
+<div class="highlight"><pre><code class="language-python"><span class="kn">from</span> <span class="nn">pyflink.datastream</span> <span class="kn">import</span> <span class="n">StreamExecutionEnvironment</span>
+<span class="kn">from</span> <span class="nn">pyflink.table</span> <span class="kn">import</span> <span class="n">StreamTableEnvironment</span><span class="p">,</span> <span class="n">DataTypes</span>
+<span class="kn">from</span> <span class="nn">pyflink.table.udf</span> <span class="kn">import</span> <span class="n">udf</span>
+<span class="kn">import</span> <span class="nn">pandas</span> <span class="kn">as</span> <span class="nn">pd</span>
+
+<span class="n">env</span> <span class="o">=</span> <span class="n">StreamExecutionEnvironment</span><span class="o">.</span><span class="n">get_execution_environment</span><span class="p">()</span>
+<span class="n">env</span><span class="o">.</span><span class="n">set_parallelism</span><span class="p">(</span><span class="mi">1</span><span class="p">)</span>
+<span class="n">t_env</span> <span class="o">=</span> <span class="n">StreamTableEnvironment</span><span class="o">.</span><span class="n">create</span><span class="p">(</span><span class="n">env</span><span class="p">)</span>
+<span class="n">t_env</span><span class="o">.</span><span class="n">get_config</span><span class="p">()</span><span class="o">.</span><span class="n">get_configuration</span><span class="p">()</span><span class="o">.</span><span class="n">set_boolean</span><span class="p">(</span><span class="s">&quot;python.fn-execution.memory.managed&quot;</span><span class="p">,</span> <span class="bp">True</span><span class="p">)</span>
+
+<span class="nd">@udf</span><span class="p">(</span><span class="n">input_types</span><span class="o">=</span><span class="p">[</span><span class="n">DataTypes</span><span class="o">.</span><span class="n">STRING</span><span class="p">(),</span> <span class="n">DataTypes</span><span class="o">.</span><span class="n">FLOAT</span><span class="p">()],</span>
+     <span class="n">result_type</span><span class="o">=</span><span class="n">DataTypes</span><span class="o">.</span><span class="n">FLOAT</span><span class="p">(),</span> <span class="n">udf_type</span><span class="o">=</span><span class="s">&#39;pandas&#39;</span><span class="p">)</span>
+<span class="k">def</span> <span class="nf">interpolate</span><span class="p">(</span><span class="nb">id</span><span class="p">,</span> <span class="n">temperature</span><span class="p">):</span>
+    <span class="c"># takes id: pandas.Series and temperature: pandas.Series as input</span>
+    <span class="n">df</span> <span class="o">=</span> <span class="n">pd</span><span class="o">.</span><span class="n">DataFrame</span><span class="p">({</span><span class="s">&#39;id&#39;</span><span class="p">:</span> <span class="nb">id</span><span class="p">,</span> <span class="s">&#39;temperature&#39;</span><span class="p">:</span> <span class="n">temperature</span><span class="p">})</span>
+
+    <span class="c"># use interpolate() to interpolate the missing temperature</span>
+    <span class="n">interpolated_df</span> <span class="o">=</span> <span class="n">df</span><span class="o">.</span><span class="n">groupby</span><span class="p">(</span><span class="s">&#39;id&#39;</span><span class="p">)</span><span class="o">.</span><span class="n">apply</span><span class="p">(</span>
+        <span class="k">lambda</span> <span class="n">group</span><span class="p">:</span> <span class="n">group</span><span class="o">.</span><span class="n">interpolate</span><span class="p">(</span><span class="n">limit_direction</span><span class="o">=</span><span class="s">&#39;both&#39;</span><span class="p">))</span>
+
+    <span class="c"># output temperature: pandas.Series</span>
+    <span class="k">return</span> <span class="n">interpolated_df</span><span class="p">[</span><span class="s">&#39;temperature&#39;</span><span class="p">]</span>
+
+<span class="n">t_env</span><span class="o">.</span><span class="n">register_function</span><span class="p">(</span><span class="s">&quot;interpolate&quot;</span><span class="p">,</span> <span class="n">interpolate</span><span class="p">)</span>
+
+<span class="n">my_source_ddl</span> <span class="o">=</span> <span class="s">&quot;&quot;&quot;</span>
+<span class="s">    create table mySource (</span>
+<span class="s">        id INT,</span>
+<span class="s">        temperature FLOAT </span>
+<span class="s">    ) with (</span>
+<span class="s">        &#39;connector.type&#39; = &#39;filesystem&#39;,</span>
+<span class="s">        &#39;format.type&#39; = &#39;csv&#39;,</span>
+<span class="s">        &#39;connector.path&#39; = &#39;/tmp/input&#39;</span>
+<span class="s">    )</span>
+<span class="s">&quot;&quot;&quot;</span>
+
+<span class="n">my_sink_ddl</span> <span class="o">=</span> <span class="s">&quot;&quot;&quot;</span>
+<span class="s">    create table mySink (</span>
+<span class="s">        id INT,</span>
+<span class="s">        temperature FLOAT </span>
+<span class="s">    ) with (</span>
+<span class="s">        &#39;connector.type&#39; = &#39;filesystem&#39;,</span>
+<span class="s">        &#39;format.type&#39; = &#39;csv&#39;,</span>
+<span class="s">        &#39;connector.path&#39; = &#39;/tmp/output&#39;</span>
+<span class="s">    )</span>
+<span class="s">&quot;&quot;&quot;</span>
+
+<span class="n">t_env</span><span class="o">.</span><span class="n">execute_sql</span><span class="p">(</span><span class="n">my_source_ddl</span><span class="p">)</span>
+<span class="n">t_env</span><span class="o">.</span><span class="n">execute_sql</span><span class="p">(</span><span class="n">my_sink_ddl</span><span class="p">)</span>
+
+<span class="n">t_env</span><span class="o">.</span><span class="n">from_path</span><span class="p">(</span><span class="s">&#39;mySource&#39;</span><span class="p">)</span>\
+    <span class="o">.</span><span class="n">select</span><span class="p">(</span><span class="s">&quot;id, interpolate(id, temperature) as temperature&quot;</span><span class="p">)</span> \
+    <span class="o">.</span><span class="n">insert_into</span><span class="p">(</span><span class="s">&#39;mySink&#39;</span><span class="p">)</span>
+
+<span class="n">t_env</span><span class="o">.</span><span class="n">execute</span><span class="p">(</span><span class="s">&quot;pandas_udf_demo&quot;</span><span class="p">)</span></code></pre></div>
+
+<p>To submit the job:</p>
+
+<ul>
+  <li>Firstly, you need to prepare the input data in the “/tmp/input” file. For example,</li>
+</ul>
+
+<div class="highlight"><pre><code class="language-bash"><span class="nv">$ </span><span class="nb">echo</span> -e  <span class="s2">&quot;1,98.0\n1,\n1,100.0\n2,99.0&quot;</span> &gt; /tmp/input</code></pre></div>
+
+<ul>
+  <li>Next, you can run this example on the command line,</li>
+</ul>
+
+<div class="highlight"><pre><code class="language-bash"><span class="nv">$ </span>python pandas_udf_demo.py</code></pre></div>
+
+<p>The command builds and runs the Python Table API program in a local mini-cluster. You can also submit the Python Table API program to a remote cluster using different command lines, see more details <a href="https://ci.apache.org/projects/flink/flink-docs-release-1.11/ops/cli.html#job-submission-examples">here</a>.</p>
+
+<ul>
+  <li>Finally, you can see the execution result on the command line. As you can see, all the temperature data with an empty value has been interpolated:</li>
+</ul>
+
+<div class="highlight"><pre><code class="language-bash"><span class="nv">$ </span> cat /tmp/output
+1,98.0
+1,99.0
+1,100.0
+2,99.0</code></pre></div>
+
+<h2 id="conversion-between-pyflink-table-and-pandas-dataframe-1">Conversion between PyFlink Table and Pandas DataFrame</h2>
+
+<p>You can use the <code>from_pandas()</code> method to create a PyFlink Table from a Pandas DataFrame or use the <code>to_pandas()</code> method to convert a PyFlink Table to a Pandas DataFrame.</p>
+
+<div class="highlight"><pre><code class="language-python"><span class="kn">from</span> <span class="nn">pyflink.datastream</span> <span class="kn">import</span> <span class="n">StreamExecutionEnvironment</span>
+<span class="kn">from</span> <span class="nn">pyflink.table</span> <span class="kn">import</span> <span class="n">StreamTableEnvironment</span>
+<span class="kn">import</span> <span class="nn">pandas</span> <span class="kn">as</span> <span class="nn">pd</span>
+<span class="kn">import</span> <span class="nn">numpy</span> <span class="kn">as</span> <span class="nn">np</span>
+
+<span class="n">env</span> <span class="o">=</span> <span class="n">StreamExecutionEnvironment</span><span class="o">.</span><span class="n">get_execution_environment</span><span class="p">()</span>
+<span class="n">t_env</span> <span class="o">=</span> <span class="n">StreamTableEnvironment</span><span class="o">.</span><span class="n">create</span><span class="p">(</span><span class="n">env</span><span class="p">)</span>
+
+<span class="c"># Create a PyFlink Table</span>
+<span class="n">pdf</span> <span class="o">=</span> <span class="n">pd</span><span class="o">.</span><span class="n">DataFrame</span><span class="p">(</span><span class="n">np</span><span class="o">.</span><span class="n">random</span><span class="o">.</span><span class="n">rand</span><span class="p">(</span><span class="mi">1000</span><span class="p">,</span> <span class="mi">2</span><span class="p">))</span>
+<span class="n">table</span> <span class="o">=</span> <span class="n">t_env</span><span class="o">.</span><span class="n">from_pandas</span><span class="p">(</span><span class="n">pdf</span><span class="p">,</span> <span class="p">[</span><span class="s">&quot;a&quot;</span><span class="p">,</span> <span class="s">&quot;b&quot;</span><span class="p">])</span><span class="o">.</span><span class="n">filter</span><span class="p">(</span><span class="s">&quot;a &gt; 0.5&quot;</span><span class="p">)</span>
+
+<span class="c"># Convert the PyFlink Table to a Pandas DataFrame</span>
+<span class="n">pdf</span> <span class="o">=</span> <span class="n">table</span><span class="o">.</span><span class="n">to_pandas</span><span class="p">()</span>
+<span class="k">print</span><span class="p">(</span><span class="n">pdf</span><span class="p">)</span></code></pre></div>
+
+<h1 id="conclusion--upcoming-work">Conclusion &amp; Upcoming work</h1>
+
+<p>In this article, we introduce the integration of Pandas in Flink 1.11, including Pandas UDF and the conversion between Table and Pandas. In fact, in the latest Apache Flink release, there are many excellent features added to PyFlink, such as support of User-defined Table functions and User-defined Metrics for Python UDFs. What’s more, from Flink 1.11, you can build PyFlink with Cython support and “Cythonize” your Python UDFs to substantially improve code execution speed (up to 30x faster, compared to Python UDFs in Flink 1.10).</p>
+
+<p>Future work by the community will focus on adding more features and bringing additional optimizations with follow up releases.  Such optimizations and additions include a Python DataStream API and more integration with the Python ecosystem, such as support for distributed Pandas in Flink. Stay tuned for more information and updates with the upcoming releases!</p>
+
+<center>
+<img src="/img/blog/2020-07-28-pyflink-pandas/mission-of-pyFlink.gif" width="600px" alt="Mission of PyFlink" />
+</center>
+
+      </article>
+    </div>
+
+    <div class="row">
+      <div id="disqus_thread"></div>
+      <script type="text/javascript">
+        /* * * CONFIGURATION VARIABLES: EDIT BEFORE PASTING INTO YOUR WEBPAGE * * */
+        var disqus_shortname = 'stratosphere-eu'; // required: replace example with your forum shortname
+
+        /* * * DON'T EDIT BELOW THIS LINE * * */
+        (function() {
+            var dsq = document.createElement('script'); dsq.type = 'text/javascript'; dsq.async = true;
+            dsq.src = '//' + disqus_shortname + '.disqus.com/embed.js';
+             (document.getElementsByTagName('head')[0] || document.getElementsByTagName('body')[0]).appendChild(dsq);
+        })();
+      </script>
+    </div>
+  </div>
+</div>
+      </div>
+    </div>
+
+    <hr />
+
+    <div class="row">
+      <div class="footer text-center col-sm-12">
+        <p>Copyright © 2014-2019 <a href="http://apache.org">The Apache Software Foundation</a>. All Rights Reserved.</p>
+        <p>Apache Flink, Flink®, Apache®, the squirrel logo, and the Apache feather logo are either registered trademarks or trademarks of The Apache Software Foundation.</p>
+        <p><a href="/privacy-policy.html">Privacy Policy</a> &middot; <a href="/blog/feed.xml">RSS feed</a></p>
+      </div>
+    </div>
+    </div><!-- /.container -->
+
+    <!-- Include all compiled plugins (below), or include individual files as needed -->
+    <script src="https://maxcdn.bootstrapcdn.com/bootstrap/3.3.4/js/bootstrap.min.js"></script>
+    <script src="https://cdnjs.cloudflare.com/ajax/libs/jquery.matchHeight/0.7.0/jquery.matchHeight-min.js"></script>
+    <script src="/js/codetabs.js"></script>
+    <script src="/js/stickysidebar.js"></script>
+
+    <!-- Google Analytics -->
+    <script>
+      (function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
+      (i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
+      m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
+      })(window,document,'script','//www.google-analytics.com/analytics.js','ga');
+
+      ga('create', 'UA-52545728-1', 'auto');
+      ga('send', 'pageview');
+    </script>
+  </body>
+</html>

diff --git a/content/blog/feed.xml b/content/blog/feed.xml
index 3bf09d8..c794bc5 100644
--- a/content/blog/feed.xml
+++ b/content/blog/feed.xml

@@ -672,6 +672,215 @@
 </item>
 
 <item>
+<title>PyFlink: The integration of Pandas into PyFlink</title>
+<description>&lt;p&gt;Python has evolved into one of the most important programming languages for many fields of data processing. So big has been Python’s popularity, that it has pretty much become the default data processing language for data scientists. On top of that, there is a plethora of Python-based data processing tools such as NumPy, Pandas, and Scikit-learn that have gained additional popularity due to their flexibility or powerful functionalities.&lt;/p&gt;
+
+&lt;center&gt;
+&lt;img src=&quot;/img/blog/2020-07-28-pyflink-pandas/python-scientific-stack.png&quot; width=&quot;450px&quot; alt=&quot;Python Scientific Stack&quot; /&gt;
+&lt;/center&gt;
+&lt;center&gt;
+  &lt;a href=&quot;https://speakerdeck.com/jakevdp/the-unexpected-effectiveness-of-python-in-science?slide=52&quot;&gt;Pic source: VanderPlas 2017, slide 52.&lt;/a&gt;
+&lt;/center&gt;
+&lt;p&gt;&lt;br /&gt;&lt;/p&gt;
+
+&lt;p&gt;In an effort to meet the user needs and demands, the Flink community hopes to leverage and make better use of these tools.  Along this direction, the Flink community put some great effort in integrating Pandas into PyFlink with the latest Flink version 1.11. Some of the added features include &lt;strong&gt;support for Pandas UDF&lt;/strong&gt; and the &lt;strong&gt;conversion between Pandas DataFrame and Table&lt;/strong&gt;. Pandas UDF not only greatly improve the execution performance of Python UDF, but also make it more convenient for users to leverage libraries such as Pandas and NumPy in Python UDF. Additionally, providing support for the conversion between Pandas DataFrame and Table enables users to switch processing engines seamlessly without the need for an intermediate connector. In the remainder of this article, we will introduce how these functionalities work and how to use them with a step-by-step example.&lt;/p&gt;
+
+&lt;div class=&quot;alert alert-info&quot;&gt;
+  &lt;p&gt;&lt;span class=&quot;label label-info&quot; style=&quot;display: inline-block&quot;&gt;&lt;span class=&quot;glyphicon glyphicon-info-sign&quot; aria-hidden=&quot;true&quot;&gt;&lt;/span&gt; Note&lt;/span&gt;
+Currently, only Scalar Pandas UDFs are supported in PyFlink.&lt;/p&gt;
+&lt;/div&gt;
+
+&lt;h1 id=&quot;pandas-udf-in-flink-111&quot;&gt;Pandas UDF in Flink 1.11&lt;/h1&gt;
+
+&lt;p&gt;Using scalar Python UDF was already possible in Flink 1.10 as described in a &lt;a href=&quot;https://flink.apache.org/2020/04/09/pyflink-udf-support-flink.html&quot;&gt;previous article on the Flink blog&lt;/a&gt;. Scalar Python UDFs work based on three primary steps:&lt;/p&gt;
+
+&lt;ul&gt;
+  &lt;li&gt;
+    &lt;p&gt;the Java operator serializes one input row to bytes and sends them to the Python worker;&lt;/p&gt;
+  &lt;/li&gt;
+  &lt;li&gt;
+    &lt;p&gt;the Python worker deserializes the input row and evaluates the Python UDF with it;&lt;/p&gt;
+  &lt;/li&gt;
+  &lt;li&gt;
+    &lt;p&gt;the resulting row is serialized and sent back to the Java operator&lt;/p&gt;
+  &lt;/li&gt;
+&lt;/ul&gt;
+
+&lt;p&gt;While providing support for Python UDFs in PyFlink greatly improved the user experience, it had some drawbacks, namely resulting in:&lt;/p&gt;
+
+&lt;ul&gt;
+  &lt;li&gt;
+    &lt;p&gt;High serialization/deserialization overhead&lt;/p&gt;
+  &lt;/li&gt;
+  &lt;li&gt;
+    &lt;p&gt;Difficulty when leveraging popular Python libraries used by data scientists — such as Pandas or NumPy — that provide high-performance data structure and functions.&lt;/p&gt;
+  &lt;/li&gt;
+&lt;/ul&gt;
+
+&lt;p&gt;The introduction of Pandas UDF is used to address these drawbacks. For Pandas UDF, a batch of rows is transferred between the JVM and PVM in a columnar format (&lt;a href=&quot;https://arrow.apache.org/docs/format/Columnar.html&quot;&gt;Arrow memory format&lt;/a&gt;). The batch of rows will be converted into a collection of Pandas Series and will be transferred to the Pandas UDF to then leverage popular Python libraries (such as Pandas, or NumPy) for the Python UDF implementation.&lt;/p&gt;
+
+&lt;center&gt;
+&lt;img src=&quot;/img/blog/2020-07-28-pyflink-pandas/vm-communication.png&quot; width=&quot;550px&quot; alt=&quot;VM Communication&quot; /&gt;
+&lt;/center&gt;
+
+&lt;p&gt;The performance of vectorized UDFs is usually much higher when compared to the normal Python UDF, as the serialization/deserialization overhead is minimized by falling back to &lt;a href=&quot;https://arrow.apache.org/&quot;&gt;Apache Arrow&lt;/a&gt;, while handling &lt;code&gt;pandas.Series&lt;/code&gt; as input/output allows us to take full advantage of the Pandas and NumPy libraries, making it a popular solution to parallelize Machine Learning and other large-scale, distributed data science workloads (e.g. feature engineering, distributed model application).&lt;/p&gt;
+
+&lt;h1 id=&quot;conversion-between-pyflink-table-and-pandas-dataframe&quot;&gt;Conversion between PyFlink Table and Pandas DataFrame&lt;/h1&gt;
+
+&lt;p&gt;Pandas DataFrame is the de-facto standard for working with tabular data in the Python community while PyFlink Table is Flink’s representation of the tabular data in Python. Enabling the conversion between PyFlink Table and Pandas DataFrame allows switching between PyFlink and Pandas seamlessly when processing data in Python. Users can process data by utilizing one execution engine and switch to a different one effortlessly. For example, in case users already have a Pandas DataFrame at hand and want to perform some expensive transformation, they can easily convert it to a PyFlink Table and leverage the power of the Flink engine. On the other hand, users can also convert a PyFlink Table to a Pandas DataFrame and perform the same transformation with the rich functionalities provided by the Pandas ecosystem.&lt;/p&gt;
+
+&lt;h1 id=&quot;examples&quot;&gt;Examples&lt;/h1&gt;
+
+&lt;p&gt;Using Python in Apache Flink requires installing PyFlink, which is available on &lt;a href=&quot;https://pypi.org/project/apache-flink/&quot;&gt;PyPI&lt;/a&gt; and can be easily installed using &lt;code&gt;pip&lt;/code&gt;. Before installing PyFlink, check the working version of Python running in your system using:&lt;/p&gt;
+
+&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-bash&quot;&gt;&lt;span class=&quot;nv&quot;&gt;$ &lt;/span&gt;python --version
+Python 3.7.6&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
+
+&lt;div class=&quot;alert alert-info&quot;&gt;
+  &lt;p&gt;&lt;span class=&quot;label label-info&quot; style=&quot;display: inline-block&quot;&gt;&lt;span class=&quot;glyphicon glyphicon-info-sign&quot; aria-hidden=&quot;true&quot;&gt;&lt;/span&gt; Note&lt;/span&gt;
+Please note that Python 3.5 or higher is required to install and run PyFlink&lt;/p&gt;
+&lt;/div&gt;
+
+&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-bash&quot;&gt;&lt;span class=&quot;nv&quot;&gt;$ &lt;/span&gt;python -m pip install apache-flink&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
+
+&lt;h2 id=&quot;using-pandas-udf&quot;&gt;Using Pandas UDF&lt;/h2&gt;
+
+&lt;p&gt;Pandas UDFs take &lt;code&gt;pandas.Series&lt;/code&gt; as the input and return a &lt;code&gt;pandas.Series&lt;/code&gt; of the same length as the output. Pandas UDFs can be used at the exact same place where non-Pandas functions are currently being utilized. To mark a UDF as a Pandas UDF, you only need to add an extra parameter udf_type=”pandas” in the udf decorator:&lt;/p&gt;
+
+&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;&lt;span class=&quot;nd&quot;&gt;@udf&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;input_types&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;DataTypes&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;STRING&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(),&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;DataTypes&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;FLOAT&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()],&lt;/span&gt;
+     &lt;span class=&quot;n&quot;&gt;result_type&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;DataTypes&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;FLOAT&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(),&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;udf_type&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;#39;pandas&amp;#39;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
+&lt;span class=&quot;k&quot;&gt;def&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;interpolate&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;id&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;temperature&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;):&lt;/span&gt;
+    &lt;span class=&quot;c&quot;&gt;# takes id: pandas.Series and temperature: pandas.Series as input&lt;/span&gt;
+    &lt;span class=&quot;n&quot;&gt;df&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;pd&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;DataFrame&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;({&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;#39;id&amp;#39;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;id&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&amp;#39;temperature&amp;#39;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;temperature&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;})&lt;/span&gt;
+
+    &lt;span class=&quot;c&quot;&gt;# use interpolate() to interpolate the missing temperature&lt;/span&gt;
+    &lt;span class=&quot;n&quot;&gt;interpolated_df&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;df&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;groupby&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;#39;id&amp;#39;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;apply&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
+        &lt;span class=&quot;k&quot;&gt;lambda&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;group&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;group&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;interpolate&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;limit_direction&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;#39;both&amp;#39;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;))&lt;/span&gt;
+
+    &lt;span class=&quot;c&quot;&gt;# output temperature: pandas.Series&lt;/span&gt;
+    &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;interpolated_df&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;#39;temperature&amp;#39;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
+
+&lt;p&gt;The Pandas UDF above uses the Pandas &lt;code&gt;dataframe.interpolate()&lt;/code&gt; function to interpolate the missing temperature data for each equipment id. This is a common IoT scenario whereby each equipment/device reports it’s id and temperature to be analyzed, but the temperature field may be null due to various reasons.
+With the function, you can register and use it in the same way as the &lt;a href=&quot;https://flink.apache.org/2020/04/09/pyflink-udf-support-flink.html&quot;&gt;normal Python UDF&lt;/a&gt;. Below is a complete example of how to use the Pandas UDF in PyFlink.&lt;/p&gt;
+
+&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;&lt;span class=&quot;kn&quot;&gt;from&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;pyflink.datastream&lt;/span&gt; &lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;StreamExecutionEnvironment&lt;/span&gt;
+&lt;span class=&quot;kn&quot;&gt;from&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;pyflink.table&lt;/span&gt; &lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;StreamTableEnvironment&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;DataTypes&lt;/span&gt;
+&lt;span class=&quot;kn&quot;&gt;from&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;pyflink.table.udf&lt;/span&gt; &lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;udf&lt;/span&gt;
+&lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;pandas&lt;/span&gt; &lt;span class=&quot;kn&quot;&gt;as&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;pd&lt;/span&gt;
+
+&lt;span class=&quot;n&quot;&gt;env&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;StreamExecutionEnvironment&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;get_execution_environment&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt;
+&lt;span class=&quot;n&quot;&gt;env&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;set_parallelism&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
+&lt;span class=&quot;n&quot;&gt;t_env&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;StreamTableEnvironment&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;create&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;env&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
+&lt;span class=&quot;n&quot;&gt;t_env&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;get_config&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;get_configuration&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;set_boolean&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;python.fn-execution.memory.managed&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;bp&quot;&gt;True&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
+
+&lt;span class=&quot;nd&quot;&gt;@udf&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;input_types&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;DataTypes&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;STRING&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(),&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;DataTypes&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;FLOAT&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()],&lt;/span&gt;
+     &lt;span class=&quot;n&quot;&gt;result_type&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;DataTypes&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;FLOAT&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(),&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;udf_type&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;#39;pandas&amp;#39;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
+&lt;span class=&quot;k&quot;&gt;def&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;interpolate&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;id&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;temperature&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;):&lt;/span&gt;
+    &lt;span class=&quot;c&quot;&gt;# takes id: pandas.Series and temperature: pandas.Series as input&lt;/span&gt;
+    &lt;span class=&quot;n&quot;&gt;df&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;pd&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;DataFrame&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;({&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;#39;id&amp;#39;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;id&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&amp;#39;temperature&amp;#39;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;temperature&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;})&lt;/span&gt;
+
+    &lt;span class=&quot;c&quot;&gt;# use interpolate() to interpolate the missing temperature&lt;/span&gt;
+    &lt;span class=&quot;n&quot;&gt;interpolated_df&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;df&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;groupby&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;#39;id&amp;#39;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;apply&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
+        &lt;span class=&quot;k&quot;&gt;lambda&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;group&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;group&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;interpolate&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;limit_direction&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;#39;both&amp;#39;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;))&lt;/span&gt;
+
+    &lt;span class=&quot;c&quot;&gt;# output temperature: pandas.Series&lt;/span&gt;
+    &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;interpolated_df&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;#39;temperature&amp;#39;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt;
+
+&lt;span class=&quot;n&quot;&gt;t_env&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;register_function&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;interpolate&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;interpolate&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
+
+&lt;span class=&quot;n&quot;&gt;my_source_ddl&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&amp;quot;&amp;quot;&amp;quot;&lt;/span&gt;
+&lt;span class=&quot;s&quot;&gt;    create table mySource (&lt;/span&gt;
+&lt;span class=&quot;s&quot;&gt;        id INT,&lt;/span&gt;
+&lt;span class=&quot;s&quot;&gt;        temperature FLOAT &lt;/span&gt;
+&lt;span class=&quot;s&quot;&gt;    ) with (&lt;/span&gt;
+&lt;span class=&quot;s&quot;&gt;        &amp;#39;connector.type&amp;#39; = &amp;#39;filesystem&amp;#39;,&lt;/span&gt;
+&lt;span class=&quot;s&quot;&gt;        &amp;#39;format.type&amp;#39; = &amp;#39;csv&amp;#39;,&lt;/span&gt;
+&lt;span class=&quot;s&quot;&gt;        &amp;#39;connector.path&amp;#39; = &amp;#39;/tmp/input&amp;#39;&lt;/span&gt;
+&lt;span class=&quot;s&quot;&gt;    )&lt;/span&gt;
+&lt;span class=&quot;s&quot;&gt;&amp;quot;&amp;quot;&amp;quot;&lt;/span&gt;
+
+&lt;span class=&quot;n&quot;&gt;my_sink_ddl&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&amp;quot;&amp;quot;&amp;quot;&lt;/span&gt;
+&lt;span class=&quot;s&quot;&gt;    create table mySink (&lt;/span&gt;
+&lt;span class=&quot;s&quot;&gt;        id INT,&lt;/span&gt;
+&lt;span class=&quot;s&quot;&gt;        temperature FLOAT &lt;/span&gt;
+&lt;span class=&quot;s&quot;&gt;    ) with (&lt;/span&gt;
+&lt;span class=&quot;s&quot;&gt;        &amp;#39;connector.type&amp;#39; = &amp;#39;filesystem&amp;#39;,&lt;/span&gt;
+&lt;span class=&quot;s&quot;&gt;        &amp;#39;format.type&amp;#39; = &amp;#39;csv&amp;#39;,&lt;/span&gt;
+&lt;span class=&quot;s&quot;&gt;        &amp;#39;connector.path&amp;#39; = &amp;#39;/tmp/output&amp;#39;&lt;/span&gt;
+&lt;span class=&quot;s&quot;&gt;    )&lt;/span&gt;
+&lt;span class=&quot;s&quot;&gt;&amp;quot;&amp;quot;&amp;quot;&lt;/span&gt;
+
+&lt;span class=&quot;n&quot;&gt;t_env&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;execute_sql&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;my_source_ddl&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
+&lt;span class=&quot;n&quot;&gt;t_env&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;execute_sql&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;my_sink_ddl&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
+
+&lt;span class=&quot;n&quot;&gt;t_env&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;from_path&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;#39;mySource&amp;#39;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;\
+    &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;select&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;id, interpolate(id, temperature) as temperature&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; \
+    &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;insert_into&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;#39;mySink&amp;#39;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
+
+&lt;span class=&quot;n&quot;&gt;t_env&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;execute&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;pandas_udf_demo&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
+
+&lt;p&gt;To submit the job:&lt;/p&gt;
+
+&lt;ul&gt;
+  &lt;li&gt;Firstly, you need to prepare the input data in the “/tmp/input” file. For example,&lt;/li&gt;
+&lt;/ul&gt;
+
+&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-bash&quot;&gt;&lt;span class=&quot;nv&quot;&gt;$ &lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;echo&lt;/span&gt; -e  &lt;span class=&quot;s2&quot;&gt;&amp;quot;1,98.0\n1,\n1,100.0\n2,99.0&amp;quot;&lt;/span&gt; &amp;gt; /tmp/input&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
+
+&lt;ul&gt;
+  &lt;li&gt;Next, you can run this example on the command line,&lt;/li&gt;
+&lt;/ul&gt;
+
+&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-bash&quot;&gt;&lt;span class=&quot;nv&quot;&gt;$ &lt;/span&gt;python pandas_udf_demo.py&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
+
+&lt;p&gt;The command builds and runs the Python Table API program in a local mini-cluster. You can also submit the Python Table API program to a remote cluster using different command lines, see more details &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.11/ops/cli.html#job-submission-examples&quot;&gt;here&lt;/a&gt;.&lt;/p&gt;
+
+&lt;ul&gt;
+  &lt;li&gt;Finally, you can see the execution result on the command line. As you can see, all the temperature data with an empty value has been interpolated:&lt;/li&gt;
+&lt;/ul&gt;
+
+&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-bash&quot;&gt;&lt;span class=&quot;nv&quot;&gt;$ &lt;/span&gt; cat /tmp/output
+1,98.0
+1,99.0
+1,100.0
+2,99.0&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
+
+&lt;h2 id=&quot;conversion-between-pyflink-table-and-pandas-dataframe-1&quot;&gt;Conversion between PyFlink Table and Pandas DataFrame&lt;/h2&gt;
+
+&lt;p&gt;You can use the &lt;code&gt;from_pandas()&lt;/code&gt; method to create a PyFlink Table from a Pandas DataFrame or use the &lt;code&gt;to_pandas()&lt;/code&gt; method to convert a PyFlink Table to a Pandas DataFrame.&lt;/p&gt;
+
+&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;&lt;span class=&quot;kn&quot;&gt;from&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;pyflink.datastream&lt;/span&gt; &lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;StreamExecutionEnvironment&lt;/span&gt;
+&lt;span class=&quot;kn&quot;&gt;from&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;pyflink.table&lt;/span&gt; &lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;StreamTableEnvironment&lt;/span&gt;
+&lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;pandas&lt;/span&gt; &lt;span class=&quot;kn&quot;&gt;as&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;pd&lt;/span&gt;
+&lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;numpy&lt;/span&gt; &lt;span class=&quot;kn&quot;&gt;as&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;np&lt;/span&gt;
+
+&lt;span class=&quot;n&quot;&gt;env&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;StreamExecutionEnvironment&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;get_execution_environment&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt;
+&lt;span class=&quot;n&quot;&gt;t_env&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;StreamTableEnvironment&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;create&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;env&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
+
+&lt;span class=&quot;c&quot;&gt;# Create a PyFlink Table&lt;/span&gt;
+&lt;span class=&quot;n&quot;&gt;pdf&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;pd&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;DataFrame&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;np&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;random&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;rand&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1000&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;))&lt;/span&gt;
+&lt;span class=&quot;n&quot;&gt;table&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t_env&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;from_pandas&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;pdf&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;a&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&amp;quot;b&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;])&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;filter&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;a &amp;gt; 0.5&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
+
+&lt;span class=&quot;c&quot;&gt;# Convert the PyFlink Table to a Pandas DataFrame&lt;/span&gt;
+&lt;span class=&quot;n&quot;&gt;pdf&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;table&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;to_pandas&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt;
+&lt;span class=&quot;k&quot;&gt;print&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;pdf&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
+
+&lt;h1 id=&quot;conclusion--upcoming-work&quot;&gt;Conclusion &amp;amp; Upcoming work&lt;/h1&gt;
+
+&lt;p&gt;In this article, we introduce the integration of Pandas in Flink 1.11, including Pandas UDF and the conversion between Table and Pandas. In fact, in the latest Apache Flink release, there are many excellent features added to PyFlink, such as support of User-defined Table functions and User-defined Metrics for Python UDFs. What’s more, from Flink 1.11, you can build PyFlink with Cython support and “Cythonize” your Python UDFs to substantially improve code execution speed (up to 30x faster, compared to Python UDFs in Flink 1.10).&lt;/p&gt;
+
+&lt;p&gt;Future work by the community will focus on adding more features and bringing additional optimizations with follow up releases.  Such optimizations and additions include a Python DataStream API and more integration with the Python ecosystem, such as support for distributed Pandas in Flink. Stay tuned for more information and updates with the upcoming releases!&lt;/p&gt;
+
+&lt;center&gt;
+&lt;img src=&quot;/img/blog/2020-07-28-pyflink-pandas/mission-of-pyFlink.gif&quot; width=&quot;600px&quot; alt=&quot;Mission of PyFlink&quot; /&gt;
+&lt;/center&gt;
+</description>
+<pubDate>Tue, 28 Jul 2020 14:00:00 +0200</pubDate>
+<link>https://flink.apache.org/2020/07/28/pyflink-pandas-udf-support-flink.html</link>
+<guid isPermaLink="true">/2020/07/28/pyflink-pandas-udf-support-flink.html</guid>
+</item>
+
+<item>
 <title>Flink SQL Demo: Building an End-to-End Streaming Application</title>
 <description>&lt;p&gt;Apache Flink 1.11 has released many exciting new features, including many developments in Flink SQL which is evolving at a fast pace. This article takes a closer look at how to quickly build streaming applications with Flink SQL from a practical point of view.&lt;/p&gt;
 
@@ -17072,152 +17281,5 @@
 <guid isPermaLink="true">/news/2015/12/18/a-year-in-review.html</guid>
 </item>
 
-<item>
-<title>Storm Compatibility in Apache Flink: How to run existing Storm topologies on Flink</title>
-<description>&lt;p&gt;&lt;a href=&quot;https://storm.apache.org&quot;&gt;Apache Storm&lt;/a&gt; was one of the first distributed and scalable stream processing systems available in the open source space offering (near) real-time tuple-by-tuple processing semantics.
-Initially released by the developers at Backtype in 2011 under the Eclipse open-source license, it became popular very quickly.
-Only shortly afterwards, Twitter acquired Backtype.
-Since then, Storm has been growing in popularity, is used in production at many big companies, and is the de-facto industry standard for big data stream processing.
-In 2013, Storm entered the Apache incubator program, followed by its graduation to top-level in 2014.&lt;/p&gt;
-
-&lt;p&gt;Apache Flink is a stream processing engine that improves upon older technologies like Storm in several dimensions,
-including &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-master/internals/stream_checkpointing.html&quot;&gt;strong consistency guarantees&lt;/a&gt; (“exactly once”),
-a higher level &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-master/apis/streaming_guide.html&quot;&gt;DataStream API&lt;/a&gt;,
-support for &lt;a href=&quot;http://flink.apache.org/news/2015/12/04/Introducing-windows.html&quot;&gt;event time and a rich windowing system&lt;/a&gt;,
-as well as &lt;a href=&quot;https://data-artisans.com/high-throughput-low-latency-and-exactly-once-stream-processing-with-apache-flink/&quot;&gt;superior throughput with competitive low latency&lt;/a&gt;.&lt;/p&gt;
-
-&lt;p&gt;While Flink offers several technical benefits over Storm, an existing investment on a codebase of applications developed for Storm often makes it difficult to switch engines.
-For these reasons, as part of the Flink 0.10 release, Flink ships with a Storm compatibility package that allows users to:&lt;/p&gt;
-
-&lt;ul&gt;
-  &lt;li&gt;Run &lt;strong&gt;unmodified&lt;/strong&gt; Storm topologies using Apache Flink benefiting from superior performance.&lt;/li&gt;
-  &lt;li&gt;&lt;strong&gt;Embed&lt;/strong&gt; Storm code (spouts and bolts) as operators inside Flink DataStream programs.&lt;/li&gt;
-&lt;/ul&gt;
-
-&lt;p&gt;Only minor code changes are required in order to submit the program to Flink instead of Storm.
-This minimizes the work for developers to run existing Storm topologies while leveraging Apache Flink’s fast and robust execution engine.&lt;/p&gt;
-
-&lt;p&gt;We note that the Storm compatibility package is continuously improving and does not cover the full spectrum of Storm’s API.
-However, it is powerful enough to cover many use cases.&lt;/p&gt;
-
-&lt;h2 id=&quot;executing-storm-topologies-with-flink&quot;&gt;Executing Storm topologies with Flink&lt;/h2&gt;
-
-&lt;center&gt;
-&lt;img src=&quot;/img/blog/flink-storm.png&quot; style=&quot;height:200px;margin:15px&quot; /&gt;
-&lt;/center&gt;
-
-&lt;p&gt;The easiest way to use the Storm compatibility package is by executing a whole Storm topology in Flink.
-For this, you only need to replace the dependency &lt;code&gt;storm-core&lt;/code&gt; by &lt;code&gt;flink-storm&lt;/code&gt; in your Storm project and &lt;strong&gt;change two lines of code&lt;/strong&gt; in your original Storm program.&lt;/p&gt;
-
-&lt;p&gt;The following example shows a simple Storm-Word-Count-Program that can be executed in Flink.
-First, the program is assembled the Storm way without any code change to Spouts, Bolts, or the topology itself.&lt;/p&gt;
-
-&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;&lt;span class=&quot;c1&quot;&gt;// assemble topology, the Storm way&lt;/span&gt;
-&lt;span class=&quot;n&quot;&gt;TopologyBuilder&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;builder&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;TopologyBuilder&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;();&lt;/span&gt;
-&lt;span class=&quot;n&quot;&gt;builder&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;setSpout&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;source&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;StormFileSpout&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;inputFilePath&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;));&lt;/span&gt;
-&lt;span class=&quot;n&quot;&gt;builder&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;setBolt&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;tokenizer&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;StormBoltTokenizer&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;())&lt;/span&gt;
-       &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;shuffleGrouping&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;source&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;
-&lt;span class=&quot;n&quot;&gt;builder&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;setBolt&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;counter&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;StormBoltCounter&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;())&lt;/span&gt;
-       &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;fieldsGrouping&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;tokenizer&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;Fields&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;word&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;));&lt;/span&gt;
-&lt;span class=&quot;n&quot;&gt;builder&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;setBolt&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;sink&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;StormBoltFileSink&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;outputFilePath&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;))&lt;/span&gt;
-       &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;shuffleGrouping&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;counter&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
-
-&lt;p&gt;In order to execute the topology, we need to translate it to a &lt;code&gt;FlinkTopology&lt;/code&gt; and submit it to a local or remote Flink cluster, very similar to submitting the application to a Storm cluster.&lt;sup&gt;&lt;a href=&quot;#fn1&quot; id=&quot;ref1&quot;&gt;1&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
-
-&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;&lt;span class=&quot;c1&quot;&gt;// transform Storm topology to Flink program&lt;/span&gt;
-&lt;span class=&quot;c1&quot;&gt;// replaces: StormTopology topology = builder.createTopology();&lt;/span&gt;
-&lt;span class=&quot;n&quot;&gt;FlinkTopology&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;topology&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;FlinkTopology&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;createTopology&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;builder&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;
-
-&lt;span class=&quot;n&quot;&gt;Config&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;conf&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;Config&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;();&lt;/span&gt;
-&lt;span class=&quot;k&quot;&gt;if&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;runLocal&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
-	&lt;span class=&quot;c1&quot;&gt;// use FlinkLocalCluster instead of LocalCluster&lt;/span&gt;
-	&lt;span class=&quot;n&quot;&gt;FlinkLocalCluster&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;cluster&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;FlinkLocalCluster&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;getLocalCluster&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;();&lt;/span&gt;
-	&lt;span class=&quot;n&quot;&gt;cluster&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;submitTopology&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;WordCount&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;conf&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;topology&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;
-&lt;span class=&quot;o&quot;&gt;}&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;else&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
-	&lt;span class=&quot;c1&quot;&gt;// use FlinkSubmitter instead of StormSubmitter&lt;/span&gt;
-	&lt;span class=&quot;n&quot;&gt;FlinkSubmitter&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;submitTopology&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;WordCount&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;conf&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;topology&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;
-&lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
-
-&lt;p&gt;As a shorter Flink-style alternative that replaces the Storm-style submission code, you can also use context-based job execution:&lt;/p&gt;
-
-&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;&lt;span class=&quot;c1&quot;&gt;// transform Storm topology to Flink program (as above)&lt;/span&gt;
-&lt;span class=&quot;n&quot;&gt;FlinkTopology&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;topology&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;FlinkTopology&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;createTopology&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;builder&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;
-
-&lt;span class=&quot;c1&quot;&gt;// executes locally by default or remotely if submitted with Flink&amp;#39;s command-line client&lt;/span&gt;
-&lt;span class=&quot;n&quot;&gt;topology&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;execute&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;()&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
-
-&lt;p&gt;After the code is packaged in a jar file (e.g., &lt;code&gt;StormWordCount.jar&lt;/code&gt;), it can be easily submitted to Flink via&lt;/p&gt;
-
-&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code&gt;bin/flink run StormWordCount.jar
-&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
-
-&lt;p&gt;The used Spouts and Bolts as well as the topology assemble code is not changed at all!
-Only the translation and submission step have to be changed to the Storm-API compatible Flink pendants.
-This allows for minimal code changes and easy adaption to Flink.&lt;/p&gt;
-
-&lt;h3 id=&quot;embedding-spouts-and-bolts-in-flink-programs&quot;&gt;Embedding Spouts and Bolts in Flink programs&lt;/h3&gt;
-
-&lt;p&gt;It is also possible to use Spouts and Bolts within a regular Flink DataStream program.
-The compatibility package provides wrapper classes for Spouts and Bolts which are implemented as a Flink &lt;code&gt;SourceFunction&lt;/code&gt; and &lt;code&gt;StreamOperator&lt;/code&gt; respectively.
-Those wrappers automatically translate incoming Flink POJO and &lt;code&gt;TupleXX&lt;/code&gt; records into Storm’s &lt;code&gt;Tuple&lt;/code&gt; type and emitted &lt;code&gt;Values&lt;/code&gt; back into either POJOs or &lt;code&gt;TupleXX&lt;/code&gt; types for further processing by Flink operators.
-As Storm is type agnostic, it is required to specify the output type of embedded Spouts/Bolts manually to get a fully typed Flink streaming program.&lt;/p&gt;
-
-&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;&lt;span class=&quot;c1&quot;&gt;// use regular Flink streaming environment&lt;/span&gt;
-&lt;span class=&quot;n&quot;&gt;StreamExecutionEnvironment&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;env&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;StreamExecutionEnvironment&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;getExecutionEnvironment&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;();&lt;/span&gt;
-
-&lt;span class=&quot;c1&quot;&gt;// use Spout as source&lt;/span&gt;
-&lt;span class=&quot;n&quot;&gt;DataStream&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Tuple1&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;String&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;source&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; 
-  &lt;span class=&quot;n&quot;&gt;env&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;addSource&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;c1&quot;&gt;// Flink provided wrapper including original Spout&lt;/span&gt;
-                &lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;SpoutWrapper&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;String&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;(&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;FileSpout&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;localFilePath&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)),&lt;/span&gt; 
-                &lt;span class=&quot;c1&quot;&gt;// specify output type manually&lt;/span&gt;
-                &lt;span class=&quot;n&quot;&gt;TypeExtractor&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;getForObject&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Tuple1&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;String&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)));&lt;/span&gt;
-&lt;span class=&quot;c1&quot;&gt;// FileSpout cannot be parallelized&lt;/span&gt;
-&lt;span class=&quot;n&quot;&gt;DataStream&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Tuple1&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;String&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;text&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;source&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;setParallelism&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;
-
-&lt;span class=&quot;c1&quot;&gt;// further processing with Flink&lt;/span&gt;
-&lt;span class=&quot;n&quot;&gt;DataStream&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Tuple2&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;String&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Integer&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;tokens&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;text&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;flatMap&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;Tokenizer&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;()).&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;keyBy&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;
-
-&lt;span class=&quot;c1&quot;&gt;// use Bolt for counting&lt;/span&gt;
-&lt;span class=&quot;n&quot;&gt;DataStream&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Tuple2&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;String&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Integer&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;counts&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;
-  &lt;span class=&quot;n&quot;&gt;tokens&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;transform&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;Counter&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt;
-                   &lt;span class=&quot;c1&quot;&gt;// specify output type manually&lt;/span&gt;
-                   &lt;span class=&quot;n&quot;&gt;TypeExtractor&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;getForObject&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Tuple2&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;String&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Integer&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;))&lt;/span&gt;
-                   &lt;span class=&quot;c1&quot;&gt;// Flink provided wrapper including original Bolt&lt;/span&gt;
-                   &lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;BoltWrapper&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;String&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Tuple2&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;String&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Integer&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&amp;gt;(&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;BoltCounter&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;()));&lt;/span&gt;
-
-&lt;span class=&quot;c1&quot;&gt;// write result to file via Flink sink&lt;/span&gt;
-&lt;span class=&quot;n&quot;&gt;counts&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;writeAsText&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;outputPath&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;
-
-&lt;span class=&quot;c1&quot;&gt;// start Flink job&lt;/span&gt;
-&lt;span class=&quot;n&quot;&gt;env&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;execute&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;WordCount with Spout source and Bolt counter&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
-
-&lt;p&gt;Although some boilerplate code is needed (we plan to address this soon!), the actual embedded Spout and Bolt code can be used unmodified.
-We also note that the resulting program is fully typed, and type errors will be found by Flink’s type extractor even if the original Spouts and Bolts are not.&lt;/p&gt;
-
-&lt;h2 id=&quot;outlook&quot;&gt;Outlook&lt;/h2&gt;
-
-&lt;p&gt;The Storm compatibility package is currently in beta and undergoes continuous development.
-We are currently working on providing consistency guarantees for stateful Bolts.
-Furthermore, we want to provide a better API integration for embedded Spouts and Bolts by providing a “StormExecutionEnvironment” as a special extension of Flink’s &lt;code&gt;StreamExecutionEnvironment&lt;/code&gt;.
-We are also investigating the integration of Storm’s higher-level programming API Trident.&lt;/p&gt;
-
-&lt;h2 id=&quot;summary&quot;&gt;Summary&lt;/h2&gt;
-
-&lt;p&gt;Flink’s compatibility package for Storm allows using unmodified Spouts and Bolts within Flink.
-This enables you to even embed third-party Spouts and Bolts where the source code is not available.
-While you can embed Spouts/Bolts in a Flink program and mix-and-match them with Flink operators, running whole topologies is the easiest way to get started and can be achieved with almost no code changes.&lt;/p&gt;
-
-&lt;p&gt;If you want to try out Flink’s Storm compatibility package checkout our &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-master/apis/streaming/storm_compatibility.html&quot;&gt;Documentation&lt;/a&gt;.&lt;/p&gt;
-
-&lt;hr /&gt;
-
-&lt;p&gt;&lt;sup id=&quot;fn1&quot;&gt;1. We confess, there are three lines changed compared to a Storm project &lt;img class=&quot;emoji&quot; style=&quot;width:16px;height:16px;align:absmiddle&quot; src=&quot;/img/blog/smirk.png&quot; /&gt;—because the example covers local &lt;em&gt;and&lt;/em&gt; remote execution. &lt;a href=&quot;#ref1&quot; title=&quot;Back to text.&quot;&gt;↩&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
-
-</description>
-<pubDate>Fri, 11 Dec 2015 11:00:00 +0100</pubDate>
-<link>https://flink.apache.org/news/2015/12/11/storm-compatibility.html</link>
-<guid isPermaLink="true">/news/2015/12/11/storm-compatibility.html</guid>
-</item>
-
 </channel>
 </rss>

diff --git a/content/blog/index.html b/content/blog/index.html
index 009ad78..af9a77b 100644
--- a/content/blog/index.html
+++ b/content/blog/index.html

@@ -209,6 +209,19 @@
     <hr>
     
     <article>
+      <h2 class="blog-title"><a href="/2020/07/28/pyflink-pandas-udf-support-flink.html">PyFlink: The integration of Pandas into PyFlink</a></h2>
+
+      <p>28 Jul 2020
+       Jincheng Sun (<a href="https://twitter.com/sunjincheng121">@sunjincheng121</a>) &amp; Markos Sfikas (<a href="https://twitter.com/MarkSfik">@MarkSfik</a>)</p>
+
+      <p>The Apache Flink community put some great effort into integrating Pandas with PyFlink in the latest Flink version 1.11. Some of the added features include support for Pandas UDF and the conversion between Pandas DataFrame and Table. In this article, we will introduce how these functionalities work and how to use them with a step-by-step example.</p>
+
+      <p><a href="/2020/07/28/pyflink-pandas-udf-support-flink.html">Continue reading &raquo;</a></p>
+    </article>
+
+    <hr>
+    
+    <article>
       <h2 class="blog-title"><a href="/2020/07/28/flink-sql-demo-building-e2e-streaming-application.html">Flink SQL Demo: Building an End-to-End Streaming Application</a></h2>
 
       <p>28 Jul 2020
@@ -330,19 +343,6 @@
 
     <hr>
     
-    <article>
-      <h2 class="blog-title"><a href="/news/2020/06/11/community-update.html">Flink Community Update - June'20</a></h2>
-
-      <p>11 Jun 2020
-       Marta Paes (<a href="https://twitter.com/morsapaes">@morsapaes</a>)</p>
-
-      <p>And suddenly it’s June. The previous month has been calm on the surface, but quite hectic underneath — the final testing phase for Flink 1.11 is moving at full speed, Stateful Functions 2.1 is out in the wild and Flink has made it into Google Season of Docs 2020.</p>
-
-      <p><a href="/news/2020/06/11/community-update.html">Continue reading &raquo;</a></p>
-    </article>
-
-    <hr>
-    
 
     <!-- Pagination links -->
     
@@ -385,6 +385,16 @@
       
 
       
+      <li><a href="/2020/07/28/pyflink-pandas-udf-support-flink.html">PyFlink: The integration of Pandas into PyFlink</a></li>
+
+      
+        
+      
+    
+      
+      
+
+      
       <li><a href="/2020/07/28/flink-sql-demo-building-e2e-streaming-application.html">Flink SQL Demo: Building an End-to-End Streaming Application</a></li>
 
       

diff --git a/content/blog/page10/index.html b/content/blog/page10/index.html
index 676b25e..591ef5b 100644
--- a/content/blog/page10/index.html
+++ b/content/blog/page10/index.html

@@ -196,6 +196,21 @@
     <!-- Blog posts -->
     
     <article>
+      <h2 class="blog-title"><a href="/news/2016/08/08/release-1.1.0.html">Announcing Apache Flink 1.1.0</a></h2>
+
+      <p>08 Aug 2016
+      </p>
+
+      <p><div class="alert alert-success"><strong>Important</strong>: The Maven artifacts published with version 1.1.0 on Maven central have a Hadoop dependency issue. It is highly recommended to use <strong>1.1.1</strong> or <strong>1.1.1-hadoop1</strong> as the Flink version.</div>
+
+</p>
+
+      <p><a href="/news/2016/08/08/release-1.1.0.html">Continue reading &raquo;</a></p>
+    </article>
+
+    <hr>
+    
+    <article>
       <h2 class="blog-title"><a href="/news/2016/05/24/stream-sql.html">Stream Processing for Everyone with SQL and Apache Flink</a></h2>
 
       <p>24 May 2016 by Fabian Hueske (<a href="https://twitter.com/">@fhueske</a>)
@@ -325,19 +340,6 @@
 
     <hr>
     
-    <article>
-      <h2 class="blog-title"><a href="/news/2015/12/11/storm-compatibility.html">Storm Compatibility in Apache Flink: How to run existing Storm topologies on Flink</a></h2>
-
-      <p>11 Dec 2015 by Matthias J. Sax (<a href="https://twitter.com/">@MatthiasJSax</a>)
-      </p>
-
-      <p>In this blog post, we describe Flink's compatibility package for <a href="https://storm.apache.org">Apache Storm</a> that allows to embed Spouts (sources) and Bolts (operators) in a regular Flink streaming job. Furthermore, the compatibility package provides a Storm compatible API in order to execute whole Storm topologies with (almost) no code adaption.</p>
-
-      <p><a href="/news/2015/12/11/storm-compatibility.html">Continue reading &raquo;</a></p>
-    </article>
-
-    <hr>
-    
 
     <!-- Pagination links -->
     
@@ -380,6 +382,16 @@
       
 
       
+      <li><a href="/2020/07/28/pyflink-pandas-udf-support-flink.html">PyFlink: The integration of Pandas into PyFlink</a></li>
+
+      
+        
+      
+    
+      
+      
+
+      
       <li><a href="/2020/07/28/flink-sql-demo-building-e2e-streaming-application.html">Flink SQL Demo: Building an End-to-End Streaming Application</a></li>
 
       

diff --git a/content/blog/page11/index.html b/content/blog/page11/index.html
index 9f40763..27263a4 100644
--- a/content/blog/page11/index.html
+++ b/content/blog/page11/index.html

@@ -196,6 +196,19 @@
     <!-- Blog posts -->
     
     <article>
+      <h2 class="blog-title"><a href="/news/2015/12/11/storm-compatibility.html">Storm Compatibility in Apache Flink: How to run existing Storm topologies on Flink</a></h2>
+
+      <p>11 Dec 2015 by Matthias J. Sax (<a href="https://twitter.com/">@MatthiasJSax</a>)
+      </p>
+
+      <p>In this blog post, we describe Flink's compatibility package for <a href="https://storm.apache.org">Apache Storm</a> that allows to embed Spouts (sources) and Bolts (operators) in a regular Flink streaming job. Furthermore, the compatibility package provides a Storm compatible API in order to execute whole Storm topologies with (almost) no code adaption.</p>
+
+      <p><a href="/news/2015/12/11/storm-compatibility.html">Continue reading &raquo;</a></p>
+    </article>
+
+    <hr>
+    
+    <article>
       <h2 class="blog-title"><a href="/news/2015/12/04/Introducing-windows.html">Introducing Stream Windows in Apache Flink</a></h2>
 
       <p>04 Dec 2015 by Fabian Hueske (<a href="https://twitter.com/">@fhueske</a>)
@@ -333,20 +346,6 @@
 
     <hr>
     
-    <article>
-      <h2 class="blog-title"><a href="/news/2015/05/11/Juggling-with-Bits-and-Bytes.html">Juggling with Bits and Bytes</a></h2>
-
-      <p>11 May 2015 by Fabian Hüske (<a href="https://twitter.com/">@fhueske</a>)
-      </p>
-
-      <p><p>Nowadays, a lot of open-source systems for analyzing large data sets are implemented in Java or other JVM-based programming languages. The most well-known example is Apache Hadoop, but also newer frameworks such as Apache Spark, Apache Drill, and also Apache Flink run on JVMs. A common challenge that JVM-based data analysis engines face is to store large amounts of data in memory - both for caching and for efficient processing such as sorting and joining of data. Managing the JVM memory well makes the difference between a system that is hard to configure and has unpredictable reliability and performance and a system that behaves robustly with few configuration knobs.</p>
-<p>In this blog post we discuss how Apache Flink manages memory, talk about its custom data de/serialization stack, and show how it operates on binary data.</p></p>
-
-      <p><a href="/news/2015/05/11/Juggling-with-Bits-and-Bytes.html">Continue reading &raquo;</a></p>
-    </article>
-
-    <hr>
-    
 
     <!-- Pagination links -->
     
@@ -389,6 +388,16 @@
       
 
       
+      <li><a href="/2020/07/28/pyflink-pandas-udf-support-flink.html">PyFlink: The integration of Pandas into PyFlink</a></li>
+
+      
+        
+      
+    
+      
+      
+
+      
       <li><a href="/2020/07/28/flink-sql-demo-building-e2e-streaming-application.html">Flink SQL Demo: Building an End-to-End Streaming Application</a></li>
 
       

diff --git a/content/blog/page12/index.html b/content/blog/page12/index.html
index ba6dd47..fb9e164 100644
--- a/content/blog/page12/index.html
+++ b/content/blog/page12/index.html

@@ -196,6 +196,20 @@
     <!-- Blog posts -->
     
     <article>
+      <h2 class="blog-title"><a href="/news/2015/05/11/Juggling-with-Bits-and-Bytes.html">Juggling with Bits and Bytes</a></h2>
+
+      <p>11 May 2015 by Fabian Hüske (<a href="https://twitter.com/">@fhueske</a>)
+      </p>
+
+      <p><p>Nowadays, a lot of open-source systems for analyzing large data sets are implemented in Java or other JVM-based programming languages. The most well-known example is Apache Hadoop, but also newer frameworks such as Apache Spark, Apache Drill, and also Apache Flink run on JVMs. A common challenge that JVM-based data analysis engines face is to store large amounts of data in memory - both for caching and for efficient processing such as sorting and joining of data. Managing the JVM memory well makes the difference between a system that is hard to configure and has unpredictable reliability and performance and a system that behaves robustly with few configuration knobs.</p>
+<p>In this blog post we discuss how Apache Flink manages memory, talk about its custom data de/serialization stack, and show how it operates on binary data.</p></p>
+
+      <p><a href="/news/2015/05/11/Juggling-with-Bits-and-Bytes.html">Continue reading &raquo;</a></p>
+    </article>
+
+    <hr>
+    
+    <article>
       <h2 class="blog-title"><a href="/news/2015/04/13/release-0.9.0-milestone1.html">Announcing Flink 0.9.0-milestone1 preview release</a></h2>
 
       <p>13 Apr 2015
@@ -340,21 +354,6 @@
 
     <hr>
     
-    <article>
-      <h2 class="blog-title"><a href="/news/2014/11/04/release-0.7.0.html">Apache Flink 0.7.0 available</a></h2>
-
-      <p>04 Nov 2014
-      </p>
-
-      <p><p>We are pleased to announce the availability of Flink 0.7.0. This release includes new user-facing features as well as performance and bug fixes, brings the Scala and Java APIs in sync, and introduces Flink Streaming. A total of 34 people have contributed to this release, a big thanks to all of them!</p>
-
-</p>
-
-      <p><a href="/news/2014/11/04/release-0.7.0.html">Continue reading &raquo;</a></p>
-    </article>
-
-    <hr>
-    
 
     <!-- Pagination links -->
     
@@ -397,6 +396,16 @@
       
 
       
+      <li><a href="/2020/07/28/pyflink-pandas-udf-support-flink.html">PyFlink: The integration of Pandas into PyFlink</a></li>
+
+      
+        
+      
+    
+      
+      
+
+      
       <li><a href="/2020/07/28/flink-sql-demo-building-e2e-streaming-application.html">Flink SQL Demo: Building an End-to-End Streaming Application</a></li>
 
       

diff --git a/content/blog/page13/index.html b/content/blog/page13/index.html
index 3da2d59..1450726 100644
--- a/content/blog/page13/index.html
+++ b/content/blog/page13/index.html

@@ -196,6 +196,21 @@
     <!-- Blog posts -->
     
     <article>
+      <h2 class="blog-title"><a href="/news/2014/11/04/release-0.7.0.html">Apache Flink 0.7.0 available</a></h2>
+
+      <p>04 Nov 2014
+      </p>
+
+      <p><p>We are pleased to announce the availability of Flink 0.7.0. This release includes new user-facing features as well as performance and bug fixes, brings the Scala and Java APIs in sync, and introduces Flink Streaming. A total of 34 people have contributed to this release, a big thanks to all of them!</p>
+
+</p>
+
+      <p><a href="/news/2014/11/04/release-0.7.0.html">Continue reading &raquo;</a></p>
+    </article>
+
+    <hr>
+    
+    <article>
       <h2 class="blog-title"><a href="/news/2014/10/03/upcoming_events.html">Upcoming Events</a></h2>
 
       <p>03 Oct 2014
@@ -285,6 +300,16 @@
       
 
       
+      <li><a href="/2020/07/28/pyflink-pandas-udf-support-flink.html">PyFlink: The integration of Pandas into PyFlink</a></li>
+
+      
+        
+      
+    
+      
+      
+
+      
       <li><a href="/2020/07/28/flink-sql-demo-building-e2e-streaming-application.html">Flink SQL Demo: Building an End-to-End Streaming Application</a></li>
 
       

diff --git a/content/blog/page2/index.html b/content/blog/page2/index.html
index 59e48c1..2a294d5 100644
--- a/content/blog/page2/index.html
+++ b/content/blog/page2/index.html

@@ -196,6 +196,19 @@
     <!-- Blog posts -->
     
     <article>
+      <h2 class="blog-title"><a href="/news/2020/06/11/community-update.html">Flink Community Update - June'20</a></h2>
+
+      <p>11 Jun 2020
+       Marta Paes (<a href="https://twitter.com/morsapaes">@morsapaes</a>)</p>
+
+      <p>And suddenly it’s June. The previous month has been calm on the surface, but quite hectic underneath — the final testing phase for Flink 1.11 is moving at full speed, Stateful Functions 2.1 is out in the wild and Flink has made it into Google Season of Docs 2020.</p>
+
+      <p><a href="/news/2020/06/11/community-update.html">Continue reading &raquo;</a></p>
+    </article>
+
+    <hr>
+    
+    <article>
       <h2 class="blog-title"><a href="/news/2020/06/09/release-statefun-2.1.0.html">Stateful Functions 2.1.0 Release Announcement</a></h2>
 
       <p>09 Jun 2020
@@ -321,19 +334,6 @@
 
     <hr>
     
-    <article>
-      <h2 class="blog-title"><a href="/news/2020/04/01/community-update.html">Flink Community Update - April'20</a></h2>
-
-      <p>01 Apr 2020
-       Marta Paes (<a href="https://twitter.com/morsapaes">@morsapaes</a>)</p>
-
-      <p>While things slow down around us, the Apache Flink community is privileged to remain as active as ever. This blogpost combs through the past few months to give you an update on the state of things in Flink — from core releases to Stateful Functions; from some good old community stats to a new development blog.</p>
-
-      <p><a href="/news/2020/04/01/community-update.html">Continue reading &raquo;</a></p>
-    </article>
-
-    <hr>
-    
 
     <!-- Pagination links -->
     
@@ -376,6 +376,16 @@
       
 
       
+      <li><a href="/2020/07/28/pyflink-pandas-udf-support-flink.html">PyFlink: The integration of Pandas into PyFlink</a></li>
+
+      
+        
+      
+    
+      
+      
+
+      
       <li><a href="/2020/07/28/flink-sql-demo-building-e2e-streaming-application.html">Flink SQL Demo: Building an End-to-End Streaming Application</a></li>
 
       

diff --git a/content/blog/page3/index.html b/content/blog/page3/index.html
index 3f87858..7a9feec 100644
--- a/content/blog/page3/index.html
+++ b/content/blog/page3/index.html

@@ -196,6 +196,19 @@
     <!-- Blog posts -->
     
     <article>
+      <h2 class="blog-title"><a href="/news/2020/04/01/community-update.html">Flink Community Update - April'20</a></h2>
+
+      <p>01 Apr 2020
+       Marta Paes (<a href="https://twitter.com/morsapaes">@morsapaes</a>)</p>
+
+      <p>While things slow down around us, the Apache Flink community is privileged to remain as active as ever. This blogpost combs through the past few months to give you an update on the state of things in Flink — from core releases to Stateful Functions; from some good old community stats to a new development blog.</p>
+
+      <p><a href="/news/2020/04/01/community-update.html">Continue reading &raquo;</a></p>
+    </article>
+
+    <hr>
+    
+    <article>
       <h2 class="blog-title"><a href="/features/2020/03/27/flink-for-data-warehouse.html">Flink as Unified Engine for Modern Data Warehousing: Production-Ready Hive Integration</a></h2>
 
       <p>27 Mar 2020
@@ -318,21 +331,6 @@
 
     <hr>
     
-    <article>
-      <h2 class="blog-title"><a href="/news/2019/12/11/release-1.8.3.html">Apache Flink 1.8.3 Released</a></h2>
-
-      <p>11 Dec 2019
-       Hequn Cheng </p>
-
-      <p><p>The Apache Flink community released the third bugfix version of the Apache Flink 1.8 series.</p>
-
-</p>
-
-      <p><a href="/news/2019/12/11/release-1.8.3.html">Continue reading &raquo;</a></p>
-    </article>
-
-    <hr>
-    
 
     <!-- Pagination links -->
     
@@ -375,6 +373,16 @@
       
 
       
+      <li><a href="/2020/07/28/pyflink-pandas-udf-support-flink.html">PyFlink: The integration of Pandas into PyFlink</a></li>
+
+      
+        
+      
+    
+      
+      
+
+      
       <li><a href="/2020/07/28/flink-sql-demo-building-e2e-streaming-application.html">Flink SQL Demo: Building an End-to-End Streaming Application</a></li>
 
       

diff --git a/content/blog/page4/index.html b/content/blog/page4/index.html
index e9e5df6..a1d94fe 100644
--- a/content/blog/page4/index.html
+++ b/content/blog/page4/index.html

@@ -196,6 +196,21 @@
     <!-- Blog posts -->
     
     <article>
+      <h2 class="blog-title"><a href="/news/2019/12/11/release-1.8.3.html">Apache Flink 1.8.3 Released</a></h2>
+
+      <p>11 Dec 2019
+       Hequn Cheng </p>
+
+      <p><p>The Apache Flink community released the third bugfix version of the Apache Flink 1.8 series.</p>
+
+</p>
+
+      <p><a href="/news/2019/12/11/release-1.8.3.html">Continue reading &raquo;</a></p>
+    </article>
+
+    <hr>
+    
+    <article>
       <h2 class="blog-title"><a href="/news/2019/12/09/flink-kubernetes-kudo.html">Running Apache Flink on Kubernetes with KUDO</a></h2>
 
       <p>09 Dec 2019
@@ -321,19 +336,6 @@
 
     <hr>
     
-    <article>
-      <h2 class="blog-title"><a href="/2019/06/26/broadcast-state.html">A Practical Guide to Broadcast State in Apache Flink</a></h2>
-
-      <p>26 Jun 2019
-       Fabian Hueske (<a href="https://twitter.com/fhueske">@fhueske</a>)</p>
-
-      <p>Apache Flink has multiple types of operator state, one of which is called Broadcast State. In this post, we explain what Broadcast State is, and show an example of how it can be applied to an application that evaluates dynamic patterns on an event stream.</p>
-
-      <p><a href="/2019/06/26/broadcast-state.html">Continue reading &raquo;</a></p>
-    </article>
-
-    <hr>
-    
 
     <!-- Pagination links -->
     
@@ -376,6 +378,16 @@
       
 
       
+      <li><a href="/2020/07/28/pyflink-pandas-udf-support-flink.html">PyFlink: The integration of Pandas into PyFlink</a></li>
+
+      
+        
+      
+    
+      
+      
+
+      
       <li><a href="/2020/07/28/flink-sql-demo-building-e2e-streaming-application.html">Flink SQL Demo: Building an End-to-End Streaming Application</a></li>
 
       

diff --git a/content/blog/page5/index.html b/content/blog/page5/index.html
index dac3fb9..e105256 100644
--- a/content/blog/page5/index.html
+++ b/content/blog/page5/index.html

@@ -196,6 +196,19 @@
     <!-- Blog posts -->
     
     <article>
+      <h2 class="blog-title"><a href="/2019/06/26/broadcast-state.html">A Practical Guide to Broadcast State in Apache Flink</a></h2>
+
+      <p>26 Jun 2019
+       Fabian Hueske (<a href="https://twitter.com/fhueske">@fhueske</a>)</p>
+
+      <p>Apache Flink has multiple types of operator state, one of which is called Broadcast State. In this post, we explain what Broadcast State is, and show an example of how it can be applied to an application that evaluates dynamic patterns on an event stream.</p>
+
+      <p><a href="/2019/06/26/broadcast-state.html">Continue reading &raquo;</a></p>
+    </article>
+
+    <hr>
+    
+    <article>
       <h2 class="blog-title"><a href="/2019/06/05/flink-network-stack.html">A Deep-Dive into Flink's Network Stack</a></h2>
 
       <p>05 Jun 2019
@@ -320,21 +333,6 @@
 
     <hr>
     
-    <article>
-      <h2 class="blog-title"><a href="/news/2019/02/25/release-1.6.4.html">Apache Flink 1.6.4 Released</a></h2>
-
-      <p>25 Feb 2019
-      </p>
-
-      <p><p>The Apache Flink community released the fourth bugfix version of the Apache Flink 1.6 series.</p>
-
-</p>
-
-      <p><a href="/news/2019/02/25/release-1.6.4.html">Continue reading &raquo;</a></p>
-    </article>
-
-    <hr>
-    
 
     <!-- Pagination links -->
     
@@ -377,6 +375,16 @@
       
 
       
+      <li><a href="/2020/07/28/pyflink-pandas-udf-support-flink.html">PyFlink: The integration of Pandas into PyFlink</a></li>
+
+      
+        
+      
+    
+      
+      
+
+      
       <li><a href="/2020/07/28/flink-sql-demo-building-e2e-streaming-application.html">Flink SQL Demo: Building an End-to-End Streaming Application</a></li>
 
       

diff --git a/content/blog/page6/index.html b/content/blog/page6/index.html
index 4bfda30..fffc301 100644
--- a/content/blog/page6/index.html
+++ b/content/blog/page6/index.html

@@ -196,6 +196,21 @@
     <!-- Blog posts -->
     
     <article>
+      <h2 class="blog-title"><a href="/news/2019/02/25/release-1.6.4.html">Apache Flink 1.6.4 Released</a></h2>
+
+      <p>25 Feb 2019
+      </p>
+
+      <p><p>The Apache Flink community released the fourth bugfix version of the Apache Flink 1.6 series.</p>
+
+</p>
+
+      <p><a href="/news/2019/02/25/release-1.6.4.html">Continue reading &raquo;</a></p>
+    </article>
+
+    <hr>
+    
+    <article>
       <h2 class="blog-title"><a href="/news/2019/02/15/release-1.7.2.html">Apache Flink 1.7.2 Released</a></h2>
 
       <p>15 Feb 2019
@@ -330,21 +345,6 @@
 
     <hr>
     
-    <article>
-      <h2 class="blog-title"><a href="/news/2018/09/20/release-1.5.4.html">Apache Flink 1.5.4 Released</a></h2>
-
-      <p>20 Sep 2018
-      </p>
-
-      <p><p>The Apache Flink community released the fourth bugfix version of the Apache Flink 1.5 series.</p>
-
-</p>
-
-      <p><a href="/news/2018/09/20/release-1.5.4.html">Continue reading &raquo;</a></p>
-    </article>
-
-    <hr>
-    
 
     <!-- Pagination links -->
     
@@ -387,6 +387,16 @@
       
 
       
+      <li><a href="/2020/07/28/pyflink-pandas-udf-support-flink.html">PyFlink: The integration of Pandas into PyFlink</a></li>
+
+      
+        
+      
+    
+      
+      
+
+      
       <li><a href="/2020/07/28/flink-sql-demo-building-e2e-streaming-application.html">Flink SQL Demo: Building an End-to-End Streaming Application</a></li>
 
       

diff --git a/content/blog/page7/index.html b/content/blog/page7/index.html
index 7fb0917..1c0e8ec 100644
--- a/content/blog/page7/index.html
+++ b/content/blog/page7/index.html

@@ -196,6 +196,21 @@
     <!-- Blog posts -->
     
     <article>
+      <h2 class="blog-title"><a href="/news/2018/09/20/release-1.5.4.html">Apache Flink 1.5.4 Released</a></h2>
+
+      <p>20 Sep 2018
+      </p>
+
+      <p><p>The Apache Flink community released the fourth bugfix version of the Apache Flink 1.5 series.</p>
+
+</p>
+
+      <p><a href="/news/2018/09/20/release-1.5.4.html">Continue reading &raquo;</a></p>
+    </article>
+
+    <hr>
+    
+    <article>
       <h2 class="blog-title"><a href="/news/2018/08/21/release-1.5.3.html">Apache Flink 1.5.3 Released</a></h2>
 
       <p>21 Aug 2018
@@ -328,19 +343,6 @@
 
     <hr>
     
-    <article>
-      <h2 class="blog-title"><a href="/features/2018/01/30/incremental-checkpointing.html">Managing Large State in Apache Flink: An Intro to Incremental Checkpointing</a></h2>
-
-      <p>30 Jan 2018
-       Stefan Ricther (<a href="https://twitter.com/StefanRRicther">@StefanRRicther</a>) &amp; Chris Ward (<a href="https://twitter.com/chrischinch">@chrischinch</a>)</p>
-
-      <p>Flink 1.3.0 introduced incremental checkpointing, making it possible for applications with large state to generate checkpoints more efficiently.</p>
-
-      <p><a href="/features/2018/01/30/incremental-checkpointing.html">Continue reading &raquo;</a></p>
-    </article>
-
-    <hr>
-    
 
     <!-- Pagination links -->
     
@@ -383,6 +385,16 @@
       
 
       
+      <li><a href="/2020/07/28/pyflink-pandas-udf-support-flink.html">PyFlink: The integration of Pandas into PyFlink</a></li>
+
+      
+        
+      
+    
+      
+      
+
+      
       <li><a href="/2020/07/28/flink-sql-demo-building-e2e-streaming-application.html">Flink SQL Demo: Building an End-to-End Streaming Application</a></li>
 
       

diff --git a/content/blog/page8/index.html b/content/blog/page8/index.html
index 4151269..908fb48 100644
--- a/content/blog/page8/index.html
+++ b/content/blog/page8/index.html

@@ -196,6 +196,19 @@
     <!-- Blog posts -->
     
     <article>
+      <h2 class="blog-title"><a href="/features/2018/01/30/incremental-checkpointing.html">Managing Large State in Apache Flink: An Intro to Incremental Checkpointing</a></h2>
+
+      <p>30 Jan 2018
+       Stefan Ricther (<a href="https://twitter.com/StefanRRicther">@StefanRRicther</a>) &amp; Chris Ward (<a href="https://twitter.com/chrischinch">@chrischinch</a>)</p>
+
+      <p>Flink 1.3.0 introduced incremental checkpointing, making it possible for applications with large state to generate checkpoints more efficiently.</p>
+
+      <p><a href="/features/2018/01/30/incremental-checkpointing.html">Continue reading &raquo;</a></p>
+    </article>
+
+    <hr>
+    
+    <article>
       <h2 class="blog-title"><a href="/news/2017/12/21/2017-year-in-review.html">Apache Flink in 2017: Year in Review</a></h2>
 
       <p>21 Dec 2017
@@ -331,20 +344,6 @@
 
     <hr>
     
-    <article>
-      <h2 class="blog-title"><a href="/news/2017/04/04/dynamic-tables.html">Continuous Queries on Dynamic Tables</a></h2>
-
-      <p>04 Apr 2017 by Fabian Hueske, Shaoxuan Wang, and Xiaowei Jiang
-      </p>
-
-      <p><p>Flink's relational APIs, the Table API and SQL, are unified APIs for stream and batch processing, meaning that a query produces the same result when being evaluated on streaming or static data.</p>
-<p>In this blog post we discuss the future of these APIs and introduce the concept of Dynamic Tables. Dynamic tables will significantly expand the scope of the Table API and SQL on streams and enable many more advanced use cases. We discuss how streams and dynamic tables relate to each other and explain the semantics of continuously evaluating queries on dynamic tables.</p></p>
-
-      <p><a href="/news/2017/04/04/dynamic-tables.html">Continue reading &raquo;</a></p>
-    </article>
-
-    <hr>
-    
 
     <!-- Pagination links -->
     
@@ -387,6 +386,16 @@
       
 
       
+      <li><a href="/2020/07/28/pyflink-pandas-udf-support-flink.html">PyFlink: The integration of Pandas into PyFlink</a></li>
+
+      
+        
+      
+    
+      
+      
+
+      
       <li><a href="/2020/07/28/flink-sql-demo-building-e2e-streaming-application.html">Flink SQL Demo: Building an End-to-End Streaming Application</a></li>
 
       

diff --git a/content/blog/page9/index.html b/content/blog/page9/index.html
index e26b9e5..cbe1955 100644
--- a/content/blog/page9/index.html
+++ b/content/blog/page9/index.html

@@ -196,6 +196,20 @@
     <!-- Blog posts -->
     
     <article>
+      <h2 class="blog-title"><a href="/news/2017/04/04/dynamic-tables.html">Continuous Queries on Dynamic Tables</a></h2>
+
+      <p>04 Apr 2017 by Fabian Hueske, Shaoxuan Wang, and Xiaowei Jiang
+      </p>
+
+      <p><p>Flink's relational APIs, the Table API and SQL, are unified APIs for stream and batch processing, meaning that a query produces the same result when being evaluated on streaming or static data.</p>
+<p>In this blog post we discuss the future of these APIs and introduce the concept of Dynamic Tables. Dynamic tables will significantly expand the scope of the Table API and SQL on streams and enable many more advanced use cases. We discuss how streams and dynamic tables relate to each other and explain the semantics of continuously evaluating queries on dynamic tables.</p></p>
+
+      <p><a href="/news/2017/04/04/dynamic-tables.html">Continue reading &raquo;</a></p>
+    </article>
+
+    <hr>
+    
+    <article>
       <h2 class="blog-title"><a href="/news/2017/03/29/table-sql-api-update.html">From Streams to Tables and Back Again: An Update on Flink's Table & SQL API</a></h2>
 
       <p>29 Mar 2017 by Timo Walther (<a href="https://twitter.com/">@twalthr</a>)
@@ -324,21 +338,6 @@
 
     <hr>
     
-    <article>
-      <h2 class="blog-title"><a href="/news/2016/08/08/release-1.1.0.html">Announcing Apache Flink 1.1.0</a></h2>
-
-      <p>08 Aug 2016
-      </p>
-
-      <p><div class="alert alert-success"><strong>Important</strong>: The Maven artifacts published with version 1.1.0 on Maven central have a Hadoop dependency issue. It is highly recommended to use <strong>1.1.1</strong> or <strong>1.1.1-hadoop1</strong> as the Flink version.</div>
-
-</p>
-
-      <p><a href="/news/2016/08/08/release-1.1.0.html">Continue reading &raquo;</a></p>
-    </article>
-
-    <hr>
-    
 
     <!-- Pagination links -->
     
@@ -381,6 +380,16 @@
       
 
       
+      <li><a href="/2020/07/28/pyflink-pandas-udf-support-flink.html">PyFlink: The integration of Pandas into PyFlink</a></li>
+
+      
+        
+      
+    
+      
+      
+
+      
       <li><a href="/2020/07/28/flink-sql-demo-building-e2e-streaming-application.html">Flink SQL Demo: Building an End-to-End Streaming Application</a></li>
 
       

diff --git a/content/img/blog/2020-07-28-pyflink-pandas/mission-of-pyFlink.gif b/content/img/blog/2020-07-28-pyflink-pandas/mission-of-pyFlink.gif
new file mode 100644
index 0000000..dda05ff
--- /dev/null
+++ b/content/img/blog/2020-07-28-pyflink-pandas/mission-of-pyFlink.gif
Binary files differ

diff --git a/content/img/blog/2020-07-28-pyflink-pandas/python-scientific-stack.png b/content/img/blog/2020-07-28-pyflink-pandas/python-scientific-stack.png
new file mode 100644
index 0000000..4d84179
--- /dev/null
+++ b/content/img/blog/2020-07-28-pyflink-pandas/python-scientific-stack.png
Binary files differ

diff --git a/content/img/blog/2020-07-28-pyflink-pandas/vm-communication.png b/content/img/blog/2020-07-28-pyflink-pandas/vm-communication.png
new file mode 100644
index 0000000..228787a
--- /dev/null
+++ b/content/img/blog/2020-07-28-pyflink-pandas/vm-communication.png
Binary files differ

diff --git a/content/index.html b/content/index.html
index f0a9067..ae18b0c 100644
--- a/content/index.html
+++ b/content/index.html

@@ -571,6 +571,9 @@
         <dt> <a href="/news/2020/07/30/demo-fraud-detection-3.html">Advanced Flink Application Patterns Vol.3: Custom Window Processing</a></dt>
         <dd>In this series of blog posts you will learn about powerful Flink patterns for building streaming applications.</dd>
       
+        <dt> <a href="/2020/07/28/pyflink-pandas-udf-support-flink.html">PyFlink: The integration of Pandas into PyFlink</a></dt>
+        <dd>The Apache Flink community put some great effort into integrating Pandas with PyFlink in the latest Flink version 1.11. Some of the added features include support for Pandas UDF and the conversion between Pandas DataFrame and Table. In this article, we will introduce how these functionalities work and how to use them with a step-by-step example.</dd>
+      
         <dt> <a href="/2020/07/28/flink-sql-demo-building-e2e-streaming-application.html">Flink SQL Demo: Building an End-to-End Streaming Application</a></dt>
         <dd>Apache Flink 1.11 has released many exciting new features, including many developments in Flink SQL which is evolving at a fast pace. This article takes a closer look at how to quickly build streaming applications with Flink SQL from a practical point of view.</dd>
       
@@ -581,11 +584,6 @@
         <dd><p>With an ever-growing number of people working with data, it’s a common practice for companies to build self-service platforms with the goal of democratizing their access across different teams and — especially — to enable users from any background to be independent in their data needs. In such environments, metadata management becomes a crucial aspect. Without it, users often work blindly, spending too much time searching for datasets and their location, figuring out data formats and similar cumbersome tasks.</p>
 
 </dd>
-      
-        <dt> <a href="/news/2020/07/21/release-1.11.1.html">Apache Flink 1.11.1 Released</a></dt>
-        <dd><p>The Apache Flink community released the first bugfix version of the Apache Flink 1.11 series.</p>
-
-</dd>
     
   </dl>
 

diff --git a/content/zh/index.html b/content/zh/index.html
index cc1f2e1..c4e878b 100644
--- a/content/zh/index.html
+++ b/content/zh/index.html

@@ -568,6 +568,9 @@
         <dt> <a href="/news/2020/07/30/demo-fraud-detection-3.html">Advanced Flink Application Patterns Vol.3: Custom Window Processing</a></dt>
         <dd>In this series of blog posts you will learn about powerful Flink patterns for building streaming applications.</dd>
       
+        <dt> <a href="/2020/07/28/pyflink-pandas-udf-support-flink.html">PyFlink: The integration of Pandas into PyFlink</a></dt>
+        <dd>The Apache Flink community put some great effort into integrating Pandas with PyFlink in the latest Flink version 1.11. Some of the added features include support for Pandas UDF and the conversion between Pandas DataFrame and Table. In this article, we will introduce how these functionalities work and how to use them with a step-by-step example.</dd>
+      
         <dt> <a href="/2020/07/28/flink-sql-demo-building-e2e-streaming-application.html">Flink SQL Demo: Building an End-to-End Streaming Application</a></dt>
         <dd>Apache Flink 1.11 has released many exciting new features, including many developments in Flink SQL which is evolving at a fast pace. This article takes a closer look at how to quickly build streaming applications with Flink SQL from a practical point of view.</dd>
       
@@ -578,11 +581,6 @@
         <dd><p>With an ever-growing number of people working with data, it’s a common practice for companies to build self-service platforms with the goal of democratizing their access across different teams and — especially — to enable users from any background to be independent in their data needs. In such environments, metadata management becomes a crucial aspect. Without it, users often work blindly, spending too much time searching for datasets and their location, figuring out data formats and similar cumbersome tasks.</p>
 
 </dd>
-      
-        <dt> <a href="/news/2020/07/21/release-1.11.1.html">Apache Flink 1.11.1 Released</a></dt>
-        <dd><p>The Apache Flink community released the first bugfix version of the Apache Flink 1.11 series.</p>
-
-</dd>
     
   </dl>
commit	1e216e8661e121dfd4f57c19c66242699d4736ec	[log] [tgz]
author	Dian Fu <dianfu@apache.org>	Tue Aug 04 14:22:33 2020 +0800
committer	Dian Fu <dianfu@apache.org>	Tue Aug 04 14:22:50 2020 +0800
tree	698280dcfa1c616f6eb8a8ef09763da74205ad14
parent	c384ea86b4fae6c2df81fe00bcc7d198f1ea9a2d [diff]