| |
| |
| <!DOCTYPE html> |
| <html lang="en"> |
| <head> |
| <meta charset="utf-8"> |
| <title>Apache Zeppelin 0.12.0 Documentation: Flink Interpreter for Apache Zeppelin</title> |
| <meta name="description" content="Apache Flink is an open source platform for distributed stream and batch data processing."> |
| <meta name="author" content="The Apache Software Foundation"> |
| |
| <!-- Enable responsive viewport --> |
| <meta name="viewport" content="width=device-width, initial-scale=1.0"> |
| |
| <!-- Le HTML5 shim, for IE6-8 support of HTML elements --> |
| <!--[if lt IE 9]> |
| <script src="http://html5shim.googlecode.com/svn/trunk/html5.js"></script> |
| <![endif]--> |
| |
| <link href="/docs/0.12.0/assets/themes/zeppelin/font-awesome.min.css" rel="stylesheet"> |
| |
| <!-- Le styles --> |
| <link href="/docs/0.12.0/assets/themes/zeppelin/bootstrap/css/bootstrap.css" rel="stylesheet"> |
| <link href="/docs/0.12.0/assets/themes/zeppelin/css/style.css?body=1" rel="stylesheet" type="text/css"> |
| <link href="/docs/0.12.0/assets/themes/zeppelin/css/syntax.css" rel="stylesheet" type="text/css" media="screen" /> |
| <!-- Le fav and touch icons --> |
| <!-- Update these with your own images |
| <link rel="shortcut icon" href="images/favicon.ico"> |
| <link rel="apple-touch-icon" href="images/apple-touch-icon.png"> |
| <link rel="apple-touch-icon" sizes="72x72" href="images/apple-touch-icon-72x72.png"> |
| <link rel="apple-touch-icon" sizes="114x114" href="images/apple-touch-icon-114x114.png"> |
| --> |
| |
| <!-- Js --> |
| <script src="/docs/0.12.0/assets/themes/zeppelin/jquery-1.10.2.min.js"></script> |
| <script src="/docs/0.12.0/assets/themes/zeppelin/bootstrap/js/bootstrap.min.js"></script> |
| <script src="/docs/0.12.0/assets/themes/zeppelin/js/docs.js"></script> |
| <script src="/docs/0.12.0/assets/themes/zeppelin/js/anchor.min.js"></script> |
| <script src="/docs/0.12.0/assets/themes/zeppelin/js/toc.js"></script> |
| <script src="/docs/0.12.0/assets/themes/zeppelin/js/lunr.min.js"></script> |
| <script src="/docs/0.12.0/assets/themes/zeppelin/js/search.js"></script> |
| |
| <!-- atom & rss feed --> |
| <link href="/docs/0.12.0/atom.xml" type="application/atom+xml" rel="alternate" title="Sitewide ATOM Feed"> |
| <link href="/docs/0.12.0/rss.xml" type="application/rss+xml" rel="alternate" title="Sitewide RSS Feed"> |
| |
| <!-- Matomo --> |
| <script> |
| var _paq = window._paq = window._paq || []; |
| /* tracker methods like "setCustomDimension" should be called before "trackPageView" */ |
| _paq.push(["setDoNotTrack", true]); |
| _paq.push(["disableCookies"]); |
| _paq.push(['trackPageView']); |
| _paq.push(['enableLinkTracking']); |
| (function() { |
| var u="https://analytics.apache.org/"; |
| _paq.push(['setTrackerUrl', u+'matomo.php']); |
| _paq.push(['setSiteId', '69']); |
| var d=document, g=d.createElement('script'), s=d.getElementsByTagName('script')[0]; |
| g.async=true; g.src=u+'matomo.js'; s.parentNode.insertBefore(g,s); |
| })(); |
| </script> |
| <!-- End Matomo Code --> |
| </head> |
| |
| <body> |
| |
| <div id="menu" class="navbar navbar-inverse navbar-fixed-top" role="navigation"> |
| <div class="container navbar-container"> |
| <div class="navbar-header"> |
| <button type="button" class="navbar-toggle" data-toggle="collapse" data-target=".navbar-collapse"> |
| <span class="sr-only">Toggle navigation</span> |
| <span class="icon-bar"></span> |
| <span class="icon-bar"></span> |
| <span class="icon-bar"></span> |
| </button> |
| <div class="navbar-brand"> |
| <a class="navbar-brand-main" href="http://zeppelin.apache.org"> |
| <img src="/docs/0.12.0/assets/themes/zeppelin/img/zeppelin_logo.png" width="50" |
| style="margin-top: -2px;" alt="I'm zeppelin"> |
| <span style="margin-left: 5px; font-size: 27px;">Zeppelin</span> |
| <a class="navbar-brand-version" href="/docs/0.12.0" |
| style="font-size: 15px; color: white;"> 0.12.0 |
| </a> |
| </a> |
| </div> |
| </div> |
| <nav class="navbar-collapse collapse" role="navigation"> |
| <ul class="nav navbar-nav"> |
| <li> |
| <a href="#" data-toggle="dropdown" class="dropdown-toggle">Quick Start <b class="caret"></b></a> |
| <ul class="dropdown-menu"> |
| <li class="title"><span>Getting Started</span></li> |
| <li><a href="/docs/0.12.0/quickstart/install.html">Install</a></li> |
| <li><a href="/docs/0.12.0/quickstart/explore_ui.html">Explore UI</a></li> |
| <li><a href="/docs/0.12.0/quickstart/tutorial.html">Tutorial</a></li> |
| <li role="separator" class="divider"></li> |
| <li class="title"><span>Run Mode</span></li> |
| <li><a href="/docs/0.12.0/quickstart/kubernetes.html">Kubernetes</a></li> |
| <li><a href="/docs/0.12.0/quickstart/docker.html">Docker</a></li> |
| <li><a href="/docs/0.12.0/quickstart/yarn.html">Yarn</a></li> |
| <li role="separator" class="divider"></li> |
| <li><a href="/docs/0.12.0/quickstart/spark_with_zeppelin.html">Spark with Zeppelin</a></li> |
| <li><a href="/docs/0.12.0/quickstart/flink_with_zeppelin.html">Flink with Zeppelin</a></li> |
| <li><a href="/docs/0.12.0/quickstart/sql_with_zeppelin.html">SQL with Zeppelin</a></li> |
| <li><a href="/docs/0.12.0/quickstart/python_with_zeppelin.html">Python with Zeppelin</a></li> |
| <li><a href="/docs/0.12.0/quickstart/r_with_zeppelin.html">R with Zeppelin</a></li> |
| </ul> |
| </li> |
| |
| <li> |
| <a href="#" data-toggle="dropdown" class="dropdown-toggle">Usage<b class="caret"></b></a> |
| <ul class="dropdown-menu scrollable-menu"> |
| <li class="title"><span>Dynamic Form</span></li> |
| <li><a href="/docs/0.12.0/usage/dynamic_form/intro.html">What is Dynamic Form?</a></li> |
| <li role="separator" class="divider"></li> |
| <li class="title"><span>Display System</span></li> |
| <li><a href="/docs/0.12.0/usage/display_system/basic.html#text">Text Display</a></li> |
| <li><a href="/docs/0.12.0/usage/display_system/basic.html#html">HTML Display</a></li> |
| <li><a href="/docs/0.12.0/usage/display_system/basic.html#table">Table Display</a></li> |
| <li><a href="/docs/0.12.0/usage/display_system/basic.html#network">Network Display</a></li> |
| <li><a href="/docs/0.12.0/usage/display_system/angular_backend.html">Angular Display using Backend API</a></li> |
| <li><a href="/docs/0.12.0/usage/display_system/angular_frontend.html">Angular Display using Frontend API</a></li> |
| <li role="separator" class="divider"></li> |
| <li class="title"><span>Interpreter</span></li> |
| <li><a href="/docs/0.12.0/usage/interpreter/overview.html">Overview</a></li> |
| <li><a href="/docs/0.12.0/usage/interpreter/interpreter_binding_mode.html">Interpreter Binding Mode</a></li> |
| <li><a href="/docs/0.12.0/usage/interpreter/user_impersonation.html">User Impersonation</a></li> |
| <li><a href="/docs/0.12.0/usage/interpreter/dependency_management.html">Dependency Management</a></li> |
| <li><a href="/docs/0.12.0/usage/interpreter/installation.html">Installing Interpreters</a></li> |
| <!--<li><a href="/docs/0.12.0/usage/interpreter/dynamic_loading.html">Dynamic Interpreter Loading (Experimental)</a></li>--> |
| <li><a href="/docs/0.12.0/usage/interpreter/execution_hooks.html">Execution Hooks (Experimental)</a></li> |
| <li role="separator" class="divider"></li> |
| <li class="title"><span>Other Features</span></li> |
| <li><a href="/docs/0.12.0/usage/other_features/publishing_paragraphs.html">Publishing Paragraphs</a></li> |
| <li><a href="/docs/0.12.0/usage/other_features/personalized_mode.html">Personalized Mode</a></li> |
| <li><a href="/docs/0.12.0/usage/other_features/customizing_homepage.html">Customizing Zeppelin Homepage</a></li> |
| <li><a href="/docs/0.12.0/usage/other_features/notebook_actions.html">Notebook Actions</a></li> |
| <li><a href="/docs/0.12.0/usage/other_features/cron_scheduler.html">Cron Scheduler</a></li> |
| <li><a href="/docs/0.12.0/usage/other_features/zeppelin_context.html">Zeppelin Context</a></li> |
| <li role="separator" class="divider"></li> |
| <li class="title"><span>REST API</span></li> |
| <li><a href="/docs/0.12.0/usage/rest_api/interpreter.html">Interpreter API</a></li> |
| <li><a href="/docs/0.12.0/usage/rest_api/zeppelin_server.html">Zeppelin Server API</a></li> |
| <li><a href="/docs/0.12.0/usage/rest_api/notebook.html">Notebook API</a></li> |
| <li><a href="/docs/0.12.0/usage/rest_api/notebook_repository.html">Notebook Repository API</a></li> |
| <li><a href="/docs/0.12.0/usage/rest_api/configuration.html">Configuration API</a></li> |
| <li><a href="/docs/0.12.0/usage/rest_api/credential.html">Credential API</a></li> |
| <li><a href="/docs/0.12.0/usage/rest_api/helium.html">Helium API</a></li> |
| <li class="title"><span>Zeppelin SDK</span></li> |
| <li><a href="/docs/0.12.0/usage/zeppelin_sdk/client_api.html">Client API</a></li> |
| <li><a href="/docs/0.12.0/usage/zeppelin_sdk/session_api.html">Session API</a></li> |
| </ul> |
| </li> |
| |
| <li> |
| <a href="#" data-toggle="dropdown" class="dropdown-toggle">Setup<b class="caret"></b></a> |
| <ul class="dropdown-menu scrollable-menu"> |
| <li class="title"><span>Basics</span></li> |
| <li><a href="/docs/0.12.0/setup/basics/how_to_build.html">How to Build Zeppelin</a></li> |
| <li><a href="/docs/0.12.0/setup/basics/hadoop_integration.html">Hadoop Integration</a></li> |
| <li><a href="/docs/0.12.0/setup/basics/multi_user_support.html">Multi-user Support</a></li> |
| <li role="separator" class="divider"></li> |
| <li class="title"><span>Deployment</span></li> |
| <!--<li><a href="/docs/0.12.0/setup/deployment/docker.html">Docker Image for Zeppelin</a></li>--> |
| <li><a href="/docs/0.12.0/setup/deployment/spark_cluster_mode.html#spark-standalone-mode">Spark Cluster Mode: Standalone</a></li> |
| <li><a href="/docs/0.12.0/setup/deployment/spark_cluster_mode.html#spark-on-yarn-mode">Spark Cluster Mode: YARN</a></li> |
| <li><a href="/docs/0.12.0/setup/deployment/spark_cluster_mode.html#spark-on-mesos-mode">Spark Cluster Mode: Mesos</a></li> |
| <li><a href="/docs/0.12.0/setup/deployment/flink_and_spark_cluster.html">Zeppelin with Flink, Spark Cluster</a></li> |
| <li><a href="/docs/0.12.0/setup/deployment/cdh.html">Zeppelin on CDH</a></li> |
| <li><a href="/docs/0.12.0/setup/deployment/virtual_machine.html">Zeppelin on VM: Vagrant</a></li> |
| <li role="separator" class="divider"></li> |
| <li class="title"><span>Security</span></li> |
| <li><a href="/docs/0.12.0/setup/security/authentication_nginx.html">HTTP Basic Auth using NGINX</a></li> |
| <li><a href="/docs/0.12.0/setup/security/shiro_authentication.html">Shiro Authentication</a></li> |
| <li><a href="/docs/0.12.0/setup/security/notebook_authorization.html">Notebook Authorization</a></li> |
| <li><a href="/docs/0.12.0/setup/security/datasource_authorization.html">Data Source Authorization</a></li> |
| <li><a href="/docs/0.12.0/setup/security/http_security_headers.html">HTTP Security Headers</a></li> |
| <li role="separator" class="divider"></li> |
| <li class="title"><span>Notebook Storage</span></li> |
| <li><a href="/docs/0.12.0/setup/storage/storage.html#notebook-storage-in-local-git-repository">Git Storage</a></li> |
| <li><a href="/docs/0.12.0/setup/storage/storage.html#notebook-storage-in-s3">S3 Storage</a></li> |
| <li><a href="/docs/0.12.0/setup/storage/storage.html#notebook-storage-in-azure">Azure Storage</a></li> |
| <li><a href="/docs/0.12.0/setup/storage/storage.html#notebook-storage-in-google-cloud-storage">Google Cloud Storage</a></li> |
| <li><a href="/docs/0.12.0/setup/storage/storage.html#notebook-storage-in-oss">OSS Storage</a></li> |
| <li><a href="/docs/0.12.0/setup/storage/storage.html#notebook-storage-in-mongodb">MongoDB Storage</a></li> |
| <li role="separator" class="divider"></li> |
| <li class="title"><span>Operation</span></li> |
| <li><a href="/docs/0.12.0/setup/operation/configuration.html">Configuration</a></li> |
| <li><a href="/docs/0.12.0/setup/operation/monitoring.html">Monitoring</a></li> |
| <li><a href="/docs/0.12.0/setup/operation/proxy_setting.html">Proxy Setting</a></li> |
| <li><a href="/docs/0.12.0/setup/operation/upgrading.html">Upgrading</a></li> |
| <li><a href="/docs/0.12.0/setup/operation/trouble_shooting.html">Trouble Shooting</a></li> |
| </ul> |
| </li> |
| |
| <li> |
| <a href="#" data-toggle="dropdown" class="dropdown-toggle">Interpreter <b class="caret"></b></a> |
| <ul class="dropdown-menu scrollable-menu"> |
| <li class="title"><span>Interpreters</span></li> |
| <li><a href="/docs/0.12.0/usage/interpreter/overview.html">Overview</a></li> |
| <li role="separator" class="divider"></li> |
| <li><a href="/docs/0.12.0/interpreter/spark.html">Spark</a></li> |
| <li><a href="/docs/0.12.0/interpreter/flink.html">Flink</a></li> |
| <li><a href="/docs/0.12.0/interpreter/jdbc.html">JDBC</a></li> |
| <li><a href="/docs/0.12.0/interpreter/python.html">Python</a></li> |
| <li><a href="/docs/0.12.0/interpreter/r.html">R</a></li> |
| <li role="separator" class="divider"></li> |
| <li><a href="/docs/0.12.0/interpreter/alluxio.html">Alluxio</a></li> |
| <li><a href="/docs/0.12.0/interpreter/bigquery.html">BigQuery</a></li> |
| <li><a href="/docs/0.12.0/interpreter/cassandra.html">Cassandra</a></li> |
| <li><a href="/docs/0.12.0/interpreter/elasticsearch.html">Elasticsearch</a></li> |
| <li><a href="/docs/0.12.0/interpreter/groovy.html">Groovy</a></li> |
| <li><a href="/docs/0.12.0/interpreter/hbase.html">HBase</a></li> |
| <li><a href="/docs/0.12.0/interpreter/hdfs.html">HDFS</a></li> |
| <li><a href="/docs/0.12.0/interpreter/hive.html">Hive</a></li> |
| <li><a href="/docs/0.12.0/interpreter/influxdb.html">influxDB</a></li> |
| <li><a href="/docs/0.12.0/interpreter/java.html">Java</a></li> |
| <li><a href="/docs/0.12.0/interpreter/jupyter.html">Jupyter</a></li> |
| <li><a href="/docs/0.12.0/interpreter/livy.html">Livy</a></li> |
| <li><a href="/docs/0.12.0/interpreter/mahout.html">Mahout</a></li> |
| <li><a href="/docs/0.12.0/interpreter/markdown.html">Markdown</a></li> |
| <li><a href="/docs/0.12.0/interpreter/mongodb.html">MongoDB</a></li> |
| <li><a href="/docs/0.12.0/interpreter/neo4j.html">Neo4j</a></li> |
| <li><a href="/docs/0.12.0/interpreter/postgresql.html">Postgresql, HAWQ</a></li> |
| <li><a href="/docs/0.12.0/interpreter/shell.html">Shell</a></li> |
| <li><a href="/docs/0.12.0/interpreter/sparql.html">Sparql</a></li> |
| </ul> |
| </li> |
| <li> |
| <a href="#" data-toggle="dropdown" class="dropdown-toggle">More<b class="caret"></b></a> |
| <ul class="dropdown-menu scrollable-menu" style="right: 0; left: auto;"> |
| <li class="title"><span>Extending Zeppelin</span></li> |
| <li><a href="/docs/0.12.0/development/writing_zeppelin_interpreter.html">Writing Zeppelin Interpreter</a></li> |
| <li role="separator" class="divider"></li> |
| <li class="title"><span>Helium (Experimental)</span></li> |
| <li><a href="/docs/0.12.0/development/helium/overview.html">Overview</a></li> |
| <li><a href="/docs/0.12.0/development/helium/writing_application.html">Writing Helium Application</a></li> |
| <li><a href="/docs/0.12.0/development/helium/writing_spell.html">Writing Helium Spell</a></li> |
| <li><a href="/docs/0.12.0/development/helium/writing_visualization_basic.html">Writing Helium Visualization: Basics</a></li> |
| <li><a href="/docs/0.12.0/development/helium/writing_visualization_transformation.html">Writing Helium Visualization: Transformation</a></li> |
| <li role="separator" class="divider"></li> |
| <li class="title"><span>Contributing to Zeppelin</span></li> |
| <li><a href="/docs/0.12.0/setup/basics/how_to_build.html">How to Build Zeppelin</a></li> |
| <li><a href="/docs/0.12.0/development/contribution/useful_developer_tools.html">Useful Developer Tools</a></li> |
| <li><a href="/docs/0.12.0/development/contribution/how_to_contribute_code.html">How to Contribute (code)</a></li> |
| <li><a href="/docs/0.12.0/development/contribution/how_to_contribute_website.html">How to Contribute (website)</a></li> |
| <li role="separator" class="divider"></li> |
| <li class="title"><span>External Resources</span></li> |
| <li><a target="_blank" rel="noopener noreferrer" href="https://zeppelin.apache.org/community.html">Mailing List</a></li> |
| <li><a target="_blank" rel="noopener noreferrer" href="https://cwiki.apache.org/confluence/display/ZEPPELIN/Zeppelin+Home">Apache Zeppelin Wiki</a></li> |
| <li><a target="_blank" rel="noopener noreferrer" href="http://stackoverflow.com/questions/tagged/apache-zeppelin">Stackoverflow Questions about Zeppelin</a></li> |
| </ul> |
| </li> |
| <li> |
| <a href="/docs/0.12.0/search.html" class="nav-search-link"> |
| <span class="fa fa-search nav-search-icon"></span> |
| </a> |
| </li> |
| </ul> |
| </nav><!--/.navbar-collapse --> |
| </div> |
| </div> |
| |
| |
| |
| <div class="content"> |
| |
| |
| <!--<div class="hero-unit Flink Interpreter for Apache Zeppelin"> |
| <h1></h1> |
| </div> |
| --> |
| |
| <div class="row"> |
| <div class="col-md-12"> |
| <!-- |
| Licensed under the Apache License, Version 2.0 (the "License"); |
| you may not use this file except in compliance with the License. |
| You may obtain a copy of the License at |
| |
| http://www.apache.org/licenses/LICENSE-2.0 |
| |
| Unless required by applicable law or agreed to in writing, software |
| distributed under the License is distributed on an "AS IS" BASIS, |
| WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. |
| See the License for the specific language governing permissions and |
| limitations under the License. |
| --> |
| |
| <h1>Flink interpreter for Apache Zeppelin</h1> |
| |
| <div id="toc"></div> |
| |
| <h2>Overview</h2> |
| |
| <p><a href="https://flink.apache.org">Apache Flink</a> is a framework and distributed processing engine for stateful computations over unbounded and bounded data streams. |
| Flink has been designed to run in all common cluster environments, perform computations at in-memory speed and at any scale.</p> |
| |
<p>In Zeppelin 0.9, we refactored the Flink interpreter to support the latest versions of Flink. <strong>Currently, only Flink 1.15+ is supported; older versions of Flink won't work.</strong>
Apache Flink is supported in Zeppelin with the Flink interpreter group, which consists of the five interpreters listed below.</p>
| |
| <table class="table-configuration"> |
| <tr> |
| <th>Name</th> |
| <th>Class</th> |
| <th>Description</th> |
| </tr> |
| <tr> |
| <td>%flink</td> |
| <td>FlinkInterpreter</td> |
| <td>Creates ExecutionEnvironment/StreamExecutionEnvironment/BatchTableEnvironment/StreamTableEnvironment and provides a Scala environment</td> |
| </tr> |
| <tr> |
| <td>%flink.pyflink</td> |
| <td>PyFlinkInterpreter</td> |
    <td>Provides a Python environment</td>
| </tr> |
| <tr> |
| <td>%flink.ipyflink</td> |
| <td>IPyFlinkInterpreter</td> |
    <td>Provides an IPython environment</td>
| </tr> |
| <tr> |
| <td>%flink.ssql</td> |
| <td>FlinkStreamSqlInterpreter</td> |
    <td>Provides a streaming SQL environment</td>
| </tr> |
| <tr> |
| <td>%flink.bsql</td> |
| <td>FlinkBatchSqlInterpreter</td> |
    <td>Provides a batch SQL environment</td>
| </tr> |
| </table> |
| |
| <h2>Main Features</h2> |
| |
| <table class="table-configuration"> |
| <tr> |
| <th>Feature</th> |
| <th>Description</th> |
| </tr> |
| <tr> |
| <td>Support multiple versions of Flink</td> |
| <td>You can run different versions of Flink in one Zeppelin instance</td> |
| </tr> |
| <tr> |
| <td>Support multiple languages</td> |
    <td>Scala, Python, and SQL are supported. Besides that, you can also collaborate across languages, e.g. write a Scala UDF and use it in PyFlink</td>
| </tr> |
| <tr> |
| <td>Support multiple execution modes</td> |
| <td>Local | Remote | Yarn | Yarn Application</td> |
| </tr> |
| <tr> |
| <td>Support Hive</td> |
| <td>Hive catalog is supported</td> |
| </tr> |
| <tr> |
| <td>Interactive development</td> |
    <td>An interactive development user experience increases your productivity</td>
| </tr> |
| <tr> |
| <td>Enhancement on Flink SQL</td> |
    <td>* Support both streaming SQL and batch SQL in one notebook <br/>
      * Support SQL comments (single-line and multi-line) <br/>
      * Support advanced configuration (jobName, parallelism) <br/>
      * Support multiple insert statements
| </td> |
| </tr> |
  <tr>
    <td>Multi-tenancy</td>
    <td>Multiple users can work in one Zeppelin instance without affecting each other.</td>
  </tr>
  <tr>
    <td>Rest API Support</td>
    <td>You can not only submit Flink jobs via the Zeppelin notebook UI, but also via its rest api (you can use Zeppelin as a Flink job server).</td>
  </tr>
| </table> |
| |
| <h2>Play Flink in Zeppelin docker</h2> |
| |
<p>For beginners, we suggest you play with Flink in the Zeppelin docker container.
First you need to download Flink, because no Flink binary distribution is shipped with Zeppelin.
e.g. Here we download Flink 1.12.2 to <code>/mnt/disk1/flink-1.12.2</code>,
mount it into the Zeppelin docker container, and run the following command to start the Zeppelin docker container.</p>
| <div class="highlight"><pre><code class="language-bash" data-lang="bash">docker run <span class="nt">-u</span> <span class="si">$(</span><span class="nb">id</span> <span class="nt">-u</span><span class="si">)</span> <span class="nt">-p</span> 8080:8080 <span class="nt">-p</span> 8081:8081 <span class="nt">--rm</span> <span class="nt">-v</span> /mnt/disk1/flink-1.12.2:/opt/flink <span class="nt">-e</span> <span class="nv">FLINK_HOME</span><span class="o">=</span>/opt/flink <span class="nt">--name</span> zeppelin apache/zeppelin:0.10.0 |
| </code></pre></div> |
<p>After running the above command, you can open <code>http://localhost:8080</code> to play with Flink in Zeppelin. We have only verified Flink local mode in the Zeppelin docker container; other modes may not work due to network issues.
<code>-p 8081:8081</code> exposes the Flink web UI, so that you can access it via <code>http://localhost:8081</code>.</p>
| |
<p>Here's a screenshot of running the note <code>Flink Tutorial/5. Streaming Data Analytics</code>.</p>
| |
| <p><img src="/docs/0.12.0/assets/themes/zeppelin/img/docs-img/flink_docker_tutorial.gif"></p> |
| |
<p>You can also mount a notebook folder to replace the built-in Zeppelin tutorial notebooks.
e.g. Here's a repo of the Flink SQL cookbook on Zeppelin: <a href="https://github.com/zjffdu/flink-sql-cookbook-on-zeppelin/">https://github.com/zjffdu/flink-sql-cookbook-on-zeppelin/</a></p>
| |
<p>You can clone this repo and mount it into docker:</p>
| <div class="highlight"><pre><code class="language-" data-lang="">docker run -u $(id -u) -p 8080:8080 --rm -v /mnt/disk1/flink-sql-cookbook-on-zeppelin:/notebook -v /mnt/disk1/flink-1.12.2:/opt/flink -e FLINK_HOME=/opt/flink -e ZEPPELIN_NOTEBOOK_DIR='/notebook' --name zeppelin apache/zeppelin:0.10.0 |
| </code></pre></div> |
| <h2>Prerequisites</h2> |
| |
<p>Download Flink 1.15 or later (only Scala 2.12 is supported).</p>
| |
| <h3>Version-specific notes for Flink</h3> |
| |
<p>Flink 1.15 is Scala-free and has changed its binary distribution, so the following extra steps are required:</p>

<ul>
<li>Move <code>FLINK_HOME/opt/flink-table-planner_2.12-1.15.0.jar</code> to <code>FLINK_HOME/lib</code></li>
<li>Move <code>FLINK_HOME/lib/flink-table-planner-loader-1.15.0.jar</code> to <code>FLINK_HOME/opt</code></li>
<li>Download <code>flink-table-api-scala-bridge_2.12-1.15.0.jar</code> and <code>flink-table-api-scala_2.12-1.15.0.jar</code> to <code>FLINK_HOME/lib</code></li>
</ul>
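<p>For example, assuming Flink 1.15.0 is installed under <code>/opt/flink-1.15.0</code> (the install path is an illustrative assumption; the download URLs follow the standard Maven Central layout), the steps above translate roughly to:</p>
<div class="highlight"><pre><code class="language-" data-lang="">cd /opt/flink-1.15.0
# swap the planner loader for the Scala planner
mv opt/flink-table-planner_2.12-1.15.0.jar lib/
mv lib/flink-table-planner-loader-1.15.0.jar opt/
# fetch the Scala table api jars from Maven Central
wget -P lib/ https://repo1.maven.org/maven2/org/apache/flink/flink-table-api-scala-bridge_2.12/1.15.0/flink-table-api-scala-bridge_2.12-1.15.0.jar
wget -P lib/ https://repo1.maven.org/maven2/org/apache/flink/flink-table-api-scala_2.12/1.15.0/flink-table-api-scala_2.12-1.15.0.jar
</code></pre></div>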
| |
<p>Flink 1.16 introduces a new <code>ClientResourceManager</code> for the sql client; you need to move <code>FLINK_HOME/opt/flink-sql-client-1.16.0.jar</code> to <code>FLINK_HOME/lib</code>.</p>
| |
| <h2>Flink on Zeppelin Architecture</h2> |
| |
| <p><img src="/docs/0.12.0/assets/themes/zeppelin/img/docs-img/flink_architecture.png"></p> |
| |
<p>The above diagram shows the architecture of Flink on Zeppelin. The Flink interpreter on the left side is actually a Flink client,
which is responsible for compiling and managing the Flink job lifecycle, such as submitting and canceling jobs,
monitoring job progress, and so on. The Flink cluster on the right side is where Flink jobs are executed.
It could be a MiniCluster (local mode), a standalone cluster (remote mode),
a yarn session cluster (yarn mode), or a yarn application cluster (yarn-application mode).</p>
| |
<p>There are 2 important components in the Flink interpreter: the Scala shell & the Python shell.</p>
| |
| <ul> |
<li>The Scala shell is the entry point of the Flink interpreter; it creates all the entry points of a Flink program, such as ExecutionEnvironment, StreamExecutionEnvironment and TableEnvironment. The Scala shell is responsible for compiling and running Scala code and SQL.</li>
<li>The Python shell is the entry point of PyFlink; it is responsible for compiling and running Python code.</li>
| </ul> |
| |
| <h2>Configuration</h2> |
| |
<p>The Flink interpreter can be configured with properties provided by Zeppelin (see the following table).
You can also add and set other Flink properties which are not listed in the table. For a list of additional properties, refer to <a href="https://ci.apache.org/projects/flink/flink-docs-master/ops/config.html">Flink Available Properties</a>.</p>

<table class="table-configuration">
| <tr> |
| <th>Property</th> |
| <th>Default</th> |
| <th>Description</th> |
| </tr> |
| <tr> |
| <td><code>FLINK_HOME</code></td> |
| <td></td> |
    <td>Location of the Flink installation. It must be specified; otherwise you cannot use Flink in Zeppelin</td>
| </tr> |
| <tr> |
| <td><code>HADOOP_CONF_DIR</code></td> |
| <td></td> |
    <td>Location of the hadoop conf directory. It must be set if running in yarn mode</td>
| </tr> |
| <tr> |
| <td><code>HIVE_CONF_DIR</code></td> |
| <td></td> |
    <td>Location of the hive conf directory. It must be set if you want to connect to the hive metastore</td>
| </tr> |
| <tr> |
| <td>flink.execution.mode</td> |
| <td>local</td> |
| <td>Execution mode of Flink, e.g. local | remote | yarn | yarn-application</td> |
| </tr> |
| <tr> |
| <td>flink.execution.remote.host</td> |
| <td></td> |
| <td>Host name of running JobManager. Only used for remote mode</td> |
| </tr> |
| <tr> |
| <td>flink.execution.remote.port</td> |
| <td></td> |
| <td>Port of running JobManager. Only used for remote mode</td> |
| </tr> |
| <tr> |
| <td>jobmanager.memory.process.size</td> |
| <td>1024m</td> |
| <td>Total memory size of JobManager, e.g. 1024m. It is official <a href="https://ci.apache.org/projects/flink/flink-docs-release-1.13/docs/deployment/config/">Flink property</a></td> |
| </tr> |
| <tr> |
| <td>taskmanager.memory.process.size</td> |
| <td>1024m</td> |
| <td>Total memory size of TaskManager, e.g. 1024m. It is official <a href="https://ci.apache.org/projects/flink/flink-docs-release-1.13/docs/deployment/config/">Flink property</a></td> |
| </tr> |
| <tr> |
| <td>taskmanager.numberOfTaskSlots</td> |
| <td>1</td> |
    <td>Number of slots per TaskManager</td>
| </tr> |
| <tr> |
| <td>local.number-taskmanager</td> |
| <td>4</td> |
| <td>Total number of TaskManagers in local mode</td> |
| </tr> |
| <tr> |
| <td>yarn.application.name</td> |
| <td>Zeppelin Flink Session</td> |
| <td>Yarn app name</td> |
| </tr> |
| <tr> |
| <td>yarn.application.queue</td> |
| <td>default</td> |
| <td>queue name of yarn app</td> |
| </tr> |
| <tr> |
| <td>zeppelin.flink.uiWebUrl</td> |
| <td></td> |
    <td>User-specified Flink JobManager url. It can be used in remote mode where the Flink cluster is already started, or as a url template, e.g. https://knox-server:8443/gateway/cluster-topo/yarn/proxy/{{applicationId}}/ where {{applicationId}} is the placeholder for the yarn app id</td>
| </tr> |
| <tr> |
| <td>zeppelin.flink.run.asLoginUser</td> |
| <td>true</td> |
    <td>Whether to run Flink jobs as the Zeppelin login user. Only applied when running Flink jobs in a hadoop yarn cluster with shiro enabled</td>
| </tr> |
| <tr> |
| <td>flink.udf.jars</td> |
| <td></td> |
    <td>Flink udf jars (comma separated). Zeppelin will register the udfs in these jars automatically for the user. These udf jars can be either local files or hdfs files if you have hadoop installed. The udf name is the class name.</td>
| </tr> |
| <tr> |
| <td>flink.udf.jars.packages</td> |
| <td></td> |
| <td>Packages (comma separated) that would be searched for the udf defined in <code>flink.udf.jars</code>. Specifying this can reduce the number of classes to scan, otherwise all the classes in udf jar will be scanned.</td> |
| </tr> |
| <tr> |
| <td>flink.execution.jars</td> |
| <td></td> |
| <td>Additional user jars (comma separated), these jars could be either local files or hdfs files if you have hadoop installed. It can be used to specify Flink connector jars or udf jars (no udf class auto-registration like <code>flink.udf.jars</code>)</td> |
| </tr> |
| <tr> |
| <td>flink.execution.packages</td> |
| <td></td> |
| <td>Additional user packages (comma separated), e.g. <code>org.apache.flink:flink-json:1.10.0</code></td> |
| </tr> |
| <tr> |
| <td>zeppelin.flink.concurrentBatchSql.max</td> |
| <td>10</td> |
| <td>Max concurrent sql of Batch Sql (<code>%flink.bsql</code>)</td> |
| </tr> |
| <tr> |
| <td>zeppelin.flink.concurrentStreamSql.max</td> |
| <td>10</td> |
| <td>Max concurrent sql of Stream Sql (<code>%flink.ssql</code>)</td> |
| </tr> |
| <tr> |
| <td>zeppelin.pyflink.python</td> |
| <td>python</td> |
| <td>Python binary executable for PyFlink</td> |
| </tr> |
| <tr> |
| <td>table.exec.resource.default-parallelism</td> |
| <td>1</td> |
| <td>Default parallelism for Flink sql job</td> |
| </tr> |
| <tr> |
| <td>zeppelin.flink.scala.color</td> |
| <td>true</td> |
    <td>Whether to display Scala shell output in colorful format</td>
| </tr> |
| <tr> |
    <td>zeppelin.flink.scala.shell.tmp_dir</td>
| <td></td> |
| <td>Temp folder for storing scala shell compiled jar</td> |
| </tr> |
| <tr> |
| <td>zeppelin.flink.enableHive</td> |
| <td>false</td> |
    <td>Whether to enable hive</td>
| </tr> |
| <tr> |
| <td>zeppelin.flink.hive.version</td> |
| <td>2.3.7</td> |
    <td>Hive version that you would like to connect to</td>
| </tr> |
| <tr> |
| <td>zeppelin.flink.module.enableHive</td> |
| <td>false</td> |
    <td>Whether to enable the hive module. Hive udfs take precedence over Flink udfs if the hive module is enabled.</td>
| </tr> |
| <tr> |
| <td>zeppelin.flink.maxResult</td> |
| <td>1000</td> |
    <td>Max number of rows returned by the sql interpreter</td>
| </tr> |
  <tr>
    <td>zeppelin.flink.job.check_interval</td>
    <td>1000</td>
    <td>Check interval (in milliseconds) to check Flink job progress</td>
  </tr>
  <tr>
    <td>flink.interpreter.close.shutdown_cluster</td>
    <td>true</td>
    <td>Whether to shut down the Flink cluster when closing the interpreter</td>
  </tr>
  <tr>
    <td>zeppelin.interpreter.close.cancel_job</td>
    <td>true</td>
    <td>Whether to cancel the Flink job when closing the interpreter</td>
  </tr>
</table>
| |
| <h2>Interpreter Binding Mode</h2> |
| |
<p>The default <a href="../usage/interpreter/interpreter_binding_mode.html">interpreter binding mode</a> is <code>globally shared</code>. That means all notes share the same Flink interpreter, which means they share the same Flink cluster.
In practice, we recommend using <code>isolated per note</code>, so that each note has its own Flink interpreter (and thus its own Flink cluster) without affecting the others.</p>
| |
| <h2>Execution Mode</h2> |
| |
| <p>Flink in Zeppelin supports 4 execution modes (<code>flink.execution.mode</code>):</p> |
| |
| <ul> |
| <li>Local</li> |
| <li>Remote</li> |
| <li>Yarn</li> |
| <li>Yarn Application</li> |
| </ul> |
| |
| <h3>Local Mode</h3> |
| |
<p>Running Flink in local mode will start a MiniCluster in the local JVM. By default, the local MiniCluster uses port 8081, so make sure this port is available on your machine;
otherwise, configure <code>rest.port</code> to specify another port. You can also specify <code>local.number-taskmanager</code> and <code>flink.tm.slot</code> to customize the number of TMs and the number of slots per TM,
because by default there are only 4 TMs with 1 slot each in this MiniCluster, which may not be enough for some cases.</p>
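<p>A minimal sketch of tweaking the local MiniCluster via <a href="../usage/interpreter/overview.html#inline-generic-configuration">inline configuration</a> (the values are illustrative):</p>
<div class="highlight"><pre><code class="language-" data-lang="">%flink.conf

flink.execution.mode local
rest.port 8082
local.number-taskmanager 2
</code></pre></div>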
| |
| <h3>Remote Mode</h3> |
| |
<p>Running Flink in remote mode will connect to an existing Flink cluster, which could be a standalone cluster or a yarn session cluster. Besides setting <code>flink.execution.mode</code> to <code>remote</code>, you also need to specify
<code>flink.execution.remote.host</code> and <code>flink.execution.remote.port</code> to point to the Flink JobManager's rest api address.</p>
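<p>For example, assuming an existing cluster whose JobManager rest endpoint is <code>jm-host:8081</code> (a placeholder address), the inline configuration would look like:</p>
<div class="highlight"><pre><code class="language-" data-lang="">%flink.conf

flink.execution.mode remote
flink.execution.remote.host jm-host
flink.execution.remote.port 8081
</code></pre></div>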
| |
| <h3>Yarn Mode</h3> |
| |
| <p>In order to run Flink in Yarn mode, you need to make the following settings:</p> |
| |
| <ul> |
<li>Set <code>flink.execution.mode</code> to <code>yarn</code></li>
<li>Set <code>HADOOP_CONF_DIR</code> in Flink's interpreter setting or <code>zeppelin-env.sh</code>.</li>
<li>Make sure the <code>hadoop</code> command is on your <code>PATH</code>, because internally Flink will call the <code>hadoop classpath</code> command and load all the hadoop-related jars into the Flink interpreter process</li>
| </ul> |
| |
<p>In this mode, Zeppelin launches a Flink yarn session cluster for you and destroys it when you shut down your Flink interpreter.</p>
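<p>A minimal yarn-mode configuration sketch (the app name and queue values are illustrative; <code>HADOOP_CONF_DIR</code> is assumed to be set in the interpreter setting):</p>
<div class="highlight"><pre><code class="language-" data-lang="">%flink.conf

flink.execution.mode yarn
yarn.application.name My Zeppelin Flink Session
yarn.application.queue default
</code></pre></div>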
| |
| <h3>Yarn Application Mode</h3> |
| |
<p>In the above yarn mode, there is a separate Flink interpreter process on the Zeppelin server host. However, this may exhaust resources when there are too many interpreter processes.
So in practice, we recommend using yarn application mode if you are using Flink 1.11 or later (yarn application mode is only supported from Flink 1.11 onwards).
In this mode, the Flink interpreter runs inside the JobManager, which runs in a yarn container.
In order to run Flink in yarn application mode, you need to make the following settings:</p>
| |
| <ul> |
<li>Set <code>flink.execution.mode</code> to <code>yarn-application</code></li>
<li>Set <code>HADOOP_CONF_DIR</code> in Flink's interpreter setting or <code>zeppelin-env.sh</code>.</li>
<li>Make sure the <code>hadoop</code> command is on your <code>PATH</code>, because internally Flink will call the <code>hadoop classpath</code> command and load all the hadoop-related jars into the Flink interpreter process</li>
| </ul> |
| |
| <h2>Flink Scala</h2> |
| |
<p>Scala is the default language of Flink on Zeppelin (<code>%flink</code>), and it is also the entry point of the Flink interpreter. Underneath, the Flink interpreter creates a Scala shell
which creates several built-in variables, including ExecutionEnvironment, StreamExecutionEnvironment and so on.
So don't create these Flink environment variables again, otherwise you might hit weird issues. The Scala code you write in Zeppelin will be submitted to this Scala shell.<br>
Here are the built-in variables created in the Flink Scala shell:</p>
| |
| <ul> |
<li>senv (StreamExecutionEnvironment)</li>
| <li>benv (ExecutionEnvironment)</li> |
| <li>stenv (StreamTableEnvironment for blink planner (aka. new planner))</li> |
| <li>btenv (BatchTableEnvironment for blink planner (aka. new planner))</li> |
| <li>z (ZeppelinContext)</li> |
| </ul> |
| |
| <h3>Blink/Flink Planner</h3> |
| |
<p>Since Zeppelin 0.11, we have removed support for the flink planner (aka. the old planner), which was also removed in Flink 1.14.</p>
| |
| <h3>Stream WordCount Example</h3> |
| |
<p>You can write arbitrary Scala code in Zeppelin.</p>

<p>e.g. in the following example, we write a classic streaming wordcount example.</p>
| |
| <p><img src="/docs/0.12.0/assets/themes/zeppelin/img/docs-img/flink_streaming_wordcount.png" width="80%"></p> |
| |
| <h3>Code Completion</h3> |
| |
| <p>You can type tab for code completion.</p> |
| |
| <p><img src="/docs/0.12.0/assets/themes/zeppelin/img/docs-img/flink_scala_codecompletion.png" width="80%"></p> |
| |
| <h3>ZeppelinContext</h3> |
| |
| <p><code>ZeppelinContext</code> provides some additional functions and utilities. |
| See <a href="../usage/other_features/zeppelin_context.html">Zeppelin-Context</a> for more details. |
For the Flink interpreter, you can use <code>z</code> to display a Flink <code>Dataset/Table</code>.</p>
| |
| <p>e.g. you can use <code>z.show</code> to display DataSet, Batch Table, Stream Table.</p> |
| |
| <ul> |
| <li>z.show(DataSet)</li> |
| </ul> |
| |
| <p><img src="/docs/0.12.0/assets/themes/zeppelin/img/docs-img/flink_z_dataset.png"></p> |
| |
| <ul> |
| <li>z.show(Batch Table)</li> |
| </ul> |
| |
| <p><img src="/docs/0.12.0/assets/themes/zeppelin/img/docs-img/flink_z_batch_table.png"></p> |
| |
| <ul> |
| <li>z.show(Stream Table)</li> |
| </ul> |
| |
| <p><img src="/docs/0.12.0/assets/themes/zeppelin/img/docs-img/flink_z_stream_table.gif"></p> |
| |
| <h2>Flink SQL</h2> |
| |
<p>In Zeppelin, there are 2 kinds of Flink sql interpreters you can use:</p>
| |
| <ul> |
<li><code>%flink.ssql</code>
Streaming sql interpreter which launches Flink streaming jobs via <code>StreamTableEnvironment</code></li>
<li><code>%flink.bsql</code>
Batch sql interpreter which launches Flink batch jobs via <code>BatchTableEnvironment</code></li>
| </ul> |
| |
<p>The Flink sql interpreter in Zeppelin is equivalent to the Flink sql-client plus many other enhancement features.</p>
| |
<h3>SQL Enhancement Features</h3>
| |
<h4>Support batch SQL and streaming SQL together</h4>
| |
<p>In the Flink sql-client, you can run either streaming sql or batch sql in one session, but not both together.
In Zeppelin, you can: <code>%flink.ssql</code> is used for running streaming sql, while <code>%flink.bsql</code> is used for running batch sql.
Batch and streaming Flink jobs run in the same Flink session cluster.</p>
| |
| <h4>Support multiple statements</h4> |
| |
<p>You can write multiple sql statements in one paragraph; each sql statement is separated by a semicolon.</p>
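<p>For example, the following paragraph runs three statements in sequence (the table definition is a hypothetical filesystem source):</p>
<div class="highlight"><pre><code class="language-" data-lang="">%flink.bsql

DROP TABLE IF EXISTS people;
CREATE TABLE people (name STRING, age INT) WITH ('connector' = 'filesystem', 'path' = '/tmp/people', 'format' = 'csv');
SELECT name, age FROM people;
</code></pre></div>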
| |
| <h4>Comment support</h4> |
| |
| <p>2 kinds of sql comments are supported in Zeppelin:</p> |
| |
| <ul> |
<li>Single-line comments start with <code>--</code></li>
<li>Multi-line comments wrapped in <code>/* */</code></li>
| </ul> |
| |
| <p><img src="/docs/0.12.0/assets/themes/zeppelin/img/docs-img/flink_sql_comment.png"></p> |
| |
| <h4>Job parallelism setting</h4> |
| |
<p>You can set the sql parallelism via the paragraph local property <code>parallelism</code>.</p>
| |
| <p><img src="/docs/0.12.0/assets/themes/zeppelin/img/docs-img/flink_sql_parallelism.png"></p> |
| |
| <h4>Support multiple insert</h4> |
| |
<p>Sometimes you have multiple insert statements which read the same source
but write to different sinks. By default, each insert statement launches a separate Flink job,
but you can set the paragraph local property <code>runAsOne</code> to <code>true</code> to run them in one single Flink job.</p>
| |
| <p><img src="/docs/0.12.0/assets/themes/zeppelin/img/docs-img/flink_sql_multiple_insert.png"></p> |
| |
| <h4>Set job name</h4> |
| |
<p>You can set the Flink job name for an insert statement via the paragraph local property <code>jobName</code>. Note that
you can only set the job name for insert statements; select statements are not supported yet.
This setting also only works for a single insert statement; it doesn't work with the multiple-insert feature mentioned above.</p>
| |
| <p><img src="/docs/0.12.0/assets/themes/zeppelin/img/docs-img/flink_sql_jobname.png"></p> |
| |
| <h3>Streaming Data Visualization</h3> |
| |
<p>Zeppelin can visualize the select sql result of a Flink streaming job. Overall it supports 3 modes:</p>
| |
| <ul> |
| <li>Single</li> |
| <li>Update</li> |
| <li>Append</li> |
| </ul> |
| |
| <h4>Single Mode</h4> |
| |
<p>Single mode is for the case where the result of the sql statement is always one row,
such as in the following example. The output format is HTML,
and you can specify the paragraph local property <code>template</code> for the final output content template.
You can use <code>{i}</code> as a placeholder for the <code>i</code>th column of the result.</p>
| |
| <p><img src="/docs/0.12.0/assets/themes/zeppelin/img/docs-img/flink_single_mode.gif"></p> |
| |
| <h4>Update Mode</h4> |
| |
<p>Update mode is suitable for the case where the output is more than one row and is updated continuously.
Here's one example where we use group by.</p>
| |
| <p><img src="/docs/0.12.0/assets/themes/zeppelin/img/docs-img/flink_update_mode.gif"></p> |
| |
| <h4>Append Mode</h4> |
| |
<p>Append mode is suitable for scenarios where the output data is always appended,
e.g. the following example, which uses a tumble window.</p>
| |
| <p><img src="/docs/0.12.0/assets/themes/zeppelin/img/docs-img/flink_append_mode.gif"></p> |
| |
| <h2>PyFlink</h2> |
| |
<p>PyFlink is the Python entry point of Flink on Zeppelin. Internally, the Flink interpreter creates a Python shell which
creates Flink's environment variables (including ExecutionEnvironment, StreamExecutionEnvironment and so on).
Note that the java environment behind PyFlink is created in the Scala shell.
That means underneath, the Scala shell and the Python shell share the same environment.
These are the variables created in the Python shell:</p>
| |
| <ul> |
| <li><code>s_env</code> (StreamExecutionEnvironment),</li> |
| <li><code>b_env</code> (ExecutionEnvironment)</li> |
| <li><code>st_env</code> (StreamTableEnvironment for blink planner (aka. new planner))</li> |
| <li><code>bt_env</code> (BatchTableEnvironment for blink planner (aka. new planner))</li> |
| </ul> |
| |
| <h3>Configure PyFlink</h3> |
| |
<p>There are 3 things you need to configure to make PyFlink work in Zeppelin:</p>
| |
| <ul> |
<li>Install pyflink,
e.g. <code>pip install apache-flink==1.11.1</code>.
If you need to use PyFlink udfs, then you need to install pyflink on all the task manager nodes. That means if you are using yarn, all the yarn nodes need to have pyflink installed.</li>
<li>Copy the <code>python</code> folder under <code>${FLINK_HOME}/opt</code> to <code>${FLINK_HOME}/lib</code>.</li>
<li>Set <code>zeppelin.pyflink.python</code> to the python executable path. By default, it is the python in <code>PATH</code>. In case you have multiple versions of python installed, you need to set <code>zeppelin.pyflink.python</code> to the python version you want to use.</li>
| </ul> |
| |
| <h3>How to use PyFlink</h3> |
| |
<p>There are 2 ways to use PyFlink in Zeppelin:</p>
| |
| <ul> |
| <li><code>%flink.pyflink</code></li> |
| <li><code>%flink.ipyflink</code></li> |
| </ul> |
| |
<p><code>%flink.pyflink</code> is simpler and easier; you don't need to do anything beyond the above settings,
but its functionality is also limited. We suggest you use <code>%flink.ipyflink</code>, which provides almost the same user experience as jupyter.</p>
| |
| <h3>Configure IPyFlink</h3> |
| |
<p>If you don't have anaconda installed, then you need to install the following 3 libraries:</p>
| <div class="highlight"><pre><code class="language-" data-lang="">pip install jupyter |
| pip install grpcio |
| pip install protobuf |
| </code></pre></div> |
<p>If you have anaconda installed, then you only need to install the following 2 libraries:</p>
| <div class="highlight"><pre><code class="language-" data-lang="">pip install grpcio |
| pip install protobuf |
| </code></pre></div> |
<p><code>ZeppelinContext</code> is also available in PyFlink; you can use it almost the same way as in Flink Scala.</p>
| |
| <p>Check the <a href="python.html">Python doc</a> for more features of IPython.</p> |
| |
| <h2>Third party dependencies</h2> |
| |
<p>It is very common to have third party dependencies when you write Flink jobs in any language (Scala, Python, Sql).
It is very easy to add dependencies in an IDE (e.g. add a dependency in pom.xml),
but how can you do that in Zeppelin? Mainly there are 2 settings you can use to add third party dependencies:</p>
| |
| <ul> |
| <li>flink.execution.packages</li> |
| <li>flink.execution.jars</li> |
| </ul> |
| |
| <h3>flink.execution.packages</h3> |
| |
<p>This is the recommended way of adding dependencies. Its implementation is the same as adding
dependencies in <code>pom.xml</code>. Underneath, it downloads all the packages and their transitive dependencies
from the maven repository and then puts them on the classpath. Here's one example of how to add the kafka connector of Flink 1.10 via <a href="../usage/interpreter/overview.html#inline-generic-configuration">inline configuration</a>.</p>
| <div class="highlight"><pre><code class="language-" data-lang="">%flink.conf |
| |
| flink.execution.packages org.apache.flink:flink-connector-kafka_2.11:1.10.0,org.apache.flink:flink-connector-kafka-base_2.11:1.10.0,org.apache.flink:flink-json:1.10.0 |
| </code></pre></div> |
<p>The format is <code>artifactGroup:artifactId:version</code>; if you have multiple packages,
separate them with commas. <code>flink.execution.packages</code> requires internet access,
so if you cannot access the internet, you need to use <code>flink.execution.jars</code> instead.</p>
| |
| <h3>flink.execution.jars</h3> |
| |
<p>If your Zeppelin machine cannot access the internet or your dependencies are not deployed to a maven repository,
then you can use <code>flink.execution.jars</code> to specify the jar files you depend on (each jar file separated with a comma).</p>
| |
<p>Here's one example of how to add kafka dependencies (including the kafka connector and its transitive dependencies) via <code>flink.execution.jars</code>:</p>
| <div class="highlight"><pre><code class="language-" data-lang="">%flink.conf |
| |
| flink.execution.jars /usr/lib/flink-kafka/target/flink-kafka-1.0-SNAPSHOT.jar |
| </code></pre></div> |
| <h2>Flink UDF</h2> |
| |
<p>There are 4 ways you can define UDFs in Zeppelin:</p>
| |
| <ul> |
| <li>Write Scala UDF</li> |
| <li>Write PyFlink UDF</li> |
| <li>Create UDF via SQL</li> |
| <li>Configure udf jar via flink.udf.jars</li> |
| </ul> |
| |
| <h3>Scala UDF</h3> |
| <div class="highlight"><pre><code class="language-scala" data-lang="scala"><span class="o">%</span><span class="n">flink</span> |
| |
| <span class="k">class</span> <span class="nc">ScalaUpper</span> <span class="k">extends</span> <span class="nc">ScalarFunction</span> <span class="o">{</span> |
| <span class="k">def</span> <span class="nf">eval</span><span class="o">(</span><span class="n">str</span><span class="k">:</span> <span class="kt">String</span><span class="o">)</span> <span class="k">=</span> <span class="nv">str</span><span class="o">.</span><span class="py">toUpperCase</span> |
| <span class="o">}</span> |
| |
| <span class="nv">btenv</span><span class="o">.</span><span class="py">registerFunction</span><span class="o">(</span><span class="s">"scala_upper"</span><span class="o">,</span> <span class="k">new</span> <span class="nc">ScalaUpper</span><span class="o">())</span> |
| </code></pre></div> |
<p>It is very straightforward to define a scala udf, almost the same as what you would do in an IDE.
After creating the udf class, you need to register it via <code>btenv</code>.
You can also register it via <code>stenv</code>, which shares the same Catalog with <code>btenv</code>.</p>
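<p>Once registered, the udf can be used from sql paragraphs as well (the table is hypothetical):</p>
<div class="highlight"><pre><code class="language-" data-lang="">%flink.bsql

SELECT scala_upper(name) FROM people;
</code></pre></div>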
| |
| <h3>Python UDF</h3> |
| <div class="highlight"><pre><code class="language-python" data-lang="python"> |
| <span class="o">%</span><span class="n">flink</span><span class="p">.</span><span class="n">pyflink</span> |
| |
| <span class="k">class</span> <span class="nc">PythonUpper</span><span class="p">(</span><span class="n">ScalarFunction</span><span class="p">):</span> |
| <span class="k">def</span> <span class="nf">eval</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">s</span><span class="p">):</span> |
| <span class="k">return</span> <span class="n">s</span><span class="p">.</span><span class="n">upper</span><span class="p">()</span> |
| |
| <span class="n">bt_env</span><span class="p">.</span><span class="n">register_function</span><span class="p">(</span><span class="s">"python_upper"</span><span class="p">,</span> <span class="n">udf</span><span class="p">(</span><span class="n">PythonUpper</span><span class="p">(),</span> <span class="n">DataTypes</span><span class="p">.</span><span class="n">STRING</span><span class="p">(),</span> <span class="n">DataTypes</span><span class="p">.</span><span class="n">STRING</span><span class="p">()))</span> |
| |
| </code></pre></div> |
<p>It is also very straightforward to define a Python udf, almost the same as what you would do in an IDE.
After creating the udf class, you need to register it via <code>bt_env</code>.
You can also register it via <code>st_env</code>, which shares the same Catalog with <code>bt_env</code>.</p>
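<p>As with the Scala udf, the registered function is then available from sql (the table is hypothetical):</p>
<div class="highlight"><pre><code class="language-" data-lang="">%flink.bsql

SELECT python_upper(name) FROM people;
</code></pre></div>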
| |
| <h3>UDF via SQL</h3> |
| |
<p>Simple udfs can be written in Zeppelin. But if the udf logic is very complicated,
it is better to write it in an IDE, and then register it in Zeppelin as follows:</p>
| <div class="highlight"><pre><code class="language-sql" data-lang="sql"><span class="o">%</span><span class="n">flink</span><span class="p">.</span><span class="n">ssql</span> |
| |
| <span class="k">CREATE</span> <span class="k">FUNCTION</span> <span class="n">myupper</span> <span class="k">AS</span> <span class="s1">'org.apache.zeppelin.flink.udf.JavaUpper'</span><span class="p">;</span> |
| </code></pre></div> |
<p>But this approach requires the udf jar to be on the <code>CLASSPATH</code>,
so you need to configure <code>flink.execution.jars</code> to include this udf jar on the <code>CLASSPATH</code>, such as the following:</p>
| <div class="highlight"><pre><code class="language-" data-lang="">%flink.conf |
| |
| flink.execution.jars /usr/lib/flink-udf-1.0-SNAPSHOT.jar |
| </code></pre></div> |
| <h3>flink.udf.jars</h3> |
| |
| <p>The above 3 approaches all have some limitations:</p> |
| |
| <ul> |
<li>It is feasible to write simple Scala or Python udfs in Zeppelin, but not very complicated ones, because a notebook doesn't provide the advanced features of an IDE, such as package management, code navigation, etc.</li>
<li>It is not easy to share udfs between notes or users; you have to run the udf-defining paragraph in each flink interpreter.</li>
| </ul> |
| |
<p>So when you have many udfs, or your udf logic is very complicated, and you don't want to register them yourself every time, you can use <code>flink.udf.jars</code>:</p>
| |
| <ul> |
<li>Step 1. Create a udf project in your IDE and write your udf there.</li>
<li>Step 2. Set <code>flink.udf.jars</code> to point to the udf jar you build from your udf project.</li>
| </ul> |
| |
| <p>For example,</p> |
| <div class="highlight"><pre><code class="language-" data-lang="">%flink.conf |
| |
flink.udf.jars /usr/lib/flink-udf-1.0-SNAPSHOT.jar
| </code></pre></div> |
<p>Zeppelin will scan this jar, find all the udf classes, and then register them automatically for you.
The udf name is the class name. For example, here's the output of <code>show functions</code> after specifying the above udf jar in <code>flink.udf.jars</code>.</p>
| |
| <p><img src="/docs/0.12.0/assets/themes/zeppelin/img/docs-img/flink_udf_jars.png"></p> |
| |
<p>By default, Zeppelin scans all the classes in this jar,
so it can be pretty slow if your jar is very big, especially when your udf jar has other dependencies.
In this case we recommend setting <code>flink.udf.jars.packages</code> to specify the packages to scan;
this reduces the number of classes scanned and makes udf detection much faster.</p>
| |
| <h2>How to use Hive</h2> |
| |
<p>In order to use Hive in Flink, you have to make the following settings:</p>
| |
| <ul> |
<li>Set <code>zeppelin.flink.enableHive</code> to <code>true</code></li>
<li>Set <code>zeppelin.flink.hive.version</code> to the hive version you are using.</li>
<li>Set <code>HIVE_CONF_DIR</code> to the location where <code>hive-site.xml</code> is located. Make sure the hive metastore is started and you have configured <code>hive.metastore.uris</code> in <code>hive-site.xml</code></li>
<li>Copy the following dependencies to the lib folder of the flink installation.

<ul>
<li>flink-connector-hive_2.11-*.jar</li>
<li>flink-hadoop-compatibility_2.11-*.jar</li>
<li>hive-exec-2.x.jar (for hive 1.x, you need to copy hive-exec-1.x.jar, hive-metastore-1.x.jar, libfb303-0.9.2.jar and libthrift-0.9.2.jar)</li>
| </ul></li> |
| </ul> |
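<p>After these settings, you can query hive tables directly. A sketch, assuming the hive catalog is registered under the name <code>hive</code> and <code>my_hive_table</code> is a placeholder table:</p>
<div class="highlight"><pre><code class="language-" data-lang="">%flink.bsql

USE CATALOG hive;
SHOW TABLES;
SELECT * FROM my_hive_table LIMIT 10;
</code></pre></div>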
| |
| <h2>Paragraph local properties</h2> |
| |
<p>In the section <code>Streaming Data Visualization</code>, we demonstrated the different visualization types via the paragraph local property <code>type</code>.
In this section, we list and explain all the supported local properties in the Flink interpreter.</p>
| |
| <table class="table-configuration"> |
| <tr> |
| <th>Property</th> |
| <th>Default</th> |
| <th>Description</th> |
| </tr> |
| <tr> |
| <td>type</td> |
| <td></td> |
    <td>Used in <code>%flink.ssql</code> to specify the streaming visualization type (single, update, append)</td>
| </tr> |
| <tr> |
| <td>refreshInterval</td> |
| <td>3000</td> |
    <td>Used in <code>%flink.ssql</code> to specify the frontend refresh interval for streaming data visualization.</td>
| </tr> |
| <tr> |
| <td>template</td> |
| <td>{0}</td> |
    <td>Used in <code>%flink.ssql</code> to specify the html template for the <code>single</code> type of streaming data visualization; you can use <code>{i}</code> as a placeholder for the <code>i</code>th column of the result.</td>
| </tr> |
| <tr> |
| <td>parallelism</td> |
| <td></td> |
    <td>Used in <code>%flink.ssql</code> and <code>%flink.bsql</code> to specify the flink sql job parallelism</td>
| </tr> |
| <tr> |
| <td>maxParallelism</td> |
| <td></td> |
    <td>Used in <code>%flink.ssql</code> and <code>%flink.bsql</code> to specify the flink sql job max parallelism in case you want to change the parallelism later. For more details, refer to this <a href="https://ci.apache.org/projects/flink/flink-docs-release-1.10/dev/parallel.html#setting-the-maximum-parallelism">link</a></td>
| </tr> |
| <tr> |
| <td>savepointDir</td> |
| <td></td> |
    <td>If you specify it, then when you cancel your flink job in Zeppelin, it will also take a savepoint and store the state in this directory. And when you resume your job, it will resume from this savepoint.</td>
| </tr> |
| <tr> |
| <td>execution.savepoint.path</td> |
| <td></td> |
    <td>When you resume your job, it will resume from this savepoint path.</td>
| </tr> |
| <tr> |
| <td>resumeFromSavepoint</td> |
| <td></td> |
    <td>Resume the flink job from the savepoint if you specify savepointDir.</td>
| </tr> |
| <tr> |
| <td>resumeFromLatestCheckpoint</td> |
| <td></td> |
    <td>Resume the flink job from the latest checkpoint if checkpointing is enabled.</td>
| </tr> |
| <tr> |
| <td>runAsOne</td> |
| <td>false</td> |
    <td>All the insert-into sql statements will run in a single flink job if this is true.</td>
| </tr> |
| </table> |
| |
| <h2>Tutorial Notes</h2> |
| |
<p>Zeppelin ships with several Flink tutorial notes which may be helpful to you. You can check them out for more features.</p>
| |
| <h2>Community</h2> |
| |
| <p><a href="http://zeppelin.apache.org/community.html">Join our community</a> to discuss with others.</p> |
| |
| </div> |
| </div> |
| |
| |
| <hr> |
| <footer> |
| <!-- <p>© 2025 The Apache Software Foundation</p>--> |
| </footer> |
| </div> |
| |
| |
| |
| |
| |
| |
| |
| </body> |
| </html> |
| |