| { |
| "paragraphs": [ |
| { |
| "title": "Introduction", |
| "text": "%md\n\n# Introduction\n\n[Apache Flink](https://flink.apache.org/) is a framework and distributed processing engine for stateful computations over unbounded and bounded data streams. This Flink tutorial runs the classic word count in both batch and streaming mode.\n\nThere are 3 things you need to do before using Flink in Zeppelin:\n\n* Download [Flink 1.10 or later](https://flink.apache.org/downloads.html) (only 1.10 and later are supported), unpack it, and set `FLINK_HOME` in the Flink interpreter setting to this location.\n* Copy `flink-python_2.11-x.x.x.jar` from the Flink opt folder to the Flink lib folder (it is used by PyFlink, which is supported in Zeppelin).\n* If you want to run in YARN mode, set `HADOOP_CONF_DIR` in the Flink interpreter setting, and make sure `hadoop` is in your `PATH`, because Flink internally calls the command `hadoop classpath` and puts all the Hadoop-related jars on the classpath of the Flink interpreter process.\n\nThe Flink interpreter has the following sub-interpreters, each used for a different purpose. However, they all run in the same JVM and share the same ExecutionEnvironment/StreamExecutionEnvironment/BatchTableEnvironment/StreamTableEnvironment.\n\n\n* `%flink`\t- Creates ExecutionEnvironment/StreamExecutionEnvironment/BatchTableEnvironment/StreamTableEnvironment and provides a Scala environment\n* `%flink.pyflink`\t- Provides a Python environment\n* `%flink.ipyflink`\t- Provides an IPython environment\n* `%flink.ssql`\t- Provides a streaming SQL environment\n* `%flink.bsql`\t- Provides a batch SQL environment", |
| "user": "anonymous", |
| "dateUpdated": "2021-07-26 04:41:59.563", |
| "progress": 0, |
| "config": { |
| "colWidth": 12.0, |
| "fontSize": 9.0, |
| "enabled": true, |
| "results": {}, |
| "editorSetting": { |
| "language": "markdown", |
| "editOnDblClick": true, |
| "completionKey": "TAB", |
| "completionSupport": false |
| }, |
| "editorMode": "ace/mode/markdown", |
| "title": false, |
| "editorHide": true, |
| "tableHide": false |
| }, |
| "settings": { |
| "params": {}, |
| "forms": {} |
| }, |
| "results": { |
| "code": "SUCCESS", |
| "msg": [ |
| { |
| "type": "HTML", |
| "data": "\u003cdiv class\u003d\"markdown-body\"\u003e\n\u003ch1\u003eIntroduction\u003c/h1\u003e\n\u003cp\u003e\u003ca href\u003d\"https://flink.apache.org/\"\u003eApache Flink\u003c/a\u003e is a framework and distributed processing engine for stateful computations over unbounded and bounded data streams. This Flink tutorial runs the classic word count in both batch and streaming mode.\u003c/p\u003e\n\u003cp\u003eThere are 3 things you need to do before using Flink in Zeppelin:\u003c/p\u003e\n\u003cul\u003e\n\u003cli\u003eDownload \u003ca href\u003d\"https://flink.apache.org/downloads.html\"\u003eFlink 1.10 or later\u003c/a\u003e (only 1.10 and later are supported), unpack it, and set \u003ccode\u003eFLINK_HOME\u003c/code\u003e in the Flink interpreter setting to this location.\u003c/li\u003e\n\u003cli\u003eCopy \u003ccode\u003eflink-python_2.11-x.x.x.jar\u003c/code\u003e from the Flink opt folder to the Flink lib folder (it is used by PyFlink, which is supported in Zeppelin).\u003c/li\u003e\n\u003cli\u003eIf you want to run in YARN mode, set \u003ccode\u003eHADOOP_CONF_DIR\u003c/code\u003e in the Flink interpreter setting, and make sure \u003ccode\u003ehadoop\u003c/code\u003e is in your \u003ccode\u003ePATH\u003c/code\u003e, because Flink internally calls the command \u003ccode\u003ehadoop classpath\u003c/code\u003e and puts all the Hadoop-related jars on the classpath of the Flink interpreter process.\u003c/li\u003e\n\u003c/ul\u003e\n\u003cp\u003eThe Flink interpreter has the following sub-interpreters, each used for a different purpose. However, they all run in the same JVM and share the same ExecutionEnvironment/StreamExecutionEnvironment/BatchTableEnvironment/StreamTableEnvironment.\u003c/p\u003e\n\u003cul\u003e\n\u003cli\u003e\u003ccode\u003e%flink\u003c/code\u003e\t- Creates ExecutionEnvironment/StreamExecutionEnvironment/BatchTableEnvironment/StreamTableEnvironment and provides a Scala environment\u003c/li\u003e\n\u003cli\u003e\u003ccode\u003e%flink.pyflink\u003c/code\u003e\t- Provides a Python environment\u003c/li\u003e\n\u003cli\u003e\u003ccode\u003e%flink.ipyflink\u003c/code\u003e\t- Provides an IPython environment\u003c/li\u003e\n\u003cli\u003e\u003ccode\u003e%flink.ssql\u003c/code\u003e\t- Provides a streaming SQL environment\u003c/li\u003e\n\u003cli\u003e\u003ccode\u003e%flink.bsql\u003c/code\u003e\t- Provides a batch SQL environment\u003c/li\u003e\n\u003c/ul\u003e\n\n\u003c/div\u003e" |
| } |
| ] |
| }, |
| "apps": [], |
| "runtimeInfos": {}, |
| "progressUpdateIntervalMs": 500, |
| "jobName": "paragraph_1580997898536_-1239502599", |
| "id": "paragraph_1580997898536_-1239502599", |
| "dateCreated": "2020-02-06 22:04:58.536", |
| "dateStarted": "2021-07-26 04:41:59.562", |
| "dateFinished": "2021-07-26 04:41:59.580", |
| "status": "FINISHED" |
| }, |
| { |
| "title": "Batch WordCount", |
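| "text": "%flink\n\n// benv is the batch ExecutionEnvironment that the %flink interpreter provides\nval data \u003d benv.fromElements(\"hello world\", \"hello flink\", \"hello hadoop\")\n// Split each line into words, pair every word with 1, group by the word (field 0) and sum the counts (field 1)\ndata.flatMap(line \u003d\u003e line.split(\"\\\\s\"))\n .map(w \u003d\u003e (w, 1))\n .groupBy(0)\n .sum(1)\n .print()\n", |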
| "user": "anonymous", |
| "dateUpdated": "2021-07-26 04:39:20.665", |
| "progress": 0, |
| "config": { |
| "colWidth": 6.0, |
| "fontSize": 9.0, |
| "enabled": true, |
| "results": {}, |
| "editorSetting": { |
| "language": "scala", |
| "editOnDblClick": false, |
| "completionKey": "TAB", |
| "completionSupport": true |
| }, |
| "editorMode": "ace/mode/scala", |
| "title": true |
| }, |
| "settings": { |
| "params": {}, |
| "forms": {} |
| }, |
| "results": { |
| "code": "SUCCESS", |
| "msg": [ |
| { |
| "type": "TEXT", |
| "data": "\u001b[1m\u001b[34mdata\u001b[0m: \u001b[1m\u001b[32morg.apache.flink.api.scala.DataSet[String]\u001b[0m \u003d org.apache.flink.api.scala.DataSet@2d73af10\n(flink,1)\n(hadoop,1)\n(hello,3)\n(world,1)\n" |
| } |
| ] |
| }, |
| "apps": [], |
| "runtimeInfos": { |
| "jobUrl": { |
| "propertyName": "jobUrl", |
| "label": "FLINK JOB", |
| "tooltip": "View in Flink web UI", |
| "group": "flink", |
| "values": [ |
| { |
| "jobUrl": "http://localhost:8083#/job/3e3862199e3e9ce97cfa7f115d63242f" |
| } |
| ], |
| "interpreterSettingId": "flink" |
| } |
| }, |
| "progressUpdateIntervalMs": 500, |
| "jobName": "paragraph_1580998080340_1531975932", |
| "id": "paragraph_1580998080340_1531975932", |
| "dateCreated": "2020-02-06 22:08:00.340", |
| "dateStarted": "2021-07-26 04:39:20.670", |
| "dateFinished": "2021-07-26 04:39:42.257", |
| "status": "FINISHED" |
| }, |
| { |
| "title": "Streaming WordCount", |
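| "text": "%flink\n\n// senv is the StreamExecutionEnvironment that the %flink interpreter provides\nval data \u003d senv.fromElements(\"hello world\", \"hello flink\", \"hello hadoop\")\n// Key the stream by the word (field 0) and keep a running sum of the counts (field 1); every update is printed\ndata.flatMap(line \u003d\u003e line.split(\"\\\\s\"))\n .map(w \u003d\u003e (w, 1))\n .keyBy(0)\n .sum(1)\n .print\n\n// The pipeline above is only a definition; execute() submits the streaming job\nsenv.execute()", |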
| "user": "anonymous", |
| "dateUpdated": "2021-07-26 04:39:42.299", |
| "progress": 0, |
| "config": { |
| "colWidth": 6.0, |
| "fontSize": 9.0, |
| "enabled": true, |
| "results": {}, |
| "editorSetting": { |
| "language": "scala", |
| "editOnDblClick": false, |
| "completionKey": "TAB", |
| "completionSupport": true |
| }, |
| "editorMode": "ace/mode/scala", |
| "title": true |
| }, |
| "settings": { |
| "params": {}, |
| "forms": {} |
| }, |
| "results": { |
| "code": "SUCCESS", |
| "msg": [ |
| { |
| "type": "TEXT", |
| "data": "\u001b[1m\u001b[34mdata\u001b[0m: \u001b[1m\u001b[32morg.apache.flink.streaming.api.scala.DataStream[String]\u001b[0m \u003d org.apache.flink.streaming.api.scala.DataStream@5814d5e0\n\u001b[33mwarning: \u001b[0mthere was one deprecation warning; re-run with -deprecation for details\n\u001b[1m\u001b[34mres2\u001b[0m: \u001b[1m\u001b[32morg.apache.flink.streaming.api.datastream.DataStreamSink[(String, Int)]\u001b[0m \u003d org.apache.flink.streaming.api.datastream.DataStreamSink@5c76bd7d\n(hello,1)\n(world,1)\n(hello,2)\n(flink,1)\n(hello,3)\n(hadoop,1)\n\u001b[1m\u001b[34mres3\u001b[0m: \u001b[1m\u001b[32morg.apache.flink.api.common.JobExecutionResult\u001b[0m \u003d\nProgram execution finished\nJob with JobID 27af9c543e20d02098d66e0900526361 has finished.\nJob Runtime: 152 ms\n" |
| } |
| ] |
| }, |
| "apps": [], |
| "runtimeInfos": { |
| "jobUrl": { |
| "propertyName": "jobUrl", |
| "label": "FLINK JOB", |
| "tooltip": "View in Flink web UI", |
| "group": "flink", |
| "values": [ |
| { |
| "jobUrl": "http://localhost:8083#/job/27af9c543e20d02098d66e0900526361" |
| } |
| ], |
| "interpreterSettingId": "flink" |
| } |
| }, |
| "progressUpdateIntervalMs": 500, |
| "jobName": "paragraph_1580998084555_-697674675", |
| "id": "paragraph_1580998084555_-697674675", |
| "dateCreated": "2020-02-06 22:08:04.555", |
| "dateStarted": "2021-07-26 04:39:42.303", |
| "dateFinished": "2021-07-26 04:39:48.021", |
| "status": "FINISHED" |
| }, |
| { |
| "text": "%flink\n", |
| "user": "anonymous", |
| "dateUpdated": "2020-02-25 11:10:14.096", |
| "progress": 0, |
| "config": {}, |
| "settings": { |
| "params": {}, |
| "forms": {} |
| }, |
| "apps": [], |
| "runtimeInfos": {}, |
| "progressUpdateIntervalMs": 500, |
| "jobName": "paragraph_1582600214095_1825730071", |
| "id": "paragraph_1582600214095_1825730071", |
| "dateCreated": "2020-02-25 11:10:14.096", |
| "status": "READY" |
| } |
| ], |
| "name": "1. Flink Basics", |
| "id": "2F2YS7PCE", |
| "defaultInterpreterGroup": "flink", |
| "version": "0.9.0-SNAPSHOT", |
| "noteParams": {}, |
| "noteForms": {}, |
| "angularObjects": {}, |
| "config": { |
| "isZeppelinNotebookCronEnable": false |
| }, |
| "info": { |
| "isRunning": false |
| } |
| } |