blob: e7177be3c3ee28bb9b7edad9e272da9ba74754bf [file] [log] [blame]
{
"paragraphs": [
{
"title": "Introduction",
"text": "%md\n\n# Introduction\n\n[Apache Flink](https://flink.apache.org/) is a framework and distributed processing engine for stateful computations over unbounded and bounded data streams. This is Flink tutorial for running classical wordcount in both batch and streaming mode. \n\nThere\u0027re 3 things you need to do before using flink in Zeppelin.\n\n* Download [Flink 1.10 or afterwards](https://flink.apache.org/downloads.html) (Only 1.10 afterwards are supported), unpack it and set `FLINK_HOME` in Flink interpreter setting to this location.\n* Copy flink-python_2.11–x.x.x.jar from flink opt folder to flink lib folder (it is used by Pyflink which is supported in Zeppelin)\n* If you want to run yarn mode, you need to set `HADOOP_CONF_DIR` in flink interpreter setting. And make sure `hadoop` is in your `PATH`, because internally Flink will call command `hadoop classpath` and put all the hadoop related jars on the classpath of Flink interpreter process.\n\nThere\u0027re 6 sub interpreters in Flink interpreter, each is used for different purpose. However they are in the the JVM and share the same ExecutionEnviroment/StremaExecutionEnvironment/BatchTableEnvironment/StreamTableEnvironment.\n\n\n* `%flink`\t- Creates ExecutionEnvironment/StreamExecutionEnvironment/BatchTableEnvironment/StreamTableEnvironment and provides a Scala environment \n* `%flink.pyflink`\t- Provides a python environment \n* `%flink.ipyflink`\t- Provides an ipython environment \n* `%flink.ssql`\t - Provides a stream sql environment \n* `%flink.bsql`\t- Provides a batch sql environment ",
"user": "anonymous",
"dateUpdated": "2021-07-26 04:41:59.563",
"progress": 0,
"config": {
"colWidth": 12.0,
"fontSize": 9.0,
"enabled": true,
"results": {},
"editorSetting": {
"language": "markdown",
"editOnDblClick": true,
"completionKey": "TAB",
"completionSupport": false
},
"editorMode": "ace/mode/markdown",
"title": false,
"editorHide": true,
"tableHide": false
},
"settings": {
"params": {},
"forms": {}
},
"results": {
"code": "SUCCESS",
"msg": [
{
"type": "HTML",
"data": "\u003cdiv class\u003d\"markdown-body\"\u003e\n\u003ch1\u003eIntroduction\u003c/h1\u003e\n\u003cp\u003e\u003ca href\u003d\"https://flink.apache.org/\"\u003eApache Flink\u003c/a\u003e is a framework and distributed processing engine for stateful computations over unbounded and bounded data streams. This is Flink tutorial for running classical wordcount in both batch and streaming mode.\u003c/p\u003e\n\u003cp\u003eThere\u0026rsquo;re 3 things you need to do before using flink in Zeppelin.\u003c/p\u003e\n\u003cul\u003e\n\u003cli\u003eDownload \u003ca href\u003d\"https://flink.apache.org/downloads.html\"\u003eFlink 1.10 or afterwards\u003c/a\u003e (Only 1.10 afterwards are supported), unpack it and set \u003ccode\u003eFLINK_HOME\u003c/code\u003e in Flink interpreter setting to this location.\u003c/li\u003e\n\u003cli\u003eCopy flink-python_2.11–x.x.x.jar from flink opt folder to flink lib folder (it is used by Pyflink which is supported in Zeppelin)\u003c/li\u003e\n\u003cli\u003eIf you want to run yarn mode, you need to set \u003ccode\u003eHADOOP_CONF_DIR\u003c/code\u003e in flink interpreter setting. And make sure \u003ccode\u003ehadoop\u003c/code\u003e is in your \u003ccode\u003ePATH\u003c/code\u003e, because internally Flink will call command \u003ccode\u003ehadoop classpath\u003c/code\u003e and put all the hadoop related jars on the classpath of Flink interpreter process.\u003c/li\u003e\n\u003c/ul\u003e\n\u003cp\u003eThere\u0026rsquo;re 6 sub interpreters in Flink interpreter, each is used for different purpose. However they are in the the JVM and share the same ExecutionEnviroment/StremaExecutionEnvironment/BatchTableEnvironment/StreamTableEnvironment.\u003c/p\u003e\n\u003cul\u003e\n\u003cli\u003e\u003ccode\u003e%flink\u003c/code\u003e\t- Creates ExecutionEnvironment/StreamExecutionEnvironment/BatchTableEnvironment/StreamTableEnvironment and provides a Scala environment\u003c/li\u003e\n\u003cli\u003e\u003ccode\u003e%flink.pyflink\u003c/code\u003e\t- Provides a python environment\u003c/li\u003e\n\u003cli\u003e\u003ccode\u003e%flink.ipyflink\u003c/code\u003e\t- Provides an ipython environment\u003c/li\u003e\n\u003cli\u003e\u003ccode\u003e%flink.ssql\u003c/code\u003e\t - Provides a stream sql environment\u003c/li\u003e\n\u003cli\u003e\u003ccode\u003e%flink.bsql\u003c/code\u003e\t- Provides a batch sql environment\u003c/li\u003e\n\u003c/ul\u003e\n\n\u003c/div\u003e"
}
]
},
"apps": [],
"runtimeInfos": {},
"progressUpdateIntervalMs": 500,
"jobName": "paragraph_1580997898536_-1239502599",
"id": "paragraph_1580997898536_-1239502599",
"dateCreated": "2020-02-06 22:04:58.536",
"dateStarted": "2021-07-26 04:41:59.562",
"dateFinished": "2021-07-26 04:41:59.580",
"status": "FINISHED"
},
{
"title": "Batch WordCount",
"text": "%flink\n\nval data \u003d benv.fromElements(\"hello world\", \"hello flink\", \"hello hadoop\")\ndata.flatMap(line \u003d\u003e line.split(\"\\\\s\"))\n .map(w \u003d\u003e (w, 1))\n .groupBy(0)\n .sum(1)\n .print()\n",
"user": "anonymous",
"dateUpdated": "2021-07-26 04:39:20.665",
"progress": 0,
"config": {
"colWidth": 6.0,
"fontSize": 9.0,
"enabled": true,
"results": {},
"editorSetting": {
"language": "scala",
"editOnDblClick": false,
"completionKey": "TAB",
"completionSupport": true
},
"editorMode": "ace/mode/scala",
"title": true
},
"settings": {
"params": {},
"forms": {}
},
"results": {
"code": "SUCCESS",
"msg": [
{
"type": "TEXT",
"data": "\u001b[1m\u001b[34mdata\u001b[0m: \u001b[1m\u001b[32morg.apache.flink.api.scala.DataSet[String]\u001b[0m \u003d org.apache.flink.api.scala.DataSet@2d73af10\n(flink,1)\n(hadoop,1)\n(hello,3)\n(world,1)\n"
}
]
},
"apps": [],
"runtimeInfos": {
"jobUrl": {
"propertyName": "jobUrl",
"label": "FLINK JOB",
"tooltip": "View in Flink web UI",
"group": "flink",
"values": [
{
"jobUrl": "http://localhost:8083#/job/3e3862199e3e9ce97cfa7f115d63242f"
}
],
"interpreterSettingId": "flink"
}
},
"progressUpdateIntervalMs": 500,
"jobName": "paragraph_1580998080340_1531975932",
"id": "paragraph_1580998080340_1531975932",
"dateCreated": "2020-02-06 22:08:00.340",
"dateStarted": "2021-07-26 04:39:20.670",
"dateFinished": "2021-07-26 04:39:42.257",
"status": "FINISHED"
},
{
"title": "Streaming WordCount",
"text": "%flink\n\nval data \u003d senv.fromElements(\"hello world\", \"hello flink\", \"hello hadoop\")\ndata.flatMap(line \u003d\u003e line.split(\"\\\\s\"))\n .map(w \u003d\u003e (w, 1))\n .keyBy(0)\n .sum(1)\n .print\n\nsenv.execute()",
"user": "anonymous",
"dateUpdated": "2021-07-26 04:39:42.299",
"progress": 0,
"config": {
"colWidth": 6.0,
"fontSize": 9.0,
"enabled": true,
"results": {},
"editorSetting": {
"language": "scala",
"editOnDblClick": false,
"completionKey": "TAB",
"completionSupport": true
},
"editorMode": "ace/mode/scala",
"title": true
},
"settings": {
"params": {},
"forms": {}
},
"results": {
"code": "SUCCESS",
"msg": [
{
"type": "TEXT",
"data": "\u001b[1m\u001b[34mdata\u001b[0m: \u001b[1m\u001b[32morg.apache.flink.streaming.api.scala.DataStream[String]\u001b[0m \u003d org.apache.flink.streaming.api.scala.DataStream@5814d5e0\n\u001b[33mwarning: \u001b[0mthere was one deprecation warning; re-run with -deprecation for details\n\u001b[1m\u001b[34mres2\u001b[0m: \u001b[1m\u001b[32morg.apache.flink.streaming.api.datastream.DataStreamSink[(String, Int)]\u001b[0m \u003d org.apache.flink.streaming.api.datastream.DataStreamSink@5c76bd7d\n(hello,1)\n(world,1)\n(hello,2)\n(flink,1)\n(hello,3)\n(hadoop,1)\n\u001b[1m\u001b[34mres3\u001b[0m: \u001b[1m\u001b[32morg.apache.flink.api.common.JobExecutionResult\u001b[0m \u003d\nProgram execution finished\nJob with JobID 27af9c543e20d02098d66e0900526361 has finished.\nJob Runtime: 152 ms\n"
}
]
},
"apps": [],
"runtimeInfos": {
"jobUrl": {
"propertyName": "jobUrl",
"label": "FLINK JOB",
"tooltip": "View in Flink web UI",
"group": "flink",
"values": [
{
"jobUrl": "http://localhost:8083#/job/27af9c543e20d02098d66e0900526361"
}
],
"interpreterSettingId": "flink"
}
},
"progressUpdateIntervalMs": 500,
"jobName": "paragraph_1580998084555_-697674675",
"id": "paragraph_1580998084555_-697674675",
"dateCreated": "2020-02-06 22:08:04.555",
"dateStarted": "2021-07-26 04:39:42.303",
"dateFinished": "2021-07-26 04:39:48.021",
"status": "FINISHED"
},
{
"text": "%flink\n",
"user": "anonymous",
"dateUpdated": "2020-02-25 11:10:14.096",
"progress": 0,
"config": {},
"settings": {
"params": {},
"forms": {}
},
"apps": [],
"runtimeInfos": {},
"progressUpdateIntervalMs": 500,
"jobName": "paragraph_1582600214095_1825730071",
"id": "paragraph_1582600214095_1825730071",
"dateCreated": "2020-02-25 11:10:14.096",
"status": "READY"
}
],
"name": "1. Flink Basics",
"id": "2F2YS7PCE",
"defaultInterpreterGroup": "flink",
"version": "0.9.0-SNAPSHOT",
"noteParams": {},
"noteForms": {},
"angularObjects": {},
"config": {
"isZeppelinNotebookCronEnable": false
},
"info": {
"isRunning": false
}
}