
 {
   "paragraphs": [
     {
       "title": "Introduction",
       "text": "%md\n\n# Introduction\n\n[Apache Flink](https://flink.apache.org/) is a framework and distributed processing engine for stateful computations over unbounded and bounded data streams. This tutorial runs the classic WordCount example in both batch and streaming mode.\n\nThere are 3 things you need to do before using Flink in Zeppelin:\n\n* Download [Flink 1.10 or later](https://flink.apache.org/downloads.html) (only 1.10 and later are supported), unpack it, and set `FLINK_HOME` in the Flink interpreter setting to this location.\n* Copy `flink-python_2.11-x.x.x.jar` from the Flink `opt` folder to the Flink `lib` folder (it is used by PyFlink, which is supported in Zeppelin).\n* If you want to run in YARN mode, set `HADOOP_CONF_DIR` in the Flink interpreter setting, and make sure `hadoop` is on your `PATH`, because Flink internally calls `hadoop classpath` and puts all the Hadoop-related jars on the classpath of the Flink interpreter process.\n\nThe Flink interpreter has 5 sub-interpreters, each used for a different purpose. They all run in the same JVM and share the same ExecutionEnvironment/StreamExecutionEnvironment/BatchTableEnvironment/StreamTableEnvironment.\n\n\n* `%flink`\t- Creates ExecutionEnvironment/StreamExecutionEnvironment/BatchTableEnvironment/StreamTableEnvironment and provides a Scala environment\n* `%flink.pyflink`\t- Provides a Python environment\n* `%flink.ipyflink`\t- Provides an IPython environment\n* `%flink.ssql`\t- Provides a streaming SQL environment\n* `%flink.bsql`\t- Provides a batch SQL environment",
       "user": "anonymous",
       "dateUpdated": "2021-07-26 04:41:59.563",
       "progress": 0,
       "config": {
         "colWidth": 12.0,
         "fontSize": 9.0,
         "enabled": true,
         "results": {},
         "editorSetting": {
           "language": "markdown",
           "editOnDblClick": true,
           "completionKey": "TAB",
           "completionSupport": false
         },
         "editorMode": "ace/mode/markdown",
         "title": false,
         "editorHide": true,
         "tableHide": false
       },
       "settings": {
         "params": {},
         "forms": {}
       },
       "results": {
         "code": "SUCCESS",
         "msg": [
           {
             "type": "HTML",
             "data": "\u003cdiv class\u003d\"markdown-body\"\u003e\n\u003ch1\u003eIntroduction\u003c/h1\u003e\n\u003cp\u003e\u003ca href\u003d\"https://flink.apache.org/\"\u003eApache Flink\u003c/a\u003e is a framework and distributed processing engine for stateful computations over unbounded and bounded data streams. This tutorial runs the classic WordCount example in both batch and streaming mode.\u003c/p\u003e\n\u003cp\u003eThere are 3 things you need to do before using Flink in Zeppelin:\u003c/p\u003e\n\u003cul\u003e\n\u003cli\u003eDownload \u003ca href\u003d\"https://flink.apache.org/downloads.html\"\u003eFlink 1.10 or later\u003c/a\u003e (only 1.10 and later are supported), unpack it, and set \u003ccode\u003eFLINK_HOME\u003c/code\u003e in the Flink interpreter setting to this location.\u003c/li\u003e\n\u003cli\u003eCopy \u003ccode\u003eflink-python_2.11-x.x.x.jar\u003c/code\u003e from the Flink \u003ccode\u003eopt\u003c/code\u003e folder to the Flink \u003ccode\u003elib\u003c/code\u003e folder (it is used by PyFlink, which is supported in Zeppelin).\u003c/li\u003e\n\u003cli\u003eIf you want to run in YARN mode, set \u003ccode\u003eHADOOP_CONF_DIR\u003c/code\u003e in the Flink interpreter setting, and make sure \u003ccode\u003ehadoop\u003c/code\u003e is on your \u003ccode\u003ePATH\u003c/code\u003e, because Flink internally calls \u003ccode\u003ehadoop classpath\u003c/code\u003e and puts all the Hadoop-related jars on the classpath of the Flink interpreter process.\u003c/li\u003e\n\u003c/ul\u003e\n\u003cp\u003eThe Flink interpreter has 5 sub-interpreters, each used for a different purpose. They all run in the same JVM and share the same ExecutionEnvironment/StreamExecutionEnvironment/BatchTableEnvironment/StreamTableEnvironment.\u003c/p\u003e\n\u003cul\u003e\n\u003cli\u003e\u003ccode\u003e%flink\u003c/code\u003e\t- Creates ExecutionEnvironment/StreamExecutionEnvironment/BatchTableEnvironment/StreamTableEnvironment and provides a Scala environment\u003c/li\u003e\n\u003cli\u003e\u003ccode\u003e%flink.pyflink\u003c/code\u003e\t- Provides a Python environment\u003c/li\u003e\n\u003cli\u003e\u003ccode\u003e%flink.ipyflink\u003c/code\u003e\t- Provides an IPython environment\u003c/li\u003e\n\u003cli\u003e\u003ccode\u003e%flink.ssql\u003c/code\u003e\t- Provides a streaming SQL environment\u003c/li\u003e\n\u003cli\u003e\u003ccode\u003e%flink.bsql\u003c/code\u003e\t- Provides a batch SQL environment\u003c/li\u003e\n\u003c/ul\u003e\n\n\u003c/div\u003e"
           }
         ]
       },
       "apps": [],
       "runtimeInfos": {},
       "progressUpdateIntervalMs": 500,
       "jobName": "paragraph_1580997898536_-1239502599",
       "id": "paragraph_1580997898536_-1239502599",
       "dateCreated": "2020-02-06 22:04:58.536",
       "dateStarted": "2021-07-26 04:41:59.562",
       "dateFinished": "2021-07-26 04:41:59.580",
       "status": "FINISHED"
     },
     {
       "title": "Batch WordCount",
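       "text": "%flink\n\n// Build a DataSet from three sample lines with the batch environment (benv).\nval data \u003d benv.fromElements(\"hello world\", \"hello flink\", \"hello hadoop\")\n// Split each line into words, pair every word with 1, then group by the word (field 0) and sum the counts (field 1).\ndata.flatMap(line \u003d\u003e line.split(\"\\\\s\"))\n  .map(w \u003d\u003e (w, 1))\n  .groupBy(0)\n  .sum(1)\n  .print()\n",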
       "user": "anonymous",
       "dateUpdated": "2021-07-26 04:39:20.665",
       "progress": 0,
       "config": {
         "colWidth": 6.0,
         "fontSize": 9.0,
         "enabled": true,
         "results": {},
         "editorSetting": {
           "language": "scala",
           "editOnDblClick": false,
           "completionKey": "TAB",
           "completionSupport": true
         },
         "editorMode": "ace/mode/scala",
         "title": true
       },
       "settings": {
         "params": {},
         "forms": {}
       },
       "results": {
         "code": "SUCCESS",
         "msg": [
           {
             "type": "TEXT",
             "data": "\u001b[1m\u001b[34mdata\u001b[0m: \u001b[1m\u001b[32morg.apache.flink.api.scala.DataSet[String]\u001b[0m \u003d org.apache.flink.api.scala.DataSet@2d73af10\n(flink,1)\n(hadoop,1)\n(hello,3)\n(world,1)\n"
           }
         ]
       },
       "apps": [],
       "runtimeInfos": {
         "jobUrl": {
           "propertyName": "jobUrl",
           "label": "FLINK JOB",
           "tooltip": "View in Flink web UI",
           "group": "flink",
           "values": [
             {
               "jobUrl": "http://localhost:8083#/job/3e3862199e3e9ce97cfa7f115d63242f"
             }
           ],
           "interpreterSettingId": "flink"
         }
       },
       "progressUpdateIntervalMs": 500,
       "jobName": "paragraph_1580998080340_1531975932",
       "id": "paragraph_1580998080340_1531975932",
       "dateCreated": "2020-02-06 22:08:00.340",
       "dateStarted": "2021-07-26 04:39:20.670",
       "dateFinished": "2021-07-26 04:39:42.257",
       "status": "FINISHED"
     },
     {
       "title": "Streaming WordCount",
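       "text": "%flink\n\n// Build a DataStream from the same sample lines with the streaming environment (senv).\nval data \u003d senv.fromElements(\"hello world\", \"hello flink\", \"hello hadoop\")\n// Split each line into words, pair every word with 1, then key the stream by the word (field 0) and keep a running sum of the counts (field 1).\ndata.flatMap(line \u003d\u003e line.split(\"\\\\s\"))\n  .map(w \u003d\u003e (w, 1))\n  .keyBy(0)\n  .sum(1)\n  .print()\n\n// Streaming jobs are lazy: nothing runs until execute() is called.\nsenv.execute()",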
       "user": "anonymous",
       "dateUpdated": "2021-07-26 04:39:42.299",
       "progress": 0,
       "config": {
         "colWidth": 6.0,
         "fontSize": 9.0,
         "enabled": true,
         "results": {},
         "editorSetting": {
           "language": "scala",
           "editOnDblClick": false,
           "completionKey": "TAB",
           "completionSupport": true
         },
         "editorMode": "ace/mode/scala",
         "title": true
       },
       "settings": {
         "params": {},
         "forms": {}
       },
       "results": {
         "code": "SUCCESS",
         "msg": [
           {
             "type": "TEXT",
             "data": "\u001b[1m\u001b[34mdata\u001b[0m: \u001b[1m\u001b[32morg.apache.flink.streaming.api.scala.DataStream[String]\u001b[0m \u003d org.apache.flink.streaming.api.scala.DataStream@5814d5e0\n\u001b[33mwarning: \u001b[0mthere was one deprecation warning; re-run with -deprecation for details\n\u001b[1m\u001b[34mres2\u001b[0m: \u001b[1m\u001b[32morg.apache.flink.streaming.api.datastream.DataStreamSink[(String, Int)]\u001b[0m \u003d org.apache.flink.streaming.api.datastream.DataStreamSink@5c76bd7d\n(hello,1)\n(world,1)\n(hello,2)\n(flink,1)\n(hello,3)\n(hadoop,1)\n\u001b[1m\u001b[34mres3\u001b[0m: \u001b[1m\u001b[32morg.apache.flink.api.common.JobExecutionResult\u001b[0m \u003d\nProgram execution finished\nJob with JobID 27af9c543e20d02098d66e0900526361 has finished.\nJob Runtime: 152 ms\n"
           }
         ]
       },
       "apps": [],
       "runtimeInfos": {
         "jobUrl": {
           "propertyName": "jobUrl",
           "label": "FLINK JOB",
           "tooltip": "View in Flink web UI",
           "group": "flink",
           "values": [
             {
               "jobUrl": "http://localhost:8083#/job/27af9c543e20d02098d66e0900526361"
             }
           ],
           "interpreterSettingId": "flink"
         }
       },
       "progressUpdateIntervalMs": 500,
       "jobName": "paragraph_1580998084555_-697674675",
       "id": "paragraph_1580998084555_-697674675",
       "dateCreated": "2020-02-06 22:08:04.555",
       "dateStarted": "2021-07-26 04:39:42.303",
       "dateFinished": "2021-07-26 04:39:48.021",
       "status": "FINISHED"
     },
     {
       "text": "%flink\n",
       "user": "anonymous",
       "dateUpdated": "2020-02-25 11:10:14.096",
       "progress": 0,
       "config": {},
       "settings": {
         "params": {},
         "forms": {}
       },
       "apps": [],
       "runtimeInfos": {},
       "progressUpdateIntervalMs": 500,
       "jobName": "paragraph_1582600214095_1825730071",
       "id": "paragraph_1582600214095_1825730071",
       "dateCreated": "2020-02-25 11:10:14.096",
       "status": "READY"
     }
   ],
   "name": "1. Flink Basics",
   "id": "2F2YS7PCE",
   "defaultInterpreterGroup": "flink",
   "version": "0.9.0-SNAPSHOT",
   "noteParams": {},
   "noteForms": {},
   "angularObjects": {},
   "config": {
     "isZeppelinNotebookCronEnable": false
   },
   "info": {
     "isRunning": false
   }
 }