| |
| <!DOCTYPE html> |
| <html lang="en"> |
| <head> |
| <meta charset="utf-8"> |
| <title>Apache Zeppelin 0.10.1 Documentation: Scio Interpreter for Apache Zeppelin</title> |
| <meta name="description" content="Scio is a Scala DSL for Apache Beam/Google Dataflow model."> |
| <meta name="author" content="The Apache Software Foundation"> |
| |
| <!-- Enable responsive viewport --> |
| <meta name="viewport" content="width=device-width, initial-scale=1.0"> |
| |
| <!-- Le HTML5 shim, for IE6-8 support of HTML elements --> |
| <!--[if lt IE 9]> |
| <script src="http://html5shim.googlecode.com/svn/trunk/html5.js"></script> |
| <![endif]--> |
| |
| <link href="/docs/0.10.1/assets/themes/zeppelin/font-awesome.min.css" rel="stylesheet"> |
| |
| <!-- Le styles --> |
| <link href="/docs/0.10.1/assets/themes/zeppelin/bootstrap/css/bootstrap.css" rel="stylesheet"> |
| <link href="/docs/0.10.1/assets/themes/zeppelin/css/style.css?body=1" rel="stylesheet" type="text/css"> |
| <link href="/docs/0.10.1/assets/themes/zeppelin/css/syntax.css" rel="stylesheet" type="text/css" media="screen" /> |
| <!-- Le fav and touch icons --> |
| <!-- Update these with your own images |
| <link rel="shortcut icon" href="images/favicon.ico"> |
| <link rel="apple-touch-icon" href="images/apple-touch-icon.png"> |
| <link rel="apple-touch-icon" sizes="72x72" href="images/apple-touch-icon-72x72.png"> |
| <link rel="apple-touch-icon" sizes="114x114" href="images/apple-touch-icon-114x114.png"> |
| --> |
| |
| <!-- Js --> |
| <script src="/docs/0.10.1/assets/themes/zeppelin/jquery-1.10.2.min.js"></script> |
| <script src="/docs/0.10.1/assets/themes/zeppelin/bootstrap/js/bootstrap.min.js"></script> |
| <script src="/docs/0.10.1/assets/themes/zeppelin/js/docs.js"></script> |
| <script src="/docs/0.10.1/assets/themes/zeppelin/js/anchor.min.js"></script> |
| <script src="/docs/0.10.1/assets/themes/zeppelin/js/toc.js"></script> |
| <script src="/docs/0.10.1/assets/themes/zeppelin/js/lunr.min.js"></script> |
| <script src="/docs/0.10.1/assets/themes/zeppelin/js/search.js"></script> |
| |
| <!-- atom & rss feed --> |
| <link href="/docs/0.10.1/atom.xml" type="application/atom+xml" rel="alternate" title="Sitewide ATOM Feed"> |
| <link href="/docs/0.10.1/rss.xml" type="application/rss+xml" rel="alternate" title="Sitewide RSS Feed"> |
| |
| <!-- Matomo --> |
| <script> |
| var _paq = window._paq = window._paq || []; |
| /* tracker methods like "setCustomDimension" should be called before "trackPageView" */ |
| _paq.push(["setDoNotTrack", true]); |
| _paq.push(["disableCookies"]); |
| _paq.push(['trackPageView']); |
| _paq.push(['enableLinkTracking']); |
| (function() { |
| var u="https://analytics.apache.org/"; |
| _paq.push(['setTrackerUrl', u+'matomo.php']); |
| _paq.push(['setSiteId', '69']); |
| var d=document, g=d.createElement('script'), s=d.getElementsByTagName('script')[0]; |
| g.async=true; g.src=u+'matomo.js'; s.parentNode.insertBefore(g,s); |
| })(); |
| </script> |
| <!-- End Matomo Code --> |
| </head> |
| |
| <body> |
| |
| |
| |
| |
| <div class="content"> |
| |
| <!--<div class="hero-unit Scio Interpreter for Apache Zeppelin"> |
| <h1></h1> |
| </div> |
| --> |
| |
| <div class="row"> |
| <div class="col-md-12"> |
| <!-- |
| Licensed under the Apache License, Version 2.0 (the "License"); |
| you may not use this file except in compliance with the License. |
| You may obtain a copy of the License at |
| |
| http://www.apache.org/licenses/LICENSE-2.0 |
| |
| Unless required by applicable law or agreed to in writing, software |
| distributed under the License is distributed on an "AS IS" BASIS, |
| WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. |
| See the License for the specific language governing permissions and |
| limitations under the License. |
| --> |
| |
| <h1>Scio Interpreter for Apache Zeppelin</h1> |
| |
| <div id="toc"></div> |
| |
| <h2>Overview</h2> |
| |
| <p>Scio is a Scala DSL for <a href="https://github.com/GoogleCloudPlatform/DataflowJavaSDK">Google Cloud Dataflow</a> and <a href="http://beam.incubator.apache.org/">Apache Beam</a> inspired by <a href="http://spark.apache.org/">Spark</a> and <a href="https://github.com/twitter/scalding">Scalding</a>. See the current <a href="https://github.com/spotify/scio/wiki">wiki</a> and <a href="http://spotify.github.io/scio/">API documentation</a> for more information.</p> |
| |
| <h2>Configuration</h2> |
| |
| <table class="table-configuration"> |
| <tr> |
| <th>Name</th> |
| <th>Default Value</th> |
| <th>Description</th> |
| </tr> |
| <tr> |
| <td>zeppelin.scio.argz</td> |
| <td>--runner=InProcessPipelineRunner</td> |
| <td>Interpreter-wide arguments passed to Scio. Documentation: https://github.com/spotify/scio/wiki#options and https://cloud.google.com/dataflow/pipelines/specifying-exec-params</td> |
| </tr> |
| <tr> |
| <td>zeppelin.scio.maxResult</td> |
| <td>1000</td> |
| <td>Maximum number of <code>SCollection</code> results to display</td> |
| </tr> |
| |
| </table> |
| |
| <h2>Enabling the Scio Interpreter</h2> |
| |
| <p>In a notebook, to enable the <strong>Scio</strong> interpreter, click the <strong>Gear</strong> icon and select <strong>beam</strong> (<strong>beam.scio</strong>).</p> |
| |
| <h2>Using the Scio Interpreter</h2> |
| |
| <p>In a paragraph, use <code>%beam.scio</code> to select the <strong>Scio</strong> interpreter. You can use it much the same way as the vanilla Scala REPL and the <a href="https://github.com/spotify/scio/wiki/Scio-REPL">Scio REPL</a>. State (variables, imports, execution, etc.) is shared among all <em>Scio</em> paragraphs. A special variable, <strong>argz</strong>, holds the arguments from the Scio interpreter settings. The easiest way to proceed is to create a Scio context via the standard <code>ContextAndArgs</code>.</p> |
| <div class="highlight"><pre><code class="scala language-scala" data-lang="scala"><span class="o">%</span><span class="n">beam</span><span class="o">.</span><span class="n">scio</span> |
| <span class="k">val</span> <span class="o">(</span><span class="n">sc</span><span class="o">,</span> <span class="n">args</span><span class="o">)</span> <span class="k">=</span> <span class="nc">ContextAndArgs</span><span class="o">(</span><span class="n">argz</span><span class="o">)</span> |
| </code></pre></div> |
| <p>Use the <code>sc</code> context the way you would in a regular pipeline or REPL.</p> |
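| <p>As a minimal sketch of reading a custom argument, assume the interpreter setting <code>zeppelin.scio.argz</code> also contains a hypothetical <code>--limit=5</code> entry (the argument name is illustrative and not part of Scio or Zeppelin). The <code>args</code> value returned by <code>ContextAndArgs</code> can then be queried with a fallback default:</p> |
| <div class="highlight"><pre><code class="scala language-scala" data-lang="scala">%beam.scio |
| // Assumption: argz may carry a hypothetical --limit=N argument; fall back to 3 when it is absent. |
| val (sc, args) = ContextAndArgs(argz) |
| val limit = args.getOrElse("limit", "3").toInt |
| |
| // Keep at most `limit` elements, then close the context and display them. |
| sc.parallelize(1 to 100).take(limit).closeAndDisplay() |
| </code></pre></div> |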
| |
| <p>Example:</p> |
| <div class="highlight"><pre><code class="scala language-scala" data-lang="scala"><span class="o">%</span><span class="n">beam</span><span class="o">.</span><span class="n">scio</span> |
| <span class="k">val</span> <span class="o">(</span><span class="n">sc</span><span class="o">,</span> <span class="n">args</span><span class="o">)</span> <span class="k">=</span> <span class="nc">ContextAndArgs</span><span class="o">(</span><span class="n">argz</span><span class="o">)</span> |
| <span class="n">sc</span><span class="o">.</span><span class="n">parallelize</span><span class="o">(</span><span class="nc">Seq</span><span class="o">(</span><span class="s">"foo"</span><span class="o">,</span> <span class="s">"foo"</span><span class="o">,</span> <span class="s">"bar"</span><span class="o">)).</span><span class="n">countByValue</span><span class="o">.</span><span class="n">closeAndDisplay</span><span class="o">()</span> |
| </code></pre></div> |
| <p>If you close the Scio context, go ahead and create a new one using <code>ContextAndArgs</code>. Please refer to the <a href="https://github.com/spotify/scio/wiki">Scio wiki</a> for more complex examples. You can close the Scio context much the same way as in the Scio REPL, and use the Zeppelin display helpers to synchronously close and display results; read more below.</p> |
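| <p>The following sketch illustrates that pattern under the assumption that both pipelines run in the same paragraph. The names <code>sc1</code> and <code>sc2</code> are illustrative; the second context is needed because <code>closeAndDisplay()</code> closes the first one:</p> |
| <div class="highlight"><pre><code class="scala language-scala" data-lang="scala">%beam.scio |
| // First pipeline: closeAndDisplay() closes sc1, so it cannot be reused afterwards. |
| val (sc1, _) = ContextAndArgs(argz) |
| sc1.parallelize(Seq(1, 2, 3)).map(_ * 2).closeAndDisplay() |
| |
| // Second pipeline: create a fresh context from the same interpreter arguments. |
| val (sc2, _) = ContextAndArgs(argz) |
| sc2.parallelize(Seq("foo", "bar")).closeAndDisplay() |
| </code></pre></div> |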
| |
| <h3>Progress</h3> |
| |
| <p>Only one paragraph can run at a time. There is no notion of overall progress, so the progress bar always shows <code>0</code>.</p> |
| |
| <h3>SCollection display helpers</h3> |
| |
| <p>The Scio interpreter comes with display helpers to ease working with Zeppelin notebooks. Simply call <code>closeAndDisplay()</code> on an <code>SCollection</code> to close the context and display the results. The number of results is limited by <code>zeppelin.scio.maxResult</code> (1000 by default).</p> |
| |
| <p>Supported <code>SCollection</code> types:</p> |
| |
| <ul> |
| <li>Scio's typed BigQuery</li> |
| <li>Scala's Products (case classes, tuples)</li> |
| <li>Google BigQuery's TableRow</li> |
| <li>Apache Avro</li> |
| <li>All of Scala's <code>AnyVal</code> types</li> |
| </ul> |
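| <p>For instance, tuples fall under Scala's Products in the list above. A minimal sketch with made-up sample data, which should be rendered by the display helper without any extra arguments:</p> |
| <div class="highlight"><pre><code class="scala language-scala" data-lang="scala">%beam.scio |
| val (sc, args) = ContextAndArgs(argz) |
| |
| // (String, Int) tuples are Products, one of the supported SCollection element types. |
| sc.parallelize(Seq(("foo", 1), ("bar", 2))).closeAndDisplay() |
| </code></pre></div> |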
| |
| <h4>Helper methods</h4> |
| |
| <p>There are different helper methods for different objects. You can easily display results from <code>SCollection</code>, <code>Future[Tap]</code> and <code>Tap</code>.</p> |
| |
| <h5><code>SCollection</code> helper</h5> |
| |
| <p><code>SCollection</code> has a <code>closeAndDisplay</code> Zeppelin helper method for the types listed above. Use it to synchronously close the Scio context and, once the results are available, pull and display them.</p> |
| |
| <h5><code>Future[Tap]</code> helper</h5> |
| |
| <p><code>Future[Tap]</code> has a <code>waitAndDisplay</code> Zeppelin helper method for the types listed above. Use it to synchronously wait for the results and, once they are available, pull and display them.</p> |
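| <p>A hedged sketch of this helper, assuming the Scio version bundled with the interpreter returns a <code>Future[Tap[T]]</code> from <code>materialize</code> (newer Scio releases may differ). The name <code>squares</code> is illustrative:</p> |
| <div class="highlight"><pre><code class="scala language-scala" data-lang="scala">%beam.scio |
| val (sc, args) = ContextAndArgs(argz) |
| |
| // Assumption: materialize yields a Future[Tap[Int]] that completes after the pipeline has run. |
| val squares = sc.parallelize(1 to 10).map(x =&gt; x * x).materialize |
| |
| // Close the context to run the pipeline, then block until the results can be displayed. |
| sc.close() |
| squares.waitAndDisplay() |
| </code></pre></div> |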
| |
| <h5><code>Tap</code> helper</h5> |
| |
| <p><code>Tap</code> has a <code>display</code> Zeppelin helper method for the types listed above. Use it to pull and display the results.</p> |
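| <p>As a sketch under the same assumption (that <code>materialize</code> returns a <code>Future[Tap[T]]</code> in the bundled Scio version), a <code>Tap</code> can be obtained by awaiting the future with the Scala standard library and then displayed. The names <code>future</code> and <code>tap</code> are illustrative:</p> |
| <div class="highlight"><pre><code class="scala language-scala" data-lang="scala">%beam.scio |
| import scala.concurrent.Await |
| import scala.concurrent.duration.Duration |
| |
| val (sc, args) = ContextAndArgs(argz) |
| val future = sc.parallelize(Seq("foo", "bar")).materialize |
| sc.close() |
| |
| // Block until the pipeline has finished, then display the materialized results. |
| val tap = Await.result(future, Duration.Inf) |
| tap.display() |
| </code></pre></div> |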
| |
| <h3>Examples</h3> |
| |
| <h4>BigQuery example:</h4> |
| <div class="highlight"><pre><code class="scala language-scala" data-lang="scala"><span class="o">%</span><span class="n">beam</span><span class="o">.</span><span class="n">scio</span> |
| <span class="nd">@BigQueryType</span><span class="o">.</span><span class="n">fromQuery</span><span class="o">(</span><span class="s">"""|SELECT departure_airport,sum(case when departure_delay&gt;0 then 1 else 0 end) as no_of_delays</span> |
| <span class="s"> |FROM [bigquery-samples:airline_ontime_data.flights]</span> |
| <span class="s"> |group by departure_airport</span> |
| <span class="s"> |order by 2 desc</span> |
| <span class="s"> |limit 10"""</span><span class="o">.</span><span class="n">stripMargin</span><span class="o">)</span> <span class="k">class</span> <span class="nc">Flights</span> |
| |
| <span class="k">val</span> <span class="o">(</span><span class="n">sc</span><span class="o">,</span> <span class="n">args</span><span class="o">)</span> <span class="k">=</span> <span class="nc">ContextAndArgs</span><span class="o">(</span><span class="n">argz</span><span class="o">)</span> |
| <span class="n">sc</span><span class="o">.</span><span class="n">bigQuerySelect</span><span class="o">(</span><span class="nc">Flights</span><span class="o">.</span><span class="n">query</span><span class="o">).</span><span class="n">closeAndDisplay</span><span class="o">(</span><span class="nc">Flights</span><span class="o">.</span><span class="n">schema</span><span class="o">)</span> |
| </code></pre></div> |
| <h4>BigQuery typed example:</h4> |
| <div class="highlight"><pre><code class="scala language-scala" data-lang="scala"><span class="o">%</span><span class="n">beam</span><span class="o">.</span><span class="n">scio</span> |
| <span class="nd">@BigQueryType</span><span class="o">.</span><span class="n">fromQuery</span><span class="o">(</span><span class="s">"""|SELECT departure_airport,sum(case when departure_delay&gt;0 then 1 else 0 end) as no_of_delays</span> |
| <span class="s"> |FROM [bigquery-samples:airline_ontime_data.flights]</span> |
| <span class="s"> |group by departure_airport</span> |
| <span class="s"> |order by 2 desc</span> |
| <span class="s"> |limit 10"""</span><span class="o">.</span><span class="n">stripMargin</span><span class="o">)</span> <span class="k">class</span> <span class="nc">Flights</span> |
| |
| <span class="k">val</span> <span class="o">(</span><span class="n">sc</span><span class="o">,</span> <span class="n">args</span><span class="o">)</span> <span class="k">=</span> <span class="nc">ContextAndArgs</span><span class="o">(</span><span class="n">argz</span><span class="o">)</span> |
| <span class="n">sc</span><span class="o">.</span><span class="n">typedBigQuery</span><span class="o">[</span><span class="kt">Flights</span><span class="o">]().</span><span class="n">flatMap</span><span class="o">(</span><span class="k">_</span><span class="o">.</span><span class="n">no_of_delays</span><span class="o">).</span><span class="n">mean</span><span class="o">.</span><span class="n">closeAndDisplay</span><span class="o">()</span> |
| </code></pre></div> |
| <h4>Avro example:</h4> |
| <div class="highlight"><pre><code class="scala language-scala" data-lang="scala"><span class="o">%</span><span class="n">beam</span><span class="o">.</span><span class="n">scio</span> |
| <span class="k">import</span> <span class="nn">com.spotify.data.ExampleAvro</span> |
| |
| <span class="k">val</span> <span class="o">(</span><span class="n">sc</span><span class="o">,</span> <span class="n">args</span><span class="o">)</span> <span class="k">=</span> <span class="nc">ContextAndArgs</span><span class="o">(</span><span class="n">argz</span><span class="o">)</span> |
| <span class="n">sc</span><span class="o">.</span><span class="n">avroFile</span><span class="o">[</span><span class="kt">ExampleAvro</span><span class="o">](</span><span class="s">"gs://&lt;bucket&gt;/tmp/my.avro"</span><span class="o">).</span><span class="n">take</span><span class="o">(</span><span class="mi">10</span><span class="o">).</span><span class="n">closeAndDisplay</span><span class="o">()</span> |
| </code></pre></div> |
| <h4>Avro example with a view schema:</h4> |
| <div class="highlight"><pre><code class="scala language-scala" data-lang="scala"><span class="o">%</span><span class="n">beam</span><span class="o">.</span><span class="n">scio</span> |
| <span class="k">import</span> <span class="nn">com.spotify.data.ExampleAvro</span> |
| <span class="k">import</span> <span class="nn">org.apache.avro.Schema</span> |
| |
| <span class="k">val</span> <span class="o">(</span><span class="n">sc</span><span class="o">,</span> <span class="n">args</span><span class="o">)</span> <span class="k">=</span> <span class="nc">ContextAndArgs</span><span class="o">(</span><span class="n">argz</span><span class="o">)</span> |
| <span class="k">val</span> <span class="n">view</span> <span class="k">=</span> <span class="k">new</span> <span class="nc">Schema</span><span class="o">.</span><span class="nc">Parser</span><span class="o">().</span><span class="n">parse</span><span class="o">(</span><span class="s">"""{"type":"record","name":"ExampleAvro","namespace":"com.spotify.data","fields":[{"name":"track","type":"string"}, {"name":"artist", "type":"string"}]}"""</span><span class="o">)</span> |
| |
| <span class="n">sc</span><span class="o">.</span><span class="n">avroFile</span><span class="o">[</span><span class="kt">ExampleAvro</span><span class="o">](</span><span class="s">"gs://&lt;bucket&gt;/tmp/my.avro"</span><span class="o">).</span><span class="n">take</span><span class="o">(</span><span class="mi">10</span><span class="o">).</span><span class="n">closeAndDisplay</span><span class="o">(</span><span class="n">view</span><span class="o">)</span> |
| </code></pre></div> |
| <h3>Google credentials</h3> |
| |
| <p>The Scio interpreter tries to infer your Google Cloud credentials from its environment. It takes the following into account:</p> |
| |
| <ul> |
| <li><code>argz</code> interpreter settings (<a href="https://github.com/spotify/scio/wiki#options">doc</a>)</li> |
| <li>environment variable (<code>GOOGLE_APPLICATION_CREDENTIALS</code>)</li> |
| <li>gcloud configuration</li> |
| </ul> |
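| <p>To check the second source in the list above, a small diagnostic paragraph can report whether the environment variable is visible to the interpreter process. This is a sketch using only the Scala standard library; it does not show how Scio itself resolves credentials:</p> |
| <div class="highlight"><pre><code class="scala language-scala" data-lang="scala">%beam.scio |
| // Diagnostic sketch: report whether GOOGLE_APPLICATION_CREDENTIALS is set for this process. |
| sys.env.get("GOOGLE_APPLICATION_CREDENTIALS") match { |
|   case Some(path) =&gt; |
|     // Check that the referenced key file is actually present on the interpreter host. |
|     val found = new java.io.File(path).exists |
|     println(s"GOOGLE_APPLICATION_CREDENTIALS=$path (file exists: $found)") |
|   case None =&gt; |
|     println("GOOGLE_APPLICATION_CREDENTIALS is not set") |
| } |
| </code></pre></div> |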
| |
| <h4>BigQuery macro credentials</h4> |
| |
| <p>Currently, the BigQuery project for macro expansion is inferred using Google Dataflow's <a href="https://github.com/GoogleCloudPlatform/DataflowJavaSDK/blob/master/sdk/src/main/java/com/google/cloud/dataflow/sdk/options/GcpOptions.java#L187">DefaultProjectFactory().create()</a>.</p> |
| |
| </div> |
| </div> |
| |
| |
| <hr> |
| <footer> |
| <!-- <p>© 2022 The Apache Software Foundation</p>--> |
| </footer> |
| </div> |
| |
| |
| |
| |
| |
| |
| |
| </body> |
| </html> |
| |