| |
| <!DOCTYPE html> |
| <html lang="en"> |
| <head> |
| <meta charset="utf-8"> |
| <title>Apache Zeppelin 0.10.1 Documentation: Pig Interpreter for Apache Zeppelin</title> |
| <meta name="description" content="Apache Pig is a platform for analyzing large data sets that consists of a high-level language for expressing data analysis programs, coupled with infrastructure for evaluating these programs."> |
| <meta name="author" content="The Apache Software Foundation"> |
| |
| <!-- Enable responsive viewport --> |
| <meta name="viewport" content="width=device-width, initial-scale=1.0"> |
| |
| <!-- Le HTML5 shim, for IE6-8 support of HTML elements --> |
| <!--[if lt IE 9]> |
| <script src="http://html5shim.googlecode.com/svn/trunk/html5.js"></script> |
| <![endif]--> |
| |
| <link href="/docs/0.10.1/assets/themes/zeppelin/font-awesome.min.css" rel="stylesheet"> |
| |
| <!-- Le styles --> |
| <link href="/docs/0.10.1/assets/themes/zeppelin/bootstrap/css/bootstrap.css" rel="stylesheet"> |
| <link href="/docs/0.10.1/assets/themes/zeppelin/css/style.css?body=1" rel="stylesheet" type="text/css"> |
| <link href="/docs/0.10.1/assets/themes/zeppelin/css/syntax.css" rel="stylesheet" type="text/css" media="screen" /> |
| <!-- Le fav and touch icons --> |
| <!-- Update these with your own images |
| <link rel="shortcut icon" href="images/favicon.ico"> |
| <link rel="apple-touch-icon" href="images/apple-touch-icon.png"> |
| <link rel="apple-touch-icon" sizes="72x72" href="images/apple-touch-icon-72x72.png"> |
| <link rel="apple-touch-icon" sizes="114x114" href="images/apple-touch-icon-114x114.png"> |
| --> |
| |
| <!-- Js --> |
| <script src="/docs/0.10.1/assets/themes/zeppelin/jquery-1.10.2.min.js"></script> |
| <script src="/docs/0.10.1/assets/themes/zeppelin/bootstrap/js/bootstrap.min.js"></script> |
| <script src="/docs/0.10.1/assets/themes/zeppelin/js/docs.js"></script> |
| <script src="/docs/0.10.1/assets/themes/zeppelin/js/anchor.min.js"></script> |
| <script src="/docs/0.10.1/assets/themes/zeppelin/js/toc.js"></script> |
| <script src="/docs/0.10.1/assets/themes/zeppelin/js/lunr.min.js"></script> |
| <script src="/docs/0.10.1/assets/themes/zeppelin/js/search.js"></script> |
| |
| <!-- atom & rss feed --> |
| <link href="/docs/0.10.1/atom.xml" type="application/atom+xml" rel="alternate" title="Sitewide ATOM Feed"> |
| <link href="/docs/0.10.1/rss.xml" type="application/rss+xml" rel="alternate" title="Sitewide RSS Feed"> |
| |
| <!-- Matomo --> |
| <script> |
| var _paq = window._paq = window._paq || []; |
| /* tracker methods like "setCustomDimension" should be called before "trackPageView" */ |
| _paq.push(["setDoNotTrack", true]); |
| _paq.push(["disableCookies"]); |
| _paq.push(['trackPageView']); |
| _paq.push(['enableLinkTracking']); |
| (function() { |
| var u="https://analytics.apache.org/"; |
| _paq.push(['setTrackerUrl', u+'matomo.php']); |
| _paq.push(['setSiteId', '69']); |
| var d=document, g=d.createElement('script'), s=d.getElementsByTagName('script')[0]; |
| g.async=true; g.src=u+'matomo.js'; s.parentNode.insertBefore(g,s); |
| })(); |
| </script> |
| <!-- End Matomo Code --> |
| </head> |
| |
| <body> |
| |
| <div id="menu" class="navbar navbar-inverse navbar-fixed-top" role="navigation"> |
| <div class="container navbar-container"> |
| <div class="navbar-header"> |
| <button type="button" class="navbar-toggle" data-toggle="collapse" data-target=".navbar-collapse"> |
| <span class="sr-only">Toggle navigation</span> |
| <span class="icon-bar"></span> |
| <span class="icon-bar"></span> |
| <span class="icon-bar"></span> |
| </button> |
| <div class="navbar-brand"> |
| <a class="navbar-brand-main" href="http://zeppelin.apache.org"> |
| <img src="/docs/0.10.1/assets/themes/zeppelin/img/zeppelin_logo.png" width="50" |
| style="margin-top: -2px;" alt="I'm zeppelin"> |
| <span style="margin-left: 5px; font-size: 27px;">Zeppelin</span> |
| <a class="navbar-brand-version" href="/docs/0.10.1" |
| style="font-size: 15px; color: white;"> 0.10.1 |
| </a> |
| </a> |
| </div> |
| </div> |
| <nav class="navbar-collapse collapse" role="navigation"> |
| <ul class="nav navbar-nav"> |
| <li> |
| <a href="#" data-toggle="dropdown" class="dropdown-toggle">Quick Start <b class="caret"></b></a> |
| <ul class="dropdown-menu"> |
| <li class="title"><span>Getting Started</span></li> |
| <li><a href="/docs/0.10.1/quickstart/install.html">Install</a></li> |
| <li><a href="/docs/0.10.1/quickstart/explore_ui.html">Explore UI</a></li> |
| <li><a href="/docs/0.10.1/quickstart/tutorial.html">Tutorial</a></li> |
| <li role="separator" class="divider"></li> |
| <li class="title"><span>Run Mode</span></li> |
| <li><a href="/docs/0.10.1/quickstart/kubernetes.html">Kubernetes</a></li> |
| <li><a href="/docs/0.10.1/quickstart/docker.html">Docker</a></li> |
| <li><a href="/docs/0.10.1/quickstart/yarn.html">Yarn</a></li> |
| <li role="separator" class="divider"></li> |
| <li><a href="/docs/0.10.1/quickstart/spark_with_zeppelin.html">Spark with Zeppelin</a></li> |
| <li><a href="/docs/0.10.1/quickstart/flink_with_zeppelin.html">Flink with Zeppelin</a></li> |
| <li><a href="/docs/0.10.1/quickstart/sql_with_zeppelin.html">SQL with Zeppelin</a></li> |
| <li><a href="/docs/0.10.1/quickstart/python_with_zeppelin.html">Python with Zeppelin</a></li> |
| <li><a href="/docs/0.10.1/quickstart/r_with_zeppelin.html">R with Zeppelin</a></li> |
| </ul> |
| </li> |
| |
| <li> |
| <a href="#" data-toggle="dropdown" class="dropdown-toggle">Usage<b class="caret"></b></a> |
| <ul class="dropdown-menu scrollable-menu"> |
| <li class="title"><span>Dynamic Form</span></li> |
| <li><a href="/docs/0.10.1/usage/dynamic_form/intro.html">What is Dynamic Form?</a></li> |
| <li role="separator" class="divider"></li> |
| <li class="title"><span>Display System</span></li> |
| <li><a href="/docs/0.10.1/usage/display_system/basic.html#text">Text Display</a></li> |
| <li><a href="/docs/0.10.1/usage/display_system/basic.html#html">HTML Display</a></li> |
| <li><a href="/docs/0.10.1/usage/display_system/basic.html#table">Table Display</a></li> |
| <li><a href="/docs/0.10.1/usage/display_system/basic.html#network">Network Display</a></li> |
| <li><a href="/docs/0.10.1/usage/display_system/angular_backend.html">Angular Display using Backend API</a></li> |
| <li><a href="/docs/0.10.1/usage/display_system/angular_frontend.html">Angular Display using Frontend API</a></li> |
| <li role="separator" class="divider"></li> |
| <li class="title"><span>Interpreter</span></li> |
| <li><a href="/docs/0.10.1/usage/interpreter/overview.html">Overview</a></li> |
| <li><a href="/docs/0.10.1/usage/interpreter/interpreter_binding_mode.html">Interpreter Binding Mode</a></li> |
| <li><a href="/docs/0.10.1/usage/interpreter/user_impersonation.html">User Impersonation</a></li> |
| <li><a href="/docs/0.10.1/usage/interpreter/dependency_management.html">Dependency Management</a></li> |
| <li><a href="/docs/0.10.1/usage/interpreter/installation.html">Installing Interpreters</a></li> |
| <!--<li><a href="/docs/0.10.1/usage/interpreter/dynamic_loading.html">Dynamic Interpreter Loading (Experimental)</a></li>--> |
| <li><a href="/docs/0.10.1/usage/interpreter/execution_hooks.html">Execution Hooks (Experimental)</a></li> |
| <li role="separator" class="divider"></li> |
| <li class="title"><span>Other Features</span></li> |
| <li><a href="/docs/0.10.1/usage/other_features/publishing_paragraphs.html">Publishing Paragraphs</a></li> |
| <li><a href="/docs/0.10.1/usage/other_features/personalized_mode.html">Personalized Mode</a></li> |
| <li><a href="/docs/0.10.1/usage/other_features/customizing_homepage.html">Customizing Zeppelin Homepage</a></li> |
| <li><a href="/docs/0.10.1/usage/other_features/notebook_actions.html">Notebook Actions</a></li> |
| <li><a href="/docs/0.10.1/usage/other_features/cron_scheduler.html">Cron Scheduler</a></li> |
| <li><a href="/docs/0.10.1/usage/other_features/zeppelin_context.html">Zeppelin Context</a></li> |
| <li role="separator" class="divider"></li> |
| <li class="title"><span>REST API</span></li> |
| <li><a href="/docs/0.10.1/usage/rest_api/interpreter.html">Interpreter API</a></li> |
| <li><a href="/docs/0.10.1/usage/rest_api/zeppelin_server.html">Zeppelin Server API</a></li> |
| <li><a href="/docs/0.10.1/usage/rest_api/notebook.html">Notebook API</a></li> |
| <li><a href="/docs/0.10.1/usage/rest_api/notebook_repository.html">Notebook Repository API</a></li> |
| <li><a href="/docs/0.10.1/usage/rest_api/configuration.html">Configuration API</a></li> |
| <li><a href="/docs/0.10.1/usage/rest_api/credential.html">Credential API</a></li> |
| <li><a href="/docs/0.10.1/usage/rest_api/helium.html">Helium API</a></li> |
| <li class="title"><span>Zeppelin SDK</span></li> |
| <li><a href="/docs/0.10.1/usage/zeppelin_sdk/client_api.html">Client API</a></li> |
| <li><a href="/docs/0.10.1/usage/zeppelin_sdk/session_api.html">Session API</a></li> |
| </ul> |
| </li> |
| |
| <li> |
| <a href="#" data-toggle="dropdown" class="dropdown-toggle">Setup<b class="caret"></b></a> |
| <ul class="dropdown-menu scrollable-menu"> |
| <li class="title"><span>Basics</span></li> |
| <li><a href="/docs/0.10.1/setup/basics/how_to_build.html">How to Build Zeppelin</a></li> |
| <li><a href="/docs/0.10.1/setup/basics/hadoop_integration.html">Hadoop Integration</a></li> |
| <li><a href="/docs/0.10.1/setup/basics/multi_user_support.html">Multi-user Support</a></li> |
| <li role="separator" class="divider"></li> |
| <li class="title"><span>Deployment</span></li> |
| <!--<li><a href="/docs/0.10.1/setup/deployment/docker.html">Docker Image for Zeppelin</a></li>--> |
| <li><a href="/docs/0.10.1/setup/deployment/spark_cluster_mode.html#spark-standalone-mode">Spark Cluster Mode: Standalone</a></li> |
| <li><a href="/docs/0.10.1/setup/deployment/spark_cluster_mode.html#spark-on-yarn-mode">Spark Cluster Mode: YARN</a></li> |
| <li><a href="/docs/0.10.1/setup/deployment/spark_cluster_mode.html#spark-on-mesos-mode">Spark Cluster Mode: Mesos</a></li> |
| <li><a href="/docs/0.10.1/setup/deployment/flink_and_spark_cluster.html">Zeppelin with Flink, Spark Cluster</a></li> |
| <li><a href="/docs/0.10.1/setup/deployment/cdh.html">Zeppelin on CDH</a></li> |
| <li><a href="/docs/0.10.1/setup/deployment/virtual_machine.html">Zeppelin on VM: Vagrant</a></li> |
| <li role="separator" class="divider"></li> |
| <li class="title"><span>Security</span></li> |
| <li><a href="/docs/0.10.1/setup/security/authentication_nginx.html">HTTP Basic Auth using NGINX</a></li> |
| <li><a href="/docs/0.10.1/setup/security/shiro_authentication.html">Shiro Authentication</a></li> |
| <li><a href="/docs/0.10.1/setup/security/notebook_authorization.html">Notebook Authorization</a></li> |
| <li><a href="/docs/0.10.1/setup/security/datasource_authorization.html">Data Source Authorization</a></li> |
| <li><a href="/docs/0.10.1/setup/security/http_security_headers.html">HTTP Security Headers</a></li> |
| <li role="separator" class="divider"></li> |
| <li class="title"><span>Notebook Storage</span></li> |
| <li><a href="/docs/0.10.1/setup/storage/storage.html#notebook-storage-in-local-git-repository">Git Storage</a></li> |
| <li><a href="/docs/0.10.1/setup/storage/storage.html#notebook-storage-in-s3">S3 Storage</a></li> |
| <li><a href="/docs/0.10.1/setup/storage/storage.html#notebook-storage-in-azure">Azure Storage</a></li> |
| <li><a href="/docs/0.10.1/setup/storage/storage.html#notebook-storage-in-oss">OSS Storage</a></li> |
| <li><a href="/docs/0.10.1/setup/storage/storage.html#notebook-storage-in-zeppelinhub">ZeppelinHub Storage</a></li> |
| <li><a href="/docs/0.10.1/setup/storage/storage.html#notebook-storage-in-mongodb">MongoDB Storage</a></li> |
| <li role="separator" class="divider"></li> |
| <li class="title"><span>Operation</span></li> |
| <li><a href="/docs/0.10.1/setup/operation/configuration.html">Configuration</a></li> |
| <li><a href="/docs/0.10.1/setup/operation/proxy_setting.html">Proxy Setting</a></li> |
| <li><a href="/docs/0.10.1/setup/operation/upgrading.html">Upgrading</a></li> |
| <li><a href="/docs/0.10.1/setup/operation/trouble_shooting.html">Trouble Shooting</a></li> |
| </ul> |
| </li> |
| |
| <li> |
| <a href="#" data-toggle="dropdown" class="dropdown-toggle">Interpreter <b class="caret"></b></a> |
| <ul class="dropdown-menu scrollable-menu"> |
| <li class="title"><span>Interpreters</span></li> |
| <li><a href="/docs/0.10.1/usage/interpreter/overview.html">Overview</a></li> |
| <li role="separator" class="divider"></li> |
| <li><a href="/docs/0.10.1/interpreter/spark.html">Spark</a></li> |
| <li><a href="/docs/0.10.1/interpreter/flink.html">Flink</a></li> |
| <li><a href="/docs/0.10.1/interpreter/jdbc.html">JDBC</a></li> |
| <li><a href="/docs/0.10.1/interpreter/python.html">Python</a></li> |
| <li><a href="/docs/0.10.1/interpreter/r.html">R</a></li> |
| <li role="separator" class="divider"></li> |
| <li><a href="/docs/0.10.1/interpreter/alluxio.html">Alluxio</a></li> |
| <li><a href="/docs/0.10.1/interpreter/beam.html">Beam</a></li> |
| <li><a href="/docs/0.10.1/interpreter/bigquery.html">BigQuery</a></li> |
| <li><a href="/docs/0.10.1/interpreter/cassandra.html">Cassandra</a></li> |
| <li><a href="/docs/0.10.1/interpreter/elasticsearch.html">Elasticsearch</a></li> |
| <li><a href="/docs/0.10.1/interpreter/geode.html">Geode</a></li> |
| <li><a href="/docs/0.10.1/interpreter/groovy.html">Groovy</a></li> |
| <li><a href="/docs/0.10.1/interpreter/hazelcastjet.html">Hazelcast Jet</a></li> |
| <li><a href="/docs/0.10.1/interpreter/hbase.html">HBase</a></li> |
| <li><a href="/docs/0.10.1/interpreter/hdfs.html">HDFS</a></li> |
| <li><a href="/docs/0.10.1/interpreter/hive.html">Hive</a></li> |
| <li><a href="/docs/0.10.1/interpreter/ignite.html">Ignite</a></li> |
| <li><a href="/docs/0.10.1/interpreter/influxdb.html">influxDB</a></li> |
| <li><a href="/docs/0.10.1/interpreter/java.html">Java</a></li> |
| <li><a href="/docs/0.10.1/interpreter/jupyter.html">Jupyter</a></li> |
| <li><a href="/docs/0.10.1/interpreter/kotlin.html">Kotlin</a></li> |
| <li><a href="/docs/0.10.1/interpreter/ksql.html">KSQL</a></li> |
| <li><a href="/docs/0.10.1/interpreter/kylin.html">Kylin</a></li> |
| <li><a href="/docs/0.10.1/interpreter/lens.html">Lens</a></li> |
| <li><a href="/docs/0.10.1/interpreter/livy.html">Livy</a></li> |
| <li><a href="/docs/0.10.1/interpreter/mahout.html">Mahout</a></li> |
| <li><a href="/docs/0.10.1/interpreter/markdown.html">Markdown</a></li> |
| <li><a href="/docs/0.10.1/interpreter/mongodb.html">MongoDB</a></li> |
| <li><a href="/docs/0.10.1/interpreter/neo4j.html">Neo4j</a></li> |
| <li><a href="/docs/0.10.1/interpreter/pig.html">Pig</a></li> |
| <li><a href="/docs/0.10.1/interpreter/postgresql.html">Postgresql, HAWQ</a></li> |
| <li><a href="/docs/0.10.1/interpreter/sap.html">SAP</a></li> |
| <li><a href="/docs/0.10.1/interpreter/scalding.html">Scalding</a></li> |
| <li><a href="/docs/0.10.1/interpreter/scio.html">Scio</a></li> |
| <li><a href="/docs/0.10.1/interpreter/shell.html">Shell</a></li> |
| <li><a href="/docs/0.10.1/interpreter/sparql.html">Sparql</a></li> |
| <li><a href="/docs/0.10.1/interpreter/submarine.html">Submarine</a></li> |
| </ul> |
| </li> |
| <li> |
| <a href="#" data-toggle="dropdown" class="dropdown-toggle">More<b class="caret"></b></a> |
| <ul class="dropdown-menu scrollable-menu" style="right: 0; left: auto;"> |
| <li class="title"><span>Extending Zeppelin</span></li> |
| <li><a href="/docs/0.10.1/development/writing_zeppelin_interpreter.html">Writing Zeppelin Interpreter</a></li> |
| <li role="separator" class="divider"></li> |
| <li class="title"><span>Helium (Experimental)</span></li> |
| <li><a href="/docs/0.10.1/development/helium/overview.html">Overview</a></li> |
| <li><a href="/docs/0.10.1/development/helium/writing_application.html">Writing Helium Application</a></li> |
| <li><a href="/docs/0.10.1/development/helium/writing_spell.html">Writing Helium Spell</a></li> |
| <li><a href="/docs/0.10.1/development/helium/writing_visualization_basic.html">Writing Helium Visualization: Basics</a></li> |
| <li><a href="/docs/0.10.1/development/helium/writing_visualization_transformation.html">Writing Helium Visualization: Transformation</a></li> |
| <li role="separator" class="divider"></li> |
| <li class="title"><span>Contributing to Zeppelin</span></li> |
| <li><a href="/docs/0.10.1/setup/basics/how_to_build.html">How to Build Zeppelin</a></li> |
| <li><a href="/docs/0.10.1/development/contribution/useful_developer_tools.html">Useful Developer Tools</a></li> |
| <li><a href="/docs/0.10.1/development/contribution/how_to_contribute_code.html">How to Contribute (code)</a></li> |
| <li><a href="/docs/0.10.1/development/contribution/how_to_contribute_website.html">How to Contribute (website)</a></li> |
| <li role="separator" class="divider"></li> |
| <li class="title"><span>External Resources</span></li> |
| <li><a target="_blank" rel="noopener noreferrer" href="https://zeppelin.apache.org/community.html">Mailing List</a></li> |
| <li><a target="_blank" rel="noopener noreferrer" href="https://cwiki.apache.org/confluence/display/ZEPPELIN/Zeppelin+Home">Apache Zeppelin Wiki</a></li> |
| <li><a target="_blank" rel="noopener noreferrer" href="http://stackoverflow.com/questions/tagged/apache-zeppelin">Stackoverflow Questions about Zeppelin</a></li> |
| </ul> |
| </li> |
| <li> |
| <a href="/docs/0.10.1/search.html" class="nav-search-link"> |
| <span class="fa fa-search nav-search-icon"></span> |
| </a> |
| </li> |
| </ul> |
| </nav><!--/.navbar-collapse --> |
| </div> |
| </div> |
| |
| |
| |
| <div class="content"> |
| |
| |
| <div class="row"> |
| <div class="col-md-12"> |
| <!-- |
| Licensed under the Apache License, Version 2.0 (the "License"); |
| you may not use this file except in compliance with the License. |
| You may obtain a copy of the License at |
| |
| http://www.apache.org/licenses/LICENSE-2.0 |
| |
| Unless required by applicable law or agreed to in writing, software |
| distributed under the License is distributed on an "AS IS" BASIS, |
| WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. |
| See the License for the specific language governing permissions and |
| limitations under the License. |
| --> |
| |
| <h1>Pig Interpreter for Apache Zeppelin</h1> |
| |
| <div id="toc"></div> |
| |
| <h2>Overview</h2> |
| |
| <p><a href="https://pig.apache.org/">Apache Pig</a> is a platform for analyzing large data sets that consists of |
| a high-level language for expressing data analysis programs, |
| coupled with infrastructure for evaluating these programs. |
| The salient property of Pig programs is that their structure is amenable to substantial parallelization, |
| which in turn enables them to handle very large data sets.</p> |
| |
| <h2>Supported interpreter type</h2> |
| |
| <ul> |
| <li><p><code>%pig.script</code> (default Pig interpreter, so you can use <code>%pig</code>)</p> |
| |
| <p><code>%pig.script</code> behaves like the Pig Grunt shell: anything you can run in the Grunt shell can be run in the <code>%pig.script</code> interpreter. Use it to run Pig scripts whose output you don’t need to visualize; it is well suited to data munging.</p></li> |
| <li><p><code>%pig.query</code></p> |
| |
| <p><code>%pig.query</code> differs slightly from <code>%pig.script</code>. It is used for exploratory data analysis in Pig Latin, where you can leverage Zeppelin’s visualization ability. The last statement of a <code>%pig.query</code> paragraph differs from that of <code>%pig.script</code> in two minor ways:</p> |
| |
| <ul> |
| <li>The last statement in <code>%pig.query</code> has no Pig alias (see the examples below).</li> |
| <li>The last statement in <code>%pig.query</code> must be on a single line.</li> |
| </ul></li> |
| </ul> |
| |
| <h2>How to use</h2> |
| |
| <h3>How to set up Pig execution modes</h3> |
| |
| <ul> |
| <li><p>Local Mode</p> |
| |
| <p>Set <code>zeppelin.pig.execType</code> to <code>local</code>.</p></li> |
| <li><p>MapReduce Mode</p> |
| |
| <p>Set <code>zeppelin.pig.execType</code> to <code>mapreduce</code>. <code>HADOOP_CONF_DIR</code> must be specified in <code>ZEPPELIN_HOME/conf/zeppelin-env.sh</code>.</p></li> |
| <li><p>Tez Local Mode</p> |
| |
| <p>Only Tez 0.7 is supported. Set <code>zeppelin.pig.execType</code> to <code>tez_local</code>.</p></li> |
| <li><p>Tez Mode</p> |
| |
| <p>Only Tez 0.7 is supported. Set <code>zeppelin.pig.execType</code> to <code>tez</code>. <code>HADOOP_CONF_DIR</code> and <code>TEZ_CONF_DIR</code> must both be specified in <code>ZEPPELIN_HOME/conf/zeppelin-env.sh</code>.</p></li> |
| <li><p>Spark Local Mode</p> |
| |
| <p>Only Spark 1.6.x is supported; the default is Spark 1.6.3. Set <code>zeppelin.pig.execType</code> to <code>spark_local</code>.</p></li> |
| <li><p>Spark Mode</p> |
| |
| <p>Only Spark 1.6.x is supported; the default is Spark 1.6.3. Set <code>zeppelin.pig.execType</code> to <code>spark</code>. For now, only yarn-client mode is supported. To enable it, set the property <code>SPARK_MASTER</code> to <code>yarn-client</code> and set <code>SPARK_JAR</code> to the Spark assembly jar.</p></li> |
| </ul> |
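| |
| <p>For the MapReduce and Tez modes above, the entries in <code>ZEPPELIN_HOME/conf/zeppelin-env.sh</code> might look like the following minimal sketch. The directory paths are assumptions for illustration only; substitute the locations where your cluster keeps its Hadoop and Tez client configuration.</p> |
| <div class="highlight"><pre><code class="text language-text" data-lang="text"># conf/zeppelin-env.sh (the paths below are hypothetical placeholders) |
| export HADOOP_CONF_DIR=/etc/hadoop/conf   # required for the mapreduce and tez modes |
| export TEZ_CONF_DIR=/etc/tez/conf         # additionally required for the tez mode |
| </code></pre></div> |
| <p>Changes to this file generally take effect only after the affected Zeppelin processes have been restarted.</p> |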
| |
| <h3>How to choose a custom Spark version</h3> |
| |
| <p>By default, the Pig interpreter uses Spark 1.6.3 built with Scala 2.10. To use another Spark or Scala version, |
| rebuild Zeppelin and specify the custom Spark version via <code>-Dpig.spark.version=&lt;custom_spark_version&gt;</code> and the Scala version via <code>-Dpig.scala.version=&lt;scala_version&gt;</code> in the Maven build command.</p> |
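| |
| <p>As a rough sketch, such a build command could look like the one below. Only the two <code>-D</code> properties come from the description above; the Maven goals, the <code>-DskipTests</code> flag, and the version values are assumptions chosen for illustration, so combine the two properties with whatever build command you normally use.</p> |
| <div class="highlight"><pre><code class="text language-text" data-lang="text"># run from the root of the Zeppelin source tree; the version values are examples only |
| mvn clean package -DskipTests -Dpig.spark.version=1.6.2 -Dpig.scala.version=2.10 |
| </code></pre></div> |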
| |
| <h3>How to configure interpreter</h3> |
| |
| <p>In the Interpreters menu, create a new Pig interpreter. The Pig interpreter has the properties below by default. |
| You can also set any Pig properties here, such as <code>tez.queue.name</code> and <code>mapred.job.queue.name</code>; they are passed to the Pig engine. |
| In addition, the paragraph title is used as the job name if it exists; otherwise the last line of the Pig script is used. |
| You can use that name to find the running application in the YARN ResourceManager UI.</p> |
| |
| <table class="table-configuration"> |
| <tr> |
| <th>Property</th> |
| <th>Default</th> |
| <th>Description</th> |
| </tr> |
| <tr> |
| <td>zeppelin.pig.execType</td> |
| <td>mapreduce</td> |
| <td>Execution mode for pig runtime. local | mapreduce | tez_local | tez | spark_local | spark </td> |
| </tr> |
| <tr> |
| <td>zeppelin.pig.includeJobStats</td> |
| <td>false</td> |
| <td>whether to display jobStats info in <code>%pig.script</code></td> |
| </tr> |
| <tr> |
| <td>zeppelin.pig.maxResult</td> |
| <td>1000</td> |
| <td>maximum number of rows displayed in <code>%pig.query</code></td> |
| </tr> |
| <tr> |
| <td>tez.queue.name</td> |
| <td>default</td> |
| <td>queue name for tez engine</td> |
| </tr> |
| <tr> |
| <td>mapred.job.queue.name</td> |
| <td>default</td> |
| <td>queue name for mapreduce engine</td> |
| </tr> |
| <tr> |
| <td>SPARK_MASTER</td> |
| <td>local</td> |
| <td>local | yarn-client</td> |
| </tr> |
| <tr> |
| <td>SPARK_JAR</td> |
| <td></td> |
| <td>The Spark assembly jar. Both a local jar and a jar on HDFS are supported; putting it on HDFS may |
| improve performance.</td> |
| </tr> |
| </table> |
| |
| <h3>Example</h3> |
| |
| <h5>pig</h5> |
| <div class="highlight"><pre><code class="text language-text" data-lang="text">%pig |
| |
| bankText = load 'bank.csv' using PigStorage(';'); |
| bank = foreach bankText generate $0 as age, $1 as job, $2 as marital, $3 as education, $5 as balance; |
| bank = filter bank by age != '"age"'; |
| bank = foreach bank generate (int)age, REPLACE(job,'"','') as job, REPLACE(marital, '"', '') as marital, (int)(REPLACE(balance, '"', '')) as balance; |
| store bank into 'clean_bank.csv' using PigStorage(';'); -- this statement is optional; it just shows that %pig.script is mostly used for data munging before querying the data. |
| </code></pre></div> |
| <h5>pig.query</h5> |
| |
| <p>Get the number of records for each age below 30.</p> |
| <div class="highlight"><pre><code class="text language-text" data-lang="text">%pig.query |
| |
| bank_data = filter bank by age < 30; |
| b = group bank_data by age; |
| foreach b generate group, COUNT($1); |
| </code></pre></div> |
| <p>The same as above, but using a dynamic text form so that the user can specify the variable <code>maxAge</code> in a text box |
| (see the screenshot below). Dynamic forms are a very useful feature of Zeppelin; see <a href="../usage/dynamic_form/intro.html">Dynamic Form</a> for details.</p> |
| <div class="highlight"><pre><code class="text language-text" data-lang="text">%pig.query |
| |
| bank_data = filter bank by age < ${maxAge=40}; |
| b = group bank_data by age; |
| foreach b generate group, COUNT($1) as count; |
| </code></pre></div> |
| <p>Get the number of records for each age for a specific marital type, |
| again using a dynamic form. The user can choose the marital type from the drop-down list (see the screenshot below).</p> |
| <div class="highlight"><pre><code class="text language-text" data-lang="text">%pig.query |
| |
| bank_data = filter bank by marital=='${marital=single,single|divorced|married}'; |
| b = group bank_data by age; |
| foreach b generate group, COUNT($1) as count; |
| </code></pre></div> |
| <p>The above examples are included in the Pig tutorial note in Zeppelin; check that note for details. Here is the screenshot.</p> |
| |
| <p><img class="img-responsive" width="1024px" style="margin:0 auto; padding: 26px;" src="/docs/0.10.1/assets/themes/zeppelin/img/pig_zeppelin_tutorial.png" alt="Pig tutorial note in Zeppelin" /></p> |
| |
| <p>Data is shared between <code>%pig</code> and <code>%pig.query</code>, so you can do common work in <code>%pig</code> |
| and then run different kinds of queries on that data in <code>%pig.query</code>. |
| We also recommend specifying aliases explicitly so that the visualization displays |
| column names correctly. In the second and third <code>%pig.query</code> examples above, <code>COUNT($1)</code> is named <code>count</code>. |
| Without an explicit alias, the column is named by its position. |
| For example, in the first <code>%pig.query</code> example above, the chart uses <code>col_1</code> to represent <code>COUNT($1)</code>.</p> |
| |
| </div> |
| </div> |
| |
| |
| <hr> |
| <footer> |
| <!-- <p>© 2022 The Apache Software Foundation</p>--> |
| </footer> |
| </div> |
| |
| |
| |
| |
| |
| |
| |
| </body> |
| </html> |
| |