blob: 6d55e8d3288762e1bb7a1997bc80165ac75299e0 [file] [log] [blame]
<?xml version="1.0" encoding="UTF-8"?>
<!--
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
<!DOCTYPE document PUBLIC "-//APACHE//DTD Documentation V2.0//EN" "http://forrest.apache.org/dtd/document-v20.dtd">
<document>
<header>
<title>Administration</title>
</header>
<body>
<!-- ========================================================== -->
<!-- SET UP OUTPUT LOCATION STRICT CHECK -->
<section>
<title>Output location strict check</title>
<p>Pig scripts could contain multiple STORE statements. There are cases when one would like to avoid writing to the same output location. Pig provides admins/script writers with a property to check if multiple STORE statements make an attempt to write to the same output directory. And fail fast letting the user know of the same.</p>
<p>Specifically this makes sense for file-based output locations (HDFS, Local FS, S3..) to avoid Pig script from failing when multiple MR jobs write to the same location. </p>
<p>To enforce strict checking of output location, set <strong>pig.location.check.strict=true</strong>. See also <a href="start.html#properties">Pig Properties</a> on how to set this property.</p>
</section>
<!-- DISABLE PIG COMMANDS AND OPERATORS -->
<section>
<title>Disabling Pig commands and operators</title>
<p>This is an admin feature providing ability to blacklist or/and whitelist certain commands and operations. Pig exposes a few of these that could be not very safe in a multitenant environment. For example, "sh" invokes shell commands, "set" allows users to change non-final configs. While these are tremendously useful in general, having an ability to disable would make Pig a safer platform. The goal is to allow administrators to be able to have more control over user scripts. Default behaviour would still be the same - no filters applied on commands and operators.</p>
<p>There are two properties you can use to control what users are able to do</p>
<ul>
<li>pig.blacklist</li>
<li>pig.whitelist</li>
</ul>
<section>
<title>Blacklisting</title>
<p>Set "pig.blacklist" to a comma-delimited set of operators and commands. For eg, <strong>pig.blacklist=rm,kill,cross</strong> would disable users from executing any of "rm", "kill" commands and "cross" operator.</p>
</section>
<section>
<title>Whitelisting</title>
<p>This is an even safer approach to disallowing functionality in Pig. Using this you will be able to disable all commands and operators that are not a part of the whitelist. For eg, <strong>pig.whitelist=load,filter,store</strong> will disallow every command and operator other than "load", "filter" and "store".</p>
</section>
<section>
<title>Note</title>
<p>There should not be any conflicts between blacklist and whitelist. Make sure to have them entirely distinct or Pig will complain.</p>
</section>
</section>
</body>
</document>