blob: 32217d1afb3deae7da8aaa4d9dd543926680c216 [file] [log] [blame]
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<!--[if IE]><meta http-equiv="X-UA-Compatible" content="IE=edge"><![endif]-->
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<meta name="generator" content="Asciidoctor 1.5.4">
<title>Load and Transform Guide</title>
<style>
/**
* @@@ START COPYRIGHT @@@
*
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing,
* software distributed under the License is distributed on an
* "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
* KIND, either express or implied. See the License for the
* specific language governing permissions and limitations
* under the License.
*
* @@@ END COPYRIGHT @@@
*/
/* Asciidoctor default stylesheet | MIT License | http://asciidoctor.org */
/* Remove the comments around the @import statement below when using this as a custom stylesheet */
/*@import "https://fonts.googleapis.com/css?family=Open+Sans:300,300italic,400,400italic,600,600italic%7CNoto+Serif:400,400italic,700,700italic%7CDroid+Sans+Mono:400";*/
article,aside,details,figcaption,figure,footer,header,hgroup,main,nav,section,summary{display:block}
audio,canvas,video{display:inline-block}
audio:not([controls]){display:none;height:0}
[hidden],template{display:none}
script{display:none!important}
html{font-family:sans-serif;-ms-text-size-adjust:100%;-webkit-text-size-adjust:100%}
body{margin:0}
a{background:transparent}
a:focus{outline:thin dotted}
a:active,a:hover{outline:0}
h1{font-size:2em;margin:.67em 0}
abbr[title]{border-bottom:1px dotted}
b,strong{font-weight:bold}
dfn{font-style:italic}
hr{-moz-box-sizing:content-box;box-sizing:content-box;height:0}
mark{background:#ff0;color:#000}
code,kbd,pre,samp{font-family:monospace;font-size:1em}
pre{white-space:pre-wrap}
q{quotes:"\201C" "\201D" "\2018" "\2019"}
small{font-size:80%}
sub,sup{font-size:75%;line-height:0;position:relative;vertical-align:baseline}
sup{top:-.5em}
sub{bottom:-.25em}
img{border:0}
svg:not(:root){overflow:hidden}
figure{margin:0}
fieldset{border:1px solid silver;margin:0 2px;padding:.35em .625em .75em}
legend{border:0;padding:0}
button,input,select,textarea{font-family:inherit;font-size:100%;margin:0}
button,input{line-height:normal}
button,select{text-transform:none}
button,html input[type="button"],input[type="reset"],input[type="submit"]{-webkit-appearance:button;cursor:pointer}
button[disabled],html input[disabled]{cursor:default}
input[type="checkbox"],input[type="radio"]{box-sizing:border-box;padding:0}
input[type="search"]{-webkit-appearance:textfield;-moz-box-sizing:content-box;-webkit-box-sizing:content-box;box-sizing:content-box}
input[type="search"]::-webkit-search-cancel-button,input[type="search"]::-webkit-search-decoration{-webkit-appearance:none}
button::-moz-focus-inner,input::-moz-focus-inner{border:0;padding:0}
textarea{overflow:auto;vertical-align:top}
table{border-collapse:collapse;border-spacing:0}
*,*:before,*:after{-moz-box-sizing:border-box;-webkit-box-sizing:border-box;box-sizing:border-box}
html,body{font-size:100%}
body{background:#fff;color:rgba(0,0,0,.8);padding:0;margin:0;font-family:"Helvetica Neue",Helvetica,Arial,sans-serif,serif;font-weight:400;font-style:normal;line-height:1;position:relative;cursor:auto}
a:hover{cursor:pointer}
img,object,embed{max-width:100%;height:auto}
object,embed{height:100%}
img{-ms-interpolation-mode:bicubic}
#map_canvas img,#map_canvas embed,#map_canvas object,.map_canvas img,.map_canvas embed,.map_canvas object{max-width:none!important}
.left{float:left!important}
.right{float:right!important}
.text-left{text-align:left!important}
.text-right{text-align:right!important}
.text-center{text-align:center!important}
.text-justify{text-align:justify!important}
.hide{display:none}
.antialiased,body{-webkit-font-smoothing:antialiased}
img{display:inline-block;vertical-align:middle}
textarea{height:auto;min-height:50px}
select{width:100%}
p.lead,.paragraph.lead>p,#preamble>.sectionbody>.paragraph:first-of-type p{font-size:1.21875em;line-height:1.6}
.subheader,.admonitionblock td.content>.title,.audioblock>.title,.exampleblock>.title,.imageblock>.title,.listingblock>.title,.literalblock>.title,.stemblock>.title,.openblock>.title,.paragraph>.title,.quoteblock>.title,table.tableblock>.title,.verseblock>.title,.videoblock>.title,.dlist>.title,.olist>.title,.ulist>.title,.qlist>.title,.hdlist>.title{line-height:1.45;color:#3188ac;font-weight:400;margin-top:0;margin-bottom:.25em}
div,dl,dt,dd,ul,ol,li,h1,h2,h3,#toctitle,.sidebarblock>.content>.title,h4,h5,h6,pre,form,p,blockquote,th,td{margin:0;padding:0;direction:ltr}
a{color:#2156a5;text-decoration:underline;line-height:inherit}
a:hover,a:focus{color:#1d4b8f}
a img{border:none}
p{font-family:inherit;font-weight:400;font-size:1em;line-height:1.6;margin-bottom:1.25em;text-rendering:optimizeLegibility}
p aside{font-size:.875em;line-height:1.35;font-style:italic}
/* Defines headings */
h1,h2,h3,#toctitle,.sidebarblock>.content>.title,h4,h5,h6{font-family:"Helvetica Neue",Helvetica,Arial,sans-serif;font-weight:bold;font-style:normal;color:#3188ac;text-rendering:optimizeLegibility;margin-top:1em;margin-bottom:.5em;line-height:1.0125em}
h1 small,h2 small,h3 small,#toctitle small,.sidebarblock>.content>.title small,h4 small,h5 small,h6 small{font-size:60%;color:#e99b8f;line-height:0}
h1{font-size:2.125em}
h2{font-size:1.6875em}
h3,#toctitle,.sidebarblock>.content>.title{font-size:1.375em}
h4,h5{font-size:1.125em}
h6{font-size:1em}
hr{border:solid #ddddd8;border-width:1px 0 0;clear:both;margin:1.25em 0 1.1875em;height:0}
em,i{font-style:italic;line-height:inherit}
strong,b{font-weight:bold;line-height:inherit}
small{font-size:60%;line-height:inherit}
/* Defines the `text` (passthru) format */
code{font-family:"Droid Sans Mono","DejaVu Sans Mono",monospace;font-weight:bold;color:#3188ac}
ul,ol,dl{font-size:1em;line-height:1.6;margin-bottom:1.25em;list-style-position:outside;font-family:inherit}
ul,ol,ul.no-bullet,ol.no-bullet{margin-left:1.5em}
ul li ul,ul li ol{margin-left:1.25em;margin-bottom:0;font-size:1em}
ul.square li ul,ul.circle li ul,ul.disc li ul{list-style:inherit}
ul.square{list-style-type:square}
ul.circle{list-style-type:circle}
ul.disc{list-style-type:disc}
ul.no-bullet{list-style:none}
ol li ul,ol li ol{margin-left:1.25em;margin-bottom:0}
dl dt{margin-bottom:.3125em;font-weight:bold}
dl dd{margin-bottom:1.25em}
abbr,acronym{text-transform:uppercase;font-size:90%;color:rgba(0,0,0,.8);border-bottom:1px dotted #ddd;cursor:help}
abbr{text-transform:none}
blockquote{margin:0 0 1.25em;padding:.5625em 1.25em 0 1.1875em;border-left:1px solid #ddd}
blockquote cite{display:block;font-size:.9375em;color:rgba(0,0,0,.6)}
blockquote cite:before{content:"\2014 \0020"}
blockquote cite a,blockquote cite a:visited{color:rgba(0,0,0,.6)}
blockquote,blockquote p{line-height:1.6;color:rgba(0,0,0,.85)}
@media only screen and (min-width:768px){h1,h2,h3,#toctitle,.sidebarblock>.content>.title,h4,h5,h6{line-height:1.2}
h1{font-size:2.75em}
h2{font-size:2.3125em}
h3,#toctitle,.sidebarblock>.content>.title{font-size:1.6875em}
h4{font-size:1.4375em}}table{background:#fff;margin-bottom:1.25em;border:solid 1px #dedede}
table thead,table tfoot{background:#f7f8f7;font-weight:bold}
table thead tr th,table thead tr td,table tfoot tr th,table tfoot tr td{padding:.5em .625em .625em;font-size:inherit;color:rgba(0,0,0,.8);text-align:left}
table tr th,table tr td{padding:.5625em .625em;font-size:inherit;color:rgba(0,0,0,.8)}
table tr.even,table tr.alt,table tr:nth-of-type(even){background:#f8f8f7}
table thead tr th,table tfoot tr th,table tbody tr td,table tr td,table tfoot tr td{display:table-cell;line-height:1.6}
h1,h2,h3,#toctitle,.sidebarblock>.content>.title,h4,h5,h6{line-height:1.2;word-spacing:-.05em}
h1 strong,h2 strong,h3 strong,#toctitle strong,.sidebarblock>.content>.title strong,h4 strong,h5 strong,h6 strong{font-weight:400}
.clearfix:before,.clearfix:after,.float-group:before,.float-group:after{content:" ";display:table}
.clearfix:after,.float-group:after{clear:both}
*:not(pre)>code{font-size:1.0em;font-style:normal!important;letter-spacing:0;padding:.1em .5ex;word-spacing:-.15em;-webkit-border-radius:4px;border-radius:4px;line-height:1.45;text-rendering:optimizeSpeed}
/**:not(pre)>code{font-size:.9375em;font-style:normal!important;letter-spacing:0;padding:.1em .5ex;word-spacing:-.15em;background-color:#f7f7f8;-webkit-border-radius:4px;border-radius:4px;line-height:1.45;text-rendering:optimizeSpeed}*/
pre,pre>code{line-height:1.45;color:rgba(0,0,0,.9);font-family:"Droid Sans Mono","DejaVu Sans Mono",monospace;font-weight:400;text-rendering:optimizeSpeed}
.keyseq{color:rgba(51,51,51,.8)}
kbd{display:inline-block;color:rgba(0,0,0,.8);font-size:.75em;line-height:1.4;background-color:#f7f7f7;border:1px solid #ccc;-webkit-border-radius:3px;border-radius:3px;-webkit-box-shadow:0 1px 0 rgba(0,0,0,.2),0 0 0 .1em white inset;box-shadow:0 1px 0 rgba(0,0,0,.2),0 0 0 .1em #fff inset;margin:-.15em .15em 0 .15em;padding:.2em .6em .2em .5em;vertical-align:middle;white-space:nowrap}
.keyseq kbd:first-child{margin-left:0}
.keyseq kbd:last-child{margin-right:0}
.menuseq,.menu{color:rgba(0,0,0,.8)}
b.button:before,b.button:after{position:relative;top:-1px;font-weight:400}
b.button:before{content:"[";padding:0 3px 0 2px}
b.button:after{content:"]";padding:0 2px 0 3px}
p a>code:hover{color:rgba(0,0,0,.9)}
#header,#content,#footnotes,#footer{width:100%;margin-left:auto;margin-right:auto;margin-top:0;margin-bottom:0;max-width:62.5em;*zoom:1;position:relative;padding-left:.9375em;padding-right:.9375em}
#header:before,#header:after,#content:before,#content:after,#footnotes:before,#footnotes:after,#footer:before,#footer:after{content:" ";display:table}
#header:after,#content:after,#footnotes:after,#footer:after{clear:both}
#content{margin-top:1.25em}
#content:before{content:none}
/* #header>h1:first-child{color:rgba(0,0,0,.85);margin-top:2.25rem;margin-bottom:0} */
#header>h1:first-child{color:#3188ac;margin-top:2.25rem;margin-bottom:0}
#header>h1:first-child+#toc{margin-top:8px;border-top:1px solid #ddddd8}
#header>h1:only-child,body.toc2 #header>h1:nth-last-child(2){border-bottom:1px solid #ddddd8;padding-bottom:8px}
#header .details{border-bottom:1px solid #ddddd8;line-height:1.45;padding-top:.25em;padding-bottom:.25em;padding-left:.25em;color:rgba(0,0,0,.6);display:-ms-flexbox;display:-webkit-flex;display:flex;-ms-flex-flow:row wrap;-webkit-flex-flow:row wrap;flex-flow:row wrap}
#header .details span:first-child{margin-left:-.125em}
#header .details span.email a{color:rgba(0,0,0,.85)}
#header .details br{display:none}
#header .details br+span:before{content:"\00a0\2013\00a0"}
#header .details br+span.author:before{content:"\00a0\22c5\00a0";color:rgba(0,0,0,.85)}
#header .details br+span#revremark:before{content:"\00a0|\00a0"}
#header #revnumber{text-transform:capitalize}
#header #revnumber:after{content:"\00a0"}
#content>h1:first-child:not([class]){color:rgba(0,0,0,.85);border-bottom:1px solid #ddddd8;padding-bottom:8px;margin-top:0;padding-top:1rem;margin-bottom:1.25rem}
#toc{border-bottom:1px solid #efefed;padding-bottom:.5em}
#toc>ul{margin-left:.125em}
#toc ul.sectlevel0>li>a{font-style:italic}
#toc ul.sectlevel0 ul.sectlevel1{margin:.5em 0}
#toc ul{font-family:"Open Sans","DejaVu Sans",sans-serif;list-style-type:none}
#toc a{text-decoration:none}
#toc a:active{text-decoration:underline}
#toctitle{color:#3188ac;font-size:1.2em}
@media only screen and (min-width:768px){#toctitle{font-size:1.375em}
body.toc2{padding-left:15em;padding-right:0}
#toc.toc2{margin-top:0!important;background-color:#f8f8f7;position:fixed;width:15em;left:0;top:0;border-right:1px solid #efefed;border-top-width:0!important;border-bottom-width:0!important;z-index:1000;padding:1.25em 1em;height:100%;overflow:auto}
#toc.toc2 #toctitle{margin-top:0;font-size:1.2em}
#toc.toc2>ul{font-size:.9em;margin-bottom:0}
#toc.toc2 ul ul{margin-left:0;padding-left:1em}
#toc.toc2 ul.sectlevel0 ul.sectlevel1{padding-left:0;margin-top:.5em;margin-bottom:.5em}
body.toc2.toc-right{padding-left:0;padding-right:15em}
body.toc2.toc-right #toc.toc2{border-right-width:0;border-left:1px solid #efefed;left:auto;right:0}}@media only screen and (min-width:1280px){body.toc2{padding-left:20em;padding-right:0}
/* Controls width of panel */
#toc.toc2{width:20em}
#toc.toc2 #toctitle{font-size:1.375em}
#toc.toc2>ul{font-size:.95em}
#toc.toc2 ul ul{padding-left:1.25em}
body.toc2.toc-right{padding-left:0;padding-right:20em}}#content #toc{border-style:solid;border-width:1px;border-color:#e0e0dc;margin-bottom:1.25em;padding:1.25em;background:#f8f8f7;-webkit-border-radius:4px;border-radius:4px}
#content #toc>:first-child{margin-top:0}
#content #toc>:last-child{margin-bottom:0}
#footer{max-width:100%;background-color:rgba(0,0,0,.8);padding:1.25em}
#footer-text,#footer_nav{color:rgba(255,255,255,.8);line-height:1.44}
#footer a{color: #990000}
.sect1{padding-bottom:.625em}
@media only screen and (min-width:768px){.sect1{padding-bottom:1.25em}}.sect1+.sect1{border-top:1px solid #efefed}
#content h1>a.anchor,h2>a.anchor,h3>a.anchor,#toctitle>a.anchor,.sidebarblock>.content>.title>a.anchor,h4>a.anchor,h5>a.anchor,h6>a.anchor{position:absolute;z-index:1001;width:1.5ex;margin-left:-1.5ex;display:block;text-decoration:none!important;visibility:hidden;text-align:center;font-weight:400}
#content h1>a.anchor:before,h2>a.anchor:before,h3>a.anchor:before,#toctitle>a.anchor:before,.sidebarblock>.content>.title>a.anchor:before,h4>a.anchor:before,h5>a.anchor:before,h6>a.anchor:before{content:"\00A7";font-size:.85em;display:block;padding-top:.1em}
#content h1:hover>a.anchor,#content h1>a.anchor:hover,h2:hover>a.anchor,h2>a.anchor:hover,h3:hover>a.anchor,#toctitle:hover>a.anchor,.sidebarblock>.content>.title:hover>a.anchor,h3>a.anchor:hover,#toctitle>a.anchor:hover,.sidebarblock>.content>.title>a.anchor:hover,h4:hover>a.anchor,h4>a.anchor:hover,h5:hover>a.anchor,h5>a.anchor:hover,h6:hover>a.anchor,h6>a.anchor:hover{visibility:visible}
#content h1>a.link,h2>a.link,h3>a.link,#toctitle>a.link,.sidebarblock>.content>.title>a.link,h4>a.link,h5>a.link,h6>a.link{color:#990000;text-decoration:none}
#content h1>a.link:hover,h2>a.link:hover,h3>a.link:hover,#toctitle>a.link:hover,.sidebarblock>.content>.title>a.link:hover,h4>a.link:hover,h5>a.link:hover,h6>a.link:hover{color:#a53221}
.audioblock,.imageblock,.literalblock,.listingblock,.stemblock,.videoblock{margin-bottom:1.25em}
.admonitionblock td.content>.title,.audioblock>.title,.exampleblock>.title,.imageblock>.title,.listingblock>.title,.literalblock>.title,.stemblock>.title,.openblock>.title,.paragraph>.title,.quoteblock>.title,table.tableblock>.title,.verseblock>.title,.videoblock>.title,.dlist>.title,.olist>.title,.ulist>.title,.qlist>.title,.hdlist>.title{text-rendering:optimizeLegibility;text-align:left;font-family:"Helvetica Neue",Helvetica,Arial,sans-serif,serif;font-size:1rem;font-weight:bold}
/* Here */
table.tableblock>caption.title{white-space:nowrap;overflow:visible;max-width:0}
.paragraph.lead>p,#preamble>.sectionbody>.paragraph:first-of-type p{color:rgba(0,0,0,.85)}
table.tableblock #preamble>.sectionbody>.paragraph:first-of-type p{font-size:inherit}
.admonitionblock>table{border-collapse:separate;border:0;background:none;width:100%}
.admonitionblock>table td.icon{text-align:center;width:80px}
.admonitionblock>table td.icon img{max-width:none}
.admonitionblock>table td.icon .title{font-weight:bold;font-family:"Open Sans","DejaVu Sans",sans-serif;text-transform:uppercase}
.admonitionblock>table td.content{padding-left:1.125em;padding-right:1.25em;border-left:1px solid #ddddd8;color:rgba(0,0,0,.6)}
.admonitionblock>table td.content>:last-child>:last-child{margin-bottom:0}
.exampleblock>.content{border-style:solid;border-width:1px;border-color:#e6e6e6;margin-bottom:1.25em;padding:1.25em;background:#fff;-webkit-border-radius:4px;border-radius:4px}
.exampleblock>.content>:first-child{margin-top:0}
.exampleblock>.content>:last-child{margin-bottom:0}
.sidebarblock{border-style:solid;border-width:1px;border-color:#e0e0dc;margin-bottom:1.25em;padding:1.25em;background:#f8f8f7;-webkit-border-radius:4px;border-radius:4px}
.sidebarblock>:first-child{margin-top:0}
.sidebarblock>:last-child{margin-bottom:0}
.sidebarblock>.content>.title{color:#7a2518;margin-top:0;text-align:center}
.exampleblock>.content>:last-child>:last-child,.exampleblock>.content .olist>ol>li:last-child>:last-child,.exampleblock>.content .ulist>ul>li:last-child>:last-child,.exampleblock>.content .qlist>ol>li:last-child>:last-child,.sidebarblock>.content>:last-child>:last-child,.sidebarblock>.content .olist>ol>li:last-child>:last-child,.sidebarblock>.content .ulist>ul>li:last-child>:last-child,.sidebarblock>.content .qlist>ol>li:last-child>:last-child{margin-bottom:0}
.literalblock pre,.listingblock pre:not(.highlight),.listingblock pre[class="highlight"],.listingblock pre[class^="highlight "],.listingblock pre.CodeRay,.listingblock pre.prettyprint{background:#f7f7f8}
.sidebarblock .literalblock pre,.sidebarblock .listingblock pre:not(.highlight),.sidebarblock .listingblock pre[class="highlight"],.sidebarblock .listingblock pre[class^="highlight "],.sidebarblock .listingblock pre.CodeRay,.sidebarblock .listingblock pre.prettyprint{background:#f2f1f1}
.literalblock pre,.literalblock pre[class],.listingblock pre,.listingblock pre[class]{-webkit-border-radius:4px;border-radius:4px;word-wrap:break-word;padding:1em;font-size:.8125em}
.literalblock pre.nowrap,.literalblock pre[class].nowrap,.listingblock pre.nowrap,.listingblock pre[class].nowrap{overflow-x:auto;white-space:pre;word-wrap:normal}
@media only screen and (min-width:768px){.literalblock pre,.literalblock pre[class],.listingblock pre,.listingblock pre[class]{font-size:.90625em}}@media only screen and (min-width:1280px){.literalblock pre,.literalblock pre[class],.listingblock pre,.listingblock pre[class]{font-size:1em}}.literalblock.output pre{color:#f7f7f8;background-color:rgba(0,0,0,.9)}
.listingblock pre.highlightjs{padding:0}
.listingblock pre.highlightjs>code{padding:1em;-webkit-border-radius:4px;border-radius:4px}
.listingblock pre.prettyprint{border-width:0}
.listingblock>.content{position:relative}
.listingblock code[data-lang]:before{display:none;content:attr(data-lang);position:absolute;font-size:.75em;top:.425rem;right:.5rem;line-height:1;text-transform:uppercase;color:#999}
.listingblock:hover code[data-lang]:before{display:block}
.listingblock.terminal pre .command:before{content:attr(data-prompt);padding-right:.5em;color:#999}
.listingblock.terminal pre .command:not([data-prompt]):before{content:"$"}
table.pyhltable{border-collapse:separate;border:0;margin-bottom:0;background:none}
table.pyhltable td{vertical-align:top;padding-top:0;padding-bottom:0}
table.pyhltable td.code{padding-left:.75em;padding-right:0}
pre.pygments .lineno,table.pyhltable td:not(.code){color:#999;padding-left:0;padding-right:.5em;border-right:1px solid #ddddd8}
pre.pygments .lineno{display:inline-block;margin-right:.25em}
table.pyhltable .linenodiv{background:none!important;padding-right:0!important}
.quoteblock{margin:0 1em 1.25em 1.5em;display:table}
.quoteblock>.title{margin-left:-1.5em;margin-bottom:.75em}
.quoteblock blockquote,.quoteblock blockquote p{color:rgba(0,0,0,.85);font-size:1.15rem;line-height:1.75;word-spacing:.1em;letter-spacing:0;font-style:italic;text-align:justify}
.quoteblock blockquote{margin:0;padding:0;border:0}
.quoteblock blockquote:before{content:"\201c";float:left;font-size:2.75em;font-weight:bold;line-height:.6em;margin-left:-.6em;color:#7a2518;text-shadow:0 1px 2px rgba(0,0,0,.1)}
.quoteblock blockquote>.paragraph:last-child p{margin-bottom:0}
.quoteblock .attribution{margin-top:.5em;margin-right:.5ex;text-align:right}
.quoteblock .quoteblock{margin-left:0;margin-right:0;padding:.5em 0;border-left:3px solid rgba(0,0,0,.6)}
.quoteblock .quoteblock blockquote{padding:0 0 0 .75em}
.quoteblock .quoteblock blockquote:before{display:none}
.verseblock{margin:0 1em 1.25em 1em}
.verseblock pre{font-family:"Open Sans","DejaVu Sans",sans;font-size:1.15rem;color:rgba(0,0,0,.85);font-weight:300;text-rendering:optimizeLegibility}
.verseblock pre strong{font-weight:400}
.verseblock .attribution{margin-top:1.25rem;margin-left:.5ex}
.quoteblock .attribution,.verseblock .attribution{font-size:.9375em;line-height:1.45;font-style:italic}
.quoteblock .attribution br,.verseblock .attribution br{display:none}
.quoteblock .attribution cite,.verseblock .attribution cite{display:block;letter-spacing:-.05em;color:rgba(0,0,0,.6)}
.quoteblock.abstract{margin:0 0 1.25em 0;display:block}
.quoteblock.abstract blockquote,.quoteblock.abstract blockquote p{text-align:left;word-spacing:0}
.quoteblock.abstract blockquote:before,.quoteblock.abstract blockquote p:first-of-type:before{display:none}
table.tableblock{max-width:100%;border-collapse:separate}
table.tableblock td>.paragraph:last-child p>p:last-child,table.tableblock th>p:last-child,table.tableblock td>p:last-child{margin-bottom:0}
table.spread{width:100%}
table.tableblock,th.tableblock,td.tableblock{border:0 solid #dedede}
table.grid-all th.tableblock,table.grid-all td.tableblock{border-width:0 1px 1px 0}
table.grid-all tfoot>tr>th.tableblock,table.grid-all tfoot>tr>td.tableblock{border-width:1px 1px 0 0}
table.grid-cols th.tableblock,table.grid-cols td.tableblock{border-width:0 1px 0 0}
table.grid-all *>tr>.tableblock:last-child,table.grid-cols *>tr>.tableblock:last-child{border-right-width:0}
table.grid-rows th.tableblock,table.grid-rows td.tableblock{border-width:0 0 1px 0}
table.grid-all tbody>tr:last-child>th.tableblock,table.grid-all tbody>tr:last-child>td.tableblock,table.grid-all thead:last-child>tr>th.tableblock,table.grid-rows tbody>tr:last-child>th.tableblock,table.grid-rows tbody>tr:last-child>td.tableblock,table.grid-rows thead:last-child>tr>th.tableblock{border-bottom-width:0}
table.grid-rows tfoot>tr>th.tableblock,table.grid-rows tfoot>tr>td.tableblock{border-width:1px 0 0 0}
table.frame-all{border-width:1px}
table.frame-sides{border-width:0 1px}
table.frame-topbot{border-width:1px 0}
th.halign-left,td.halign-left{text-align:left}
th.halign-right,td.halign-right{text-align:right}
th.halign-center,td.halign-center{text-align:center}
th.valign-top,td.valign-top{vertical-align:top}
th.valign-bottom,td.valign-bottom{vertical-align:bottom}
th.valign-middle,td.valign-middle{vertical-align:middle}
table thead th,table tfoot th{font-weight:bold}
tbody tr th{display:table-cell;line-height:1.6;background:#f7f8f7}
tbody tr th,tbody tr th p,tfoot tr th,tfoot tr th p{color:rgba(0,0,0,.8);font-weight:bold}
p.tableblock>code:only-child{background:none;padding:0}
p.tableblock{font-size:1em}
td>div.verse{white-space:pre}
ol{margin-left:1.75em}
ul li ol{margin-left:1.5em}
dl dd{margin-left:1.125em}
dl dd:last-child,dl dd:last-child>:last-child{margin-bottom:0}
ol>li p,ul>li p,ul dd,ol dd,.olist .olist,.ulist .ulist,.ulist .olist,.olist .ulist{margin-bottom:.625em}
ul.unstyled,ol.unnumbered,ul.checklist,ul.none{list-style-type:none}
ul.unstyled,ol.unnumbered,ul.checklist{margin-left:.625em}
ul.checklist li>p:first-child>.fa-square-o:first-child,ul.checklist li>p:first-child>.fa-check-square-o:first-child{width:1em;font-size:.85em}
ul.checklist li>p:first-child>input[type="checkbox"]:first-child{width:1em;position:relative;top:1px}
ul.inline{margin:0 auto .625em auto;margin-left:-1.375em;margin-right:0;padding:0;list-style:none;overflow:hidden}
ul.inline>li{list-style:none;float:left;margin-left:1.375em;display:block}
ul.inline>li>*{display:block}
.unstyled dl dt{font-weight:400;font-style:normal}
ol.arabic{list-style-type:decimal}
ol.decimal{list-style-type:decimal-leading-zero}
ol.loweralpha{list-style-type:lower-alpha}
ol.upperalpha{list-style-type:upper-alpha}
ol.lowerroman{list-style-type:lower-roman}
ol.upperroman{list-style-type:upper-roman}
ol.lowergreek{list-style-type:lower-greek}
.hdlist>table,.colist>table{border:0;background:none}
.hdlist>table>tbody>tr,.colist>table>tbody>tr{background:none}
td.hdlist1{padding-right:.75em;font-weight:bold}
td.hdlist1,td.hdlist2{vertical-align:top}
.literalblock+.colist,.listingblock+.colist{margin-top:-.5em}
.colist>table tr>td:first-of-type{padding:0 .75em;line-height:1}
.colist>table tr>td:last-of-type{padding:.25em 0}
.thumb,.th{line-height:0;display:inline-block;border:solid 4px #fff;-webkit-box-shadow:0 0 0 1px #ddd;box-shadow:0 0 0 1px #ddd}
.imageblock.left,.imageblock[style*="float: left"]{margin:.25em .625em 1.25em 0}
.imageblock.right,.imageblock[style*="float: right"]{margin:.25em 0 1.25em .625em}
.imageblock>.title{margin-bottom:0}
.imageblock.thumb,.imageblock.th{border-width:6px}
.imageblock.thumb>.title,.imageblock.th>.title{padding:0 .125em}
.image.left,.image.right{margin-top:.25em;margin-bottom:.25em;display:inline-block;line-height:0}
.image.left{margin-right:.625em}
.image.right{margin-left:.625em}
a.image{text-decoration:none}
span.footnote,span.footnoteref{vertical-align:super;font-size:.875em}
span.footnote a,span.footnoteref a{text-decoration:none}
span.footnote a:active,span.footnoteref a:active{text-decoration:underline}
#footnotes{padding-top:.75em;padding-bottom:.75em;margin-bottom:.625em}
#footnotes hr{width:20%;min-width:6.25em;margin:-.25em 0 .75em 0;border-width:1px 0 0 0}
#footnotes .footnote{padding:0 .375em;line-height:1.3;font-size:.875em;margin-left:1.2em;text-indent:-1.2em;margin-bottom:.2em}
#footnotes .footnote a:first-of-type{font-weight:bold;text-decoration:none}
#footnotes .footnote:last-of-type{margin-bottom:0}
#content #footnotes{margin-top:-.625em;margin-bottom:0;padding:.75em 0}
.gist .file-data>table{border:0;background:#fff;width:100%;margin-bottom:0}
.gist .file-data>table td.line-data{width:99%}
div.unbreakable{page-break-inside:avoid}
.big{font-size:larger}
.small{font-size:smaller}
.underline{text-decoration:underline}
.overline{text-decoration:overline}
.line-through{text-decoration:line-through}
.aqua{color:#00bfbf}
.aqua-background{background-color:#00fafa}
.black{color:#000}
.black-background{background-color:#000}
.blue{color:#0000bf}
.blue-background{background-color:#0000fa}
.fuchsia{color:#bf00bf}
.fuchsia-background{background-color:#fa00fa}
.gray{color:#606060}
.gray-background{background-color:#7d7d7d}
.green{color:#006000}
.green-background{background-color:#007d00}
.lime{color:#00bf00}
.lime-background{background-color:#00fa00}
.maroon{color:#600000}
.maroon-background{background-color:#7d0000}
.navy{color:#000060}
.navy-background{background-color:#00007d}
.olive{color:#606000}
.olive-background{background-color:#7d7d00}
.purple{color:#600060}
.purple-background{background-color:#7d007d}
.red{color:#bf0000}
.red-background{background-color:#fa0000}
.silver{color:#909090}
.silver-background{background-color:#bcbcbc}
.teal{color:#006060}
.teal-background{background-color:#007d7d}
.white{color:#bfbfbf}
.white-background{background-color:#fafafa}
.yellow{color:#bfbf00}
.yellow-background{background-color:#fafa00}
span.icon>.fa{cursor:default}
.admonitionblock td.icon [class^="fa icon-"]{font-size:2.5em;text-shadow:1px 1px 2px rgba(0,0,0,.5);cursor:default}
.admonitionblock td.icon .icon-note:before{content:"\f05a";color:#19407c}
.admonitionblock td.icon .icon-tip:before{content:"\f0eb";text-shadow:1px 1px 2px rgba(155,155,0,.8);color:#111}
.admonitionblock td.icon .icon-warning:before{content:"\f071";color:#bf6900}
.admonitionblock td.icon .icon-caution:before{content:"\f06d";color:#bf3400}
.admonitionblock td.icon .icon-important:before{content:"\f06a";color:#bf0000}
.conum[data-value]{display:inline-block;color:#fff!important;background-color:rgba(0,0,0,.8);-webkit-border-radius:100px;border-radius:100px;text-align:center;font-size:.75em;width:1.67em;height:1.67em;line-height:1.67em;font-family:"Open Sans","DejaVu Sans",sans-serif;font-style:normal;font-weight:bold}
.conum[data-value] *{color:#fff!important}
.conum[data-value]+b{display:none}
.conum[data-value]:after{content:attr(data-value)}
pre .conum[data-value]{position:relative;top:-.125em}
b.conum *{color:inherit!important}
.conum:not([data-value]):empty{display:none}
h1,h2{letter-spacing:-.01em}
dt,th.tableblock,td.content{text-rendering:optimizeLegibility}
p,td.content{letter-spacing:-.01em}
p strong,td.content strong{letter-spacing:-.005em}
p,blockquote,dt,td.content{font-size:1.0625rem}
p{margin-bottom:1.25rem}
.sidebarblock p,.sidebarblock dt,.sidebarblock td.content,p.tableblock{font-size:1em}
.exampleblock>.content{background-color:#fffef7;border-color:#e0e0dc;-webkit-box-shadow:0 1px 4px #e0e0dc;box-shadow:0 1px 4px #e0e0dc}
.print-only{display:none!important}
@media print{@page{margin:1.25cm .75cm}
*{-webkit-box-shadow:none!important;box-shadow:none!important;text-shadow:none!important}
a{color:inherit!important;text-decoration:underline!important}
a.bare,a[href^="#"],a[href^="mailto:"]{text-decoration:none!important}
a[href^="http:"]:not(.bare):after,a[href^="https:"]:not(.bare):after{content:"(" attr(href) ")";display:inline-block;font-size:.875em;padding-left:.25em}
abbr[title]:after{content:" (" attr(title) ")"}
pre,blockquote,tr,img{page-break-inside:avoid}
thead{display:table-header-group}
img{max-width:100%!important}
p,blockquote,dt,td.content{font-size:1em;orphans:3;widows:3}
h2,h3,#toctitle,.sidebarblock>.content>.title{page-break-after:avoid}
#toc,.sidebarblock,.exampleblock>.content{background:none!important}
#toc{border-bottom:1px solid #ddddd8!important;padding-bottom:0!important}
.sect1{padding-bottom:0!important}
.sect1+.sect1{border:0!important}
#header>h1:first-child{margin-top:1.25rem}
body.book #header{text-align:center}
body.book #header>h1:first-child{border:0!important;margin:2.5em 0 1em 0}
body.book #header .details{border:0!important;display:block;padding:0!important}
body.book #header .details span:first-child{margin-left:0!important}
body.book #header .details br{display:block}
body.book #header .details br+span:before{content:none!important}
body.book #toc{border:0!important;text-align:left!important;padding:0!important;margin:0!important}
body.book #toc,body.book #preamble,body.book h1.sect0,body.book .sect1>h2{page-break-before:always}
.listingblock code[data-lang]:before{display:block}
#footer{background:none!important;padding:0 .9375em}
#footer-text{color:rgba(0,0,0,.6)!important;font-size:.9em}
.hide-on-print{display:none!important}
.print-only{display:block!important}
.hide-for-print{display:none!important}
.show-for-print{display:inherit!important}}
div.paragraph.indented p {padding-left: 3em;}
div.paragraph.indented2 p {padding-left: 6em;}
div.paragraph.indented3 p {padding-left: 9em;}
</style>
<link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/font-awesome/4.5.0/css/font-awesome.min.css">
<style>
/* Stylesheet for CodeRay to match GitHub theme | MIT License | http://foundation.zurb.com */
/*pre.CodeRay {background-color:#f7f7f8;}*/
.CodeRay .line-numbers{border-right:1px solid #d8d8d8;padding:0 0.5em 0 .25em}
.CodeRay span.line-numbers{display:inline-block;margin-right:.5em;color:rgba(0,0,0,.3)}
.CodeRay .line-numbers strong{color:rgba(0,0,0,.4)}
table.CodeRay{border-collapse:separate;border-spacing:0;margin-bottom:0;border:0;background:none}
table.CodeRay td{vertical-align: top;line-height:1.45}
table.CodeRay td.line-numbers{text-align:right}
table.CodeRay td.line-numbers>pre{padding:0;color:rgba(0,0,0,.3)}
table.CodeRay td.code{padding:0 0 0 .5em}
table.CodeRay td.code>pre{padding:0}
.CodeRay .debug{color:#fff !important;background:#000080 !important}
.CodeRay .annotation{color:#007}
.CodeRay .attribute-name{color:#000080}
.CodeRay .attribute-value{color:#700}
.CodeRay .binary{color:#509}
.CodeRay .comment{color:#998;font-style:italic}
.CodeRay .char{color:#04d}
.CodeRay .char .content{color:#04d}
.CodeRay .char .delimiter{color:#039}
.CodeRay .class{color:#458;font-weight:bold}
.CodeRay .complex{color:#a08}
.CodeRay .constant,.CodeRay .predefined-constant{color:#008080}
.CodeRay .color{color:#099}
.CodeRay .class-variable{color:#369}
.CodeRay .decorator{color:#b0b}
.CodeRay .definition{color:#099}
.CodeRay .delimiter{color:#000}
.CodeRay .doc{color:#970}
.CodeRay .doctype{color:#34b}
.CodeRay .doc-string{color:#d42}
.CodeRay .escape{color:#666}
.CodeRay .entity{color:#800}
.CodeRay .error{color:#808}
.CodeRay .exception{color:inherit}
.CodeRay .filename{color:#099}
.CodeRay .function{color:#900;font-weight:bold}
.CodeRay .global-variable{color:#008080}
.CodeRay .hex{color:#058}
.CodeRay .integer,.CodeRay .float{color:#099}
.CodeRay .include{color:#555}
.CodeRay .inline{color:#000}
.CodeRay .inline .inline{background:#ccc}
.CodeRay .inline .inline .inline{background:#bbb}
.CodeRay .inline .inline-delimiter{color:#d14}
.CodeRay .inline-delimiter{color:#d14}
.CodeRay .important{color:#555;font-weight:bold}
.CodeRay .interpreted{color:#b2b}
.CodeRay .instance-variable{color:#008080}
.CodeRay .label{color:#970}
.CodeRay .local-variable{color:#963}
.CodeRay .octal{color:#40e}
.CodeRay .predefined{color:#369}
.CodeRay .preprocessor{color:#579}
.CodeRay .pseudo-class{color:#555}
.CodeRay .directive{font-weight:bold}
.CodeRay .type{font-weight:bold}
.CodeRay .predefined-type{color:inherit}
.CodeRay .reserved,.CodeRay .keyword {color:#000;font-weight:bold}
.CodeRay .key{color:#808}
.CodeRay .key .delimiter{color:#606}
.CodeRay .key .char{color:#80f}
.CodeRay .value{color:#088}
.CodeRay .regexp .delimiter{color:#808}
.CodeRay .regexp .content{color:#808}
.CodeRay .regexp .modifier{color:#808}
.CodeRay .regexp .char{color:#d14}
.CodeRay .regexp .function{color:#404;font-weight:bold}
.CodeRay .string{color:#d20}
.CodeRay .string .string .string{background:#ffd0d0}
.CodeRay .string .content{color:#d14}
.CodeRay .string .char{color:#d14}
.CodeRay .string .delimiter{color:#d14}
.CodeRay .shell{color:#d14}
.CodeRay .shell .delimiter{color:#d14}
.CodeRay .symbol{color:#990073}
.CodeRay .symbol .content{color:#a60}
.CodeRay .symbol .delimiter{color:#630}
.CodeRay .tag{color:#008080}
.CodeRay .tag-special{color:#d70}
.CodeRay .variable{color:#036}
.CodeRay .insert{background:#afa}
.CodeRay .delete{background:#faa}
.CodeRay .change{color:#aaf;background:#007}
.CodeRay .head{color:#f8f;background:#505}
.CodeRay .insert .insert{color:#080}
.CodeRay .delete .delete{color:#800}
.CodeRay .change .change{color:#66f}
.CodeRay .head .head{color:#f4f}
</style>
</head>
<body class="book toc2 toc-left">
<div id="header">
<h1>Load and Transform Guide</h1>
<div class="details">
<span id="revnumber">version 2.4.0</span>
</div>
<div id="toc" class="toc2">
<div id="toctitle">Table of Contents</div>
<ul class="sectlevel1">
<li><a href="#_about_this_document">1. About This Document</a>
<ul class="sectlevel2">
<li><a href="#_intended_audience">1.1. Intended Audience</a></li>
<li><a href="#_new_and_changed_information">1.2. New and Changed Information</a></li>
<li><a href="#_notation_conventions">1.3. Notation Conventions</a></li>
<li><a href="#_comments_encouraged">1.4. Comments Encouraged</a></li>
</ul>
</li>
<li><a href="#introduction">2. Introduction</a>
<ul class="sectlevel2">
<li><a href="#introduction-load-methods">2.1. Load Methods</a>
<ul class="sectlevel3">
<li><a href="#introduction-insert-types">2.1.1. Insert Types</a></li>
</ul>
</li>
<li><a href="#_unload">2.2. Unload</a></li>
</ul>
</li>
<li><a href="#tables-indexes">3. Tables and Indexes</a>
<ul class="sectlevel2">
<li><a href="#choose-primary-key">3.1. Choose Primary Key</a></li>
<li><a href="#salting">3.2. Salting</a></li>
<li><a href="#compression-encoding">3.3. Compression and Encoding</a></li>
<li><a href="#create-trafodion-tables-and-indexes">3.4. Create Tables and Indexes</a></li>
<li><a href="#_update_statistics">3.5. Update Statistics</a>
<ul class="sectlevel3">
<li><a href="#_default_sampling">3.5.1. Default Sampling</a></li>
</ul>
</li>
<li><a href="#_generate_single_column_and_multi_column_histograms_from_one_statement">3.6. Generate Single-Column and Multi-Column Histograms From One Statement</a>
<ul class="sectlevel3">
<li><a href="#_enable_update_statistics_automation">3.6.1. Enable Update Statistics Automation</a></li>
<li><a href="#_regenerate_histograms">3.6.2. Regenerate Histograms</a></li>
</ul>
</li>
</ul>
</li>
<li><a href="#bulk-load">4. Bulk Load</a>
<ul class="sectlevel2">
<li><a href="#bulk-load-from-trafodion-tables">4.1. Load Data From Trafodion Tables</a>
<ul class="sectlevel3">
<li><a href="#_example">4.1.1. Example</a></li>
</ul>
</li>
<li><a href="#bulk-load-data-from-hdfs-files">4.2. Load Data From HDFS Files</a>
<ul class="sectlevel3">
<li><a href="#_example_2">4.2.1. Example</a></li>
</ul>
</li>
<li><a href="#bulk-load-data-from-hive-tables">4.3. Load Data From Hive Tables</a>
<ul class="sectlevel3">
<li><a href="#_example_3">4.3.1. Example</a></li>
</ul>
</li>
<li><a href="#bulk-load-data-from-external-databases">4.4. Load Data From External Databases</a>
<ul class="sectlevel3">
<li><a href="#_install_required_software">4.4.1. Install Required Software</a></li>
<li><a href="#_sample_sqoop_commands">4.4.2. Sample Sqoop Commands</a></li>
<li><a href="#_example_4">4.4.3. Example</a></li>
</ul>
</li>
</ul>
</li>
<li><a href="#trickle-load">5. Trickle Load</a>
<ul class="sectlevel2">
<li><a href="#trickle-load-improving-throughput">5.1. Improving Throughput</a></li>
<li><a href="#trickle-load-odb">5.2. odb</a>
<ul class="sectlevel3">
<li><a href="#trickle-load-odb-throughput">5.2.1. odb Throughput</a></li>
<li><a href="#trickle-load-odb-load">5.2.2. odb Load</a></li>
<li><a href="#trickle-load-odb-copy">5.2.3. odb Copy</a></li>
<li><a href="#trickle-load-odb-extract">5.2.4. odb Extract</a></li>
<li><a href="#trickle-load-odb-transform">5.2.5. odb Transform</a></li>
</ul>
</li>
</ul>
</li>
<li><a href="#bulk-unload">6. Bulk Unload</a></li>
<li><a href="#monitor-progress">7. Monitor Progress</a>
<ul class="sectlevel2">
<li><a href="#_insert_and_upsert">7.1. INSERT and UPSERT</a></li>
<li><a href="#_upsert_using_load">7.2. UPSERT USING LOAD</a></li>
<li><a href="#_load">7.3. LOAD</a></li>
</ul>
</li>
<li><a href="#troubleshoot">8. Troubleshoot</a>
<ul class="sectlevel2">
<li><a href="#_improving_throughput">8.1. Improving Throughput</a>
<ul class="sectlevel3">
<li><a href="#_tuplelists_or_rowsets">8.1.1. Tuplelists or Rowsets</a></li>
<li><a href="#_native_hbase_tables">8.1.2. Native HBase Tables</a></li>
<li><a href="#_hive_tables">8.1.3. Hive Tables</a></li>
</ul>
</li>
<li><a href="#_checking_plan_quality">8.2. Checking Plan Quality</a></li>
<li><a href="#_update_statistics_times_out_during_sampling">8.3. UPDATE STATISTICS Times Out During Sampling</a></li>
<li><a href="#_index_creation_takes_too_long">8.4. Index Creation Takes Too Long</a></li>
<li><a href="#large-deletes">8.5. Large Deletes Take Too Long or Error Out</a></li>
<li><a href="#_large_upsert_using_load_on_a_table_with_index_errors_out">8.6. Large UPSERT USING LOAD On a Table With Index Errors Out</a></li>
</ul>
</li>
</ul>
</div>
</div>
<div id="content">
<div id="preamble">
<div class="sectionbody">
<div class="paragraph">
<p><strong>License Statement</strong></p>
</div>
<div class="paragraph">
<p>Licensed to the Apache Software Foundation (ASF) under one or more contributor license agreements. See the NOTICE file
distributed with this work for additional information regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at <a href="http://www.apache.org/licenses/LICENSE-2.0" class="bare">http://www.apache.org/licenses/LICENSE-2.0</a></p>
</div>
<div class="paragraph">
<p>Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.</p>
</div>
<div style="page-break-after: always;"></div>
<div class="paragraph">
<p><strong>Acknowledgements</strong></p>
</div>
<div class="paragraph">
<p>Microsoft®, Windows®, Windows NT®, Windows® XP, and Windows Vista® are
U.S. registered trademarks of Microsoft Corporation. Intel® and Intel®
Itanium® are trademarks of Intel Corporation in the U.S. and other
countries. Java® is a registered trademark of Oracle and/or its
affiliates. Motif, OSF/1, UNIX®, X/Open®, and the X device is a
trademark of X/Open Company Ltd. in the UK and other countries.</p>
</div>
<div class="paragraph">
<p>OSF, OSF/1, OSF/Motif, Motif, and Open Software Foundation are trademarks of
the Open Software Foundation in the U.S. and other countries.
© 1990, 1991, 1992, 1993 Open Software Foundation, Inc.</p>
</div>
<div class="paragraph">
<p>The OSF documentation and the OSF software to which it relates are derived in
part from materials supplied by the following: © 1987, 1988, 1989
Carnegie-Mellon University. © 1989, 1990, 1991 Digital Equipment
Corporation. © 1985, 1988, 1989, 1990 Encore Computer Corporation. © 1988 Free
Software Foundation, Inc. © 1987, 1988, 1989, 1990, 1991 Hewlett-Packard
Company. © 1985, 1987, 1988, 1989, 1990, 1991, 1992 International
Business Machines Corporation. © 1988, 1989 Massachusetts Institute of
Technology. © 1988, 1989, 1990 Mentat Inc. © 1988 Microsoft Corporation.
© 1987, 1988, 1989, 1990, 1991,
1992 SecureWare, Inc. © 1990, 1991 Siemens Nixdorf Informations systeme
AG. © 1986, 1989, 1996, 1997 Sun Microsystems, Inc. © 1989, 1990, 1991
Transarc Corporation.</p>
</div>
<div class="paragraph">
<p>OSF software and documentation are based in part
on the Fourth Berkeley Software Distribution under license from The
Regents of the University of California. OSF acknowledges the following
individuals and institutions for their role in its development: Kenneth
C.R.C. Arnold, Gregory S. Couch, Conrad C. Huang, Ed James, Symmetric
Computer Systems, Robert Elz. © 1980, 1981, 1982, 1983, 1985, 1986,
1987, 1988, 1989 Regents of the University of California. OSF MAKES NO
WARRANTY OF ANY KIND WITH REGARD TO THE OSF MATERIAL PROVIDED HEREIN,
INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY
AND FITNESS FOR A PARTICULAR PURPOSE. OSF shall not be liable for errors
contained herein or for incidental consequential damages in connection
with the furnishing, performance, or use of this material.</p>
</div>
<div style="page-break-after: always;"></div>
<div class="paragraph">
<p><strong>Revision History</strong></p>
</div>
<table class="tableblock frame-all grid-all spread">
<colgroup>
<col style="width: 50%;">
<col style="width: 50%;">
</colgroup>
<thead>
<tr>
<th class="tableblock halign-left valign-top">Version</th>
<th class="tableblock halign-left valign-top">Date</th>
</tr>
</thead>
<tbody>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock">2.2.0</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">TBD</p></td>
</tr>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock">2.1.0</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">May 1, 2017</p></td>
</tr>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock">2.0.1</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">July 7, 2016</p></td>
</tr>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock">2.0.0</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">June 6, 2016</p></td>
</tr>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock">1.3.0</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">January, 2016</p></td>
</tr>
</tbody>
</table>
</div>
</div>
<div class="sect1">
<h2 id="_about_this_document">1. About This Document</h2>
<div class="sectionbody">
<div class="paragraph">
<p>This guide describes how to load and transform data into a Trafodion database.</p>
</div>
<div class="paragraph">
<p>The information herein is complementary to the following Trafodion documentation:</p>
</div>
<div class="ulist">
<ul>
<li>
<p><a href="http://trafodion.apache.org/docs/sql_reference/index.html">Trafodion SQL Reference Manual</a></p>
</li>
<li>
<p><a href="http://trafodion.apache.org/docs/odb/index.html">Trafodion odb User Guide</a></p>
</li>
</ul>
</div>
<div class="sect2">
<h3 id="_intended_audience">1.1. Intended Audience</h3>
<div class="paragraph">
<p>This guide targets anyone wanting to load data into a Trafodion database.</p>
</div>
<div class="paragraph">
<p>You need to have skills in the following areas to make full use of the information in this guide:</p>
</div>
<div class="ulist">
<ul>
<li>
<p>SQL DDL and DML.</p>
</li>
<li>
<p>Installation and configuration of Linux software.</p>
</li>
<li>
<p>Trafodion administration.</p>
</li>
<li>
<p>Depending on your data source, Java and/or Hadoop ecosystem usage.</p>
</li>
</ul>
</div>
</div>
<div class="sect2">
<h3 id="_new_and_changed_information">1.2. New and Changed Information</h3>
<div class="paragraph">
<p>This is a new guide.</p>
</div>
<div style="page-break-after: always;"></div>
</div>
<div class="sect2">
<h3 id="_notation_conventions">1.3. Notation Conventions</h3>
<div class="paragraph">
<p>This list summarizes the notation conventions for syntax presentation in this manual.</p>
</div>
<div class="ulist">
<ul>
<li>
<p>UPPERCASE LETTERS</p>
<div class="paragraph">
<p>Uppercase letters indicate keywords and reserved words. Type these items exactly as shown. Items not enclosed in brackets are required.</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="CodeRay highlight"><code data-lang="text">SELECT</code></pre>
</div>
</div>
</li>
<li>
<p>lowercase letters</p>
<div class="paragraph">
<p>Lowercase letters, regardless of font, indicate variable items that you supply. Items not enclosed in brackets are required.</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="CodeRay highlight"><code data-lang="text">file-name</code></pre>
</div>
</div>
</li>
<li>
<p>&#91; &#93; Brackets</p>
<div class="paragraph">
<p>Brackets enclose optional syntax items.</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="CodeRay highlight"><code data-lang="text">DATETIME [start-field TO] end-field</code></pre>
</div>
</div>
<div class="paragraph">
<p>A group of items enclosed in brackets is a list from which you can choose one item or none.</p>
</div>
<div class="paragraph">
<p>The items in the list can be arranged either vertically, with aligned brackets on each side of the list, or horizontally, enclosed in a pair of brackets and separated by vertical lines.</p>
</div>
<div class="paragraph">
<p>For example:</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="CodeRay highlight"><code data-lang="text">DROP SCHEMA schema [CASCADE]
DROP SCHEMA schema [ CASCADE | RESTRICT ]</code></pre>
</div>
</div>
</li>
<li>
<p>{ } Braces</p>
<div class="paragraph">
<p>Braces enclose required syntax items.</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="CodeRay highlight"><code data-lang="text">FROM { grantee [, grantee ] ... }</code></pre>
</div>
</div>
<div class="paragraph">
<p>A group of items enclosed in braces is a list from which you are required to choose one item.</p>
</div>
<div class="paragraph">
<p>The items in the list can be arranged either vertically, with aligned braces on each side of the list, or horizontally, enclosed in a pair of braces and separated by vertical lines.</p>
</div>
<div class="paragraph">
<p>For example:</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="CodeRay highlight"><code data-lang="text">INTERVAL { start-field TO end-field }
{ single-field }
INTERVAL { start-field TO end-field | single-field }</code></pre>
</div>
</div>
</li>
<li>
<p>| Vertical Line</p>
<div class="paragraph">
<p>A vertical line separates alternatives in a horizontal list that is enclosed in brackets or braces.</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="CodeRay highlight"><code data-lang="text">{expression | NULL}</code></pre>
</div>
</div>
</li>
<li>
<p>&#8230; Ellipsis</p>
<div class="paragraph">
<p>An ellipsis immediately following a pair of brackets or braces indicates that you can repeat the enclosed sequence of syntax items any number of times.</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="CodeRay highlight"><code data-lang="text">ATTRIBUTE[S] attribute [, attribute] ...
{, sql-expression } ...</code></pre>
</div>
</div>
<div class="paragraph">
<p>An ellipsis immediately following a single syntax item indicates that you can repeat that syntax item any number of times.</p>
</div>
<div class="paragraph">
<p>For example:</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="CodeRay highlight"><code data-lang="text">expression-n ...</code></pre>
</div>
</div>
</li>
<li>
<p>Punctuation</p>
<div class="paragraph">
<p>Parentheses, commas, semicolons, and other symbols not previously described must be typed as shown.</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="CodeRay highlight"><code data-lang="text">DAY (datetime-expression)
@script-file</code></pre>
</div>
</div>
<div class="paragraph">
<p>Quotation marks around a symbol such as a bracket or brace indicate the symbol is a required character that you must type as shown.</p>
</div>
<div class="paragraph">
<p>For example:</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="CodeRay highlight"><code data-lang="text">&quot;{&quot; module-name [, module-name] ... &quot;}&quot;</code></pre>
</div>
</div>
</li>
<li>
<p>Item Spacing</p>
<div class="paragraph">
<p>Spaces shown between items are required unless one of the items is a punctuation symbol such as a parenthesis or a comma.</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="CodeRay highlight"><code data-lang="text">DAY (datetime-expression) DAY(datetime-expression)</code></pre>
</div>
</div>
<div class="paragraph">
<p>If there is no space between two items, spaces are not permitted. In this example, no spaces are permitted between the period and any other items:</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="CodeRay highlight"><code data-lang="text">myfile.sh</code></pre>
</div>
</div>
</li>
<li>
<p>Line Spacing</p>
<div class="paragraph">
<p>If the syntax of a command is too long to fit on a single line, each continuation line is indented three spaces and is separated from the preceding line by a blank line.</p>
</div>
<div class="paragraph">
<p>This spacing distinguishes items in a continuation line from items in a vertical list of selections.</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="CodeRay highlight"><code data-lang="text">match-value [NOT] LIKE _pattern
[ESCAPE esc-char-expression]</code></pre>
</div>
</div>
</li>
</ul>
</div>
</div>
<div class="sect2">
<h3 id="_comments_encouraged">1.4. Comments Encouraged</h3>
<div class="paragraph">
<p>We encourage your comments concerning this document. We are committed to providing documentation that meets your
needs. Send any errors found, suggestions for improvement, or compliments to <a href="mailto:user@trafodion.apache.org">user@trafodion.apache.org</a>.</p>
</div>
<div class="paragraph">
<p>Include the document title and any comment, error found, or suggestion for improvement you have concerning this document.</p>
</div>
</div>
</div>
</div>
<div class="sect1">
<h2 id="introduction">2. Introduction</h2>
<div class="sectionbody">
<div class="sect2">
<h3 id="introduction-load-methods">2.1. Load Methods</h3>
<div class="paragraph">
<p>There are two methods used to load data into a Trafodion table. Both methods can run while the database is concurrently queried:</p>
</div>
<table class="tableblock frame-all grid-all spread">
<colgroup>
<col style="width: 20%;">
<col style="width: 40%;">
<col style="width: 40%;">
</colgroup>
<thead>
<tr>
<th class="tableblock halign-left valign-top">Type</th>
<th class="tableblock halign-left valign-top">Description</th>
<th class="tableblock halign-left valign-top">Methods/Tools</th>
</tr>
</thead>
<tbody>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>Bulk Load</strong></p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">Large data volumes<br>
Stage data and load in the batches</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">Trafodion Bulk Loader</p></td>
</tr>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>Trickle Load</strong></p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">Small data volumes<br>
Insert data as it arrives</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">ETL tool<br>
Custom ODBC/JDBC application<br>
User-Defined Functions<br>
odb Tool</p></td>
</tr>
</tbody>
</table>
<div class="paragraph">
<p>These two methods use four types of SQL insert statements</p>
</div>
<div class="ulist">
<ul>
<li>
<p><strong>Bulk Load</strong></p>
<div class="ulist">
<ul>
<li>
<p><a href="http://trafodion.apache.org/docs/sql_reference/index.html#load_statement">LOAD</a></p>
</li>
</ul>
</div>
</li>
<li>
<p><strong>Trickle Load</strong></p>
<div class="ulist">
<ul>
<li>
<p><a href="http://trafodion.apache.org/docs/sql_reference/index.html#insert_statement">INSERT</a></p>
</li>
<li>
<p><a href="http://trafodion.apache.org/docs/sql_reference/index.html#upsert_statement">UPSERT</a></p>
</li>
<li>
<p><a href="http://trafodion.apache.org/docs/sql_reference/index.html#upsert_statement">UPSERT USING LOAD</a></p>
</li>
</ul>
</div>
</li>
</ul>
</div>
<div class="paragraph">
<p>The <a href="http://trafodion.apache.org/docs/sql_reference/index.html">Trafodion SQL Reference Manual</a>
provides syntax descriptions for these statements.</p>
</div>
<div class="paragraph">
<p>The data source defines what type of load approach and method you use:</p>
</div>
<div class="ulist">
<ul>
<li>
<p><strong>Bulk Load</strong> (LOAD statement)</p>
<div class="ulist">
<ul>
<li>
<p><em>Text Files</em>: Map an external Hive table.</p>
</li>
<li>
<p><em>JDBC-Compliant Database</em>: Load into Hive on the Trafodion cluster using <code>sqoop</code>.</p>
</li>
<li>
<p><em>Hive Tables</em>: Direct load.</p>
</li>
<li>
<p><em>Native HBase Tables</em>: Direct load.</p>
</li>
<li>
<p><em>Disparate Data Source</em>: Write Java/C++ UDF to read data from source and pass rows to LOAD.</p>
</li>
</ul>
</div>
</li>
</ul>
</div>
<div style="page-break-after: always;"></div>
<div class="ulist">
<ul>
<li>
<p><strong>Trickle Load</strong> (odb utility)</p>
<div class="ulist">
<ul>
<li>
<p><em>Text Files</em>: Direct access</p>
</li>
<li>
<p><em>pipes</em>: Via <code>stdin</code></p>
</li>
<li>
<p><em>ODBC-Compliant Database</em>: odb COPY command, no intermediate storage</p>
</li>
</ul>
</div>
</li>
</ul>
</div>
<div class="paragraph">
<p>For more information, refer to:</p>
</div>
<div class="ulist">
<ul>
<li>
<p><a href="#bulk-load">Bulk Load</a></p>
</li>
<li>
<p><a href="#trickle-load">Trickle Load</a></p>
</li>
</ul>
</div>
<div style="page-break-after: always;"></div>
<div class="sect3">
<h4 id="introduction-insert-types">2.1.1. Insert Types</h4>
<div class="paragraph">
<p>The following insert types are supported in Trafodion:</p>
</div>
<div class="ulist">
<ul>
<li>
<p><code>INSERT INTO T &#8230;</code></p>
</li>
<li>
<p><code>UPSERT INTO T &#8230;</code></p>
</li>
<li>
<p><code>UPSERT USING LOAD INTO T &#8230;</code></p>
</li>
<li>
<p><code>LOAD INTO T &#8230;</code></p>
</li>
</ul>
</div>
<div class="paragraph">
<p>The following table compares the different insert types:</p>
</div>
<table class="tableblock frame-all grid-all spread">
<colgroup>
<col style="width: 20%;">
<col style="width: 20%;">
<col style="width: 20%;">
<col style="width: 20%;">
<col style="width: 20%;">
</colgroup>
<thead>
<tr>
<th class="tableblock halign-left valign-top">Characteristic</th>
<th class="tableblock halign-left valign-top">INSERT</th>
<th class="tableblock halign-left valign-top">UPSERT</th>
<th class="tableblock halign-left valign-top">UPSERT USING LOAD</th>
<th class="tableblock halign-left valign-top">LOAD</th>
</tr>
</thead>
<tbody>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>Transaction</strong></p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">Yes</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">Yes</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">No, uses HBase WAL for recovery</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">No, uses snapshot for recovery</p></td>
</tr>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>Method of Operation</strong></p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">Uses the standard HBase write path through its <code>CheckAndPut</code> call. Rows are held in transaction co-processor memory until the transaction is committed.</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">Uses the standard HBase write path through its <code>Put</code> call.</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">Uses the standard HBase write path through its <code>Put</code> call.</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">Uses the HBase bulk load write path and creates HFiles directly, bypassing HBase RegionServers for most of its operation.</p></td>
</tr>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>Uniqueness Constraint</strong></p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">Enforced</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">Not enforced. New row with the same key value overwrites previous row.</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">Not enforced. New row with same key value overwrites the previous row.</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">Enforced only within the set of rows in a single statement. Not enforced with rows already in the table.</p></td>
</tr>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>Index</strong></p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">Can be used on a table with an index.</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">Can be used on a table with an index.</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">When used on a table with an index, it reverts to UPSERT.</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">Can be used on a table with an index. Index is off-line during the LOAD.</p></td>
</tr>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>Max Size/Invocation</strong></p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">10,000 * n<sup>1</sup> rows</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">10,000 * n<sup>1</sup> rows</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">5 million * n<sup>1</sup> rows</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">2 billion * n<sup>1</sup> rows</p></td>
</tr>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>Min Size/Invocation</strong></p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">1 row</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">1 row</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">1 row</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">Suitable for greater than 1 million * n<sup>1</sup> rows</p></td>
</tr>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>Speed</strong></p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">Slowest</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">Faster than INSERT</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">Faster than UPSERT</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">Fastest</p></td>
</tr>
</tbody>
</table>
<div class="paragraph">
<p><sup>1</sup> <strong>n</strong> is the number of nodes in each invocation.</p>
</div>
<div class="paragraph">
<p>Throughput, max/min sizes depends on multiple factors:</p>
</div>
<div class="ulist">
<ul>
<li>
<p>Format of rows in Trafodion table (aligned format or not).</p>
</li>
<li>
<p>Length of row.</p>
</li>
<li>
<p>Number of columns in row.</p>
</li>
<li>
<p>Data type of columns.</p>
</li>
<li>
<p>Network between nodes in cluster.</p>
</li>
<li>
<p>WAL setting.</p>
</li>
<li>
<p>Number of clients.</p>
</li>
<li>
<p>Use of rowsets.</p>
</li>
</ul>
</div>
</div>
</div>
<div class="sect2">
<h3 id="_unload">2.2. Unload</h3>
<div class="paragraph">
<p>The Trafodion UNLOAD statement exports data from Trafodion tables into an HDFS directory. Refer to <a href="#bulk-unload">Bulk Unload</a> for more information.</p>
</div>
</div>
</div>
</div>
<div class="sect1">
<h2 id="tables-indexes">3. Tables and Indexes</h2>
<div class="sectionbody">
<div class="paragraph">
<p>The following guidance helps you set up your tables and indexes for better load performance.</p>
</div>
<div class="sect2">
<h3 id="choose-primary-key">3.1. Choose Primary Key</h3>
<div class="paragraph">
<p>The primary key for a Trafodion table must be chosen based on the workload that accesses the table.</p>
</div>
<div class="paragraph">
<p>Keyed access to Trafodion tables is very efficient since HBase is a key-value store. You need to analyze the queries
that are used to access the tables to understand their predicates and join conditions. Once identified, you can
choose a primary key that ensures that the leading key columns have highly selective predicates applied to them.</p>
</div>
<div class="paragraph">
<p>This technique limits the number of rows that need to scanned in the HBase. Trafodion uses MDAM (Multi Dimensional Access Method) to limit
the rows scanned when predicates are present to only trailing key columns and not the leading key column. MDAM works best when the
unique entry count of leading key columns (on which predicates are absent) is low.</p>
</div>
</div>
<div class="sect2">
<h3 id="salting">3.2. Salting</h3>
<div class="paragraph">
<p>With range partitioned data in some workloads, certain key ranges of data may see more access than other key ranges. This can lead to an
unbalanced usage pattern with some HBase RegionServers handling most of the load. This behavior is referred to as "hot-spotting."</p>
</div>
<div class="paragraph">
<p>With Native HBase tables, hot-spotting is often addressed by designing appropriate keys. In Trafodion, once you choose the key to a table, as
discussed in <a href="#choose-primary-key">Choose Primary Key</a>, you can use <strong>salting</strong> to distribute the data evenly. Salting applies a
hash function to the salt keys and distributes data to partitions based on this hash value. The hash value is physically stored in the
table as the leading key value. Each split of the table will have only one salt key value.</p>
</div>
<div class="paragraph">
<p>The salting key can be any subset (including the whole set) of the primary key. It is a good practice to keep the salting key as small
as possible. The key should provide an even distribution of data, which can be achieved when the key values have a large unique entry
count and no significant skew.</p>
</div>
<div class="paragraph">
<p>The number of partitions must also be specified during table creation. You choose the number of partition depending on the size of the
cluster and the expected size of the table. A salted table can split if more data is added to it than initially estimated. If this
happens, then more than one partition having rows with the same salt value, which may result in suboptimal execution plans for the table.</p>
</div>
<div style="page-break-after: always;"></div>
<div class="paragraph">
<p>You can also choose not to salt Trafodion tables. This is similar to range partitioning in a traditional database. The number of partitions
grows with the size of the table, and range boundaries are determined by HBase based on the specified split policy.</p>
</div>
</div>
<div class="sect2">
<h3 id="compression-encoding">3.3. Compression and Encoding</h3>
<div class="paragraph">
<p>Large Trafodion tables must be encoded and compressed. Trafodion tables that have a large key or several columns grow in size to 10X or more
when compared to a Hive table with equivalent data since HBase stores the key separately for every column in a row.</p>
</div>
<div class="paragraph">
<p>HBase provides several types of encoding to avoid storing the same key value to disk for every column in the row. HBase also supports various
types of compression of the entire data block, regardless whether it is encoded or not.
See <a href="http://hbase.apache.org/book.html#compression">Appendix E: Compression and Data Block Encoding In HBase</a> in the
<a href="http://hbase.apache.org/book.html">Apache HBase Reference Guide</a> for a comparison of various compression and encoding algorithms. Use the
information in the <a href="http://hbase.apache.org/book.html#data.block.encoding.types">Which Compressor or Data Block Encoder To Use</a> section to
determine the best compression technique for your tables.
&lt;&lt;&lt;</p>
</div>
</div>
<div class="sect2">
<h3 id="create-trafodion-tables-and-indexes">3.4. Create Tables and Indexes</h3>
<div class="paragraph">
<p>Create Trafodion tables using the CREATE TABLE statements with the <code>SALT USING &lt;num&gt; PARTITIONS</code> clause for salting and
the <code>HBASE_OPTIONS</code> clause for compression and encoding.</p>
</div>
<div class="paragraph">
<p><strong>Example</strong></p>
</div>
<div class="listingblock">
<div class="content">
<pre class="CodeRay highlight"><code data-lang="text">CREATE TABLE trafodion.sch.demo
( demo_sk INT NOT NULL
, name VARCHAR(100)
, PRIMARY KEY (demo_sk)
)
HBASE_OPTIONS
( DATA_BLOCK_ENCODING = 'FAST_DIFF'
, COMPRESSION = 'SNAPPY'
, MEMSTORE_FLUSH_SIZE = '1073741824'
)
SALT USING 8 PARTITIONS ON (demo_sk);</code></pre>
</div>
</div>
<div class="paragraph">
<p>ANY indexes on the table may be salted or not. However, if they are salted, their salting key and number of partitions must be the same as the table.</p>
</div>
<div class="paragraph">
<p><strong>Example</strong></p>
</div>
<div class="listingblock">
<div class="content">
<pre class="CodeRay highlight"><code data-lang="text">CREATE INDEX demo_ix ON sch.demo(name)
HBASE_OPTIONS
( DATA_BLOCK_ENCODING = 'FAST_DIFF'
, COMPRESSION = 'GZ'
)
SALT LIKE TABLE;</code></pre>
</div>
</div>
</div>
<div class="sect2">
<h3 id="_update_statistics">3.5. Update Statistics</h3>
<div class="paragraph">
<p>To generate good plans that allow queries to execute quickly and use resources wisely, the Trafodion Optimizer must have a good idea about how the
values of columns are distributed, the number of distinct values, and so on. Trafodion supplies this information to the optimizer in the
form of histograms generated by executing the UPDATE STATISTICS statement. See the
<a href="http://trafodion.apache.org/docs/sql_reference/index.html#update_statistics_statement">Trafodion SQL Reference Manual</a> for a full
description of this statement.</p>
</div>
<div class="sect3">
<h4 id="_default_sampling">3.5.1. Default Sampling</h4>
<div class="paragraph">
<p>While accurate statistics are important, the time required to generate them by reading every row in the table may be prohibitive and is
usually unnecessary. Random sampling of the rows of the table can give adequate results in a fraction of the time required to read all
the values in the table. For most situations, the best option is to simply specify SAMPLE at the end of the UPDATE STATISTICS statement,
which will use the default sampling protocol. For example, to use default sampling in the construction of histograms for each column of
table T1, you would execute the following statement:</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="CodeRay highlight"><code data-lang="text">UPDATE STATISTICS FOR TABLE t1 ON EVERY COLUMN SAMPLE;</code></pre>
</div>
</div>
<div class="paragraph">
<p>This default sampling protocol uses a high sampling rate for small tables, reducing the rate with a steep gradient until hitting 1% and
capping the sample size at one million rows. The specific details of default sampling are as follows:</p>
</div>
<div class="ulist">
<ul>
<li>
<p>Use the full table for tables up to 10,000 rows.</p>
</li>
<li>
<p>For table sizes from 10,000 up to a million rows, 10,000 rows are randomly sampled. In effect, this causes the sampling rate to decline
from 100% to 1% as a function of increasing table size.</p>
</li>
<li>
<p>For tables with one million to 100 million rows, use a 1% random sample.</p>
</li>
<li>
<p>For tables exceeding 100 million rows, the sampling rate is calculated as 1 million divided by the number of rows in the table.
This limits the overall sample size to 1 million rows while ensuring uniform random sampling across the entire table.</p>
</li>
</ul>
</div>
</div>
</div>
<div class="sect2">
<h3 id="_generate_single_column_and_multi_column_histograms_from_one_statement">3.6. Generate Single-Column and Multi-Column Histograms From One Statement</h3>
<div class="paragraph">
<p>If you use the ON EVERY COLUMN syntax in an UPDATE STATISTICS statement, then it is important to realize that multi-column histograms can be
requested in the same statement. For example, if you wanted to generate a histogram for each single column of table T1, as well as
multi-column histograms for column sets (c1, c2) and (c5, c6, c7), then you could use the following statement:</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="CodeRay highlight"><code data-lang="text">UPDATE STATISTICS FOR TABLE t1 ON EVERY COLUMN, (c1,c2), (c5,c6,c7) SAMPLE;</code></pre>
</div>
</div>
<div class="paragraph">
<p>In terms of the end result, this is equivalent to the following pair of statements:</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="CodeRay highlight"><code data-lang="text">UPDATE STATISTICS FOR TABLE t1 ON EVERY COLUMN SAMPLE;
UPDATE STATISTICS FOR TABLE t1 ON (c1, c2), (c5, c6, c7) SAMPLE;</code></pre>
</div>
</div>
<div class="paragraph">
<p>However, the performance is superior when they are combined into a single statement because a multi-column histogram depends
on the single-column histograms of its component columns. Therefore, separating the generation of single-column and multi-column histograms
for a table into two statements leads to redundantly calculating some of the single-column histograms. Even though the
relevant single-column histograms already exist, they are recomputed at the time the multi-column histograms are generated.</p>
</div>
<div class="sect3">
<h4 id="_enable_update_statistics_automation">3.6.1. Enable Update Statistics Automation</h4>
<div class="paragraph">
<p>If a standard set of queries is run on a regular basis, then one way to generate only those histograms that are needed for efficient execution
of those queries is to enable update statistics automation, and then PREPARE each of the queries:</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="CodeRay highlight"><code data-lang="text">CONTROL QUERY DEFAULT USTAT_AUTOMATION_INTERVAL '1440';
PREPARE s FROM SELECT...;</code></pre>
</div>
</div>
<div class="paragraph">
<p>The value of the CQD USTAT_AUTOMATION_INTERVAL is intended to determine the automation interval (in minutes) for update statistics
automation. The PREPARE statement causes the Trafodion Compiler to compile and optimize a query without executing it. In the process
of doing so with automation enabled, any histograms needed by the optimizer that are missing causes those columns to be marked
as needing histograms. Then, the following UPDATE STATISTICS statement can be run against each table to generate the needed histograms:</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="CodeRay highlight"><code data-lang="text">UPDATE STATISTICS FOR TABLE &lt;table-name&gt; ON NECESSARY COLUMNS SAMPLE;</code></pre>
</div>
</div>
</div>
<div class="sect3">
<h4 id="_regenerate_histograms">3.6.2. Regenerate Histograms</h4>
<div class="paragraph">
<p>Histograms can become "stale" as the underlying data changes and possibly reflects a different distribution of values, although
it is possible that data turnover or accumulation can be high while maintaining the same distribution. To ensure that statistics
remain accurate, you should regenerate histograms for a table once significant changes have been made to that table since its
histograms were last generated. To refresh existing histograms without adding new ones, use the following statement:</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="CodeRay highlight"><code data-lang="text">UPDATE STATISTICS FOR TABLE &lt;table-name&gt; ON EXISTING COLUMNS SAMPLE;</code></pre>
</div>
</div>
<div class="paragraph">
<p>The critical set of histograms that were previously generated with the ON NECESSARY COLUMNS syntax can be periodically regenerated
using ON EXISTING COLUMNS. Note that using ON NECESSARY COLUMNS will only identify those columns that have been previously
requested by the optimizer but do not exist. The current implementation of automation does not know which existing histograms might be stale.</p>
</div>
</div>
</div>
</div>
</div>
<div class="sect1">
<h2 id="bulk-load">4. Bulk Load</h2>
<div class="sectionbody">
<div class="paragraph">
<p>The LOAD statement enables batch loading large volumes of data efficiently in a scalable manner.</p>
</div>
<div class="paragraph">
<p>See the <a href="http://trafodion.apache.org/docs/sql_reference/index.html#load_statement">Trafodion SQL Reference Manual</a>
for a full description of this SQL statement.</p>
</div>
<div class="paragraph">
<p>You can bulk-load data using one of the following methods:</p>
</div>
<div class="ulist">
<ul>
<li>
<p><a href="#bulk-load-data-from-trafodion-tables">Load Data From Trafodion Tables</a></p>
</li>
<li>
<p><a href="#bulk-load-data-from-hdfs-files">Load Data From HDFS Files</a></p>
</li>
<li>
<p><a href="#bulk-load-data-from-hive-tables">Load Data From Hive Tables</a></p>
</li>
<li>
<p><a href="#bulk-load-data-from-external-databases">Load Data From External Databases</a></p>
</li>
</ul>
</div>
<div class="sect2">
<h3 id="bulk-load-from-trafodion-tables">4.1. Load Data From Trafodion Tables</h3>
<div class="paragraph">
<p>You copy data between two Trafodion tables by using the appropriate SELECT statement in the LOAD command.</p>
</div>
<div class="sect3">
<h4 id="_example">4.1.1. Example</h4>
<div class="listingblock">
<div class="content">
<pre class="CodeRay highlight"><code data-lang="text">LOAD INTO target_table SELECT * FROM source_table WHERE custkey &gt;= 1000 ;</code></pre>
</div>
</div>
</div>
</div>
<div class="sect2">
<h3 id="bulk-load-data-from-hdfs-files">4.2. Load Data From HDFS Files</h3>
<div class="paragraph">
<p>You copy your data (local or remote) into an HDFS folder. Then, you create an external Hive table (with correct fields) that points
to the HDFS folder containing the data. You may also specify a WHERE clause on the source data as a filter, if needed.
See the <a href="https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-ExternalTables">External Tables</a>
page on the <a href="https://cwiki.apache.org/confluence/display/Hive">Hive Wiki</a> for more information.</p>
</div>
<div class="paragraph">
<p>Trafodion can access columns in Hive tables having integer, string and char types.
See the <a href="https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Types">LanguageManual Types</a>
page on the <a href="https://cwiki.apache.org/confluence/display/Hive">Hive Wiki</a> for the data types available in Hive.</p>
</div>
<div class="paragraph">
<p>Overall, you do the following:</p>
</div>
<div class="olist arabic">
<ol class="arabic">
<li>
<p>Export the data on the local or remote cluster.</p>
</li>
<li>
<p>If applicable, transport files to Trafodion cluster via FTP, scp, or some other method.</p>
</li>
<li>
<p>Use LOAD referencing HIVE external tables.</p>
</li>
</ol>
</div>
<div class="sect3">
<h4 id="_example_2">4.2.1. Example</h4>
<div class="paragraph">
<p>You have a customer-demographics in a text file, which you need to load into Trafodion.
The columns are separated by <code>|</code>.</p>
</div>
<div class="paragraph">
<p>Do the following:</p>
</div>
<div class="olist arabic">
<ol class="arabic">
<li>
<p>Using trafci, define the Trafodion table where you want to load the data.</p>
<div class="listingblock">
<div class="content">
<pre class="CodeRay highlight"><code data-lang="text">CREATE TABLE customer_demographics_salt
(
cd_demo_sk INT NOT NULL
, cd_gender CHAR(1)
, cd_marital_status CHAR(1)
, cd_education_status CHAR(20)
, cd_purchase_estimate INT
, cd_credit_rating CHAR(10)
, cd_dep_count INT
, cd_dep_employed_count INT
, cd_dep_college_count INT
, PRIMARY KEY (cd_demo_sk)
)
SALT USING 4 PARTITIONS ON (cd_demo_sk)
;</code></pre>
</div>
</div>
</li>
<li>
<p>Copy the data into HDFS</p>
<div class="listingblock">
<div class="content">
<pre class="CodeRay highlight"><code data-lang="text">hadoop fs -copyFromLocal $HOME/data/customer_demographics /hive/tpcds/customer_demographics</code></pre>
</div>
</div>
</li>
<li>
<p>Using the Hive shell, create an external Hive table:</p>
<div class="listingblock">
<div class="content">
<pre class="CodeRay highlight"><code data-lang="text">CREATE EXTERNAL TABLE customer_demographics
(
cd_demo_sk INT
, cd_gender STRING
, cd_marital_status STRING
, cd_education_status STRING
, cd_purchase_estimate INT
, cd_credit_rating STRING
, cd_dep_count INT
, cd_dep_employed_count INT
, cd_dep_college_count INT
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '|'
LOCATION '/hive/tpcds/customer_demographics'
;</code></pre>
</div>
</div>
</li>
<li>
<p>Using trafci, load the Trafodion <code>customer_demographics_salt</code> table from the Hive table named
<code>hive.hive.customer_demographics</code>:</p>
<div class="listingblock">
<div class="content">
<pre class="CodeRay highlight"><code data-lang="text">&gt;&gt;LOAD INTO customer_demographics_salt
+&gt;SELECT * FROM hive.hive.customer_demographics WHERE cd_demo_sk &lt;= 5000;
Task: LOAD Status: Started Object: TRAFODION.HBASE.CUSTOMER_DEMOGRAPHICS_SALT
Task: DISABLE INDEX Status: Started Object: TRAFODION.HBASE.CUSTOMER_DEMOGRAPHICS_SALT
Task: DISABLE INDEX Status: Ended Object: TRAFODION.HBASE.CUSTOMER_DEMOGRAPHICS_SALT
Task: PREPARATION Status: Started Object: TRAFODION.HBASE.CUSTOMER_DEMOGRAPHICS_SALT
Rows Processed: 5000
Task: PREPARATION Status: Ended ET: 00:00:03.199
Task: COMPLETION Status: Started Object: TRAFODION.HBASE.CUSTOMER_DEMOGRAPHICS_SALT
Task: COMPLETION Status: Ended ET: 00:00:00.331
Task: POPULATE INDEX Status: Started Object: TRAFODION.HBASE.CUSTOMER_DEMOGRAPHICS_SALT
Task: POPULATE INDEX Status: Ended ET: 00:00:05.262</code></pre>
</div>
</div>
</li>
</ol>
</div>
</div>
</div>
<div class="sect2">
<h3 id="bulk-load-data-from-hive-tables">4.3. Load Data From Hive Tables</h3>
<div class="paragraph">
<p>You can import data from Hive using the trafci or sqlci command interface. Do the following:</p>
</div>
<div class="olist arabic">
<ol class="arabic">
<li>
<p>Set these required Control Query Defaults (CQDs) to improve load performance:</p>
<div class="listingblock">
<div class="content">
<pre class="CodeRay highlight"><code data-lang="text">CQD HIVE_MAX_STRING_LENGTH '1000'; -- if the widest column is 1KB</code></pre>
</div>
</div>
<div class="paragraph">
<p>This setting is required if there are time-related column types in the target Trafodion table.</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="CodeRay highlight"><code data-lang="text">CQD ALLOW_INCOMPATIBLE_ASSIGNMENT 'on';</code></pre>
</div>
</div>
</li>
<li>
<p>Issue the LOAD statement to load data into Trafodion tables from Hive. For example:</p>
<div class="listingblock">
<div class="content">
<pre class="CodeRay highlight"><code data-lang="text">LOAD WITH NO POPULATE INDEXES INTO trafodion.sch.demo SELECT * FROM hive.hive.demo;</code></pre>
</div>
</div>
</li>
</ol>
</div>
<div class="paragraph">
<p>See the <a href="http://trafodion.apache.org/docs/sql_reference/index.html#load_statement">Trafodion SQL Reference Manual</a>.
for the complete syntax of the LOAD statement.</p>
</div>
<div class="paragraph">
<p>If you use multiple LOAD statements to incrementally load sets of data into a single target table, then several HFiles are created
for each partition of the target table. This causes inefficient access during SELECT queries and may also cause a compaction
to be triggered based on the policies configured in the HBase settings.</p>
</div>
<div class="paragraph">
<p>To avoid this issue, it is good practice to perform a major compaction on a table that has been the target of more than two LOAD statements
in a short period of time. To perform compaction, use this <code>hbase shell</code> command:</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="CodeRay highlight"><code data-lang="text">major_compact 'TRAFODION.SCH.DEMO'</code></pre>
</div>
</div>
<div class="admonitionblock note">
<table>
<tr>
<td class="icon">
<i class="fa icon-note" title="Note"></i>
</td>
<td class="content">
The <code>major_compact</code> command returns immediately since it&#8217;s not waited. Typically, compaction of a large table takes a long time
(several minutes to hours) to complete. You can monitor the progress of compaction from the HBase Master Web user interface.
</td>
</tr>
</table>
</div>
<div class="sect3">
<h4 id="_example_3">4.3.1. Example</h4>
<div class="listingblock">
<div class="content">
<pre class="CodeRay highlight"><code data-lang="text">&gt;&gt; CQD HIVE_MAX_STRING_LENGTH '1000' ;
&gt;&gt; CQD ALLOW_INCOMPATIBLE_ASSIGNMENT 'on' ;
&gt;&gt; LOAD WITH NO POPULATE INDEXES INTO trafodion.sch.demo SELECT * FROM hive.hive.demo ;</code></pre>
</div>
</div>
</div>
</div>
<div class="sect2">
<h3 id="bulk-load-data-from-external-databases">4.4. Load Data From External Databases</h3>
<div class="paragraph">
<p>You need to import data into Hive when loading data from external databases.
Use <a href="http://sqoop.apache.org/">Apache Sqoop</a>, an open-source tools to move the data from the external database
into Hive tables on the Trafodion cluster.</p>
</div>
<div class="paragraph">
<p>Source data can be in the following formats:</p>
</div>
<table class="tableblock frame-all grid-all spread">
<colgroup>
<col style="width: 40%;">
<col style="width: 60%;">
</colgroup>
<thead>
<tr>
<th class="tableblock halign-left valign-top">Format</th>
<th class="tableblock halign-left valign-top">Examples</th>
</tr>
</thead>
<tbody>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>Structured</strong></p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">Relational databases such as Oracle or MySQL.</p></td>
</tr>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>Semi-Structured</strong></p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">Cassandra or HBase</p></td>
</tr>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>Unstructured</strong></p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">HDFS</p></td>
</tr>
</tbody>
</table>
<div class="paragraph">
<p>You use the Sqoop command-line shell for interactive commands and basic scripting.</p>
</div>
<div class="paragraph">
<p>Sqoop basics:</p>
</div>
<div class="ulist">
<ul>
<li>
<p>Generic JDBC Connector: supports JDBC T-4 Driver.</p>
</li>
<li>
<p>Configuration Language for FROM/TO jobs that specify in SQL terms.</p>
</li>
<li>
<p>Partitioner: Divide/parallelize the data streams; uses primary key by default.</p>
</li>
<li>
<p>Extractor: Uses FROM configuration for SQL statements, plus partitioner information to query data subsets.</p>
</li>
<li>
<p>Loader: Uses TO job configuration; INSERT INTO could be generated from col list or explicitly specified.</p>
</li>
<li>
<p>Destroyer: Copies staging table to final table and deletes staging table.</p>
</li>
</ul>
</div>
<div class="paragraph">
<p>See the <a href="http://sqoop.apache.org/docs/1.99.6/Sqoop5MinutesDemo.html">Sqoop 5 Minutes Demo</a> for a quick
introduction to Sqoop.</p>
</div>
<div class="sect3">
<h4 id="_install_required_software">4.4.1. Install Required Software</h4>
<div class="paragraph">
<p>By default, Sqoop is not installed on Trafodion clusters. Do the following:</p>
</div>
<div class="ulist">
<ul>
<li>
<p>Install and start Sqoop on the Trafodion cluster using either the Ambari or Cloudera Manager GUI.
See the <a href="http://sqoop.apache.org/docs/1.99.3/Installation.html">Sqoop installation instructions</a>.</p>
</li>
<li>
<p>Install <a href="http://www.oracle.com/technetwork/java/javase/downloads/jdk8-downloads-2133151.html">JDK 1.8</a></p>
</li>
<li>
<p>Install the <a href="http://www.oracle.com/technetwork/database/features/jdbc/index-091264.html">Oracle JDBC driver</a></p>
</li>
<li>
<p>Set the following environment variables:</p>
<div class="listingblock">
<div class="content">
<pre class="CodeRay highlight"><code data-lang="text">export JAVA_HOME=/opt/java/jdk1.8.0_11
export JAVA_OPTIONS=-Dmapred.child.java.opts=\-Djava.security.egd=file:/dev/urandom+</code></pre>
</div>
</div>
</li>
</ul>
</div>
</div>
<div class="sect3">
<h4 id="_sample_sqoop_commands">4.4.2. Sample Sqoop Commands</h4>
<div class="sect4">
<h5 id="_list_all_oracle_tables">List All Oracle Tables</h5>
<div class="listingblock">
<div class="content">
<pre class="CodeRay highlight"><code data-lang="text">sqoop list-tables --driver oracle.jdbc.OracleDriver
--connect jdbc:oracle:thin:@&lt;Oracle host name&gt;:&lt;port&gt;/&lt;database&gt;
--username &lt;user-name&gt; --password &lt;password&gt;</code></pre>
</div>
</div>
</div>
<div class="sect4">
<h5 id="_import_data_to_hive">Import Data to Hive</h5>
<div class="paragraph">
<p><strong>Syntax</strong></p>
</div>
<div class="listingblock">
<div class="content">
<pre class="CodeRay highlight"><code data-lang="text">sqoop import --connect jdbc:oracle:thin:@&lt;Oracle host name:port&gt;/&lt;database&gt;
--username &lt;user-name&gt; --password &lt;password&gt; --table &lt;tablename&gt;
--split-by &lt;column-name&gt; --hive-import --create-hive-table
--hive-table &lt;hive-table-name&gt; --hive-overwrite --null-string ''
--null-non-string '' --hive-drop-import-delims--verbose</code></pre>
</div>
</div>
<table class="tableblock frame-all grid-all spread">
<colgroup>
<col style="width: 40%;">
<col style="width: 60%;">
</colgroup>
<thead>
<tr>
<th class="tableblock halign-left valign-top">Parameter</th>
<th class="tableblock halign-left valign-top">Guidelines</th>
</tr>
</thead>
<tbody>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock"><code>--split-by &lt;column-name&gt;</code></p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">By default, if not specified, sqoop uses the primary key column as a splitting column, which is not optimal most of the time.
If the table does not contain a primary key, then you must manually specify the splitting column.</p></td>
</tr>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock"><code>--null-string &lt;null-string&gt;</code></p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">This is the string to be written for a null value in a string column.</p></td>
</tr>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock"><code>--null-non-string &lt;null-string&gt;</code></p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">This is the string to be written for a null value in a non-string column.</p></td>
</tr>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock"><code>--hive-drop-import-delims</code></p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">This drops <code>\n</code>, <code>\r</code>, and <code>\01</code> string fields when importing to Hive.<br>
<br>
<strong>NOTE</strong>: If the data contains \n or \r and if you do not use the hive-drop-import-delims option, then data is truncated.
You need to use additional Sqoop options during migration by specifying the delimiter that you would like to use,
which does not exist in the data itself.</p></td>
</tr>
</tbody>
</table>
</div>
</div>
<div class="sect3">
<h4 id="_example_4">4.4.3. Example</h4>
<div class="listingblock">
<div class="content">
<pre class="CodeRay highlight"><code data-lang="text">sqoop import --connect jdbc:oracle:thin:@localhost:1521/orcl
--username trafdemo --password traf123 --table CUSTOMER
--split-by CUSTNUM --hive-import --create-hive-table
--hive-table customers --hive-overwrite --null-string ''
--null-non-string '' --hive-drop-import-delims--verbose</code></pre>
</div>
</div>
</div>
</div>
</div>
</div>
<div class="sect1">
<h2 id="trickle-load">5. Trickle Load</h2>
<div class="sectionbody">
<div class="paragraph">
<p>Trafodion Trickle Load allows data to be committed in batches, with sizes ranging from 1 row to a several
thousand rows in each commit. Trickle Load uses the following SQL statements (defined in the
<a href="http://trafodion.apache.org/docs/sql_reference/index.html">Trafodion SQL Reference Manual</a>:</p>
</div>
<div class="ulist">
<ul>
<li>
<p><a href="http://trafodion.apache.org/docs/sql_reference/index.html#insert_statement">INSERT</a></p>
</li>
<li>
<p><a href="http://trafodion.apache.org/docs/sql_reference/index.html#upsert_statement">UPSERT</a></p>
</li>
<li>
<p><a href="http://trafodion.apache.org/docs/sql_reference/index.html#upsert_statement">UPSERT USING LOAD</a></p>
</li>
</ul>
</div>
<div class="paragraph">
<p>Contrary to <a href="#bulk-load">Bulk Load</a>, committed rows are immediately visible from other transactions
thereby leading to minimal latency in making newly ingested rows visible to applications and end users.</p>
</div>
<div class="paragraph">
<p>You use Trickle Load in the following situations:</p>
</div>
<div class="ulist">
<ul>
<li>
<p>Inserting and/or updating data on an ongoing basis. Typically, you create a custom JDBC or ODBC
application for this approach.</p>
</li>
<li>
<p>You want to migrate a smaller amount of data (a few millions rows). Typically, you use JDBC- or
ODBC-based ETL tools for this approach; for example:</p>
<div class="ulist">
<ul>
<li>
<p><a href="#trickle-load-odb">Trafodion odb<sup>1</sup></a></p>
</li>
<li>
<p><a href="http://squirrel-sql.sourceforge.net">SQuirrel-SQL</a></p>
</li>
<li>
<p><a href="http://www.pentaho.com/">Pentaho</a></p>
</li>
<li>
<p><a href="http://www.informatica.com/us/">Informatica</a>.</p>
</li>
</ul>
</div>
</li>
</ul>
</div>
<div class="paragraph">
<p><sup>1</sup> Trafodion obd typically achieves better load throughput than third-party ETL tools.</p>
</div>
<div class="sect2">
<h3 id="trickle-load-improving-throughput">5.1. Improving Throughput</h3>
<div class="paragraph">
<p>Trickle Load uses the HBase write path, with every row being written to the WAL (Write-Ahead Log) and HBase MemStore.
When memstore is full data is flushed to HStorefile in background.</p>
</div>
<div class="paragraph">
<p>Throughput can be improved by use of:</p>
</div>
<div class="ulist">
<ul>
<li>
<p>Rowsets or Batch Updates.</p>
</li>
<li>
<p>UPSERT instead of INSERT statements, if applicable.</p>
</li>
<li>
<p>Multiple simultaneous connections.</p>
</li>
</ul>
</div>
<div class="paragraph">
<p>In addition, when using INSERT or UPSERT USING LOAD with the objective of maximizing data ingestion throughput,
increasing the HBase table attribute MEMSTORE_FLUSHSIZE from its default value helps.</p>
</div>
<div class="paragraph">
<p>The actual value you use depends on the heap size allocated to each Region Server, the concurrent query workload, and the
number of tables for which simultaneous fast data ingestion is needed. With a heap size of 31 GB for each Region Server in
an environment with heavy concurrent query workload, setting this attribute 1 GB gives good performance.</p>
</div>
<div class="paragraph">
<p>You can specify this attribute in the HBASE_OPTIONS clause when creating the table. Alternatively, you can also set it from
the <code>hbase shell</code> through an <code>ALTER 'TRAFODION.&lt;schema-name&gt;.&lt;table-name&gt;', MEMSTORE_FLUSHSIZE &#62;&#61; '1073741824'</code> command.</p>
</div>
</div>
<div class="sect2">
<h3 id="trickle-load-odb">5.2. odb</h3>
<div class="paragraph">
<p>odb is a Linux and Windows Trafodion client that is:</p>
</div>
<div class="ulist">
<ul>
<li>
<p>ODBC based</p>
</li>
<li>
<p>Database agnostic query driver</p>
</li>
<li>
<p>Query interpreter</p>
</li>
<li>
<p>Loader and extractor</p>
</li>
</ul>
</div>
<div class="paragraph">
<p>odb may be installed on:</p>
</div>
<div class="ulist">
<ul>
<li>
<p>The Trafodion cluster.</p>
</li>
<li>
<p>The machine that contains source data</p>
</li>
<li>
<p>An intermediate machine that is being used for data loading.</p>
</li>
</ul>
</div>
<div class="paragraph">
<p>Source data can be in any database (local or remote) that supports ODBC or in flat files local to the machine hosting
the odb tool.</p>
</div>
<div class="paragraph">
<p>odd uses threads to achieve parallelism, rowsets to improve throughput. You can specify INSERT, UPSERT or UPSERT USING LOAD
insert types.</p>
</div>
<div class="admonitionblock note">
<table>
<tr>
<td class="icon">
<i class="fa icon-note" title="Note"></i>
</td>
<td class="content">
odb does not use the bulk load command LOAD, and, therefore, throughput when using odb may be lower than what can be achieved
with the bulk loader. However, when using the odb tool, source data need not be moved to the Trafodion cluster in a separate step.
</td>
</tr>
</table>
</div>
<div class="paragraph">
<p>odb allows you to access Hadoop data using one of the following methods:</p>
</div>
<div class="olist arabic">
<ol class="arabic">
<li>
<p><strong>Use Hive and its ODBC Driver</strong>: odb can access HIVE like any other relational database.
For example, you can copy to from HIVE and other databases using odb&#8217;s copy option.</p>
</li>
<li>
<p><strong>Add the hdfs.</strong>` prefix to the input or output file during loads/extracts*: The file is read/written
from/to Hadoop. odb interacts directly with the HDFS file system using <strong>libhdfs</strong>.</p>
<div class="paragraph">
<p>This option is currently available only under Linux.</p>
</div>
</li>
</ol>
</div>
<div class="paragraph">
<p>The following odb commands/features are discussed in this guide:</p>
</div>
<div class="ulist">
<ul>
<li>
<p><a href="#trickle-load-odb-load">odb Load</a></p>
</li>
<li>
<p><a href="#trickle-load-odb-copy">odb Copy</a></p>
</li>
<li>
<p><a href="#trickle-load-odb-extract">odb Extract</a></p>
</li>
<li>
<p><a href="#trickle-load-odb-transform">odb Transform</a></p>
</li>
</ul>
</div>
<div class="paragraph">
<p>See the <a href="http://trafodion.apache.org/docs/odb/index.html">Trafodion odb User Guide</a>
for installation instructions and usage syntax for the odb tool.</p>
</div>
<div class="paragraph">
<p>The following subsections assume that you&#8217;ve installed odb.</p>
</div>
<div class="sect3">
<h4 id="trickle-load-odb-throughput">5.2.1. odb Throughput</h4>
<div class="paragraph">
<p>You achieve the best throughput with odb if using the UPSERT USING LOAD option.</p>
</div>
<div class="paragraph">
<p>The default insert type used by odb is INSERT; to use UPSERT USING LOAD, please specify <code>:loadcmd=UL</code> in odb&#8217;s <code>load</code> or
<code>copy</code> command.</p>
</div>
<div class="paragraph">
<p><strong>Example</strong></p>
</div>
<div class="paragraph">
<p>Copy the table <code>mytable</code> from <code>&lt;source_catalog&gt;.&lt;source_schema&gt;</code> on the source database to <code>trafodion.my
schema.mytable</code>
on Trafodion.</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="CodeRay highlight"><code data-lang="text">odb64luo -u &lt;src_username&gt;:&lt;tgt_username&gt; -p &lt;src_pswd&gt;:&lt;tgt_pswd&gt;
-d &lt;src_dsn&gt;:&lt;tgt_dsn&gt;
-cp src:&lt;source_catalog&gt;.&lt;source_schema&gt;.mytable tgt:trafodion.myschema.mytable
:splitby=&lt;col-name&gt;:parallel=4:loadcmd=UL</code></pre>
</div>
</div>
<table class="tableblock frame-all grid-all spread">
<colgroup>
<col style="width: 25%;">
<col style="width: 75%;">
</colgroup>
<thead>
<tr>
<th class="tableblock halign-left valign-top">Option</th>
<th class="tableblock halign-left valign-top">Defines</th>
</tr>
</thead>
<tbody>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock"><code>src_username</code></p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">User name for the source database.</p></td>
</tr>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock"><code>src_pswd</code></p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">Password for the source database.</p></td>
</tr>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock"><code>src_dsn</code></p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">ODBC DSN for the source database.</p></td>
</tr>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock"><code>tgt_username</code></p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">User name for the Trafodion database.</p></td>
</tr>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock"><code>tgt_pswd</code></p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">Password for the Trafodion database.</p></td>
</tr>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock"><code>tgt_dsn</code></p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">ODBC DSN for the Trafodion database.</p></td>
</tr>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock"><code>splitby</code></p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">Defines the column used to evenly distributed values for parallelism. Consider using a leading key column.</p></td>
</tr>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock"><code>parallel=4</code></p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">Use four connections to extract data from the source database and another four connections to write data to the target Trafodion database.</p></td>
</tr>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock"><code>loadcmd=UL</code></p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">Use UPSERT USING LOAD syntax to write data.</p></td>
</tr>
</tbody>
</table>
</div>
<div class="sect3">
<h4 id="trickle-load-odb-load">5.2.2. odb Load</h4>
<div class="paragraph">
<p>Refer to the <a href="http://trafodion.apache.org/docs/odb/index.html#_load_files">Load Files</a> section
in the <a href="http://trafodion.apache.org/docs/odb/index.html">Trafodion odb User Guide</a> for complete
documentation of this option.</p>
</div>
<div class="paragraph">
<p>You use the <code>-l</code> option to load into a table from:</p>
</div>
<div class="ulist">
<ul>
<li>
<p>File or standard input (pipe)</p>
</li>
<li>
<p>gzip compressed files with no external dependencies</p>
</li>
<li>
<p>HDFS</p>
</li>
<li>
<p>Load XML files</p>
</li>
<li>
<p>Delimited and fixed format files</p>
</li>
<li>
<p>"Binary" files (example images)</p>
</li>
<li>
<p>Generic SQL scripts before/after loads</p>
</li>
</ul>
</div>
<div class="paragraph">
<p>The <code>-l</code> option provides:</p>
</div>
<div class="ulist">
<ul>
<li>
<p>Data generation (constant, sequences, random values, from external datasets)</p>
</li>
<li>
<p>Configurable rowsets</p>
</li>
</ul>
</div>
<div class="paragraph">
<p>You can load single tables or list of tables in the same session using single/parallel threads.
Limited "ETL like" functionalities are provided; for example:
SUBSTR, TRANSLITERATION, TRUNCATE target, DATE/TIME format conversion, and TOUPPER.</p>
</div>
<div class="paragraph">
<p><strong>Important Options</strong></p>
</div>
<table class="tableblock frame-all grid-all spread">
<colgroup>
<col style="width: 25%;">
<col style="width: 75%;">
</colgroup>
<thead>
<tr>
<th class="tableblock halign-left valign-top">Option</th>
<th class="tableblock halign-left valign-top">Defines</th>
</tr>
</thead>
<tbody>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock"><code>src</code></p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">Source file. If empty, then odb generates sample data.</p></td>
</tr>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock"><code>fs</code></p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">Field separator.</p></td>
</tr>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock"><code>tgt</code></p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">Target table, required.</p></td>
</tr>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock"><code>map</code></p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">Map file. A text file describing which input column is mapped to which target table column. See
<a href="#trickle-load-odb-transform">odb Transform</a> below.</p></td>
</tr>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock"><code>rows</code></p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">Rowset size to be used.</p></td>
</tr>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock"><code>parallel</code></p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">Number of connections/threads to be used.`</p></td>
</tr>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock"><code>loadcmd</code></p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock"><code>IN</code>, <code>UP</code> or <code>UL</code>. INSERT, UPSERT or UPSERT USING LOAD. Use UL for best throughput.</p></td>
</tr>
</tbody>
</table>
<div class="paragraph">
<p><strong>Example</strong></p>
</div>
<div class="listingblock">
<div class="content">
<pre class="CodeRay highlight"><code data-lang="text">$ odb64luo -u user -p xx -d dsn -l src=customer.tbl:tgt=TRAFODION.MAURIZIO.CUSTOMER \
:fs=\|:rows=1000:loadcmd=UL:truncate:parallel=4</code></pre>
</div>
</div>
<div class="paragraph">
<p>This command:</p>
</div>
<div class="ulist">
<ul>
<li>
<p>Loads the file named <code>customer.tbl</code> (<code>src=customer.tbl</code>)</p>
</li>
<li>
<p>in the table <code>TRAFODION.MAURIZIO.CUSTOMER</code> (<code>tgt=TRAFODION.MAURIZIO.CUSTOMER</code>)</p>
</li>
<li>
<p>using <code>|</code> (vertical bar) as a field separator (<code>fs=\|</code>)</p>
</li>
<li>
<p>using <code>1000 rows</code> as row-set buffer (<code>rows=1000</code>)</p>
</li>
<li>
<p>using UPSERT USING LOAD syntax to achieve better throughput</p>
</li>
<li>
<p>truncating the target table before loading (<code>truncate</code>)</p>
</li>
<li>
<p>using <code>4 parallel threads</code> to load the target table (<code>parallel=4</code>)</p>
</li>
</ul>
</div>
<div class="listingblock">
<div class="content">
<pre class="CodeRay highlight"><code data-lang="text">./odb64luo -u xx -p yy -d traf_sqws125 -l src=myfile:fs=|:tgt=TRAFODION.SEABASE.REGION:map=region.map:max=10000:rows=500:parallel=2:loadcmd=UL</code></pre>
</div>
</div>
<div class="paragraph">
<p>You can load multiple files using different <code>-l</code> options. By default odb creates as many threads (and ODBC connections) as the sum
of parallel load threads.</p>
</div>
<div class="paragraph">
<p><strong>Example</strong></p>
</div>
<div class="paragraph">
<p>Truncates and load the CUSTOMER, ORDERS and LINEITEM tables in parallel.</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="CodeRay highlight"><code data-lang="text">odb64luo -u user -p xx -d dsn -T 5 \
-l src=./data/%t.tbl.gz:tgt=TRAFODION.MAURO.CUSTOMER:fs=\
|:rows=m2:truncate:norb:parallel=4 \
-l src=./data/%t.tbl.gz:tgt=TRAFODION.MAURO.ORDERS:fs=\
|:rows=1000:truncate:norb:parallel=4 \