| <!DOCTYPE html> |
| <html lang="en"> |
| <head> |
| <meta charset="UTF-8"> |
| <!--[if IE]><meta http-equiv="X-UA-Compatible" content="IE=edge"><![endif]--> |
| <meta name="viewport" content="width=device-width, initial-scale=1.0"> |
| <meta name="generator" content="Asciidoctor 1.5.4"> |
| <title>Load and Transform Guide</title> |
| <style> |
| /** |
| * @@@ START COPYRIGHT @@@ |
| * |
| * Licensed to the Apache Software Foundation (ASF) under one |
| * or more contributor license agreements. See the NOTICE file |
| * distributed with this work for additional information |
| * regarding copyright ownership. The ASF licenses this file |
| * to you under the Apache License, Version 2.0 (the |
| * "License"); you may not use this file except in compliance |
| * with the License. You may obtain a copy of the License at |
| * |
| * http://www.apache.org/licenses/LICENSE-2.0 |
| * |
| * Unless required by applicable law or agreed to in writing, |
| * software distributed under the License is distributed on an |
| * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY |
| * KIND, either express or implied. See the License for the |
| * specific language governing permissions and limitations |
| * under the License. |
| * |
| * @@@ END COPYRIGHT @@@ |
| */ |
| /* Asciidoctor default stylesheet | MIT License | http://asciidoctor.org */ |
| /* Remove the comments around the @import statement below when using this as a custom stylesheet */ |
| /*@import "https://fonts.googleapis.com/css?family=Open+Sans:300,300italic,400,400italic,600,600italic%7CNoto+Serif:400,400italic,700,700italic%7CDroid+Sans+Mono:400";*/ |
| article,aside,details,figcaption,figure,footer,header,hgroup,main,nav,section,summary{display:block} |
| audio,canvas,video{display:inline-block} |
| audio:not([controls]){display:none;height:0} |
| [hidden],template{display:none} |
| script{display:none!important} |
| html{font-family:sans-serif;-ms-text-size-adjust:100%;-webkit-text-size-adjust:100%} |
| body{margin:0} |
| a{background:transparent} |
| a:focus{outline:thin dotted} |
| a:active,a:hover{outline:0} |
| h1{font-size:2em;margin:.67em 0} |
| abbr[title]{border-bottom:1px dotted} |
| b,strong{font-weight:bold} |
| dfn{font-style:italic} |
| hr{-moz-box-sizing:content-box;box-sizing:content-box;height:0} |
| mark{background:#ff0;color:#000} |
| code,kbd,pre,samp{font-family:monospace;font-size:1em} |
| pre{white-space:pre-wrap} |
| q{quotes:"\201C" "\201D" "\2018" "\2019"} |
| small{font-size:80%} |
| sub,sup{font-size:75%;line-height:0;position:relative;vertical-align:baseline} |
| sup{top:-.5em} |
| sub{bottom:-.25em} |
| img{border:0} |
| svg:not(:root){overflow:hidden} |
| figure{margin:0} |
| fieldset{border:1px solid silver;margin:0 2px;padding:.35em .625em .75em} |
| legend{border:0;padding:0} |
| button,input,select,textarea{font-family:inherit;font-size:100%;margin:0} |
| button,input{line-height:normal} |
| button,select{text-transform:none} |
| button,html input[type="button"],input[type="reset"],input[type="submit"]{-webkit-appearance:button;cursor:pointer} |
| button[disabled],html input[disabled]{cursor:default} |
| input[type="checkbox"],input[type="radio"]{box-sizing:border-box;padding:0} |
| input[type="search"]{-webkit-appearance:textfield;-moz-box-sizing:content-box;-webkit-box-sizing:content-box;box-sizing:content-box} |
| input[type="search"]::-webkit-search-cancel-button,input[type="search"]::-webkit-search-decoration{-webkit-appearance:none} |
| button::-moz-focus-inner,input::-moz-focus-inner{border:0;padding:0} |
| textarea{overflow:auto;vertical-align:top} |
| table{border-collapse:collapse;border-spacing:0} |
| *,*:before,*:after{-moz-box-sizing:border-box;-webkit-box-sizing:border-box;box-sizing:border-box} |
| html,body{font-size:100%} |
| body{background:#fff;color:rgba(0,0,0,.8);padding:0;margin:0;font-family:"Helvetica Neue",Helvetica,Arial,sans-serif,serif;font-weight:400;font-style:normal;line-height:1;position:relative;cursor:auto} |
| a:hover{cursor:pointer} |
| img,object,embed{max-width:100%;height:auto} |
| object,embed{height:100%} |
| img{-ms-interpolation-mode:bicubic} |
| #map_canvas img,#map_canvas embed,#map_canvas object,.map_canvas img,.map_canvas embed,.map_canvas object{max-width:none!important} |
| .left{float:left!important} |
| .right{float:right!important} |
| .text-left{text-align:left!important} |
| .text-right{text-align:right!important} |
| .text-center{text-align:center!important} |
| .text-justify{text-align:justify!important} |
| .hide{display:none} |
| .antialiased,body{-webkit-font-smoothing:antialiased} |
| img{display:inline-block;vertical-align:middle} |
| textarea{height:auto;min-height:50px} |
| select{width:100%} |
| p.lead,.paragraph.lead>p,#preamble>.sectionbody>.paragraph:first-of-type p{font-size:1.21875em;line-height:1.6} |
| .subheader,.admonitionblock td.content>.title,.audioblock>.title,.exampleblock>.title,.imageblock>.title,.listingblock>.title,.literalblock>.title,.stemblock>.title,.openblock>.title,.paragraph>.title,.quoteblock>.title,table.tableblock>.title,.verseblock>.title,.videoblock>.title,.dlist>.title,.olist>.title,.ulist>.title,.qlist>.title,.hdlist>.title{line-height:1.45;color:#3188ac;font-weight:400;margin-top:0;margin-bottom:.25em} |
| div,dl,dt,dd,ul,ol,li,h1,h2,h3,#toctitle,.sidebarblock>.content>.title,h4,h5,h6,pre,form,p,blockquote,th,td{margin:0;padding:0;direction:ltr} |
| a{color:#2156a5;text-decoration:underline;line-height:inherit} |
| a:hover,a:focus{color:#1d4b8f} |
| a img{border:none} |
| p{font-family:inherit;font-weight:400;font-size:1em;line-height:1.6;margin-bottom:1.25em;text-rendering:optimizeLegibility} |
| p aside{font-size:.875em;line-height:1.35;font-style:italic} |
| /* Defines headings */ |
| h1,h2,h3,#toctitle,.sidebarblock>.content>.title,h4,h5,h6{font-family:"Helvetica Neue",Helvetica,Arial,sans-serif;font-weight:bold;font-style:normal;color:#3188ac;text-rendering:optimizeLegibility;margin-top:1em;margin-bottom:.5em;line-height:1.0125em} |
| h1 small,h2 small,h3 small,#toctitle small,.sidebarblock>.content>.title small,h4 small,h5 small,h6 small{font-size:60%;color:#e99b8f;line-height:0} |
| h1{font-size:2.125em} |
| h2{font-size:1.6875em} |
| h3,#toctitle,.sidebarblock>.content>.title{font-size:1.375em} |
| h4,h5{font-size:1.125em} |
| h6{font-size:1em} |
| hr{border:solid #ddddd8;border-width:1px 0 0;clear:both;margin:1.25em 0 1.1875em;height:0} |
| em,i{font-style:italic;line-height:inherit} |
| strong,b{font-weight:bold;line-height:inherit} |
| small{font-size:60%;line-height:inherit} |
| /* Defines the `text` (passthru) format */ |
| code{font-family:"Droid Sans Mono","DejaVu Sans Mono",monospace;font-weight:bold;color:#3188ac} |
| ul,ol,dl{font-size:1em;line-height:1.6;margin-bottom:1.25em;list-style-position:outside;font-family:inherit} |
| ul,ol,ul.no-bullet,ol.no-bullet{margin-left:1.5em} |
| ul li ul,ul li ol{margin-left:1.25em;margin-bottom:0;font-size:1em} |
| ul.square li ul,ul.circle li ul,ul.disc li ul{list-style:inherit} |
| ul.square{list-style-type:square} |
| ul.circle{list-style-type:circle} |
| ul.disc{list-style-type:disc} |
| ul.no-bullet{list-style:none} |
| ol li ul,ol li ol{margin-left:1.25em;margin-bottom:0} |
| dl dt{margin-bottom:.3125em;font-weight:bold} |
| dl dd{margin-bottom:1.25em} |
| abbr,acronym{text-transform:uppercase;font-size:90%;color:rgba(0,0,0,.8);border-bottom:1px dotted #ddd;cursor:help} |
| abbr{text-transform:none} |
| blockquote{margin:0 0 1.25em;padding:.5625em 1.25em 0 1.1875em;border-left:1px solid #ddd} |
| blockquote cite{display:block;font-size:.9375em;color:rgba(0,0,0,.6)} |
| blockquote cite:before{content:"\2014 \0020"} |
| blockquote cite a,blockquote cite a:visited{color:rgba(0,0,0,.6)} |
| blockquote,blockquote p{line-height:1.6;color:rgba(0,0,0,.85)} |
| @media only screen and (min-width:768px){h1,h2,h3,#toctitle,.sidebarblock>.content>.title,h4,h5,h6{line-height:1.2} |
| h1{font-size:2.75em} |
| h2{font-size:2.3125em} |
| h3,#toctitle,.sidebarblock>.content>.title{font-size:1.6875em} |
| h4{font-size:1.4375em}}table{background:#fff;margin-bottom:1.25em;border:solid 1px #dedede} |
| table thead,table tfoot{background:#f7f8f7;font-weight:bold} |
| table thead tr th,table thead tr td,table tfoot tr th,table tfoot tr td{padding:.5em .625em .625em;font-size:inherit;color:rgba(0,0,0,.8);text-align:left} |
| table tr th,table tr td{padding:.5625em .625em;font-size:inherit;color:rgba(0,0,0,.8)} |
| table tr.even,table tr.alt,table tr:nth-of-type(even){background:#f8f8f7} |
| table thead tr th,table tfoot tr th,table tbody tr td,table tr td,table tfoot tr td{display:table-cell;line-height:1.6} |
| h1,h2,h3,#toctitle,.sidebarblock>.content>.title,h4,h5,h6{line-height:1.2;word-spacing:-.05em} |
| h1 strong,h2 strong,h3 strong,#toctitle strong,.sidebarblock>.content>.title strong,h4 strong,h5 strong,h6 strong{font-weight:400} |
| .clearfix:before,.clearfix:after,.float-group:before,.float-group:after{content:" ";display:table} |
| .clearfix:after,.float-group:after{clear:both} |
| *:not(pre)>code{font-size:1.0em;font-style:normal!important;letter-spacing:0;padding:.1em .5ex;word-spacing:-.15em;-webkit-border-radius:4px;border-radius:4px;line-height:1.45;text-rendering:optimizeSpeed} |
| /**:not(pre)>code{font-size:.9375em;font-style:normal!important;letter-spacing:0;padding:.1em .5ex;word-spacing:-.15em;background-color:#f7f7f8;-webkit-border-radius:4px;border-radius:4px;line-height:1.45;text-rendering:optimizeSpeed}*/ |
| pre,pre>code{line-height:1.45;color:rgba(0,0,0,.9);font-family:"Droid Sans Mono","DejaVu Sans Mono",monospace;font-weight:400;text-rendering:optimizeSpeed} |
| .keyseq{color:rgba(51,51,51,.8)} |
| kbd{display:inline-block;color:rgba(0,0,0,.8);font-size:.75em;line-height:1.4;background-color:#f7f7f7;border:1px solid #ccc;-webkit-border-radius:3px;border-radius:3px;-webkit-box-shadow:0 1px 0 rgba(0,0,0,.2),0 0 0 .1em white inset;box-shadow:0 1px 0 rgba(0,0,0,.2),0 0 0 .1em #fff inset;margin:-.15em .15em 0 .15em;padding:.2em .6em .2em .5em;vertical-align:middle;white-space:nowrap} |
| .keyseq kbd:first-child{margin-left:0} |
| .keyseq kbd:last-child{margin-right:0} |
| .menuseq,.menu{color:rgba(0,0,0,.8)} |
| b.button:before,b.button:after{position:relative;top:-1px;font-weight:400} |
| b.button:before{content:"[";padding:0 3px 0 2px} |
| b.button:after{content:"]";padding:0 2px 0 3px} |
| p a>code:hover{color:rgba(0,0,0,.9)} |
| #header,#content,#footnotes,#footer{width:100%;margin-left:auto;margin-right:auto;margin-top:0;margin-bottom:0;max-width:62.5em;*zoom:1;position:relative;padding-left:.9375em;padding-right:.9375em} |
| #header:before,#header:after,#content:before,#content:after,#footnotes:before,#footnotes:after,#footer:before,#footer:after{content:" ";display:table} |
| #header:after,#content:after,#footnotes:after,#footer:after{clear:both} |
| #content{margin-top:1.25em} |
| #content:before{content:none} |
| /* #header>h1:first-child{color:rgba(0,0,0,.85);margin-top:2.25rem;margin-bottom:0} */ |
| #header>h1:first-child{color:#3188ac;margin-top:2.25rem;margin-bottom:0} |
| #header>h1:first-child+#toc{margin-top:8px;border-top:1px solid #ddddd8} |
| #header>h1:only-child,body.toc2 #header>h1:nth-last-child(2){border-bottom:1px solid #ddddd8;padding-bottom:8px} |
| #header .details{border-bottom:1px solid #ddddd8;line-height:1.45;padding-top:.25em;padding-bottom:.25em;padding-left:.25em;color:rgba(0,0,0,.6);display:-ms-flexbox;display:-webkit-flex;display:flex;-ms-flex-flow:row wrap;-webkit-flex-flow:row wrap;flex-flow:row wrap} |
| #header .details span:first-child{margin-left:-.125em} |
| #header .details span.email a{color:rgba(0,0,0,.85)} |
| #header .details br{display:none} |
| #header .details br+span:before{content:"\00a0\2013\00a0"} |
| #header .details br+span.author:before{content:"\00a0\22c5\00a0";color:rgba(0,0,0,.85)} |
| #header .details br+span#revremark:before{content:"\00a0|\00a0"} |
| #header #revnumber{text-transform:capitalize} |
| #header #revnumber:after{content:"\00a0"} |
| #content>h1:first-child:not([class]){color:rgba(0,0,0,.85);border-bottom:1px solid #ddddd8;padding-bottom:8px;margin-top:0;padding-top:1rem;margin-bottom:1.25rem} |
| #toc{border-bottom:1px solid #efefed;padding-bottom:.5em} |
| #toc>ul{margin-left:.125em} |
| #toc ul.sectlevel0>li>a{font-style:italic} |
| #toc ul.sectlevel0 ul.sectlevel1{margin:.5em 0} |
| #toc ul{font-family:"Open Sans","DejaVu Sans",sans-serif;list-style-type:none} |
| #toc a{text-decoration:none} |
| #toc a:active{text-decoration:underline} |
| #toctitle{color:#3188ac;font-size:1.2em} |
| @media only screen and (min-width:768px){#toctitle{font-size:1.375em} |
| body.toc2{padding-left:15em;padding-right:0} |
| #toc.toc2{margin-top:0!important;background-color:#f8f8f7;position:fixed;width:15em;left:0;top:0;border-right:1px solid #efefed;border-top-width:0!important;border-bottom-width:0!important;z-index:1000;padding:1.25em 1em;height:100%;overflow:auto} |
| #toc.toc2 #toctitle{margin-top:0;font-size:1.2em} |
| #toc.toc2>ul{font-size:.9em;margin-bottom:0} |
| #toc.toc2 ul ul{margin-left:0;padding-left:1em} |
| #toc.toc2 ul.sectlevel0 ul.sectlevel1{padding-left:0;margin-top:.5em;margin-bottom:.5em} |
| body.toc2.toc-right{padding-left:0;padding-right:15em} |
| body.toc2.toc-right #toc.toc2{border-right-width:0;border-left:1px solid #efefed;left:auto;right:0}}@media only screen and (min-width:1280px){body.toc2{padding-left:20em;padding-right:0} |
| /* Controls width of panel */ |
| #toc.toc2{width:20em} |
| #toc.toc2 #toctitle{font-size:1.375em} |
| #toc.toc2>ul{font-size:.95em} |
| #toc.toc2 ul ul{padding-left:1.25em} |
| body.toc2.toc-right{padding-left:0;padding-right:20em}}#content #toc{border-style:solid;border-width:1px;border-color:#e0e0dc;margin-bottom:1.25em;padding:1.25em;background:#f8f8f7;-webkit-border-radius:4px;border-radius:4px} |
| #content #toc>:first-child{margin-top:0} |
| #content #toc>:last-child{margin-bottom:0} |
| #footer{max-width:100%;background-color:rgba(0,0,0,.8);padding:1.25em} |
| #footer-text,#footer_nav{color:rgba(255,255,255,.8);line-height:1.44} |
| #footer a{color: #990000} |
| .sect1{padding-bottom:.625em} |
| @media only screen and (min-width:768px){.sect1{padding-bottom:1.25em}}.sect1+.sect1{border-top:1px solid #efefed} |
| #content h1>a.anchor,h2>a.anchor,h3>a.anchor,#toctitle>a.anchor,.sidebarblock>.content>.title>a.anchor,h4>a.anchor,h5>a.anchor,h6>a.anchor{position:absolute;z-index:1001;width:1.5ex;margin-left:-1.5ex;display:block;text-decoration:none!important;visibility:hidden;text-align:center;font-weight:400} |
| #content h1>a.anchor:before,h2>a.anchor:before,h3>a.anchor:before,#toctitle>a.anchor:before,.sidebarblock>.content>.title>a.anchor:before,h4>a.anchor:before,h5>a.anchor:before,h6>a.anchor:before{content:"\00A7";font-size:.85em;display:block;padding-top:.1em} |
| #content h1:hover>a.anchor,#content h1>a.anchor:hover,h2:hover>a.anchor,h2>a.anchor:hover,h3:hover>a.anchor,#toctitle:hover>a.anchor,.sidebarblock>.content>.title:hover>a.anchor,h3>a.anchor:hover,#toctitle>a.anchor:hover,.sidebarblock>.content>.title>a.anchor:hover,h4:hover>a.anchor,h4>a.anchor:hover,h5:hover>a.anchor,h5>a.anchor:hover,h6:hover>a.anchor,h6>a.anchor:hover{visibility:visible} |
| #content h1>a.link,h2>a.link,h3>a.link,#toctitle>a.link,.sidebarblock>.content>.title>a.link,h4>a.link,h5>a.link,h6>a.link{color:#990000;text-decoration:none} |
| #content h1>a.link:hover,h2>a.link:hover,h3>a.link:hover,#toctitle>a.link:hover,.sidebarblock>.content>.title>a.link:hover,h4>a.link:hover,h5>a.link:hover,h6>a.link:hover{color:#a53221} |
| .audioblock,.imageblock,.literalblock,.listingblock,.stemblock,.videoblock{margin-bottom:1.25em} |
| .admonitionblock td.content>.title,.audioblock>.title,.exampleblock>.title,.imageblock>.title,.listingblock>.title,.literalblock>.title,.stemblock>.title,.openblock>.title,.paragraph>.title,.quoteblock>.title,table.tableblock>.title,.verseblock>.title,.videoblock>.title,.dlist>.title,.olist>.title,.ulist>.title,.qlist>.title,.hdlist>.title{text-rendering:optimizeLegibility;text-align:left;font-family:"Helvetica Neue",Helvetica,Arial,sans-serif,serif;font-size:1rem;font-weight:bold} |
| /* Here */ |
| table.tableblock>caption.title{white-space:nowrap;overflow:visible;max-width:0} |
| .paragraph.lead>p,#preamble>.sectionbody>.paragraph:first-of-type p{color:rgba(0,0,0,.85)} |
| table.tableblock #preamble>.sectionbody>.paragraph:first-of-type p{font-size:inherit} |
| .admonitionblock>table{border-collapse:separate;border:0;background:none;width:100%} |
| .admonitionblock>table td.icon{text-align:center;width:80px} |
| .admonitionblock>table td.icon img{max-width:none} |
| .admonitionblock>table td.icon .title{font-weight:bold;font-family:"Open Sans","DejaVu Sans",sans-serif;text-transform:uppercase} |
| .admonitionblock>table td.content{padding-left:1.125em;padding-right:1.25em;border-left:1px solid #ddddd8;color:rgba(0,0,0,.6)} |
| .admonitionblock>table td.content>:last-child>:last-child{margin-bottom:0} |
| .exampleblock>.content{border-style:solid;border-width:1px;border-color:#e6e6e6;margin-bottom:1.25em;padding:1.25em;background:#fff;-webkit-border-radius:4px;border-radius:4px} |
| .exampleblock>.content>:first-child{margin-top:0} |
| .exampleblock>.content>:last-child{margin-bottom:0} |
| .sidebarblock{border-style:solid;border-width:1px;border-color:#e0e0dc;margin-bottom:1.25em;padding:1.25em;background:#f8f8f7;-webkit-border-radius:4px;border-radius:4px} |
| .sidebarblock>:first-child{margin-top:0} |
| .sidebarblock>:last-child{margin-bottom:0} |
| .sidebarblock>.content>.title{color:#7a2518;margin-top:0;text-align:center} |
| .exampleblock>.content>:last-child>:last-child,.exampleblock>.content .olist>ol>li:last-child>:last-child,.exampleblock>.content .ulist>ul>li:last-child>:last-child,.exampleblock>.content .qlist>ol>li:last-child>:last-child,.sidebarblock>.content>:last-child>:last-child,.sidebarblock>.content .olist>ol>li:last-child>:last-child,.sidebarblock>.content .ulist>ul>li:last-child>:last-child,.sidebarblock>.content .qlist>ol>li:last-child>:last-child{margin-bottom:0} |
| .literalblock pre,.listingblock pre:not(.highlight),.listingblock pre[class="highlight"],.listingblock pre[class^="highlight "],.listingblock pre.CodeRay,.listingblock pre.prettyprint{background:#f7f7f8} |
| .sidebarblock .literalblock pre,.sidebarblock .listingblock pre:not(.highlight),.sidebarblock .listingblock pre[class="highlight"],.sidebarblock .listingblock pre[class^="highlight "],.sidebarblock .listingblock pre.CodeRay,.sidebarblock .listingblock pre.prettyprint{background:#f2f1f1} |
| .literalblock pre,.literalblock pre[class],.listingblock pre,.listingblock pre[class]{-webkit-border-radius:4px;border-radius:4px;word-wrap:break-word;padding:1em;font-size:.8125em} |
| .literalblock pre.nowrap,.literalblock pre[class].nowrap,.listingblock pre.nowrap,.listingblock pre[class].nowrap{overflow-x:auto;white-space:pre;word-wrap:normal} |
| @media only screen and (min-width:768px){.literalblock pre,.literalblock pre[class],.listingblock pre,.listingblock pre[class]{font-size:.90625em}}@media only screen and (min-width:1280px){.literalblock pre,.literalblock pre[class],.listingblock pre,.listingblock pre[class]{font-size:1em}}.literalblock.output pre{color:#f7f7f8;background-color:rgba(0,0,0,.9)} |
| .listingblock pre.highlightjs{padding:0} |
| .listingblock pre.highlightjs>code{padding:1em;-webkit-border-radius:4px;border-radius:4px} |
| .listingblock pre.prettyprint{border-width:0} |
| .listingblock>.content{position:relative} |
| .listingblock code[data-lang]:before{display:none;content:attr(data-lang);position:absolute;font-size:.75em;top:.425rem;right:.5rem;line-height:1;text-transform:uppercase;color:#999} |
| .listingblock:hover code[data-lang]:before{display:block} |
| .listingblock.terminal pre .command:before{content:attr(data-prompt);padding-right:.5em;color:#999} |
| .listingblock.terminal pre .command:not([data-prompt]):before{content:"$"} |
| table.pyhltable{border-collapse:separate;border:0;margin-bottom:0;background:none} |
| table.pyhltable td{vertical-align:top;padding-top:0;padding-bottom:0} |
| table.pyhltable td.code{padding-left:.75em;padding-right:0} |
| pre.pygments .lineno,table.pyhltable td:not(.code){color:#999;padding-left:0;padding-right:.5em;border-right:1px solid #ddddd8} |
| pre.pygments .lineno{display:inline-block;margin-right:.25em} |
| table.pyhltable .linenodiv{background:none!important;padding-right:0!important} |
| .quoteblock{margin:0 1em 1.25em 1.5em;display:table} |
| .quoteblock>.title{margin-left:-1.5em;margin-bottom:.75em} |
| .quoteblock blockquote,.quoteblock blockquote p{color:rgba(0,0,0,.85);font-size:1.15rem;line-height:1.75;word-spacing:.1em;letter-spacing:0;font-style:italic;text-align:justify} |
| .quoteblock blockquote{margin:0;padding:0;border:0} |
| .quoteblock blockquote:before{content:"\201c";float:left;font-size:2.75em;font-weight:bold;line-height:.6em;margin-left:-.6em;color:#7a2518;text-shadow:0 1px 2px rgba(0,0,0,.1)} |
| .quoteblock blockquote>.paragraph:last-child p{margin-bottom:0} |
| .quoteblock .attribution{margin-top:.5em;margin-right:.5ex;text-align:right} |
| .quoteblock .quoteblock{margin-left:0;margin-right:0;padding:.5em 0;border-left:3px solid rgba(0,0,0,.6)} |
| .quoteblock .quoteblock blockquote{padding:0 0 0 .75em} |
| .quoteblock .quoteblock blockquote:before{display:none} |
| .verseblock{margin:0 1em 1.25em 1em} |
| .verseblock pre{font-family:"Open Sans","DejaVu Sans",sans;font-size:1.15rem;color:rgba(0,0,0,.85);font-weight:300;text-rendering:optimizeLegibility} |
| .verseblock pre strong{font-weight:400} |
| .verseblock .attribution{margin-top:1.25rem;margin-left:.5ex} |
| .quoteblock .attribution,.verseblock .attribution{font-size:.9375em;line-height:1.45;font-style:italic} |
| .quoteblock .attribution br,.verseblock .attribution br{display:none} |
| .quoteblock .attribution cite,.verseblock .attribution cite{display:block;letter-spacing:-.05em;color:rgba(0,0,0,.6)} |
| .quoteblock.abstract{margin:0 0 1.25em 0;display:block} |
| .quoteblock.abstract blockquote,.quoteblock.abstract blockquote p{text-align:left;word-spacing:0} |
| .quoteblock.abstract blockquote:before,.quoteblock.abstract blockquote p:first-of-type:before{display:none} |
| table.tableblock{max-width:100%;border-collapse:separate} |
| table.tableblock td>.paragraph:last-child p>p:last-child,table.tableblock th>p:last-child,table.tableblock td>p:last-child{margin-bottom:0} |
| table.spread{width:100%} |
| table.tableblock,th.tableblock,td.tableblock{border:0 solid #dedede} |
| table.grid-all th.tableblock,table.grid-all td.tableblock{border-width:0 1px 1px 0} |
| table.grid-all tfoot>tr>th.tableblock,table.grid-all tfoot>tr>td.tableblock{border-width:1px 1px 0 0} |
| table.grid-cols th.tableblock,table.grid-cols td.tableblock{border-width:0 1px 0 0} |
| table.grid-all *>tr>.tableblock:last-child,table.grid-cols *>tr>.tableblock:last-child{border-right-width:0} |
| table.grid-rows th.tableblock,table.grid-rows td.tableblock{border-width:0 0 1px 0} |
| table.grid-all tbody>tr:last-child>th.tableblock,table.grid-all tbody>tr:last-child>td.tableblock,table.grid-all thead:last-child>tr>th.tableblock,table.grid-rows tbody>tr:last-child>th.tableblock,table.grid-rows tbody>tr:last-child>td.tableblock,table.grid-rows thead:last-child>tr>th.tableblock{border-bottom-width:0} |
| table.grid-rows tfoot>tr>th.tableblock,table.grid-rows tfoot>tr>td.tableblock{border-width:1px 0 0 0} |
| table.frame-all{border-width:1px} |
| table.frame-sides{border-width:0 1px} |
| table.frame-topbot{border-width:1px 0} |
| th.halign-left,td.halign-left{text-align:left} |
| th.halign-right,td.halign-right{text-align:right} |
| th.halign-center,td.halign-center{text-align:center} |
| th.valign-top,td.valign-top{vertical-align:top} |
| th.valign-bottom,td.valign-bottom{vertical-align:bottom} |
| th.valign-middle,td.valign-middle{vertical-align:middle} |
| table thead th,table tfoot th{font-weight:bold} |
| tbody tr th{display:table-cell;line-height:1.6;background:#f7f8f7} |
| tbody tr th,tbody tr th p,tfoot tr th,tfoot tr th p{color:rgba(0,0,0,.8);font-weight:bold} |
| p.tableblock>code:only-child{background:none;padding:0} |
| p.tableblock{font-size:1em} |
| td>div.verse{white-space:pre} |
| ol{margin-left:1.75em} |
| ul li ol{margin-left:1.5em} |
| dl dd{margin-left:1.125em} |
| dl dd:last-child,dl dd:last-child>:last-child{margin-bottom:0} |
| ol>li p,ul>li p,ul dd,ol dd,.olist .olist,.ulist .ulist,.ulist .olist,.olist .ulist{margin-bottom:.625em} |
| ul.unstyled,ol.unnumbered,ul.checklist,ul.none{list-style-type:none} |
| ul.unstyled,ol.unnumbered,ul.checklist{margin-left:.625em} |
| ul.checklist li>p:first-child>.fa-square-o:first-child,ul.checklist li>p:first-child>.fa-check-square-o:first-child{width:1em;font-size:.85em} |
| ul.checklist li>p:first-child>input[type="checkbox"]:first-child{width:1em;position:relative;top:1px} |
| ul.inline{margin:0 auto .625em auto;margin-left:-1.375em;margin-right:0;padding:0;list-style:none;overflow:hidden} |
| ul.inline>li{list-style:none;float:left;margin-left:1.375em;display:block} |
| ul.inline>li>*{display:block} |
| .unstyled dl dt{font-weight:400;font-style:normal} |
| ol.arabic{list-style-type:decimal} |
| ol.decimal{list-style-type:decimal-leading-zero} |
| ol.loweralpha{list-style-type:lower-alpha} |
| ol.upperalpha{list-style-type:upper-alpha} |
| ol.lowerroman{list-style-type:lower-roman} |
| ol.upperroman{list-style-type:upper-roman} |
| ol.lowergreek{list-style-type:lower-greek} |
| .hdlist>table,.colist>table{border:0;background:none} |
| .hdlist>table>tbody>tr,.colist>table>tbody>tr{background:none} |
| td.hdlist1{padding-right:.75em;font-weight:bold} |
| td.hdlist1,td.hdlist2{vertical-align:top} |
| .literalblock+.colist,.listingblock+.colist{margin-top:-.5em} |
| .colist>table tr>td:first-of-type{padding:0 .75em;line-height:1} |
| .colist>table tr>td:last-of-type{padding:.25em 0} |
| .thumb,.th{line-height:0;display:inline-block;border:solid 4px #fff;-webkit-box-shadow:0 0 0 1px #ddd;box-shadow:0 0 0 1px #ddd} |
| .imageblock.left,.imageblock[style*="float: left"]{margin:.25em .625em 1.25em 0} |
| .imageblock.right,.imageblock[style*="float: right"]{margin:.25em 0 1.25em .625em} |
| .imageblock>.title{margin-bottom:0} |
| .imageblock.thumb,.imageblock.th{border-width:6px} |
| .imageblock.thumb>.title,.imageblock.th>.title{padding:0 .125em} |
| .image.left,.image.right{margin-top:.25em;margin-bottom:.25em;display:inline-block;line-height:0} |
| .image.left{margin-right:.625em} |
| .image.right{margin-left:.625em} |
| a.image{text-decoration:none} |
| span.footnote,span.footnoteref{vertical-align:super;font-size:.875em} |
| span.footnote a,span.footnoteref a{text-decoration:none} |
| span.footnote a:active,span.footnoteref a:active{text-decoration:underline} |
| #footnotes{padding-top:.75em;padding-bottom:.75em;margin-bottom:.625em} |
| #footnotes hr{width:20%;min-width:6.25em;margin:-.25em 0 .75em 0;border-width:1px 0 0 0} |
| #footnotes .footnote{padding:0 .375em;line-height:1.3;font-size:.875em;margin-left:1.2em;text-indent:-1.2em;margin-bottom:.2em} |
| #footnotes .footnote a:first-of-type{font-weight:bold;text-decoration:none} |
| #footnotes .footnote:last-of-type{margin-bottom:0} |
| #content #footnotes{margin-top:-.625em;margin-bottom:0;padding:.75em 0} |
| .gist .file-data>table{border:0;background:#fff;width:100%;margin-bottom:0} |
| .gist .file-data>table td.line-data{width:99%} |
| div.unbreakable{page-break-inside:avoid} |
| .big{font-size:larger} |
| .small{font-size:smaller} |
| .underline{text-decoration:underline} |
| .overline{text-decoration:overline} |
| .line-through{text-decoration:line-through} |
| .aqua{color:#00bfbf} |
| .aqua-background{background-color:#00fafa} |
| .black{color:#000} |
| .black-background{background-color:#000} |
| .blue{color:#0000bf} |
| .blue-background{background-color:#0000fa} |
| .fuchsia{color:#bf00bf} |
| .fuchsia-background{background-color:#fa00fa} |
| .gray{color:#606060} |
| .gray-background{background-color:#7d7d7d} |
| .green{color:#006000} |
| .green-background{background-color:#007d00} |
| .lime{color:#00bf00} |
| .lime-background{background-color:#00fa00} |
| .maroon{color:#600000} |
| .maroon-background{background-color:#7d0000} |
| .navy{color:#000060} |
| .navy-background{background-color:#00007d} |
| .olive{color:#606000} |
| .olive-background{background-color:#7d7d00} |
| .purple{color:#600060} |
| .purple-background{background-color:#7d007d} |
| .red{color:#bf0000} |
| .red-background{background-color:#fa0000} |
| .silver{color:#909090} |
| .silver-background{background-color:#bcbcbc} |
| .teal{color:#006060} |
| .teal-background{background-color:#007d7d} |
| .white{color:#bfbfbf} |
| .white-background{background-color:#fafafa} |
| .yellow{color:#bfbf00} |
| .yellow-background{background-color:#fafa00} |
| span.icon>.fa{cursor:default} |
| .admonitionblock td.icon [class^="fa icon-"]{font-size:2.5em;text-shadow:1px 1px 2px rgba(0,0,0,.5);cursor:default} |
| .admonitionblock td.icon .icon-note:before{content:"\f05a";color:#19407c} |
| .admonitionblock td.icon .icon-tip:before{content:"\f0eb";text-shadow:1px 1px 2px rgba(155,155,0,.8);color:#111} |
| .admonitionblock td.icon .icon-warning:before{content:"\f071";color:#bf6900} |
| .admonitionblock td.icon .icon-caution:before{content:"\f06d";color:#bf3400} |
| .admonitionblock td.icon .icon-important:before{content:"\f06a";color:#bf0000} |
| .conum[data-value]{display:inline-block;color:#fff!important;background-color:rgba(0,0,0,.8);-webkit-border-radius:100px;border-radius:100px;text-align:center;font-size:.75em;width:1.67em;height:1.67em;line-height:1.67em;font-family:"Open Sans","DejaVu Sans",sans-serif;font-style:normal;font-weight:bold} |
| .conum[data-value] *{color:#fff!important} |
| .conum[data-value]+b{display:none} |
| .conum[data-value]:after{content:attr(data-value)} |
| pre .conum[data-value]{position:relative;top:-.125em} |
| b.conum *{color:inherit!important} |
| .conum:not([data-value]):empty{display:none} |
| h1,h2{letter-spacing:-.01em} |
| dt,th.tableblock,td.content{text-rendering:optimizeLegibility} |
| p,td.content{letter-spacing:-.01em} |
| p strong,td.content strong{letter-spacing:-.005em} |
| p,blockquote,dt,td.content{font-size:1.0625rem} |
| p{margin-bottom:1.25rem} |
| .sidebarblock p,.sidebarblock dt,.sidebarblock td.content,p.tableblock{font-size:1em} |
| .exampleblock>.content{background-color:#fffef7;border-color:#e0e0dc;-webkit-box-shadow:0 1px 4px #e0e0dc;box-shadow:0 1px 4px #e0e0dc} |
| .print-only{display:none!important} |
| @media print{@page{margin:1.25cm .75cm} |
| *{-webkit-box-shadow:none!important;box-shadow:none!important;text-shadow:none!important} |
| a{color:inherit!important;text-decoration:underline!important} |
| a.bare,a[href^="#"],a[href^="mailto:"]{text-decoration:none!important} |
| a[href^="http:"]:not(.bare):after,a[href^="https:"]:not(.bare):after{content:"(" attr(href) ")";display:inline-block;font-size:.875em;padding-left:.25em} |
| abbr[title]:after{content:" (" attr(title) ")"} |
| pre,blockquote,tr,img{page-break-inside:avoid} |
| thead{display:table-header-group} |
| img{max-width:100%!important} |
| p,blockquote,dt,td.content{font-size:1em;orphans:3;widows:3} |
| h2,h3,#toctitle,.sidebarblock>.content>.title{page-break-after:avoid} |
| #toc,.sidebarblock,.exampleblock>.content{background:none!important} |
| #toc{border-bottom:1px solid #ddddd8!important;padding-bottom:0!important} |
| .sect1{padding-bottom:0!important} |
| .sect1+.sect1{border:0!important} |
| #header>h1:first-child{margin-top:1.25rem} |
| body.book #header{text-align:center} |
| body.book #header>h1:first-child{border:0!important;margin:2.5em 0 1em 0} |
| body.book #header .details{border:0!important;display:block;padding:0!important} |
| body.book #header .details span:first-child{margin-left:0!important} |
| body.book #header .details br{display:block} |
| body.book #header .details br+span:before{content:none!important} |
| body.book #toc{border:0!important;text-align:left!important;padding:0!important;margin:0!important} |
| body.book #toc,body.book #preamble,body.book h1.sect0,body.book .sect1>h2{page-break-before:always} |
| .listingblock code[data-lang]:before{display:block} |
| #footer{background:none!important;padding:0 .9375em} |
| #footer-text{color:rgba(0,0,0,.6)!important;font-size:.9em} |
| .hide-on-print{display:none!important} |
| .print-only{display:block!important} |
| .hide-for-print{display:none!important} |
| .show-for-print{display:inherit!important}} |
| div.paragraph.indented p {padding-left: 3em;} |
| div.paragraph.indented2 p {padding-left: 6em;} |
| div.paragraph.indented3 p {padding-left: 9em;} |
| |
| </style> |
| <link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/font-awesome/4.5.0/css/font-awesome.min.css"> |
| <style> |
| /* Stylesheet for CodeRay to match GitHub theme | MIT License | http://foundation.zurb.com */ |
| /*pre.CodeRay {background-color:#f7f7f8;}*/ |
| .CodeRay .line-numbers{border-right:1px solid #d8d8d8;padding:0 0.5em 0 .25em} |
| .CodeRay span.line-numbers{display:inline-block;margin-right:.5em;color:rgba(0,0,0,.3)} |
| .CodeRay .line-numbers strong{color:rgba(0,0,0,.4)} |
| table.CodeRay{border-collapse:separate;border-spacing:0;margin-bottom:0;border:0;background:none} |
| table.CodeRay td{vertical-align: top;line-height:1.45} |
| table.CodeRay td.line-numbers{text-align:right} |
| table.CodeRay td.line-numbers>pre{padding:0;color:rgba(0,0,0,.3)} |
| table.CodeRay td.code{padding:0 0 0 .5em} |
| table.CodeRay td.code>pre{padding:0} |
| .CodeRay .debug{color:#fff !important;background:#000080 !important} |
| .CodeRay .annotation{color:#007} |
| .CodeRay .attribute-name{color:#000080} |
| .CodeRay .attribute-value{color:#700} |
| .CodeRay .binary{color:#509} |
| .CodeRay .comment{color:#998;font-style:italic} |
| .CodeRay .char{color:#04d} |
| .CodeRay .char .content{color:#04d} |
| .CodeRay .char .delimiter{color:#039} |
| .CodeRay .class{color:#458;font-weight:bold} |
| .CodeRay .complex{color:#a08} |
| .CodeRay .constant,.CodeRay .predefined-constant{color:#008080} |
| .CodeRay .color{color:#099} |
| .CodeRay .class-variable{color:#369} |
| .CodeRay .decorator{color:#b0b} |
| .CodeRay .definition{color:#099} |
| .CodeRay .delimiter{color:#000} |
| .CodeRay .doc{color:#970} |
| .CodeRay .doctype{color:#34b} |
| .CodeRay .doc-string{color:#d42} |
| .CodeRay .escape{color:#666} |
| .CodeRay .entity{color:#800} |
| .CodeRay .error{color:#808} |
| .CodeRay .exception{color:inherit} |
| .CodeRay .filename{color:#099} |
| .CodeRay .function{color:#900;font-weight:bold} |
| .CodeRay .global-variable{color:#008080} |
| .CodeRay .hex{color:#058} |
| .CodeRay .integer,.CodeRay .float{color:#099} |
| .CodeRay .include{color:#555} |
| .CodeRay .inline{color:#000} |
| .CodeRay .inline .inline{background:#ccc} |
| .CodeRay .inline .inline .inline{background:#bbb} |
| .CodeRay .inline .inline-delimiter{color:#d14} |
| .CodeRay .inline-delimiter{color:#d14} |
| .CodeRay .important{color:#555;font-weight:bold} |
| .CodeRay .interpreted{color:#b2b} |
| .CodeRay .instance-variable{color:#008080} |
| .CodeRay .label{color:#970} |
| .CodeRay .local-variable{color:#963} |
| .CodeRay .octal{color:#40e} |
| .CodeRay .predefined{color:#369} |
| .CodeRay .preprocessor{color:#579} |
| .CodeRay .pseudo-class{color:#555} |
| .CodeRay .directive{font-weight:bold} |
| .CodeRay .type{font-weight:bold} |
| .CodeRay .predefined-type{color:inherit} |
| .CodeRay .reserved,.CodeRay .keyword {color:#000;font-weight:bold} |
| .CodeRay .key{color:#808} |
| .CodeRay .key .delimiter{color:#606} |
| .CodeRay .key .char{color:#80f} |
| .CodeRay .value{color:#088} |
| .CodeRay .regexp .delimiter{color:#808} |
| .CodeRay .regexp .content{color:#808} |
| .CodeRay .regexp .modifier{color:#808} |
| .CodeRay .regexp .char{color:#d14} |
| .CodeRay .regexp .function{color:#404;font-weight:bold} |
| .CodeRay .string{color:#d20} |
| .CodeRay .string .string .string{background:#ffd0d0} |
| .CodeRay .string .content{color:#d14} |
| .CodeRay .string .char{color:#d14} |
| .CodeRay .string .delimiter{color:#d14} |
| .CodeRay .shell{color:#d14} |
| .CodeRay .shell .delimiter{color:#d14} |
| .CodeRay .symbol{color:#990073} |
| .CodeRay .symbol .content{color:#a60} |
| .CodeRay .symbol .delimiter{color:#630} |
| .CodeRay .tag{color:#008080} |
| .CodeRay .tag-special{color:#d70} |
| .CodeRay .variable{color:#036} |
| .CodeRay .insert{background:#afa} |
| .CodeRay .delete{background:#faa} |
| .CodeRay .change{color:#aaf;background:#007} |
| .CodeRay .head{color:#f8f;background:#505} |
| .CodeRay .insert .insert{color:#080} |
| .CodeRay .delete .delete{color:#800} |
| .CodeRay .change .change{color:#66f} |
| .CodeRay .head .head{color:#f4f} |
| </style> |
| </head> |
| <body class="book toc2 toc-left"> |
| <div id="header"> |
| <h1>Load and Transform Guide</h1> |
| <div class="details"> |
| <span id="revnumber">version 2.4.0</span> |
| </div> |
| <div id="toc" class="toc2"> |
| <div id="toctitle">Table of Contents</div> |
| <ul class="sectlevel1"> |
| <li><a href="#_about_this_document">1. About This Document</a> |
| <ul class="sectlevel2"> |
| <li><a href="#_intended_audience">1.1. Intended Audience</a></li> |
| <li><a href="#_new_and_changed_information">1.2. New and Changed Information</a></li> |
| <li><a href="#_notation_conventions">1.3. Notation Conventions</a></li> |
| <li><a href="#_comments_encouraged">1.4. Comments Encouraged</a></li> |
| </ul> |
| </li> |
| <li><a href="#introduction">2. Introduction</a> |
| <ul class="sectlevel2"> |
| <li><a href="#introduction-load-methods">2.1. Load Methods</a> |
| <ul class="sectlevel3"> |
| <li><a href="#introduction-insert-types">2.1.1. Insert Types</a></li> |
| </ul> |
| </li> |
| <li><a href="#_unload">2.2. Unload</a></li> |
| </ul> |
| </li> |
| <li><a href="#tables-indexes">3. Tables and Indexes</a> |
| <ul class="sectlevel2"> |
| <li><a href="#choose-primary-key">3.1. Choose Primary Key</a></li> |
| <li><a href="#salting">3.2. Salting</a></li> |
| <li><a href="#compression-encoding">3.3. Compression and Encoding</a></li> |
| <li><a href="#create-trafodion-tables-and-indexes">3.4. Create Tables and Indexes</a></li> |
| <li><a href="#_update_statistics">3.5. Update Statistics</a> |
| <ul class="sectlevel3"> |
| <li><a href="#_default_sampling">3.5.1. Default Sampling</a></li> |
| </ul> |
| </li> |
| <li><a href="#_generate_single_column_and_multi_column_histograms_from_one_statement">3.6. Generate Single-Column and Multi-Column Histograms From One Statement</a> |
| <ul class="sectlevel3"> |
| <li><a href="#_enable_update_statistics_automation">3.6.1. Enable Update Statistics Automation</a></li> |
| <li><a href="#_regenerate_histograms">3.6.2. Regenerate Histograms</a></li> |
| </ul> |
| </li> |
| </ul> |
| </li> |
| <li><a href="#bulk-load">4. Bulk Load</a> |
| <ul class="sectlevel2"> |
| <li><a href="#bulk-load-from-trafodion-tables">4.1. Load Data From Trafodion Tables</a> |
| <ul class="sectlevel3"> |
| <li><a href="#_example">4.1.1. Example</a></li> |
| </ul> |
| </li> |
| <li><a href="#bulk-load-data-from-hdfs-files">4.2. Load Data From HDFS Files</a> |
| <ul class="sectlevel3"> |
| <li><a href="#_example_2">4.2.1. Example</a></li> |
| </ul> |
| </li> |
| <li><a href="#bulk-load-data-from-hive-tables">4.3. Load Data From Hive Tables</a> |
| <ul class="sectlevel3"> |
| <li><a href="#_example_3">4.3.1. Example</a></li> |
| </ul> |
| </li> |
| <li><a href="#bulk-load-data-from-external-databases">4.4. Load Data From External Databases</a> |
| <ul class="sectlevel3"> |
| <li><a href="#_install_required_software">4.4.1. Install Required Software</a></li> |
| <li><a href="#_sample_sqoop_commands">4.4.2. Sample Sqoop Commands</a></li> |
| <li><a href="#_example_4">4.4.3. Example</a></li> |
| </ul> |
| </li> |
| </ul> |
| </li> |
| <li><a href="#trickle-load">5. Trickle Load</a> |
| <ul class="sectlevel2"> |
| <li><a href="#trickle-load-improving-throughput">5.1. Improving Throughput</a></li> |
| <li><a href="#trickle-load-odb">5.2. odb</a> |
| <ul class="sectlevel3"> |
| <li><a href="#trickle-load-odb-throughput">5.2.1. odb Throughput</a></li> |
| <li><a href="#trickle-load-odb-load">5.2.2. odb Load</a></li> |
| <li><a href="#trickle-load-odb-copy">5.2.3. odb Copy</a></li> |
| <li><a href="#trickle-load-odb-extract">5.2.4. odb Extract</a></li> |
| <li><a href="#trickle-load-odb-transform">5.2.5. odb Transform</a></li> |
| </ul> |
| </li> |
| </ul> |
| </li> |
| <li><a href="#bulk-unload">6. Bulk Unload</a></li> |
| <li><a href="#monitor-progress">7. Monitor Progress</a> |
| <ul class="sectlevel2"> |
| <li><a href="#_insert_and_upsert">7.1. INSERT and UPSERT</a></li> |
| <li><a href="#_upsert_using_load">7.2. UPSERT USING LOAD</a></li> |
| <li><a href="#_load">7.3. LOAD</a></li> |
| </ul> |
| </li> |
| <li><a href="#troubleshoot">8. Troubleshoot</a> |
| <ul class="sectlevel2"> |
| <li><a href="#_improving_throughput">8.1. Improving Throughput</a> |
| <ul class="sectlevel3"> |
| <li><a href="#_tuplelists_or_rowsets">8.1.1. Tuplelists or Rowsets</a></li> |
| <li><a href="#_native_hbase_tables">8.1.2. Native HBase Tables</a></li> |
| <li><a href="#_hive_tables">8.1.3. Hive Tables</a></li> |
| </ul> |
| </li> |
| <li><a href="#_checking_plan_quality">8.2. Checking Plan Quality</a></li> |
| <li><a href="#_update_statistics_times_out_during_sampling">8.3. UPDATE STATISTICS Times Out During Sampling</a></li> |
| <li><a href="#_index_creation_takes_too_long">8.4. Index Creation Takes Too Long</a></li> |
| <li><a href="#large-deletes">8.5. Large Deletes Take Too Long or Error Out</a></li> |
| <li><a href="#_large_upsert_using_load_on_a_table_with_index_errors_out">8.6. Large UPSERT USING LOAD On a Table With Index Errors Out</a></li> |
| </ul> |
| </li> |
| </ul> |
| </div> |
| </div> |
| <div id="content"> |
| <div id="preamble"> |
| <div class="sectionbody"> |
| <div class="paragraph"> |
| <p><strong>License Statement</strong></p> |
| </div> |
| <div class="paragraph"> |
| <p>Licensed to the Apache Software Foundation (ASF) under one or more contributor license agreements. See the NOTICE file |
| distributed with this work for additional information regarding copyright ownership. The ASF licenses this file |
| to you under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with |
| the License. You may obtain a copy of the License at <a href="http://www.apache.org/licenses/LICENSE-2.0" class="bare">http://www.apache.org/licenses/LICENSE-2.0</a></p> |
| </div> |
| <div class="paragraph"> |
| <p>Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an |
| "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the |
| specific language governing permissions and limitations under the License.</p> |
| </div> |
| <div style="page-break-after: always;"></div> |
| <div class="paragraph"> |
| <p><strong>Acknowledgements</strong></p> |
| </div> |
| <div class="paragraph"> |
| <p>Microsoft®, Windows®, Windows NT®, Windows® XP, and Windows Vista® are |
| U.S. registered trademarks of Microsoft Corporation. Intel® and Intel® |
| Itanium® are trademarks of Intel Corporation in the U.S. and other |
| countries. Java® is a registered trademark of Oracle and/or its |
| affiliates. Motif, OSF/1, UNIX®, X/Open®, and the X device is a |
| trademark of X/Open Company Ltd. in the UK and other countries.</p> |
| </div> |
| <div class="paragraph"> |
| <p>OSF, OSF/1, OSF/Motif, Motif, and Open Software Foundation are trademarks of |
| the Open Software Foundation in the U.S. and other countries. |
| © 1990, 1991, 1992, 1993 Open Software Foundation, Inc.</p> |
| </div> |
| <div class="paragraph"> |
| <p>The OSF documentation and the OSF software to which it relates are derived in |
| part from materials supplied by the following: © 1987, 1988, 1989 |
| Carnegie-Mellon University. © 1989, 1990, 1991 Digital Equipment |
| Corporation. © 1985, 1988, 1989, 1990 Encore Computer Corporation. © 1988 Free |
| Software Foundation, Inc. © 1987, 1988, 1989, 1990, 1991 Hewlett-Packard |
| Company. © 1985, 1987, 1988, 1989, 1990, 1991, 1992 International |
| Business Machines Corporation. © 1988, 1989 Massachusetts Institute of |
| Technology. © 1988, 1989, 1990 Mentat Inc. © 1988 Microsoft Corporation. |
| © 1987, 1988, 1989, 1990, 1991, |
| 1992 SecureWare, Inc. © 1990, 1991 Siemens Nixdorf Informations systeme |
| AG. © 1986, 1989, 1996, 1997 Sun Microsystems, Inc. © 1989, 1990, 1991 |
| Transarc Corporation.</p> |
| </div> |
| <div class="paragraph"> |
| <p>OSF software and documentation are based in part |
| on the Fourth Berkeley Software Distribution under license from The |
| Regents of the University of California. OSF acknowledges the following |
| individuals and institutions for their role in its development: Kenneth |
| C.R.C. Arnold, Gregory S. Couch, Conrad C. Huang, Ed James, Symmetric |
| Computer Systems, Robert Elz. © 1980, 1981, 1982, 1983, 1985, 1986, |
| 1987, 1988, 1989 Regents of the University of California. OSF MAKES NO |
| WARRANTY OF ANY KIND WITH REGARD TO THE OSF MATERIAL PROVIDED HEREIN, |
| INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY |
| AND FITNESS FOR A PARTICULAR PURPOSE. OSF shall not be liable for errors |
| contained herein or for incidental consequential damages in connection |
| with the furnishing, performance, or use of this material.</p> |
| </div> |
| <div style="page-break-after: always;"></div> |
| <div class="paragraph"> |
| <p><strong>Revision History</strong></p> |
| </div> |
| <table class="tableblock frame-all grid-all spread"> |
| <colgroup> |
| <col style="width: 50%;"> |
| <col style="width: 50%;"> |
| </colgroup> |
| <thead> |
| <tr> |
| <th class="tableblock halign-left valign-top">Version</th> |
| <th class="tableblock halign-left valign-top">Date</th> |
| </tr> |
| </thead> |
| <tbody> |
| <tr> |
| <td class="tableblock halign-left valign-top"><p class="tableblock">2.2.0</p></td> |
| <td class="tableblock halign-left valign-top"><p class="tableblock">TBD</p></td> |
| </tr> |
| <tr> |
| <td class="tableblock halign-left valign-top"><p class="tableblock">2.1.0</p></td> |
| <td class="tableblock halign-left valign-top"><p class="tableblock">May 1, 2017</p></td> |
| </tr> |
| <tr> |
| <td class="tableblock halign-left valign-top"><p class="tableblock">2.0.1</p></td> |
| <td class="tableblock halign-left valign-top"><p class="tableblock">July 7, 2016</p></td> |
| </tr> |
| <tr> |
| <td class="tableblock halign-left valign-top"><p class="tableblock">2.0.0</p></td> |
| <td class="tableblock halign-left valign-top"><p class="tableblock">June 6, 2016</p></td> |
| </tr> |
| <tr> |
| <td class="tableblock halign-left valign-top"><p class="tableblock">1.3.0</p></td> |
| <td class="tableblock halign-left valign-top"><p class="tableblock">January, 2016</p></td> |
| </tr> |
| </tbody> |
| </table> |
| </div> |
| </div> |
| <div class="sect1"> |
| <h2 id="_about_this_document">1. About This Document</h2> |
| <div class="sectionbody"> |
| <div class="paragraph"> |
| <p>This guide describes how to load and transform data into a Trafodion database.</p> |
| </div> |
| <div class="paragraph"> |
| <p>The information herein is complementary to the following Trafodion documentation:</p> |
| </div> |
| <div class="ulist"> |
| <ul> |
| <li> |
| <p><a href="http://trafodion.apache.org/docs/sql_reference/index.html">Trafodion SQL Reference Manual</a></p> |
| </li> |
| <li> |
| <p><a href="http://trafodion.apache.org/docs/odb/index.html">Trafodion odb User Guide</a></p> |
| </li> |
| </ul> |
| </div> |
| <div class="sect2"> |
| <h3 id="_intended_audience">1.1. Intended Audience</h3> |
| <div class="paragraph"> |
| <p>This guide targets anyone wanting to load data into a Trafodion database.</p> |
| </div> |
| <div class="paragraph"> |
| <p>You need to have skills in the following areas to make full use of the information in this guide:</p> |
| </div> |
| <div class="ulist"> |
| <ul> |
| <li> |
| <p>SQL DDL and DML.</p> |
| </li> |
| <li> |
| <p>Installation and configuration of Linux software.</p> |
| </li> |
| <li> |
| <p>Trafodion administration.</p> |
| </li> |
| <li> |
| <p>Depending on your data source, Java and/or Hadoop ecosystem usage.</p> |
| </li> |
| </ul> |
| </div> |
| </div> |
| <div class="sect2"> |
| <h3 id="_new_and_changed_information">1.2. New and Changed Information</h3> |
| <div class="paragraph"> |
| <p>This is a new guide.</p> |
| </div> |
| <div style="page-break-after: always;"></div> |
| </div> |
| <div class="sect2"> |
| <h3 id="_notation_conventions">1.3. Notation Conventions</h3> |
| <div class="paragraph"> |
| <p>This list summarizes the notation conventions for syntax presentation in this manual.</p> |
| </div> |
| <div class="ulist"> |
| <ul> |
| <li> |
| <p>UPPERCASE LETTERS</p> |
| <div class="paragraph"> |
| <p>Uppercase letters indicate keywords and reserved words. Type these items exactly as shown. Items not enclosed in brackets are required.</p> |
| </div> |
| <div class="listingblock"> |
| <div class="content"> |
| <pre class="CodeRay highlight"><code data-lang="text">SELECT</code></pre> |
| </div> |
| </div> |
| </li> |
| <li> |
| <p>lowercase letters</p> |
| <div class="paragraph"> |
| <p>Lowercase letters, regardless of font, indicate variable items that you supply. Items not enclosed in brackets are required.</p> |
| </div> |
| <div class="listingblock"> |
| <div class="content"> |
| <pre class="CodeRay highlight"><code data-lang="text">file-name</code></pre> |
| </div> |
| </div> |
| </li> |
| <li> |
| <p>[ ] Brackets</p> |
| <div class="paragraph"> |
| <p>Brackets enclose optional syntax items.</p> |
| </div> |
| <div class="listingblock"> |
| <div class="content"> |
| <pre class="CodeRay highlight"><code data-lang="text">DATETIME [start-field TO] end-field</code></pre> |
| </div> |
| </div> |
| <div class="paragraph"> |
| <p>A group of items enclosed in brackets is a list from which you can choose one item or none.</p> |
| </div> |
| <div class="paragraph"> |
| <p>The items in the list can be arranged either vertically, with aligned brackets on each side of the list, or horizontally, enclosed in a pair of brackets and separated by vertical lines.</p> |
| </div> |
| <div class="paragraph"> |
| <p>For example:</p> |
| </div> |
| <div class="listingblock"> |
| <div class="content"> |
| <pre class="CodeRay highlight"><code data-lang="text">DROP SCHEMA schema [CASCADE] |
| DROP SCHEMA schema [ CASCADE | RESTRICT ]</code></pre> |
| </div> |
| </div> |
| </li> |
| <li> |
| <p>{ } Braces</p> |
| <div class="paragraph"> |
| <p>Braces enclose required syntax items.</p> |
| </div> |
| <div class="listingblock"> |
| <div class="content"> |
| <pre class="CodeRay highlight"><code data-lang="text">FROM { grantee [, grantee ] ... }</code></pre> |
| </div> |
| </div> |
| <div class="paragraph"> |
| <p>A group of items enclosed in braces is a list from which you are required to choose one item.</p> |
| </div> |
| <div class="paragraph"> |
| <p>The items in the list can be arranged either vertically, with aligned braces on each side of the list, or horizontally, enclosed in a pair of braces and separated by vertical lines.</p> |
| </div> |
| <div class="paragraph"> |
| <p>For example:</p> |
| </div> |
| <div class="listingblock"> |
| <div class="content"> |
| <pre class="CodeRay highlight"><code data-lang="text">INTERVAL { start-field TO end-field } |
| { single-field } |
| INTERVAL { start-field TO end-field | single-field }</code></pre> |
| </div> |
| </div> |
| </li> |
| <li> |
| <p>| Vertical Line</p> |
| <div class="paragraph"> |
| <p>A vertical line separates alternatives in a horizontal list that is enclosed in brackets or braces.</p> |
| </div> |
| <div class="listingblock"> |
| <div class="content"> |
| <pre class="CodeRay highlight"><code data-lang="text">{expression | NULL}</code></pre> |
| </div> |
| </div> |
| </li> |
| <li> |
| <p>… Ellipsis</p> |
| <div class="paragraph"> |
| <p>An ellipsis immediately following a pair of brackets or braces indicates that you can repeat the enclosed sequence of syntax items any number of times.</p> |
| </div> |
| <div class="listingblock"> |
| <div class="content"> |
| <pre class="CodeRay highlight"><code data-lang="text">ATTRIBUTE[S] attribute [, attribute] ... |
| {, sql-expression } ...</code></pre> |
| </div> |
| </div> |
| <div class="paragraph"> |
| <p>An ellipsis immediately following a single syntax item indicates that you can repeat that syntax item any number of times.</p> |
| </div> |
| <div class="paragraph"> |
| <p>For example:</p> |
| </div> |
| <div class="listingblock"> |
| <div class="content"> |
| <pre class="CodeRay highlight"><code data-lang="text">expression-n ...</code></pre> |
| </div> |
| </div> |
| </li> |
| <li> |
| <p>Punctuation</p> |
| <div class="paragraph"> |
| <p>Parentheses, commas, semicolons, and other symbols not previously described must be typed as shown.</p> |
| </div> |
| <div class="listingblock"> |
| <div class="content"> |
| <pre class="CodeRay highlight"><code data-lang="text">DAY (datetime-expression) |
| @script-file</code></pre> |
| </div> |
| </div> |
| <div class="paragraph"> |
| <p>Quotation marks around a symbol such as a bracket or brace indicate the symbol is a required character that you must type as shown.</p> |
| </div> |
| <div class="paragraph"> |
| <p>For example:</p> |
| </div> |
| <div class="listingblock"> |
| <div class="content"> |
| <pre class="CodeRay highlight"><code data-lang="text">"{" module-name [, module-name] ... "}"</code></pre> |
| </div> |
| </div> |
| </li> |
| <li> |
| <p>Item Spacing</p> |
| <div class="paragraph"> |
| <p>Spaces shown between items are required unless one of the items is a punctuation symbol such as a parenthesis or a comma.</p> |
| </div> |
| <div class="listingblock"> |
| <div class="content"> |
| <pre class="CodeRay highlight"><code data-lang="text">DAY (datetime-expression) DAY(datetime-expression)</code></pre> |
| </div> |
| </div> |
| <div class="paragraph"> |
| <p>If there is no space between two items, spaces are not permitted. In this example, no spaces are permitted between the period and any other items:</p> |
| </div> |
| <div class="listingblock"> |
| <div class="content"> |
| <pre class="CodeRay highlight"><code data-lang="text">myfile.sh</code></pre> |
| </div> |
| </div> |
| </li> |
| <li> |
| <p>Line Spacing</p> |
| <div class="paragraph"> |
| <p>If the syntax of a command is too long to fit on a single line, each continuation line is indented three spaces and is separated from the preceding line by a blank line.</p> |
| </div> |
| <div class="paragraph"> |
| <p>This spacing distinguishes items in a continuation line from items in a vertical list of selections.</p> |
| </div> |
| <div class="listingblock"> |
| <div class="content"> |
| <pre class="CodeRay highlight"><code data-lang="text">match-value [NOT] LIKE _pattern |
| [ESCAPE esc-char-expression]</code></pre> |
| </div> |
| </div> |
| </li> |
| </ul> |
| </div> |
| </div> |
| <div class="sect2"> |
| <h3 id="_comments_encouraged">1.4. Comments Encouraged</h3> |
| <div class="paragraph"> |
| <p>We encourage your comments concerning this document. We are committed to providing documentation that meets your |
| needs. Send any errors found, suggestions for improvement, or compliments to <a href="mailto:user@trafodion.apache.org">user@trafodion.apache.org</a>.</p> |
| </div> |
| <div class="paragraph"> |
| <p>Include the document title and any comment, error found, or suggestion for improvement you have concerning this document.</p> |
| </div> |
| </div> |
| </div> |
| </div> |
| <div class="sect1"> |
| <h2 id="introduction">2. Introduction</h2> |
| <div class="sectionbody"> |
| <div class="sect2"> |
| <h3 id="introduction-load-methods">2.1. Load Methods</h3> |
| <div class="paragraph"> |
| <p>There are two methods used to load data into a Trafodion table. Both methods can run while the database is concurrently queried:</p> |
| </div> |
| <table class="tableblock frame-all grid-all spread"> |
| <colgroup> |
| <col style="width: 20%;"> |
| <col style="width: 40%;"> |
| <col style="width: 40%;"> |
| </colgroup> |
| <thead> |
| <tr> |
| <th class="tableblock halign-left valign-top">Type</th> |
| <th class="tableblock halign-left valign-top">Description</th> |
| <th class="tableblock halign-left valign-top">Methods/Tools</th> |
| </tr> |
| </thead> |
| <tbody> |
| <tr> |
| <td class="tableblock halign-left valign-top"><p class="tableblock"><strong>Bulk Load</strong></p></td> |
| <td class="tableblock halign-left valign-top"><p class="tableblock">Large data volumes<br> |
| Stage data and load in the batches</p></td> |
| <td class="tableblock halign-left valign-top"><p class="tableblock">Trafodion Bulk Loader</p></td> |
| </tr> |
| <tr> |
| <td class="tableblock halign-left valign-top"><p class="tableblock"><strong>Trickle Load</strong></p></td> |
| <td class="tableblock halign-left valign-top"><p class="tableblock">Small data volumes<br> |
| Insert data as it arrives</p></td> |
| <td class="tableblock halign-left valign-top"><p class="tableblock">ETL tool<br> |
| Custom ODBC/JDBC application<br> |
| User-Defined Functions<br> |
| odb Tool</p></td> |
| </tr> |
| </tbody> |
| </table> |
| <div class="paragraph"> |
| <p>These two methods use four types of SQL insert statements</p> |
| </div> |
| <div class="ulist"> |
| <ul> |
| <li> |
| <p><strong>Bulk Load</strong></p> |
| <div class="ulist"> |
| <ul> |
| <li> |
| <p><a href="http://trafodion.apache.org/docs/sql_reference/index.html#load_statement">LOAD</a></p> |
| </li> |
| </ul> |
| </div> |
| </li> |
| <li> |
| <p><strong>Trickle Load</strong></p> |
| <div class="ulist"> |
| <ul> |
| <li> |
| <p><a href="http://trafodion.apache.org/docs/sql_reference/index.html#insert_statement">INSERT</a></p> |
| </li> |
| <li> |
| <p><a href="http://trafodion.apache.org/docs/sql_reference/index.html#upsert_statement">UPSERT</a></p> |
| </li> |
| <li> |
| <p><a href="http://trafodion.apache.org/docs/sql_reference/index.html#upsert_statement">UPSERT USING LOAD</a></p> |
| </li> |
| </ul> |
| </div> |
| </li> |
| </ul> |
| </div> |
| <div class="paragraph"> |
| <p>The <a href="http://trafodion.apache.org/docs/sql_reference/index.html">Trafodion SQL Reference Manual</a> |
| provides syntax descriptions for these statements.</p> |
| </div> |
| <div class="paragraph"> |
| <p>The data source defines what type of load approach and method you use:</p> |
| </div> |
| <div class="ulist"> |
| <ul> |
| <li> |
| <p><strong>Bulk Load</strong> (LOAD statement)</p> |
| <div class="ulist"> |
| <ul> |
| <li> |
| <p><em>Text Files</em>: Map an external Hive table.</p> |
| </li> |
| <li> |
| <p><em>JDBC-Compliant Database</em>: Load into Hive on the Trafodion cluster using <code>sqoop</code>.</p> |
| </li> |
| <li> |
| <p><em>Hive Tables</em>: Direct load.</p> |
| </li> |
| <li> |
| <p><em>Native HBase Tables</em>: Direct load.</p> |
| </li> |
| <li> |
| <p><em>Disparate Data Source</em>: Write Java/C++ UDF to read data from source and pass rows to LOAD.</p> |
| </li> |
| </ul> |
| </div> |
| </li> |
| </ul> |
| </div> |
| <div style="page-break-after: always;"></div> |
| <div class="ulist"> |
| <ul> |
| <li> |
| <p><strong>Trickle Load</strong> (odb utility)</p> |
| <div class="ulist"> |
| <ul> |
| <li> |
| <p><em>Text Files</em>: Direct access</p> |
| </li> |
| <li> |
| <p><em>pipes</em>: Via <code>stdin</code></p> |
| </li> |
| <li> |
| <p><em>ODBC-Compliant Database</em>: odb COPY command, no intermediate storage</p> |
| </li> |
| </ul> |
| </div> |
| </li> |
| </ul> |
| </div> |
| <div class="paragraph"> |
| <p>For more information, refer to:</p> |
| </div> |
| <div class="ulist"> |
| <ul> |
| <li> |
| <p><a href="#bulk-load">Bulk Load</a></p> |
| </li> |
| <li> |
| <p><a href="#trickle-load">Trickle Load</a></p> |
| </li> |
| </ul> |
| </div> |
| <div style="page-break-after: always;"></div> |
| <div class="sect3"> |
| <h4 id="introduction-insert-types">2.1.1. Insert Types</h4> |
| <div class="paragraph"> |
| <p>The following insert types are supported in Trafodion:</p> |
| </div> |
| <div class="ulist"> |
| <ul> |
| <li> |
| <p><code>INSERT INTO T …</code></p> |
| </li> |
| <li> |
| <p><code>UPSERT INTO T …</code></p> |
| </li> |
| <li> |
| <p><code>UPSERT USING LOAD INTO T …</code></p> |
| </li> |
| <li> |
| <p><code>LOAD INTO T …</code></p> |
| </li> |
| </ul> |
| </div> |
| <div class="paragraph"> |
| <p>The following table compares the different insert types:</p> |
| </div> |
| <table class="tableblock frame-all grid-all spread"> |
| <colgroup> |
| <col style="width: 20%;"> |
| <col style="width: 20%;"> |
| <col style="width: 20%;"> |
| <col style="width: 20%;"> |
| <col style="width: 20%;"> |
| </colgroup> |
| <thead> |
| <tr> |
| <th class="tableblock halign-left valign-top">Characteristic</th> |
| <th class="tableblock halign-left valign-top">INSERT</th> |
| <th class="tableblock halign-left valign-top">UPSERT</th> |
| <th class="tableblock halign-left valign-top">UPSERT USING LOAD</th> |
| <th class="tableblock halign-left valign-top">LOAD</th> |
| </tr> |
| </thead> |
| <tbody> |
| <tr> |
| <td class="tableblock halign-left valign-top"><p class="tableblock"><strong>Transaction</strong></p></td> |
| <td class="tableblock halign-left valign-top"><p class="tableblock">Yes</p></td> |
| <td class="tableblock halign-left valign-top"><p class="tableblock">Yes</p></td> |
| <td class="tableblock halign-left valign-top"><p class="tableblock">No, uses HBase WAL for recovery</p></td> |
| <td class="tableblock halign-left valign-top"><p class="tableblock">No, uses snapshot for recovery</p></td> |
| </tr> |
| <tr> |
| <td class="tableblock halign-left valign-top"><p class="tableblock"><strong>Method of Operation</strong></p></td> |
| <td class="tableblock halign-left valign-top"><p class="tableblock">Uses the standard HBase write path through its <code>CheckAndPut</code> call. Rows are held in transaction co-processor memory until the transaction is committed.</p></td> |
| <td class="tableblock halign-left valign-top"><p class="tableblock">Uses the standard HBase write path through its <code>Put</code> call.</p></td> |
| <td class="tableblock halign-left valign-top"><p class="tableblock">Uses the standard HBase write path through its <code>Put</code> call.</p></td> |
| <td class="tableblock halign-left valign-top"><p class="tableblock">Uses the HBase bulk load write path and creates HFiles directly, bypassing HBase RegionServers for most of its operation.</p></td> |
| </tr> |
| <tr> |
| <td class="tableblock halign-left valign-top"><p class="tableblock"><strong>Uniqueness Constraint</strong></p></td> |
| <td class="tableblock halign-left valign-top"><p class="tableblock">Enforced</p></td> |
| <td class="tableblock halign-left valign-top"><p class="tableblock">Not enforced. New row with the same key value overwrites previous row.</p></td> |
| <td class="tableblock halign-left valign-top"><p class="tableblock">Not enforced. New row with same key value overwrites the previous row.</p></td> |
| <td class="tableblock halign-left valign-top"><p class="tableblock">Enforced only within the set of rows in a single statement. Not enforced with rows already in the table.</p></td> |
| </tr> |
| <tr> |
| <td class="tableblock halign-left valign-top"><p class="tableblock"><strong>Index</strong></p></td> |
| <td class="tableblock halign-left valign-top"><p class="tableblock">Can be used on a table with an index.</p></td> |
| <td class="tableblock halign-left valign-top"><p class="tableblock">Can be used on a table with an index.</p></td> |
| <td class="tableblock halign-left valign-top"><p class="tableblock">When used on a table with an index, it reverts to UPSERT.</p></td> |
| <td class="tableblock halign-left valign-top"><p class="tableblock">Can be used on a table with an index. Index is off-line during the LOAD.</p></td> |
| </tr> |
| <tr> |
| <td class="tableblock halign-left valign-top"><p class="tableblock"><strong>Max Size/Invocation</strong></p></td> |
| <td class="tableblock halign-left valign-top"><p class="tableblock">10,000 * n<sup>1</sup> rows</p></td> |
| <td class="tableblock halign-left valign-top"><p class="tableblock">10,000 * n<sup>1</sup> rows</p></td> |
| <td class="tableblock halign-left valign-top"><p class="tableblock">5 million * n<sup>1</sup> rows</p></td> |
| <td class="tableblock halign-left valign-top"><p class="tableblock">2 billion * n<sup>1</sup> rows</p></td> |
| </tr> |
| <tr> |
| <td class="tableblock halign-left valign-top"><p class="tableblock"><strong>Min Size/Invocation</strong></p></td> |
| <td class="tableblock halign-left valign-top"><p class="tableblock">1 row</p></td> |
| <td class="tableblock halign-left valign-top"><p class="tableblock">1 row</p></td> |
| <td class="tableblock halign-left valign-top"><p class="tableblock">1 row</p></td> |
| <td class="tableblock halign-left valign-top"><p class="tableblock">Suitable for greater than 1 million * n<sup>1</sup> rows</p></td> |
| </tr> |
| <tr> |
| <td class="tableblock halign-left valign-top"><p class="tableblock"><strong>Speed</strong></p></td> |
| <td class="tableblock halign-left valign-top"><p class="tableblock">Slowest</p></td> |
| <td class="tableblock halign-left valign-top"><p class="tableblock">Faster than INSERT</p></td> |
| <td class="tableblock halign-left valign-top"><p class="tableblock">Faster than UPSERT</p></td> |
| <td class="tableblock halign-left valign-top"><p class="tableblock">Fastest</p></td> |
| </tr> |
| </tbody> |
| </table> |
| <div class="paragraph"> |
| <p><sup>1</sup> <strong>n</strong> is the number of nodes in each invocation.</p> |
| </div> |
| <div class="paragraph"> |
| <p>Throughput, max/min sizes depends on multiple factors:</p> |
| </div> |
| <div class="ulist"> |
| <ul> |
| <li> |
| <p>Format of rows in Trafodion table (aligned format or not).</p> |
| </li> |
| <li> |
| <p>Length of row.</p> |
| </li> |
| <li> |
| <p>Number of columns in row.</p> |
| </li> |
| <li> |
| <p>Data type of columns.</p> |
| </li> |
| <li> |
| <p>Network between nodes in cluster.</p> |
| </li> |
| <li> |
| <p>WAL setting.</p> |
| </li> |
| <li> |
| <p>Number of clients.</p> |
| </li> |
| <li> |
| <p>Use of rowsets.</p> |
| </li> |
| </ul> |
| </div> |
| </div> |
| </div> |
| <div class="sect2"> |
| <h3 id="_unload">2.2. Unload</h3> |
| <div class="paragraph"> |
| <p>The Trafodion UNLOAD statement exports data from Trafodion tables into an HDFS directory. Refer to <a href="#bulk-unload">Bulk Unload</a> for more information.</p> |
| </div> |
| </div> |
| </div> |
| </div> |
| <div class="sect1"> |
| <h2 id="tables-indexes">3. Tables and Indexes</h2> |
| <div class="sectionbody"> |
| <div class="paragraph"> |
| <p>The following guidance helps you set up your tables and indexes for better load performance.</p> |
| </div> |
| <div class="sect2"> |
| <h3 id="choose-primary-key">3.1. Choose Primary Key</h3> |
| <div class="paragraph"> |
| <p>The primary key for a Trafodion table must be chosen based on the workload that accesses the table.</p> |
| </div> |
| <div class="paragraph"> |
| <p>Keyed access to Trafodion tables is very efficient since HBase is a key-value store. You need to analyze the queries |
| that are used to access the tables to understand their predicates and join conditions. Once identified, you can |
| choose a primary key that ensures that the leading key columns have highly selective predicates applied to them.</p> |
| </div> |
| <div class="paragraph"> |
| <p>This technique limits the number of rows that need to scanned in the HBase. Trafodion uses MDAM (Multi Dimensional Access Method) to limit |
| the rows scanned when predicates are present to only trailing key columns and not the leading key column. MDAM works best when the |
| unique entry count of leading key columns (on which predicates are absent) is low.</p> |
| </div> |
| </div> |
| <div class="sect2"> |
| <h3 id="salting">3.2. Salting</h3> |
| <div class="paragraph"> |
| <p>With range partitioned data in some workloads, certain key ranges of data may see more access than other key ranges. This can lead to an |
| unbalanced usage pattern with some HBase RegionServers handling most of the load. This behavior is referred to as "hot-spotting."</p> |
| </div> |
| <div class="paragraph"> |
| <p>With Native HBase tables, hot-spotting is often addressed by designing appropriate keys. In Trafodion, once you choose the key to a table, as |
| discussed in <a href="#choose-primary-key">Choose Primary Key</a>, you can use <strong>salting</strong> to distribute the data evenly. Salting applies a |
| hash function to the salt keys and distributes data to partitions based on this hash value. The hash value is physically stored in the |
| table as the leading key value. Each split of the table will have only one salt key value.</p> |
| </div> |
| <div class="paragraph"> |
| <p>The salting key can be any subset (including the whole set) of the primary key. It is a good practice to keep the salting key as small |
| as possible. The key should provide an even distribution of data, which can be achieved when the key values have a large unique entry |
| count and no significant skew.</p> |
| </div> |
| <div class="paragraph"> |
| <p>The number of partitions must also be specified during table creation. You choose the number of partition depending on the size of the |
| cluster and the expected size of the table. A salted table can split if more data is added to it than initially estimated. If this |
| happens, then more than one partition having rows with the same salt value, which may result in suboptimal execution plans for the table.</p> |
| </div> |
| <div style="page-break-after: always;"></div> |
| <div class="paragraph"> |
| <p>You can also choose not to salt Trafodion tables. This is similar to range partitioning in a traditional database. The number of partitions |
| grows with the size of the table, and range boundaries are determined by HBase based on the specified split policy.</p> |
| </div> |
| </div> |
| <div class="sect2"> |
| <h3 id="compression-encoding">3.3. Compression and Encoding</h3> |
| <div class="paragraph"> |
| <p>Large Trafodion tables must be encoded and compressed. Trafodion tables that have a large key or several columns grow in size to 10X or more |
| when compared to a Hive table with equivalent data since HBase stores the key separately for every column in a row.</p> |
| </div> |
| <div class="paragraph"> |
| <p>HBase provides several types of encoding to avoid storing the same key value to disk for every column in the row. HBase also supports various |
| types of compression of the entire data block, regardless whether it is encoded or not. |
| See <a href="http://hbase.apache.org/book.html#compression">Appendix E: Compression and Data Block Encoding In HBase</a> in the |
| <a href="http://hbase.apache.org/book.html">Apache HBase Reference Guide</a> for a comparison of various compression and encoding algorithms. Use the |
| information in the <a href="http://hbase.apache.org/book.html#data.block.encoding.types">Which Compressor or Data Block Encoder To Use</a> section to |
| determine the best compression technique for your tables. |
| <<<</p> |
| </div> |
| </div> |
| <div class="sect2"> |
| <h3 id="create-trafodion-tables-and-indexes">3.4. Create Tables and Indexes</h3> |
| <div class="paragraph"> |
| <p>Create Trafodion tables using the CREATE TABLE statements with the <code>SALT USING <num> PARTITIONS</code> clause for salting and |
| the <code>HBASE_OPTIONS</code> clause for compression and encoding.</p> |
| </div> |
| <div class="paragraph"> |
| <p><strong>Example</strong></p> |
| </div> |
| <div class="listingblock"> |
| <div class="content"> |
| <pre class="CodeRay highlight"><code data-lang="text">CREATE TABLE trafodion.sch.demo |
| ( demo_sk INT NOT NULL |
| , name VARCHAR(100) |
| , PRIMARY KEY (demo_sk) |
| ) |
| HBASE_OPTIONS |
| ( DATA_BLOCK_ENCODING = 'FAST_DIFF' |
| , COMPRESSION = 'SNAPPY' |
| , MEMSTORE_FLUSH_SIZE = '1073741824' |
| ) |
| SALT USING 8 PARTITIONS ON (demo_sk);</code></pre> |
| </div> |
| </div> |
| <div class="paragraph"> |
| <p>ANY indexes on the table may be salted or not. However, if they are salted, their salting key and number of partitions must be the same as the table.</p> |
| </div> |
| <div class="paragraph"> |
| <p><strong>Example</strong></p> |
| </div> |
| <div class="listingblock"> |
| <div class="content"> |
| <pre class="CodeRay highlight"><code data-lang="text">CREATE INDEX demo_ix ON sch.demo(name) |
| HBASE_OPTIONS |
| ( DATA_BLOCK_ENCODING = 'FAST_DIFF' |
| , COMPRESSION = 'GZ' |
| ) |
| SALT LIKE TABLE;</code></pre> |
| </div> |
| </div> |
| </div> |
| <div class="sect2"> |
| <h3 id="_update_statistics">3.5. Update Statistics</h3> |
| <div class="paragraph"> |
| <p>To generate good plans that allow queries to execute quickly and use resources wisely, the Trafodion Optimizer must have a good idea about how the |
| values of columns are distributed, the number of distinct values, and so on. Trafodion supplies this information to the optimizer in the |
| form of histograms generated by executing the UPDATE STATISTICS statement. See the |
| <a href="http://trafodion.apache.org/docs/sql_reference/index.html#update_statistics_statement">Trafodion SQL Reference Manual</a> for a full |
| description of this statement.</p> |
| </div> |
| <div class="sect3"> |
| <h4 id="_default_sampling">3.5.1. Default Sampling</h4> |
| <div class="paragraph"> |
| <p>While accurate statistics are important, the time required to generate them by reading every row in the table may be prohibitive and is |
| usually unnecessary. Random sampling of the rows of the table can give adequate results in a fraction of the time required to read all |
| the values in the table. For most situations, the best option is to simply specify SAMPLE at the end of the UPDATE STATISTICS statement, |
| which will use the default sampling protocol. For example, to use default sampling in the construction of histograms for each column of |
| table T1, you would execute the following statement:</p> |
| </div> |
| <div class="listingblock"> |
| <div class="content"> |
| <pre class="CodeRay highlight"><code data-lang="text">UPDATE STATISTICS FOR TABLE t1 ON EVERY COLUMN SAMPLE;</code></pre> |
| </div> |
| </div> |
| <div class="paragraph"> |
| <p>This default sampling protocol uses a high sampling rate for small tables, reducing the rate with a steep gradient until hitting 1% and |
| capping the sample size at one million rows. The specific details of default sampling are as follows:</p> |
| </div> |
| <div class="ulist"> |
| <ul> |
| <li> |
| <p>Use the full table for tables up to 10,000 rows.</p> |
| </li> |
| <li> |
| <p>For table sizes from 10,000 up to a million rows, 10,000 rows are randomly sampled. In effect, this causes the sampling rate to decline |
| from 100% to 1% as a function of increasing table size.</p> |
| </li> |
| <li> |
| <p>For tables with one million to 100 million rows, use a 1% random sample.</p> |
| </li> |
| <li> |
| <p>For tables exceeding 100 million rows, the sampling rate is calculated as 1 million divided by the number of rows in the table. |
| This limits the overall sample size to 1 million rows while ensuring uniform random sampling across the entire table.</p> |
| </li> |
| </ul> |
| </div> |
| </div> |
| </div> |
| <div class="sect2"> |
| <h3 id="_generate_single_column_and_multi_column_histograms_from_one_statement">3.6. Generate Single-Column and Multi-Column Histograms From One Statement</h3> |
| <div class="paragraph"> |
| <p>If you use the ON EVERY COLUMN syntax in an UPDATE STATISTICS statement, then it is important to realize that multi-column histograms can be |
| requested in the same statement. For example, if you wanted to generate a histogram for each single column of table T1, as well as |
| multi-column histograms for column sets (c1, c2) and (c5, c6, c7), then you could use the following statement:</p> |
| </div> |
| <div class="listingblock"> |
| <div class="content"> |
| <pre class="CodeRay highlight"><code data-lang="text">UPDATE STATISTICS FOR TABLE t1 ON EVERY COLUMN, (c1,c2), (c5,c6,c7) SAMPLE;</code></pre> |
| </div> |
| </div> |
| <div class="paragraph"> |
| <p>In terms of the end result, this is equivalent to the following pair of statements:</p> |
| </div> |
| <div class="listingblock"> |
| <div class="content"> |
| <pre class="CodeRay highlight"><code data-lang="text">UPDATE STATISTICS FOR TABLE t1 ON EVERY COLUMN SAMPLE; |
| UPDATE STATISTICS FOR TABLE t1 ON (c1, c2), (c5, c6, c7) SAMPLE;</code></pre> |
| </div> |
| </div> |
| <div class="paragraph"> |
| <p>However, the performance is superior when they are combined into a single statement because a multi-column histogram depends |
| on the single-column histograms of its component columns. Therefore, separating the generation of single-column and multi-column histograms |
| for a table into two statements leads to redundantly calculating some of the single-column histograms. Even though the |
| relevant single-column histograms already exist, they are recomputed at the time the multi-column histograms are generated.</p> |
| </div> |
| <div class="sect3"> |
| <h4 id="_enable_update_statistics_automation">3.6.1. Enable Update Statistics Automation</h4> |
| <div class="paragraph"> |
| <p>If a standard set of queries is run on a regular basis, then one way to generate only those histograms that are needed for efficient execution |
| of those queries is to enable update statistics automation, and then PREPARE each of the queries:</p> |
| </div> |
| <div class="listingblock"> |
| <div class="content"> |
| <pre class="CodeRay highlight"><code data-lang="text">CONTROL QUERY DEFAULT USTAT_AUTOMATION_INTERVAL '1440'; |
| PREPARE s FROM SELECT...;</code></pre> |
| </div> |
| </div> |
| <div class="paragraph"> |
| <p>The value of the CQD USTAT_AUTOMATION_INTERVAL is intended to determine the automation interval (in minutes) for update statistics |
| automation. The PREPARE statement causes the Trafodion Compiler to compile and optimize a query without executing it. In the process |
| of doing so with automation enabled, any histograms needed by the optimizer that are missing causes those columns to be marked |
| as needing histograms. Then, the following UPDATE STATISTICS statement can be run against each table to generate the needed histograms:</p> |
| </div> |
| <div class="listingblock"> |
| <div class="content"> |
| <pre class="CodeRay highlight"><code data-lang="text">UPDATE STATISTICS FOR TABLE <table-name> ON NECESSARY COLUMNS SAMPLE;</code></pre> |
| </div> |
| </div> |
| </div> |
| <div class="sect3"> |
| <h4 id="_regenerate_histograms">3.6.2. Regenerate Histograms</h4> |
| <div class="paragraph"> |
| <p>Histograms can become "stale" as the underlying data changes and possibly reflects a different distribution of values, although |
| it is possible that data turnover or accumulation can be high while maintaining the same distribution. To ensure that statistics |
| remain accurate, you should regenerate histograms for a table once significant changes have been made to that table since its |
| histograms were last generated. To refresh existing histograms without adding new ones, use the following statement:</p> |
| </div> |
| <div class="listingblock"> |
| <div class="content"> |
| <pre class="CodeRay highlight"><code data-lang="text">UPDATE STATISTICS FOR TABLE <table-name> ON EXISTING COLUMNS SAMPLE;</code></pre> |
| </div> |
| </div> |
| <div class="paragraph"> |
| <p>The critical set of histograms that were previously generated with the ON NECESSARY COLUMNS syntax can be periodically regenerated |
| using ON EXISTING COLUMNS. Note that using ON NECESSARY COLUMNS will only identify those columns that have been previously |
| requested by the optimizer but do not exist. The current implementation of automation does not know which existing histograms might be stale.</p> |
| </div> |
| </div> |
| </div> |
| </div> |
| </div> |
| <div class="sect1"> |
| <h2 id="bulk-load">4. Bulk Load</h2> |
| <div class="sectionbody"> |
| <div class="paragraph"> |
| <p>The LOAD statement enables batch loading large volumes of data efficiently in a scalable manner.</p> |
| </div> |
| <div class="paragraph"> |
| <p>See the <a href="http://trafodion.apache.org/docs/sql_reference/index.html#load_statement">Trafodion SQL Reference Manual</a> |
| for a full description of this SQL statement.</p> |
| </div> |
| <div class="paragraph"> |
| <p>You can bulk-load data using one of the following methods:</p> |
| </div> |
| <div class="ulist"> |
| <ul> |
| <li> |
| <p><a href="#bulk-load-data-from-trafodion-tables">Load Data From Trafodion Tables</a></p> |
| </li> |
| <li> |
| <p><a href="#bulk-load-data-from-hdfs-files">Load Data From HDFS Files</a></p> |
| </li> |
| <li> |
| <p><a href="#bulk-load-data-from-hive-tables">Load Data From Hive Tables</a></p> |
| </li> |
| <li> |
| <p><a href="#bulk-load-data-from-external-databases">Load Data From External Databases</a></p> |
| </li> |
| </ul> |
| </div> |
| <div class="sect2"> |
| <h3 id="bulk-load-from-trafodion-tables">4.1. Load Data From Trafodion Tables</h3> |
| <div class="paragraph"> |
| <p>You copy data between two Trafodion tables by using the appropriate SELECT statement in the LOAD command.</p> |
| </div> |
| <div class="sect3"> |
| <h4 id="_example">4.1.1. Example</h4> |
| <div class="listingblock"> |
| <div class="content"> |
| <pre class="CodeRay highlight"><code data-lang="text">LOAD INTO target_table SELECT * FROM source_table WHERE custkey >= 1000 ;</code></pre> |
| </div> |
| </div> |
| </div> |
| </div> |
| <div class="sect2"> |
| <h3 id="bulk-load-data-from-hdfs-files">4.2. Load Data From HDFS Files</h3> |
| <div class="paragraph"> |
| <p>You copy your data (local or remote) into an HDFS folder. Then, you create an external Hive table (with correct fields) that points |
| to the HDFS folder containing the data. You may also specify a WHERE clause on the source data as a filter, if needed. |
| See the <a href="https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-ExternalTables">External Tables</a> |
| page on the <a href="https://cwiki.apache.org/confluence/display/Hive">Hive Wiki</a> for more information.</p> |
| </div> |
| <div class="paragraph"> |
| <p>Trafodion can access columns in Hive tables having integer, string and char types. |
| See the <a href="https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Types">LanguageManual Types</a> |
| page on the <a href="https://cwiki.apache.org/confluence/display/Hive">Hive Wiki</a> for the data types available in Hive.</p> |
| </div> |
| <div class="paragraph"> |
| <p>Overall, you do the following:</p> |
| </div> |
| <div class="olist arabic"> |
| <ol class="arabic"> |
| <li> |
| <p>Export the data on the local or remote cluster.</p> |
| </li> |
| <li> |
| <p>If applicable, transport files to Trafodion cluster via FTP, scp, or some other method.</p> |
| </li> |
| <li> |
| <p>Use LOAD referencing HIVE external tables.</p> |
| </li> |
| </ol> |
| </div> |
| <div class="sect3"> |
| <h4 id="_example_2">4.2.1. Example</h4> |
| <div class="paragraph"> |
| <p>You have a customer-demographics in a text file, which you need to load into Trafodion. |
| The columns are separated by <code>|</code>.</p> |
| </div> |
| <div class="paragraph"> |
| <p>Do the following:</p> |
| </div> |
| <div class="olist arabic"> |
| <ol class="arabic"> |
| <li> |
| <p>Using trafci, define the Trafodion table where you want to load the data.</p> |
| <div class="listingblock"> |
| <div class="content"> |
| <pre class="CodeRay highlight"><code data-lang="text">CREATE TABLE customer_demographics_salt |
| ( |
| cd_demo_sk INT NOT NULL |
| , cd_gender CHAR(1) |
| , cd_marital_status CHAR(1) |
| , cd_education_status CHAR(20) |
| , cd_purchase_estimate INT |
| , cd_credit_rating CHAR(10) |
| , cd_dep_count INT |
| , cd_dep_employed_count INT |
| , cd_dep_college_count INT |
| , PRIMARY KEY (cd_demo_sk) |
| ) |
| SALT USING 4 PARTITIONS ON (cd_demo_sk) |
| ;</code></pre> |
| </div> |
| </div> |
| </li> |
| <li> |
| <p>Copy the data into HDFS</p> |
| <div class="listingblock"> |
| <div class="content"> |
| <pre class="CodeRay highlight"><code data-lang="text">hadoop fs -copyFromLocal $HOME/data/customer_demographics /hive/tpcds/customer_demographics</code></pre> |
| </div> |
| </div> |
| </li> |
| <li> |
| <p>Using the Hive shell, create an external Hive table:</p> |
| <div class="listingblock"> |
| <div class="content"> |
| <pre class="CodeRay highlight"><code data-lang="text">CREATE EXTERNAL TABLE customer_demographics |
| ( |
| cd_demo_sk INT |
| , cd_gender STRING |
| , cd_marital_status STRING |
| , cd_education_status STRING |
| , cd_purchase_estimate INT |
| , cd_credit_rating STRING |
| , cd_dep_count INT |
| , cd_dep_employed_count INT |
| , cd_dep_college_count INT |
| ) |
| ROW FORMAT DELIMITED |
| FIELDS TERMINATED BY '|' |
| LOCATION '/hive/tpcds/customer_demographics' |
| ;</code></pre> |
| </div> |
| </div> |
| </li> |
| <li> |
| <p>Using trafci, load the Trafodion <code>customer_demographics_salt</code> table from the Hive table named |
| <code>hive.hive.customer_demographics</code>:</p> |
| <div class="listingblock"> |
| <div class="content"> |
| <pre class="CodeRay highlight"><code data-lang="text">>>LOAD INTO customer_demographics_salt |
| +>SELECT * FROM hive.hive.customer_demographics WHERE cd_demo_sk <= 5000; |
| Task: LOAD Status: Started Object: TRAFODION.HBASE.CUSTOMER_DEMOGRAPHICS_SALT |
| Task: DISABLE INDEX Status: Started Object: TRAFODION.HBASE.CUSTOMER_DEMOGRAPHICS_SALT |
| Task: DISABLE INDEX Status: Ended Object: TRAFODION.HBASE.CUSTOMER_DEMOGRAPHICS_SALT |
| Task: PREPARATION Status: Started Object: TRAFODION.HBASE.CUSTOMER_DEMOGRAPHICS_SALT |
| Rows Processed: 5000 |
| Task: PREPARATION Status: Ended ET: 00:00:03.199 |
| Task: COMPLETION Status: Started Object: TRAFODION.HBASE.CUSTOMER_DEMOGRAPHICS_SALT |
| Task: COMPLETION Status: Ended ET: 00:00:00.331 |
| Task: POPULATE INDEX Status: Started Object: TRAFODION.HBASE.CUSTOMER_DEMOGRAPHICS_SALT |
| Task: POPULATE INDEX Status: Ended ET: 00:00:05.262</code></pre> |
| </div> |
| </div> |
| </li> |
| </ol> |
| </div> |
| </div> |
| </div> |
| <div class="sect2"> |
| <h3 id="bulk-load-data-from-hive-tables">4.3. Load Data From Hive Tables</h3> |
| <div class="paragraph"> |
| <p>You can import data from Hive using the trafci or sqlci command interface. Do the following:</p> |
| </div> |
| <div class="olist arabic"> |
| <ol class="arabic"> |
| <li> |
| <p>Set these required Control Query Defaults (CQDs) to improve load performance:</p> |
| <div class="listingblock"> |
| <div class="content"> |
| <pre class="CodeRay highlight"><code data-lang="text">CQD HIVE_MAX_STRING_LENGTH '1000'; -- if the widest column is 1KB</code></pre> |
| </div> |
| </div> |
| <div class="paragraph"> |
| <p>This setting is required if there are time-related column types in the target Trafodion table.</p> |
| </div> |
| <div class="listingblock"> |
| <div class="content"> |
| <pre class="CodeRay highlight"><code data-lang="text">CQD ALLOW_INCOMPATIBLE_ASSIGNMENT 'on';</code></pre> |
| </div> |
| </div> |
| </li> |
| <li> |
| <p>Issue the LOAD statement to load data into Trafodion tables from Hive. For example:</p> |
| <div class="listingblock"> |
| <div class="content"> |
| <pre class="CodeRay highlight"><code data-lang="text">LOAD WITH NO POPULATE INDEXES INTO trafodion.sch.demo SELECT * FROM hive.hive.demo;</code></pre> |
| </div> |
| </div> |
| </li> |
| </ol> |
| </div> |
| <div class="paragraph"> |
| <p>See the <a href="http://trafodion.apache.org/docs/sql_reference/index.html#load_statement">Trafodion SQL Reference Manual</a>. |
| for the complete syntax of the LOAD statement.</p> |
| </div> |
| <div class="paragraph"> |
| <p>If you use multiple LOAD statements to incrementally load sets of data into a single target table, then several HFiles are created |
| for each partition of the target table. This causes inefficient access during SELECT queries and may also cause a compaction |
| to be triggered based on the policies configured in the HBase settings.</p> |
| </div> |
| <div class="paragraph"> |
| <p>To avoid this issue, it is good practice to perform a major compaction on a table that has been the target of more than two LOAD statements |
| in a short period of time. To perform compaction, use this <code>hbase shell</code> command:</p> |
| </div> |
| <div class="listingblock"> |
| <div class="content"> |
| <pre class="CodeRay highlight"><code data-lang="text">major_compact 'TRAFODION.SCH.DEMO'</code></pre> |
| </div> |
| </div> |
| <div class="admonitionblock note"> |
| <table> |
| <tr> |
| <td class="icon"> |
| <i class="fa icon-note" title="Note"></i> |
| </td> |
| <td class="content"> |
| The <code>major_compact</code> command returns immediately since it’s not waited. Typically, compaction of a large table takes a long time |
| (several minutes to hours) to complete. You can monitor the progress of compaction from the HBase Master Web user interface. |
| </td> |
| </tr> |
| </table> |
| </div> |
| <div class="sect3"> |
| <h4 id="_example_3">4.3.1. Example</h4> |
| <div class="listingblock"> |
| <div class="content"> |
| <pre class="CodeRay highlight"><code data-lang="text">>> CQD HIVE_MAX_STRING_LENGTH '1000' ; |
| >> CQD ALLOW_INCOMPATIBLE_ASSIGNMENT 'on' ; |
| >> LOAD WITH NO POPULATE INDEXES INTO trafodion.sch.demo SELECT * FROM hive.hive.demo ;</code></pre> |
| </div> |
| </div> |
| </div> |
| </div> |
| <div class="sect2"> |
| <h3 id="bulk-load-data-from-external-databases">4.4. Load Data From External Databases</h3> |
| <div class="paragraph"> |
| <p>You need to import data into Hive when loading data from external databases. |
| Use <a href="http://sqoop.apache.org/">Apache Sqoop</a>, an open-source tools to move the data from the external database |
| into Hive tables on the Trafodion cluster.</p> |
| </div> |
| <div class="paragraph"> |
| <p>Source data can be in the following formats:</p> |
| </div> |
| <table class="tableblock frame-all grid-all spread"> |
| <colgroup> |
| <col style="width: 40%;"> |
| <col style="width: 60%;"> |
| </colgroup> |
| <thead> |
| <tr> |
| <th class="tableblock halign-left valign-top">Format</th> |
| <th class="tableblock halign-left valign-top">Examples</th> |
| </tr> |
| </thead> |
| <tbody> |
| <tr> |
| <td class="tableblock halign-left valign-top"><p class="tableblock"><strong>Structured</strong></p></td> |
| <td class="tableblock halign-left valign-top"><p class="tableblock">Relational databases such as Oracle or MySQL.</p></td> |
| </tr> |
| <tr> |
| <td class="tableblock halign-left valign-top"><p class="tableblock"><strong>Semi-Structured</strong></p></td> |
| <td class="tableblock halign-left valign-top"><p class="tableblock">Cassandra or HBase</p></td> |
| </tr> |
| <tr> |
| <td class="tableblock halign-left valign-top"><p class="tableblock"><strong>Unstructured</strong></p></td> |
| <td class="tableblock halign-left valign-top"><p class="tableblock">HDFS</p></td> |
| </tr> |
| </tbody> |
| </table> |
| <div class="paragraph"> |
| <p>You use the Sqoop command-line shell for interactive commands and basic scripting.</p> |
| </div> |
| <div class="paragraph"> |
| <p>Sqoop basics:</p> |
| </div> |
| <div class="ulist"> |
| <ul> |
| <li> |
| <p>Generic JDBC Connector: supports JDBC T-4 Driver.</p> |
| </li> |
| <li> |
| <p>Configuration Language for FROM/TO jobs that specify in SQL terms.</p> |
| </li> |
| <li> |
| <p>Partitioner: Divide/parallelize the data streams; uses primary key by default.</p> |
| </li> |
| <li> |
| <p>Extractor: Uses FROM configuration for SQL statements, plus partitioner information to query data subsets.</p> |
| </li> |
| <li> |
| <p>Loader: Uses TO job configuration; INSERT INTO could be generated from col list or explicitly specified.</p> |
| </li> |
| <li> |
| <p>Destroyer: Copies staging table to final table and deletes staging table.</p> |
| </li> |
| </ul> |
| </div> |
| <div class="paragraph"> |
| <p>See the <a href="http://sqoop.apache.org/docs/1.99.6/Sqoop5MinutesDemo.html">Sqoop 5 Minutes Demo</a> for a quick |
| introduction to Sqoop.</p> |
| </div> |
| <div class="sect3"> |
| <h4 id="_install_required_software">4.4.1. Install Required Software</h4> |
| <div class="paragraph"> |
| <p>By default, Sqoop is not installed on Trafodion clusters. Do the following:</p> |
| </div> |
| <div class="ulist"> |
| <ul> |
| <li> |
| <p>Install and start Sqoop on the Trafodion cluster using either the Ambari or Cloudera Manager GUI. |
| See the <a href="http://sqoop.apache.org/docs/1.99.3/Installation.html">Sqoop installation instructions</a>.</p> |
| </li> |
| <li> |
| <p>Install <a href="http://www.oracle.com/technetwork/java/javase/downloads/jdk8-downloads-2133151.html">JDK 1.8</a></p> |
| </li> |
| <li> |
| <p>Install the <a href="http://www.oracle.com/technetwork/database/features/jdbc/index-091264.html">Oracle JDBC driver</a></p> |
| </li> |
| <li> |
| <p>Set the following environment variables:</p> |
| <div class="listingblock"> |
| <div class="content"> |
| <pre class="CodeRay highlight"><code data-lang="text">export JAVA_HOME=/opt/java/jdk1.8.0_11 |
| export JAVA_OPTIONS=-Dmapred.child.java.opts=\-Djava.security.egd=file:/dev/urandom+</code></pre> |
| </div> |
| </div> |
| </li> |
| </ul> |
| </div> |
| </div> |
| <div class="sect3"> |
| <h4 id="_sample_sqoop_commands">4.4.2. Sample Sqoop Commands</h4> |
| <div class="sect4"> |
| <h5 id="_list_all_oracle_tables">List All Oracle Tables</h5> |
| <div class="listingblock"> |
| <div class="content"> |
| <pre class="CodeRay highlight"><code data-lang="text">sqoop list-tables --driver oracle.jdbc.OracleDriver |
| --connect jdbc:oracle:thin:@<Oracle host name>:<port>/<database> |
| --username <user-name> --password <password></code></pre> |
| </div> |
| </div> |
| </div> |
| <div class="sect4"> |
| <h5 id="_import_data_to_hive">Import Data to Hive</h5> |
| <div class="paragraph"> |
| <p><strong>Syntax</strong></p> |
| </div> |
| <div class="listingblock"> |
| <div class="content"> |
| <pre class="CodeRay highlight"><code data-lang="text">sqoop import --connect jdbc:oracle:thin:@<Oracle host name:port>/<database> |
| --username <user-name> --password <password> --table <tablename> |
| --split-by <column-name> --hive-import --create-hive-table |
| --hive-table <hive-table-name> --hive-overwrite --null-string '' |
| --null-non-string '' --hive-drop-import-delims--verbose</code></pre> |
| </div> |
| </div> |
| <table class="tableblock frame-all grid-all spread"> |
| <colgroup> |
| <col style="width: 40%;"> |
| <col style="width: 60%;"> |
| </colgroup> |
| <thead> |
| <tr> |
| <th class="tableblock halign-left valign-top">Parameter</th> |
| <th class="tableblock halign-left valign-top">Guidelines</th> |
| </tr> |
| </thead> |
| <tbody> |
| <tr> |
| <td class="tableblock halign-left valign-top"><p class="tableblock"><code>--split-by <column-name></code></p></td> |
| <td class="tableblock halign-left valign-top"><p class="tableblock">By default, if not specified, sqoop uses the primary key column as a splitting column, which is not optimal most of the time. |
| If the table does not contain a primary key, then you must manually specify the splitting column.</p></td> |
| </tr> |
| <tr> |
| <td class="tableblock halign-left valign-top"><p class="tableblock"><code>--null-string <null-string></code></p></td> |
| <td class="tableblock halign-left valign-top"><p class="tableblock">This is the string to be written for a null value in a string column.</p></td> |
| </tr> |
| <tr> |
| <td class="tableblock halign-left valign-top"><p class="tableblock"><code>--null-non-string <null-string></code></p></td> |
| <td class="tableblock halign-left valign-top"><p class="tableblock">This is the string to be written for a null value in a non-string column.</p></td> |
| </tr> |
| <tr> |
| <td class="tableblock halign-left valign-top"><p class="tableblock"><code>--hive-drop-import-delims</code></p></td> |
| <td class="tableblock halign-left valign-top"><p class="tableblock">This drops <code>\n</code>, <code>\r</code>, and <code>\01</code> string fields when importing to Hive.<br> |
| <br> |
| <strong>NOTE</strong>: If the data contains \n or \r and if you do not use the hive-drop-import-delims option, then data is truncated. |
| You need to use additional Sqoop options during migration by specifying the delimiter that you would like to use, |
| which does not exist in the data itself.</p></td> |
| </tr> |
| </tbody> |
| </table> |
| </div> |
| </div> |
| <div class="sect3"> |
| <h4 id="_example_4">4.4.3. Example</h4> |
| <div class="listingblock"> |
| <div class="content"> |
| <pre class="CodeRay highlight"><code data-lang="text">sqoop import --connect jdbc:oracle:thin:@localhost:1521/orcl |
| --username trafdemo --password traf123 --table CUSTOMER |
| --split-by CUSTNUM --hive-import --create-hive-table |
| --hive-table customers --hive-overwrite --null-string '' |
| --null-non-string '' --hive-drop-import-delims--verbose</code></pre> |
| </div> |
| </div> |
| </div> |
| </div> |
| </div> |
| </div> |
| <div class="sect1"> |
| <h2 id="trickle-load">5. Trickle Load</h2> |
| <div class="sectionbody"> |
| <div class="paragraph"> |
| <p>Trafodion Trickle Load allows data to be committed in batches, with sizes ranging from 1 row to a several |
| thousand rows in each commit. Trickle Load uses the following SQL statements (defined in the |
| <a href="http://trafodion.apache.org/docs/sql_reference/index.html">Trafodion SQL Reference Manual</a>:</p> |
| </div> |
| <div class="ulist"> |
| <ul> |
| <li> |
| <p><a href="http://trafodion.apache.org/docs/sql_reference/index.html#insert_statement">INSERT</a></p> |
| </li> |
| <li> |
| <p><a href="http://trafodion.apache.org/docs/sql_reference/index.html#upsert_statement">UPSERT</a></p> |
| </li> |
| <li> |
| <p><a href="http://trafodion.apache.org/docs/sql_reference/index.html#upsert_statement">UPSERT USING LOAD</a></p> |
| </li> |
| </ul> |
| </div> |
| <div class="paragraph"> |
| <p>Contrary to <a href="#bulk-load">Bulk Load</a>, committed rows are immediately visible from other transactions |
| thereby leading to minimal latency in making newly ingested rows visible to applications and end users.</p> |
| </div> |
| <div class="paragraph"> |
| <p>You use Trickle Load in the following situations:</p> |
| </div> |
| <div class="ulist"> |
| <ul> |
| <li> |
| <p>Inserting and/or updating data on an ongoing basis. Typically, you create a custom JDBC or ODBC |
| application for this approach.</p> |
| </li> |
| <li> |
| <p>You want to migrate a smaller amount of data (a few millions rows). Typically, you use JDBC- or |
| ODBC-based ETL tools for this approach; for example:</p> |
| <div class="ulist"> |
| <ul> |
| <li> |
| <p><a href="#trickle-load-odb">Trafodion odb<sup>1</sup></a></p> |
| </li> |
| <li> |
| <p><a href="http://squirrel-sql.sourceforge.net">SQuirrel-SQL</a></p> |
| </li> |
| <li> |
| <p><a href="http://www.pentaho.com/">Pentaho</a></p> |
| </li> |
| <li> |
| <p><a href="http://www.informatica.com/us/">Informatica</a>.</p> |
| </li> |
| </ul> |
| </div> |
| </li> |
| </ul> |
| </div> |
| <div class="paragraph"> |
| <p><sup>1</sup> Trafodion obd typically achieves better load throughput than third-party ETL tools.</p> |
| </div> |
| <div class="sect2"> |
| <h3 id="trickle-load-improving-throughput">5.1. Improving Throughput</h3> |
| <div class="paragraph"> |
| <p>Trickle Load uses the HBase write path, with every row being written to the WAL (Write-Ahead Log) and HBase MemStore. |
| When memstore is full data is flushed to HStorefile in background.</p> |
| </div> |
| <div class="paragraph"> |
| <p>Throughput can be improved by use of:</p> |
| </div> |
| <div class="ulist"> |
| <ul> |
| <li> |
| <p>Rowsets or Batch Updates.</p> |
| </li> |
| <li> |
| <p>UPSERT instead of INSERT statements, if applicable.</p> |
| </li> |
| <li> |
| <p>Multiple simultaneous connections.</p> |
| </li> |
| </ul> |
| </div> |
| <div class="paragraph"> |
| <p>In addition, when using INSERT or UPSERT USING LOAD with the objective of maximizing data ingestion throughput, |
| increasing the HBase table attribute MEMSTORE_FLUSHSIZE from its default value helps.</p> |
| </div> |
| <div class="paragraph"> |
| <p>The actual value you use depends on the heap size allocated to each Region Server, the concurrent query workload, and the |
| number of tables for which simultaneous fast data ingestion is needed. With a heap size of 31 GB for each Region Server in |
| an environment with heavy concurrent query workload, setting this attribute 1 GB gives good performance.</p> |
| </div> |
| <div class="paragraph"> |
| <p>You can specify this attribute in the HBASE_OPTIONS clause when creating the table. Alternatively, you can also set it from |
| the <code>hbase shell</code> through an <code>ALTER 'TRAFODION.<schema-name>.<table-name>', MEMSTORE_FLUSHSIZE >= '1073741824'</code> command.</p> |
| </div> |
| </div> |
| <div class="sect2"> |
| <h3 id="trickle-load-odb">5.2. odb</h3> |
| <div class="paragraph"> |
| <p>odb is a Linux and Windows Trafodion client that is:</p> |
| </div> |
| <div class="ulist"> |
| <ul> |
| <li> |
| <p>ODBC based</p> |
| </li> |
| <li> |
| <p>Database agnostic query driver</p> |
| </li> |
| <li> |
| <p>Query interpreter</p> |
| </li> |
| <li> |
| <p>Loader and extractor</p> |
| </li> |
| </ul> |
| </div> |
| <div class="paragraph"> |
| <p>odb may be installed on:</p> |
| </div> |
| <div class="ulist"> |
| <ul> |
| <li> |
| <p>The Trafodion cluster.</p> |
| </li> |
| <li> |
| <p>The machine that contains source data</p> |
| </li> |
| <li> |
| <p>An intermediate machine that is being used for data loading.</p> |
| </li> |
| </ul> |
| </div> |
| <div class="paragraph"> |
| <p>Source data can be in any database (local or remote) that supports ODBC or in flat files local to the machine hosting |
| the odb tool.</p> |
| </div> |
| <div class="paragraph"> |
| <p>odd uses threads to achieve parallelism, rowsets to improve throughput. You can specify INSERT, UPSERT or UPSERT USING LOAD |
| insert types.</p> |
| </div> |
| <div class="admonitionblock note"> |
| <table> |
| <tr> |
| <td class="icon"> |
| <i class="fa icon-note" title="Note"></i> |
| </td> |
| <td class="content"> |
| odb does not use the bulk load command LOAD, and, therefore, throughput when using odb may be lower than what can be achieved |
| with the bulk loader. However, when using the odb tool, source data need not be moved to the Trafodion cluster in a separate step. |
| </td> |
| </tr> |
| </table> |
| </div> |
| <div class="paragraph"> |
| <p>odb allows you to access Hadoop data using one of the following methods:</p> |
| </div> |
| <div class="olist arabic"> |
| <ol class="arabic"> |
| <li> |
| <p><strong>Use Hive and its ODBC Driver</strong>: odb can access HIVE like any other relational database. |
| For example, you can copy to from HIVE and other databases using odb’s copy option.</p> |
| </li> |
| <li> |
| <p><strong>Add the hdfs.</strong>` prefix to the input or output file during loads/extracts*: The file is read/written |
| from/to Hadoop. odb interacts directly with the HDFS file system using <strong>libhdfs</strong>.</p> |
| <div class="paragraph"> |
| <p>This option is currently available only under Linux.</p> |
| </div> |
| </li> |
| </ol> |
| </div> |
| <div class="paragraph"> |
| <p>The following odb commands/features are discussed in this guide:</p> |
| </div> |
| <div class="ulist"> |
| <ul> |
| <li> |
| <p><a href="#trickle-load-odb-load">odb Load</a></p> |
| </li> |
| <li> |
| <p><a href="#trickle-load-odb-copy">odb Copy</a></p> |
| </li> |
| <li> |
| <p><a href="#trickle-load-odb-extract">odb Extract</a></p> |
| </li> |
| <li> |
| <p><a href="#trickle-load-odb-transform">odb Transform</a></p> |
| </li> |
| </ul> |
| </div> |
| <div class="paragraph"> |
| <p>See the <a href="http://trafodion.apache.org/docs/odb/index.html">Trafodion odb User Guide</a> |
| for installation instructions and usage syntax for the odb tool.</p> |
| </div> |
| <div class="paragraph"> |
| <p>The following subsections assume that you’ve installed odb.</p> |
| </div> |
| <div class="sect3"> |
| <h4 id="trickle-load-odb-throughput">5.2.1. odb Throughput</h4> |
| <div class="paragraph"> |
| <p>You achieve the best throughput with odb if using the UPSERT USING LOAD option.</p> |
| </div> |
| <div class="paragraph"> |
| <p>The default insert type used by odb is INSERT; to use UPSERT USING LOAD, please specify <code>:loadcmd=UL</code> in odb’s <code>load</code> or |
| <code>copy</code> command.</p> |
| </div> |
| <div class="paragraph"> |
| <p><strong>Example</strong></p> |
| </div> |
| <div class="paragraph"> |
| <p>Copy the table <code>mytable</code> from <code><source_catalog>.<source_schema></code> on the source database to <code>trafodion.my |
| schema.mytable</code> |
| on Trafodion.</p> |
| </div> |
| <div class="listingblock"> |
| <div class="content"> |
| <pre class="CodeRay highlight"><code data-lang="text">odb64luo -u <src_username>:<tgt_username> -p <src_pswd>:<tgt_pswd> |
| -d <src_dsn>:<tgt_dsn> |
| -cp src:<source_catalog>.<source_schema>.mytable tgt:trafodion.myschema.mytable |
| :splitby=<col-name>:parallel=4:loadcmd=UL</code></pre> |
| </div> |
| </div> |
| <table class="tableblock frame-all grid-all spread"> |
| <colgroup> |
| <col style="width: 25%;"> |
| <col style="width: 75%;"> |
| </colgroup> |
| <thead> |
| <tr> |
| <th class="tableblock halign-left valign-top">Option</th> |
| <th class="tableblock halign-left valign-top">Defines</th> |
| </tr> |
| </thead> |
| <tbody> |
| <tr> |
| <td class="tableblock halign-left valign-top"><p class="tableblock"><code>src_username</code></p></td> |
| <td class="tableblock halign-left valign-top"><p class="tableblock">User name for the source database.</p></td> |
| </tr> |
| <tr> |
| <td class="tableblock halign-left valign-top"><p class="tableblock"><code>src_pswd</code></p></td> |
| <td class="tableblock halign-left valign-top"><p class="tableblock">Password for the source database.</p></td> |
| </tr> |
| <tr> |
| <td class="tableblock halign-left valign-top"><p class="tableblock"><code>src_dsn</code></p></td> |
| <td class="tableblock halign-left valign-top"><p class="tableblock">ODBC DSN for the source database.</p></td> |
| </tr> |
| <tr> |
| <td class="tableblock halign-left valign-top"><p class="tableblock"><code>tgt_username</code></p></td> |
| <td class="tableblock halign-left valign-top"><p class="tableblock">User name for the Trafodion database.</p></td> |
| </tr> |
| <tr> |
| <td class="tableblock halign-left valign-top"><p class="tableblock"><code>tgt_pswd</code></p></td> |
| <td class="tableblock halign-left valign-top"><p class="tableblock">Password for the Trafodion database.</p></td> |
| </tr> |
| <tr> |
| <td class="tableblock halign-left valign-top"><p class="tableblock"><code>tgt_dsn</code></p></td> |
| <td class="tableblock halign-left valign-top"><p class="tableblock">ODBC DSN for the Trafodion database.</p></td> |
| </tr> |
| <tr> |
| <td class="tableblock halign-left valign-top"><p class="tableblock"><code>splitby</code></p></td> |
| <td class="tableblock halign-left valign-top"><p class="tableblock">Defines the column used to evenly distributed values for parallelism. Consider using a leading key column.</p></td> |
| </tr> |
| <tr> |
| <td class="tableblock halign-left valign-top"><p class="tableblock"><code>parallel=4</code></p></td> |
| <td class="tableblock halign-left valign-top"><p class="tableblock">Use four connections to extract data from the source database and another four connections to write data to the target Trafodion database.</p></td> |
| </tr> |
| <tr> |
| <td class="tableblock halign-left valign-top"><p class="tableblock"><code>loadcmd=UL</code></p></td> |
| <td class="tableblock halign-left valign-top"><p class="tableblock">Use UPSERT USING LOAD syntax to write data.</p></td> |
| </tr> |
| </tbody> |
| </table> |
| </div> |
| <div class="sect3"> |
| <h4 id="trickle-load-odb-load">5.2.2. odb Load</h4> |
| <div class="paragraph"> |
| <p>Refer to the <a href="http://trafodion.apache.org/docs/odb/index.html#_load_files">Load Files</a> section |
| in the <a href="http://trafodion.apache.org/docs/odb/index.html">Trafodion odb User Guide</a> for complete |
| documentation of this option.</p> |
| </div> |
| <div class="paragraph"> |
| <p>You use the <code>-l</code> option to load into a table from:</p> |
| </div> |
| <div class="ulist"> |
| <ul> |
| <li> |
| <p>File or standard input (pipe)</p> |
| </li> |
| <li> |
| <p>gzip compressed files with no external dependencies</p> |
| </li> |
| <li> |
| <p>HDFS</p> |
| </li> |
| <li> |
| <p>Load XML files</p> |
| </li> |
| <li> |
| <p>Delimited and fixed format files</p> |
| </li> |
| <li> |
| <p>"Binary" files (example images)</p> |
| </li> |
| <li> |
| <p>Generic SQL scripts before/after loads</p> |
| </li> |
| </ul> |
| </div> |
| <div class="paragraph"> |
| <p>The <code>-l</code> option provides:</p> |
| </div> |
| <div class="ulist"> |
| <ul> |
| <li> |
| <p>Data generation (constant, sequences, random values, from external datasets)</p> |
| </li> |
| <li> |
| <p>Configurable rowsets</p> |
| </li> |
| </ul> |
| </div> |
| <div class="paragraph"> |
| <p>You can load single tables or list of tables in the same session using single/parallel threads. |
| Limited "ETL like" functionalities are provided; for example: |
| SUBSTR, TRANSLITERATION, TRUNCATE target, DATE/TIME format conversion, and TOUPPER.</p> |
| </div> |
| <div class="paragraph"> |
| <p><strong>Important Options</strong></p> |
| </div> |
| <table class="tableblock frame-all grid-all spread"> |
| <colgroup> |
| <col style="width: 25%;"> |
| <col style="width: 75%;"> |
| </colgroup> |
| <thead> |
| <tr> |
| <th class="tableblock halign-left valign-top">Option</th> |
| <th class="tableblock halign-left valign-top">Defines</th> |
| </tr> |
| </thead> |
| <tbody> |
| <tr> |
| <td class="tableblock halign-left valign-top"><p class="tableblock"><code>src</code></p></td> |
| <td class="tableblock halign-left valign-top"><p class="tableblock">Source file. If empty, then odb generates sample data.</p></td> |
| </tr> |
| <tr> |
| <td class="tableblock halign-left valign-top"><p class="tableblock"><code>fs</code></p></td> |
| <td class="tableblock halign-left valign-top"><p class="tableblock">Field separator.</p></td> |
| </tr> |
| <tr> |
| <td class="tableblock halign-left valign-top"><p class="tableblock"><code>tgt</code></p></td> |
| <td class="tableblock halign-left valign-top"><p class="tableblock">Target table, required.</p></td> |
| </tr> |
| <tr> |
| <td class="tableblock halign-left valign-top"><p class="tableblock"><code>map</code></p></td> |
| <td class="tableblock halign-left valign-top"><p class="tableblock">Map file. A text file describing which input column is mapped to which target table column. See |
| <a href="#trickle-load-odb-transform">odb Transform</a> below.</p></td> |
| </tr> |
| <tr> |
| <td class="tableblock halign-left valign-top"><p class="tableblock"><code>rows</code></p></td> |
| <td class="tableblock halign-left valign-top"><p class="tableblock">Rowset size to be used.</p></td> |
| </tr> |
| <tr> |
| <td class="tableblock halign-left valign-top"><p class="tableblock"><code>parallel</code></p></td> |
| <td class="tableblock halign-left valign-top"><p class="tableblock">Number of connections/threads to be used.`</p></td> |
| </tr> |
| <tr> |
| <td class="tableblock halign-left valign-top"><p class="tableblock"><code>loadcmd</code></p></td> |
| <td class="tableblock halign-left valign-top"><p class="tableblock"><code>IN</code>, <code>UP</code> or <code>UL</code>. INSERT, UPSERT or UPSERT USING LOAD. Use UL for best throughput.</p></td> |
| </tr> |
| </tbody> |
| </table> |
| <div class="paragraph"> |
| <p><strong>Example</strong></p> |
| </div> |
| <div class="listingblock"> |
| <div class="content"> |
| <pre class="CodeRay highlight"><code data-lang="text">$ odb64luo -u user -p xx -d dsn -l src=customer.tbl:tgt=TRAFODION.MAURIZIO.CUSTOMER \ |
| :fs=\|:rows=1000:loadcmd=UL:truncate:parallel=4</code></pre> |
| </div> |
| </div> |
| <div class="paragraph"> |
| <p>This command:</p> |
| </div> |
| <div class="ulist"> |
| <ul> |
| <li> |
| <p>Loads the file named <code>customer.tbl</code> (<code>src=customer.tbl</code>)</p> |
| </li> |
| <li> |
| <p>in the table <code>TRAFODION.MAURIZIO.CUSTOMER</code> (<code>tgt=TRAFODION.MAURIZIO.CUSTOMER</code>)</p> |
| </li> |
| <li> |
| <p>using <code>|</code> (vertical bar) as a field separator (<code>fs=\|</code>)</p> |
| </li> |
| <li> |
| <p>using <code>1000 rows</code> as row-set buffer (<code>rows=1000</code>)</p> |
| </li> |
| <li> |
| <p>using UPSERT USING LOAD syntax to achieve better throughput</p> |
| </li> |
| <li> |
| <p>truncating the target table before loading (<code>truncate</code>)</p> |
| </li> |
| <li> |
| <p>using <code>4 parallel threads</code> to load the target table (<code>parallel=4</code>)</p> |
| </li> |
| </ul> |
| </div> |
| <div class="listingblock"> |
| <div class="content"> |
| <pre class="CodeRay highlight"><code data-lang="text">./odb64luo -u xx -p yy -d traf_sqws125 -l src=myfile:fs=|:tgt=TRAFODION.SEABASE.REGION:map=region.map:max=10000:rows=500:parallel=2:loadcmd=UL</code></pre> |
| </div> |
| </div> |
| <div class="paragraph"> |
| <p>You can load multiple files using different <code>-l</code> options. By default odb creates as many threads (and ODBC connections) as the sum |
| of parallel load threads.</p> |
| </div> |
| <div class="paragraph"> |
| <p><strong>Example</strong></p> |
| </div> |
| <div class="paragraph"> |
| <p>Truncates and load the CUSTOMER, ORDERS and LINEITEM tables in parallel.</p> |
| </div> |
| <div class="listingblock"> |
| <div class="content"> |
| <pre class="CodeRay highlight"><code data-lang="text">odb64luo -u user -p xx -d dsn -T 5 \ |
| -l src=./data/%t.tbl.gz:tgt=TRAFODION.MAURO.CUSTOMER:fs=\ |
| |:rows=m2:truncate:norb:parallel=4 \ |
| -l src=./data/%t.tbl.gz:tgt=TRAFODION.MAURO.ORDERS:fs=\ |
| |:rows=1000:truncate:norb:parallel=4 \ |
| -l src=./data/%t.tbl.gz:tgt=TRAFODION.MAURO.LINEITEM:fs=\ |
| |:rows=m10:truncate:norb:parallel=4</code></pre> |
| </div> |
| </div> |
| </div> |
| <div class="sect3"> |
| <h4 id="trickle-load-odb-copy">5.2.3. odb Copy</h4> |
| <div class="paragraph"> |
| <p>Refer to the |
| <a href="http://trafodion.apache.org/docs/odb/index.html#_copy_tables_from_one_database_to_another">Copy Tables From One Database to Another</a> |
| section in the <a href="http://trafodion.apache.org/docs/odb/index.html">Trafodion odb User Guide</a> for complete documentation of this option.</p> |
| </div> |
| <div class="paragraph"> |
| <p>Use the <code>-cp</code> option to copy tables <strong>directly</strong> from one data-source to another using ODBC (for example, from Trafodion to Teradata |
| or vice-versa):</p> |
| </div> |
| <div class="ulist"> |
| <ul> |
| <li> |
| <p>Single/Multiple table(s) copy from one database to another</p> |
| </li> |
| <li> |
| <p>Data never lands to disk (ODBC buffers moved from source to target)</p> |
| </li> |
| <li> |
| <p>Multi-threaded copy: single/multiple tables in parallel using single/multiple "data streams"/table</p> |
| </li> |
| <li> |
| <p>Each "data stream" consists of one "extractor" and one or more "loaders"</p> |
| </li> |
| <li> |
| <p>Table subsets copy (columns and/or rows)</p> |
| </li> |
| <li> |
| <p>No data conversion required</p> |
| </li> |
| <li> |
| <p>Other functionalities: sequence creation, limit text col length, max rows to copy, . . .</p> |
| </li> |
| <li> |
| <p>Each data stream is "multiple buffered" with loaders and extractors working in parallel (no need to extract before loading).</p> |
| </li> |
| </ul> |
| </div> |
| <div class="paragraph"> |
| <p>The target table has to be be created in advance and should have a compatible structure.</p> |
| </div> |
| <div class="paragraph"> |
| <p><strong>Important Options</strong></p> |
| </div> |
| <table class="tableblock frame-all grid-all spread"> |
| <colgroup> |
| <col style="width: 25%;"> |
| <col style="width: 75%;"> |
| </colgroup> |
| <thead> |
| <tr> |
| <th class="tableblock halign-left valign-top">Option</th> |
| <th class="tableblock halign-left valign-top">Defines</th> |
| </tr> |
| </thead> |
| <tbody> |
| <tr> |
| <td class="tableblock halign-left valign-top"><p class="tableblock"><code>src</code></p></td> |
| <td class="tableblock halign-left valign-top"><p class="tableblock">Source file. If empty, then odb generates sample data.</p></td> |
| </tr> |
| <tr> |
| <td class="tableblock halign-left valign-top"><p class="tableblock"><code>fs</code></p></td> |
| <td class="tableblock halign-left valign-top"><p class="tableblock">Field separator.</p></td> |
| </tr> |
| <tr> |
| <td class="tableblock halign-left valign-top"><p class="tableblock"><code>tgt</code></p></td> |
| <td class="tableblock halign-left valign-top"><p class="tableblock">Target table, required.</p></td> |
| </tr> |
| <tr> |
| <td class="tableblock halign-left valign-top"><p class="tableblock"><code>parallel</code></p></td> |
| <td class="tableblock halign-left valign-top"><p class="tableblock">Number of connections/threads to be used.`</p></td> |
| </tr> |
| <tr> |
| <td class="tableblock halign-left valign-top"><p class="tableblock"><code>splitby</code></p></td> |
| <td class="tableblock halign-left valign-top"><p class="tableblock">Source column to parallelize copy operation on.</p></td> |
| </tr> |
| <tr> |
| <td class="tableblock halign-left valign-top"><p class="tableblock"><code>pwhere</code></p></td> |
| <td class="tableblock halign-left valign-top"><p class="tableblock"><code>where</code> condition on source</p></td> |
| </tr> |
| <tr> |
| <td class="tableblock halign-left valign-top"><p class="tableblock"><code>loadcmd</code></p></td> |
| <td class="tableblock halign-left valign-top"><p class="tableblock"><code>IN</code>, <code>UP</code> or <code>UL</code>. INSERT, UPSERT or UPSERT USING LOAD. Use UL for best throughput.</p></td> |
| </tr> |
| </tbody> |
| </table> |
| <div class="paragraph"> |
| <p>When copying data from one data source to another, odb needs user/password/dsn for both source and target system. |
| User credentials and DSN for the target system are specified this way:</p> |
| </div> |
| <div class="listingblock"> |
| <div class="content"> |
| <pre class="CodeRay highlight"><code data-lang="text">$ odb64luo -u src_user:tgt_user -p src_pwd:tgt:pwd -d src_dsn:tgt_dsn ... -cp src=...:tgt=...</code></pre> |
| </div> |
| </div> |
| <div class="paragraph"> |
| <p>You can use odb to copy a list of tables from one database to another.</p> |
| </div> |
| <div class="paragraph"> |
| <p><strong>Example</strong></p> |
| </div> |
| <div class="listingblock"> |
| <div class="content"> |
| <pre class="CodeRay highlight"><code data-lang="text">$ cat tlist.txt |
| # List of tables to extract |
| src=TRAFODION.MAURIZIO.ORDERS |
| src=TRAFODION.MAURIZIO.CUSTOMER |
| src=TRAFODION.MAURIZIO.PART |
| src=TRAFODION.MAURIZIO.LINEITEM</code></pre> |
| </div> |
| </div> |
| <div class="paragraph"> |
| <p>You can extract all these tables by running:</p> |
| </div> |
| <div class="listingblock"> |
| <div class="content"> |
| <pre class="CodeRay highlight"><code data-lang="text">$ odb64luo -u user1:user2 -p xx:yy -d dsn1:dsn2 \ |
| -cp src=-tlist.txt:tgt=tpch.stg_%t:rows=m2:truncate:parallel=4</code></pre> |
| </div> |
| </div> |
| <div class="paragraph"> |
| <p>Please note the <code>src=-tlist.txt</code>. This command copies:</p> |
| </div> |
| <table class="tableblock frame-all grid-all spread"> |
| <colgroup> |
| <col style="width: 50%;"> |
| <col style="width: 50%;"> |
| </colgroup> |
| <thead> |
| <tr> |
| <th class="tableblock halign-left valign-top">Source</th> |
| <th class="tableblock halign-left valign-top">Target</th> |
| </tr> |
| </thead> |
| <tbody> |
| <tr> |
| <td class="tableblock halign-left valign-top"><p class="tableblock"><code>TRAFODION.MAURIZIO.ORDERS</code></p></td> |
| <td class="tableblock halign-left valign-top"><p class="tableblock"><code>tpch.stg_orders</code></p></td> |
| </tr> |
| <tr> |
| <td class="tableblock halign-left valign-top"><p class="tableblock"><code>TRAFODION.MAURIZIO.CUSTOMER</code></p></td> |
| <td class="tableblock halign-left valign-top"><p class="tableblock"><code>tpch.stg_customer</code></p></td> |
| </tr> |
| <tr> |
| <td class="tableblock halign-left valign-top"><p class="tableblock"><code>TRAFODION.MAURIZIO.PART</code></p></td> |
| <td class="tableblock halign-left valign-top"><p class="tableblock"><code>tpch.stg_part</code></p></td> |
| </tr> |
| <tr> |
| <td class="tableblock halign-left valign-top"><p class="tableblock"><code>TRAFODION.MAURIZIO.LINEITEM</code></p></td> |
| <td class="tableblock halign-left valign-top"><p class="tableblock"><code>tpch.stg_lineitem</code></p></td> |
| </tr> |
| </tbody> |
| </table> |
| <div class="paragraph"> |
| <p>Optionally, you can define any other command-line options in the input file.</p> |
| </div> |
| <div class="paragraph"> |
| <p><strong>Example</strong></p> |
| </div> |
| <div class="paragraph"> |
| <p>Using different <em>splitby columns</em>.</p> |
| </div> |
| <div class="listingblock"> |
| <div class="content"> |
| <pre class="CodeRay highlight"><code data-lang="text">$ cat tlist2.txt |
| # List of tables to extract and their "splitby columns" |
| src=TRAFODION.MAURIZIO.ORDERS:splitby=O_ORDERKEY |
| src=TRAFODION.MAURIZIO.CUSTOMER:splitby=C_CUSTOMERKEY |
| src=TRAFODION.MAURIZIO.PART:splitby=P_PARTKEY |
| src=TRAFODION.MAURIZIO.LINEITEM:splitby=L_PARTKEY</code></pre> |
| </div> |
| </div> |
| </div> |
| <div class="sect3"> |
| <h4 id="trickle-load-odb-extract">5.2.4. odb Extract</h4> |
| <div class="paragraph"> |
| <p>Refer to the <a href="http://trafodion.apache.org/docs/odb/index.html#_extract_tables">Extract Tables</a> |
| section in the <a href="http://trafodion.apache.org/docs/odb/index.html">Trafodion odb User Guide</a> for complete documentation of this option.</p> |
| </div> |
| <div class="paragraph"> |
| <p>Use then -e option to extract from data a table and write it to standard files or named pipes.</p> |
| </div> |
| <div class="paragraph"> |
| <p>You can:</p> |
| </div> |
| <div class="ulist"> |
| <ul> |
| <li> |
| <p>Export single tables, list of tables or generic SQL output.</p> |
| </li> |
| <li> |
| <p>Export table subsets (columns and/or rows).</p> |
| </li> |
| <li> |
| <p>Exports one or multiple tables in parallel using one or multiple data streams for each table</p> |
| </li> |
| <li> |
| <p>Invoke other functionalities (trim, remote trim, cast, limit text col length, max rows to export,. . .)</p> |
| </li> |
| </ul> |
| </div> |
| <div class="paragraph"> |
| <p>You can write the extracted data to:</p> |
| </div> |
| <div class="ulist"> |
| <ul> |
| <li> |
| <p>Single/multiple files or standard output (pipe).</p> |
| </li> |
| <li> |
| <p>gzip compressed files (no external libraries required).</p> |
| </li> |
| <li> |
| <p>XML formatted files (no external libraries required).</p> |
| </li> |
| <li> |
| <p>Hadoop File System (requires libhdfs).</p> |
| </li> |
| </ul> |
| </div> |
| <div class="paragraph"> |
| <p>Other useful features:</p> |
| </div> |
| <div class="ulist"> |
| <ul> |
| <li> |
| <p>Configurable NULL/EMPTY strings, field/record separators</p> |
| </li> |
| <li> |
| <p>Configurable rowset</p> |
| </li> |
| <li> |
| <p>Possibility to run generic SQL scripts before/after extracts</p> |
| </li> |
| <li> |
| <p>Multi-threaded export</p> |
| </li> |
| </ul> |
| </div> |
| <div class="paragraph"> |
| <p><strong>Important Options</strong></p> |
| </div> |
| <table class="tableblock frame-all grid-all spread"> |
| <colgroup> |
| <col style="width: 25%;"> |
| <col style="width: 75%;"> |
| </colgroup> |
| <thead> |
| <tr> |
| <th class="tableblock halign-left valign-top">Option</th> |
| <th class="tableblock halign-left valign-top">Defines</th> |
| </tr> |
| </thead> |
| <tbody> |
| <tr> |
| <td class="tableblock halign-left valign-top"><p class="tableblock"><code>src</code></p></td> |
| <td class="tableblock halign-left valign-top"><p class="tableblock">Source file. If empty, then odb generates sample data.</p></td> |
| </tr> |
| <tr> |
| <td class="tableblock halign-left valign-top"><p class="tableblock"><code>fs</code></p></td> |
| <td class="tableblock halign-left valign-top"><p class="tableblock">Field separator.</p></td> |
| </tr> |
| <tr> |
| <td class="tableblock halign-left valign-top"><p class="tableblock"><code>tgt</code></p></td> |
| <td class="tableblock halign-left valign-top"><p class="tableblock">Target table, required.</p></td> |
| </tr> |
| <tr> |
| <td class="tableblock halign-left valign-top"><p class="tableblock"><code>parallel</code></p></td> |
| <td class="tableblock halign-left valign-top"><p class="tableblock">Number of connections/threads to be used.`</p></td> |
| </tr> |
| <tr> |
| <td class="tableblock halign-left valign-top"><p class="tableblock"><code>splitby</code></p></td> |
| <td class="tableblock halign-left valign-top"><p class="tableblock">Source column to parallelize extract operation on.</p></td> |
| </tr> |
| <tr> |
| <td class="tableblock halign-left valign-top"><p class="tableblock"><code>pwhere</code></p></td> |
| <td class="tableblock halign-left valign-top"><p class="tableblock"><code>where</code> condition on source</p></td> |
| </tr> |
| </tbody> |
| </table> |
| <div class="paragraph"> |
| <p><strong>Example</strong></p> |
| </div> |
| <div class="listingblock"> |
| <div class="content"> |
| <pre class="CodeRay highlight"><code data-lang="text">$ odb64luo -u user -p xx -d dsn -T 3 \ |
| -e src=TRAFODION.MAURIZIO.LIN%:tgt=$\{DATA}/ext_%t.csv.gz:rows=m10:fs=\|:trim:gzip: \ |
| -e src=TRAFODION.MAURIZIO.REGION:tgt=$\{DATA}/ext_%t.csv.gz:rows=m10:fs=\|:trim:gzip \ |
| -e src=TRAFODION.MAURIZIO.NATION:tgt=$\{DATA}/ext_%t.csv.gz:rows=m10:fs=\|:trim:gzip</code></pre> |
| </div> |
| </div> |
| <div class="paragraph"> |
| <p>The example above:</p> |
| </div> |
| <div class="ulist"> |
| <ul> |
| <li> |
| <p>Extracts tables <code>REGION</code>, <code>NATION</code>, and all tables starting with <code>LIN</code> from the <code>TRAFODION.MAURIZIO</code> schema.</p> |
| </li> |
| <li> |
| <p>Saves data into files <code>ext_%t.csv.gz</code> (<code>%t</code> is expanded to the real table name).</p> |
| </li> |
| <li> |
| <p>Compresses the output file (gzip) on the fly (uncompressed data never lands to disk).</p> |
| </li> |
| <li> |
| <p>Trims text fields.</p> |
| </li> |
| <li> |
| <p>Uses a 10 MB IO buffer.</p> |
| </li> |
| <li> |
| <p>Uses three threads (ODBC connection) for the extraction process.</p> |
| </li> |
| </ul> |
| </div> |
| <div class="paragraph"> |
| <p><strong>Example</strong></p> |
| </div> |
| <div class="paragraph"> |
| <p>Use odb to extract all tables listed in a file.</p> |
| </div> |
| <div class="listingblock"> |
| <div class="content"> |
| <pre class="CodeRay highlight"><code data-lang="text">$ cat tlist.txt |
| |
| # List of tables to extract src=TRAFODION.MAURIZIO.ORDERS |
| src=TRAFODION.MAURIZIO.CUSTOMER src=TRAFODION.MAURIZIO.PART |
| src=TRAFODION.MAURIZIO.LINEITEM</code></pre> |
| </div> |
| </div> |
| <div class="paragraph"> |
| <p>Extract all these tables by running:</p> |
| </div> |
| <div class="listingblock"> |
| <div class="content"> |
| <pre class="CodeRay highlight"><code data-lang="text">$ odb64luo -u user -p xx -d dsn -e src=-tlist.txt:tgt=%t_%d%m:rows=m20:sq=\"</code></pre> |
| </div> |
| </div> |
| <div class="paragraph"> |
| <p>The example above:</p> |
| </div> |
| <div class="ulist"> |
| <ul> |
| <li> |
| <p>Reads the list of source tables from <code>tlist.txt</code>.</p> |
| </li> |
| <li> |
| <p>Extracts the data into file using the table name in lowercase (<code>%t</code>). |
| appending extraction data and time (<code>_%d%m</code>) for the target file name.</p> |
| </li> |
| <li> |
| <p>Uses a 20MB I/O buffer for each extraction thread.</p> |
| </li> |
| <li> |
| <p>Encloses strings with double-quote characters (<code>sq=\"</code>).</p> |
| </li> |
| </ul> |
| </div> |
| </div> |
| <div class="sect3"> |
| <h4 id="trickle-load-odb-transform">5.2.5. odb Transform</h4> |
| <div class="paragraph"> |
| <p>Refer to the <a href="http://trafodion.apache.org/docs/odb/index.html#load_map_fields">Map Source File Fields to Target Table Columns</a> |
| section in the <a href="http://trafodion.apache.org/docs/odb/index.html">Trafodion odb User Guide</a> for complete documentation of |
| odb’s mapping/transformation capabilities.</p> |
| </div> |
| <div class="paragraph"> |
| <p>odb provides mapping/transformation capabilities though mapfiles. By specifying <code>map=<mapfile></code> load option you can:</p> |
| </div> |
| <div class="ulist"> |
| <ul> |
| <li> |
| <p>Associate any input file field to any table column</p> |
| </li> |
| <li> |
| <p>Skip input file fields</p> |
| </li> |
| <li> |
| <p>Generate sequences</p> |
| </li> |
| <li> |
| <p>Insert constants</p> |
| </li> |
| <li> |
| <p>Transform dates/timestamps formats</p> |
| </li> |
| <li> |
| <p>Extract substrings</p> |
| </li> |
| <li> |
| <p>Replace input file strings. For example: insert Maurizio Felici when you read MF</p> |
| </li> |
| <li> |
| <p>Generate random values</p> |
| </li> |
| <li> |
| <p>And much more</p> |
| </li> |
| </ul> |
| </div> |
| <div class="paragraph"> |
| <p>A generic mapfile contains:</p> |
| </div> |
| <div class="ulist"> |
| <ul> |
| <li> |
| <p><strong>Comments</strong> (line starting with #)</p> |
| </li> |
| <li> |
| <p><strong>Mappings</strong> to link input file fields to the corresponding target table columns.</p> |
| </li> |
| </ul> |
| </div> |
| <div class="paragraph"> |
| <p>Mappings use the following syntax:</p> |
| </div> |
| <div class="listingblock"> |
| <div class="content"> |
| <pre class="CodeRay highlight"><code data-lang="text"><colname>:<field>[:transformation operator]</code></pre> |
| </div> |
| </div> |
| <div class="paragraph"> |
| <p><strong>Example</strong></p> |
| </div> |
| <div class="paragraph"> |
| <p>Suppose you have a target table like this:</p> |
| </div> |
| <div class="listingblock"> |
| <div class="content"> |
| <pre class="CodeRay highlight"><code data-lang="text">+------+---------------+----+-------+------------+ |
| |COLUMN|TYPE |NULL|DEFAULT|INDEX | |
| +------+---------------+----+-------+------------+ |
| |ID |INTEGER SIGNED |NO | |mf_pkey 1 U | |
| |NAME |CHAR(10) |YES | | | |
| |AGE |SMALLINT SIGNED|YES | | | |
| |BDATE |DATE |YES | | | |
| +------+---------------+----+-------+------------+</code></pre> |
| </div> |
| </div> |
| <div class="paragraph"> |
| <p>And an input file like this:</p> |
| </div> |
| <hr> |
| <div class="paragraph"> |
| <p>uno,00,<strong>51</strong>,due,<em>Maurizio</em>,tre,<span class="underline">07 Mar 1959</span>, ignore,remaining, fields<br> |
| uno,00,<strong>46</strong>,due,<em>Lucia</em>,tre,<span class="underline">13 Oct 1964</span>, ignore, this<br> |
| uno,00,<strong>34</strong>,due,<em>Giovanni</em>,tre,<span class="underline">30 Mar 1976</span><br> |
| uno,00,<strong>48</strong>,due,<em>Antonella</em>,tre,<span class="underline">24 Apr 1962</span> |
| <strong>*</strong></p> |
| </div> |
| <div class="ulist"> |
| <ul> |
| <li> |
| <p><strong>Bold text</strong> represents age.</p> |
| </li> |
| <li> |
| <p><em>Italics text</em> represents name.</p> |
| </li> |
| <li> |
| <p><span class="underline">Underline text</span> represents birth date.</p> |
| </li> |
| </ul> |
| </div> |
| <div class="paragraph"> |
| <p>You want to load the marked fields into the appropriate column, generate a unique key for ID and ignore the remaining fields, |
| In addition, you need to convert the date format and replace all occurrences of <code>Lucia</code> with <code>Lucy</code>.</p> |
| </div> |
| <div class="paragraph"> |
| <p>The following map file accomplishes these goals:</p> |
| </div> |
| <div class="listingblock"> |
| <div class="content"> |
| <pre class="CodeRay highlight"><code data-lang="text">$ cat test/load_map/ml1.map + |
| # Map file to load TRAFODION.MFTEST.FRIENDS from friends.dat |
| ID:seq:1 # Inserts into ID column a sequence starting from 1 |
| NAME:4:REPLACE:Lucia:Lucy # Loads field #4 into NAME and replace all occurrences of Lucia with Lucy |
| AGE:2 # Loads field #2 (they start from zero) into AGE |
| BDATE:6:DCONV:d.b.y # Loads field #6 into BDATE converting date format from dd mmm yyyy</code></pre> |
| </div> |
| </div> |
| <div class="paragraph"> |
| <p>Load as follows:</p> |
| </div> |
| <div class="listingblock"> |
| <div class="content"> |
| <pre class="CodeRay highlight"><code data-lang="text">$ odb64luo -u user -p xx -d dsn \ |
| -l src=friends.dat:tgt=TRAFODION.MFTEST.FRIENDS:map=ml1.map:fs=,</code></pre> |
| </div> |
| </div> |
| <div class="paragraph"> |
| <p>The above example:</p> |
| </div> |
| <div class="ulist"> |
| <ul> |
| <li> |
| <p>Reads data from <code>friends.dat</code> (<code>src</code>).</p> |
| </li> |
| <li> |
| <p>Writes data to the <code>TRAFODION.MFTEST.FRIENDS</code> Trafodion table (<code>tgt</code>).</p> |
| </li> |
| <li> |
| <p>Uses <code>ml1.map</code> to define transformation specifications (<code>map</code>).</p> |
| </li> |
| <li> |
| <p>Uses comma as a field separator (<code>fs</code>).</p> |
| </li> |
| </ul> |
| </div> |
| </div> |
| </div> |
| </div> |
| </div> |
| <div class="sect1"> |
| <h2 id="bulk-unload">6. Bulk Unload</h2> |
| <div class="sectionbody"> |
| <div class="paragraph"> |
| <p>The <a href="http://trafodion.apache.org/docs/sql_reference/index.html#unload_statement">UNLOAD</a> statement is a |
| Trafodion extension that exports data from Trafodion tables into an HDFS location that you specify. |
| Refer to the <a href="http://trafodion.apache.org/docs/sql_reference/index.html#unload_statement">Trafodion SQL Reference Manual</a> |
| for complete documentation.</p> |
| </div> |
| <div class="paragraph"> |
| <p>The extracted data can be either compressed or uncompressed based on what you choose. |
| UNLOAD performs multiple steps based on the options you give; for example:</p> |
| </div> |
| <div class="ulist"> |
| <ul> |
| <li> |
| <p>If using snapshot scan:</p> |
| <div class="ulist"> |
| <ul> |
| <li> |
| <p>Get list of Trafodion tables from the query plan.</p> |
| </li> |
| <li> |
| <p>Create/verify snapshots.</p> |
| </li> |
| </ul> |
| </div> |
| </li> |
| <li> |
| <p>Purge target location, if specified</p> |
| </li> |
| <li> |
| <p>Extract:</p> |
| <div class="ulist"> |
| <ul> |
| <li> |
| <p>Copy table data to data files.</p> |
| </li> |
| <li> |
| <p>Non-compressed is straight copy.</p> |
| </li> |
| <li> |
| <p>Compressed means compression takes place while writing data (no extra step)</p> |
| </li> |
| </ul> |
| </div> |
| </li> |
| <li> |
| <p>Merge Data Files if specified</p> |
| </li> |
| </ul> |
| </div> |
| <div class="paragraph"> |
| <p><strong>Example</strong></p> |
| </div> |
| <div class="paragraph"> |
| <p>This example shows how the UNLOAD statement extracts data from a |
| Trafodion table, <code>TRAFODION.HBASE.CUSTOMER_DEMOGRAPHICS</code>, into an HDFS |
| folder, <code>/bulkload/customer_demographics</code>:</p> |
| </div> |
| <div class="listingblock"> |
| <div class="content"> |
| <pre class="CodeRay highlight"><code data-lang="text">>>UNLOAD |
| +>WITH PURGEDATA FROM TARGET |
| +>MERGE FILE 'merged_customer_demogs.gz' OVERWRITE |
| +>COMPRESSION GZIP |
| +>INTO '/bulkload/customer_demographics' |
| +>SELECT * FROM trafodion.hbase.customer_demographics |
| +><<+ cardinality 10e10 ,+ cardinality 10e10 >>; |
| Task: UNLOAD Status: Started |
| Task: EMPTY TARGET Status: Started |
| Task: EMPTY TARGET Status: Ended ET: 00:00:00.014 |
| Task: EXTRACT Status: Started |
| Rows Processed: 200000 |
| Task: EXTRACT Status: Ended ET: 00:00:04.743 Task: MERGE FILES Status: Started |
| Task: MERGE FILES Status: Ended ET: 00:00:00.063 |
| |
| --- 200000 row(s) unloaded.</code></pre> |
| </div> |
| </div> |
| </div> |
| </div> |
| <div class="sect1"> |
| <h2 id="monitor-progress">7. Monitor Progress</h2> |
| <div class="sectionbody"> |
| <div class="sect2"> |
| <h3 id="_insert_and_upsert">7.1. INSERT and UPSERT</h3> |
| <div class="paragraph"> |
| <p>For an INSERT statement, rows are written to the HBase table that represents the Trafodion table when the transaction commits. |
| It is more difficult to see query progress here.</p> |
| </div> |
| </div> |
| <div class="sect2"> |
| <h3 id="_upsert_using_load">7.2. UPSERT USING LOAD</h3> |
| <div class="paragraph"> |
| <p>For an UPSERT USING LOAD statement, rows added are visible in the Trafodion table after each <code>ListOfPut</code> call succeeds. |
| You can use a <code>SELECT COUNT(*)</code> statement to monitor progress. That way, you know how many rows are already in the table when the |
| statement starts executing.</p> |
| </div> |
| <div class="listingblock"> |
| <div class="content"> |
| <pre class="CodeRay highlight"><code data-lang="text">SELECT COUNT(*) FROM trafodion.sch.demo ;</code></pre> |
| </div> |
| </div> |
| </div> |
| <div class="sect2"> |
| <h3 id="_load">7.3. LOAD</h3> |
| <div class="paragraph"> |
| <p>For LOAD, query progress goes through a few phases, which sometimes overlap:</p> |
| </div> |
| <div class="olist arabic"> |
| <ol class="arabic"> |
| <li> |
| <p>Hive scan.</p> |
| </li> |
| <li> |
| <p>Sort.</p> |
| </li> |
| <li> |
| <p>Create prep HFiles in HDFS bulkload staging directory (<code>/bulkload</code> by default).</p> |
| </li> |
| <li> |
| <p>Move HFiles into HBase.</p> |
| </li> |
| </ol> |
| </div> |
| <div class="paragraph"> |
| <p>You can monitor progress in step 2, sort, with this shell command:</p> |
| </div> |
| <div class="listingblock"> |
| <div class="content"> |
| <pre class="CodeRay highlight"><code data-lang="text">lsof +L1 | grep SCR | wc -l</code></pre> |
| </div> |
| </div> |
| <div class="paragraph"> |
| <p>This command returns a count of the number of overflow files for sort. Each file is 2GB in size. |
| You need to have an approximate idea of the volume of data being loaded to know how much more |
| data needs to be sorted. On a cluster, sort is done on all nodes with a pdsh-like utility. |
| Trafodion data volume can also be larger than Hive data volume by a factor of 2 or 3.</p> |
| </div> |
| <div class="paragraph"> |
| <p>In step 3, create prep HFiles, use the following command to monitor the volume of data written |
| out to the staging directory:</p> |
| </div> |
| <div class="listingblock"> |
| <div class="content"> |
| <pre class="CodeRay highlight"><code data-lang="text">hadoop fs -dus /bulkload</code></pre> |
| </div> |
| </div> |
| <div class="paragraph"> |
| <p>The <code>hadoop fs</code> command must be run from one node and does not have to be repeated across the cluster.</p> |
| </div> |
| <div class="paragraph"> |
| <p>If compression and encoding are used, then the size should be similar to the Hive source data volume. |
| There may be some remnant data in the staging directory from previous commands, so we have to |
| take that into account. This step will start only when sort has completed.</p> |
| </div> |
| <div class="paragraph"> |
| <p>Step 4 is usually the shortest and typically does not exceed a few minutes.</p> |
| </div> |
| </div> |
| </div> |
| </div> |
| <div class="sect1"> |
| <h2 id="troubleshoot">8. Troubleshoot</h2> |
| <div class="sectionbody"> |
| <div class="sect2"> |
| <h3 id="_improving_throughput">8.1. Improving Throughput</h3> |
| <div class="sect3"> |
| <h4 id="_tuplelists_or_rowsets">8.1.1. Tuplelists or Rowsets</h4> |
| <div class="paragraph"> |
| <p>When Tuplelists or Rowsets are used as the data source, performance typically increases with the |
| number of rows in the Tuplelist or Rowset. Performance peaks at some value for the number of rows |
| and remain more or less steady after that. This peak value depends on row size. |
| Typically a value in the range of 100 to few thousand is reasonable.</p> |
| </div> |
| </div> |
| <div class="sect3"> |
| <h4 id="_native_hbase_tables">8.1.2. Native HBase Tables</h4> |
| <div class="paragraph"> |
| <p>When native HBase tables are used as the data source, it is important to override the default value |
| for the attribute HBASE_MAX_COLUMN_VALUE_LENGTH (columnwise mode) or HBASE_MAX_COLUMN_INFO_LENGTH (rowwise mode) |
| and set the value to the maximum for the table being used as the source. |
| The default values may be too large.</p> |
| </div> |
| </div> |
| <div class="sect3"> |
| <h4 id="_hive_tables">8.1.3. Hive Tables</h4> |
| <div class="paragraph"> |
| <p>When Hive tables are used as the data source, it is important to override the default value for the |
| attribute HIVE_MAX_STRING_LENGTH when the Hive source table has columns of type string. Please set the |
| value to the length of the longest string in the Hive table.</p> |
| </div> |
| <div class="paragraph"> |
| <p>To determine that length, run this query from a Hive shell:</p> |
| </div> |
| <div class="listingblock"> |
| <div class="content"> |
| <pre class="CodeRay highlight"><code data-lang="text">SELECT MAX(LENGTH(<col-name>)) FROM <hive-tab-name>;</code></pre> |
| </div> |
| </div> |
| <div class="paragraph"> |
| <p>If the query returns a value less than the current HIVE_MAX_STRING_LENGTH, then you need to |
| increase that value and retry. If the query returns a value that is far less than the current |
| HIVE_MAX_STRING_LENGTH, then you can achieve better performance by reducing the value. |
| An approximate value can be used, too. The Trafodion default of 32000 may be too generous in some cases.</p> |
| </div> |
| </div> |
| </div> |
| <div class="sect2"> |
| <h3 id="_checking_plan_quality">8.2. Checking Plan Quality</h3> |
| <div class="paragraph"> |
| <p>It is good practice to check the quality of the plan generated by the SQL compiler before executing a |
| data loading statement that may take a long time to complete.</p> |
| </div> |
| <div class="ulist"> |
| <ul> |
| <li> |
| <p>For INSERT and UPSERT USING LOAD statements, use the EXPLAIN statement, which is described in the |
| <a href="http://trafodion.apache.org/docs/sql_reference/index.html#explain_statement">Trafodion SQL Reference Manual</a>.</p> |
| </li> |
| <li> |
| <p>For the LOAD statement, which is implemented as a utility operator (that is, a collection of secondary SQL statements), |
| use the following SQL statements to see the plan that it uses to add data to the target table:</p> |
| </li> |
| </ul> |
| </div> |
| <div class="listingblock"> |
| <div class="content"> |
| <pre class="CodeRay highlight"><code data-lang="text">CONTROL QUERY DEFAULT COMP_BOOL_226 'ON' ; |
| PREPARE s1 FROM LOAD TRANSFORM INTO <target-table> <select-query-used-as-source> ; |
| EXPLAIN OPTIONS 'f' s1 ;</code></pre> |
| </div> |
| </div> |
| <div class="paragraph"> |
| <p>A typical problem with the plan is that the scan is not parallel enough. For Trafodion tables, you can address this |
| issue with the default attribute, PARALLEL_NUM_ESPS. Using this attribute, a Trafodion scan can be parallelized to |
| as many number of SALT partitions that are defined for the table. For Hive source tables, the default attributes, |
| HIVE_NUM_ESPS_PER_DATANODE and HIVE_MIN_BYTES_PER_ESP_PARTITION, can be used to adjust the degree of parallelism.</p> |
| </div> |
| </div> |
| <div class="sect2"> |
| <h3 id="_update_statistics_times_out_during_sampling">8.3. UPDATE STATISTICS Times Out During Sampling</h3> |
| <div class="paragraph"> |
| <p>Sampling in update statistics is implemented using the HBase Random RowFilter. For very large tables with several |
| billion rows, the sampling ratio required to get a sample of one million rows is very small. This can result in |
| HBase client connection timeout errors since there may be no row returned by a RegionServer for an extended period of time.</p> |
| </div> |
| <div class="paragraph"> |
| <p>You can avoid this problem by:</p> |
| </div> |
| <div class="ulist"> |
| <ul> |
| <li> |
| <p>Choosing a sampling percentage higher than the default setting of 1 million rows for large tables.</p> |
| <div class="paragraph"> |
| <p>For example, suppose table T has one billion rows. Use the following UPDATE STATISTICS statement to sample a million rows, |
| or approximately one-tenth of one percent of the total rows:</p> |
| </div> |
| <div class="listingblock"> |
| <div class="content"> |
| <pre class="CodeRay highlight"><code data-lang="text">UPDATE STATISTICS FOR TABLE t ON EVERY COLUMN SAMPLE;</code></pre> |
| </div> |
| </div> |
| <div class="paragraph"> |
| <p>To sample one percent of the rows, regardless of the table size, you must explicitly state the sampling rate as follows:</p> |
| </div> |
| <div class="listingblock"> |
| <div class="content"> |
| <pre class="CodeRay highlight"><code data-lang="text">UPDATE STATISTICS FOR TABLE t ON EVERY COLUMN SAMPLE RANDOM 1 PERCENT;</code></pre> |
| </div> |
| </div> |
| </li> |
| <li> |
| <p>Setting <code>hbase.rpc.timeout</code> to a higher value than currently specified in the HBase settings.</p> |
| </li> |
| </ul> |
| </div> |
| </div> |
| <div class="sect2"> |
| <h3 id="_index_creation_takes_too_long">8.4. Index Creation Takes Too Long</h3> |
| <div class="paragraph"> |
| <p>When creating an index, all rows of the Trafodion table must be scanned and a subset of columns is returned to the client. |
| This can take a while to complete. If there is a Hive table with the same data as the Trafodion table being scanned, then |
| you can specify the default attribute, USE_HIVE_SOURCE. This causes the Hive table to be used as the source creating the index.</p> |
| </div> |
| <div class="admonitionblock note"> |
| <table> |
| <tr> |
| <td class="icon"> |
| <i class="fa icon-note" title="Note"></i> |
| </td> |
| <td class="content"> |
| The name of the Hive table must use the Trafodion table name as its prefix. |
| For example, if the Trafodion table is TRAFODION.SCH.DEMO, then the Hive |
| table name can be DEMO_SRC. In this case, set the attribute as follows:<br> |
| <br> |
| </td> |
| </tr> |
| </table> |
| </div> |
| <div class="listingblock"> |
| <div class="content"> |
| <pre class="CodeRay highlight"><code data-lang="text">CONTROL QUERY DEFAULT USE_HIVE_SOURCE '_SRC' ; |
| CREATE INDEX demo_ix ON sch.demo(name) ;</code></pre> |
| </div> |
| </div> |
| </div> |
| <div class="sect2"> |
| <h3 id="large-deletes">8.5. Large Deletes Take Too Long or Error Out</h3> |
| <div class="paragraph"> |
| <p>If a large number of rows is either updated or deleted in a single SQL statement, then it is likely that the |
| statement does not complete successfully.</p> |
| </div> |
| <div class="paragraph"> |
| <p>Deleting or updating more than 10,000 rows with a single statement is not recommended. Instead, a large delete |
| or update should be broken up into multiple statements each affecting less than 10,000*n rows, if possible. |
| <code>n</code> is number of nodes in the cluster.</p> |
| </div> |
| </div> |
| <div class="sect2"> |
| <h3 id="_large_upsert_using_load_on_a_table_with_index_errors_out">8.6. Large UPSERT USING LOAD On a Table With Index Errors Out</h3> |
| <div class="paragraph"> |
| <p>UPSERT USING LOAD automatically reverts to a transactional UPSERT when used on a table with an index. This causes Trafodion |
| to run into the limitation discusses in <a href="#large-deletes">Large Deletes Take Too Long or Error Out</a> above: |
| no more than 10,000*n rows (n = number of nodes) can be affected in a single statement.</p> |
| </div> |
| <div class="paragraph"> |
| <p><strong>Workaround</strong>: The UPSERT USING LOAD operation can be placed in a LOAD statement as shown below. The LOAD statement disables |
| indexes on the table before the UPSERT USING LOAD starts. Once the UPSERT USING LOAD completes indexes are populated by |
| the LOAD statement.</p> |
| </div> |
| <div class="listingblock"> |
| <div class="content"> |
| <pre class="CodeRay highlight"><code data-lang="text">LOAD WITH UPSERT USING LOAD INTO trafodion.sch.demo SELECT * FROM hive.hive.demo; |
| |
| Task: LOAD Status: Started Object: TRAFODION.SCH.DEMO |
| Task: DISABLE INDEXE Status: Started Object: TRAFODION.SCH.DEMO |
| Task: DISABLE INDEXE Status: Ended Object: TRAFODION.SCH.DEMO |
| Task: UPSERT USING L Status: Started Object: TRAFODION.SCH.DEMO |
| Rows Processed: 200000 |
| Task: UPSERT USING L Status: Ended ET: 00:01:03.715 |
| Task: POPULATE INDEX Status: Started Object: TRAFODION.SCH.DEMO |
| Task: POPULATE INDEX Status: Ended ET: 00:03:11.323</code></pre> |
| </div> |
| </div> |
| </div> |
| </div> |
| </div> |
| </div> |
| <div id="footer"> |
| <div id="footer-text"> |
| Version 2.4.0<br> |
| Last updated 2019-04-02 21:53:04 UTC |
| </div> |
| </div> |
| <script> |
| var _gaq = _gaq || []; |
| _gaq.push(['_setAccount','UA-72491210-1']); |
| _gaq.push(['_trackPageview']); |
| (function() { |
| var ga = document.createElement('script'); |
| ga.type = 'text/javascript'; |
| ga.async = true; |
| ga.src = ('https:' == document.location.protocol ? 'https://ssl' : 'http://www') + '.google-analytics.com/ga.js'; |
| var s = document.getElementsByTagName('script')[0]; |
| s.parentNode.insertBefore(ga, s); |
| })(); |
| </script> |
| </body> |
| </html> |