<!DOCTYPE html>
<html lang="en">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<meta name="generator" content="Asciidoctor 0.1.4">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Apache Accumulo User Manual Version 1.6</title>
<style>
/* Asciidoctor default stylesheet | MIT License | http://asciidoctor.org */
article, aside, details, figcaption, figure, footer, header, hgroup, main, nav, section, summary { display: block; }
audio, canvas, video { display: inline-block; }
audio:not([controls]) { display: none; height: 0; }
[hidden] { display: none; }
html { background: #fff; color: #000; font-family: sans-serif; -ms-text-size-adjust: 100%; -webkit-text-size-adjust: 100%; }
body { margin: 0; }
a:focus { outline: thin dotted; }
a:active, a:hover { outline: 0; }
h1 { font-size: 2em; margin: 0.67em 0; }
abbr[title] { border-bottom: 1px dotted; }
b, strong { font-weight: bold; }
dfn { font-style: italic; }
hr { -moz-box-sizing: content-box; box-sizing: content-box; height: 0; }
mark { background: #ff0; color: #000; }
code, kbd, pre, samp { font-family: monospace, serif; font-size: 1em; }
pre { white-space: pre-wrap; }
q { quotes: "\201C" "\201D" "\2018" "\2019"; }
small { font-size: 80%; }
sub, sup { font-size: 75%; line-height: 0; position: relative; vertical-align: baseline; }
sup { top: -0.5em; }
sub { bottom: -0.25em; }
img { border: 0; }
svg:not(:root) { overflow: hidden; }
figure { margin: 0; }
fieldset { border: 1px solid #c0c0c0; margin: 0 2px; padding: 0.35em 0.625em 0.75em; }
legend { border: 0; padding: 0; }
button, input, select, textarea { font-family: inherit; font-size: 100%; margin: 0; }
button, input { line-height: normal; }
button, select { text-transform: none; }
button, html input[type="button"], input[type="reset"], input[type="submit"] { -webkit-appearance: button; cursor: pointer; }
button[disabled], html input[disabled] { cursor: default; }
input[type="checkbox"], input[type="radio"] { box-sizing: border-box; padding: 0; }
input[type="search"] { -webkit-appearance: textfield; -moz-box-sizing: content-box; -webkit-box-sizing: content-box; box-sizing: content-box; }
input[type="search"]::-webkit-search-cancel-button, input[type="search"]::-webkit-search-decoration { -webkit-appearance: none; }
button::-moz-focus-inner, input::-moz-focus-inner { border: 0; padding: 0; }
textarea { overflow: auto; vertical-align: top; }
table { border-collapse: collapse; border-spacing: 0; }
*, *:before, *:after { -moz-box-sizing: border-box; -webkit-box-sizing: border-box; box-sizing: border-box; }
html, body { font-size: 100%; }
body { background: white; color: #222222; padding: 0; margin: 0; font-family: "Helvetica Neue", "Helvetica", Helvetica, Arial, sans-serif; font-weight: normal; font-style: normal; line-height: 1; position: relative; cursor: auto; }
a:hover { cursor: pointer; }
a:focus { outline: none; }
img, object, embed { max-width: 100%; height: auto; }
object, embed { height: 100%; }
img { -ms-interpolation-mode: bicubic; }
#map_canvas img, #map_canvas embed, #map_canvas object, .map_canvas img, .map_canvas embed, .map_canvas object { max-width: none !important; }
.left { float: left !important; }
.right { float: right !important; }
.text-left { text-align: left !important; }
.text-right { text-align: right !important; }
.text-center { text-align: center !important; }
.text-justify { text-align: justify !important; }
.hide { display: none; }
.antialiased, body { -webkit-font-smoothing: antialiased; }
img { display: inline-block; vertical-align: middle; }
textarea { height: auto; min-height: 50px; }
select { width: 100%; }
p.lead, .paragraph.lead > p, #preamble > .sectionbody > .paragraph:first-of-type p { font-size: 1.21875em; line-height: 1.6; }
.subheader, #content #toctitle, .admonitionblock td.content > .title, .exampleblock > .title, .imageblock > .title, .videoblock > .title, .listingblock > .title, .literalblock > .title, .openblock > .title, .paragraph > .title, .quoteblock > .title, .sidebarblock > .title, .tableblock > .title, .verseblock > .title, .dlist > .title, .olist > .title, .ulist > .title, .qlist > .title, .hdlist > .title, .tableblock > caption { line-height: 1.4; color: #7a2518; font-weight: 300; margin-top: 0.2em; margin-bottom: 0.5em; }
div, dl, dt, dd, ul, ol, li, h1, h2, h3, #toctitle, .sidebarblock > .content > .title, h4, h5, h6, pre, form, p, blockquote, th, td { margin: 0; padding: 0; direction: ltr; }
a { color: #005498; text-decoration: underline; line-height: inherit; }
a:hover, a:focus { color: #00467f; }
a img { border: none; }
p { font-family: inherit; font-weight: normal; font-size: 1em; line-height: 1.6; margin-bottom: 1.25em; text-rendering: optimizeLegibility; }
p aside { font-size: 0.875em; line-height: 1.35; font-style: italic; }
h1, h2, h3, #toctitle, .sidebarblock > .content > .title, h4, h5, h6 { font-family: Georgia, "URW Bookman L", Helvetica, Arial, sans-serif; font-weight: normal; font-style: normal; color: #ba3925; text-rendering: optimizeLegibility; margin-top: 1em; margin-bottom: 0.5em; line-height: 1.2125em; }
h1 small, h2 small, h3 small, #toctitle small, .sidebarblock > .content > .title small, h4 small, h5 small, h6 small { font-size: 60%; color: #e99b8f; line-height: 0; }
h1 { font-size: 2.125em; }
h2 { font-size: 1.6875em; }
h3, #toctitle, .sidebarblock > .content > .title { font-size: 1.375em; }
h4 { font-size: 1.125em; }
h5 { font-size: 1.125em; }
h6 { font-size: 1em; }
hr { border: solid #dddddd; border-width: 1px 0 0; clear: both; margin: 1.25em 0 1.1875em; height: 0; }
em, i { font-style: italic; line-height: inherit; }
strong, b { font-weight: bold; line-height: inherit; }
small { font-size: 60%; line-height: inherit; }
code { font-family: Consolas, "Liberation Mono", Courier, monospace; font-weight: normal; color: #6d180b; }
ul, ol, dl { font-size: 1em; line-height: 1.6; margin-bottom: 1.25em; list-style-position: outside; font-family: inherit; }
ul, ol { margin-left: 1.5em; }
ul li ul, ul li ol { margin-left: 1.25em; margin-bottom: 0; font-size: 1em; }
ul.square li ul, ul.circle li ul, ul.disc li ul { list-style: inherit; }
ul.square { list-style-type: square; }
ul.circle { list-style-type: circle; }
ul.disc { list-style-type: disc; }
ul.no-bullet { list-style: none; }
ol li ul, ol li ol { margin-left: 1.25em; margin-bottom: 0; }
dl dt { margin-bottom: 0.3125em; font-weight: bold; }
dl dd { margin-bottom: 1.25em; }
abbr, acronym { text-transform: uppercase; font-size: 90%; color: #222222; border-bottom: 1px dotted #dddddd; cursor: help; }
abbr { text-transform: none; }
blockquote { margin: 0 0 1.25em; padding: 0.5625em 1.25em 0 1.1875em; border-left: 1px solid #dddddd; }
blockquote cite { display: block; font-size: inherit; color: #555555; }
blockquote cite:before { content: "\2014 \0020"; }
blockquote cite a, blockquote cite a:visited { color: #555555; }
blockquote, blockquote p { line-height: 1.6; color: #6f6f6f; }
.vcard { display: inline-block; margin: 0 0 1.25em 0; border: 1px solid #dddddd; padding: 0.625em 0.75em; }
.vcard li { margin: 0; display: block; }
.vcard .fn { font-weight: bold; font-size: 0.9375em; }
.vevent .summary { font-weight: bold; }
.vevent abbr { cursor: auto; text-decoration: none; font-weight: bold; border: none; padding: 0 0.0625em; }
@media only screen and (min-width: 768px) { h1, h2, h3, #toctitle, .sidebarblock > .content > .title, h4, h5, h6 { line-height: 1.4; }
h1 { font-size: 2.75em; }
h2 { font-size: 2.3125em; }
h3, #toctitle, .sidebarblock > .content > .title { font-size: 1.6875em; }
h4 { font-size: 1.4375em; } }
.print-only { display: none !important; }
@media print { * { background: transparent !important; color: #000 !important; box-shadow: none !important; text-shadow: none !important; }
a, a:visited { text-decoration: underline; }
a[href]:after { content: " (" attr(href) ")"; }
abbr[title]:after { content: " (" attr(title) ")"; }
.ir a:after, a[href^="javascript:"]:after, a[href^="#"]:after { content: ""; }
pre, blockquote { border: 1px solid #999; page-break-inside: avoid; }
thead { display: table-header-group; }
tr, img { page-break-inside: avoid; }
img { max-width: 100% !important; }
@page { margin: 0.5cm; }
p, h2, h3, #toctitle, .sidebarblock > .content > .title { orphans: 3; widows: 3; }
h2, h3, #toctitle, .sidebarblock > .content > .title { page-break-after: avoid; }
.hide-on-print { display: none !important; }
.print-only { display: block !important; }
.hide-for-print { display: none !important; }
.show-for-print { display: inherit !important; } }
table { background: white; margin-bottom: 1.25em; border: solid 1px #dddddd; }
table thead, table tfoot { background: whitesmoke; font-weight: bold; }
table thead tr th, table thead tr td, table tfoot tr th, table tfoot tr td { padding: 0.5em 0.625em 0.625em; font-size: inherit; color: #222222; text-align: left; }
table tr th, table tr td { padding: 0.5625em 0.625em; font-size: inherit; color: #222222; }
table tr.even, table tr.alt, table tr:nth-of-type(even) { background: #f9f9f9; }
table thead tr th, table tfoot tr th, table tbody tr td, table tr td, table tfoot tr td { display: table-cell; line-height: 1.6; }
.clearfix:before, .clearfix:after, .float-group:before, .float-group:after { content: " "; display: table; }
.clearfix:after, .float-group:after { clear: both; }
*:not(pre) > code { font-size: 0.9375em; padding: 1px 3px 0; white-space: nowrap; background-color: #f2f2f2; border: 1px solid #cccccc; -webkit-border-radius: 4px; border-radius: 4px; text-shadow: none; }
pre, pre > code { line-height: 1.4; color: inherit; font-family: Consolas, "Liberation Mono", Courier, monospace; font-weight: normal; }
kbd.keyseq { color: #555555; }
kbd:not(.keyseq) { display: inline-block; color: #222222; font-size: 0.75em; line-height: 1.4; background-color: #F7F7F7; border: 1px solid #ccc; -webkit-border-radius: 3px; border-radius: 3px; -webkit-box-shadow: 0 1px 0 rgba(0, 0, 0, 0.2), 0 0 0 2px white inset; box-shadow: 0 1px 0 rgba(0, 0, 0, 0.2), 0 0 0 2px white inset; margin: -0.15em 0.15em 0 0.15em; padding: 0.2em 0.6em 0.2em 0.5em; vertical-align: middle; white-space: nowrap; }
kbd kbd:first-child { margin-left: 0; }
kbd kbd:last-child { margin-right: 0; }
.menuseq, .menu { color: #090909; }
p a > code:hover { color: #561309; }
#header, #content, #footnotes, #footer { width: 100%; margin-left: auto; margin-right: auto; margin-top: 0; margin-bottom: 0; max-width: 62.5em; *zoom: 1; position: relative; padding-left: 0.9375em; padding-right: 0.9375em; }
#header:before, #header:after, #content:before, #content:after, #footnotes:before, #footnotes:after, #footer:before, #footer:after { content: " "; display: table; }
#header:after, #content:after, #footnotes:after, #footer:after { clear: both; }
#header { margin-bottom: 2.5em; }
#header > h1 { color: black; font-weight: normal; border-bottom: 1px solid #dddddd; margin-bottom: -28px; padding-bottom: 32px; }
#header span { color: #6f6f6f; }
#header #revnumber { text-transform: capitalize; }
#header br { display: none; }
#header br + span { padding-left: 3px; }
#header br + span:before { content: "\2013 \0020"; }
#header br + span.author { padding-left: 0; }
#header br + span.author:before { content: ", "; }
#toc { border-bottom: 3px double #ebebeb; padding-bottom: 1.25em; }
#toc > ul { margin-left: 0.25em; }
#toc ul.sectlevel0 > li > a { font-style: italic; }
#toc ul.sectlevel0 ul.sectlevel1 { margin-left: 0; margin-top: 0.5em; margin-bottom: 0.5em; }
#toc ul { list-style-type: none; }
#toctitle { color: #7a2518; }
@media only screen and (min-width: 1280px) { body.toc2 { padding-left: 20em; }
#toc.toc2 { position: fixed; width: 20em; left: 0; top: 0; border-right: 1px solid #ebebeb; border-bottom: 0; z-index: 1000; padding: 1em; height: 100%; overflow: auto; }
#toc.toc2 #toctitle { margin-top: 0; }
#toc.toc2 > ul { font-size: .95em; }
#toc.toc2 ul ul { margin-left: 0; padding-left: 1.25em; }
#toc.toc2 ul.sectlevel0 ul.sectlevel1 { padding-left: 0; margin-top: 0.5em; margin-bottom: 0.5em; }
body.toc2.toc-right { padding-left: 0; padding-right: 20em; }
body.toc2.toc-right #toc.toc2 { border-right: 0; border-left: 1px solid #ebebeb; left: auto; right: 0; } }
#content #toc { border-style: solid; border-width: 1px; border-color: #d9d9d9; margin-bottom: 1.25em; padding: 1.25em; background: #f2f2f2; border-width: 0; -webkit-border-radius: 4px; border-radius: 4px; }
#content #toc > :first-child { margin-top: 0; }
#content #toc > :last-child { margin-bottom: 0; }
#content #toc a { text-decoration: none; }
#content #toctitle { font-weight: bold; font-family: "Helvetica Neue", "Helvetica", Helvetica, Arial, sans-serif; font-size: 1em; padding-left: 0.125em; }
#footer { max-width: 100%; background-color: #222222; padding: 1.25em; }
#footer-text { color: #dddddd; line-height: 1.44; }
.sect1 { padding-bottom: 1.25em; }
.sect1 + .sect1 { border-top: 3px double #ebebeb; }
#content h1 > a.anchor, h2 > a.anchor, h3 > a.anchor, #toctitle > a.anchor, .sidebarblock > .content > .title > a.anchor, h4 > a.anchor, h5 > a.anchor, h6 > a.anchor { position: absolute; width: 1em; margin-left: -1em; display: block; text-decoration: none; visibility: hidden; text-align: center; font-weight: normal; }
#content h1 > a.anchor:before, h2 > a.anchor:before, h3 > a.anchor:before, #toctitle > a.anchor:before, .sidebarblock > .content > .title > a.anchor:before, h4 > a.anchor:before, h5 > a.anchor:before, h6 > a.anchor:before { content: '\00A7'; font-size: .85em; vertical-align: text-top; display: block; margin-top: 0.05em; }
#content h1:hover > a.anchor, #content h1 > a.anchor:hover, h2:hover > a.anchor, h2 > a.anchor:hover, h3:hover > a.anchor, #toctitle:hover > a.anchor, .sidebarblock > .content > .title:hover > a.anchor, h3 > a.anchor:hover, #toctitle > a.anchor:hover, .sidebarblock > .content > .title > a.anchor:hover, h4:hover > a.anchor, h4 > a.anchor:hover, h5:hover > a.anchor, h5 > a.anchor:hover, h6:hover > a.anchor, h6 > a.anchor:hover { visibility: visible; }
#content h1 > a.link, h2 > a.link, h3 > a.link, #toctitle > a.link, .sidebarblock > .content > .title > a.link, h4 > a.link, h5 > a.link, h6 > a.link { color: #ba3925; text-decoration: none; }
#content h1 > a.link:hover, h2 > a.link:hover, h3 > a.link:hover, #toctitle > a.link:hover, .sidebarblock > .content > .title > a.link:hover, h4 > a.link:hover, h5 > a.link:hover, h6 > a.link:hover { color: #a53221; }
.imageblock, .literalblock, .listingblock, .verseblock, .videoblock { margin-bottom: 1.25em; }
.admonitionblock td.content > .title, .exampleblock > .title, .imageblock > .title, .videoblock > .title, .listingblock > .title, .literalblock > .title, .openblock > .title, .paragraph > .title, .quoteblock > .title, .sidebarblock > .title, .tableblock > .title, .verseblock > .title, .dlist > .title, .olist > .title, .ulist > .title, .qlist > .title, .hdlist > .title { text-align: left; font-weight: bold; }
.tableblock > caption { text-align: left; font-weight: bold; white-space: nowrap; overflow: visible; max-width: 0; }
table.tableblock #preamble > .sectionbody > .paragraph:first-of-type p { font-size: inherit; }
.admonitionblock > table { border: 0; background: none; width: 100%; }
.admonitionblock > table td.icon { text-align: center; width: 80px; }
.admonitionblock > table td.icon img { max-width: none; }
.admonitionblock > table td.icon .title { font-weight: bold; text-transform: uppercase; }
.admonitionblock > table td.content { padding-left: 1.125em; padding-right: 1.25em; border-left: 1px solid #dddddd; color: #6f6f6f; }
.admonitionblock > table td.content > :last-child > :last-child { margin-bottom: 0; }
.exampleblock > .content { border-style: solid; border-width: 1px; border-color: #e6e6e6; margin-bottom: 1.25em; padding: 1.25em; background: white; -webkit-border-radius: 4px; border-radius: 4px; }
.exampleblock > .content > :first-child { margin-top: 0; }
.exampleblock > .content > :last-child { margin-bottom: 0; }
.exampleblock > .content h1, .exampleblock > .content h2, .exampleblock > .content h3, .exampleblock > .content #toctitle, .sidebarblock.exampleblock > .content > .title, .exampleblock > .content h4, .exampleblock > .content h5, .exampleblock > .content h6, .exampleblock > .content p { color: #333333; }
.exampleblock > .content h1, .exampleblock > .content h2, .exampleblock > .content h3, .exampleblock > .content #toctitle, .sidebarblock.exampleblock > .content > .title, .exampleblock > .content h4, .exampleblock > .content h5, .exampleblock > .content h6 { line-height: 1; margin-bottom: 0.625em; }
.exampleblock > .content h1.subheader, .exampleblock > .content h2.subheader, .exampleblock > .content h3.subheader, .exampleblock > .content .subheader#toctitle, .sidebarblock.exampleblock > .content > .subheader.title, .exampleblock > .content h4.subheader, .exampleblock > .content h5.subheader, .exampleblock > .content h6.subheader { line-height: 1.4; }
.exampleblock.result > .content { -webkit-box-shadow: 0 1px 8px #d9d9d9; box-shadow: 0 1px 8px #d9d9d9; }
.sidebarblock { border-style: solid; border-width: 1px; border-color: #d9d9d9; margin-bottom: 1.25em; padding: 1.25em; background: #f2f2f2; -webkit-border-radius: 4px; border-radius: 4px; }
.sidebarblock > :first-child { margin-top: 0; }
.sidebarblock > :last-child { margin-bottom: 0; }
.sidebarblock h1, .sidebarblock h2, .sidebarblock h3, .sidebarblock #toctitle, .sidebarblock > .content > .title, .sidebarblock h4, .sidebarblock h5, .sidebarblock h6, .sidebarblock p { color: #333333; }
.sidebarblock h1, .sidebarblock h2, .sidebarblock h3, .sidebarblock #toctitle, .sidebarblock > .content > .title, .sidebarblock h4, .sidebarblock h5, .sidebarblock h6 { line-height: 1; margin-bottom: 0.625em; }
.sidebarblock h1.subheader, .sidebarblock h2.subheader, .sidebarblock h3.subheader, .sidebarblock .subheader#toctitle, .sidebarblock > .content > .subheader.title, .sidebarblock h4.subheader, .sidebarblock h5.subheader, .sidebarblock h6.subheader { line-height: 1.4; }
.sidebarblock > .content > .title { color: #7a2518; margin-top: 0; line-height: 1.6; }
.exampleblock > .content > :last-child > :last-child, .exampleblock > .content .olist > ol > li:last-child > :last-child, .exampleblock > .content .ulist > ul > li:last-child > :last-child, .exampleblock > .content .qlist > ol > li:last-child > :last-child, .sidebarblock > .content > :last-child > :last-child, .sidebarblock > .content .olist > ol > li:last-child > :last-child, .sidebarblock > .content .ulist > ul > li:last-child > :last-child, .sidebarblock > .content .qlist > ol > li:last-child > :last-child { margin-bottom: 0; }
.literalblock > .content pre, .listingblock > .content pre { background: none; border-width: 1px 0; border-style: dotted; border-color: #bfbfbf; -webkit-border-radius: 4px; border-radius: 4px; padding: 0.75em 0.75em 0.5em 0.75em; word-wrap: break-word; }
.literalblock > .content pre.nowrap, .listingblock > .content pre.nowrap { overflow-x: auto; white-space: pre; word-wrap: normal; }
.literalblock > .content pre > code, .listingblock > .content pre > code { display: block; }
@media only screen { .literalblock > .content pre, .listingblock > .content pre { font-size: 0.8em; } }
@media only screen and (min-width: 768px) { .literalblock > .content pre, .listingblock > .content pre { font-size: 0.9em; } }
@media only screen and (min-width: 1280px) { .literalblock > .content pre, .listingblock > .content pre { font-size: 1em; } }
.listingblock > .content { position: relative; }
.listingblock:hover code[class*=" language-"]:before { text-transform: uppercase; font-size: 0.9em; color: #999; position: absolute; top: 0.375em; right: 0.375em; }
.listingblock:hover code.asciidoc:before { content: "asciidoc"; }
.listingblock:hover code.clojure:before { content: "clojure"; }
.listingblock:hover code.css:before { content: "css"; }
.listingblock:hover code.groovy:before { content: "groovy"; }
.listingblock:hover code.html:before { content: "html"; }
.listingblock:hover code.java:before { content: "java"; }
.listingblock:hover code.javascript:before { content: "javascript"; }
.listingblock:hover code.python:before { content: "python"; }
.listingblock:hover code.ruby:before { content: "ruby"; }
.listingblock:hover code.scss:before { content: "scss"; }
.listingblock:hover code.xml:before { content: "xml"; }
.listingblock:hover code.yaml:before { content: "yaml"; }
.listingblock.terminal pre .command:before { content: attr(data-prompt); padding-right: 0.5em; color: #999; }
.listingblock.terminal pre .command:not([data-prompt]):before { content: '$'; }
table.pyhltable { border: 0; margin-bottom: 0; }
table.pyhltable td { vertical-align: top; padding-top: 0; padding-bottom: 0; }
table.pyhltable td.code { padding-left: .75em; padding-right: 0; }
.highlight.pygments .lineno, table.pyhltable td:not(.code) { color: #999; padding-left: 0; padding-right: .5em; border-right: 1px solid #dddddd; }
.highlight.pygments .lineno { display: inline-block; margin-right: .25em; }
table.pyhltable .linenodiv { background-color: transparent !important; padding-right: 0 !important; }
.quoteblock { margin: 0 0 1.25em; padding: 0.5625em 1.25em 0 1.1875em; border-left: 1px solid #dddddd; }
.quoteblock blockquote { margin: 0 0 1.25em 0; padding: 0 0 0.5625em 0; border: 0; }
.quoteblock blockquote > .paragraph:last-child p { margin-bottom: 0; }
.quoteblock .attribution { margin-top: -.25em; padding-bottom: 0.5625em; font-size: inherit; color: #555555; }
.quoteblock .attribution br { display: none; }
.quoteblock .attribution cite { display: block; margin-bottom: 0.625em; }
table thead th, table tfoot th { font-weight: bold; }
table.tableblock.grid-all { border-collapse: separate; border-spacing: 1px; -webkit-border-radius: 4px; border-radius: 4px; border-top: 1px solid #dddddd; border-bottom: 1px solid #dddddd; }
table.tableblock.frame-topbot, table.tableblock.frame-none { border-left: 0; border-right: 0; }
table.tableblock.frame-sides, table.tableblock.frame-none { border-top: 0; border-bottom: 0; }
table.tableblock td .paragraph:last-child p, table.tableblock td > p:last-child { margin-bottom: 0; }
th.tableblock.halign-left, td.tableblock.halign-left { text-align: left; }
th.tableblock.halign-right, td.tableblock.halign-right { text-align: right; }
th.tableblock.halign-center, td.tableblock.halign-center { text-align: center; }
th.tableblock.valign-top, td.tableblock.valign-top { vertical-align: top; }
th.tableblock.valign-bottom, td.tableblock.valign-bottom { vertical-align: bottom; }
th.tableblock.valign-middle, td.tableblock.valign-middle { vertical-align: middle; }
p.tableblock.header { color: #222222; font-weight: bold; }
td > div.verse { white-space: pre; }
ol { margin-left: 1.75em; }
ul li ol { margin-left: 1.5em; }
dl dd { margin-left: 1.125em; }
dl dd:last-child, dl dd:last-child > :last-child { margin-bottom: 0; }
ol > li p, ul > li p, ul dd, ol dd, .olist .olist, .ulist .ulist, .ulist .olist, .olist .ulist { margin-bottom: 0.625em; }
ul.unstyled, ol.unnumbered, ul.checklist, ul.none { list-style-type: none; }
ul.unstyled, ol.unnumbered, ul.checklist { margin-left: 0.625em; }
ul.checklist li > p:first-child > i[class^="icon-check"]:first-child, ul.checklist li > p:first-child > input[type="checkbox"]:first-child { margin-right: 0.25em; }
ul.checklist li > p:first-child > input[type="checkbox"]:first-child { position: relative; top: 1px; }
ul.inline { margin: 0 auto 0.625em auto; margin-left: -1.375em; margin-right: 0; padding: 0; list-style: none; overflow: hidden; }
ul.inline > li { list-style: none; float: left; margin-left: 1.375em; display: block; }
ul.inline > li > * { display: block; }
.unstyled dl dt { font-weight: normal; font-style: normal; }
ol.arabic { list-style-type: decimal; }
ol.decimal { list-style-type: decimal-leading-zero; }
ol.loweralpha { list-style-type: lower-alpha; }
ol.upperalpha { list-style-type: upper-alpha; }
ol.lowerroman { list-style-type: lower-roman; }
ol.upperroman { list-style-type: upper-roman; }
ol.lowergreek { list-style-type: lower-greek; }
.hdlist > table, .colist > table { border: 0; background: none; }
.hdlist > table > tbody > tr, .colist > table > tbody > tr { background: none; }
td.hdlist1 { padding-right: .8em; font-weight: bold; }
td.hdlist1, td.hdlist2 { vertical-align: top; }
.literalblock + .colist, .listingblock + .colist { margin-top: -0.5em; }
.colist > table tr > td:first-of-type { padding: 0 .8em; line-height: 1; }
.colist > table tr > td:last-of-type { padding: 0.25em 0; }
.qanda > ol > li > p > em:only-child { color: #00467f; }
.thumb, .th { line-height: 0; display: inline-block; border: solid 4px white; -webkit-box-shadow: 0 0 0 1px #dddddd; box-shadow: 0 0 0 1px #dddddd; }
.imageblock.left, .imageblock[style*="float: left"] { margin: 0.25em 0.625em 1.25em 0; }
.imageblock.right, .imageblock[style*="float: right"] { margin: 0.25em 0 1.25em 0.625em; }
.imageblock > .title { margin-bottom: 0; }
.imageblock.thumb, .imageblock.th { border-width: 6px; }
.imageblock.thumb > .title, .imageblock.th > .title { padding: 0 0.125em; }
.image.left, .image.right { margin-top: 0.25em; margin-bottom: 0.25em; display: inline-block; line-height: 0; }
.image.left { margin-right: 0.625em; }
.image.right { margin-left: 0.625em; }
a.image { text-decoration: none; }
span.footnote, span.footnoteref { vertical-align: super; font-size: 0.875em; }
span.footnote a, span.footnoteref a { text-decoration: none; }
#footnotes { padding-top: 0.75em; padding-bottom: 0.75em; margin-bottom: 0.625em; }
#footnotes hr { width: 20%; min-width: 6.25em; margin: -.25em 0 .75em 0; border-width: 1px 0 0 0; }
#footnotes .footnote { padding: 0 0.375em; line-height: 1.3; font-size: 0.875em; margin-left: 1.2em; text-indent: -1.2em; margin-bottom: .2em; }
#footnotes .footnote a:first-of-type { font-weight: bold; text-decoration: none; }
#footnotes .footnote:last-of-type { margin-bottom: 0; }
#content #footnotes { margin-top: -0.625em; margin-bottom: 0; padding: 0.75em 0; }
.gist .file-data > table { border: none; background: #fff; width: 100%; margin-bottom: 0; }
.gist .file-data > table td.line-data { width: 99%; }
div.unbreakable { page-break-inside: avoid; }
.big { font-size: larger; }
.small { font-size: smaller; }
.underline { text-decoration: underline; }
.overline { text-decoration: overline; }
.line-through { text-decoration: line-through; }
.aqua { color: #00bfbf; }
.aqua-background { background-color: #00fafa; }
.black { color: black; }
.black-background { background-color: black; }
.blue { color: #0000bf; }
.blue-background { background-color: #0000fa; }
.fuchsia { color: #bf00bf; }
.fuchsia-background { background-color: #fa00fa; }
.gray { color: #606060; }
.gray-background { background-color: #7d7d7d; }
.green { color: #006000; }
.green-background { background-color: #007d00; }
.lime { color: #00bf00; }
.lime-background { background-color: #00fa00; }
.maroon { color: #600000; }
.maroon-background { background-color: #7d0000; }
.navy { color: #000060; }
.navy-background { background-color: #00007d; }
.olive { color: #606000; }
.olive-background { background-color: #7d7d00; }
.purple { color: #600060; }
.purple-background { background-color: #7d007d; }
.red { color: #bf0000; }
.red-background { background-color: #fa0000; }
.silver { color: #909090; }
.silver-background { background-color: #bcbcbc; }
.teal { color: #006060; }
.teal-background { background-color: #007d7d; }
.white { color: #bfbfbf; }
.white-background { background-color: #fafafa; }
.yellow { color: #bfbf00; }
.yellow-background { background-color: #fafa00; }
span.icon > [class^="icon-"], span.icon > [class*=" icon-"] { cursor: default; }
.admonitionblock td.icon [class^="icon-"]:before { font-size: 2.5em; text-shadow: 1px 1px 2px rgba(0, 0, 0, 0.5); cursor: default; }
.admonitionblock td.icon .icon-note:before { content: "\f05a"; color: #005498; color: #003f72; }
.admonitionblock td.icon .icon-tip:before { content: "\f0eb"; text-shadow: 1px 1px 2px rgba(155, 155, 0, 0.8); color: #111; }
.admonitionblock td.icon .icon-warning:before { content: "\f071"; color: #bf6900; }
.admonitionblock td.icon .icon-caution:before { content: "\f06d"; color: #bf3400; }
.admonitionblock td.icon .icon-important:before { content: "\f06a"; color: #bf0000; }
.conum { display: inline-block; color: white !important; background-color: #222222; -webkit-border-radius: 100px; border-radius: 100px; text-align: center; width: 20px; height: 20px; font-size: 12px; font-weight: bold; line-height: 20px; font-family: Arial, sans-serif; font-style: normal; position: relative; top: -2px; letter-spacing: -1px; }
.conum * { color: white !important; }
.conum + b { display: none; }
.conum:after { content: attr(data-value); }
.conum:not([data-value]):empty { display: none; }
.literalblock > .content > pre, .listingblock > .content > pre { -webkit-border-radius: 0; border-radius: 0; }
</style>
<link rel="stylesheet" href="http://cdnjs.cloudflare.com/ajax/libs/highlight.js/7.3/styles/default.min.css">
<script src="http://cdnjs.cloudflare.com/ajax/libs/highlight.js/7.3/highlight.min.js"></script>
<script>hljs.initHighlightingOnLoad()</script>
</head>
<body class="book toc2 toc-left">
<div id="header">
<h1>Apache Accumulo User Manual Version 1.6</h1>
<span id="author" class="author">Apache Accumulo Project</span><br>
<span id="email" class="email"><a href="mailto:dev@accumulo.apache.org">dev@accumulo.apache.org</a></span><br>
<div id="toc" class="toc2">
<div id="toctitle">Apache Accumulo 1.6</div>
<ul class="sectlevel1">
<li><a href="#_introduction">1. Introduction</a></li>
<li><a href="#_accumulo_design">2. Accumulo Design</a></li>
<li>
<ul class="sectlevel2">
<li><a href="#_data_model">2.1. Data Model</a></li>
<li><a href="#_architecture">2.2. Architecture</a></li>
<li><a href="#_components">2.3. Components</a></li>
<li>
<ul class="sectlevel3">
<li><a href="#_tablet_server">2.3.1. Tablet Server</a></li>
<li><a href="#_garbage_collector">2.3.2. Garbage Collector</a></li>
<li><a href="#_master">2.3.3. Master</a></li>
<li><a href="#_tracer">2.3.4. Tracer</a></li>
<li><a href="#_monitor">2.3.5. Monitor</a></li>
<li><a href="#_client">2.3.6. Client</a></li>
</ul>
</li>
<li><a href="#_data_management">2.4. Data Management</a></li>
<li><a href="#_tablet_service">2.5. Tablet Service</a></li>
<li><a href="#_compactions">2.6. Compactions</a></li>
<li><a href="#_splitting">2.7. Splitting</a></li>
<li><a href="#_fault_tolerance">2.8. Fault-Tolerance</a></li>
</ul>
</li>
<li><a href="#_accumulo_shell">3. Accumulo Shell</a></li>
<li>
<ul class="sectlevel2">
<li><a href="#_basic_administration">3.1. Basic Administration</a></li>
<li><a href="#_table_maintenance">3.2. Table Maintenance</a></li>
<li><a href="#_user_administration">3.3. User Administration</a></li>
</ul>
</li>
<li><a href="#_writing_accumulo_clients">4. Writing Accumulo Clients</a></li>
<li>
<ul class="sectlevel2">
<li><a href="#_running_client_code">4.1. Running Client Code</a></li>
<li><a href="#_connecting">4.2. Connecting</a></li>
<li><a href="#_writing_data">4.3. Writing Data</a></li>
<li>
<ul class="sectlevel3">
<li><a href="#_batchwriter">4.3.1. BatchWriter</a></li>
<li><a href="#_conditionalwriter">4.3.2. ConditionalWriter</a></li>
</ul>
</li>
<li><a href="#_reading_data">4.4. Reading Data</a></li>
<li>
<ul class="sectlevel3">
<li><a href="#_scanner">4.4.1. Scanner</a></li>
<li><a href="#_isolated_scanner">4.4.2. Isolated Scanner</a></li>
<li><a href="#_batchscanner">4.4.3. BatchScanner</a></li>
</ul>
</li>
<li><a href="#_proxy">4.5. Proxy</a></li>
<li>
<ul class="sectlevel3">
<li><a href="#_prequisites">4.5.1. Prerequisites</a></li>
<li><a href="#_configuration">4.5.2. Configuration</a></li>
<li><a href="#_running_the_proxy_server">4.5.3. Running the Proxy Server</a></li>
<li><a href="#_creating_a_proxy_client">4.5.4. Creating a Proxy Client</a></li>
<li><a href="#_using_a_proxy_client">4.5.5. Using a Proxy Client</a></li>
</ul>
</li>
</ul>
</li>
<li><a href="#_development_clients">5. Development Clients</a></li>
<li>
<ul class="sectlevel2">
<li><a href="#_mock_accumulo">5.1. Mock Accumulo</a></li>
<li><a href="#_mini_accumulo_cluster">5.2. Mini Accumulo Cluster</a></li>
</ul>
</li>
<li><a href="#_table_configuration">6. Table Configuration</a></li>
<li>
<ul class="sectlevel2">
<li><a href="#_locality_groups">6.1. Locality Groups</a></li>
<li>
<ul class="sectlevel3">
<li><a href="#_managing_locality_groups_via_the_shell">6.1.1. Managing Locality Groups via the Shell</a></li>
<li><a href="#_managing_locality_groups_via_the_client_api">6.1.2. Managing Locality Groups via the Client API</a></li>
</ul>
</li>
<li><a href="#_constraints">6.2. Constraints</a></li>
<li><a href="#_bloom_filters">6.3. Bloom Filters</a></li>
<li><a href="#_iterators">6.4. Iterators</a></li>
<li>
<ul class="sectlevel3">
<li><a href="#_setting_iterators_via_the_shell">6.4.1. Setting Iterators via the Shell</a></li>
<li><a href="#_setting_iterators_programmatically">6.4.2. Setting Iterators Programmatically</a></li>
<li><a href="#_versioning_iterators_and_timestamps">6.4.3. Versioning Iterators and Timestamps</a></li>
<li>
<ul class="sectlevel4">
<li><a href="#_logical_time">Logical Time</a></li>
<li><a href="#_deletes">Deletes</a></li>
</ul>
</li>
<li><a href="#_filters">6.4.4. Filters</a></li>
<li><a href="#_combiners">6.4.5. Combiners</a></li>
</ul>
</li>
<li><a href="#_block_cache">6.5. Block Cache</a></li>
<li><a href="#_compaction">6.6. Compaction</a></li>
<li><a href="#_pre_splitting_tables">6.7. Pre-splitting tables</a></li>
<li><a href="#_merging_tablets">6.8. Merging tablets</a></li>
<li><a href="#_delete_range">6.9. Delete Range</a></li>
<li><a href="#_cloning_tables">6.10. Cloning Tables</a></li>
<li><a href="#_exporting_tables">6.11. Exporting Tables</a></li>
</ul>
</li>
<li><a href="#_table_design">7. Table Design</a></li>
<li>
<ul class="sectlevel2">
<li><a href="#_basic_table">7.1. Basic Table</a></li>
<li><a href="#_rowid_design">7.2. RowID Design</a></li>
<li><a href="#_lexicoders">7.3. Lexicoders</a></li>
<li><a href="#_indexing">7.4. Indexing</a></li>
<li><a href="#_entity_attribute_and_graph_tables">7.5. Entity-Attribute and Graph Tables</a></li>
<li><a href="#_document_partitioned_indexing">7.6. Document-Partitioned Indexing</a></li>
</ul>
</li>
<li><a href="#_high_speed_ingest">8. High-Speed Ingest</a></li>
<li>
<ul class="sectlevel2">
<li><a href="#_pre_splitting_new_tables">8.1. Pre-Splitting New Tables</a></li>
<li><a href="#_multiple_ingester_clients">8.2. Multiple Ingester Clients</a></li>
<li><a href="#_bulk_ingest">8.3. Bulk Ingest</a></li>
<li><a href="#_logical_time_for_bulk_ingest">8.4. Logical Time for Bulk Ingest</a></li>
<li><a href="#_mapreduce_ingest">8.5. MapReduce Ingest</a></li>
</ul>
</li>
<li><a href="#_analytics">9. Analytics</a></li>
<li>
<ul class="sectlevel2">
<li><a href="#_mapreduce">9.1. MapReduce</a></li>
<li>
<ul class="sectlevel3">
<li><a href="#_mapper_and_reducer_classes">9.1.1. Mapper and Reducer classes</a></li>
<li><a href="#_accumuloinputformat_options">9.1.2. AccumuloInputFormat options</a></li>
<li><a href="#_accumulomultitableinputformat_options">9.1.3. AccumuloMultiTableInputFormat options</a></li>
<li><a href="#_accumulooutputformat_options">9.1.4. AccumuloOutputFormat options</a></li>
</ul>
</li>
<li><a href="#_combiners_2">9.2. Combiners</a></li>
<li>
<ul class="sectlevel3">
<li><a href="#_feature_vectors">9.2.1. Feature Vectors</a></li>
</ul>
</li>
<li><a href="#_statistical_modeling">9.3. Statistical Modeling</a></li>
</ul>
</li>
<li><a href="#_security">10. Security</a></li>
<li>
<ul class="sectlevel2">
<li><a href="#_security_label_expressions">10.1. Security Label Expressions</a></li>
<li><a href="#_security_label_expression_syntax">10.2. Security Label Expression Syntax</a></li>
<li><a href="#_authorization">10.3. Authorization</a></li>
<li><a href="#_user_authorizations">10.4. User Authorizations</a></li>
<li><a href="#_pluggable_security">10.5. Pluggable Security</a></li>
<li><a href="#_secure_authorizations_handling">10.6. Secure Authorizations Handling</a></li>
<li><a href="#_query_services_layer">10.7. Query Services Layer</a></li>
</ul>
</li>
<li><a href="#_ssl">11. SSL</a></li>
<li>
<ul class="sectlevel2">
<li><a href="#_ssl_server_configuration">11.1. Server Configuration</a></li>
<li><a href="#_ssl_client_configuration">11.2. Client Configuration</a></li>
<li><a href="#_ssl_generate_ssl_material_openssl">11.3. Generating SSL material using OpenSSL</a></li>
<li>
<ul class="sectlevel3">
<li><a href="#_ssl_generate_ca">11.3.1. Generate a certificate authority</a></li>
<li><a href="#_ssl_generate_certs">11.3.2. Generate a certificate/keystore per host</a></li>
</ul>
</li>
</ul>
</li>
<li><a href="#_implementation_details">12. Implementation Details</a></li>
<li>
<ul class="sectlevel2">
<li><a href="#_implementation_fate">12.1. Fault-Tolerant Executor (FATE)</a></li>
<li>
<ul class="sectlevel3">
<li><a href="#_implementation_fate_overview">12.1.1. Overview</a></li>
<li><a href="#_implementation_fate_administration">12.1.2. Administration</a></li>
<li><a href="#_implementation_fate_list_print">12.1.3. List/Print</a></li>
<li><a href="#_implementation_fate_fail">12.1.4. Fail</a></li>
<li><a href="#_implementation_fate_delete">12.1.5. Delete</a></li>
</ul>
</li>
</ul>
</li>
<li><a href="#_administration">13. Administration</a></li>
<li>
<ul class="sectlevel2">
<li><a href="#_hardware">13.1. Hardware</a></li>
<li><a href="#_network">13.2. Network</a></li>
<li><a href="#_installation">13.3. Installation</a></li>
<li><a href="#_dependencies">13.4. Dependencies</a></li>
<li><a href="#_configuration_2">13.5. Configuration</a></li>
<li>
<ul class="sectlevel3">
<li><a href="#_edit_conf_accumulo_env_sh">13.5.1. Edit conf/accumulo-env.sh</a></li>
<li><a href="#_native_map">13.5.2. Native Map</a></li>
<li>
<ul class="sectlevel4">
<li><a href="#_native_map_building">13.5.2.1. Building</a></li>
</ul>
</li>
<li><a href="#_administration_configuration">13.5.3. Configuration</a></li>
<li><a href="#_cluster_specification">13.5.4. Cluster Specification</a></li>
<li><a href="#_accumulo_settings">13.5.5. Accumulo Settings</a></li>
<li><a href="#_deploy_configuration">13.5.6. Deploy Configuration</a></li>
<li><a href="#_sensitive_configuration_values">13.5.7. Sensitive Configuration Values</a></li>
<li><a href="#_using_a_javakeystorecredentialprovider_for_storage">13.5.8. Using a JavaKeyStoreCredentialProvider for storage</a></li>
</ul>
</li>
<li><a href="#_initialization">13.6. Initialization</a></li>
<li><a href="#_running">13.7. Running</a></li>
<li>
<ul class="sectlevel3">
<li><a href="#_starting_accumulo">13.7.1. Starting Accumulo</a></li>
<li><a href="#_stopping_accumulo">13.7.2. Stopping Accumulo</a></li>
<li><a href="#_adding_a_node">13.7.3. Adding a Node</a></li>
<li><a href="#_decomissioning_a_node">13.7.4. Decommissioning a Node</a></li>
<li><a href="#_restarting_process_on_a_node">13.7.5. Restarting process on a node</a></li>
<li><a href="#_running_multiple_tabletservers_on_a_single_node">13.7.6. Running multiple TabletServers on a single node</a></li>
</ul>
</li>
<li><a href="#_monitoring">13.8. Monitoring</a></li>
<li><a href="#_tracing">13.9. Tracing</a></li>
<li>
<ul class="sectlevel3">
<li><a href="#_tracers">13.9.1. Tracers</a></li>
<li><a href="#_instrumenting_a_client">13.9.2. Instrumenting a Client</a></li>
<li><a href="#_viewing_collected_traces">13.9.3. Viewing Collected Traces</a></li>
<li><a href="#_tracing_from_the_shell">13.9.4. Tracing from the Shell</a></li>
</ul>
</li>
<li><a href="#_logging">13.10. Logging</a></li>
<li><a href="#_recovery">13.11. Recovery</a></li>
<li><a href="#_migrating_from_non_ha_to_ha">13.12. Migrating Accumulo from non-HA Namenode to HA Namenode</a></li>
</ul>
</li>
<li><a href="#_multi_volume_installations">14. Multi-Volume Installations</a></li>
<li><a href="#_troubleshooting">15. Troubleshooting</a></li>
<li>
<ul class="sectlevel2">
<li><a href="#_logs">15.1. Logs</a></li>
<li><a href="#_monitor_2">15.2. Monitor</a></li>
<li><a href="#_hdfs">15.3. HDFS</a></li>
<li><a href="#_zookeeper">15.4. Zookeeper</a></li>
<li>
<ul class="sectlevel3">
<li><a href="#_keeping_the_tablet_server_lock">15.4.1. Keeping the tablet server lock</a></li>
</ul>
</li>
<li><a href="#_tools">15.5. Tools</a></li>
<li><a href="#metadata">15.6. System Metadata Tables</a></li>
<li><a href="#_simple_system_recovery">15.7. Simple System Recovery</a></li>
<li><a href="#_advanced_system_recovery">15.8. Advanced System Recovery</a></li>
<li>
<ul class="sectlevel3">
<li><a href="#_hdfs_failure">15.8.1. HDFS Failure</a></li>
<li><a href="#zookeeper_failure">15.8.2. ZooKeeper Failure</a></li>
</ul>
</li>
<li><a href="#_upgrade_issues">15.9. Upgrade Issues</a></li>
<li><a href="#_file_naming_conventions">15.10. File Naming Conventions</a></li>
</ul>
</li>
<li><a href="#configuration">16. Appendix A: Configuration Management</a></li>
<li>
<ul class="sectlevel2">
<li><a href="#_configuration_overview">16.1. Configuration Overview</a></li>
<li>
<ul class="sectlevel3">
<li><a href="#_zookeeper_table_properties">16.1.1. Zookeeper table properties</a></li>
<li><a href="#_zookeeper_system_properties">16.1.2. Zookeeper system properties</a></li>
<li><a href="#_accumulo_site_xml">16.1.3. accumulo-site.xml</a></li>
<li><a href="#_default_values">16.1.4. Default Values</a></li>
</ul>
</li>
<li><a href="#_configuration_in_the_shell">16.2. Configuration in the Shell</a></li>
<li><a href="#_available_properties">16.3. Available Properties</a></li>
<li>
<ul class="sectlevel3">
<li><a href="#RPC_PREFIX">16.3.1. rpc.*</a></li>
<li>
<ul class="sectlevel4">
<li><a href="#_rpc_javax_net_ssl_keystore">rpc.javax.net.ssl.keyStore</a></li>
<li><a href="#_rpc_javax_net_ssl_keystorepassword">rpc.javax.net.ssl.keyStorePassword</a></li>
<li><a href="#_rpc_javax_net_ssl_keystoretype">rpc.javax.net.ssl.keyStoreType</a></li>
<li><a href="#_rpc_javax_net_ssl_truststore">rpc.javax.net.ssl.trustStore</a></li>
<li><a href="#_rpc_javax_net_ssl_truststorepassword">rpc.javax.net.ssl.trustStorePassword</a></li>
<li><a href="#_rpc_javax_net_ssl_truststoretype">rpc.javax.net.ssl.trustStoreType</a></li>
<li><a href="#_rpc_usejsse">rpc.useJsse</a></li>
</ul>
</li>
<li><a href="#INSTANCE_PREFIX">16.3.2. instance.*</a></li>
<li>
<ul class="sectlevel4">
<li><a href="#_instance_dfs_dir">instance.dfs.dir</a></li>
<li><a href="#_instance_dfs_uri">instance.dfs.uri</a></li>
<li><a href="#_instance_rpc_ssl_clientauth">instance.rpc.ssl.clientAuth</a></li>
<li><a href="#_instance_rpc_ssl_enabled">instance.rpc.ssl.enabled</a></li>
<li><a href="#_instance_secret">instance.secret</a></li>
<li><a href="#_instance_security_authenticator">instance.security.authenticator</a></li>
<li><a href="#_instance_security_authorizor">instance.security.authorizor</a></li>
<li><a href="#_instance_security_permissionhandler">instance.security.permissionHandler</a></li>
<li><a href="#_instance_volumes">instance.volumes</a></li>
<li><a href="#_instance_volumes_replacements">instance.volumes.replacements</a></li>
<li><a href="#_instance_zookeeper_host">instance.zookeeper.host</a></li>
<li><a href="#_instance_zookeeper_timeout">instance.zookeeper.timeout</a></li>
</ul>
</li>
<li><a href="#GENERAL_PREFIX">16.3.3. general.*</a></li>
<li>
<ul class="sectlevel4">
<li><a href="#_general_classpaths">general.classpaths</a></li>
<li><a href="#_general_dynamic_classpaths">general.dynamic.classpaths</a></li>
<li><a href="#_general_kerberos_keytab">general.kerberos.keytab</a></li>
<li><a href="#_general_kerberos_principal">general.kerberos.principal</a></li>
<li><a href="#_general_rpc_timeout">general.rpc.timeout</a></li>
<li><a href="#_general_server_message_size_max">general.server.message.size.max</a></li>
<li><a href="#_general_server_simpletimer_threadpool_size">general.server.simpletimer.threadpool.size</a></li>
<li><a href="#_general_vfs_cache_dir">general.vfs.cache.dir</a></li>
<li><a href="#_general_vfs_classpaths">general.vfs.classpaths</a></li>
</ul>
</li>
<li><a href="#MASTER_PREFIX">16.3.4. master.*</a></li>
<li>
<ul class="sectlevel4">
<li><a href="#_master_bulk_retries">master.bulk.retries</a></li>
<li><a href="#_master_bulk_threadpool_size">master.bulk.threadpool.size</a></li>
<li><a href="#_master_bulk_timeout">master.bulk.timeout</a></li>
<li><a href="#_master_fate_threadpool_size">master.fate.threadpool.size</a></li>
<li><a href="#_master_lease_recovery_interval">master.lease.recovery.interval</a></li>
<li><a href="#_master_port_client">master.port.client</a></li>
<li><a href="#_master_recovery_delay">master.recovery.delay</a></li>
<li><a href="#_master_recovery_max_age">master.recovery.max.age</a></li>
<li><a href="#_master_recovery_time_max">master.recovery.time.max</a></li>
<li><a href="#_master_server_threadcheck_time">master.server.threadcheck.time</a></li>
<li><a href="#_master_server_threads_minimum">master.server.threads.minimum</a></li>
<li><a href="#_master_tablet_balancer">master.tablet.balancer</a></li>
<li><a href="#_master_walog_closer_implementation">master.walog.closer.implementation</a></li>
</ul>
</li>
<li><a href="#TSERV_PREFIX">16.3.5. tserver.*</a></li>
<li>
<ul class="sectlevel4">
<li><a href="#_tserver_archive_walogs">tserver.archive.walogs</a></li>
<li><a href="#_tserver_bloom_load_concurrent_max">tserver.bloom.load.concurrent.max</a></li>
<li><a href="#_tserver_bulk_assign_threads">tserver.bulk.assign.threads</a></li>
<li><a href="#_tserver_bulk_process_threads">tserver.bulk.process.threads</a></li>
<li><a href="#_tserver_bulk_retry_max">tserver.bulk.retry.max</a></li>
<li><a href="#_tserver_bulk_timeout">tserver.bulk.timeout</a></li>
<li><a href="#_tserver_cache_data_size">tserver.cache.data.size</a></li>
<li><a href="#_tserver_cache_index_size">tserver.cache.index.size</a></li>
<li><a href="#_tserver_client_timeout">tserver.client.timeout</a></li>
<li><a href="#_tserver_compaction_major_concurrent_max">tserver.compaction.major.concurrent.max</a></li>
<li><a href="#_tserver_compaction_major_delay">tserver.compaction.major.delay</a></li>
<li><a href="#_tserver_compaction_major_thread_files_open_max">tserver.compaction.major.thread.files.open.max</a></li>
<li><a href="#_tserver_compaction_minor_concurrent_max">tserver.compaction.minor.concurrent.max</a></li>
<li><a href="#_tserver_compaction_warn_time">tserver.compaction.warn.time</a></li>
<li><a href="#_tserver_default_blocksize">tserver.default.blocksize</a></li>
<li><a href="#_tserver_dir_memdump">tserver.dir.memdump</a></li>
<li><a href="#_tserver_files_open_idle">tserver.files.open.idle</a></li>
<li><a href="#_tserver_hold_time_max">tserver.hold.time.max</a></li>
<li><a href="#_tserver_memory_manager">tserver.memory.manager</a></li>
<li><a href="#_tserver_memory_maps_max">tserver.memory.maps.max</a></li>
<li><a href="#_tserver_memory_maps_native_enabled">tserver.memory.maps.native.enabled</a></li>
<li><a href="#_tserver_metadata_readahead_concurrent_max">tserver.metadata.readahead.concurrent.max</a></li>
<li><a href="#_tserver_migrations_concurrent_max">tserver.migrations.concurrent.max</a></li>
<li><a href="#_tserver_monitor_fs">tserver.monitor.fs</a></li>
<li><a href="#_tserver_mutation_queue_max">tserver.mutation.queue.max</a></li>
<li><a href="#_tserver_port_client">tserver.port.client</a></li>
<li><a href="#_tserver_port_search">tserver.port.search</a></li>
<li><a href="#_tserver_readahead_concurrent_max">tserver.readahead.concurrent.max</a></li>
<li><a href="#_tserver_recovery_concurrent_max">tserver.recovery.concurrent.max</a></li>
<li><a href="#_tserver_scan_files_open_max">tserver.scan.files.open.max</a></li>
<li><a href="#_tserver_server_message_size_max">tserver.server.message.size.max</a></li>
<li><a href="#_tserver_server_threadcheck_time">tserver.server.threadcheck.time</a></li>
<li><a href="#_tserver_server_threads_minimum">tserver.server.threads.minimum</a></li>
<li><a href="#_tserver_session_idle_max">tserver.session.idle.max</a></li>
<li><a href="#_tserver_sort_buffer_size">tserver.sort.buffer.size</a></li>
<li><a href="#_tserver_tablet_split_midpoint_files_max">tserver.tablet.split.midpoint.files.max</a></li>
<li><a href="#_tserver_wal_blocksize">tserver.wal.blocksize</a></li>
<li><a href="#_tserver_wal_replication">tserver.wal.replication</a></li>
<li><a href="#_tserver_wal_sync">tserver.wal.sync</a></li>
<li><a href="#_tserver_wal_sync_method">tserver.wal.sync.method</a></li>
<li><a href="#_tserver_walog_max_size">tserver.walog.max.size</a></li>
<li><a href="#_tserver_walog_max_age">tserver.walog.max.age</a></li>
<li><a href="#_tserver_workq_threads">tserver.workq.threads</a></li>
</ul>
</li>
<li><a href="#LOGGER_PREFIX">16.3.6. logger.*</a></li>
<li>
<ul class="sectlevel4">
<li><a href="#_logger_dir_walog">logger.dir.walog</a></li>
</ul>
</li>
<li><a href="#GC_PREFIX">16.3.7. gc.*</a></li>
<li>
<ul class="sectlevel4">
<li><a href="#_gc_cycle_delay">gc.cycle.delay</a></li>
<li><a href="#_gc_cycle_start">gc.cycle.start</a></li>
<li><a href="#_gc_port_client">gc.port.client</a></li>
<li><a href="#_gc_threads_delete">gc.threads.delete</a></li>
<li><a href="#_gc_trash_ignore">gc.trash.ignore</a></li>
<li><a href="#_gc_file_archive">gc.file.archive</a></li>
<li><a href="#_gc_wal_dead_server_wait">gc.wal.dead.server.wait</a></li>
</ul>
</li>
<li><a href="#MONITOR_PREFIX">16.3.8. monitor.*</a></li>
<li>
<ul class="sectlevel4">
<li><a href="#_monitor_banner_background">monitor.banner.background</a></li>
<li><a href="#_monitor_banner_color">monitor.banner.color</a></li>
<li><a href="#_monitor_banner_text">monitor.banner.text</a></li>
<li><a href="#_monitor_lock_check_interval">monitor.lock.check.interval</a></li>
<li><a href="#_monitor_port_client">monitor.port.client</a></li>
<li><a href="#_monitor_port_log4j">monitor.port.log4j</a></li>
</ul>
</li>
<li><a href="#TRACE_PREFIX">16.3.9. trace.*</a></li>
<li>
<ul class="sectlevel4">
<li><a href="#_trace_password">trace.password</a></li>
<li><a href="#_trace_port_client">trace.port.client</a></li>
<li><a href="#_trace_table">trace.table</a></li>
<li><a href="#_trace_token_type">trace.token.type</a></li>
<li><a href="#_trace_user">trace.user</a></li>
</ul>
</li>
<li><a href="#TRACE_TOKEN_PROPERTY_PREFIX">16.3.10. trace.token.property.*</a></li>
<li><a href="#TABLE_PREFIX">16.3.11. table.*</a></li>
<li>
<ul class="sectlevel4">
<li><a href="#_table_balancer">table.balancer</a></li>
<li><a href="#_table_bloom_enabled">table.bloom.enabled</a></li>
<li><a href="#_table_bloom_error_rate">table.bloom.error.rate</a></li>
<li><a href="#_table_bloom_hash_type">table.bloom.hash.type</a></li>
<li><a href="#_table_bloom_key_functor">table.bloom.key.functor</a></li>
<li><a href="#_table_bloom_load_threshold">table.bloom.load.threshold</a></li>
<li><a href="#_table_bloom_size">table.bloom.size</a></li>
<li><a href="#_table_cache_block_enable">table.cache.block.enable</a></li>
<li><a href="#_table_cache_index_enable">table.cache.index.enable</a></li>
<li><a href="#_table_classpath_context">table.classpath.context</a></li>
<li><a href="#_table_compaction_major_everything_idle">table.compaction.major.everything.idle</a></li>
<li><a href="#_table_compaction_major_ratio">table.compaction.major.ratio</a></li>
<li><a href="#_table_compaction_minor_idle">table.compaction.minor.idle</a></li>
<li><a href="#_table_compaction_minor_logs_threshold">table.compaction.minor.logs.threshold</a></li>
<li><a href="#_table_failures_ignore">table.failures.ignore</a></li>
<li><a href="#_table_file_blocksize">table.file.blocksize</a></li>
<li><a href="#_table_file_compress_blocksize">table.file.compress.blocksize</a></li>
<li><a href="#_table_file_compress_blocksize_index">table.file.compress.blocksize.index</a></li>
<li><a href="#_table_file_compress_type">table.file.compress.type</a></li>
<li><a href="#_table_file_max">table.file.max</a></li>
<li><a href="#_table_file_replication">table.file.replication</a></li>
<li><a href="#_table_file_type">table.file.type</a></li>
<li><a href="#_table_formatter">table.formatter</a></li>
<li><a href="#_table_groups_enabled">table.groups.enabled</a></li>
<li><a href="#_table_interepreter">table.interepreter</a></li>
<li><a href="#_table_majc_compaction_strategy">table.majc.compaction.strategy</a></li>
<li><a href="#_table_scan_max_memory">table.scan.max.memory</a></li>
<li><a href="#_table_security_scan_visibility_default">table.security.scan.visibility.default</a></li>
<li><a href="#_table_split_threshold">table.split.threshold</a></li>
<li><a href="#_table_walog_enabled">table.walog.enabled</a></li>
</ul>
</li>
<li><a href="#TABLE_CONSTRAINT_PREFIX">16.3.12. table.constraint.*</a></li>
<li><a href="#TABLE_ITERATOR_PREFIX">16.3.13. table.iterator.*</a></li>
<li><a href="#TABLE_LOCALITY_GROUP_PREFIX">16.3.14. table.group.*</a></li>
<li><a href="#TABLE_COMPACTION_STRATEGY_PREFIX">16.3.15. table.majc.compaction.strategy.opts.*</a></li>
<li><a href="#VFS_CONTEXT_CLASSPATH_PROPERTY">16.3.16. general.vfs.context.classpath.*</a></li>
</ul>
</li>
<li><a href="#_property_types">16.4. Property Types</a></li>
<li>
<ul class="sectlevel3">
<li><a href="#_duration">16.4.1. duration</a></li>
<li><a href="#_date_time">16.4.2. date/time</a></li>
<li><a href="#_memory">16.4.3. memory</a></li>
<li><a href="#_host_list">16.4.4. host list</a></li>
<li><a href="#_port">16.4.5. port</a></li>
<li><a href="#_count">16.4.6. count</a></li>
<li><a href="#_fraction_percentage">16.4.7. fraction/percentage</a></li>
<li><a href="#_path">16.4.8. path</a></li>
<li><a href="#_absolute_path">16.4.9. absolute path</a></li>
<li><a href="#_java_class">16.4.10. java class</a></li>
<li><a href="#_string">16.4.11. string</a></li>
<li><a href="#_boolean">16.4.12. boolean</a></li>
<li><a href="#_uri">16.4.13. uri</a></li>
</ul>
</li>
</ul>
</li>
</ul>
</div>
</div>
<div id="content">
<div id="preamble">
<div class="sectionbody">
<div class="imageblock">
<div class="content">
<img src="" alt="accumulo logo">
</div>
</div>
<div class="paragraph">
<p>Copyright © 2011-2016 The Apache Software Foundation, Licensed under the Apache
License, Version 2.0. Apache Accumulo, Accumulo, Apache, and the Apache
Accumulo project logo are trademarks of the Apache Software Foundation.</p>
</div>
</div>
</div>
<div class="sect1">
<h2 id="_introduction">1. Introduction</h2>
<div class="sectionbody">
<div class="paragraph">
<p>Apache Accumulo is a highly scalable structured store based on Google&#8217;s BigTable.
Accumulo is written in Java and operates over the Hadoop Distributed File System
(HDFS), which is part of the popular Apache Hadoop project. Accumulo supports
efficient storage and retrieval of structured data, including queries for ranges, and
provides support for using Accumulo tables as input and output for MapReduce
jobs.</p>
</div>
<div class="paragraph">
<p>Accumulo features automatic load-balancing and partitioning, data compression
and fine-grained security labels.</p>
</div>
</div>
</div>
<div class="sect1">
<h2 id="_accumulo_design">2. Accumulo Design</h2>
<div class="sectionbody">
<div class="sect2">
<h3 id="_data_model">2.1. Data Model</h3>
<div class="paragraph">
<p>Accumulo provides a richer data model than simple key-value stores, but is not a
fully relational database. Data is represented as key-value pairs, where the key and
value are composed of the following elements:</p>
</div>
<table class="tableblock frame-all grid-all" style="width:75%; ">
<colgroup>
<col style="width:16%;">
<col style="width:16%;">
<col style="width:16%;">
<col style="width:16%;">
<col style="width:16%;">
<col style="width:16%;">
</colgroup>
<tbody>
<tr>
<td class="tableblock halign-center valign-top" colspan="5"><p class="tableblock">Key</p></td>
<td class="tableblock halign-center valign-middle" rowspan="3"><p class="tableblock">Value</p></td>
</tr>
<tr>
<td class="tableblock halign-center valign-middle" rowspan="2"><p class="tableblock">Row ID</p></td>
<td class="tableblock halign-center valign-top" colspan="3"><p class="tableblock">Column</p></td>
<td class="tableblock halign-center valign-middle" rowspan="2"><p class="tableblock">Timestamp</p></td>
</tr>
<tr>
<td class="tableblock halign-center valign-top"><p class="tableblock">Family</p></td>
<td class="tableblock halign-center valign-top"><p class="tableblock">Qualifier</p></td>
<td class="tableblock halign-center valign-top"><p class="tableblock">Visibility</p></td>
</tr>
</tbody>
</table>
<div class="paragraph">
<p>All elements of the Key and the Value are represented as byte arrays except for
Timestamp, which is a Long. Accumulo sorts keys element by element (Row ID, then
Column Family, Column Qualifier, and Column Visibility), comparing each element
lexicographically in ascending order. Timestamps are sorted in descending order so that later
versions of the same Key appear first in a sequential scan. Tables consist of a set of
sorted key-value pairs.</p>
</div>
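<div class="paragraph">
<p>The ordering rule above can be sketched as a small comparator. This is an illustration, not Accumulo&#8217;s own <code>Key</code> class: <code>SimpleKey</code> and <code>ORDER</code> are hypothetical names, byte-array elements are compared as unsigned bytes, and the delete flag that real keys also carry is ignored. The sketch assumes Java 9 or later for <code>Arrays.compareUnsigned</code>.</p>
</div>

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical, simplified stand-in for an Accumulo key: the delete flag is omitted.
final class SimpleKey {
  final byte[] row;
  final byte[] family;
  final byte[] qualifier;
  final byte[] visibility;
  final long timestamp;

  SimpleKey(String row, String family, String qualifier, String visibility, long timestamp) {
    this.row = row.getBytes(StandardCharsets.UTF_8);
    this.family = family.getBytes(StandardCharsets.UTF_8);
    this.qualifier = qualifier.getBytes(StandardCharsets.UTF_8);
    this.visibility = visibility.getBytes(StandardCharsets.UTF_8);
    this.timestamp = timestamp;
  }

  // Byte-array elements compare lexicographically as unsigned bytes, ascending;
  // the timestamp comparison is reversed so newer versions sort first.
  static final Comparator<SimpleKey> ORDER = (a, b) -> {
    int cmp = Arrays.compareUnsigned(a.row, b.row);
    if (cmp == 0) cmp = Arrays.compareUnsigned(a.family, b.family);
    if (cmp == 0) cmp = Arrays.compareUnsigned(a.qualifier, b.qualifier);
    if (cmp == 0) cmp = Arrays.compareUnsigned(a.visibility, b.visibility);
    if (cmp == 0) cmp = Long.compare(b.timestamp, a.timestamp);
    return cmp;
  };

  @Override
  public String toString() {
    return new String(row, StandardCharsets.UTF_8) + " " + new String(family, StandardCharsets.UTF_8) + ":"
        + new String(qualifier, StandardCharsets.UTF_8) + " [" + new String(visibility, StandardCharsets.UTF_8)
        + "] " + timestamp;
  }

  public static void main(String[] args) {
    List<SimpleKey> keys = new ArrayList<>(List.of(
        new SimpleKey("row2", "attr", "name", "", 5L),
        new SimpleKey("row1", "attr", "size", "", 1L),
        new SimpleKey("row1", "attr", "name", "", 3L),
        new SimpleKey("row1", "attr", "name", "", 7L)));
    keys.sort(ORDER);
    keys.forEach(System.out::println);
  }
}
```

<div class="paragraph">
<p>Sorting the four sample keys places both versions of <code>row1 attr:name</code> first (timestamp 7 before 3), followed by <code>row1 attr:size</code> and then <code>row2</code>.</p>
</div>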
</div>
<div class="sect2">
<h3 id="_architecture">2.2. Architecture</h3>
<div class="paragraph">
<p>Accumulo is a distributed data storage and retrieval system and as such consists of
several architectural components, some of which run on many individual servers.
Much of the work Accumulo does involves maintaining certain properties of the
data, such as organization, availability, and integrity, across many commodity-class
machines.</p>
</div>
</div>
<div class="sect2">
<h3 id="_components">2.3. Components</h3>
<div class="paragraph">
<p>An instance of Accumulo includes many TabletServers, one active Garbage Collector process,
one active Master server and many Clients. Tracer and Monitor processes may also be run;
each component is described below.</p>
</div>
<div class="sect3">
<h4 id="_tablet_server">2.3.1. Tablet Server</h4>
<div class="paragraph">
<p>Each TabletServer manages a subset of all the tablets (partitions of tables). This includes
receiving writes from clients, persisting writes to a write-ahead log, sorting new key-value
pairs in memory, periodically flushing sorted key-value pairs to new files in HDFS, and
responding to reads from clients by forming a merge-sorted view of all keys and values
from all the files it has created and the sorted in-memory store.</p>
</div>
<div class="paragraph">
<p>TabletServers also recover tablets that were previously hosted by a server that failed,
reapplying any writes for those tablets found in the write-ahead log.</p>
</div>
</div>
<div class="sect3">
<h4 id="_garbage_collector">2.3.2. Garbage Collector</h4>
<div class="paragraph">
<p>Accumulo processes share files stored in HDFS. Periodically, the Garbage
Collector identifies files that are no longer needed by any process and
deletes them. Multiple garbage collectors can be run to provide hot-standby support;
they perform leader election among themselves to choose a single active instance.</p>
</div>
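<div class="paragraph">
<p>At its core, the Garbage Collector&#8217;s decision is a set difference: candidate files minus every file still referenced by some tablet. The sketch below shows only that step under simplifying assumptions: the class <code>GcSketch</code>, its inputs, and the file paths are hypothetical, and the real Garbage Collector gathers references from Accumulo&#8217;s metadata and takes additional precautions before deleting anything.</p>
</div>

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical illustration of the Garbage Collector's core decision.
final class GcSketch {
  // Returns the candidate files that no tablet references any more.
  static Set<String> unreferenced(Set<String> candidates, Map<String, List<String>> filesByTablet) {
    Set<String> inUse = new TreeSet<>();
    for (List<String> files : filesByTablet.values()) {
      inUse.addAll(files); // a file shared by several tablets is still in use
    }
    Set<String> deletable = new TreeSet<>(candidates);
    deletable.removeAll(inUse);
    return deletable;
  }

  public static void main(String[] args) {
    Set<String> candidates = Set.of("/t1/F001.rf", "/t1/F002.rf", "/t1/A003.rf");
    Map<String, List<String>> filesByTablet = Map.of(
        "tablet-a", List.of("/t1/A003.rf"),
        "tablet-b", List.of("/t1/A003.rf", "/t1/F002.rf"));
    System.out.println(unreferenced(candidates, filesByTablet));
  }
}
```

<div class="paragraph">
<p>Because <code>/t1/A003.rf</code> is shared by two tablets it survives, and only <code>/t1/F001.rf</code> is reported as deletable.</p>
</div>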
</div>
<div class="sect3">
<h4 id="_master">2.3.3. Master</h4>
<div class="paragraph">
<p>The Accumulo Master is responsible for detecting and responding to TabletServer
failure. It tries to balance the load across TabletServers by assigning tablets carefully
and instructing TabletServers to unload tablets when necessary. The Master ensures that each
tablet is assigned to exactly one TabletServer, and handles table creation, alteration,
and deletion requests from clients. The Master also coordinates startup, graceful
shutdown, and recovery of changes in write-ahead logs when TabletServers fail.</p>
</div>
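<div class="paragraph">
<p>Balancing can be pictured as repeatedly handing the next tablet to the least-loaded TabletServer. The following is a minimal sketch of that idea only: <code>BalancerSketch</code> is hypothetical and is not one of Accumulo&#8217;s pluggable balancers (selected through the <code>master.tablet.balancer</code> and <code>table.balancer</code> properties), which also consider the tablets each server already hosts and can request migrations.</p>
</div>

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;
import java.util.TreeMap;

// Hypothetical least-loaded assignment; not one of Accumulo's balancer implementations.
final class BalancerSketch {
  static Map<String, List<String>> assign(List<String> tablets, List<String> servers) {
    if (servers.isEmpty()) {
      throw new IllegalArgumentException("no TabletServers available");
    }
    Map<String, List<String>> assignment = new TreeMap<>();
    for (String server : servers) {
      assignment.put(server, new ArrayList<>());
    }
    // Order servers by the number of tablets assigned so far, then by name for determinism.
    PriorityQueue<String> byLoad = new PriorityQueue<>((a, b) -> {
      int cmp = Integer.compare(assignment.get(a).size(), assignment.get(b).size());
      return cmp != 0 ? cmp : a.compareTo(b);
    });
    byLoad.addAll(servers);
    for (String tablet : tablets) {
      String leastLoaded = byLoad.poll(); // remove before its load changes
      assignment.get(leastLoaded).add(tablet);
      byLoad.add(leastLoaded);            // re-insert with the updated load
    }
    return assignment;
  }

  public static void main(String[] args) {
    System.out.println(assign(List.of("t1", "t2", "t3", "t4", "t5"), List.of("ts1", "ts2")));
  }
}
```

<div class="paragraph">
<p>With five tablets and two servers, <code>ts1</code> receives t1, t3, and t5 while <code>ts2</code> receives t2 and t4; ties are broken by server name so the result is deterministic.</p>
</div>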
<div class="paragraph">
<p>Multiple masters may be run. The masters choose a single active master among themselves,
and the others act as backups in case the active master fails.</p>
</div>
</div>
<div class="sect3">
<h4 id="_tracer">2.3.4. Tracer</h4>
<div class="paragraph">
<p>The Accumulo Tracer process supports the distributed timing API provided by Accumulo.
One or more of these processes can be run on a cluster; they write the timing
information to a given Accumulo table for future reference. See the section on
Tracing for more information on this support.</p>
</div>
</div>
<div class="sect3">
<h4 id="_monitor">2.3.5. Monitor</h4>
<div class="paragraph">
<p>The Accumulo Monitor is a web application that provides a wealth of information about
the state of an instance. The Monitor shows graphs and tables that contain information
about read/write rates, cache hit/miss rates, and Accumulo table information such as scan
rate and active/queued compactions. Additionally, the Monitor should always be the first
point of entry when attempting to debug an Accumulo problem, as it shows high-level problems
in addition to aggregated errors from all nodes in the cluster. See the section on Monitoring
for more information.</p>
</div>
<div class="paragraph">
<p>Multiple Monitors can be run to provide hot-standby support in the face of failure. Due to the
forwarding of logs from remote hosts to the Monitor, only one Monitor process should be active
at a time. Leader election is performed internally to choose the active Monitor.</p>
</div>
</div>
<div class="sect3">
<h4 id="_client">2.3.6. Client</h4>
<div class="paragraph">
<p>Accumulo includes a client library that every application links against. The client
library contains logic for finding the servers that manage a particular tablet, and for
communicating with TabletServers to write and retrieve key-value pairs.</p>
</div>
</div>
</div>
<div class="sect2">
<h3 id="_data_management">2.4. Data Management</h3>
<div class="paragraph">
<p>Accumulo stores data in tables, which are partitioned into tablets. Tablets are
partitioned on row boundaries so that all of the columns and values for a particular
row are found together within the same tablet. The Master assigns each tablet to exactly one
TabletServer at a time. This enables row-level transactions to take place without
using distributed locking or some other complicated synchronization mechanism. As
clients insert and query data, and as machines are added to and removed from the
cluster, the Master migrates tablets to ensure they remain available and that the
ingest and query load is balanced across the cluster.</p>
</div>
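<div class="paragraph">
<p>Because tablets are partitioned on row boundaries, finding the tablet for a key requires only its Row ID. A minimal sketch, assuming a table described by a sorted set of split points, where each tablet covers the rows after the previous split point up to and including its own end row, and the last tablet has no end row. <code>TabletLocator</code> and <code>tabletEndRow</code> are hypothetical names, and rows are compared as Java strings rather than raw bytes.</p>
</div>

```java
import java.util.List;
import java.util.TreeSet;

// Hypothetical view of a table's tablets, built from its split points.
final class TabletLocator {
  private final TreeSet<String> splitPoints;

  TabletLocator(List<String> splitPoints) {
    this.splitPoints = new TreeSet<>(splitPoints);
  }

  // Returns the end row of the tablet containing this row, or null for the last tablet.
  // A tablet covers rows greater than the previous split point, up to and including its end row.
  String tabletEndRow(String row) {
    return splitPoints.ceiling(row);
  }

  public static void main(String[] args) {
    TabletLocator locator = new TabletLocator(List.of("g", "p"));
    for (String row : List.of("apple", "g", "h", "zebra")) {
      String end = locator.tabletEndRow(row);
      System.out.println(row + " -> " + (end == null ? "last tablet" : "tablet ending at " + end));
    }
  }
}
```

<div class="paragraph">
<p>With split points <code>g</code> and <code>p</code>, rows <code>apple</code> and <code>g</code> fall in the first tablet, <code>h</code> in the second, and <code>zebra</code> in the last. The lookup never looks at columns, so every column of a row lands in the same tablet.</p>
</div>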
<div class="imageblock">
<div class="content">
<img src="" alt="data distribution" width="500">
</div>
</div>
</div>
<div class="sect2">
<h3 id="_tablet_service">2.5. Tablet Service</h3>
<div class="paragraph">
<p>When a write arrives at a TabletServer, it is written to a Write-Ahead Log and
then inserted into a sorted data structure in memory called a MemTable. When the
MemTable reaches a certain size, the TabletServer writes out the sorted key-value
pairs to a file in HDFS called an Indexed Sequential Access Method (ISAM)
file. This process is called a minor compaction. A new MemTable is then created
and the compaction is recorded in the Write-Ahead Log.</p>
</div>
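<div class="paragraph">
<p>The write path can be modeled with a toy tablet. In this sketch the <code>ToyTablet</code> class is hypothetical: the write-ahead log is an in-memory list of strings, the MemTable is a <code>TreeMap</code>, each flushed file is an unmodifiable sorted map standing in for an ISAM file, and the flush threshold counts entries rather than bytes.</p>
</div>

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical toy model of a tablet's write path and minor compaction.
final class ToyTablet {
  final List<String> writeAheadLog = new ArrayList<>();
  final List<SortedMap<String, String>> files = new ArrayList<>(); // stand-ins for ISAM files
  private TreeMap<String, String> memTable = new TreeMap<>();
  private final int flushThreshold;

  ToyTablet(int flushThreshold) {
    this.flushThreshold = flushThreshold;
  }

  void write(String key, String value) {
    writeAheadLog.add("PUT " + key + "=" + value); // 1. log the write first for durability
    memTable.put(key, value);                      // 2. then insert into the sorted MemTable
    if (memTable.size() >= flushThreshold) {
      minorCompaction();
    }
  }

  // Writes the MemTable out as an immutable sorted file, starts a new MemTable,
  // and records the compaction in the log.
  private void minorCompaction() {
    files.add(Collections.unmodifiableSortedMap(memTable));
    memTable = new TreeMap<>();
    writeAheadLog.add("MINOR_COMPACTION file#" + (files.size() - 1));
  }

  int memTableSize() {
    return memTable.size();
  }

  public static void main(String[] args) {
    ToyTablet tablet = new ToyTablet(2);
    tablet.write("a", "1");
    tablet.write("b", "2");
    tablet.write("c", "3");
    System.out.println(tablet.writeAheadLog);
    System.out.println(tablet.files + " memTable entries: " + tablet.memTableSize());
  }
}
```

<div class="paragraph">
<p>With a threshold of 2, writing <code>a</code>, <code>b</code>, and <code>c</code> triggers one minor compaction after <code>b</code>, leaving one file, one entry in the new MemTable, and four log records.</p>
</div>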
<div class="paragraph">
<p>When a request to read data arrives at a TabletServer, the TabletServer does a
binary search across the MemTable as well as the in-memory indexes associated
with each ISAM file to find the relevant values. If a client is performing a
scan, the TabletServer returns key-value pairs to it in sorted order, merge-sorting
them from the MemTable and the set of ISAM files as they are read.</p>
</div>
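<div class="paragraph">
<p>The read path combines a binary search with a merge. A minimal sketch, assuming each source (the MemTable or an ISAM file) is modeled as a sorted list of string keys: <code>lowerBound</code> binary-searches each source for the start of the scan, and a priority queue then merges the sources into one sorted sequence. <code>MergedView</code> and <code>scan</code> are hypothetical names; real scans operate on full keys and values.</p>
</div>

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical merged view over already-sorted sources (a MemTable plus ISAM files).
final class MergedView {
  // The next unread key of one source, plus an iterator over the rest of that source.
  private static final class Head {
    final String key;
    final Iterator<String> rest;

    Head(String key, Iterator<String> rest) {
      this.key = key;
      this.rest = rest;
    }
  }

  // Binary search for the first position whose key is >= start.
  static int lowerBound(List<String> sorted, String start) {
    int lo = 0;
    int hi = sorted.size();
    while (lo < hi) {
      int mid = (lo + hi) >>> 1;
      if (sorted.get(mid).compareTo(start) < 0) {
        lo = mid + 1;
      } else {
        hi = mid;
      }
    }
    return lo;
  }

  // Returns every key >= start across all sources, in sorted order.
  static List<String> scan(List<List<String>> sortedSources, String start) {
    PriorityQueue<Head> heap = new PriorityQueue<>((a, b) -> a.key.compareTo(b.key));
    for (List<String> source : sortedSources) {
      Iterator<String> it = source.subList(lowerBound(source, start), source.size()).iterator();
      if (it.hasNext()) {
        heap.add(new Head(it.next(), it));
      }
    }
    List<String> out = new ArrayList<>();
    while (!heap.isEmpty()) {
      Head smallest = heap.poll(); // smallest unread key across all sources
      out.add(smallest.key);
      if (smallest.rest.hasNext()) {
        heap.add(new Head(smallest.rest.next(), smallest.rest));
      }
    }
    return out;
  }

  public static void main(String[] args) {
    List<List<String>> sources = List.of(List.of("b", "e"), List.of("a", "d"), List.of("c"));
    System.out.println(scan(sources, "c"));
  }
}
```

<div class="paragraph">
<p>Merging <code>[b, e]</code>, <code>[a, d]</code>, and <code>[c]</code> from start key <code>c</code> yields <code>[c, d, e]</code>.</p>
</div>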
</div>
<div class="sect2">
<h3 id="_compactions">2.6. Compactions</h3>
<div class="paragraph">
<p>To keep the number of files per tablet manageable, the TabletServer periodically
performs major compactions of files within a tablet, in which some set of ISAM
files is combined into one file. The previous files are eventually removed
by the Garbage Collector. This also provides an opportunity to permanently
remove deleted key-value pairs by omitting key-value pairs suppressed by a
delete entry when the new file is created.</p>
</div>
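<div class="paragraph">
<p>Dropping deleted data during a major compaction can be sketched as follows. The <code>CompactionSketch</code> class and its <code>Entry</code> type are hypothetical: entries from all files are sorted (key ascending, timestamp descending, a delete before a put with the same timestamp), and a delete entry suppresses versions of the same key that are as old or older. The sketch assumes every file of the tablet takes part; when only some files are compacted, delete entries must be kept, because they may still hide data in files that were not included.</p>
</div>

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical model of a full major compaction that drops deleted data.
final class CompactionSketch {
  static final class Entry {
    final String key;
    final long timestamp;
    final boolean delete;
    final String value;

    Entry(String key, long timestamp, boolean delete, String value) {
      this.key = key;
      this.timestamp = timestamp;
      this.delete = delete;
      this.value = value;
    }

    static Entry put(String key, long timestamp, String value) {
      return new Entry(key, timestamp, false, value);
    }

    static Entry del(String key, long timestamp) {
      return new Entry(key, timestamp, true, null);
    }
  }

  // Key ascending, timestamp descending, and a delete before a put with the same timestamp.
  static final Comparator<Entry> ORDER = (a, b) -> {
    int cmp = a.key.compareTo(b.key);
    if (cmp == 0) cmp = Long.compare(b.timestamp, a.timestamp);
    if (cmp == 0) cmp = Boolean.compare(b.delete, a.delete);
    return cmp;
  };

  // Merges all files of a tablet into one sorted list with deleted data removed.
  static List<Entry> majorCompact(List<List<Entry>> files) {
    List<Entry> all = new ArrayList<>();
    for (List<Entry> file : files) {
      all.addAll(file);
    }
    all.sort(ORDER);
    List<Entry> out = new ArrayList<>();
    String deletedKey = null;
    for (Entry e : all) {
      if (e.key.equals(deletedKey)) {
        continue; // as old as or older than a delete: suppressed
      }
      if (e.delete) {
        deletedKey = e.key; // remember the delete; the marker itself is dropped
        continue;
      }
      out.add(e);
    }
    return out;
  }

  public static void main(String[] args) {
    List<Entry> file1 = List.of(Entry.put("r1", 1, "old"), Entry.put("r2", 1, "kept"));
    List<Entry> file2 = List.of(Entry.del("r1", 2), Entry.put("r1", 3, "new"));
    for (Entry e : majorCompact(List.of(file1, file2))) {
      System.out.println(e.key + " @" + e.timestamp + " = " + e.value);
    }
  }
}
```

<div class="paragraph">
<p>Here the delete at timestamp 2 removes the value of <code>r1</code> written at timestamp 1, while the newer value written at timestamp 3 and the unrelated <code>r2</code> survive.</p>
</div>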
</div>
<div class="sect2">
<h3 id="_splitting">2.7. Splitting</h3>
<div class="paragraph">
<p>When a table is created it has one tablet. As the table grows, its initial
tablet eventually splits into two tablets. It&#8217;s likely that one of these
tablets will migrate to another tablet server. As the table continues to grow,
its tablets will continue to split and be migrated. The decision to
automatically split a tablet is based on the size of the tablet&#8217;s files. The
size threshold at which a tablet splits is configurable per table. In addition
to automatic splitting, a user can manually add split points to a table to
create new tablets. Manually splitting a new table can parallelize reads and
writes, giving better initial performance without waiting for automatic
splitting.</p>
</div>
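<div class="paragraph">
<p>The split decision can be illustrated with a hypothetical helper that picks a split point once a tablet exceeds its threshold. This is a simplification, not Accumulo&#8217;s algorithm: it works on per-row sizes, chooses the row at which half of the data has been seen, and never splits inside a row, since a row always belongs to a single tablet.</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="highlight"><code class="java language-java">import java.util.Map;
import java.util.SortedMap;

// Hypothetical helper, not Accumulo's split logic: choose a split point for a tablet,
// given the size in bytes of each of its rows in sorted order.
class SplitPointSketch {
  /** Returns the last row of the left half, or null if no split is needed or possible. */
  static String chooseSplit(SortedMap&lt;String, Long&gt; rowSizes, long splitThreshold) {
    long total = 0;
    for (long size : rowSizes.values()) {
      total += size;
    }
    if (total &lt;= splitThreshold || rowSizes.size() &lt; 2) {
      return null; // under the threshold, or a single row, which cannot be split
    }
    long running = 0;
    for (Map.Entry&lt;String, Long&gt; e : rowSizes.entrySet()) {
      running += e.getValue();
      if (running * 2 &gt;= total) {
        // first row at which half of the data has been seen
        return e.getKey().equals(rowSizes.lastKey()) ? null : e.getKey();
      }
    }
    return null;
  }
}</code></pre>
</div>
</div>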
<div class="paragraph">
<p>As data is deleted from a table, tablets may shrink. Over time this can lead
to small or empty tablets. To deal with this, merging of tablets was
introduced in Accumulo 1.4. This is discussed in more detail later.</p>
</div>
</div>
<div class="sect2">
<h3 id="_fault_tolerance">2.8. Fault-Tolerance</h3>
<div class="paragraph">
<p>If a TabletServer fails, the Master detects it and automatically reassigns the tablets
hosted by the failed server to other servers. Any key-value pairs that were in
memory at the time the TabletServer failed are automatically reapplied from the Write-Ahead
Log (WAL) to prevent any loss of data.</p>
</div>
<div class="paragraph">
<p>Tablet servers write their WALs directly to HDFS so the logs are available to all tablet
servers for recovery. To make the recovery process efficient, the logs are sorted so that the
updates within a log are grouped by tablet. TabletServers can then quickly apply the mutations
from the sorted logs that are destined for the tablets they have now been assigned.</p>
</div>
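<div class="paragraph">
<p>The effect of sorting the logs can be shown with a small, hypothetical model in plain Java: log entries are sorted by tablet and then by write order, so each tablet&#8217;s mutations form one contiguous run, and a server replays only the tablets it has been assigned into fresh MemTables. The <code>LogEntry</code> layout is an illustrative stand-in for real log records.</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="highlight"><code class="java language-java">import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Simplified model of log recovery; the LogEntry layout is hypothetical.
class LogRecoverySketch {
  static final class LogEntry {
    final String tablet;
    final long seq; // order in which the write was logged
    final String key;
    final String value;

    LogEntry(String tablet, long seq, String key, String value) {
      this.tablet = tablet;
      this.seq = seq;
      this.key = key;
      this.value = value;
    }
  }

  /** Sorts a log by (tablet, seq) and replays only the tablets assigned to this server. */
  static Map&lt;String, TreeMap&lt;String, String&gt;&gt; recover(List&lt;LogEntry&gt; log, Set&lt;String&gt; assigned) {
    List&lt;LogEntry&gt; sorted = new ArrayList&lt;&gt;(log);
    // After sorting, each tablet's mutations form one contiguous run in write order.
    sorted.sort(Comparator.comparing((LogEntry e) -&gt; e.tablet).thenComparingLong(e -&gt; e.seq));
    Map&lt;String, TreeMap&lt;String, String&gt;&gt; memTables = new TreeMap&lt;&gt;();
    for (LogEntry e : sorted) {
      if (assigned.contains(e.tablet)) {
        // reapply the mutation to the recovering tablet's new MemTable
        memTables.computeIfAbsent(e.tablet, t -&gt; new TreeMap&lt;&gt;()).put(e.key, e.value);
      }
    }
    return memTables;
  }
}</code></pre>
</div>
</div>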
<div class="paragraph">
<p>TabletServer failures are noted on the Master&#8217;s monitor page, accessible via
<code><a href="http://master-address:50095/monitor">http://master-address:50095/monitor</a></code>.</p>
</div>
<div class="imageblock">
<div class="content">
<img src="" alt="failure handling" width="500">
</div>
</div>
</div>
</div>
</div>
<div class="sect1">
<h2 id="_accumulo_shell">3. Accumulo Shell</h2>
<div class="sectionbody">
<div class="paragraph">
<p>Accumulo provides a simple shell that can be used to examine the contents and
configuration settings of tables, insert/update/delete values, and change
configuration settings.</p>
</div>
<div class="paragraph">
<p>The shell can be started by the following command:</p>
</div>
<div class="literalblock">
<div class="content">
<pre>$ACCUMULO_HOME/bin/accumulo shell -u [username]</pre>
</div>
</div>
<div class="paragraph">
<p>The shell will prompt for the password of the specified user
and then display the following prompt:</p>
</div>
<div class="literalblock">
<div class="content">
<pre>Shell - Apache Accumulo Interactive Shell
-
- version 1.6
- instance name: myinstance
- instance id: 00000000-0000-0000-0000-000000000000
-
- type 'help' for a list of available commands
-</pre>
</div>
</div>
<div class="sect2">
<h3 id="_basic_administration">3.1. Basic Administration</h3>
<div class="paragraph">
<p>The Accumulo shell can be used to create and delete tables, as well as to configure
table- and instance-specific options.</p>
</div>
<div class="listingblock">
<div class="content">
<pre>root@myinstance&gt; tables
accumulo.metadata
accumulo.root
root@myinstance&gt; createtable mytable
root@myinstance mytable&gt;
root@myinstance mytable&gt; tables
accumulo.metadata
accumulo.root
mytable
root@myinstance mytable&gt; createtable testtable
root@myinstance testtable&gt;
root@myinstance testtable&gt; deletetable testtable
deletetable { testtable } (yes|no)? yes
Table: [testtable] has been deleted.
root@myinstance&gt;</pre>
</div>
</div>
<div class="paragraph">
<p>The Shell can also be used to insert updates and scan tables. This is useful for
inspecting tables.</p>
</div>
<div class="listingblock">
<div class="content">
<pre>root@myinstance mytable&gt; scan
root@myinstance mytable&gt; insert row1 colf colq value1
insert successful
root@myinstance mytable&gt; scan
row1 colf:colq [] value1</pre>
</div>
</div>
<div class="paragraph">
<p>The value in brackets &#8220;[]&#8221; holds the visibility labels. Since none were used, it is empty for this row.
Use the <code>-st</code> option to scan to also see the timestamp of each cell.</p>
</div>
</div>
<div class="sect2">
<h3 id="_table_maintenance">3.2. Table Maintenance</h3>
<div class="paragraph">
<p>The <strong>compact</strong> command instructs Accumulo to schedule a compaction of the table during which
files are consolidated and deleted entries are removed.</p>
</div>
<div class="literalblock">
<div class="content">
<pre>root@myinstance mytable&gt; compact -t mytable
07 16:13:53,201 [shell.Shell] INFO : Compaction of table mytable started for given range</pre>
</div>
</div>
<div class="paragraph">
<p>The <strong>flush</strong> command instructs Accumulo to write all entries currently in memory for a given table
to disk.</p>
</div>
<div class="literalblock">
<div class="content">
<pre>root@myinstance mytable&gt; flush -t mytable
07 16:14:19,351 [shell.Shell] INFO : Flush of table mytable initiated...</pre>
</div>
</div>
</div>
<div class="sect2">
<h3 id="_user_administration">3.3. User Administration</h3>
<div class="paragraph">
<p>The Shell can be used to add and remove users and to grant and revoke their privileges.</p>
</div>
<div class="listingblock">
<div class="content">
<pre>root@myinstance mytable&gt; createuser bob
Enter new password for 'bob': *********
Please confirm new password for 'bob': *********
root@myinstance mytable&gt; authenticate bob
Enter current password for 'bob': *********
Valid
root@myinstance mytable&gt; grant System.CREATE_TABLE -s -u bob
root@myinstance mytable&gt; user bob
Enter current password for 'bob': *********
bob@myinstance mytable&gt; userpermissions
System permissions: System.CREATE_TABLE
Table permissions (accumulo.metadata): Table.READ
Table permissions (mytable): NONE
bob@myinstance mytable&gt; createtable bobstable
bob@myinstance bobstable&gt;
bob@myinstance bobstable&gt; user root
Enter current password for 'root': *********
root@myinstance bobstable&gt; revoke System.CREATE_TABLE -s -u bob</pre>
</div>
</div>
</div>
</div>
</div>
<div class="sect1">
<h2 id="_writing_accumulo_clients">4. Writing Accumulo Clients</h2>
<div class="sectionbody">
<div class="sect2">
<h3 id="_running_client_code">4.1. Running Client Code</h3>
<div class="paragraph">
<p>There are multiple ways to run Java code that uses Accumulo. Below is a list
of the different ways to execute client code.</p>
</div>
<div class="ulist">
<ul>
<li>
<p>using the java executable</p>
</li>
<li>
<p>using the accumulo script</p>
</li>
<li>
<p>using the tool script</p>
</li>
</ul>
</div>
<div class="paragraph">
<p>To run client code written against Accumulo, you will need to
include the jars that Accumulo depends on in your classpath. Accumulo client
code depends on Hadoop and Zookeeper. For Hadoop, add the hadoop client jar, all
of the jars in the Hadoop lib directory, and the conf directory to the
classpath. For Zookeeper 3.3, you only need to add the Zookeeper jar, and not
what is in the Zookeeper lib directory. You can run the following command on a
configured Accumulo system to see what it is using for its classpath:</p>
</div>
<div class="literalblock">
<div class="content">
<pre>$ACCUMULO_HOME/bin/accumulo classpath</pre>
</div>
</div>
<div class="paragraph">
<p>Another option for running your code is to put a jar file in
<code>$ACCUMULO_HOME/lib/ext</code>. After doing this you can use the accumulo
script to execute your code. For example, if you create a jar containing the
class <code>com.foo.Client</code> and place it in <code>lib/ext</code>, then you could use the command
<code>$ACCUMULO_HOME/bin/accumulo com.foo.Client</code> to execute your code.</p>
</div>
<div class="paragraph">
<p>If you are writing map reduce jobs that access Accumulo, then you can use the
bin/tool.sh script to run those jobs. See the map reduce example.</p>
</div>
</div>
<div class="sect2">
<h3 id="_connecting">4.2. Connecting</h3>
<div class="paragraph">
<p>All clients must first identify the Accumulo instance with which they will
communicate. Code to do this is as follows:</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="highlight"><code class="java language-java">String instanceName = "myinstance";
String zooServers = "zooserver-one,zooserver-two"; // comma-separated list of ZooKeeper hosts
Instance inst = new ZooKeeperInstance(instanceName, zooServers);
Connector conn = inst.getConnector("user", new PasswordToken("passwd"));</code></pre>
</div>
</div>
<div class="paragraph">
<p>The PasswordToken is the most common implementation of an <code>AuthenticationToken</code>.
This general interface allows authentication as an Accumulo user to come from a variety of sources
or means. The CredentialProviderToken leverages the Hadoop CredentialProviders (new in Hadoop 2.6).</p>
</div>
<div class="paragraph">
<p>For example, the CredentialProviderToken can be used in conjunction with a Java KeyStore
to avoid storing passwords in cleartext. When stored in HDFS, a single KeyStore can be
used across an entire instance. Be aware that KeyStores stored on the local filesystem
must be made available to all nodes in the Accumulo cluster.</p>
</div>
</div>
<div class="sect2">
<h3 id="_writing_data">4.3. Writing Data</h3>
<div class="paragraph">
<p>Data are written to Accumulo by creating Mutation objects that represent all the
changes to the columns of a single row. The changes are made atomically in the
TabletServer. Clients then add Mutations to a BatchWriter which submits them to
the appropriate TabletServers.</p>
</div>
<div class="paragraph">
<p>Mutations can be created thus:</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="highlight"><code class="java language-java">Text rowID = new Text("row1");
Text colFam = new Text("myColFam");
Text colQual = new Text("myColQual");
ColumnVisibility colVis = new ColumnVisibility("public");
long timestamp = System.currentTimeMillis();
Value value = new Value("myValue".getBytes());
Mutation mutation = new Mutation(rowID);
mutation.put(colFam, colQual, colVis, timestamp, value);</code></pre>
</div>
</div>
<div class="sect3">
<h4 id="_batchwriter">4.3.1. BatchWriter</h4>
<div class="paragraph">
<p>The BatchWriter is highly optimized to send Mutations to multiple TabletServers
and automatically batches Mutations destined for the same TabletServer to
amortize network overhead. Care must be taken to avoid changing the contents of
any Object passed to the BatchWriter since it keeps objects in memory while
batching.</p>
</div>
<div class="paragraph">
<p>Mutations are added to a BatchWriter thus:</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="highlight"><code class="java language-java">// BatchWriterConfig has reasonable defaults
BatchWriterConfig config = new BatchWriterConfig();
config.setMaxMemory(10000000L); // bytes available to the BatchWriter for buffering mutations
BatchWriter writer = conn.createBatchWriter("table", config);
try {
  writer.addMutation(mutation);
} finally {
  writer.close(); // flushes any buffered mutations and releases resources
}</code></pre>
</div>
</div>
<div class="paragraph">
<p>An example of using the batch writer can be found at
<code>accumulo/docs/examples/README.batch</code>.</p>
</div>
</div>
<div class="sect3">
<h4 id="_conditionalwriter">4.3.2. ConditionalWriter</h4>
<div class="paragraph">
<p>The ConditionalWriter enables efficient, atomic read-modify-write operations on
rows. The ConditionalWriter writes special Mutations which have a list of
per-column conditions that must all be met before the mutation is applied. The
conditions are checked in the tablet server while a row lock is
held (Mutations written by the BatchWriter do not obtain a row
lock). The conditions that can be checked for a column are equality and
absence. For example, a conditional mutation can require that column A be
absent in order to be applied. Iterators can be applied when checking
conditions. Using iterators, many other operations besides equality and
absence can be checked. For example, using an iterator that converts values
less than 5 to 0 and everything else to 1, it is possible to only apply a
mutation when a column is less than 5.</p>
</div>
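<div class="paragraph">
<p>The semantics of a conditional mutation can be modeled in plain Java as a compare-and-set performed while holding a per-row lock. The class below is a conceptual sketch, not the ConditionalWriter API (the real client classes are <code>ConditionalWriter</code> and <code>ConditionalMutation</code>). It supports only the two built-in checks described above, equality and absence, and uses <code>null</code> as its assumed encoding of &#8220;must be absent&#8221;.</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="highlight"><code class="java language-java">import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;

// Conceptual model of a conditional update; NOT the ConditionalWriter API.
class ConditionalRowSketch {
  private final Map&lt;String, Map&lt;String, String&gt;&gt; rows = new ConcurrentHashMap&lt;&gt;();
  private final Map&lt;String, ReentrantLock&gt; rowLocks = new ConcurrentHashMap&lt;&gt;();

  /**
   * Sets column to newValue only if its current value equals expected;
   * expected == null means the column must be absent.
   * Returns true if the update was accepted, false if it was rejected.
   */
  boolean putIf(String row, String column, String expected, String newValue) {
    ReentrantLock lock = rowLocks.computeIfAbsent(row, r -&gt; new ReentrantLock());
    lock.lock(); // conditions are checked and applied while the row lock is held
    try {
      Map&lt;String, String&gt; cols = rows.computeIfAbsent(row, r -&gt; new ConcurrentHashMap&lt;&gt;());
      String current = cols.get(column);
      boolean ok = (expected == null) ? current == null : expected.equals(current);
      if (ok) {
        cols.put(column, newValue); // all conditions met: apply the mutation
      }
      return ok;
    } finally {
      lock.unlock();
    }
  }

  String get(String row, String column) {
    Map&lt;String, String&gt; cols = rows.get(row);
    return cols == null ? null : cols.get(column);
  }
}</code></pre>
</div>
</div>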
<div class="paragraph">
<p>If a tablet server dies after a client sent a conditional
mutation, it is not known whether the mutation was applied. When this happens,
the ConditionalWriter reports a status of UNKNOWN for the ConditionalMutation.
In many cases this situation can be dealt with by simply reading the row again
and possibly sending another conditional mutation. If this is not sufficient,
then a higher level of abstraction can be built by storing transactional
information within a row.</p>
</div>
<div class="paragraph">
<p>An example of using the ConditionalWriter can be found at
<code>accumulo/docs/examples/README.reservations</code>.</p>
</div>
</div>
</div>
<div class="sect2">
<h3 id="_reading_data">4.4. Reading Data</h3>
<div class="paragraph">
<p>Accumulo is optimized to quickly retrieve the value associated with a given key, and
to efficiently return ranges of consecutive keys and their associated values.</p>
</div>
<div class="sect3">
<h4 id="_scanner">4.4.1. Scanner</h4>
<div class="paragraph">
<p>To retrieve data, clients use a Scanner, which acts like an Iterator over
keys and values. Scanners can be configured to start and stop at particular keys, and
to return a subset of the columns available.</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="highlight"><code class="java language-java">// specify which visibilities we are allowed to see
Authorizations auths = new Authorizations("public");
Scanner scan =
    conn.createScanner("table", auths);
scan.setRange(new Range("harry","john"));
scan.fetchColumnFamily(new Text("attributes"));
for(Entry&lt;Key,Value&gt; entry : scan) {
    Text row = entry.getKey().getRow();
    Value value = entry.getValue();
}</code></pre>
</div>
</div>
</div>
<div class="sect3">
<h4 id="_isolated_scanner">4.4.2. Isolated Scanner</h4>
<div class="paragraph">
<p>Accumulo supports the ability to present an isolated view of rows when
scanning. There are three possible ways that a row could change in Accumulo:</p>
</div>
<div class="ulist">
<ul>
<li>
<p>a mutation applied to a table</p>
</li>
<li>
<p>iterators executed as part of a minor or major compaction</p>
</li>
<li>
<p>bulk import of new files</p>
</li>
</ul>
</div>
<div class="paragraph">
<p>Isolation guarantees that either all or none of the changes made by these
operations on a row are seen. Use the IsolatedScanner to obtain an isolated
view of an Accumulo table. When using the regular scanner, it is possible to see
a non-isolated view of a row. For example, if a mutation modifies three
columns, it is possible that you will only see two of those modifications.
With the isolated scanner either all three of the changes are seen or none.</p>
</div>
<div class="paragraph">
<p>The IsolatedScanner buffers rows on the client side so a large row will not
crash a tablet server. By default rows are buffered in memory, but the user
can easily supply their own buffer if they wish to buffer to disk when rows are
large.</p>
</div>
<div class="paragraph">
<p>For an example, see the following file:</p>
</div>
<div class="literalblock">
<div class="content">
<pre>examples/simple/src/main/java/org/apache/accumulo/examples/simple/isolation/InterferenceTest.java</pre>
</div>
</div>
</div>
<div class="sect3">
<h4 id="_batchscanner">4.4.3. BatchScanner</h4>
<div class="paragraph">
<p>For some types of access, it is more efficient to retrieve several ranges
simultaneously. This arises, for example, when accessing a set of non-consecutive
rows whose IDs have been retrieved from a secondary index.</p>
</div>
<div class="paragraph">
<p>The BatchScanner is configured similarly to the Scanner; it can be configured to
retrieve a subset of the columns available, but rather than passing a single Range,
BatchScanners accept a set of Ranges. It is important to note that the keys returned
by a BatchScanner are not in sorted order, because they are streamed from multiple
TabletServers in parallel.</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="highlight"><code class="java language-java">ArrayList&lt;Range&gt; ranges = new ArrayList&lt;Range&gt;();
// populate list of ranges ...
BatchScanner bscan =
conn.createBatchScanner("table", auths, 10);
bscan.setRanges(ranges);
bscan.fetchColumnFamily(new Text("attributes"));
for(Entry&lt;Key,Value&gt; entry : bscan) {
System.out.println(entry.getValue());
}</code></pre>
</div>
</div>
<div class="paragraph">
<p>An example of the BatchScanner can be found at
<code>accumulo/docs/examples/README.batch</code>.</p>
</div>
</div>
</div>
<div class="sect2">
<h3 id="_proxy">4.5. Proxy</h3>
<div class="paragraph">
<p>The proxy API allows interaction with Accumulo from languages other than Java.
A proxy server is provided in the codebase, and clients can be generated from its Thrift definition.</p>
</div>
<div class="sect3">
<h4 id="_prequisites">4.5.1. Prerequisites</h4>
<div class="paragraph">
<p>The proxy server can live on any node on which the basic client API would work. That
means it must be able to communicate with the Master, ZooKeepers, NameNode, and the
DataNodes. A proxy client only needs the ability to communicate with the proxy server.</p>
</div>
</div>
<div class="sect3">
<h4 id="_configuration">4.5.2. Configuration</h4>
<div class="paragraph">
<p>The configuration options for the proxy server live inside of a properties file. At
the very least, you need to supply the following properties:</p>
</div>
<div class="literalblock">
<div class="content">
<pre>protocolFactory=org.apache.thrift.protocol.TCompactProtocol$Factory
tokenClass=org.apache.accumulo.core.client.security.tokens.PasswordToken
port=42424
instance=test
zookeepers=localhost:2181</pre>
</div>
</div>
<div class="paragraph">
<p>You can find a sample configuration file in your distribution:</p>
</div>
<div class="literalblock">
<div class="content">
<pre>$ACCUMULO_HOME/proxy/proxy.properties</pre>
</div>
</div>
<div class="paragraph">
<p>This sample configuration file also shows how to back the proxy server
with MockAccumulo or the MiniAccumuloCluster.</p>
</div>
</div>
<div class="sect3">
<h4 id="_running_the_proxy_server">4.5.3. Running the Proxy Server</h4>
<div class="paragraph">
<p>After the properties file holding the configuration is created, the proxy server
can be started using the following command in the Accumulo distribution (assuming
your properties file is named <code>config.properties</code>):</p>
</div>
<div class="literalblock">
<div class="content">
<pre>$ACCUMULO_HOME/bin/accumulo proxy -p config.properties</pre>
</div>
</div>
</div>
<div class="sect3">
<h4 id="_creating_a_proxy_client">4.5.4. Creating a Proxy Client</h4>
<div class="paragraph">
<p>Aside from installing the Thrift compiler, you will also need the language-specific library
for Thrift installed to generate client code in that language. Typically, your operating
system&#8217;s package manager will be able to automatically install these for you in an expected
location such as <code>/usr/lib/python/site-packages/thrift</code>.</p>
</div>
<div class="paragraph">
<p>You can find the thrift file for generating the client:</p>
</div>
<div class="literalblock">
<div class="content">
<pre>$ACCUMULO_HOME/proxy/proxy.thrift</pre>
</div>
</div>
<div class="paragraph">
<p>After a client is generated, the port specified in the configuration properties above will be
used to connect to the server.</p>
</div>
</div>
<div class="sect3">
<h4 id="_using_a_proxy_client">4.5.5. Using a Proxy Client</h4>
<div class="paragraph">
<p>The following examples have been written in Java and the method signatures may be
slightly different depending on the language specified when generating the client with
the Thrift compiler. After initiating a connection to the Proxy (see Apache Thrift&#8217;s
documentation for examples of connecting to a Thrift service), the methods on the
proxy client will be available. The first thing to do is log in:</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="highlight"><code class="java language-java">Map password = new HashMap&lt;String,String&gt;();
password.put("password", "secret");
ByteBuffer token = client.login("root", password);</code></pre>
</div>
</div>
<div class="paragraph">
<p>Once logged in, the token returned will be used for most subsequent calls to the client.
Let&#8217;s create a table, add some data, scan the table, and delete it.</p>
</div>
<div class="paragraph">
<p>First, create a table.</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="highlight"><code class="java language-java">client.createTable(token, "myTable", true, TimeType.MILLIS);</code></pre>
</div>
</div>
<div class="paragraph">
<p>Next, add some data:</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="highlight"><code class="java language-java">// first, create a writer on the server
String writer = client.createWriter(token, "myTable", new WriterOptions());
// build column updates, keyed by row ID
Map&lt;ByteBuffer, List&lt;ColumnUpdate&gt;&gt; cellsToUpdate = //...
// send updates to the server through the writer, then flush them
client.update(writer, cellsToUpdate);
client.flush(writer);
client.closeWriter(writer);</code></pre>
</div>
</div>
<div class="paragraph">
<p>Scan for the data and batch the return of the results on the server:</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="highlight"><code class="java language-java">String scanner = client.createScanner(token, "myTable", new ScanOptions());
ScanResult results;
do {
  // fetch the next batch of up to 100 entries from the server
  results = client.nextK(scanner, 100);
  for(KeyValue keyValue : results.getResults()) {
    // do something with results
  }
} while (results.isMore());
client.closeScanner(scanner);</code></pre>
</div>
</div>
</div>
</div>
</div>
</div>
<div class="sect1">
<h2 id="_development_clients">5. Development Clients</h2>
<div class="sectionbody">
<div class="paragraph">
<p>Normally, Accumulo consists of lots of moving parts. Even a stand-alone version of
Accumulo requires Hadoop, Zookeeper, the Accumulo master, a tablet server, etc. If
you want to write a unit test that uses Accumulo, you need a lot of infrastructure
in place before your test can run.</p>
</div>
<div class="sect2">
<h3 id="_mock_accumulo">5.1. Mock Accumulo</h3>
<div class="paragraph">
<p>Mock Accumulo supplies mock implementations for much of the client API. It presently
does not enforce users, logins, permissions, etc. It does support Iterators and Combiners.
Note that MockAccumulo holds all data in memory, and will not retain any data or
settings between runs.</p>
</div>
<div class="paragraph">
<p>While normal interaction with the Accumulo client looks like this:</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="highlight"><code class="java language-java">Instance instance = new ZooKeeperInstance(...);
Connector conn = instance.getConnector(user, passwordToken);</code></pre>
</div>
</div>
<div class="paragraph">
<p>To interact with the MockAccumulo, just replace the ZooKeeperInstance with MockInstance:</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="highlight"><code class="java language-java">Instance instance = new MockInstance();</code></pre>
</div>
</div>
<div class="paragraph">
<p>In fact, you can use the <code>--fake</code> option to the Accumulo shell and interact with
MockAccumulo:</p>
</div>
<div class="listingblock">
<div class="content">
<pre>$ ./bin/accumulo shell --fake -u root -p ''
Shell - Apache Accumulo Interactive Shell
-
- version: 1.6
- instance name: fake
- instance id: mock-instance-id
-
- type 'help' for a list of available commands
-
root@fake&gt; createtable test
root@fake test&gt; insert row1 cf cq value
root@fake test&gt; insert row2 cf cq value2
root@fake test&gt; insert row3 cf cq value3
root@fake test&gt; scan
row1 cf:cq [] value
row2 cf:cq [] value2
row3 cf:cq [] value3
root@fake test&gt; scan -b row2 -e row2
row2 cf:cq [] value2
root@fake test&gt;</pre>
</div>
</div>
<div class="paragraph">
<p>When testing Map Reduce jobs, you can also set the Mock Accumulo on the AccumuloInputFormat
and AccumuloOutputFormat classes:</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="highlight"><code class="java language-java">AccumuloInputFormat.setMockInstance(job, "mockInstance");
AccumuloOutputFormat.setMockInstance(job, "mockInstance");</code></pre>
</div>
</div>
</div>
<div class="sect2">
<h3 id="_mini_accumulo_cluster">5.2. Mini Accumulo Cluster</h3>
<div class="paragraph">
<p>While the Mock Accumulo provides a lightweight implementation of the client API for unit
testing, it is often necessary to write more realistic end-to-end integration tests that
take advantage of the entire ecosystem. The Mini Accumulo Cluster makes this possible by
configuring and starting Zookeeper, initializing Accumulo, and starting the Master as well
as some Tablet Servers. It runs against the local filesystem instead of having to start
up HDFS.</p>
</div>
<div class="paragraph">
<p>To start it up, you will need to supply an empty directory and a root password as arguments:</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="highlight"><code class="java language-java">File tempDirectory = // JUnit and Guava supply mechanisms for creating temp directories
MiniAccumuloCluster accumulo = new MiniAccumuloCluster(tempDirectory, "password");
accumulo.start();</code></pre>
</div>
</div>
<div class="paragraph">
<p>Once we have our mini cluster running, we will want to interact with the Accumulo client API:</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="highlight"><code class="java language-java">Instance instance = new ZooKeeperInstance(accumulo.getInstanceName(), accumulo.getZooKeepers());
Connector conn = instance.getConnector("root", new PasswordToken("password"));</code></pre>
</div>
</div>
<div class="paragraph">
<p>Upon completion of our development code, we will want to shut down our MiniAccumuloCluster:</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="highlight"><code class="java language-java">accumulo.stop();
// delete your temporary folder</code></pre>
</div>
</div>
</div>
</div>
</div>
<div class="sect1">
<h2 id="_table_configuration">6. Table Configuration</h2>
<div class="sectionbody">
<div class="paragraph">
<p>Accumulo tables have a few options that can be configured to alter the default
behavior of Accumulo as well as improve performance based on the data stored.
These include locality groups, constraints, bloom filters, iterators, and block
cache. For a complete list of available configuration options, see <a href="#configuration">Configuration Management</a>.</p>
</div>
<div class="sect2">
<h3 id="_locality_groups">6.1. Locality Groups</h3>
<div class="paragraph">
<p>Accumulo supports storing sets of column families separately on disk to allow
clients to efficiently scan over columns that are frequently used together and to avoid
scanning over column families that are not requested. After a locality group is set,
Scanner and BatchScanner operations will automatically take advantage of them
whenever the fetchColumnFamily() method is used.</p>
</div>
<div class="paragraph">
<p>By default, tables place all column families into the same &#8220;default&#8221; locality group.
Additional locality groups can be configured at any time via the shell or
programmatically as follows:</p>
</div>
<div class="sect3">
<h4 id="_managing_locality_groups_via_the_shell">6.1.1. Managing Locality Groups via the Shell</h4>
<div class="literalblock">
<div class="content">
<pre>usage: setgroups &lt;group&gt;=&lt;col fam&gt;{,&lt;col fam&gt;}{ &lt;group&gt;=&lt;col fam&gt;{,&lt;col fam&gt;}}
[-?] -t &lt;table&gt;</pre>
</div>
</div>
<div class="literalblock">
<div class="content">
<pre>user@myinstance mytable&gt; setgroups group_one=colf1,colf2 -t mytable</pre>
</div>
</div>
<div class="literalblock">
<div class="content">
<pre>user@myinstance mytable&gt; getgroups -t mytable</pre>
</div>
</div>
</div>
<div class="sect3">
<h4 id="_managing_locality_groups_via_the_client_api">6.1.2. Managing Locality Groups via the Client API</h4>
<div class="listingblock">
<div class="content">
<pre class="highlight"><code class="java language-java">Connector conn;
HashMap&lt;String,Set&lt;Text&gt;&gt; localityGroups = new HashMap&lt;String, Set&lt;Text&gt;&gt;();
HashSet&lt;Text&gt; metadataColumns = new HashSet&lt;Text&gt;();
metadataColumns.add(new Text("domain"));
metadataColumns.add(new Text("link"));
HashSet&lt;Text&gt; contentColumns = new HashSet&lt;Text&gt;();
contentColumns.add(new Text("body"));
contentColumns.add(new Text("images"));
localityGroups.put("metadata", metadataColumns);
localityGroups.put("content", contentColumns);
conn.tableOperations().setLocalityGroups("mytable", localityGroups);
// existing locality groups can be obtained as follows
Map&lt;String, Set&lt;Text&gt;&gt; groups =
conn.tableOperations().getLocalityGroups("mytable");</code></pre>
</div>
</div>
<div class="paragraph">
<p>The assignment of Column Families to Locality Groups can be changed at any time. The
physical movement of column families into their new locality groups takes place via
the periodic Major Compaction process that takes place continuously in the
background. Major Compaction can also be scheduled to take place immediately
through the shell:</p>
</div>
<div class="literalblock">
<div class="content">
<pre>user@myinstance mytable&gt; compact -t mytable</pre>
</div>
</div>
</div>
</div>
<div class="sect2">
<h3 id="_constraints">6.2. Constraints</h3>
<div class="paragraph">
<p>Accumulo supports constraints applied on mutations at insert time. This can be
used to disallow certain inserts according to a user-defined policy. Any mutation
that fails to meet the requirements of the constraint is rejected and sent back to the
client.</p>
</div>
<div class="paragraph">
<p>Constraints can be enabled by setting a table property as follows:</p>
</div>
<div class="listingblock">
<div class="content">
<pre>user@myinstance mytable&gt; constraint -t mytable -a com.test.ExampleConstraint com.test.AnotherConstraint
user@myinstance mytable&gt; constraint -l
com.test.ExampleConstraint=1
com.test.AnotherConstraint=2</pre>
</div>
</div>
<div class="paragraph">
<p>Currently there are no general-purpose constraints provided with the Accumulo
distribution. New constraints can be created by writing a Java class that implements
the following interface:</p>
</div>
<div class="literalblock">
<div class="content">
<pre>org.apache.accumulo.core.constraints.Constraint</pre>
</div>
</div>
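<div class="paragraph">
<p>As a rough illustration of the contract, the sketch below mimics the shape of a constraint in plain Java: a check returns a list of <code>short</code> violation codes, empty when the update is accepted, and each code has a human-readable description. The <code>RowConstraint</code> interface and the example policy are hypothetical; the real <code>Constraint</code> interface checks <code>Mutation</code> objects and must be implemented against the Accumulo API.</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="highlight"><code class="java language-java">import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a constraint; the real interface checks Mutation objects.
interface RowConstraint {
  /** Returns violation codes; an empty list means the update is accepted. */
  List&lt;Short&gt; check(String row, Map&lt;String, String&gt; columns);

  String getViolationDescription(short code);
}

// Example policy: the row ID must be non-empty and every value must be an integer.
class NumericValueConstraint implements RowConstraint {
  static final short EMPTY_ROW = 1;
  static final short NON_NUMERIC_VALUE = 2;

  @Override
  public List&lt;Short&gt; check(String row, Map&lt;String, String&gt; columns) {
    List&lt;Short&gt; violations = new ArrayList&lt;&gt;();
    if (row.isEmpty()) {
      violations.add(EMPTY_ROW);
    }
    for (String value : columns.values()) {
      if (!value.matches("-?\\d+")) {
        violations.add(NON_NUMERIC_VALUE);
        break; // report each violation code at most once
      }
    }
    return violations;
  }

  @Override
  public String getViolationDescription(short code) {
    switch (code) {
      case EMPTY_ROW:
        return "row ID must not be empty";
      case NON_NUMERIC_VALUE:
        return "values must be integers";
      default:
        return "unknown violation";
    }
  }
}</code></pre>
</div>
</div>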
<div class="paragraph">
<p>To deploy a new constraint, create a jar file containing the class implementing the
new constraint and place it in the lib directory of the Accumulo installation. New
constraint jars can be added to Accumulo and enabled without restarting, but any
change to an existing constraint class requires Accumulo to be restarted.</p>
</div>
<div class="paragraph">
<p>An example of constraints can be found in
<code>accumulo/docs/examples/README.constraints</code> with corresponding code under
<code>accumulo/examples/simple/src/main/java/org/apache/accumulo/examples/simple/constraints</code>.</p>
</div>
</div>
<div class="sect2">
<h3 id="_bloom_filters">6.3. Bloom Filters</h3>
<div class="paragraph">
<p>As mutations are applied to an Accumulo table, several files are created per tablet. If
bloom filters are enabled, Accumulo will create and load a small data structure into
memory to determine whether a file contains a given key before opening the file.
This can speed up lookups considerably.</p>
</div>
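<div class="paragraph">
<p>The property that makes this safe is that a Bloom filter can report false positives but never false negatives, so a negative answer lets a file be skipped. The minimal sketch below is not Accumulo&#8217;s implementation: it sets k bits per key in a <code>BitSet</code> using double hashing, and the bit count, hash count, and hash functions are illustrative choices.</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="highlight"><code class="java language-java">import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.BitSet;

// Minimal Bloom filter sketch; Accumulo's own bloom filter implementation differs.
class BloomFilterSketch {
  private final BitSet bits;
  private final int numBits;
  private final int numHashes;

  BloomFilterSketch(int numBits, int numHashes) {
    this.bits = new BitSet(numBits);
    this.numBits = numBits;
    this.numHashes = numHashes;
  }

  // 32-bit FNV-1a, used as a second hash function
  private static int fnv1a(byte[] data) {
    int h = 0x811C9DC5;
    for (byte b : data) {
      h ^= (b &amp; 0xFF);
      h *= 0x01000193;
    }
    return h;
  }

  // i-th bit position via double hashing: (h1 + i * h2) mod numBits
  private int index(byte[] key, int i) {
    int h1 = Arrays.hashCode(key);
    int h2 = fnv1a(key);
    return Math.floorMod(h1 + i * h2, numBits);
  }

  void add(String key) {
    byte[] k = key.getBytes(StandardCharsets.UTF_8);
    for (int i = 0; i &lt; numHashes; i++) {
      bits.set(index(k, i));
    }
  }

  /** false means the key is definitely absent (skip the file); true means "maybe present". */
  boolean mightContain(String key) {
    byte[] k = key.getBytes(StandardCharsets.UTF_8);
    for (int i = 0; i &lt; numHashes; i++) {
      if (!bits.get(index(k, i))) {
        return false;
      }
    }
    return true;
  }
}</code></pre>
</div>
</div>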
<div class="paragraph">
<p>To enable bloom filters, enter the following command in the Shell:</p>
</div>
<div class="literalblock">
<div class="content">
<pre>user@myinstance&gt; config -t mytable -s table.bloom.enabled=true</pre>
</div>
</div>
<div class="paragraph">
<p>An extensive example of using Bloom Filters can be found at
<code>accumulo/docs/examples/README.bloom</code>.</p>
</div>
</div>
<div class="sect2">
<h3 id="_iterators">6.4. Iterators</h3>
<div class="paragraph">
<p>Iterators provide a modular mechanism for adding functionality to be executed by
TabletServers when scanning or compacting data. This allows users to efficiently
summarize, filter, and aggregate data. In fact, the built-in features of cell-level
security and column fetching are implemented using Iterators.
Some useful Iterators are provided with Accumulo and can be found in the
<strong><code>org.apache.accumulo.core.iterators.user</code></strong> package.
Custom Iterators must be on Accumulo&#8217;s classpath, typically by placing a jar
in <code>$ACCUMULO_HOME/lib</code> or <code>$ACCUMULO_HOME/lib/ext</code>, although
the VFS classloader also allows the classpath to be manipulated using a variety of
schemes, including URLs and HDFS URIs.</p>
</div>
<div class="sect3">
<h4 id="_setting_iterators_via_the_shell">6.4.1. Setting Iterators via the Shell</h4>
<div class="paragraph">
<p>Iterators can be configured on a table at scan, minor compaction, and/or major
compaction scopes. If the Iterator implements the OptionDescriber interface, the
setiter command can be used, and it will interactively prompt the user for values
of the Iterator&#8217;s required options.</p>
</div>
<div class="literalblock">
<div class="content">
<pre>usage: setiter [-?] -ageoff | -agg | -class &lt;name&gt; | -regex |
-reqvis | -vers [-majc] [-minc] [-n &lt;itername&gt;] -p &lt;pri&gt;
[-scan] [-t &lt;table&gt;]</pre>
</div>
</div>
<div class="literalblock">
<div class="content">
<pre>user@myinstance mytable&gt; setiter -t mytable -scan -p 15 -n myiter -class com.company.MyIterator</pre>
</div>
</div>
<div class="paragraph">
<p>The config command can always be used to configure iterators manually, which is useful
when the Iterator does not implement the OptionDescriber interface.</p>
</div>
<div class="literalblock">
<div class="content">
<pre>config -t mytable -s table.iterator.scan.myiter=15,com.company.MyIterator
config -t mytable -s table.iterator.minc.myiter=15,com.company.MyIterator
config -t mytable -s table.iterator.majc.myiter=15,com.company.MyIterator
config -t mytable -s table.iterator.scan.myiter.opt.myoptionname=myoptionvalue
config -t mytable -s table.iterator.minc.myiter.opt.myoptionname=myoptionvalue
config -t mytable -s table.iterator.majc.myiter.opt.myoptionname=myoptionvalue</pre>
</div>
</div>
<div class="paragraph">
<p>Typically, a table will have multiple iterators. Accumulo configures a set of
system level iterators for each table. These iterators provide core functionality
like visibility label filtering and may not be removed by users. User level iterators
are applied in the order of their priority. Priority is a user-configured integer;
iterators with lower numbers go first, passing the results of their iteration on
to the other iterators up the stack.</p>
</div>
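<div class="paragraph">
<p>The ordering rule can be sketched without Accumulo. In the hypothetical example below, each <code>Stage</code> stands in for a user iterator, stages are sorted by priority, and each one&#8217;s output feeds the next. Real iterators work on sorted key-value streams rather than string arrays, and the system iterators are not modeled; all names are assumptions for illustration.</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="highlight"><code class="java language-java">import java.util.Arrays;
import java.util.Comparator;

class IteratorStackSketch {
    // Hypothetical stand-in for an iterator: transforms the entries it receives.
    interface Transform {
        String[] apply(String[] input);
    }

    record Stage(int priority, String name, Transform transform) {}

    // Runs stages from lowest to highest priority, feeding each one's output to the next.
    static String[] run(String[] data, Stage... stages) {
        Stage[] ordered = stages.clone();
        Arrays.sort(ordered, Comparator.comparingInt(Stage::priority));
        String[] current = data;
        for (Stage stage : ordered) {
            current = stage.transform().apply(current);
        }
        return current;
    }

    public static void main(String[] args) {
        Stage upper = new Stage(20, "upper",
                in -> Arrays.stream(in).map(String::toUpperCase).toArray(String[]::new));
        Stage dropB = new Stage(10, "dropB",
                in -> Arrays.stream(in).filter(v -> !v.startsWith("b")).toArray(String[]::new));
        // dropB (priority 10) runs first, while values are still lower-case.
        System.out.println(Arrays.toString(run(new String[] {"a", "b", "c"}, upper, dropB))); // [A, C]
    }
}</code></pre>
</div>
</div>
<div class="paragraph">
<p>Because the filter at priority 10 runs before the upper-casing stage at priority 20, it still sees the lower-case <code>b</code>; swapping the two priorities would let <code>B</code> through.</p>
</div>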
</div>
<div class="sect3">
<h4 id="_setting_iterators_programmatically">6.4.2. Setting Iterators Programmatically</h4>
<div class="listingblock">
<div class="content">
<pre class="highlight"><code class="java language-java">scanner.addIterator(new IteratorSetting(
15, // priority
"myiter", // name this iterator
"com.company.MyIterator" // class name
));</code></pre>
</div>
</div>
<div class="paragraph">
<p>Some iterators take additional parameters from client code, as in the following
example:</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="highlight"><code class="java language-java">IteratorSetting iter = new IteratorSetting(...);
iter.addOption("myoptionname", "myoptionvalue");
scanner.addIterator(iter);</code></pre>
</div>
</div>
<div class="paragraph">
<p>Tables support separate Iterator settings to be applied at scan time, upon minor
compaction, and upon major compaction. For most uses, tables should have identical
iterator settings for all three scopes to avoid inconsistent results.</p>
</div>
</div>
<div class="sect3">
<h4 id="_versioning_iterators_and_timestamps">6.4.3. Versioning Iterators and Timestamps</h4>
<div class="paragraph">
<p>Accumulo provides the capability to manage versioned data through the use of
timestamps within the Key. If a timestamp is not specified in the key created by the
client then the system will set the timestamp to the current time. Two keys with
identical rowIDs and columns but different timestamps are considered two versions
of the same key. If two inserts are made into Accumulo with the same rowID,
column, and timestamp, then the behavior is non-deterministic.</p>
</div>
<div class="paragraph">
<p>Timestamps are sorted in descending order, so the most recent data comes first.
Accumulo can be configured to return the top k versions, or versions later than a
given date. The default is to return the one most recent version.</p>
</div>
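<div class="paragraph">
<p>As a rough illustration of this policy, the sketch below keeps the newest <em>k</em> versions of each row and column. It models keys with a hypothetical <code>Cell</code> record whose single <code>column</code> field stands in for the family, qualifier, and visibility; it shows the rule the VersioningIterator is configured to apply, not that iterator&#8217;s code.</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="highlight"><code class="java language-java">import java.util.Arrays;
import java.util.Comparator;

class VersioningSketch {
    // Simplified key: column stands in for family, qualifier, and visibility.
    record Cell(String row, String column, long timestamp, String value) {}

    // Keeps at most maxVersions entries per (row, column), newest first.
    static Cell[] keepNewest(Cell[] cells, int maxVersions) {
        Cell[] sorted = cells.clone();
        // Sort by row, then column, then timestamp descending, so the newest version comes first.
        Arrays.sort(sorted, Comparator.comparing(Cell::row)
                .thenComparing(Cell::column)
                .thenComparing(Comparator.comparingLong(Cell::timestamp).reversed()));
        Cell[] kept = new Cell[sorted.length];
        int n = 0;
        int seen = 0;
        String previousKey = null;
        for (Cell c : sorted) {
            String key = c.row() + "\0" + c.column();
            seen = key.equals(previousKey) ? seen + 1 : 1; // position within this key's versions
            previousKey = key;
            if (maxVersions >= seen) {
                kept[n++] = c;
            }
        }
        return Arrays.copyOf(kept, n);
    }

    public static void main(String[] args) {
        Cell[] cells = {
            new Cell("r1", "cf:cq", 1, "old"),
            new Cell("r1", "cf:cq", 3, "newest"),
            new Cell("r1", "cf:cq", 2, "middle"),
            new Cell("r2", "cf:cq", 5, "only")
        };
        for (Cell c : keepNewest(cells, 1)) {
            System.out.println(c.row() + " " + c.column() + " " + c.timestamp() + " " + c.value());
        }
    }
}</code></pre>
</div>
</div>
<div class="paragraph">
<p>With <code>maxVersions</code> set to 1, the example keeps only the timestamp-3 entry for <code>r1</code> and the single entry for <code>r2</code>.</p>
</div>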
<div class="paragraph">
<p>The version policy can be changed by changing the VersioningIterator options for a
table as follows:</p>
</div>
<div class="listingblock">
<div class="content">
<pre>user@myinstance mytable&gt; config -t mytable -s table.iterator.scan.vers.opt.maxVersions=3
user@myinstance mytable&gt; config -t mytable -s table.iterator.minc.vers.opt.maxVersions=3
user@myinstance mytable&gt; config -t mytable -s table.iterator.majc.vers.opt.maxVersions=3</pre>
</div>
</div>
<div class="paragraph">
<p>When a table is created, by default it is configured to use the
VersioningIterator and keep one version. A table can be created without the
VersioningIterator with the -ndi option in the shell. The Java API also
provides the following method:</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="highlight"><code class="java language-java">connector.tableOperations.create(String tableName, boolean limitVersion);</code></pre>
</div>
</div>
<div class="sect4">
<h5 id="_logical_time">Logical Time</h5>
<div class="paragraph">
<p>Accumulo 1.2 introduced the concept of logical time. This ensures that timestamps
set by Accumulo always move forward, which helps avoid problems caused by
TabletServers that have different time settings. With logical time, a per-tablet
counter gives each mutation a unique, one-up timestamp. When using time in
milliseconds, two mutations that arrive within the same millisecond receive the
same timestamp, but Accumulo-set times still always move forward, never backward.</p>
</div>
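<div class="paragraph">
<p>A simplified, single-tablet model of the two time types is sketched below. The class and method names are hypothetical, and the wall-clock value is passed in explicitly so the behavior is easy to follow; Accumulo&#8217;s internals differ in detail.</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="highlight"><code class="java language-java">// Hypothetical single-tablet model of Accumulo's two time types.
class TabletTimeSketch {
    private long lastMillis = Long.MIN_VALUE;
    private long logicalCounter = 0;

    // Logical time: a per-tablet counter gives each mutation a unique, increasing value.
    long nextLogical() {
        return ++logicalCounter;
    }

    // Millisecond time: follow the wall clock, but never return a value smaller than
    // one already handed out, even if the server's clock moves backwards.
    long nextMillis(long wallClockMillis) {
        lastMillis = Math.max(lastMillis, wallClockMillis);
        return lastMillis;
    }

    public static void main(String[] args) {
        TabletTimeSketch tablet = new TabletTimeSketch();
        System.out.println(tablet.nextMillis(1000)); // 1000
        System.out.println(tablet.nextMillis(1000)); // 1000: same millisecond, same timestamp
        System.out.println(tablet.nextMillis(900));  // 1000: the clock went back, time did not
        System.out.println(tablet.nextLogical());    // 1
        System.out.println(tablet.nextLogical());    // 2
    }
}</code></pre>
</div>
</div>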
<div class="paragraph">
<p>A table can be configured to use logical timestamps at creation time as follows:</p>
</div>
<div class="literalblock">
<div class="content">
<pre>user@myinstance&gt; createtable -tl logical</pre>
</div>
</div>
</div>
<div class="sect4">
<h5 id="_deletes">Deletes</h5>
<div class="paragraph">
<p>Deletes are special keys in Accumulo that get sorted along with all the other data.
When a delete key is inserted, Accumulo will not show anything that has a
timestamp less than or equal to the delete key. During major compaction, any keys
older than a delete key are omitted from the new file created, and the omitted keys
are removed from disk as part of the regular garbage collection process.</p>
</div>
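<div class="paragraph">
<p>The visibility rule for deletes can be sketched as follows for the versions of a single row and column. The hypothetical <code>Version</code> record and its <code>isDelete</code> flag stand in for a Key with its delete marker set; removal of the hidden data from disk during compaction is not modeled.</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="highlight"><code class="java language-java">import java.util.Arrays;
import java.util.OptionalLong;

class DeleteMarkerSketch {
    // One version of a single row/column; isDelete marks a delete key.
    record Version(long timestamp, boolean isDelete, String value) {}

    // Returns the versions a scan would show: delete keys are never returned, and
    // any data with a timestamp less than or equal to the newest delete is hidden.
    static Version[] visible(Version[] versions) {
        OptionalLong newestDelete = Arrays.stream(versions)
                .filter(Version::isDelete)
                .mapToLong(Version::timestamp)
                .max();
        return Arrays.stream(versions)
                .filter(v -> !v.isDelete())
                .filter(v -> newestDelete.isEmpty() || v.timestamp() > newestDelete.getAsLong())
                .toArray(Version[]::new);
    }

    public static void main(String[] args) {
        Version[] versions = {
            new Version(5, false, "written after the delete"),
            new Version(4, true, null),
            new Version(3, false, "hidden by the delete")
        };
        for (Version v : visible(versions)) {
            System.out.println(v.timestamp() + " " + v.value());
        }
    }
}</code></pre>
</div>
</div>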
</div>
</div>
<div class="sect3">
<h4 id="_filters">6.4.4. Filters</h4>
<div class="paragraph">
<p>When scanning over a set of key-value pairs it is possible to apply an arbitrary
filtering policy through the use of a Filter. Filters are types of iterators that return
only key-value pairs that satisfy the filter logic. Accumulo has a few built-in filters
that can be configured on any table: AgeOff, ColumnAgeOff, Timestamp, NoVis, and RegEx. More can be added
by writing a Java class that extends the
<code>org.apache.accumulo.core.iterators.Filter</code> class.</p>
</div>
<div class="paragraph">
<p>The AgeOff filter can be configured to remove data older than a certain date or a fixed
amount of time from the present. The following example sets a table to delete
everything inserted over 30 seconds ago:</p>
</div>
<div class="listingblock">
<div class="content">
<pre>user@myinstance&gt; createtable filtertest
user@myinstance filtertest&gt; setiter -t filtertest -scan -minc -majc -p 10 -n myfilter -ageoff
AgeOffFilter removes entries with timestamps more than &lt;ttl&gt; milliseconds old
----------&gt; set org.apache.accumulo.core.iterators.user.AgeOffFilter parameter negate, default false
keeps k/v that pass accept method, true rejects k/v that pass accept method:
----------&gt; set org.apache.accumulo.core.iterators.user.AgeOffFilter parameter ttl, time to
live (milliseconds): 30000
----------&gt; set org.apache.accumulo.core.iterators.user.AgeOffFilter parameter currentTime, if set,
use the given value as the absolute time in milliseconds as the current time of day:
user@myinstance filtertest&gt;
user@myinstance filtertest&gt; scan
user@myinstance filtertest&gt; insert foo a b c
user@myinstance filtertest&gt; scan
foo a:b [] c
user@myinstance filtertest&gt; sleep 31
user@myinstance filtertest&gt; scan
user@myinstance filtertest&gt;</pre>
</div>
</div>
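<div class="paragraph">
<p>The per-entry decision the AgeOffFilter makes amounts to one comparison. The following is a hypothetical restatement of that rule in plain Java, not the filter&#8217;s source; the parameters mirror the <code>ttl</code> and <code>currentTime</code> options in the session above, and whether an entry exactly <code>ttl</code> milliseconds old is kept is an assumption.</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="highlight"><code class="java language-java">// Hypothetical restatement of the age-off rule, not AgeOffFilter's source.
class AgeOffSketch {
    // Keep an entry only if it is no more than ttlMillis old at currentTimeMillis.
    static boolean accept(long entryTimestamp, long ttlMillis, long currentTimeMillis) {
        return ttlMillis >= currentTimeMillis - entryTimestamp;
    }

    public static void main(String[] args) {
        long now = System.currentTimeMillis();
        long ttl = 30_000; // 30 seconds, the ttl used in the shell session
        System.out.println(accept(now - 4_000, ttl, now));  // true: 4 seconds old
        System.out.println(accept(now - 31_000, ttl, now)); // false: older than the ttl
    }
}</code></pre>
</div>
</div>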
<div class="paragraph">
<p>To see the iterator settings for a table, use:</p>
</div>
<div class="literalblock">
<div class="content">
<pre>user@example filtertest&gt; config -t filtertest -f iterator
---------+---------------------------------------------+------------------
SCOPE | NAME | VALUE
---------+---------------------------------------------+------------------
table | table.iterator.majc.myfilter .............. | 10,org.apache.accumulo.core.iterators.user.AgeOffFilter
table | table.iterator.majc.myfilter.opt.ttl ...... | 30000
table | table.iterator.majc.vers .................. | 20,org.apache.accumulo.core.iterators.VersioningIterator
table | table.iterator.majc.vers.opt.maxVersions .. | 1
table | table.iterator.minc.myfilter .............. | 10,org.apache.accumulo.core.iterators.user.AgeOffFilter
table | table.iterator.minc.myfilter.opt.ttl ...... | 30000
table | table.iterator.minc.vers .................. | 20,org.apache.accumulo.core.iterators.VersioningIterator
table | table.iterator.minc.vers.opt.maxVersions .. | 1
table | table.iterator.scan.myfilter .............. | 10,org.apache.accumulo.core.iterators.user.AgeOffFilter
table | table.iterator.scan.myfilter.opt.ttl ...... | 30000
table | table.iterator.scan.vers .................. | 20,org.apache.accumulo.core.iterators.VersioningIterator
table | table.iterator.scan.vers.opt.maxVersions .. | 1
---------+---------------------------------------------+------------------</pre>
</div>
</div>
</div>
<div class="sect3">
<h4 id="_combiners">6.4.5. Combiners</h4>
<div class="paragraph">
<p>Accumulo allows Combiners to be configured on tables and column
families. When a Combiner is set it is applied across the values
associated with any keys that share rowID, column family, and column qualifier.
This is similar to the reduce step in MapReduce, which applies some function to all
the values associated with a particular key.</p>
</div>
<div class="paragraph">
<p>For example, if a summing combiner were configured on a table and the following
mutations were inserted:</p>
</div>
<div class="literalblock">
<div class="content">
<pre>Row Family Qualifier Timestamp Value
rowID1 colfA colqA 20100101 1
rowID1 colfA colqA 20100102 1</pre>
</div>
</div>
<div class="paragraph">
<p>The table would reflect only one aggregate value:</p>
</div>
<div class="literalblock">
<div class="content">
<pre>rowID1 colfA colqA - 2</pre>
</div>
</div>
<div class="paragraph">
<p>Combiners can be enabled for a table using the setiter command in the shell. Below is an example.</p>
</div>
<div class="listingblock">
<div class="content">
<pre>root@a14 perDayCounts&gt; setiter -t perDayCounts -p 10 -scan -minc -majc -n daycount
-class org.apache.accumulo.core.iterators.user.SummingCombiner
TypedValueCombiner can interpret Values as a variety of number encodings
(VLong, Long, or String) before combining
----------&gt; set SummingCombiner parameter columns,
&lt;col fam&gt;[:&lt;col qual&gt;]{,&lt;col fam&gt;[:&lt;col qual&gt;]} : day
----------&gt; set SummingCombiner parameter type, &lt;VARNUM|LONG|STRING&gt;: STRING
root@a14 perDayCounts&gt; insert foo day 20080101 1
root@a14 perDayCounts&gt; insert foo day 20080101 1
root@a14 perDayCounts&gt; insert foo day 20080103 1
root@a14 perDayCounts&gt; insert bar day 20080101 1
root@a14 perDayCounts&gt; insert bar day 20080101 1
root@a14 perDayCounts&gt; scan
bar day:20080101 [] 2
foo day:20080101 [] 2
foo day:20080103 [] 1</pre>
</div>
</div>
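<div class="paragraph">
<p>The aggregation in this session can be reproduced with a small plain-Java sketch that sums the values of entries sharing a row, column family, and column qualifier, regardless of timestamp. It treats values as decimal strings (loosely analogous to the STRING type chosen above) and uses a hypothetical <code>Cell</code> record; it illustrates the semantics only and does not use Accumulo&#8217;s Combiner API.</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="highlight"><code class="java language-java">import java.util.Arrays;
import java.util.Comparator;

class SummingSketch {
    record Cell(String row, String family, String qualifier, long timestamp, String value) {}

    // Collapses all entries sharing (row, family, qualifier) into one entry whose
    // value is the sum of their values, parsed as decimal strings.
    static Cell[] combine(Cell[] cells) {
        var byColumn = Comparator.comparing(Cell::row)
                .thenComparing(Cell::family)
                .thenComparing(Cell::qualifier);
        Cell[] sorted = cells.clone();
        Arrays.sort(sorted, byColumn);
        Cell[] out = new Cell[sorted.length];
        int n = 0;
        for (Cell c : sorted) {
            Cell previous = n == 0 ? null : out[n - 1];
            if (previous == null || byColumn.compare(previous, c) != 0) {
                out[n++] = c; // first entry seen for this row and column
            } else {
                long sum = Long.parseLong(previous.value()) + Long.parseLong(c.value());
                // Keep the newest timestamp among the combined entries.
                out[n - 1] = new Cell(c.row(), c.family(), c.qualifier(),
                        Math.max(previous.timestamp(), c.timestamp()), Long.toString(sum));
            }
        }
        return Arrays.copyOf(out, n);
    }

    public static void main(String[] args) {
        Cell[] inserts = {
            new Cell("foo", "day", "20080101", 1, "1"),
            new Cell("foo", "day", "20080101", 2, "1"),
            new Cell("foo", "day", "20080103", 3, "1"),
            new Cell("bar", "day", "20080101", 4, "1"),
            new Cell("bar", "day", "20080101", 5, "1")
        };
        // bar day:20080101 2, foo day:20080101 2, foo day:20080103 1
        for (Cell c : combine(inserts)) {
            System.out.println(c.row() + " " + c.family() + ":" + c.qualifier() + " " + c.value());
        }
    }
}</code></pre>
</div>
</div>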
<div class="paragraph">
<p>Accumulo includes some useful Combiners out of the box. To find these look in
the <strong><code>org.apache.accumulo.core.iterators.user</code></strong> package.</p>
</div>
<div class="paragraph">
<p>Additional Combiners can be added by creating a Java class that extends
<code>org.apache.accumulo.core.iterators.Combiner</code> and adding a jar containing that
class to Accumulo&#8217;s lib/ext directory.</p>
</div>
<div class="paragraph">
<p>An example of a Combiner can be found under</p>
</div>
<div class="literalblock">
<div class="content">
<pre>accumulo/examples/simple/src/main/java/org/apache/accumulo/examples/simple/combiner/StatsCombiner.java</pre>
</div>
</div>
</div>
</div>
<div class="sect2">
<h3 id="_block_cache">6.5. Block Cache</h3>
<div class="paragraph">
<p>In order to increase throughput of commonly accessed entries, Accumulo employs a block cache.
This block cache buffers data in memory so that it doesn&#8217;t have to be read off of disk.
The RFile format that Accumulo prefers is a mix of index blocks and data blocks, where the index blocks are used to find the appropriate data blocks.
Typical queries to Accumulo result in a binary search over several index blocks followed by a linear scan of one or more data blocks.</p>
</div>
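<div class="paragraph">
<p>The lookup path just described, and why caching blocks pays off, can be modeled in a few lines. In this hypothetical sketch an in-memory index holds the last key of each data block, a binary search over that index picks the block, and a linear scan finishes the lookup. An array of cached blocks stands in for the block cache, so only the first read of a block counts as a disk read. Real RFiles use multi-level index blocks, and the real cache is bounded in size and evicts blocks; neither is modeled.</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="highlight"><code class="java language-java">// Hypothetical model of one file: sorted data blocks plus an index of each block's last key.
class BlockLookupSketch {
    private final String[][] blocks;  // "on disk" data blocks, each sorted
    private final String[] lastKeys;  // in-memory index: last key in each block
    private final String[][] cache;   // cached copy of a block, or null if not cached
    int diskReads = 0;

    BlockLookupSketch(String[][] blocks) {
        this.blocks = blocks;
        this.cache = new String[blocks.length][];
        this.lastKeys = new String[blocks.length];
        int i = 0;
        for (String[] block : blocks) {
            lastKeys[i++] = block[block.length - 1];
        }
    }

    // Binary search over the index for the first block whose last key is >= key.
    private int findBlock(String key) {
        int lo = 0;
        int hi = lastKeys.length - 1;
        int found = -1; // stays -1 if key sorts after every block
        while (hi >= lo) {
            int mid = (lo + hi) >>> 1;
            if (lastKeys[mid].compareTo(key) >= 0) {
                found = mid;
                hi = mid - 1;
            } else {
                lo = mid + 1;
            }
        }
        return found;
    }

    // Returns a block, counting a disk read only on a cache miss.
    private String[] readBlock(int index) {
        if (cache[index] == null) {
            diskReads++;
            cache[index] = blocks[index];
        }
        return cache[index];
    }

    boolean contains(String key) {
        int index = findBlock(key);
        if (index == -1) {
            return false;
        }
        for (String k : readBlock(index)) { // linear scan within the block
            if (k.equals(key)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        BlockLookupSketch file = new BlockLookupSketch(new String[][] {
            {"a", "c", "e"}, {"g", "i", "k"}, {"m", "o", "q"}
        });
        System.out.println(file.contains("i") + " " + file.contains("k") + " " + file.contains("b"));
        System.out.println("disk reads: " + file.diskReads); // 2: the lookup of "k" hit the cache
    }
}</code></pre>
</div>
</div>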
<div class="paragraph">
<p>The block cache can be configured on a per-table basis, and all tablets hosted on a tablet server share a single resource pool.
To configure the size of the tablet server&#8217;s block cache, set the following properties:</p>
</div>
<div class="literalblock">
<div class="content">
<pre>tserver.cache.data.size: Specifies the size of the cache for file data blocks.
tserver.cache.index.size: Specifies the size of the cache for file indices.</pre>
</div>
</div>
<div class="paragraph">
<p>To enable the block cache for your table, set the following properties:</p>
</div>
<div class="literalblock">
<div class="content">
<pre>table.cache.block.enable: Determines whether file (data) block cache is enabled.
table.cache.index.enable: Determines whether index cache is enabled.</pre>
</div>
</div>
<div class="paragraph">
<p>The block cache can have a significant effect on alleviating hot spots, as well as reducing query latency.
It is enabled by default for the metadata tables.</p>
</div>
</div>
<div class="sect2">
<h3 id="_compaction">6.6. Compaction</h3>
<div class="paragraph">
<p>As data is written to Accumulo it is buffered in memory. The data buffered in
memory is eventually written to HDFS on a per tablet basis. Files can also be
added to tablets directly by bulk import. In the background tablet servers run
major compactions to merge multiple files into one. The tablet server has to
decide which tablets to compact and which files within a tablet to compact.
This decision is made using the compaction ratio, which is configurable on a
per table basis. To configure this ratio modify the following property:</p>
</div>
<div class="literalblock">
<div class="content">
<pre>table.compaction.major.ratio</pre>
</div>
</div>
<div class="paragraph">
<p>Increasing this ratio will result in more files per tablet and less compaction
work. More files per tablet means higher query latency, so adjusting
this ratio is a trade-off between ingest and query performance. The ratio
defaults to 3.</p>
</div>
<div class="paragraph">
<p>The way the ratio works is that a set of files is compacted into one file if the
sum of the sizes of the files in the set is larger than the ratio multiplied by
the size of the largest file in the set. If this is not true for the set of all
files in a tablet, the largest file is removed from consideration, and the
remaining files are considered for compaction. This is repeated until a
compaction is triggered or there are no files left to consider.</p>
</div>
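<div class="paragraph">
<p>The selection rule can be written down directly. The sketch below follows the description above: sort the file sizes, test the whole set, and drop the largest file until the test passes or too few files remain. It is a simplification under stated assumptions: it uses a strict &#8220;larger than&#8221; comparison as worded here, requires at least two files, and ignores any other limits Accumulo may apply. The class and method names are hypothetical.</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="highlight"><code class="java language-java">import java.util.Arrays;

class CompactionRatioSketch {
    // Returns the file sizes selected for one major compaction, or an empty array.
    static long[] select(long[] fileSizes, double ratio) {
        long[] sorted = fileSizes.clone();
        Arrays.sort(sorted); // ascending, so the largest candidate is always last
        // Assumption of this sketch: only sets of at least two files are considered.
        for (int count = sorted.length; count >= 2; count--) {
            long largest = sorted[count - 1];
            long total = Arrays.stream(sorted, 0, count).sum();
            if (total > ratio * largest) {
                return Arrays.copyOf(sorted, count);
            }
            // Not worth it: drop the largest file and try the remaining ones.
        }
        return new long[0];
    }

    public static void main(String[] args) {
        long[] sizesInMb = {200, 1}; // a large file plus a fresh minor compaction output
        System.out.println(Arrays.toString(select(sizesInMb, 1.0))); // [1, 200]
        System.out.println(Arrays.toString(select(sizesInMb, 3.0))); // []
    }
}</code></pre>
</div>
</div>
<div class="paragraph">
<p>With a ratio of 1 the 200M and 1M files are merged; with the default ratio of 3 nothing is selected, because 201M is not larger than 3 &#215; 200M.</p>
</div>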
<div class="paragraph">
<p>The number of background threads tablet servers use to run major compactions is
configurable. To configure this modify the following property:</p>
</div>
<div class="literalblock">
<div class="content">
<pre>tserver.compaction.major.concurrent.max</pre>
</div>
</div>
<div class="paragraph">
<p>Also, the number of threads tablet servers use for minor compactions is
configurable. To configure this modify the following property:</p>
</div>
<div class="literalblock">
<div class="content">
<pre>tserver.compaction.minor.concurrent.max</pre>
</div>
</div>
<div class="paragraph">
<p>The numbers of minor and major compactions running and queued are visible on the
Accumulo monitor page. This allows you to see whether compactions are backing up
and whether the above settings need adjusting. When adjusting the number of
threads available for compactions, consider the number of cores and other tasks
running on the nodes, such as map and reduce tasks.</p>
</div>
<div class="paragraph">
<p>If major compactions are not keeping up, then the number of files per tablet
will grow to a point such that query performance starts to suffer. One way to
handle this situation is to increase the compaction ratio. For example, if the
compaction ratio were set to 1, then every new file added to a tablet by minor
compaction would immediately queue the tablet for major compaction. So if a
tablet has a 200M file and minor compaction writes a 1M file, then the major
compaction will attempt to merge the 200M and 1M file. If the tablet server
has lots of tablets trying to do this sort of thing, then major compactions
will back up and the number of files per tablet will start to grow, assuming
data is being continuously written. Increasing the compaction ratio will
alleviate backups by lowering the amount of major compaction work that needs to
be done.</p>
</div>
<div class="paragraph">
<p>Another option to deal with the files per tablet growing too large is to adjust
the following property:</p>
</div>
<div class="literalblock">
<div class="content">
<pre>table.file.max</pre>
</div>
</div>
<div class="paragraph">
<p>When a tablet reaches this number of files and needs to flush its in-memory
data to disk, it will choose to do a merging minor compaction. A merging minor
compaction will merge the tablet&#8217;s smallest file with the data in memory at
minor compaction time. Therefore the number of files will not grow beyond this
limit. Merging minor compactions take longer than regular ones, which can slow
ingest until major compactions have had enough time to catch up. When adjusting
this property, also consider adjusting the compaction ratio. Ideally, merging
minor compactions never need to occur and major compactions keep up. It is
possible to configure the file max and compaction ratio such that only merging
minor compactions occur and major compactions never occur. This should be avoided
because doing only merging minor compactions causes O(<em>N</em><sup>2</sup>) work to be done.
The amount of work done by major compactions is O(<em>N</em>*log<sub><em>R</em></sub>(<em>N</em>)) where
<em>R</em> is the compaction ratio.</p>
</div>
<div class="paragraph">
<p>Compactions can be initiated manually for a table. To initiate a minor
compaction, use the flush command in the shell. To initiate a major compaction,
use the compact command in the shell. The compact command will compact all
tablets in a table to one file. Even tablets with one file are compacted. This
is useful for the case where a major compaction filter is configured for a
table. In 1.4 the ability to compact a range of a table was added. To use this
feature specify start and stop rows for the compact command. This will only
compact tablets that overlap the given row range.</p>
</div>
</div>
<div class="sect2">
<h3 id="_pre_splitting_tables">6.7. Pre-splitting tables</h3>
<div class="paragraph">
<p>Accumulo will balance and distribute tables across servers. Before a
table gets large, it will be maintained as a single tablet on a single
server. This limits the speed at which data can be added or queried
to the speed of a single node. To improve performance when a table
is new, or small, you can add split points and generate new tablets.</p>
</div>
<div class="paragraph">
<p>In the shell:</p>
</div>
<div class="literalblock">
<div class="content">
<pre>root@myinstance&gt; createtable newTable
root@myinstance&gt; addsplits -t newTable g n t</pre>
</div>
</div>
<div class="paragraph">
<p>This will create a new table with 4 tablets. The table will be split
on the letters &#8220;g&#8221;, &#8220;n&#8221;, and &#8220;t&#8221; which will work nicely if the
row data start with lower-case alphabetic characters. If your row
data includes binary information or numeric information, or if the
distribution of the row information is not flat, then you would pick
different split points. Now ingest and query can proceed on 4 nodes
which can improve performance.</p>
</div>
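<div class="paragraph">
<p>To see how split points divide the row space, the hypothetical helper below returns the index of the tablet that would hold a given row for the splits <code>g</code>, <code>n</code>, and <code>t</code>. It assumes, as Accumulo does, that each split point is the inclusive end row of the tablet before it, so the row <code>g</code> itself lands in the first tablet.</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="highlight"><code class="java language-java">// Hypothetical helper: which tablet holds a row, given sorted split points?
class SplitLookupSketch {
    // Tablet i covers rows in (splits[i-1], splits[i]]; the last tablet has no end row.
    // Binary search for the first split point that is >= row.
    static int tabletFor(String row, String[] splits) {
        int lo = 0;
        int hi = splits.length; // the answer is somewhere in 0..splits.length
        while (hi > lo) {
            int mid = (lo + hi) >>> 1;
            if (splits[mid].compareTo(row) >= 0) {
                hi = mid;
            } else {
                lo = mid + 1;
            }
        }
        return lo;
    }

    public static void main(String[] args) {
        String[] splits = {"g", "n", "t"}; // three split points create four tablets
        for (String row : new String[] {"apple", "g", "house", "zebra"}) {
            System.out.println(row + " -> tablet " + tabletFor(row, splits));
        }
    }
}</code></pre>
</div>
</div>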
</div>
<div class="sect2">
<h3 id="_merging_tablets">6.8. Merging tablets</h3>
<div class="paragraph">
<p>Over time, a table can get very large, so large that it has hundreds
of thousands of split points. Once there are enough tablets to spread
a table across the entire cluster, additional splits may not improve
performance, and may create unnecessary bookkeeping. The distribution
of data may change over time. For example, if row data contains date
information, and data is continually added and removed to maintain a
window of current information, tablets for older rows may be empty.</p>
</div>
<div class="paragraph">
<p>Accumulo supports tablet merging, which can be used to reduce
the number of split points. The following command will merge all rows
from &#8220;A&#8221; to &#8220;Z&#8221; into a single tablet:</p>
</div>
<div class="literalblock">
<div class="content">
<pre>root@myinstance&gt; merge -t myTable -s A -e Z</pre>
</div>
</div>
<div class="paragraph">
<p>If the result of a merge produces a tablet that is larger than the
configured split size, the tablet may be split by the tablet server.
Be sure to increase your tablet size prior to any merges if the goal
is to have larger tablets:</p>
</div>
<div class="literalblock">
<div class="content">
<pre>root@myinstance&gt; config -t myTable -s table.split.threshold=2G</pre>
</div>
</div>
<div class="paragraph">
<p>In order to merge small tablets, you can ask Accumulo to merge
sections of a table smaller than a given size.</p>
</div>
<div class="literalblock">
<div class="content">
<pre>root@myinstance&gt; merge -t myTable -s 100M</pre>
</div>
</div>
<div class="paragraph">
<p>By default, small tablets will not be merged into tablets that are
already larger than the given size. This can leave isolated small
tablets. To force small tablets to be merged into larger tablets use
the <code>--force</code> option:</p>
</div>
<div class="literalblock">
<div class="content">
<pre>root@myinstance&gt; merge -t myTable -s 100M --force</pre>
</div>
</div>
<div class="paragraph">
<p>Merging away small tablets works on one section at a time. If your
table contains many sections of small split points, or you are
attempting to change the split size of the entire table, it will be
faster to set the split threshold and merge the entire table:</p>
</div>
<div class="literalblock">
<div class="content">
<pre>root@myinstance&gt; config -t myTable -s table.split.threshold=256M
root@myinstance&gt; merge -t myTable</pre>
</div>
</div>
</div>
<div class="sect2">
<h3 id="_delete_range">6.9. Delete Range</h3>
<div class="paragraph">
<p>Consider an indexing scheme that uses date information in each row.
For example &#8220;20110823-15:20:25.013&#8221; might be a row that specifies a
date and time. In some cases, we might like to delete rows based on
this date, say to remove all the data older than the current year.
Accumulo supports a delete range operation which efficiently
removes data between two rows. For example:</p>
</div>
<div class="literalblock">
<div class="content">
<pre>root@myinstance&gt; deleterange -t myTable -s 2010 -e 2011</pre>
</div>
</div>
<div class="paragraph">
<p>This will delete all rows starting with &#8220;2010&#8221; and it will stop at
any row starting with &#8220;2011&#8221;. You can delete any data prior to 2011
with:</p>
</div>
<div class="literalblock">
<div class="content">
<pre>root@myinstance&gt; deleterange -t myTable -e 2011 --force</pre>
</div>
</div>
<div class="paragraph">
<p>The shell will not allow you to delete an unbounded range (no start)
unless you provide the <code>--force</code> option.</p>
</div>
<div class="paragraph">
<p>Range deletion is implemented using splits at the given start/end
positions, and will affect the number of splits in the table.</p>
</div>
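<div class="paragraph">
<p>Which rows a range delete removes depends on how its boundaries are treated. The hypothetical sketch below assumes the begin row is exclusive and the end row inclusive, matching the <code>(start, end]</code> range described for the Java API&#8217;s <code>TableOperations.deleteRows</code>; treat this as an assumption to confirm against the Javadoc for your version.</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="highlight"><code class="java language-java">import java.util.Arrays;

class DeleteRangeSketch {
    // True if row falls in (start, end]; a null start or end means unbounded.
    static boolean inRange(String row, String start, String end) {
        if (start != null) {
            if (start.compareTo(row) >= 0) {
                return false; // at or before the exclusive start row
            }
        }
        if (end != null) {
            if (row.compareTo(end) > 0) {
                return false; // after the inclusive end row
            }
        }
        return true;
    }

    // Rows that survive deleting the range (start, end].
    static String[] remaining(String[] rows, String start, String end) {
        return Arrays.stream(rows)
                .filter(row -> !inRange(row, start, end))
                .toArray(String[]::new);
    }

    public static void main(String[] args) {
        String[] rows = {"20091231", "2010", "20100823", "2011", "20110101"};
        // [20091231, 2010, 20110101]
        System.out.println(Arrays.toString(remaining(rows, "2010", "2011")));
        // [20110101]
        System.out.println(Arrays.toString(remaining(rows, null, "2011")));
    }
}</code></pre>
</div>
</div>
<div class="paragraph">
<p>Under that assumption, a row that is exactly <code>2010</code> survives while a row that is exactly <code>2011</code> is deleted; longer rows that merely start with those strings behave as the text above describes.</p>
</div>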
</div>
<div class="sect2">
<h3 id="_cloning_tables">6.10. Cloning Tables</h3>
<div class="paragraph">
<p>A new table can be created that points to an existing table&#8217;s data. This is a
very quick metadata operation; no data is actually copied. The cloned table
and the source table can change independently after the clone operation. One
use case for this feature is testing. For example, to test a new filtering
iterator, clone the table, add the filter to the clone, and force a major
compaction. To perform a test on less data, clone a table and then use delete
range to efficiently remove a lot of data from the clone. Another use case is
generating a snapshot to guard against human error. To create a snapshot,
clone a table and then disable write permissions on the clone.</p>
</div>
<div class="paragraph">
<p>The clone operation will point to the source table&#8217;s files. This is why the
flush option is present and is enabled by default in the shell. If the flush
option is not enabled, then any data the source table currently has in memory
will not exist in the clone.</p>
</div>
<div class="paragraph">
<p>A cloned table copies the configuration of the source table. However the
permissions of the source table are not copied to the clone. After a clone is
created, only the user that created the clone can read and write to it.</p>
</div>
<div class="paragraph">
<p>In the following example we see that data inserted after the clone operation is
not visible in the clone.</p>
</div>
<div class="listingblock">
<div class="content">
<pre>root@a14&gt; createtable people
root@a14 people&gt; insert 890435 name last Doe
root@a14 people&gt; insert 890435 name first John
root@a14 people&gt; clonetable people test
root@a14 people&gt; insert 890436 name first Jane
root@a14 people&gt; insert 890436 name last Doe
root@a14 people&gt; scan
890435 name:first [] John
890435 name:last [] Doe
890436 name:first [] Jane
890436 name:last [] Doe
root@a14 people&gt; table test
root@a14 test&gt; scan
890435 name:first [] John
890435 name:last [] Doe
root@a14 test&gt;</pre>
</div>
</div>
<div class="paragraph">
<p>The du command in the shell shows how much space a table is using in HDFS.
This command can also show how much overlapping space two cloned tables have in
HDFS. In the example below du shows table ci is using 428M. Then ci is cloned
to cic and du shows that both tables share 428M. After three entries are
inserted into cic and it is flushed, du shows the two tables still share 428M, but
cic has 226 bytes to itself. Finally, table cic is compacted and then du shows
that each table uses 428M.</p>
</div>
<div class="listingblock">
<div class="content">
<pre>root@a14&gt; du ci
428,482,573 [ci]
root@a14&gt; clonetable ci cic
root@a14&gt; du ci cic
428,482,573 [ci, cic]
root@a14&gt; table cic
root@a14 cic&gt; insert r1 cf1 cq1 v1
root@a14 cic&gt; insert r1 cf1 cq2 v2
root@a14 cic&gt; insert r1 cf1 cq3 v3
root@a14 cic&gt; flush -t cic -w
27 15:00:13,908 [shell.Shell] INFO : Flush of table cic completed.
root@a14 cic&gt; du ci cic
428,482,573 [ci, cic]
226 [cic]
root@a14 cic&gt; compact -t cic -w
27 15:00:35,871 [shell.Shell] INFO : Compacting table ...
27 15:03:03,303 [shell.Shell] INFO : Compaction of table cic completed for given range
root@a14 cic&gt; du ci cic
428,482,573 [ci]
428,482,612 [cic]
root@a14 cic&gt;</pre>
</div>
</div>
</div>
<div class="sect2">
<h3 id="_exporting_tables">6.11. Exporting Tables</h3>
<div class="paragraph">
<p>Accumulo supports exporting tables for the purpose of copying tables to another
cluster. Exporting and importing tables preserves the table&#8217;s configuration,
splits, and logical time. Tables are exported and then copied via the hadoop
distcp command. To export a table, it must be offline and stay offline while
distcp runs. The reason it needs to stay offline is to prevent files from being
deleted. A table can be cloned and the clone taken offline in order to avoid
losing access to the table. See <code>docs/examples/README.export</code> for an example.</p>
</div>
</div>
</div>
</div>
<div class="sect1">
<h2 id="_table_design">7. Table Design</h2>
<div class="sectionbody">
<div class="sect2">
<h3 id="_basic_table">7.1. Basic Table</h3>
<div class="paragraph">
<p>Since Accumulo tables are sorted by row ID, each table can be thought of as being
indexed by the row ID. Lookups performed by row ID can be executed quickly, by doing
a binary search, first across the tablets, and then within a tablet. Clients should
choose a row ID carefully in order to support their desired application. A simple rule
is to select a unique identifier as the row ID for each entity to be stored and to store
all the other attributes to be tracked as columns under this row ID. For example,
if we have the following data in a comma-separated file:</p>
</div>
<div class="literalblock">
<div class="content">
<pre>userid,age,address,account-balance</pre>
</div>
</div>
<div class="paragraph">
<p>We might choose to store this data using the userid as the rowID, the column
name as the column family, and a blank column qualifier:</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="highlight"><code class="java language-java">Mutation m = new Mutation(userid);
final String column_qualifier = "";
m.put("age", column_qualifier, age);
m.put("address", column_qualifier, address);
m.put("balance", column_qualifier, account_balance);
writer.add(m);</code></pre>
</div>
</div>
<div class="paragraph">
<p>We could then retrieve any of the columns for a specific userid by specifying the
userid as the range of a scanner and fetching specific columns:</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="highlight"><code class="java language-java">Range r = new Range(userid, userid); // single row
Scanner s = conn.createScanner("userdata", auths);
s.setRange(r);
s.fetchColumnFamily(new Text("age"));
for(Entry&lt;Key,Value&gt; entry : s) {
System.out.println(entry.getValue().toString());
}</code></pre>
</div>
</div>
</div>
<div class="sect2">
<h3 id="_rowid_design">7.2. RowID Design</h3>
<div class="paragraph">
<p>Often it is necessary to transform the rowID in order to have rows ordered in a way
that is optimal for anticipated access patterns. A good example of this is reversing
the order of components of internet domain names in order to group rows of the
same parent domain together:</p>
</div>
<div class="literalblock">
<div class="content">
<pre>com.google.code
com.google.labs
com.google.mail
com.yahoo.mail
com.yahoo.research</pre>
</div>
</div>
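<div class="paragraph">
<p>A helper that builds row IDs like these might look like the following. The class and method names are hypothetical; the method simply reverses the dot-separated components of a host name, and sorting the results groups hosts by parent domain.</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="highlight"><code class="java language-java">import java.util.Arrays;
import java.util.Collections;

class ReverseDomainSketch {
    // Reverses the dot-separated components: "code.google.com" becomes "com.google.code".
    static String reverseDomain(String host) {
        var parts = Arrays.asList(host.split("\\."));
        Collections.reverse(parts);
        return String.join(".", parts);
    }

    public static void main(String[] args) {
        String[] hosts = {"mail.yahoo.com", "code.google.com", "research.yahoo.com", "labs.google.com"};
        // Sorting the reversed names groups each parent domain together.
        String[] rows = Arrays.stream(hosts)
                .map(ReverseDomainSketch::reverseDomain)
                .sorted()
                .toArray(String[]::new);
        System.out.println(String.join("\n", rows));
    }
}</code></pre>
</div>
</div>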
<div class="paragraph">
<p>Some data may result in the creation of very large rows, that is, rows with many columns.
In this case the table designer may wish to split up these rows for better load
balancing while keeping them sorted together for scanning purposes. This can be
done by appending a random substring at the end of the row:</p>
</div>
<div class="literalblock">
<div class="content">
<pre>com.google.code_00
com.google.code_01
com.google.code_02
com.google.labs_00
com.google.mail_00
com.google.mail_01</pre>
</div>
</div>
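<div class="paragraph">
<p>A minimal sketch of this scheme follows. The class name, the two-digit zero-padded suffix (taken from the example above) and the use of <code>java.util.Random</code> are assumptions; zero padding keeps the bins in numeric order, and more than 100 bins would need a wider field.</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="highlight"><code class="java language-java">import java.util.Random;

// Hypothetical helpers for spreading one logical row over several physical rows.
final class RowBins {
    // Appends a zero-padded two-digit bin number, e.g. "com.google.code_01".
    static String withBin(String row, int bin) {
        if (bin &lt; 0 || bin &gt; 99) {
            throw new IllegalArgumentException("bin must be in [0, 99]: " + bin);
        }
        return String.format("%s_%02d", row, bin);
    }

    // Picks a random bin in [0, numBins) for each write.
    static String withRandomBin(String row, int numBins, Random random) {
        return withBin(row, random.nextInt(numBins));
    }

    // All physical rows of one logical row start with this prefix,
    // so they stay adjacent in sort order and can be scanned together.
    static String scanPrefix(String row) {
        return row + "_";
    }
}</code></pre>
</div>
</div>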
<div class="paragraph">
<p>It could also be done by appending a string representation of some period of time, such as
the date truncated to the week or month:</p>
</div>
<div class="literalblock">
<div class="content">
<pre>com.google.code_201003
com.google.code_201004
com.google.code_201005
com.google.labs_201003
com.google.mail_201003
com.google.mail_201004</pre>
</div>
</div>
<div class="paragraph">
<p>Appending dates provides the additional capability of restricting a scan to a given
date range.</p>
</div>
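<div class="paragraph">
<p>The sketch below, with hypothetical class and method names, builds rowIDs in the <code>row_yyyyMM</code> form used above and uses a sorted map to show that a range of months for one logical row is a single contiguous range of keys. The same start and end rows could serve as the bounds of a scan Range.</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="highlight"><code class="java language-java">import java.time.YearMonth;
import java.time.format.DateTimeFormatter;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical helpers for rowIDs of the form "&lt;row&gt;_yyyyMM".
final class MonthlyRows {
    private static final DateTimeFormatter YYYYMM = DateTimeFormatter.ofPattern("yyyyMM");

    // ("com.google.code", 2010-03) becomes "com.google.code_201003".
    static String rowFor(String row, YearMonth month) {
        return row + "_" + month.format(YYYYMM);
    }

    // Entries of one logical row from month "from" through month "to", both inclusive.
    // yyyyMM strings sort in date order, so this is one contiguous range of keys.
    static SortedMap&lt;String, String&gt; monthRange(TreeMap&lt;String, String&gt; table, String row,
                                                YearMonth from, YearMonth to) {
        return table.subMap(rowFor(row, from), true, rowFor(row, to), true);
    }
}</code></pre>
</div>
</div>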
</div>
<div class="sect2">
<h3 id="_lexicoders">7.3. Lexicoders</h3>
<div class="paragraph">
<p>Since Keys in Accumulo are sorted lexicographically by default, it&#8217;s often useful to encode
common data types into a byte format in which their sort order corresponds to the sort order
in their native form. An example of this is encoding dates and numerical data so that they can
be efficiently seeked to or scanned in ranges.</p>
</div>
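<div class="paragraph">
<p>The idea can be shown without Accumulo. This sketch encodes a <code>long</code> as eight big-endian bytes with the sign bit flipped, so that comparing the bytes one at a time as unsigned values gives the same order as comparing the numbers. It is a simplified fixed-width scheme chosen for illustration, not the byte format produced by Accumulo&#8217;s own lexicoders, and the class name is hypothetical.</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="highlight"><code class="java language-java">// Hypothetical fixed-width encoder illustrating the lexicoder idea.
final class SortableLongs {
    // Flipping the sign bit maps Long.MIN_VALUE..Long.MAX_VALUE onto 0x00..00..0xFF..FF in order.
    static byte[] encode(long value) {
        long flipped = value ^ Long.MIN_VALUE;
        byte[] out = new byte[8];
        for (int i = 7; i &gt;= 0; i--) {
            out[i] = (byte) flipped;  // lowest remaining byte goes furthest right (big-endian)
            flipped &gt;&gt;&gt;= 8;
        }
        return out;
    }

    // Lexicographic comparison treating bytes as unsigned, the way sorted keys are compared.
    static int compareUnsigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i &lt; n; i++) {
            int diff = (a[i] &amp; 0xff) - (b[i] &amp; 0xff);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length;
    }
}</code></pre>
</div>
</div>
<div class="paragraph">
<p>Without the sign-bit flip, negative numbers, whose two&#8217;s-complement form starts with a 1 bit, would sort after every non-negative number.</p>
</div>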
<div class="paragraph">
<p>The lexicoders are a standard and extensible way of encoding Java types. Here&#8217;s an example
of a lexicoder that encodes a Java Date object so that it sorts lexicographically:</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="highlight"><code class="java language-java">// create new date lexicoder
DateLexicoder dateEncoder = new DateLexicoder();
// truncate time to hours
long epoch = System.currentTimeMillis();
Date hour = new Date(epoch - (epoch % 3600000));
// encode the rowId so that it is sorted lexicographically
Mutation mutation = new Mutation(dateEncoder.encode(hour));
mutation.put(new Text("colf"), new Text("colq"), new Value(new byte[]{}));</code></pre>
</div>
</div>
<div class="paragraph">
<p>If we want to return the most recent date first, we can reverse the sort order
with the reverse lexicoder:</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="highlight"><code class="java language-java">// create new date lexicoder and reverse lexicoder
DateLexicoder dateEncoder = new DateLexicoder();
ReverseLexicoder&lt;Date&gt; reverseEncoder = new ReverseLexicoder&lt;Date&gt;(dateEncoder);
// truncate date to hours
long epoch = System.currentTimeMillis();
Date hour = new Date(epoch - (epoch % 3600000));
// encode the rowId so that it sorts in reverse lexicographic order
Mutation mutation = new Mutation(reverseEncoder.encode(hour));
mutation.put(new Text("colf"), new Text("colq"), new Value(new byte[]{}));</code></pre>
</div>
</div>
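<div class="paragraph">
<p>One way to reverse the order of a fixed-width encoding is to complement every byte: where two encodings first differ, complementing swaps which of them is smaller. The sketch below demonstrates this with a hypothetical class. It is not taken from Accumulo&#8217;s ReverseLexicoder; a general implementation must also handle variable-length encodings, where one value can be a prefix of another, which this sketch deliberately ignores.</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="highlight"><code class="java language-java">// Hypothetical helpers: a fixed-width sortable encoding and its byte-wise complement.
final class ReversedLongs {
    // Eight big-endian bytes with the sign bit flipped, so unsigned byte order matches numeric order.
    static byte[] encode(long value) {
        long flipped = value ^ Long.MIN_VALUE;
        byte[] out = new byte[8];
        for (int i = 7; i &gt;= 0; i--) {
            out[i] = (byte) flipped;
            flipped &gt;&gt;&gt;= 8;
        }
        return out;
    }

    // Complementing each byte reverses the order of equal-length encodings.
    static byte[] reverse(byte[] encoded) {
        byte[] out = new byte[encoded.length];
        for (int i = 0; i &lt; encoded.length; i++) {
            out[i] = (byte) ~encoded[i];
        }
        return out;
    }

    // Lexicographic comparison of bytes treated as unsigned values.
    static int compareUnsigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i &lt; n; i++) {
            int diff = (a[i] &amp; 0xff) - (b[i] &amp; 0xff);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length;
    }
}</code></pre>
</div>
</div>
<div class="paragraph">
<p>Applied to truncated timestamps, the larger (more recent) time now sorts first, which is the ordering the reverse lexicoder example aims for.</p>
</div>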
</div>
<div class="sect2">
<h3 id="_indexing">7.4. Indexing</h3>
<div class="paragraph">
<p>In order to support lookups via more than one attribute of an entity, additional
indexes can be built. However, because Accumulo tables can support any number of
columns without specifying them beforehand, a single additional index will often
suffice for supporting lookups of records in the main table. In such an index, the
rowID is the Value (or Term) from the main table, the column family is the same as in
the main table, and the column qualifier holds the rowID of the main-table record.</p>
</div>
<table class="tableblock frame-all grid-rows" style="width:75%; ">
<colgroup>
<col style="width:25%;">
<col style="width:25%;">
<col style="width:25%;">
<col style="width:25%;">
</colgroup>
<thead>
<tr>
<th class="tableblock halign-center valign-top">RowID</th>
<th class="tableblock halign-center valign-top">Column Family</th>
<th class="tableblock halign-center valign-top">Column Qualifier</th>
<th class="tableblock halign-center valign-top">Value</th>
</tr>
</thead>
<tbody>
<tr>
<td class="tableblock halign-center valign-top"><p class="tableblock">Term</p></td>
<td class="tableblock halign-center valign-top"><p class="tableblock">Field Name</p></td>
<td class="tableblock halign-center valign-top"><p class="tableblock">MainRowID</p></td>
<td class="tableblock halign-center valign-top"></td>
</tr>
</tbody>
</table>
<div class="paragraph">
<p>Note: We store rowIDs in the column qualifier rather than the Value so that we can
have more than one rowID associated with a particular term within the index. If we
stored this in the Value we would only see one of the rows in which the value
appears since Accumulo is configured by default to return the one most recent
value associated with a key.</p>
</div>
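<div class="paragraph">
<p>The difference between the two layouts can be modeled with sorted maps. In this hypothetical sketch each input record is a <code>{mainRowId, fieldName, term}</code> triple, each map key is one index entry, and putting an existing key again keeps only the newest value, standing in for Accumulo returning only the most recent version by default.</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="highlight"><code class="java language-java">import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Hypothetical model contrasting two index layouts; each record is {mainRowId, fieldName, term}.
final class IndexLayouts {
    private static final String SEP = "\0";

    // Layout from the table above: rowID = term, family = field name, qualifier = main rowID.
    static TreeMap&lt;String, String&gt; qualifierLayout(String[][] records) {
        TreeMap&lt;String, String&gt; index = new TreeMap&lt;&gt;();
        for (String[] r : records) {
            index.put(r[2] + SEP + r[1] + SEP + r[0], "");
        }
        return index;
    }

    // Alternative layout: main rowID in the Value, so every record with the same term and
    // field shares one key and each put overwrites the previous one.
    static TreeMap&lt;String, String&gt; valueLayout(String[][] records) {
        TreeMap&lt;String, String&gt; index = new TreeMap&lt;&gt;();
        for (String[] r : records) {
            index.put(r[2] + SEP + r[1] + SEP, r[0]);
        }
        return index;
    }

    // Main-table rowIDs stored under one term and field in the qualifier layout.
    static List&lt;String&gt; lookup(TreeMap&lt;String, String&gt; index, String term, String field) {
        String prefix = term + SEP + field + SEP;
        List&lt;String&gt; rowIds = new ArrayList&lt;&gt;();
        for (String key : index.tailMap(prefix, true).keySet()) {
            if (!key.startsWith(prefix)) {
                break;  // keys are sorted, so no later key can share the prefix
            }
            rowIds.add(key.substring(prefix.length()));
        }
        return rowIds;
    }
}</code></pre>
</div>
</div>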
<div class="paragraph">
<p>Lookups can then be done by first scanning the Index Table for occurrences of the
desired values in the specified columns, which returns a list of rowIDs from the main
table. These can then be used to retrieve each matching record from the Main Table,
either in its entirety or as a subset of its columns.</p>
</div>
<div class="paragraph">
<p>To support efficient lookups of multiple rowIDs from the same table, the Accumulo
client library provides a BatchScanner. Users specify a set of Ranges to the
BatchScanner, which performs the lookups in multiple threads to multiple servers
and returns an Iterator over all the rows retrieved. Unlike with the basic Scanner
interface, the rows returned are NOT in sorted order.</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="highlight"><code class="java language-java">// first we scan the index for IDs of rows matching our query
Text term = new Text("mySearchTerm");
HashSet&lt;Range&gt; matchingRows = new HashSet&lt;Range&gt;();
Scanner indexScanner = conn.createScanner("index", auths);
indexScanner.setRange(new Range(term, term));
// we retrieve the matching rowIDs and create a set of ranges
for(Entry&lt;Key,Value&gt; entry : indexScanner) {
  matchingRows.add(new Range(entry.getKey().getColumnQualifier()));
}
// now we pass the set of rowIDs to the batch scanner to retrieve them
BatchScanner bscan = conn.createBatchScanner("table", auths, 10);
bscan.setRanges(matchingRows);
bscan.fetchColumnFamily(new Text("attributes"));
for(Entry&lt;Key,Value&gt; entry : bscan) {
  System.out.println(entry.getValue());
}
// release the BatchScanner's threads when done
bscan.close();</code></pre>
</div>
</div>
<div class="paragraph">
<p>One advantage of the dynamic schema capabilities of Accumulo is that different
fields may be indexed into the same physical table. However, it may be necessary to
create different index tables if the terms must be formatted differently in order to
maintain proper sort order. For example, real numbers must be formatted
differently than their usual notation in order to be sorted correctly. In these cases,
usually one index per unique data type will suffice.</p>
</div>
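<div class="paragraph">
<p>For example, ordinary decimal text does not sort numerically: as strings, <code>10.0</code> comes before <code>9.0</code>. One common remedy, sketched below under the assumption that the index stores terms as strings, is to turn the IEEE 754 bit pattern of a double into a fixed-width hexadecimal string whose string order matches numeric order. The class name is hypothetical, and NaN values are not handled.</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="highlight"><code class="java language-java">// Hypothetical encoder: a double becomes a 16-character hex string that sorts in numeric order.
final class SortableDoubles {
    static String encode(double d) {
        long bits = Double.doubleToLongBits(d);
        // Negative values: invert all bits so that larger magnitudes sort first.
        // Non-negative values: flip only the sign bit so that they sort after every negative value.
        long sortable = (bits &lt; 0) ? ~bits : (bits ^ Long.MIN_VALUE);
        // %016x prints the 64 bits as unsigned, zero-padded, lowercase hexadecimal.
        return String.format("%016x", sortable);
    }
}</code></pre>
</div>
</div>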
</div>
<div class="sect2">
<h3 id="_entity_attribute_and_graph_tables">7.5. Entity-Attribute and Graph Tables</h3>
<div class="paragraph">
<p>Accumulo is ideal for storing entities and their attributes, especially if the
attributes are sparse. It is often useful to join several datasets together on common
entities within the same table. This can allow for the representation of graphs,
including nodes, their attributes, and connections to other nodes.</p>
</div>
<div class="paragraph">
<p>Rather than storing individual events, Entity-Attribute or Graph tables store
aggregate information about the entities involved in the events and the
relationships between entities. This is often preferable when single events aren&#8217;t
very useful and when a continuously updated summarization is desired.</p>
</div>
<div class="paragraph">
<p>The physical schema for an entity-attribute or graph table is as follows:</p>
</div>
<table class="tableblock frame-all grid-rows" style="width:75%; ">
<colgroup>
<col style="width:25%;">
<col style="width:25%;">
<col style="width:25%;">
<col style="width:25%;">
</colgroup>
<thead>
<tr>
<th class="tableblock halign-center valign-top">RowID</th>
<th class="tableblock halign-center valign-top">Column Family</th>
<th class="tableblock halign-center valign-top">Column Qualifier</th>
<th class="tableblock halign-center valign-top">Value</th>
</tr>
</thead>
<tbody>
<tr>
<td class="tableblock halign-center valign-top"><p class="tableblock">EntityID</p></td>
<td class="tableblock halign-center valign-top"><p class="tableblock">Attribute Name</p></td>
<td class="tableblock halign-center valign-top"><p class="tableblock">Attribute Value</p></td>
<td class="tableblock halign-center valign-top"><p class="tableblock">Weight</p></td>
</tr>
<tr>
<td class="tableblock halign-center valign-top"><p class="tableblock">EntityID</p></td>
<td class="tableblock halign-center valign-top"><p class="tableblock">Edge Type</p></td>
<td class="tableblock halign-center valign-top"><p class="tableblock">Related EntityID</p></td>
<td class="tableblock halign-center valign-top"><p class="tableblock">Weight</p></td>
</tr>
</tbody>
</table>
<div class="paragraph">
<p>For example, to keep track of employees, managers and products the following
entity-attribute table could be used. Note that the weights are not always necessary
and are set to 0 when not used.</p>
</div>
<table class="tableblock frame-all grid-rows" style="width:75%; ">
<colgroup>
<col style="width:25%;">
<col style="width:25%;">
<col style="width:25%;">
<col style="width:25%;">
</colgroup>
<thead>
<tr>
<th class="tableblock halign-center valign-top">RowID</th>
<th class="tableblock halign-center valign-top">Column Family</th>
<th class="tableblock halign-center valign-top">Column Qualifier</th>
<th class="tableblock halign-center valign-top">Value</th>
</tr>
</thead>
<tbody>
<tr>
<td class="tableblock halign-center valign-top"><p class="tableblock">E001</p></td>
<td class="tableblock halign-center valign-top"><p class="tableblock">name</p></td>
<td class="tableblock halign-center valign-top"><p class="tableblock">bob</p></td>
<td class="tableblock halign-center valign-top"><p class="tableblock">0</p></td>
</tr>
<tr>
<td class="tableblock halign-center valign-top"><p class="tableblock">E001</p></td>
<td class="tableblock halign-center valign-top"><p class="tableblock">department</p></td>
<td class="tableblock halign-center valign-top"><p class="tableblock">sales</p></td>
<td class="tableblock halign-center valign-top"><p class="tableblock">0</p></td>
</tr>
<tr>
<td class="tableblock halign-center valign-top"><p class="tableblock">E001</p></td>
<td class="tableblock halign-center valign-top"><p class="tableblock">hire_date</p></td>
<td class="tableblock halign-center valign-top"><p class="tableblock">20030102</p></td>
<td class="tableblock halign-center valign-top"><p class="tableblock">0</p></td>
</tr>
<tr>
<td class="tableblock halign-center valign-top"><p class="tableblock">E001</p></td>
<td class="tableblock halign-center valign-top"><p class="tableblock">units_sold</p></td>
<td class="tableblock halign-center valign-top"><p class="tableblock">P001</p></td>
<td class="tableblock halign-center valign-top"><p class="tableblock">780</p></td>
</tr>
<tr>
<td class="tableblock halign-center valign-top"><p class="tableblock">E002</p></td>
<td class="tableblock halign-center valign-top"><p class="tableblock">name</p></td>
<td class="tableblock halign-center valign-top"><p class="tableblock">george</p></td>
<td class="tableblock halign-center valign-top"><p class="tableblock">0</p></td>
</tr>
<tr>
<td class="tableblock halign-center valign-top"><p class="tableblock">E002</p></td>
<td class="tableblock halign-center valign-top"><p class="tableblock">department</p></td>
<td class="tableblock halign-center valign-top"><p class="tableblock">sales</p></td>
<td class="tableblock halign-center valign-top"><p class="tableblock">0</p></td>
</tr>
<tr>
<td class="tableblock halign-center valign-top"><p class="tableblock">E002</p></td>
<td class="tableblock halign-center valign-top"><p class="tableblock">manager_of</p></td>
<td class="tableblock halign-center valign-top"><p class="tableblock">E001</p></td>
<td class="tableblock halign-center valign-top"><p class="tableblock">0</p></td>
</tr>
<tr>
<td class="tableblock halign-center valign-top"><p class="tableblock">E002</p></td>
<td class="tableblock halign-center valign-top"><p class="tableblock">manager_of</p></td>
<td class="tableblock halign-center valign-top"><p class="tableblock">E003</p></td>
<td class="tableblock halign-center valign-top"><p class="tableblock">0</p></td>
</tr>
<tr>
<td class="tableblock halign-center valign-top"><p class="tableblock">E003</p></td>
<td class="tableblock halign-center valign-top"><p class="tableblock">name</p></td>
<td class="tableblock halign-center valign-top"><p class="tableblock">harry</p></td>
<td class="tableblock halign-center valign-top"><p class="tableblock">0</p></td>
</tr>
<tr>
<td class="tableblock halign-center valign-top"><p class="tableblock">E003</p></td>
<td class="tableblock halign-center valign-top"><p class="tableblock">department</p></td>
<td class="tableblock halign-center valign-top"><p class="tableblock">accounts_recv</p></td>
<td class="tableblock halign-center valign-top"><p class="tableblock">0</p></td>
</tr>
<tr>
<td class="tableblock halign-center valign-top"><p class="tableblock">E003</p></td>
<td class="tableblock halign-center valign-top"><p class="tableblock">hire_date</p></td>
<td class="tableblock halign-center valign-top"><p class="tableblock">20000405</p></td>
<td class="tableblock halign-center valign-top"><p class="tableblock">0</p></td>
</tr>
<tr>
<td class="tableblock halign-center valign-top"><p class="tableblock">E003</p></td>
<td class="tableblock halign-center valign-top"><p class="tableblock">units_sold</p></td>
<td class="tableblock halign-center valign-top"><p class="tableblock">P002</p></td>
<td class="tableblock halign-center valign-top"><p class="tableblock">566</p></td>
</tr>
<tr>
<td class="tableblock halign-center valign-top"><p class="tableblock">E003</p></td>
<td class="tableblock halign-center valign-top"><p class="tableblock">units_sold</p></td>
<td class="tableblock halign-center valign-top"><p class="tableblock">P001</p></td>
<td class="tableblock halign-center valign-top"><p class="tableblock">232</p></td>
</tr>
<tr>
<td class="tableblock halign-center valign-top"><p class="tableblock">P001</p></td>
<td class="tableblock halign-center valign-top"><p class="tableblock">product_name</p></td>
<td class="tableblock halign-center valign-top"><p class="tableblock">nike_airs</p></td>
<td class="tableblock halign-center valign-top"><p class="tableblock">0</p></td>
</tr>
<tr>
<td class="tableblock halign-center valign-top"><p class="tableblock">P001</p></td>
<td class="tableblock halign-center valign-top"><p class="tableblock">product_type</p></td>
<td class="tableblock halign-center valign-top"><p class="tableblock">shoe</p></td>
<td class="tableblock halign-center valign-top"><p class="tableblock">0</p></td>
</tr>
<tr>
<td class="tableblock halign-center valign-top"><p class="tableblock">P001</p></td>
<td class="tableblock halign-center valign-top"><p class="tableblock">in_stock</p></td>
<td class="tableblock halign-center valign-top"><p class="tableblock">germany</p></td>
<td class="tableblock halign-center valign-top"><p class="tableblock">900</p></td>
</tr>
<tr>
<td class="tableblock halign-center valign-top"><p class="tableblock">P001</p></td>
<td class="tableblock halign-center valign-top"><p class="tableblock">in_stock</p></td>
<td class="tableblock halign-center valign-top"><p class="tableblock">brazil</p></td>
<td class="tableblock halign-center valign-top"><p class="tableblock">200</p></td>
</tr>
<tr>
<td class="tableblock halign-center valign-top"><p class="tableblock">P002</p></td>
<td class="tableblock halign-center valign-top"><p class="tableblock">product_name</p></td>
<td class="tableblock halign-center valign-top"><p class="tableblock">basic_jacket</p></td>
<td class="tableblock halign-center valign-top"><p class="tableblock">0</p></td>
</tr>
<tr>
<td class="tableblock halign-center valign-top"><p class="tableblock">P002</p></td>
<td class="tableblock halign-center valign-top"><p class="tableblock">product_type</p></td>
<td class="tableblock halign-center valign-top"><p class="tableblock">clothing</p></td>
<td class="tableblock halign-center valign-top"><p class="tableblock">0</p></td>
</tr>
<tr>
<td class="tableblock halign-center valign-top"><p class="tableblock">P002</p></td>
<td class="tableblock halign-center valign-top"><p class="tableblock">in_stock</p></td>
<td class="tableblock halign-center valign-top"><p class="tableblock">usa</p></td>
<td class="tableblock halign-center valign-top"><p class="tableblock">3454</p></td>
</tr>
<tr>
<td class="tableblock halign-center valign-top"><p class="tableblock">P002</p></td>
<td class="tableblock halign-center valign-top"><p class="tableblock">in_stock</p></td>
<td class="tableblock halign-center valign-top"><p class="tableblock">germany</p></td>
<td class="tableblock halign-center valign-top"><p class="tableblock">700</p></td>
</tr>
</tbody>
</table>
<div class="paragraph">
<p>To allow efficient updating of edge weights, an aggregating iterator (for example,
a Combiner such as SummingCombiner) can be configured to add together the values of
all mutations applied with the same key. These types of tables can easily be created
from raw events by simply extracting the entities, attributes, and relationships from
individual events and inserting the keys into Accumulo, each with a count of 1. The
aggregating iterator will take care of maintaining the edge weights.</p>
</div>
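<div class="paragraph">
<p>The arithmetic can be modeled in a few lines of plain Java: each raw event contributes a count of 1 for every key extracted from it, and counts written under the same key are added together. This hypothetical sketch does the summing on the client with a map purely for illustration; in Accumulo the summing would instead be done on the servers by the configured iterator.</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="highlight"><code class="java language-java">import java.util.TreeMap;

// Hypothetical model of an entity-attribute table whose weights are maintained by summing.
final class WeightedGraph {
    private static final String SEP = "\0";
    private final TreeMap&lt;String, Long&gt; weights = new TreeMap&lt;&gt;();

    // One raw event contributes a count of 1 for (entity, attribute or edge type, value).
    void addEvent(String entityId, String family, String qualifier) {
        String key = entityId + SEP + family + SEP + qualifier;
        weights.merge(key, 1L, Long::sum);  // identical keys are combined by addition
    }

    // Current weight of a key, or 0 if it has never been written.
    long weight(String entityId, String family, String qualifier) {
        return weights.getOrDefault(entityId + SEP + family + SEP + qualifier, 0L);
    }

    int keyCount() {
        return weights.size();
    }
}</code></pre>
</div>
</div>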
</div>
<div class="sect2">
<h3 id="_document_partitioned_indexing">7.6. Document-Partitioned Indexing</h3>
<div class="paragraph">
<p>Using a simple index as described above works well when looking for records that
match one of a set of given criteria. When looking for records that match more than
one criterion simultaneously, such as when looking for documents that contain all of
the words &#8216;the&#8217; and &#8216;white&#8217; and &#8216;house&#8217;, there are several issues.</p>
</div>
<div class="paragraph">
<p>The first problem is that the set of all records matching any one of the search terms
must be sent to the client, which incurs a lot of network traffic. The second problem is
that the client is responsible for performing set intersection on the sets of records
returned to eliminate all but the records matching all search terms. The memory of the
client may easily be overwhelmed during this operation.</p>
</div>
<div class="paragraph">
<p>For these reasons Accumulo includes support for a scheme known as sharded
indexing, in which these set operations can be performed at the TabletServers and
decisions about which records to include in the result set can be made without
incurring network traffic.</p>
</div>
<div class="paragraph">
<p>This is accomplished via partitioning records into bins that each reside on at most
one TabletServer, and then creating an index of terms per record within each bin as
follows:</p>
</div>
<table class="tableblock frame-all grid-rows" style="width:75%; ">
<colgroup>
<col style="width:25%;">
<col style="width:25%;">
<col style="width:25%;">
<col style="width:25%;">
</colgroup>
<thead>
<tr>
<th class="tableblock halign-center valign-top">RowID</th>
<th class="tableblock halign-center valign-top">Column Family</th>
<th class="tableblock halign-center valign-top">Column Qualifier</th>
<th class="tableblock halign-center valign-top">Value</th>
</tr>
</thead>
<tbody>
<tr>
<td class="tableblock halign-center valign-top"><p class="tableblock">BinID</p></td>
<td class="tableblock halign-center valign-top"><p class="tableblock">Term</p></td>
<td class="tableblock halign-center valign-top"><p class="tableblock">DocID</p></td>
<td class="tableblock halign-center valign-top"><p class="tableblock">Weight</p></td>
</tr>
</tbody>
</table>
<div class="paragraph">
<p>Documents or records are mapped into bins by a user-defined ingest application. By
storing the BinID as the RowID we ensure that all the information for a particular
bin is contained in a single tablet and hosted on a single TabletServer since
Accumulo never splits rows across tablets. Storing the Terms as column families
serves to enable fast lookups of all the documents within this bin that contain the
given term.</p>
</div>
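<div class="paragraph">
<p>As a hypothetical illustration of such an ingest application, the sketch below assigns each document to a bin by hashing its DocID (one common choice; the text only requires the mapping to be user-defined) and emits one <code>{BinID, Term, DocID}</code> entry per distinct term. The class name, the <code>bin_NNN</code> format and tokenizing on non-word characters are assumptions.</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="highlight"><code class="java language-java">import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical ingest helper for the BinID / Term / DocID layout shown above.
final class ShardedIndexEntries {
    // Maps a document to one of numBins bins; the same DocID always lands in the same bin.
    static String binFor(String docId, int numBins) {
        int bin = Math.floorMod(docId.hashCode(), numBins);
        return String.format("bin_%03d", bin);
    }

    // One entry {BinID, Term, DocID} per distinct lower-cased term in the document.
    static List&lt;String[]&gt; entriesFor(String docId, String text, int numBins) {
        String binId = binFor(docId, numBins);
        Set&lt;String&gt; terms = new LinkedHashSet&lt;&gt;();
        for (String token : text.toLowerCase().split("\\W+")) {
            if (!token.isEmpty()) {
                terms.add(token);
            }
        }
        List&lt;String[]&gt; entries = new ArrayList&lt;&gt;();
        for (String term : terms) {
            entries.add(new String[] {binId, term, docId});
        }
        return entries;
    }
}</code></pre>
</div>
</div>
<div class="paragraph">
<p>Because every term of a document is written under the same BinID, all the entries needed to decide whether that document matches a query live in one row.</p>
</div>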
<div class="paragraph">
<p>Finally, we perform set intersection operations on the TabletServer via a special
iterator called the Intersecting Iterator. Since documents are partitioned into many
bins, a search of all documents must search every bin. We can use the BatchScanner
to scan all bins in parallel. The Intersecting Iterator should be enabled on a
BatchScanner within user query code as follows:</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="highlight"><code class="java language-java">Text[] terms = {new Text("the"), new Text("white"), new Text("house")};
BatchScanner bscan = conn.createBatchScanner(table, auths, 20);
IteratorSetting iter = new IteratorSetting(20, "ii", IntersectingIterator.class);
IntersectingIterator.setColumnFamilies(iter, terms);
bscan.addScanIterator(iter);
// an empty Range covers the whole table, so every bin is scanned
bscan.setRanges(Collections.singleton(new Range()));
for(Entry&lt;Key,Value&gt; entry : bscan) {
  System.out.println(" " + entry.getKey().getColumnQualifier());
}
// release the BatchScanner's threads when done
bscan.close();</code></pre>
</div>
</div>
<div class="paragraph">
<p>This code effectively has the BatchScanner scan all tablets of a table, looking for
documents that match all the given terms. Because all tablets are being scanned for
every query, each query is more expensive than other Accumulo scans, which
typically involve a small number of TabletServers. This reduces the number of
concurrent queries supported and is subject to what is known as the &#8216;straggler&#8217;
problem in which every query runs as slow as the slowest server participating.</p>
</div>
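<div class="paragraph">
<p>The division of work can be modeled in plain Java. In this hypothetical sketch the index is held in memory as bin, then term, then sorted DocIDs; each bin is intersected on its own, as a TabletServer would do for the bins it hosts, and the client only concatenates the per-bin results. Materializing sets like this is a simplification for illustration; the IntersectingIterator itself works over the tablet&#8217;s sorted keys.</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="highlight"><code class="java language-java">import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical in-memory model of a document-partitioned index: bin -&gt; term -&gt; sorted DocIDs.
final class PartitionedIntersection {
    private final Map&lt;String, Map&lt;String, TreeSet&lt;String&gt;&gt;&gt; bins = new TreeMap&lt;&gt;();

    void add(String binId, String term, String docId) {
        bins.computeIfAbsent(binId, b -&gt; new TreeMap&lt;&gt;())
            .computeIfAbsent(term, t -&gt; new TreeSet&lt;&gt;())
            .add(docId);
    }

    // Intersects the postings of all terms inside each bin independently (server-side work),
    // then concatenates the per-bin matches (the only work left to the client).
    List&lt;String&gt; docsContainingAll(String... terms) {
        List&lt;String&gt; results = new ArrayList&lt;&gt;();
        for (Map&lt;String, TreeSet&lt;String&gt;&gt; postings : bins.values()) {
            Set&lt;String&gt; matches = null;
            for (String term : terms) {
                Set&lt;String&gt; docs = postings.getOrDefault(term, new TreeSet&lt;&gt;());
                if (matches == null) {
                    matches = new TreeSet&lt;&gt;(docs);
                } else {
                    matches.retainAll(docs);
                }
            }
            if (matches != null) {
                results.addAll(matches);
            }
        }
        return results;
    }
}</code></pre>
</div>
</div>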
<div class="paragraph">
<p>Of course, fast servers will return their results to the client, which can display them
to the user immediately while waiting for the rest of the results to arrive. If the
results are unordered this is quite effective, as the first results to arrive are as good
as any others to the user.</p>
</div>
</div>
</div>
</div>
<div class="sect1">
<h2 id="_high_speed_ingest">8. High-Speed Ingest</h2>
<div class="sectionbody">
<div class="paragraph">
<p>Accumulo is often used as part of a larger data processing and storage system. To
maximize the performance of a parallel system involving Accumulo, the ingestion
and query components should be designed to provide enough parallelism and
concurrency to avoid creating bottlenecks for users and other systems writing to
and reading from Accumulo. There are several ways to achieve high ingest
performance.</p>
</div>
<div class="sect2">
<h3 id="_pre_splitting_new_tables">8.1. Pre-Splitting New Tables</h3>
<div class="paragraph">
<p>New tables consist of a single tablet by default. As mutations are applied, the table
grows and splits into multiple tablets which are balanced by the Master across
TabletServers. This implies that the aggregate ingest rate will be limited to fewer
servers than are available within the cluster until the table has reached the point
where there are tablets on every TabletServer.</p>
</div>
<div class="paragraph">
<p>Pre-splitting a table ensures that there are as many tablets as desired available
before ingest begins to take advantage of all the parallelism possible with the cluster
hardware. Tables can be split at any time by using the shell:</p>
</div>
<div class="literalblock">
<div class="content">
<pre>user@myinstance mytable&gt; addsplits -sf /local_splitfile -t mytable</pre>
</div>
</div>
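<div class="paragraph">
<p>The split file lists one split point per line. Assuming, purely for illustration, that rowIDs begin with two lowercase hexadecimal characters (for example because they start with a hash), roughly evenly spaced split points can be generated as follows; the class name is hypothetical.</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="highlight"><code class="java language-java">import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical generator of split points for rowIDs that begin with two hex digits.
final class SplitFile {
    // Returns numTablets - 1 split points dividing the prefixes 00..ff into numTablets ranges.
    static List&lt;String&gt; hexSplits(int numTablets) {
        if (numTablets &lt; 1 || numTablets &gt; 256) {
            throw new IllegalArgumentException("numTablets must be in [1, 256]");
        }
        List&lt;String&gt; splits = new ArrayList&lt;&gt;();
        for (int i = 1; i &lt; numTablets; i++) {
            splits.add(String.format("%02x", i * 256 / numTablets));
        }
        return splits;
    }

    // Writes one split point per line, suitable for passing to addsplits -sf.
    static void write(Path file, List&lt;String&gt; splits) throws IOException {
        Files.write(file, splits, StandardCharsets.UTF_8);
    }
}</code></pre>
</div>
</div>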
<div class="paragraph">
<p>For the purposes of providing parallelism to ingest it is not necessary to create more
tablets than there are physical machines within the cluster as the aggregate ingest
rate is a function of the number of physical machines. Note that the aggregate ingest
rate is still subject to the number of machines running ingest clients, and the
distribution of rowIDs across the table. The aggregate ingest rate will be
suboptimal if there are many inserts into a small number of rowIDs.</p>
</div>
</div>
<div class="sect2">
<h3 id="_multiple_ingester_clients">8.2. Multiple Ingester Clients</h3>
<div class="paragraph">
<p>Accumulo is capable of scaling to very high rates of ingest, which is dependent upon
not just the number of TabletServers in operation but also the number of ingest
clients. This is because a single client, while capable of batching mutations and
sending them to all TabletServers, is ultimately limited by the amount of data that
can be processed on a single machine. The aggregate ingest rate will scale linearly
with the number of clients up to the point at which either the aggregate I/O of
TabletServers or total network bandwidth capacity is reached.</p>
</div>
<div class="paragraph">
<p>In operational settings where high rates of ingest are paramount, clusters are often
configured to dedicate some number of machines solely to running Ingester Clients.
The exact ratio of clients to TabletServers necessary for optimum ingestion rates
will vary according to the distribution of resources per machine and by data type.</p>
</div>
</div>
<div class="sect2">
<h3 id="_bulk_ingest">8.3. Bulk Ingest</h3>
<div class="paragraph">
<p>Accumulo supports the ability to import files produced by an external process such
as MapReduce into an existing table. In some cases it may be faster to load data this
way rather than via ingesting through clients using BatchWriters. This allows a large
number of machines to format data the way Accumulo expects. The new files can
then simply be introduced to Accumulo via a shell command.</p>
</div>
<div class="paragraph">
<p>To configure MapReduce to format data in preparation for bulk loading, the job
should be set to use a range partitioner instead of the default hash partitioner. The
range partitioner uses the split points of the Accumulo table that will receive the
data. The split points can be obtained from the shell and used by the MapReduce
RangePartitioner. Note that this is only useful if the existing table is already split
into multiple tablets.</p>
</div>
<div class="literalblock">
<div class="content">
<pre>user@myinstance mytable&gt; getsplits
aa
ab
ac
...
zx
zy
zz</pre>
</div>
</div>
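<div class="paragraph">
<p>The lookup a range partitioner has to perform can be sketched as a binary search over the sorted split points. This is a hypothetical model, not the source of the RangePartitioner class. It relies on an Accumulo tablet containing every row up to and including its end row, so a row equal to a split point belongs to the tablet ending at that point; with n split points there are n + 1 partitions, one per tablet.</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="highlight"><code class="java language-java">import java.util.Collections;
import java.util.List;

// Hypothetical model of range partitioning against a table's sorted split points.
final class RangePartitionModel {
    // Returns the index of the tablet (0..splits.size()) whose range contains the row.
    static int partitionFor(List&lt;String&gt; sortedSplits, String row) {
        int idx = Collections.binarySearch(sortedSplits, row);
        // Found: the row equals a split point, which is the inclusive end of tablet idx.
        // Not found: binarySearch returns -(insertionPoint) - 1; the insertion point is the tablet.
        return idx &gt;= 0 ? idx : -idx - 1;
    }
}</code></pre>
</div>
</div>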
<div class="paragraph">
<p>Run the MapReduce job, using the AccumuloFileOutputFormat to create the files to
be introduced to Accumulo. Once this is complete, the files can be added to
Accumulo via the shell:</p>
</div>
<div class="literalblock">
<div class="content">
<pre>user@myinstance mytable&gt; importdirectory /files_dir /failures</pre>
</div>
</div>
<div class="paragraph">
<p>Note that the paths referenced are directories within the same HDFS instance over
which Accumulo is running. Accumulo moves any files that could not be imported into
the second directory specified.</p>
</div>
<div class="paragraph">
<p>A complete example of using Bulk Ingest can be found at
<code>accumulo/docs/examples/README.bulkIngest</code>.</p>
</div>
</div>
<div class="sect2">
<h3 id="_logical_time_for_bulk_ingest">8.4. Logical Time for Bulk Ingest</h3>
<div class="paragraph">
<p>Logical time is important for bulk imported data, for which the client code may
be choosing a timestamp. At bulk import time, the user can choose to enable
logical time for the set of files being imported. When it is enabled, Accumulo
uses a specialized system iterator to lazily set times in a bulk imported file.
This mechanism guarantees that times set by unsynchronized multi-node
applications (such as those running on MapReduce) will maintain some semblance
of causal ordering. This mitigates the problem of the time being wrong on the
system that created the file for bulk import. These times are not written into the
file when it is imported; they are applied whenever it is read by scans or
compactions. At import, a single time is obtained, and the specialized system
iterator always uses that time for the keys in the file.</p>
</div>
<div class="paragraph">
<p>The timestamp assigned by Accumulo will be the same for every key in the file.
This could cause problems if the file contains multiple keys that are identical
except for the timestamp. In this case, the sort order of the keys will be
undefined. This could occur if an insert and an update were in the same bulk
import file.</p>
</div>
</div>
<div class="sect2">
<h3 id="_mapreduce_ingest">8.5. MapReduce Ingest</h3>
<div class="paragraph">
<p>It is possible to efficiently write many mutations to Accumulo in parallel via a
MapReduce job. In this scenario the MapReduce job is written to process data that lives
in HDFS and write mutations to Accumulo using the AccumuloOutputFormat. See
the MapReduce section under Analytics for details.</p>
</div>
<div class="paragraph">
<p>An example of using MapReduce can be found under
<code>accumulo/docs/examples/README.mapred</code>.</p>
</div>
</div>
</div>
</div>
<div class="sect1">
<h2 id="_analytics">9. Analytics</h2>
<div class="sectionbody">
<div class="paragraph">
<p>Accumulo supports more advanced data processing than simply keeping keys
sorted and performing efficient lookups. Analytics can be developed by using
MapReduce and Iterators in conjunction with Accumulo tables.</p>
</div>
<div class="sect2">
<h3 id="_mapreduce">9.1. MapReduce</h3>
<div class="paragraph">
<p>Accumulo tables can be used as the source and destination of MapReduce jobs. To
use an Accumulo table with a MapReduce job (specifically with the new Hadoop API
as of version 0.20), configure the job parameters to use the AccumuloInputFormat
and AccumuloOutputFormat. Accumulo-specific parameters can be set via these
two format classes to do the following:</p>
</div>
<div class="ulist">
<ul>
<li>
<p>Authenticate and provide user credentials for the input</p>
</li>
<li>
<p>Restrict the scan to a range of rows</p>
</li>
<li>
<p>Restrict the input to a subset of available columns</p>
</li>
</ul>
</div>
<div class="sect3">
<h4 id="_mapper_and_reducer_classes">9.1.1. Mapper and Reducer classes</h4>
<div class="paragraph">
<p>To read from an Accumulo table, create a Mapper with the following class
parameterization and be sure to configure the AccumuloInputFormat.</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="highlight"><code class="java language-java">class MyMapper extends Mapper&lt;Key,Value,WritableComparable,Writable&gt; {
public void map(Key k, Value v, Context c) {
// transform key and value data here
}
}</code></pre>
</div>
</div>
<div class="paragraph">
<p>To write to an Accumulo table, create a Reducer with the following class
parameterization and be sure to configure the AccumuloOutputFormat. The key
emitted from the Reducer identifies the table to which the mutation is sent. This
allows a single Reducer to write to more than one table if desired. A default table
can be configured using the AccumuloOutputFormat, in which case the output table
name does not have to be passed to the Context object within the Reducer.</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="highlight"><code class="java language-java">class MyReducer extends Reducer&lt;WritableComparable, Writable, Text, Mutation&gt; {
public void reduce(WritableComparable key, Iterable&lt;Text&gt; values, Context c) {
Mutation m;
// create the mutation based on input key and value
c.write(new Text("output-table"), m);
}
}</code></pre>
</div>
</div>
<div class="paragraph">
<p>The Text object passed as the output should contain the name of the table to which
this mutation should be applied. The Text can be null in which case the mutation
will be applied to the default table name specified in the AccumuloOutputFormat
options.</p>
</div>
</div>
<div class="sect3">
<h4 id="_accumuloinputformat_options">9.1.2. AccumuloInputFormat options</h4>
<div class="listingblock">
<div class="content">
<pre class="highlight"><code class="java language-java">Job job = new Job(getConf());
AccumuloInputFormat.setConnectorInfo(job, "user", new PasswordToken("passwd"));
AccumuloInputFormat.setInputTableName(job, "table");
AccumuloInputFormat.setScanAuthorizations(job, new Authorizations());
AccumuloInputFormat.setZooKeeperInstance(job, "myinstance",
    "zooserver-one,zooserver-two");</code></pre>
</div>
</div>
<div class="paragraph">
<p><strong>Optional Settings:</strong></p>
</div>
<div class="paragraph">
<p>To restrict Accumulo to a set of row ranges:</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="highlight"><code class="java language-java">ArrayList&lt;Range&gt; ranges = new ArrayList&lt;Range&gt;();
// populate array list of row ranges ...
AccumuloInputFormat.setRanges(job, ranges);</code></pre>
</div>
</div>
<div class="paragraph">
<p>To restrict Accumulo to a list of columns:</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="highlight"><code class="java language-java">ArrayList&lt;Pair&lt;Text,Text&gt;&gt; columns = new ArrayList&lt;Pair&lt;Text,Text&gt;&gt;();
// populate list of columns
AccumuloInputFormat.fetchColumns(job, columns);</code></pre>
</div>
</div>
<div class="paragraph">
<p>To use a regular expression to match row IDs:</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="highlight"><code class="java language-java">IteratorSetting is = new IteratorSetting(30, RexExFilter.class);
RegExFilter.setRegexs(is, ".*suffix", null, null, null, true);
AccumuloInputFormat.addIterator(job, is);</code></pre>
</div>
</div>
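<div class="paragraph">
<p>To check which row IDs a pattern such as <code>.*suffix</code> would keep, the pattern can be tried locally with <code>java.util.regex</code>. This minimal sketch assumes the filter requires the whole row ID to match (as <code>Matcher.matches()</code> does) rather than any substring; it uses no Accumulo classes, and <code>RowRegexPreview</code> is a hypothetical name.</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="highlight"><code class="java language-java">import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper: previews which row IDs a row regex would keep,
// assuming whole-string matching semantics.
class RowRegexPreview {
  static List&lt;String&gt; keptRows(String rowRegex, List&lt;String&gt; rowIds) {
    Pattern pattern = Pattern.compile(rowRegex);
    List&lt;String&gt; kept = new ArrayList&lt;&gt;();
    for (String row : rowIds) {
      // matches() requires the entire row ID to match, not just a substring
      if (pattern.matcher(row).matches()) {
        kept.add(row);
      }
    }
    return kept;
  }
}</code></pre>
</div>
</div>
<div class="paragraph">
<p>With this assumption, <code>.*suffix</code> keeps <code>a_suffix</code> and <code>suffix</code> but not <code>suffix_b</code>.</p>
</div>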
</div>
<div class="sect3">
<h4 id="_accumulomultitableinputformat_options">9.1.3. AccumuloMultiTableInputFormat options</h4>
<div class="paragraph">
<p>The AccumuloMultiTableInputFormat allows scanning over multiple tables
in a single MapReduce job. Separate ranges, columns, and iterators can be
used for each table.</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="highlight"><code class="java language-java">InputTableConfig tableOneConfig = new InputTableConfig();
InputTableConfig tableTwoConfig = new InputTableConfig();</code></pre>
</div>
</div>
<div class="paragraph">
<p>To set the configuration objects on the job:</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="highlight"><code class="java language-java">Map&lt;String, InputTableConfig&gt; configs = new HashMap&lt;String,InputTableConfig&gt;();
configs.put("table1", tableOneConfig);
configs.put("table2", tableTwoConfig);
AccumuloMultiTableInputFormat.setInputTableConfigs(job, configs);</code></pre>
</div>
</div>
<div class="paragraph">
<p><strong>Optional settings:</strong></p>
</div>
<div class="paragraph">
<p>To restrict to a set of ranges:</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="highlight"><code class="java language-java">ArrayList&lt;Range&gt; tableOneRanges = new ArrayList&lt;Range&gt;();
ArrayList&lt;Range&gt; tableTwoRanges = new ArrayList&lt;Range&gt;();
// populate array lists of row ranges for tables...
tableOneConfig.setRanges(tableOneRanges);
tableTwoConfig.setRanges(tableTwoRanges);</code></pre>
</div>
</div>
<div class="paragraph">
<p>To restrict Accumulo to a list of columns:</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="highlight"><code class="java language-java">ArrayList&lt;Pair&lt;Text,Text&gt;&gt; tableOneColumns = new ArrayList&lt;Pair&lt;Text,Text&gt;&gt;();
ArrayList&lt;Pair&lt;Text,Text&gt;&gt; tableTwoColumns = new ArrayList&lt;Pair&lt;Text,Text&gt;&gt;();
// populate lists of columns for each of the tables ...
tableOneConfig.fetchColumns(tableOneColumns);
tableTwoConfig.fetchColumns(tableTwoColumns);</code></pre>
</div>
</div>
<div class="paragraph">
<p>To set scan iterators:</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="highlight"><code class="java language-java">List&lt;IteratorSetting&gt; tableOneIterators = new ArrayList&lt;IteratorSetting&gt;();
List&lt;IteratorSetting&gt; tableTwoIterators = new ArrayList&lt;IteratorSetting&gt;();
// populate the lists of iterator settings for each of the tables ...
tableOneConfig.setIterators(tableOneIterators);
tableTwoConfig.setIterators(tableTwoIterators);</code></pre>
</div>
</div>
<div class="paragraph">
<p>The name of the table can be retrieved from the input split:</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="highlight"><code class="java language-java">class MyMapper extends Mapper&lt;Key,Value,WritableComparable,Writable&gt; {
public void map(Key k, Value v, Context c) {
RangeInputSplit split = (RangeInputSplit)c.getInputSplit();
String tableName = split.getTableName();
// do something with table name
}
}</code></pre>
</div>
</div>
</div>
<div class="sect3">
<h4 id="_accumulooutputformat_options">9.1.4. AccumuloOutputFormat options</h4>
<div class="listingblock">
<div class="content">
<pre class="highlight"><code class="java language-java">boolean createTables = true;
String defaultTable = "mytable";
AccumuloOutputFormat.setConnectorInfo(job, "user", new PasswordToken("passwd"));
AccumuloOutputFormat.setCreateTables(job, createTables);
AccumuloOutputFormat.setDefaultTableName(job, defaultTable);
AccumuloOutputFormat.setZooKeeperInstance(job, "myinstance",
    "zooserver-one,zooserver-two");</code></pre>
</div>
</div>
<div class="paragraph">
<p><strong>Optional Settings:</strong></p>
</div>
<div class="listingblock">
<div class="content">
<pre class="highlight"><code class="java language-java">BatchWriterConfig bwConfig = new BatchWriterConfig();
bwConfig.setMaxLatency(300000, TimeUnit.MILLISECONDS);
bwConfig.setMaxMemory(50000000); // bytes
AccumuloOutputFormat.setBatchWriterOptions(job, bwConfig);</code></pre>
</div>
</div>
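<div class="paragraph">
<p>The two optional settings bound how long mutations may wait in the client-side buffer and how much memory that buffer may use. The following minimal sketch models this trade-off; it is not Accumulo's batch writer, <code>BufferModel</code> is a hypothetical class, and it assumes a flush happens as soon as either limit is reached. Time is passed in explicitly so the behavior is deterministic.</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="highlight"><code class="java language-java">import java.util.ArrayList;
import java.util.List;

// Hypothetical model of a mutation buffer bounded by size and latency.
class BufferModel {
  private final long maxLatencyMillis;
  private final long maxBufferBytes;
  private final List&lt;byte[]&gt; buffer = new ArrayList&lt;&gt;();
  private long bufferedBytes = 0;
  private long oldestAddMillis = -1; // -1 means the buffer is empty
  private int flushCount = 0;

  BufferModel(long maxLatencyMillis, long maxBufferBytes) {
    this.maxLatencyMillis = maxLatencyMillis;
    this.maxBufferBytes = maxBufferBytes;
  }

  // Adds a serialized mutation at time nowMillis, flushing if a limit is hit.
  void add(byte[] mutation, long nowMillis) {
    if (buffer.isEmpty()) {
      oldestAddMillis = nowMillis;
    }
    buffer.add(mutation);
    bufferedBytes += mutation.length;
    maybeFlush(nowMillis);
  }

  // Also meant to be called periodically: flushes once the oldest entry is too old.
  void maybeFlush(long nowMillis) {
    boolean tooBig = bufferedBytes &gt;= maxBufferBytes;
    boolean tooOld = !buffer.isEmpty() &amp;&amp; nowMillis - oldestAddMillis &gt;= maxLatencyMillis;
    if (tooBig || tooOld) {
      flush();
    }
  }

  private void flush() {
    // A real writer would send the buffered mutations to tablet servers here.
    buffer.clear();
    bufferedBytes = 0;
    oldestAddMillis = -1;
    flushCount++;
  }

  int flushCount() {
    return flushCount;
  }

  long bufferedBytes() {
    return bufferedBytes;
  }
}</code></pre>
</div>
</div>
<div class="paragraph">
<p>A larger buffer or latency lets more mutations be sent per round trip, at the cost of more client memory and a longer delay before writes reach the servers.</p>
</div>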
<div class="paragraph">
<p>An example of using MapReduce with Accumulo can be found at
<code>accumulo/docs/examples/README.mapred</code>.</p>
</div>
</div>
</div>
<div class="sect2">
<h3 id="_combiners_2">9.2. Combiners</h3>
<div class="paragraph">
<p>Many applications can benefit from the ability to aggregate values across common
keys. This can be done via Combiner iterators and is similar to the Reduce step in
MapReduce. This provides the ability to define online, incrementally updated
analytics without the overhead or latency associated with batch-oriented
MapReduce jobs.</p>
</div>
<div class="paragraph">
<p>All that is needed to aggregate values of a table is to identify the fields over which
values will be grouped, insert mutations with those fields as the key, and configure
the table with a combining iterator that supports the summarizing operation
desired.</p>
</div>
<div class="paragraph">
<p>The only restriction on a combining iterator is that the combiner developer
should not assume that all values for a given key have been seen, since new
mutations can be inserted at any time. This precludes using the total number of
values in the aggregation, as is needed when calculating an average, for example.</p>
</div>
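<div class="paragraph">
<p>The restriction can be seen with plain arithmetic. A combiner may be applied to a subset of a key's values and its output combined again later with newer values, so the combining function must give the same answer when partial results are recombined. Summation has this property; averaging does not. This is a minimal, Accumulo-independent sketch, and <code>PartialCombine</code> is a hypothetical name.</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="highlight"><code class="java language-java">import java.util.List;

// Hypothetical illustration of why a combiner must tolerate partial input.
class PartialCombine {
  // Safe to apply to partial results: sum(sum(a), sum(b)) == sum(a + b).
  static long sum(List&lt;Long&gt; values) {
    long total = 0;
    for (long v : values) {
      total += v;
    }
    return total;
  }

  // Not safe: the average of partial averages generally differs from the true average.
  static double average(List&lt;Double&gt; values) {
    double total = 0;
    for (double v : values) {
      total += v;
    }
    return total / values.size();
  }
}</code></pre>
</div>
</div>
<div class="paragraph">
<p>For values 1, 2, and 3 arriving as the batches [1, 2] and [3], summing the partial sums gives 3 + 3 = 6, the same as summing everything at once. Averaging the partial averages gives (1.5 + 3.0) / 2 = 2.25, while the true average is 2.0.</p>
</div>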
<div class="sect3">
<h4 id="_feature_vectors">9.2.1. Feature Vectors</h4>
<div class="paragraph">
<p>An interesting use of combining iterators within an Accumulo table is to store
feature vectors for use in machine learning algorithms. For example, many
algorithms, such as k-means clustering, support vector machines, and anomaly detection,
use the concept of a feature vector and the calculation of distance metrics to
learn a particular model. The columns in an Accumulo table can be used to efficiently
store sparse features and their weights, which can be incrementally updated via a
combining iterator.</p>
</div>
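<div class="paragraph">
<p>A sparse feature vector stores only the features that are present, much as a row stores only the columns that exist. The following minimal sketch shows incremental weight updates (the per-feature addition a summing combiner would perform) and a Euclidean distance over the union of features. It is plain Java; <code>SparseVector</code> is a hypothetical class, and treating one feature as one column is an assumption for illustration.</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="highlight"><code class="java language-java">import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sparse feature vector: feature name -&gt; weight.
class SparseVector {
  private final Map&lt;String, Double&gt; weights = new HashMap&lt;&gt;();

  // Incremental update: adds delta to the feature's current weight,
  // analogous to a summing combiner merging a newly written value.
  void add(String feature, double delta) {
    weights.merge(feature, delta, Double::sum);
  }

  double get(String feature) {
    return weights.getOrDefault(feature, 0.0);
  }

  // Euclidean distance; a feature absent from one vector counts as 0 there.
  double distance(SparseVector other) {
    Set&lt;String&gt; features = new HashSet&lt;&gt;(weights.keySet());
    features.addAll(other.weights.keySet());
    double sumSquares = 0;
    for (String f : features) {
      double d = get(f) - other.get(f);
      sumSquares += d * d;
    }
    return Math.sqrt(sumSquares);
  }
}</code></pre>
</div>
</div>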
</div>
</div>
<div class="sect2">
<h3 id="_statistical_modeling">9.3. Statistical Modeling</h3>
<div class="paragraph">
<p>Statistical models that need to be updated by many machines in parallel could be
similarly stored within an Accumulo table. For example, a MapReduce job that is
iteratively updating a global statistical model could have each map or reduce worker
reference the parts of the model to be read and updated through an embedded
Accumulo client.</p>
</div>
<div class="paragraph">
<p>Using Accumulo this way enables efficient and fast lookups and updates of small
pieces of information in a random access pattern, which is complementary to
MapReduce&#8217;s sequential access model.</p>
</div>
</div>
</div>
</div>
<div class="sect1">
<h2 id="_security">10. Security</h2>
<div class="sectionbody">
<div class="paragraph">
<p>Accumulo extends the BigTable data model to implement a security mechanism
known as cell-level security. Every key-value pair has its own security label, stored
under the column visibility element of the key, which is used to determine whether
a given user meets the security requirements to read the value. This enables data of
various security levels to be stored within the same row, and users of varying
degrees of access to query the same table, while preserving data confidentiality.</p>
</div>
<div class="sect2">
<h3 id="_security_label_expressions">10.1. Security Label Expressions</h3>
<div class="paragraph">
<p>When mutations are applied, users can specify a security label for each value. This is
done as the Mutation is created by passing a ColumnVisibility object to the put()
method:</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="highlight"><code class="java language-java">Text rowID = new Text("row1");
Text colFam = new Text("myColFam");
Text colQual = new Text("myColQual");
ColumnVisibility colVis = new ColumnVisibility("public");
long timestamp = System.currentTimeMillis();
Value value = new Value("myValue".getBytes());
Mutation mutation = new Mutation(rowID);
mutation.put(colFam, colQual, colVis, timestamp, value);</code></pre>
</div>
</div>
</div>
<div class="sect2">
<h3 id="_security_label_expression_syntax">10.2. Security Label Expression Syntax</h3>
<div class="paragraph">
<p>Security labels consist of a set of user-defined tokens that are required to read the
value the label is associated with. The set of tokens required can be specified using
syntax that supports logical AND <code>&amp;</code> and OR <code>|</code> combinations of tokens, as well as nesting
groups <code>()</code> of tokens together.</p>
</div>
<div class="paragraph">
<p>Each term consists of one or more alphanumeric characters, hyphens, underscores,
or periods. Optionally, a term may be wrapped in quotation marks, which removes the
restriction on valid characters. In quoted terms, quotation marks and backslash characters
can be used as characters in the term by escaping them with a backslash.</p>
</div>
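<div class="paragraph">
<p>The quoting rule can be expressed as a small helper. This minimal sketch applies only the rules stated above: unquoted terms may contain ASCII letters, digits, hyphens, underscores, and periods, and inside quotes a <code>"</code> or <code>\</code> is escaped with a backslash. <code>TermQuoting</code> is hypothetical; in application code, Accumulo's own <code>ColumnVisibility.quote</code> method is the natural choice.</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="highlight"><code class="java language-java">// Hypothetical helper implementing the term rules described above.
class TermQuoting {
  // True if the term is non-empty and uses only letters, digits, '-', '_' or '.'.
  static boolean isValidUnquoted(String term) {
    if (term.isEmpty()) {
      return false;
    }
    for (char ch : term.toCharArray()) {
      boolean ok = (ch &gt;= 'a' &amp;&amp; ch &lt;= 'z') || (ch &gt;= 'A' &amp;&amp; ch &lt;= 'Z')
          || (ch &gt;= '0' &amp;&amp; ch &lt;= '9') || ch == '-' || ch == '_' || ch == '.';
      if (!ok) {
        return false;
      }
    }
    return true;
  }

  // Returns the term unchanged if it needs no quoting; otherwise wraps it
  // in quotes, escaping embedded quotes and backslashes with a backslash.
  static String quoteIfNeeded(String term) {
    if (isValidUnquoted(term)) {
      return term;
    }
    StringBuilder sb = new StringBuilder("\"");
    for (char ch : term.toCharArray()) {
      if (ch == '"' || ch == '\\') {
        sb.append('\\');
      }
      sb.append(ch);
    }
    return sb.append('"').toString();
  }
}</code></pre>
</div>
</div>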
<div class="paragraph">
<p>For example, suppose within our organization we want to label our data values with
security labels defined in terms of user roles. We might have tokens such as:</p>
</div>
<div class="literalblock">
<div class="content">
<pre>admin
audit
system</pre>
</div>
</div>
<div class="paragraph">
<p>These can be specified alone or combined using logical operators:</p>
</div>
<div class="listingblock">
<div class="content">
<pre>// Users must have admin privileges
admin
// Users must have admin and audit privileges
admin&amp;audit
// Users with either admin or audit privileges
admin|audit
// Users must have audit and one or both of admin or system
(admin|system)&amp;audit</pre>
</div>
</div>
<div class="paragraph">
<p>When both <code>|</code> and <code>&amp;</code> operators are used, parentheses must be used to specify
precedence of the operators.</p>
</div>
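<div class="paragraph">
<p>These evaluation rules can be captured in a small recursive-descent parser. This is a minimal sketch, not Accumulo's <code>ColumnVisibility</code> or <code>VisibilityEvaluator</code> code: <code>LabelEvaluator</code> is a hypothetical class, spaces are not accepted, an empty label is treated as readable by everyone, and an expression that mixes <code>&amp;</code> and <code>|</code> at one nesting level without parentheses is rejected.</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="highlight"><code class="java language-java">import java.util.Set;

// Hypothetical evaluator for the label syntax described above.
class LabelEvaluator {
  private final String expr;
  private final Set&lt;String&gt; auths;
  private int pos = 0;

  private LabelEvaluator(String expr, Set&lt;String&gt; auths) {
    this.expr = expr;
    this.auths = auths;
  }

  // Returns true if the given authorizations satisfy the label expression.
  static boolean satisfies(String expr, Set&lt;String&gt; auths) {
    if (expr.isEmpty()) {
      return true; // an empty label places no restriction on readers
    }
    LabelEvaluator p = new LabelEvaluator(expr, auths);
    boolean result = p.parseGroup();
    if (p.pos != expr.length()) {
      throw new IllegalArgumentException("unexpected character at " + p.pos);
    }
    return result;
  }

  // A group is one or more operands joined by a single kind of operator.
  private boolean parseGroup() {
    boolean value = parseOperand();
    char op = 0;
    while (more() &amp;&amp; (peek() == '&amp;' || peek() == '|')) {
      char next = expr.charAt(pos++);
      if (op != 0 &amp;&amp; next != op) {
        throw new IllegalArgumentException("mixing &amp; and | needs parentheses at " + (pos - 1));
      }
      op = next;
      boolean rhs = parseOperand();
      value = (op == '&amp;') ? (value &amp;&amp; rhs) : (value || rhs);
    }
    return value;
  }

  // An operand is a parenthesized group or a single term.
  private boolean parseOperand() {
    if (more() &amp;&amp; peek() == '(') {
      pos++;
      boolean value = parseGroup();
      if (!more() || peek() != ')') {
        throw new IllegalArgumentException("missing ) at " + pos);
      }
      pos++;
      return value;
    }
    return auths.contains(parseTerm());
  }

  // Reads an unquoted term, or a quoted term with backslash escapes.
  private String parseTerm() {
    StringBuilder term = new StringBuilder();
    if (more() &amp;&amp; peek() == '"') {
      pos++; // opening quote
      while (more() &amp;&amp; peek() != '"') {
        char ch = expr.charAt(pos++);
        if (ch == '\\' &amp;&amp; more()) {
          ch = expr.charAt(pos++); // the character after a backslash is literal
        }
        term.append(ch);
      }
      if (!more()) {
        throw new IllegalArgumentException("unterminated quoted term");
      }
      pos++; // closing quote
    } else {
      while (more() &amp;&amp; isTermChar(peek())) {
        term.append(expr.charAt(pos++));
      }
    }
    if (term.length() == 0) {
      throw new IllegalArgumentException("empty term at " + pos);
    }
    return term.toString();
  }

  // Letters, digits, hyphens, underscores and periods (ASCII only here).
  private static boolean isTermChar(char ch) {
    return (ch &gt;= 'a' &amp;&amp; ch &lt;= 'z') || (ch &gt;= 'A' &amp;&amp; ch &lt;= 'Z')
        || (ch &gt;= '0' &amp;&amp; ch &lt;= '9') || ch == '-' || ch == '_' || ch == '.';
  }

  private boolean more() {
    return pos &lt; expr.length();
  }

  private char peek() {
    return expr.charAt(pos);
  }
}</code></pre>
</div>
</div>
<div class="paragraph">
<p>For a user holding <code>admin</code> and <code>system</code>, this sketch accepts <code>admin|audit</code> but not <code>(admin|system)&amp;audit</code>, and it rejects <code>admin|system&amp;audit</code> as ambiguous.</p>
</div>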
</div>
<div class="sect2">
<h3 id="_authorization">10.3. Authorization</h3>
<div class="paragraph">
<p>When clients attempt to read data from Accumulo, any security labels present are
examined against the set of authorizations passed by the client code when the
Scanner or BatchScanner is created. If the authorizations are determined to be
insufficient to satisfy the security label, the value is suppressed from the set of
results sent back to the client.</p>
</div>
<div class="paragraph">
<p>Authorizations are specified as a comma-separated list of tokens the user possesses:</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="highlight"><code class="java language-java">// user possesses both admin and system level access
Authorizations auths = new Authorizations("admin","system");
Scanner s = connector.createScanner("table", auths);</code></pre>
</div>
</div>
</div>
<div class="sect2">
<h3 id="_user_authorizations">10.4. User Authorizations</h3>
<div class="paragraph">
<p>Each Accumulo user has a set of associated security labels. To manipulate
these in the shell while using the default authorizor, use the <code>setauths</code> and <code>getauths</code> commands.
They can also be modified for the default authorizor through the Java security operations API.</p>
</div>
<div class="paragraph">
<p>When a user creates a scanner, a set of authorizations is passed. If the
authorizations passed to the scanner are not a subset of the user's
authorizations, an exception is thrown.</p>
</div>
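<div class="paragraph">
<p>The check described above is a subset test. In this minimal sketch, <code>ScanAuthCheck</code> is hypothetical, authorizations are plain strings, and <code>IllegalArgumentException</code> stands in for whatever exception Accumulo actually raises.</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="highlight"><code class="java language-java">import java.util.Set;
import java.util.TreeSet;

// Hypothetical model of the scan-time authorization check.
class ScanAuthCheck {
  // Throws if the scan requests any authorization the user does not hold.
  static void check(Set&lt;String&gt; userAuths, Set&lt;String&gt; requested) {
    if (!userAuths.containsAll(requested)) {
      Set&lt;String&gt; missing = new TreeSet&lt;&gt;(requested);
      missing.removeAll(userAuths);
      throw new IllegalArgumentException("user lacks authorizations " + missing);
    }
  }
}</code></pre>
</div>
</div>
<div class="paragraph">
<p>Requesting a strict subset, for example only <code>admin</code>, passes the check, which lets a query deliberately see less than everything the user could.</p>
</div>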
<div class="paragraph">
<p>To prevent users from writing data they cannot read, add the visibility
constraint to a table. Use the -evc option in the createtable shell command to
enable this constraint. For existing tables use the following shell command to
enable the visibility constraint. Ensure the constraint number does not
conflict with any existing constraints.</p>
</div>
<div class="literalblock">
<div class="content">
<pre>config -t table -s table.constraint.1=org.apache.accumulo.core.security.VisibilityConstraint</pre>
</div>
</div>
<div class="paragraph">
<p>Any user with the alter table permission can add or remove this constraint.
This constraint is not applied to bulk-imported data; if this is a concern,
disable the bulk import permission.</p>
</div>
</div>
<div class="sect2">
<h3 id="_pluggable_security">10.5. Pluggable Security</h3>
<div class="paragraph">
<p>Accumulo 1.5 introduced a pluggable security mechanism. It can be broken into three actions: authentication, authorization, and permission handling. By default, all of these are handled in
ZooKeeper, which is how they were handled in Accumulo 1.4 and earlier. Note that because this
is a new feature in 1.5, it may be adjusted in future releases without the standard
deprecation cycle.</p>
</div>
<div class="paragraph">
<p>Authentication handles verifying a user's identity. A combination of
principal and authentication token is used to verify that a user is who they claim to be. An
authentication token can be constructed directly through its constructor, but it is
advised to use the <code>init(Properties)</code> method to populate it. Users are expected to
know which token is appropriate for their system. The default token is
<code>PasswordToken</code>.</p>
</div>
<div class="paragraph">
<p>Once a user is authenticated by the Authenticator, the user has access to the other actions within
Accumulo. All actions in Accumulo are ACLed, and this ACL check is handled by the Permission
Handler, which manages all permissions, divided into system-level and table-level
permissions. If a user performs an action that requires authorizations, the Authorizor is
queried to determine which authorizations the user has.</p>
</div>
<div class="paragraph">
<p>This setup allows a variety of mechanisms to be used for handling different aspects of
Accumulo&#8217;s security. A system like Kerberos can be used for authentication, a system like LDAP
could be used to determine whether a user has a specific permission, and the default
ZooKeeper-based authorizor could still determine which authorizations a user is ultimately allowed to use.
Because the system is pluggable, custom components can be created as needed.</p>
</div>
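<div class="paragraph">
<p>The division of responsibilities can be sketched with three small interfaces. These are hypothetical, simplified stand-ins, not Accumulo's actual Authenticator, Permission Handler, and Authorizor interfaces (whose methods and signatures differ); the sketch only shows how independently supplied implementations could be chained for one request.</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="highlight"><code class="java language-java">import java.util.Set;

// Hypothetical, simplified roles; not Accumulo's real interfaces.
interface SimpleAuthenticator {
  boolean authenticate(String principal, byte[] token);
}

interface SimplePermissionHandler {
  boolean hasPermission(String principal, String permission);
}

interface SimpleAuthorizor {
  Set&lt;String&gt; authorizations(String principal);
}

// Combines independently pluggable components, consulting them in order.
class SecurityChain {
  private final SimpleAuthenticator authenticator;
  private final SimplePermissionHandler permissions;
  private final SimpleAuthorizor authorizor;

  SecurityChain(SimpleAuthenticator a, SimplePermissionHandler p, SimpleAuthorizor z) {
    this.authenticator = a;
    this.permissions = p;
    this.authorizor = z;
  }

  // Returns the authorizations usable for the action, or throws if a check fails.
  Set&lt;String&gt; authorize(String principal, byte[] token, String permission) {
    if (!authenticator.authenticate(principal, token)) {
      throw new SecurityException("authentication failed for " + principal);
    }
    if (!permissions.hasPermission(principal, permission)) {
      throw new SecurityException(principal + " lacks " + permission);
    }
    return authorizor.authorizations(principal);
  }
}</code></pre>
</div>
</div>
<div class="paragraph">
<p>Because each role is a separate interface, one implementation could be backed by Kerberos and another by LDAP without either knowing about the other.</p>
</div>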
</div>
<div class="sect2">
<h3 id="_secure_authorizations_handling">10.6. Secure Authorizations Handling</h3>
<div class="paragraph">
<p>For applications serving many users, it is not expected that an Accumulo user
will be created for each application user. In this case, an Accumulo user with
all authorizations needed by any of the application's users must be created. To
service queries, the application should create a scanner with the application
user&#8217;s authorizations. These authorizations could be obtained from a trusted third
party.</p>
</div>
<div class="paragraph">
<p>Often production systems will integrate with Public-Key Infrastructure (PKI) and
designate client code within the query layer to negotiate with PKI servers in order
to authenticate users and retrieve their authorization tokens (credentials). This
requires users to specify only the information necessary to authenticate themselves
to the system. Once user identity is established, their credentials can be accessed by
the client code and passed to Accumulo outside of the reach of the user.</p>
</div>
</div>
<div class="sect2">
<h3 id="_query_services_layer">10.7. Query Services Layer</h3>
<div class="paragraph">
<p>Since the primary method of interaction with Accumulo is through the Java API,
production environments often call for the implementation of a query layer. This
can be done using web services in containers such as Apache Tomcat, but that is not a
requirement. The Query Services Layer provides a platform on which user-facing
applications can be built. This allows application designers to isolate potentially
complex query logic, and provides a convenient point at which to perform essential
security functions.</p>
</div>
<div class="paragraph">
<p>Several production environments choose to implement authentication at this layer,
where user identifiers are used to retrieve access credentials, which are then
cached within the query layer and presented to Accumulo through the
Authorizations mechanism.</p>
</div>
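<div class="paragraph">
<p>Caching retrieved credentials in the query layer could look like the following minimal sketch. <code>AuthCache</code> is hypothetical, the lookup function stands in for whatever trusted source (for example, a PKI or directory service) supplies a user's authorizations, and cache expiry, which a production system would need, is omitted.</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="highlight"><code class="java language-java">import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical per-user authorization cache for a query services layer.
class AuthCache {
  private final Function&lt;String, Set&lt;String&gt;&gt; trustedLookup;
  private final ConcurrentHashMap&lt;String, Set&lt;String&gt;&gt; cache = new ConcurrentHashMap&lt;&gt;();

  AuthCache(Function&lt;String, Set&lt;String&gt;&gt; trustedLookup) {
    this.trustedLookup = trustedLookup;
  }

  // Queries the trusted source only on the first request for a user.
  Set&lt;String&gt; authorizationsFor(String userId) {
    return cache.computeIfAbsent(userId, trustedLookup);
  }
}</code></pre>
</div>
</div>
<div class="paragraph">
<p>The returned set would then be used to build the scanner's Authorizations, so end users never supply authorizations themselves.</p>
</div>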
<div class="paragraph">
<p>Typically, the query services layer sits between Accumulo and user workstations.</p>
</div>
</div>
</div>
</div>
<div class="sect1">
<h2 id="_ssl">11. SSL</h2>
<div class="sectionbody">
<div class="paragraph">
<p>Accumulo, through Thrift's TSSLTransport, provides the ability to encrypt
wire communication between Accumulo servers and clients using secure
sockets layer (SSL). SSL certificates signed by the same certificate authority
control the "circle of trust" in which a secure connection can be established.
Typically, each host running Accumulo processes is given a certificate
that identifies it. When client authentication is enabled, clients are also
given certificates, which prevents unwanted clients from accessing the system.
The SSL integration presently provides no authentication support within Accumulo
(an Accumulo username and password are still required) and is used only to
establish a means for secure communication.</p>
</div>
</div>
<div class="sect2">
<h3 id="_ssl_server_configuration">11.1. Server Configuration</h3>
<div class="paragraph">
<p>As previously mentioned, the circle of trust is established by the certificate
authority which created the certificates in use. Because of the tight coupling
of certificate generation with an organization's policies, Accumulo does not
provide a method in which to automatically create the necessary SSL components.</p>
</div>
<div class="paragraph">
<p>Administrators without existing SSL infrastructure are encouraged to
use OpenSSL and the <code>keytool</code> command. Example commands are
included in a later section. Accumulo servers require a certificate and keystore,
in the form of Java KeyStores, to enable SSL. The following configuration assumes
these files already exist.</p>
</div>
<div class="paragraph">
<p>In <code>$ACCUMULO_CONF_DIR/accumulo-site.xml</code>, the following properties are required:</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="highlight">
rpc.javax.net.ssl.keyStore=The path on the local filesystem to the keystore containing the server's certificate
rpc.javax.net.ssl.keyStorePassword=The password for the keystore containing the server's certificate
rpc.javax.net.ssl.trustStore=The path on the local filesystem to the keystore containing the certificate authority's public key
rpc.javax.net.ssl.trustStorePassword=The password for the keystore containing the certificate authority's public key
instance.rpc.ssl.enabled=true
</pre>
</div>
</div>
<div class="paragraph">
<p>Optionally, SSL client-authentication (two-way SSL) can also be enabled by setting
<code>instance.rpc.ssl.clientAuth=true</code> in <code>$ACCUMULO_CONF_DIR/accumulo-site.xml</code>.
This requires that each client have access to a valid certificate to set up a secure connection
to the servers. By default, Accumulo uses one-way SSL which does not require clients to have
their own certificate.</p>
</div>
</div>
<div class="sect2">
<h3 id="_ssl_client_configuration">11.2. Client Configuration</h3>
<div class="paragraph">
<p>To establish a connection to Accumulo servers, each client must also have
special configuration. This is typically accomplished through the use of
the client configuration file whose default location is <code>~/.accumulo/config</code>.</p>
</div>
<div class="paragraph">
<p>The following properties must be set to connect to an Accumulo instance using SSL:</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="highlight">
rpc.javax.net.ssl.trustStore=The path on the local filesystem to the keystore containing the certificate authority's public key
rpc.javax.net.ssl.trustStorePassword=The password for the keystore containing the certificate authority's public key
instance.rpc.ssl.enabled=true</pre>
</div>
</div>
<div class="paragraph">
<p>If two-way SSL is enabled for the instance (<code>instance.rpc.ssl.clientAuth=true</code>), the client must also define
its own certificate and enable client authentication as well.</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="highlight">
rpc.javax.net.ssl.keyStore=The path on the local filesystem to the keystore containing the client's certificate
rpc.javax.net.ssl.keyStorePassword=The password for the keystore containing the client's certificate
instance.rpc.ssl.clientAuth=true</pre>
</div>
</div>
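<div class="paragraph">
<p>The required properties differ between servers and clients and depend on whether client authentication is enabled. The following minimal sketch checks a set of loaded properties against the rules listed in this section; <code>SslConfigCheck</code> is hypothetical, and it covers only the properties named here.</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="highlight"><code class="java language-java">import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Hypothetical check of the SSL properties described in this section.
class SslConfigCheck {
  // Returns the names of required properties that are missing.
  static List&lt;String&gt; missing(Properties props, boolean isServer) {
    List&lt;String&gt; required = new ArrayList&lt;&gt;();
    if (Boolean.parseBoolean(props.getProperty("instance.rpc.ssl.enabled"))) {
      required.add("rpc.javax.net.ssl.trustStore");
      required.add("rpc.javax.net.ssl.trustStorePassword");
      boolean clientAuth = Boolean.parseBoolean(props.getProperty("instance.rpc.ssl.clientAuth"));
      // Servers always need their own certificate; clients only with two-way SSL.
      if (isServer || clientAuth) {
        required.add("rpc.javax.net.ssl.keyStore");
        required.add("rpc.javax.net.ssl.keyStorePassword");
      }
    }
    List&lt;String&gt; missing = new ArrayList&lt;&gt;();
    for (String key : required) {
      if (props.getProperty(key) == null) {
        missing.add(key);
      }
    }
    return missing;
  }
}</code></pre>
</div>
</div>
<div class="paragraph">
<p>Reading the properties from the server or client configuration file is left out of this sketch.</p>
</div>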
</div>
<div class="sect2">
<h3 id="_ssl_generate_ssl_material_openssl">11.3. Generating SSL material using OpenSSL</h3>
<div class="paragraph">
<p>The following is included as an example for generating your own SSL material (certificate authority and server/client
certificates) using OpenSSL and Java's <code>keytool</code> command.</p>
</div>
<div class="sect3">
<h4 id="_ssl_generate_ca">11.3.1. Generating a certificate authority</h4>
<div class="literalblock">
<div class="content">
<pre>
# Create a private key
openssl genrsa -des3 -out root.key 4096
# Create a self-signed root certificate using the private key
openssl req -x509 -new -key root.key -days 365 -out root.pem
# Convert the PEM (Base64-encoded) certificate to binary DER format
openssl x509 -outform der -in root.pem -out root.der
# Import the certificate into a Java KeyStore
keytool -import -alias root-key -keystore truststore.jks -file root.der
# Remove the DER formatted certificate file (as we don't need it anymore)
rm root.der
</pre>
</div>
</div>
<div class="paragraph">
<p>The <code>truststore.jks</code> file is the Java keystore which contains the certificate authority's public key.</p>
</div>
</div>
<div class="sect3">
<h4 id="_ssl_generate_certs">11.3.2. Generating a certificate/keystore per host</h4>
<div class="paragraph">
<p>It's common that each host in the instance is issued its own certificate (notably to ensure that revocation procedures
can be easily followed). The following steps can be taken for each host.</p>
</div>
<div class="literalblock">
<div class="content">
<pre>
# Create the private key for our server
openssl genrsa -out server.key 4096
# Generate a certificate signing request (CSR) with our private key
openssl req -new -key server.key -out server.csr
# Use the CSR and the CA to create a certificate for the server (a reply to the CSR)
openssl x509 -req -in server.csr -CA root.pem -CAkey root.key -CAcreateserial \
-out server.crt -days 365
# Use the certificate and the private key for our server to create a PKCS12 file
openssl pkcs12 -export -in server.crt -inkey server.key -certfile server.crt \
-name 'server-key' -out server.p12
# Create a Java KeyStore for the server using the PKCS12 file (private key)
keytool -importkeystore -srckeystore server.p12 -srcstoretype pkcs12 -destkeystore \
server.jks -deststoretype JKS
# Remove the PKCS12 file as we don't need it
rm server.p12
# Import the CA-signed certificate to the keystore
keytool -import -trustcacerts -alias server-crt -file server.crt -keystore server.jks
</pre>
</div>
</div>
<div class="paragraph">
<p>The <code>server.jks</code> file is the Java keystore containing the certificate for a given host. The above
steps are the same whether the certificate is generated for an Accumulo server or a client.</p>
</div>
</div>
</div>
</div>
<div class="sect1">
<h2 id="_implementation_details">12. Implementation Details</h2>
<div class="sectionbody">
<div class="sect2">
<h3 id="_implementation_fate">12.1. Fault-Tolerant Executor (FATE)</h3>
<div class="paragraph">
<p>Accumulo must implement a number of distributed, multi-step operations to support
the client API. Creating a new table is a simple example of an atomic client call
which requires multiple steps in the implementation: get a unique table ID, configure
default table permissions, populate information in ZooKeeper to record the table's
existence, create directories in HDFS for the table's data, etc. Implementing these
steps in a way that is tolerant to node failure and other concurrent operations is
very difficult to achieve. Accumulo includes a Fault-Tolerant Executor (FATE) which
is widely used server-side to implement the client API safely and correctly.
FATE is the implementation detail which ensures that tables in creation when the
Master dies will be successfully created when another Master process is started.
This alleviates the need for any external tools to correct some bad state; Accumulo can
undo the failure and self-heal without any external intervention.</p>
</div>
<div class="sect3">
<h4 id="_implementation_fate_overview">12.1.1. Overview</h4>
<div class="paragraph">
<p>FATE consists of a repeatable, persisted operation (REPO), a storage
layer for REPOs, and an execution system to run REPOs. Accumulo uses ZooKeeper as the storage
layer for FATE, and the Accumulo Master acts as the execution system to run REPOs.
The important characteristic of REPOs is that they are implemented to be idempotent:
every operation must be able to undo or replay a partial execution of itself. Requiring the
implementation of the operation to support this functionality greatly simplifies the execution
of these operations. This property is also what guarantees safety in light of failure conditions.</p>
</div>
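<div class="paragraph">
<p>The replay idea can be illustrated entirely in memory. In this minimal sketch, <code>Step</code> and <code>MiniFate</code> are hypothetical, simplified stand-ins for REPOs and the executor (Accumulo's actual REPO interface and ZooKeeper-backed store differ, and undo is not modeled). A map plays the role of the persisted store, each step is idempotent, and a simulated crash partway through an operation is recovered by re-running from the last recorded step.</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="highlight"><code class="java language-java">import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical idempotent step: running it twice has the same effect as once.
interface Step {
  void run(Map&lt;String, String&gt; state);
}

// Hypothetical executor that records progress in a "persisted" map.
class MiniFate {
  // Stand-in for the persistent store: transaction id -&gt; index of next step.
  final Map&lt;Long, Integer&gt; store = new HashMap&lt;&gt;();

  // Runs steps from the recorded position. A crashAfter value of 0 or more
  // simulates dying right after that step runs, before progress is saved.
  void execute(long txId, List&lt;Step&gt; steps, Map&lt;String, String&gt; state, int crashAfter) {
    int next = store.getOrDefault(txId, 0);
    for (int i = next; i &lt; steps.size(); i++) {
      steps.get(i).run(state);
      if (i == crashAfter) {
        throw new IllegalStateException("simulated crash after step " + i);
      }
      store.put(txId, i + 1); // record progress only after the step completes
    }
    store.remove(txId); // finished: the transaction is no longer pending
  }
}</code></pre>
</div>
</div>
<div class="paragraph">
<p>Because a step may run again after a crash (its progress was never recorded), idempotence is what makes the replay safe: re-running a step such as "create the table directory if absent" leaves the state unchanged.</p>
</div>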
</div>
<div class="sect3">
<h4 id="_implementation_fate_administration">12.1.2. Administration</h4>
<div class="paragraph">
<p>Sometimes, it is useful to inspect the current FATE operations, both pending and executing.
For example, a command that is not completing could be blocked on the execution of another
operation. Accumulo provides an Accumulo shell command to interact with FATE.</p>
</div>
<div class="paragraph">
<p>The <code>fate</code> shell command accepts a number of arguments for different functionality:
<code>list</code>/<code>print</code>, <code>fail</code>, <code>delete</code>.</p>
</div>
</div>
<div class="sect3">
<h4 id="_implementation_fate_list_print">12.1.3. List/Print</h4>
<div class="paragraph">
<p>Without any additional arguments, this command will print all operations that still exist in
the FATE store (ZooKeeper). This will include active, pending, and completed operations (completed
operations are lazily removed from the store). Each operation includes a unique "transaction ID", the
state of the operation (e.g. <code>NEW</code>, <code>IN_PROGRESS</code>, <code>FAILED</code>), any locks the
transaction actively holds and any locks it is waiting to acquire.</p>
</div>
<div class="paragraph">
<p>This option can also accept transaction IDs, which restrict the list of transactions shown.</p>
</div>
</div>
<div class="sect3">
<h4 id="_implementation_fate_fail">12.1.4. Fail</h4>
<div class="paragraph">
<p>This command can be used to manually fail a FATE transaction and requires a transaction ID
as an argument. Failing an operation is not a normal procedure and should only be performed
by an administrator who understands the implications of why they are failing the operation.</p>
</div>
</div>
<div class="sect3">
<h4 id="_implementation_fate_delete">12.1.5. Delete</h4>
<div class="paragraph">
<p>This command requires a transaction ID and will delete any locks that the transaction
holds. Like the fail command, this command should only be used in extreme circumstances
by an administrator who understands the implications of the command they are about to
invoke. It is not normal to invoke this command.</p>
</div>
</div>
</div>
</div>
</div>
<div class="sect1">
<h2 id="_administration">13. Administration</h2>
<div class="sectionbody">
<div class="sect2">
<h3 id="_hardware">13.1. Hardware</h3>
<div class="paragraph">
<p>Because two or three systems (HDFS, Accumulo, and MapReduce) typically run
simultaneously, layered across the cluster, hardware commonly consists of 4 to 8 cores
and 8 to 32 GB of RAM, so that each running process can have at least one core and
2 to 4 GB of memory.</p>
</div>
<div class="paragraph">
<p>One core running HDFS can typically keep 2 to 4 disks busy, so each machine may
typically have as little as 2 x 300GB disks and as much as 4 x 1TB or 2TB disks.</p>
</div>
<div class="paragraph">
<p>It is possible to get by with less than this, such as 1U servers with 2 cores and 4 GB
of RAM each, but in this case it is recommended to run at most two processes per
machine, i.e., DataNode and TabletServer or DataNode and MapReduce worker, but
not all three. The constraint here is having enough available heap space for all the
processes on a machine.</p>
</div>
</div>
<div class="sect2">
<h3 id="_network">13.2. Network</h3>
<div class="paragraph">
<p>Accumulo communicates via remote procedure calls over TCP/IP for both passing
data and control messages. In addition, Accumulo uses HDFS clients to
communicate with HDFS. To achieve good ingest and query performance, sufficient
network bandwidth must be available between any two machines.</p>
</div>
<div class="paragraph">
<p>In addition to needing access to ports associated with HDFS and ZooKeeper, Accumulo will
use the following default ports. Please make sure that they are open, or change
their values in <code>conf/accumulo-site.xml</code>.</p>
</div>
<table class="tableblock frame-all grid-all" style="width:75%; ">
<caption class="title">Table 1. Accumulo default ports</caption>
<colgroup>
<col style="width:20%;">
<col style="width:40%;">
<col style="width:40%;">
</colgroup>
<thead>
<tr>
<th class="tableblock halign-right valign-top">Port</th>
<th class="tableblock halign-center valign-top">Description</th>
<th class="tableblock halign-center valign-top">Property Name</th>
</tr>
</thead>
<tbody>
<tr>
<td class="tableblock halign-right valign-top"><p class="tableblock">4445</p></td>
<td class="tableblock halign-center valign-top"><p class="tableblock">Shutdown Port (Accumulo MiniCluster)</p></td>
<td class="tableblock halign-center valign-top"><p class="tableblock">n/a</p></td>
</tr>
<tr>
<td class="tableblock halign-right valign-top"><p class="tableblock">4560</p></td>
<td class="tableblock halign-center valign-top"><p class="tableblock">Accumulo monitor (for centralized log display)</p></td>
<td class="tableblock halign-center valign-top"><p class="tableblock">monitor.port.log4j</p></td>
</tr>
<tr>
<td class="tableblock halign-right valign-top"><p class="tableblock">9997</p></td>
<td class="tableblock halign-center valign-top"><p class="tableblock">Tablet Server</p></td>
<td class="tableblock halign-center valign-top"><p class="tableblock">tserver.port.client</p></td>
</tr>
<tr>
<td class="tableblock halign-right valign-top"><p class="tableblock">9999</p></td>
<td class="tableblock halign-center valign-top"><p class="tableblock">Master Server</p></td>
<td class="tableblock halign-center valign-top"><p class="tableblock">master.port.client</p></td>
</tr>
<tr>
<td class="tableblock halign-right valign-top"><p class="tableblock">12234</p></td>
<td class="tableblock halign-center valign-top"><p class="tableblock">Accumulo Tracer</p></td>
<td class="tableblock halign-center valign-top"><p class="tableblock">trace.port.client</p></td>
</tr>
<tr>
<td class="tableblock halign-right valign-top"><p class="tableblock">42424</p></td>
<td class="tableblock halign-center valign-top"><p class="tableblock">Accumulo Proxy Server</p></td>
<td class="tableblock halign-center valign-top"><p class="tableblock">n/a</p></td>
</tr>
<tr>
<td class="tableblock halign-right valign-top"><p class="tableblock">50091</p></td>
<td class="tableblock halign-center valign-top"><p class="tableblock">Accumulo GC</p></td>
<td class="tableblock halign-center valign-top"><p class="tableblock">gc.port.client</p></td>
</tr>
<tr>
<td class="tableblock halign-right valign-top"><p class="tableblock">50095</p></td>
<td class="tableblock halign-center valign-top"><p class="tableblock">Accumulo HTTP monitor</p></td>
<td class="tableblock halign-center valign-top"><p class="tableblock">monitor.port.client</p></td>
</tr>
</tbody>
</table>
<div class="paragraph">
<p>In addition, a port property can be set to <code>0</code>, in which case an ephemeral port is chosen instead.
The operating system assigns this port from ports that are not already bound, so it is very likely to be unique.
Configuring ports as <code>0</code> instead of an explicit value should therefore, in most cases, avoid conflicts
when running multiple distinct Accumulo instances (or any other process that tries to use the
same default ports) on the same hardware.</p>
</div>
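<div class="paragraph">
<p>The effect of a port value of <code>0</code> can be seen with plain JDK sockets. The following minimal sketch is not Accumulo code and the class name <code>EphemeralPortDemo</code> is hypothetical; it only shows the general mechanism: binding to port <code>0</code> asks the operating system for a free ephemeral port, and the actual port number is known only after the socket is bound.</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="highlight"><code class="java language-java">import java.io.IOException;
import java.net.ServerSocket;

public class EphemeralPortDemo {
    // Bind to port 0 so the operating system picks an unused ephemeral port,
    // then report the port that was actually assigned.
    public static int bindEphemeral() throws IOException {
        try (ServerSocket socket = new ServerSocket(0)) {
            return socket.getLocalPort();
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("Operating system assigned port " + bindEphemeral());
    }
}</code></pre>
</div>
</div>
<div class="paragraph">
<p>Because the port is only chosen at runtime, it cannot be known in advance; environments that require well-known ports should keep explicit values instead.</p>
</div>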
</div>
<div class="sect2">
<h3 id="_installation">13.3. Installation</h3>
<div class="paragraph">
<p>Choose a directory for the Accumulo installation. This directory will be referenced
by the environment variable <code>$ACCUMULO_HOME</code>. Run the following:</p>
</div>
<div class="literalblock">
<div class="content">
<pre>$ tar xzf accumulo-1.6.0-bin.tar.gz # unpack to subdirectory
$ mv accumulo-1.6.0 $ACCUMULO_HOME # move to desired location</pre>
</div>
</div>
<div class="paragraph">
<p>Repeat this step on each machine in the cluster. Usually, all machines have the
same <code>$ACCUMULO_HOME</code>.</p>
</div>
</div>
<div class="sect2">
<h3 id="_dependencies">13.4. Dependencies</h3>
<div class="paragraph">
<p>Accumulo requires HDFS and ZooKeeper to be configured and running
before starting. Password-less SSH should be configured between at least the
Accumulo master and TabletServer machines. It is also a good idea to run Network
Time Protocol (NTP) within the cluster to keep the nodes&#8217; clocks synchronized; clock
skew can cause problems with automatically timestamped data.</p>
</div>
</div>
<div class="sect2">
<h3 id="_configuration_2">13.5. Configuration</h3>
<div class="paragraph">
<p>Accumulo is configured by editing several shell and XML files found in
<code>$ACCUMULO_HOME/conf</code>. Their structure closely resembles that of Hadoop&#8217;s configuration
files.</p>
</div>
<div class="sect3">
<h4 id="_edit_conf_accumulo_env_sh">13.5.1. Edit conf/accumulo-env.sh</h4>
<div class="paragraph">
<p>Accumulo needs to know where to find the software it depends on. Edit accumulo-env.sh
and specify the following:</p>
</div>
<div class="olist arabic">
<ol class="arabic">
<li>
<p>Enter the location of the installation directory of Accumulo for <code>$ACCUMULO_HOME</code></p>
</li>
<li>
<p>Enter your system&#8217;s Java home for <code>$JAVA_HOME</code></p>
</li>
<li>
<p>Enter the location of Hadoop for <code>$HADOOP_PREFIX</code></p>
</li>
<li>
<p>Choose a location for Accumulo logs and enter it for <code>$ACCUMULO_LOG_DIR</code></p>
</li>
<li>
<p>Enter the location of ZooKeeper for <code>$ZOOKEEPER_HOME</code></p>
</li>
</ol>
</div>
<div class="paragraph">
<p>By default, Accumulo TabletServers are set to use 1GB of memory. You may change
this by altering the value of <code>$ACCUMULO_TSERVER_OPTS</code>, which uses the syntax of
JVM command-line options (for example, <code>-Xmx</code> for the maximum heap size). This value
should be less than the physical memory of the machines running TabletServers.</p>
</div>
<div class="paragraph">
<p>There are similar options for the master&#8217;s memory usage and the garbage collector
process. Reduce these if they exceed the physical RAM of your hardware and
increase them, within the bounds of the physical RAM, if a process fails because of
insufficient memory.</p>
</div>
<div class="paragraph">
<p>Note that you will be specifying the Java heap space in accumulo-env.sh. You should
make sure that the total heap space used for the Accumulo tserver and the Hadoop
DataNode and TaskTracker is less than the available memory on each slave node in
the cluster. On large clusters, it is recommended that the Accumulo master, Hadoop
NameNode, secondary NameNode, and Hadoop JobTracker all be run on separate
machines to allow them to use more heap space. If you are running these on the
same machine on a small cluster, likewise make sure their heap space settings fit
within the available memory.</p>
</div>
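<div class="paragraph">
<p>The heap check described above is simple arithmetic. The sketch below sums the <code>-Xmx</code> values of the JVMs that share a node and compares the total with the node&#8217;s physical memory. The class <code>HeapBudget</code> and the example sizes are hypothetical. The sketch counts only Java heap; memory used outside the heap, such as the native map, thread stacks, and the operating system itself, needs additional headroom.</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="highlight"><code class="java language-java">import java.util.Locale;

// Hypothetical helper for checking that co-located JVM heaps fit in physical memory.
// Only Java heap (-Xmx) is counted; off-heap memory needs extra headroom.
public class HeapBudget {
    private static final long KB = 1024L;
    private static final long MB = 1024L * KB;
    private static final long GB = 1024L * MB;

    // Parse an option such as "-Xmx1g", "-Xmx768m", "-Xmx512k" or "-Xmx1048576" into bytes.
    public static long parseXmx(String option) {
        String v = option.trim().toLowerCase(Locale.ROOT);
        if (!v.startsWith("-xmx")) {
            throw new IllegalArgumentException("not an -Xmx option: " + option);
        }
        v = v.substring(4);
        long multiplier = 1L;
        char unit = v.charAt(v.length() - 1);
        if (unit == 'k') {
            multiplier = KB;
        } else if (unit == 'm') {
            multiplier = MB;
        } else if (unit == 'g') {
            multiplier = GB;
        }
        if (multiplier != 1L) {
            v = v.substring(0, v.length() - 1); // strip the unit suffix
        }
        return Long.parseLong(v) * multiplier;
    }

    // True if the summed -Xmx values are strictly below the node's physical memory.
    public static boolean fits(long physicalBytes, String... xmxOptions) {
        long total = 0L;
        for (String option : xmxOptions) {
            total += parseXmx(option);
        }
        return total &lt; physicalBytes;
    }

    public static void main(String[] args) {
        // Example values only: a 16GB node running a tserver, a DataNode and a TaskTracker.
        boolean ok = fits(16L * GB, "-Xmx8g", "-Xmx1g", "-Xmx1g");
        System.out.println("Heaps fit in physical memory: " + ok);
    }
}</code></pre>
</div>
</div>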
</div>
<div class="sect3">
<h4 id="_native_map">13.5.2. Native Map</h4>
<div class="paragraph">
<p>The tablet server uses a data structure called a MemTable to store sorted key/value
pairs in memory when they are first received from the client. When a minor compaction
occurs, this data structure is written to HDFS. By default, the MemTable uses memory
inside the JVM, but a JNI version, called the native map, can be used to significantly
improve performance by storing the data in memory allocated by the native operating system
instead. Because that data lives outside the Java heap, the native map also reduces the
impact of JVM garbage collection, causing the JVM to pause much less frequently.</p>
</div>
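<div class="paragraph">
<p>The MemTable idea (a sorted in-memory buffer that is periodically written out as a whole) can be illustrated with a toy class. This is a hypothetical sketch, not Accumulo&#8217;s MemTable or native map: <code>ToyMemTable</code> uses a JDK <code>ConcurrentSkipListMap</code> to keep keys sorted, counts entries instead of bytes, and stands in for the HDFS file with a list.</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="highlight"><code class="java language-java">import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentSkipListMap;

// Toy illustration of the MemTable idea; not Accumulo's implementation.
// Assumes writes are paused while minorCompact() drains the map.
public class ToyMemTable {
    // A skip list keeps keys sorted as they arrive.
    private final ConcurrentSkipListMap&lt;String, String&gt; entries = new ConcurrentSkipListMap&lt;&gt;();
    private final int maxEntries;

    public ToyMemTable(int maxEntries) {
        this.maxEntries = maxEntries;
    }

    public void put(String key, String value) {
        entries.put(key, value);
    }

    // A real tablet server decides based on memory use; this toy counts entries.
    public boolean isFull() {
        return entries.size() &gt;= maxEntries;
    }

    // "Minor compaction": write the sorted contents out (here, to a list standing in
    // for a file in HDFS) and clear the in-memory buffer.
    public List&lt;String&gt; minorCompact() {
        List&lt;String&gt; file = new ArrayList&lt;&gt;();
        for (Map.Entry&lt;String, String&gt; e : entries.entrySet()) {
            file.add(e.getKey() + "=" + e.getValue());
        }
        entries.clear();
        return file;
    }

    public static void main(String[] args) {
        ToyMemTable memTable = new ToyMemTable(3);
        memTable.put("row2", "b");
        memTable.put("row1", "a");
        memTable.put("row3", "c");
        if (memTable.isFull()) {
            System.out.println(memTable.minorCompact()); // prints [row1=a, row2=b, row3=c]
        }
    }
}</code></pre>
</div>
</div>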
<div class="sect4">
<h5 id="_native_map_building">13.5.2.1. Building</h5>
<div class="paragraph">
<p>32-bit and 64-bit Linux and Mac OS X versions of the native map can be built from the
Accumulo bin package by executing <code>$ACCUMULO_HOME/bin/build_native_library.sh</code>.
If your system's default compiler options are insufficient, you can pass additional compiler
options on the command line, such as options for the target architecture. These are passed
to the Makefile in the environment variable <code>USERFLAGS</code>.</p>
<p>Examples:</p>
</div>
<div class="olist arabic">
<ol class="arabic">
<li>
<p><code>$ACCUMULO_HOME/bin/build_native_library.sh</code></p>
</li>
<li>
<p><code>$ACCUMULO_HOME/bin/build_native_library.sh -m32</code></p>
</li>
</ol>
</div>
<div class="paragraph">
<p>After the native map is built from source, the artifact is placed in
<code>$ACCUMULO_HOME/lib/native</code>. On startup, the tablet server looks
in this directory for the map library. If the file is renamed or moved out of this
directory, the tablet server may not be able to find it. The system can also
locate the native map&#8217;s shared library if <code>LD_LIBRARY_PATH</code> (or
<code>DYLD_LIBRARY_PATH</code> on Mac OS X) is set in <code>$ACCUMULO_HOME/conf/accumulo-env.sh</code>.</p>
</div>
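<div class="paragraph">
<p>Whether the library is where the tablet server expects it can be checked by hand. The sketch below builds the platform-specific file name with the JDK&#8217;s <code>System.mapLibraryName</code> and tests whether that file exists under <code>$ACCUMULO_HOME/lib/native</code>. The library base name <code>accumulo</code> and the class name <code>NativeLibCheck</code> are assumptions; compare against the file actually produced by <code>build_native_library.sh</code> in your installation. The sketch does not load the library.</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="highlight"><code class="java language-java">import java.io.File;

// Sketch: check whether a native library file is present in lib/native.
// Assumption: the base name "accumulo" is illustrative; verify the real file name.
public class NativeLibCheck {
    public static File expectedLibrary(String accumuloHome, String baseName) {
        // mapLibraryName adds the platform prefix/suffix, e.g. libaccumulo.so on Linux.
        File nativeDir = new File(new File(accumuloHome, "lib"), "native");
        return new File(nativeDir, System.mapLibraryName(baseName));
    }

    public static void main(String[] args) {
        String home = System.getenv("ACCUMULO_HOME");
        if (home == null) {
            System.err.println("ACCUMULO_HOME is not set");
            return;
        }
        File lib = expectedLibrary(home, "accumulo");
        System.out.println(lib + (lib.isFile() ? " found" : " not found"));
    }
}</code></pre>
</div>
</div>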
</div>
</div>
<div class="sect3">
<h4 id="_administration_configuration">13.5.3. Native Map Configuration</h4>
<div class="paragraph">
<p>As mentioned, Accumulo will use the native libraries if they are found in the expected
location and <code>tserver.memory.maps.native.enabled</code> is set to <code>true</code> (which is the default).
Using native maps instead of JVM maps yields a noticeable improvement in ingest rates; however,
certain configuration variables must also be adjusted when increasing the size of the
native map.</p>
</div>
<div class="paragraph">
<p>To adjust the size of the native map, increase the value of <code>tserver.memory.maps.max</code>.
By default, the maximum size of the native map is 1GB. When increasing this value, it is
also important to adjust the values of <code>table.compaction.minor.logs.threshold</code> and
<code>tserver.walog.max.size</code>. <code>table.compaction.minor.logs.threshold</code> is the maximum
number of write-ahead log files that a tablet can reference before the tablet is automatically
minor compacted. <code>tserver.walog.max.size</code> is the maximum size of a write-ahead log.</p>
</div>
<div class="paragraph">
<p>The maximum size of the native maps for a server should be less than or equal to the product
of the write-ahead log maximum size and the minor compaction threshold for log files:</p>
</div>
<div class="literalblock">
<div class="content">
<pre>table.compaction.minor.logs.threshold * tserver.walog.max.size &gt;= tserver.memory.maps.max</pre>
</div>
</div>
<div class="paragraph">
<p>This relationship ensures that minor compactions triggered by the number of write-ahead logs
won't occur before the native maps can be completely filled.</p>
</div>
<div class="paragraph">
<p>Similarly, when increasing the size of the write-ahead logs, it can also be important
to increase the HDFS block size that Accumulo uses when creating write-ahead log files.
This is controlled by <code>tserver.wal.blocksize</code>. As a basic recommendation, when
<code>tserver.walog.max.size</code> is larger than 2GB, set <code>tserver.wal.blocksize</code>
to 2GB. Increasing the block size beyond 2GB can decrease write performance to the
write-ahead log file, which will slow ingest.</p>
</div>
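<div class="paragraph">
<p>Both sizing rules above can be checked with a few lines of arithmetic. For example, with a threshold of 3 logs and a 1GB write-ahead log size, the logs cover 3 &#215; 1GB = 3GB, which is enough for a 2GB native map but not for a 4GB one. The class <code>NativeMapSizing</code> and all example values are hypothetical and are not meant as Accumulo defaults.</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="highlight"><code class="java language-java">// Hypothetical helper that checks the two sizing rules described above.
// Example values are illustrative, not Accumulo defaults.
public class NativeMapSizing {
    static final long GB = 1024L * 1024L * 1024L;

    // Rule 1: logsThreshold * walogMaxBytes &gt;= mapsMaxBytes
    public static boolean walCoversNativeMap(int logsThreshold, long walogMaxBytes, long mapsMaxBytes) {
        // multiplyExact throws on overflow instead of silently wrapping around.
        long walCapacity = Math.multiplyExact((long) logsThreshold, walogMaxBytes);
        return walCapacity &gt;= mapsMaxBytes;
    }

    // Rule 2: if tserver.walog.max.size exceeds 2GB, use a 2GB tserver.wal.blocksize.
    // Returns -1 when the rule makes no recommendation.
    public static long suggestedWalBlockSize(long walogMaxBytes) {
        return walogMaxBytes &gt; 2 * GB ? 2 * GB : -1L;
    }

    public static void main(String[] args) {
        System.out.println(walCoversNativeMap(3, 1 * GB, 2 * GB)); // prints true (3GB covers 2GB)
        System.out.println(walCoversNativeMap(3, 1 * GB, 4 * GB)); // prints false (3GB is less than 4GB)
        System.out.println(suggestedWalBlockSize(4 * GB) == 2 * GB); // prints true
    }
}</code></pre>
</div>
</div>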
</div>
<div class="sect3">
<h4 id="_cluster_specification">13.5.4. Cluster Specification</h4>
<div class="paragraph">
<p>On the machine that will serve as the Accumulo master:</p>
</div>
<div class="olist arabic">
<ol class="arabic">
<li>
<p>Write the IP address or domain name of the Accumulo Master to the <code>$ACCUMULO_HOME/conf/masters</code> file.</p>
</li>
<li>
<p>Write the IP addresses or domain names of the machines that will be TabletServers in <code>$ACCUMULO_HOME/conf/slaves</code>, one per line.</p>
</li>
</ol>
</div>
<div class="paragraph">
<p>Note that if you use domain names rather than IP addresses, DNS must be configured
properly for all machines participating in the cluster. Misconfigured DNS can be a
confusing source of errors.</p>
</div>
</div>
<div class="sect3">
<h4 id="_accumulo_settings">13.5.5. Accumulo Settings</h4>
<div class="paragraph">
<p>Specify appropriate values for the following settings in
<code>$ACCUMULO_HOME/conf/accumulo-site.xml</code>:</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="highlight"><code class="xml language-xml">&lt;property&gt;
&lt;name&gt;instance.zookeeper.host&lt;/name&gt;
&lt;value&gt;zooserver-one:2181,zooserver-two:2181&lt;/value&gt;
&lt;description&gt;list of zookeeper servers&lt;/description&gt;
&lt;/property&gt;</code></pre>
</div>
</div>
<div class="paragraph">
<p>This enables Accumulo to find ZooKeeper. Accumulo uses ZooKeeper to coordinate
settings between processes and to help detect and handle TabletServer failures.</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="highlight"><code class="xml language-xml">&lt;property&gt;
&lt;name&gt;instance.secret&lt;/name&gt;
&lt;value&gt;DEFAULT&lt;/value&gt;
&lt;/property&gt;</code></pre>
</div>
</div>
<div class="paragraph">
<p>The instance needs a secret to enable secure communication between servers. Replace the
<code>DEFAULT</code> value with your own secret, and make sure that the <code>accumulo-site.xml</code>
file is not readable by other users. For alternatives to storing <code>instance.secret</code> in
plaintext, see the <a href="#_sensitive_configuration_values">Sensitive Configuration Values</a> section.</p>
</div>
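<div class="paragraph">
<p>The file-permission advice above can be checked programmatically on systems with POSIX permissions. In this sketch (the class <code>SiteFilePermissions</code> is hypothetical), both group and world read access count as &#8220;readable by other users&#8221;; the owner, which the Accumulo processes run as, still needs read access.</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="highlight"><code class="java language-java">import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.attribute.PosixFilePermission;
import java.util.Set;

// Sketch: warn if accumulo-site.xml can be read by users other than its owner.
// Works only on file systems that support POSIX permissions.
public class SiteFilePermissions {
    public static boolean readableByOthers(Path file) throws IOException {
        Set&lt;PosixFilePermission&gt; perms = Files.getPosixFilePermissions(file);
        // Treat both group and world read access as "readable by other users".
        return perms.contains(PosixFilePermission.GROUP_READ)
                || perms.contains(PosixFilePermission.OTHERS_READ);
    }

    public static void main(String[] args) throws IOException {
        String home = System.getenv("ACCUMULO_HOME");
        if (home == null) {
            System.err.println("ACCUMULO_HOME is not set");
            return;
        }
        Path site = Paths.get(home, "conf", "accumulo-site.xml");
        if (readableByOthers(site)) {
            System.out.println(site + " is readable by other users; consider chmod 600");
        } else {
            System.out.println(site + " is not readable by group or others");
        }
    }
}</code></pre>
</div>
</div>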
<div class="paragraph">
<p>Some settings can be modified via the Accumulo shell and take effect immediately, but
some settings require a process restart to take effect. See the configuration documentation
(available in the docs directory of the tarball and in <a href="#configuration">Configuration Management</a>) for details.</p>
</div>
</div>
<div class="sect3">
<h4 id="_deploy_configuration">13.5.6. Deploy Configuration</h4>
<div class="paragraph">
<p>Copy the masters, slaves, accumulo-env.sh, and if necessary, accumulo-site.xml
from the <code>$ACCUMULO_HOME/conf/</code> directory on the master to all the machines
specified in the slaves file.</p>
</div>
</div>
<div class="sect3">
<h4 id="_sensitive_configuration_values">13.5.7. Sensitive Configuration Values</h4>
<div class="paragraph">
<p>Accumulo has a number of properties that can be specified in the accumulo-site.xml
file and are sensitive in nature; <code>instance.secret</code> and <code>trace.token.property.password</code>
are two common examples. If either of these properties is compromised, data could be
leaked to users who should not have access to it.</p>
</div>
<div class="paragraph">
<p>Hadoop 2.6.0 introduced a CredentialProvider class that serves as a common abstraction
for storing and retrieving passwords, so that they do not have to be kept in plaintext in
configuration files. Any Property marked with the <code>Sensitive</code> annotation
is a candidate for use with these CredentialProviders. For versions of Hadoop that lack
these classes, the feature is unavailable.</p>
</div>
<div class="paragraph">
<p>A comma-separated list of CredentialProviders can be configured using the Accumulo property
<code>general.security.credential.provider.paths</code>. Each configured URL will be consulted
when the Configuration object for accumulo-site.xml is accessed.</p>
</div>
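<div class="paragraph">
<p>Resolving a property through several providers can be modeled in a few lines. In this hypothetical sketch (the class <code>ProviderChain</code> is illustrative), each provider is a plain map from property name to secret, whereas real deployments use Hadoop CredentialProvider implementations. The assumption that providers are tried in the listed order and that the first provider holding the property wins is made for the sketch, not stated by the text above.</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="highlight"><code class="java language-java">import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical model of resolving a sensitive property through an ordered list of
// credential providers. Assumption: providers are tried in order; first match wins.
// Each provider is modeled as a map of property -&gt; secret.
public class ProviderChain {
    // Split a general.security.credential.provider.paths value on commas.
    public static List&lt;String&gt; parsePaths(String value) {
        List&lt;String&gt; paths = new ArrayList&lt;&gt;();
        for (String part : value.split(",")) {
            String trimmed = part.trim();
            if (!trimmed.isEmpty()) {
                paths.add(trimmed);
            }
        }
        return paths;
    }

    public static Optional&lt;String&gt; lookup(String pathsValue,
                                          Map&lt;String, Map&lt;String, String&gt;&gt; providers,
                                          String property) {
        for (String url : parsePaths(pathsValue)) {
            Map&lt;String, String&gt; entries = providers.get(url);
            if (entries == null) {
                continue; // unknown provider in this model: try the next one
            }
            String secret = entries.get(property);
            if (secret != null) {
                return Optional.of(secret);
            }
        }
        return Optional.empty(); // not found in any configured provider
    }

    public static void main(String[] args) {
        Map&lt;String, Map&lt;String, String&gt;&gt; providers = new HashMap&lt;&gt;();
        providers.put("jceks://file/etc/accumulo/conf/accumulo.jceks",
                Collections.singletonMap("instance.secret", "from-keystore"));
        String paths = "jceks://file/etc/accumulo/conf/accumulo.jceks";
        System.out.println(lookup(paths, providers, "instance.secret").orElse("not found"));
    }
}</code></pre>
</div>
</div>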
</div>
<div class="sect3">
<h4 id="_using_a_javakeystorecredentialprovider_for_storage">13.5.8. Using a JavaKeyStoreCredentialProvider for storage</h4>
<div class="paragraph">
<p>One of the implementations provided in Hadoop 2.6.0 is a Java KeyStore CredentialProvider.
Each entry in the KeyStore is keyed by the Accumulo property name. For example, to store
<code>instance.secret</code>, the following command can be used:</p>
</div>
<div class="literalblock">
<div class="content">
<pre>hadoop credential create instance.secret --provider jceks://file/etc/accumulo/conf/accumulo.jceks</pre>
</div>
</div>
<div class="paragraph">
<p>The command will then prompt you to enter the secret to use and create a keystore in:</p>
</div>
<div class="literalblock">
<div class="content">
<pre>/etc/accumulo/conf/accumulo.jceks</pre>
</div>
</div>
<div class="paragraph">
<p>Then, accumulo-site.xml must be configured to use this KeyStore as a CredentialProvider:</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="highlight"><code class="xml language-xml">&lt;property&gt;
&lt;name&gt;general.security.credential.provider.paths&lt;/name&gt;
&lt;value&gt;jceks://file/etc/accumulo/conf/accumulo.jceks&lt;/value&gt;
&lt;/property&gt;</code></pre>
</div>
</div>
<div class="paragraph">
<p>With this configuration, Accumulo transparently reads <code>instance.secret</code> from
the configured KeyStore, so the sensitive property no longer needs to be stored in
human-readable form.</p>
</div>
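<div class="paragraph">
<p>To see what a JCEKS keystore entry keyed by a property name looks like, the following sketch uses only the JDK&#8217;s <code>java.security.KeyStore</code> API. It is not Hadoop&#8217;s CredentialProvider code path: the class <code>JceksSecretDemo</code>, the keystore password, and the <code>"AES"</code> algorithm label attached to the secret are assumptions made for illustration, and the keystore is kept in memory rather than written to <code>/etc/accumulo/conf/accumulo.jceks</code>.</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="highlight"><code class="java language-java">import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.security.Key;
import java.security.KeyStore;
import javax.crypto.spec.SecretKeySpec;

// Sketch using only the JDK KeyStore API (not Hadoop's CredentialProvider) to show
// that a JCEKS keystore can hold a secret entry whose alias is a property name.
// The "AES" algorithm label on the secret is an assumption made for this sketch.
public class JceksSecretDemo {
    // Create a JCEKS keystore holding one secret and return its serialized bytes.
    public static byte[] store(String alias, String secret, char[] password) throws Exception {
        KeyStore ks = KeyStore.getInstance("JCEKS");
        ks.load(null, password); // initialize an empty keystore
        SecretKeySpec key = new SecretKeySpec(secret.getBytes(StandardCharsets.UTF_8), "AES");
        ks.setEntry(alias, new KeyStore.SecretKeyEntry(key), new KeyStore.PasswordProtection(password));
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        ks.store(out, password);
        return out.toByteArray();
    }

    // Load the serialized keystore and read the secret back by alias (null if absent).
    public static String read(byte[] keystoreBytes, String alias, char[] password) throws Exception {
        KeyStore ks = KeyStore.getInstance("JCEKS");
        ks.load(new ByteArrayInputStream(keystoreBytes), password);
        Key key = ks.getKey(alias, password);
        return key == null ? null : new String(key.getEncoded(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws Exception {
        char[] password = "example-keystore-password".toCharArray(); // example only
        byte[] keystore = store("instance.secret", "my-instance-secret", password);
        System.out.println(read(keystore, "instance.secret", password)); // prints my-instance-secret
    }
}</code></pre>
</div>
</div>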
<div class="paragraph">
<p>A KeyStore can also be stored in HDFS, which will make the KeyStore readily available to
all Accumulo servers. If the local filesystem is used, be aware that each Accumulo server
will expect the KeyStore in the same location.</p>
</div>
</div>
</div>
<div class="sect2">
<h3 id="_initialization">13.6. Initialization</h3>
<div class="paragraph">
<p>Accumulo must be initialized to create the structures it uses internally to locate
data across the cluster. HDFS is required to be configured and running before
Accumulo can be initialized.</p>
</div>
<div class="paragraph">
<p>Once HDFS is started, initialization can be performed by executing
<code>$ACCUMULO_HOME/bin/accumulo init</code>. This script will prompt for a name
for this instance of Accumulo. The instance name is used to identify a set of tables
and instance-specific settings. The script will then write some information into
HDFS so Accumulo can start properly.</p>
</div>
<div class="paragraph">
<p>The initialization script will prompt you to set a root password. Once Accumulo is
initialized it can be started.</p>
</div>
</div>
<div class="sect2">
<h3 id="_running">13.7. Running</h3>
<div class="sect3">
<h4 id="_starting_accumulo">13.7.1. Starting Accumulo</h4>
<div class="paragraph">
<p>Make sure Hadoop is configured on all of the machines in the cluster, including
access to a shared HDFS instance. Make sure HDFS is running and that ZooKeeper is
configured and running on at least one machine in the cluster.
Then start Accumulo using the <code>bin/start-all.sh</code> script.</p>
</div>
<div class="paragraph">
<p>To verify that Accumulo is running, check the Status page as described under
<em>Monitoring</em>. In addition, the Shell can provide some information about the status of
tables by reading the metadata tables.</p>
</div>
</div>
<div class="sect3">
<h4 id="_stopping_accumulo">13.7.2. Stopping Accumulo</h4>
<div class="paragraph">
<p>To shut down cleanly, run <code>bin/stop-all.sh</code>; the master will orchestrate the
shutdown of all the tablet servers. Shutdown waits for all minor compactions to finish, so it may
take some time, depending on the configuration.</p>
</div>
</div>
<div class="sect3">
<h4 id="_adding_a_node">13.7.3. Adding a Node</h4>
<div class="paragraph">
<p>Update your <code>$ACCUMULO_HOME/conf/slaves</code> (or <code>$ACCUMULO_CONF_DIR/slaves</code>) file to account for the addition.</p>
</div>
<div class="literalblock">
<div class="content">
<pre>$ACCUMULO_HOME/bin/accumulo admin start &lt;host(s)&gt; {&lt;host&gt; ...}</pre>
</div>
</div>
<div class="paragraph">
<p>Alternatively, you can ssh to each of the hosts you want to add and run:</p>
</div>
<div class="literalblock">
<div class="content">
<pre>$ACCUMULO_HOME/bin/start-here.sh</pre>
</div>
</div>
<div class="paragraph">
<p>Make sure the host in question has the new configuration, or else the tablet
server won&#8217;t start; at a minimum this needs to be on the host(s) being added,
but in practice it&#8217;s good to ensure consistent configuration across all nodes.</p>
</div>
</div>
<div class="sect3">
<h4 id="_decomissioning_a_node">13.7.4. Decommissioning a Node</h4>
<div class="paragraph">
<p>If you need to take a node out of operation, you can trigger a graceful shutdown of a tablet
server. Accumulo will automatically rebalance the tablets across the available tablet servers.</p>
</div>
<div class="literalblock">
<div class="content">
<pre>$ACCUMULO_HOME/bin/accumulo admin stop &lt;host[:port]&gt; {&lt;host[:port]&gt; ...}</pre>
</div>
</div>
<div class="paragraph">
<p>Alternatively, you can ssh to each of the hosts you want to remove and run:</p>
</div>
<div class="literalblock">
<div class="content">
<pre>$ACCUMULO_HOME/bin/stop-here.sh</pre>
</div>
</div>
<div class="paragraph">
<p>Be sure to update your <code>$ACCUMULO_HOME/conf/slaves</code> (or <code>$ACCUMULO_CONF_DIR/slaves</code>) file to
account for the removal of these hosts. Bear in mind that the monitor will not re-read the
slaves file automatically, so it will report the decommissioned servers as down; it&#8217;s
recommended that you restart the monitor so that the node list is up to date.</p>
</div>
</div>
<div class="sect3">
<h4 id="_restarting_process_on_a_node">13.7.5. Restarting processes on a node</h4>
<div class="paragraph">
<p>Occasionally, it might be necessary to restart the processes on a specific node. In addition
to the <code>start-all.sh</code> and <code>stop-all.sh</code> scripts, Accumulo contains scripts to start/stop all processes
on a node and start/stop a given process on a node.</p>
</div>
<div class="paragraph">
<p><code>start-here.sh</code> and <code>stop-here.sh</code> will start/stop all Accumulo processes on the current node. The
necessary processes to start/stop are determined via the "hosts" files (e.g. slaves, masters, etc).
These scripts expect no arguments.</p>
</div>
<div class="paragraph">
<p><code>start-server.sh</code> can also be useful in starting a given process on a host.
The first argument to the script is the hostname of the machine. Use the same host name that
you specified in the hosts file (if you specified an FQDN in the masters file, use the FQDN, not
the short name). The second argument is the name of the process to start (e.g. master, tserver).</p>
</div>
<div class="paragraph">
<p>The steps described to decommission a node can also be used (without removing the host
from the <code>$ACCUMULO_HOME/conf/slaves</code> file) to gracefully stop a node. This will
ensure that the tablet server is cleanly stopped and recovery will not need to be performed
when its tablets are re-hosted.</p>
</div>
</div>
<div class="sect3">
<h4 id="_running_multiple_tabletservers_on_a_single_node">13.7.6. Running multiple TabletServers on a single node</h4>
<div class="paragraph">
<p>With very powerful nodes, it may be beneficial to run more than one TabletServer on a given
node. This decision should be made carefully and with much deliberation, as Accumulo is designed
to scale to using tens of GB of RAM and tens of CPU cores.</p>
</div>
<div class="paragraph">
<p>To run multiple TabletServers on a single host, it is necessary to create multiple Accumulo configuration
directories. Ensuring that the following properties are appropriately set (and remain consistent) is an exercise
for the user.</p>
</div>
<div class="paragraph">
<p>Accumulo TabletServers bind certain ports on the host to accommodate remote procedure calls to/from
other nodes. This requires additional configuration values in <code>accumulo-site.xml</code>:</p>
</div>
<div class="ulist">
<ul>
<li>
<p>tserver.port.client</p>
</li>
</ul>
</div>
<div class="paragraph">
<p>Normally, setting a value of <code>0</code> for these configuration properties is sufficient. In some
environments, the ports used by Accumulo must be well known for security reasons, which requires a
separate copy of the configuration files that assigns a static port to each TabletServer instance.</p>
</div>
<div class="paragraph">
<p>It is also necessary to update the following exported variables in <code>accumulo-env.sh</code>.</p>
</div>
<div class="ulist">
<ul>
<li>
<p>ACCUMULO_LOG_DIR</p>
</li>
</ul>
</div>
<div class="paragraph">
<p>The values for these properties are left up to the user to define; there are no constraints
other than ensuring that the directory exists and the user running Accumulo has the permission
to read/write into that directory.</p>
</div>
<div class="paragraph">
<p>Accumulo&#8217;s provided scripts for starting and stopping a cluster assume that one TabletServer process
is running per host. As such, starting and stopping multiple TabletServers on one host requires
more effort from the user. It is important to ensure that <code>ACCUMULO_CONF_DIR</code> is correctly
set for each TabletServer instance being started, for example:</p>
</div>
<div class="paragraph">
<p><code>ACCUMULO_CONF_DIR=$ACCUMULO_HOME/conf $ACCUMULO_HOME/bin/accumulo tserver --address &lt;your_server_ip&gt; &amp;</code></p>
</div>
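<div class="paragraph">
<p>When several TabletServers are started from a program or wrapper rather than by hand, the key step is giving each process its own <code>ACCUMULO_CONF_DIR</code>. The sketch below prepares one <code>ProcessBuilder</code> per configuration directory using the command line shown above. The class <code>MultiTserverLauncher</code>, the address, and the directory names are hypothetical, and each directory is assumed to already contain its own <code>accumulo-site.xml</code> (with a distinct or <code>0</code>-valued <code>tserver.port.client</code>) and <code>accumulo-env.sh</code> (with a distinct <code>ACCUMULO_LOG_DIR</code>).</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="highlight"><code class="java language-java">import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical launcher: prepares one tserver process per configuration directory.
// Assumes each directory holds its own accumulo-site.xml and accumulo-env.sh.
public class MultiTserverLauncher {
    public static List&lt;ProcessBuilder&gt; prepare(String accumuloHome, String address, List&lt;String&gt; confDirs) {
        List&lt;ProcessBuilder&gt; builders = new ArrayList&lt;&gt;();
        for (String confDir : confDirs) {
            ProcessBuilder pb = new ProcessBuilder(
                    accumuloHome + "/bin/accumulo", "tserver", "--address", address);
            // The key step: each instance sees its own ACCUMULO_CONF_DIR.
            pb.environment().put("ACCUMULO_CONF_DIR", confDir);
            pb.inheritIO();
            builders.add(pb);
        }
        return builders;
    }

    public static void main(String[] args) {
        List&lt;ProcessBuilder&gt; builders = prepare("/opt/accumulo", "192.168.1.10",
                Arrays.asList("/opt/accumulo/conf-tserver1", "/opt/accumulo/conf-tserver2"));
        for (ProcessBuilder pb : builders) {
            // Call pb.start() to actually launch; this sketch only prints the plan.
            System.out.println(pb.environment().get("ACCUMULO_CONF_DIR") + " -&gt; " + pb.command());
        }
    }
}</code></pre>
</div>
</div>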
<div class="paragraph">
<p>To stop TabletServers, the normal <code>stop-all.sh</code> script will stop all TabletServer instances across all nodes.
To stop a single instance on a single node, you can use the <code>kill</code> command provided by your operating system.
<code>stop-server.sh</code> can be used to stop all TabletServers on a single node.</p>
</div>
</div>
</div>
<div class="sect2">
<h3 id="_monitoring">13.8. Monitoring</h3>
<div class="paragraph">
<p>The Accumulo monitor provides a web interface for checking the status and health of
Accumulo components. This interface can be accessed by pointing a web browser to port 50095
(<code>monitor.port.client</code>) on the host running the monitor, for example
<code><a href="http://accumulomaster:50095/status">http://accumulomaster:50095/status</a></code></p>
</div>
</div>
<div class="sect2">
<h3 id="_tracing">13.9. Tracing</h3>
<div class="paragraph">
<p>It can be difficult to determine why some operations are taking longer
than expected. For example, you may be looking up items with very low
latency, but sometimes the lookups take much longer. Determining the
cause of the delay is difficult because the system is distributed, and
the typical lookup is fast.</p>
</div>
<div class="paragraph">
<p>Accumulo has been instrumented to record the time that various
operations take when tracing is turned on. When tracing is enabled, it follows
all the requests made on behalf of the user throughout Accumulo&#8217;s distributed
infrastructure, across all threads of execution.</p>
</div>
<div class="paragraph">
<p>These time spans will be inserted into the <code>trace</code> table in
Accumulo. You can browse recent traces from the Accumulo monitor
page. You can also read the <code>trace</code> table directly like any
other table.</p>
</div>
<div class="paragraph">
<p>The design of Accumulo&#8217;s distributed tracing follows that of
<a href="http://research.google.com/pubs/pub36356.html">Google&#8217;s Dapper</a>.</p>
</div>
<div class="sect3">
<h4 id="_tracers">13.9.1. Tracers</h4>
<div class="paragraph">
<p>To collect traces, Accumulo needs at least one server listed in
<code>$ACCUMULO_HOME/conf/tracers</code>. The tracer server collects traces
from clients and writes them to the <code>trace</code> table. The Accumulo
user that the tracer uses to connect to Accumulo can be configured with
the following properties:</p>
</div>
<div class="literalblock">
<div class="content">
<pre>trace.user
trace.token.property.password</pre>
</div>
</div>
</div>
<div class="sect3">
<h4 id="_instrumenting_a_client">13.9.2. Instrumenting a Client</h4>
<div class="paragraph">
<p>Tracing can be used to measure a client operation, such as a scan, as
the operation traverses the distributed system. To enable tracing for
your application, call:</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="highlight"><code class="java language-java">DistributedTrace.enable(instance, new ZooReader(instance), hostname, "myApplication");</code></pre>
</div>
</div>
<div class="paragraph">
<p>Once tracing has been enabled, a client can wrap an operation in a trace.</p>
</div>
<div class="listingblock">
<div class="content">
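<pre class="highlight"><code class="java language-java">Trace.on("Client Scan");
try {
  BatchScanner scanner = conn.createBatchScanner(...);
  // Configure your scanner
  for (Entry&lt;Key, Value&gt; entry : scanner) {
    // Process each entry
  }
  scanner.close();
} finally {
  Trace.off(); // runs even if the scan throws
}</code></pre>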
</div>
</div>
<div class="paragraph">
<p>The user can also create additional Spans within a Trace.</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="highlight"><code class="java language-java">Trace.on("Client Update");
...
Span readSpan = Trace.start("Read");
...
readSpan.stop();
...
Span writeSpan = Trace.start("Write");
...
writeSpan.stop();
Trace.off();</code></pre>
</div>
</div>
<div class="paragraph">
<p>Like Dapper, Accumulo tracing supports user defined annotations to associate additional data with a Trace.</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="highlight"><code class="java language-java">...
int numberOfEntriesRead = 0;
Span readSpan = Trace.start("Read");
// Do the read, update the counter
...
readSpan.data("Number of Entries Read", String.valueOf(numberOfEntriesRead));
readSpan.stop();</code></pre>
</div>
</div>
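<div class="paragraph">
<p>Conceptually, a span records a name, start and stop times, and optional key/value annotations. The toy class below illustrates only that data model; <code>ToySpan</code> is hypothetical and is not Accumulo&#8217;s <code>Span</code>, which also carries trace and parent identifiers and is delivered to the tracer servers.</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="highlight"><code class="java language-java">import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical toy span illustrating what a span records: a name, timing, and
// key/value annotations. It is not Accumulo's Span class.
public class ToySpan {
    private final String name;
    private final long startNanos = System.nanoTime();
    private long stopNanos;
    private boolean stopped;
    private final Map&lt;String, String&gt; annotations = new LinkedHashMap&lt;&gt;();

    public ToySpan(String name) {
        this.name = name;
    }

    // Attach a key/value annotation, similar in spirit to Span.data(...) above.
    public void data(String key, String value) {
        annotations.put(key, value);
    }

    // Record the stop time once; later calls are ignored.
    public void stop() {
        if (!stopped) {
            stopNanos = System.nanoTime();
            stopped = true;
        }
    }

    public boolean isStopped() {
        return stopped;
    }

    // Elapsed time; for a span that is still open, measured up to now.
    public long durationNanos() {
        long end = stopped ? stopNanos : System.nanoTime();
        return end - startNanos;
    }

    public String name() {
        return name;
    }

    public Map&lt;String, String&gt; annotations() {
        return Collections.unmodifiableMap(annotations);
    }

    public static void main(String[] args) {
        ToySpan read = new ToySpan("Read");
        int numberOfEntriesRead = 42; // example value
        read.data("Number of Entries Read", String.valueOf(numberOfEntriesRead));
        read.stop();
        System.out.println(read.name() + " " + read.annotations() + " took " + read.durationNanos() + " ns");
    }
}</code></pre>
</div>
</div>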
<div class="paragraph">
<p>Some client operations may have a high volume within your
application. As such, you may wish to only sample a percentage of
operations for tracing. As shown below, a CountSampler can be used to
enable tracing for 1 out of every 1000 operations.</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="highlight"><code class="java language-java">Sampler sampler = new CountSampler(1000);
...
if (sampler.next()) {
  Trace.on("Read");
}
...
Trace.offNoFlush();</code></pre>
</div>
</div>
<div class="paragraph">
<p>It should be noted that it is safe to turn off tracing even if it
isn&#8217;t currently active. Use <code>Trace.offNoFlush()</code> if you do not want
<code>Trace.off()</code> to block while flushing trace data.</p>
</div>
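<div class="paragraph">
<p>The 1-in-N idea behind count-based sampling can be sketched with a shared counter. <code>OneInNSampler</code> is hypothetical and deterministic; it is not Accumulo&#8217;s CountSampler, whose actual selection strategy may differ (for example, it may choose operations randomly rather than strictly by count).</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="highlight"><code class="java language-java">import java.util.concurrent.atomic.AtomicLong;

// Hypothetical deterministic 1-in-N sampler illustrating count-based sampling.
// Not Accumulo's CountSampler implementation.
public class OneInNSampler {
    private final long n;
    private final AtomicLong calls = new AtomicLong();

    public OneInNSampler(long n) {
        if (n &lt; 1) {
            throw new IllegalArgumentException("n must be &gt;= 1");
        }
        this.n = n;
    }

    // True for the 1st call, the (n+1)th call, the (2n+1)th call, and so on.
    // AtomicLong keeps the count correct when several threads share one sampler.
    public boolean next() {
        return calls.getAndIncrement() % n == 0;
    }

    public static void main(String[] args) {
        OneInNSampler sampler = new OneInNSampler(1000);
        int traced = 0;
        for (int i = 0; i &lt; 5000; i++) {
            if (sampler.next()) {
                traced++; // this is where Trace.on(...) would be called
            }
        }
        System.out.println(traced + " of 5000 operations sampled"); // prints 5 of 5000 operations sampled
    }
}</code></pre>
</div>
</div>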
</div>
<div class="sect3">
<h4 id="_viewing_collected_traces">13.9.3. Viewing Collected Traces</h4>
<div class="paragraph">
<p>To view collected traces, use the "Recent Traces" link on the Monitor
UI. You can also programmatically access and print traces using the
<code>TraceDump</code> class.</p>
</div>
</div>
<div class="sect3">
<h4 id="_tracing_from_the_shell">13.9.4. Tracing from the Shell</h4>
<div class="paragraph">
<p>You can enable tracing for operations run from the shell by using the
<code>trace on</code> and <code>trace off</code> commands.</p>
</div>
<div class="listingblock">
<div class="content">
<pre>root@test test&gt; trace on
root@test test&gt; scan
a b:c [] d
root@test test&gt; trace off
Waiting for trace information
Waiting for trace information
Trace started at 2013/08/26 13:24:08.332
Time Start Service@Location Name
3628+0 shell@localhost shell:root
8+1690 shell@localhost scan
7+1691 shell@localhost scan:location
6+1692 tserver@localhost startScan
5+1692 tserver@localhost tablet read ahead 6</pre>
</div>
</div>
</div>
</div>
<div class="sect2">
<h3 id="_logging">13.10. Logging</h3>
<div class="paragraph">
<p>Each Accumulo process writes to a set of log files. By default, these are found under
<code>$ACCUMULO_HOME/logs/</code>; the location is set by <code>$ACCUMULO_LOG_DIR</code> in <code>accumulo-env.sh</code>.</p>
</div>
</div>
<div class="sect2">
<h3 id="_recovery">13.11. Recovery</h3>
<div class="paragraph">
<p>In the event of a TabletServer failure or an error while shutting Accumulo down, some
mutations may not have been minor compacted to HDFS properly. In this case,
Accumulo will automatically reapply such mutations from the write-ahead log
either when the tablets from the failed server are reassigned by the Master (in the
case of a single TabletServer failure) or the next time Accumulo starts (in the event of
failure during shutdown).</p>
</div>
<div class="paragraph">
<p>Recovery is performed by asking a tablet server to sort the logs so that tablets can easily find their missing
updates. The sort status of each file is displayed on the
Accumulo monitor status page. Once recovery is complete, any
tablets involved should return to an &#8220;online&#8221; state. Until then, those tablets will be
unavailable to clients.</p>
</div>
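<div class="paragraph">
<p>Why sorting helps can be illustrated with a toy log. In the hypothetical sketch below, each log entry records a tablet, a sequence number, and a stand-in for the mutation; after sorting by tablet and then by sequence, each tablet&#8217;s updates form one contiguous, ordered run that it can replay without reading the rest of the log. The record layout and the class <code>LogSortSketch</code> are invented for illustration; Accumulo&#8217;s real write-ahead log format differs.</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="highlight"><code class="java language-java">import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical illustration of sorting write-ahead log entries for recovery.
// The record layout is invented for this sketch.
public class LogSortSketch {
    public static final class LogEntry {
        final String tablet;   // tablet the mutation belongs to
        final long seq;        // order in which the mutation was logged
        final String mutation; // stand-in for the logged update

        public LogEntry(String tablet, long seq, String mutation) {
            this.tablet = tablet;
            this.seq = seq;
            this.mutation = mutation;
        }
    }

    // Sort by tablet, then by sequence, so each tablet's updates are contiguous and ordered.
    public static List&lt;LogEntry&gt; sortForRecovery(List&lt;LogEntry&gt; log) {
        List&lt;LogEntry&gt; sorted = new ArrayList&lt;&gt;(log);
        sorted.sort(Comparator.comparing((LogEntry e) -&gt; e.tablet).thenComparingLong(e -&gt; e.seq));
        return sorted;
    }

    // A recovering tablet replays only its own run, stopping once the run ends.
    public static List&lt;String&gt; mutationsFor(List&lt;LogEntry&gt; sorted, String tablet) {
        List&lt;String&gt; result = new ArrayList&lt;&gt;();
        for (LogEntry e : sorted) {
            int cmp = e.tablet.compareTo(tablet);
            if (cmp == 0) {
                result.add(e.mutation);
            } else if (cmp &gt; 0) {
                break; // sorted input: this tablet's run is over
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List&lt;LogEntry&gt; log = Arrays.asList(
                new LogEntry("tabletB", 1, "put b1"),
                new LogEntry("tabletA", 2, "put a1"),
                new LogEntry("tabletB", 3, "put b2"),
                new LogEntry("tabletA", 4, "put a2"));
        System.out.println(mutationsFor(sortForRecovery(log), "tabletB")); // prints [put b1, put b2]
    }
}</code></pre>
</div>
</div>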
<div class="paragraph">
<p>The Accumulo client library is configured to retry failed mutations and in many
cases clients will be able to continue processing after the recovery process without
throwing an exception.</p>
</div>
</div>
<h3 id="_migrating_from_non_ha_to_ha">13.12. Migrating Accumulo from non-HA Namenode to HA Namenode</h3>
<div class="paragraph"><p>
The following steps will allow a non-HA instance to be migrated to an HA instance. Consider an HDFS URL
<code>hdfs://namenode.example.com:8020</code> which is going to be moved to <code>hdfs://nameservice1</code>.</p>
</div>
<div class="paragraph"><p>
Before moving HDFS over to the HA namenode, use <code>$ACCUMULO_HOME/bin/accumulo admin volumes</code> to confirm
that the only volume displayed is the volume from the current namenode's HDFS URL.</p>
</div>
<div class="listingblock">
<div class="content">
<pre>
Listing volumes referenced in zookeeper
Volume : hdfs://namenode.example.com:8020/accumulo
Listing volumes referenced in accumulo.root tablets section
Volume : hdfs://namenode.example.com:8020/accumulo
Listing volumes referenced in accumulo.root deletes section (volume replacement occurrs at deletion time)
Listing volumes referenced in accumulo.metadata tablets section
Volume : hdfs://namenode.example.com:8020/accumulo
Listing volumes referenced in accumulo.metadata deletes section (volume replacement occurrs at deletion time)
</pre>
</div>
</div>
<div class="paragraph"><p>
After verifying that the current volume is correct, shut down the cluster and transition HDFS to the HA nameservice.
Edit <code>$ACCUMULO_HOME/conf/accumulo-site.xml</code> to notify Accumulo that a volume is being replaced. First,
add the new nameservice volume to the <code>instance.volumes</code> property. Next, add the
<code>instance.volumes.replacements</code> property in the form of <code>old new</code>. It's important not to include
the volume that's being replaced in <code>instance.volumes</code>; otherwise, Accumulo could continue
to write to that volume.</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="highlight"><code class="xml language-xml">&lt;!-- instance.dfs.uri and instance.dfs.dir should not be set --&gt;
&lt;property&gt;
&lt;name&gt;instance.volumes&lt;/name&gt;
&lt;value&gt;hdfs://nameservice1/accumulo&lt;/value&gt;
&lt;/property&gt;
&lt;property&gt;
&lt;name&gt;instance.volumes.replacements&lt;/name&gt;
&lt;value&gt;hdfs://namenode.example.com:8020/accumulo hdfs://nameservice1/accumulo&lt;/value&gt;
&lt;/property&gt;
</code></pre>
</div>
</div>
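<div class="paragraph">
<p>Before restarting, it can help to sanity-check the edited properties. The following Java sketch assumes both values are available as plain strings, treats <code>instance.volumes</code> as a comma-separated list and <code>instance.volumes.replacements</code> as comma-separated <code>old new</code> pairs, and reports the two mistakes described above: a replaced volume that is still listed, or a new volume that was never added. The class and method names are hypothetical; this is not an Accumulo utility.</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="highlight"><code class="java language-java">import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical pre-flight check for a volume replacement; not part of Accumulo.
final class VolumeMigrationCheck {
  static List&lt;String&gt; problems(String instanceVolumes, String replacements) {
    // instance.volumes: comma-separated volume URIs
    Set&lt;String&gt; volumes = new LinkedHashSet&lt;&gt;();
    for (String v : instanceVolumes.split(",")) {
      if (!v.trim().isEmpty()) {
        volumes.add(v.trim());
      }
    }
    List&lt;String&gt; found = new ArrayList&lt;&gt;();
    // instance.volumes.replacements: comma-separated "old new" pairs
    for (String pair : replacements.split(",")) {
      String[] parts = pair.trim().split("\\s+");
      if (parts.length != 2) {
        found.add("not an 'old new' pair: " + pair.trim());
        continue;
      }
      if (volumes.contains(parts[0])) {
        found.add("replaced volume still in instance.volumes: " + parts[0]);
      }
      if (!volumes.contains(parts[1])) {
        found.add("new volume missing from instance.volumes: " + parts[1]);
      }
    }
    return found;
  }
}
</code></pre>
</div>
</div>
<div class="paragraph">
<p>With the configuration shown above, <code>VolumeMigrationCheck.problems</code> returns an empty list.</p>
</div>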
<div class="paragraph">
<p>Run <code>$ACCUMULO_HOME/bin/accumulo init --add-volumes</code> and start the Accumulo cluster. Verify that the
new nameservice volume shows up in the output of <code>$ACCUMULO_HOME/bin/accumulo admin volumes</code>.</p>
</div>
<div class="listingblock">
<div class="content">
<pre>
Listing volumes referenced in zookeeper
Volume : hdfs://namenode.example.com:8020/accumulo
Volume : hdfs://nameservice1/accumulo
Listing volumes referenced in accumulo.root tablets section
Volume : hdfs://namenode.example.com:8020/accumulo
Volume : hdfs://nameservice1/accumulo
Listing volumes referenced in accumulo.root deletes section (volume replacement occurrs at deletion time)
Listing volumes referenced in accumulo.metadata tablets section
Volume : hdfs://namenode.example.com:8020/accumulo
Volume : hdfs://nameservice1/accumulo
Listing volumes referenced in accumulo.metadata deletes section (volume replacement occurrs at deletion time)
</pre>
</div>
</div>
<div class="paragraph"><p>
Some erroneous GarbageCollector messages may still be seen for a short period while data transitions to
the new volumes. This is expected and can usually be ignored.</p>
</div>
</div>
<div class="sect1">
<h2 id="_multi_volume_installations">14. Multi-Volume Installations</h2>
<div class="sectionbody">
<div class="paragraph">
<p>This is an advanced configuration setting for very large clusters
under a lot of write pressure.</p>
</div>
<div class="paragraph">
<p>The HDFS NameNode holds all of the metadata about the files in
HDFS. For fast performance, all of this information needs to be stored
in memory. A single NameNode with 64G of memory can store the
metadata for tens of millions of files. However, when scaling beyond a
thousand nodes, an active Accumulo system can generate lots of updates
to the file system, especially when data is being ingested. The large
number of write transactions to the NameNode, and the speed of a
single edit log, can become the limiting factor for large scale
Accumulo installations.</p>
</div>
<div class="paragraph">
<p>You can see the effect of slow write transactions when the Accumulo
Garbage Collector takes a long time (more than 5 minutes) to delete
the files Accumulo no longer needs. If your Garbage Collector
routinely runs in less than a minute, the NameNode is performing well.</p>
</div>
<div class="paragraph">
<p>However, if you do begin to experience slowdowns and poor Garbage Collector
performance, Accumulo can be configured to use multiple NameNode
servers. The <code>instance.volumes</code> property should be set to a
comma-separated list of full URI references to the different NameNode
servers:</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="highlight"><code class="xml language-xml">&lt;property&gt;
&lt;name&gt;instance.volumes&lt;/name&gt;
&lt;value&gt;hdfs://ns1:9001,hdfs://ns2:9001&lt;/value&gt;
&lt;/property&gt;</code></pre>
</div>
</div>
<div class="paragraph">
<p>The introduction of multiple volume support in 1.6 changed the way Accumulo
stores pointers to files. It now stores fully qualified URI references to
files. Before 1.6, Accumulo stored paths that were relative to a table
directory. After an upgrade, these relative paths will still exist and are
resolved using <code>instance.dfs.dir</code>, <code>instance.dfs.uri</code>, and the Hadoop configuration in
the same way they were before 1.6.</p>
</div>
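<div class="paragraph">
<p>The difference between the two kinds of file reference can be sketched in Java. The example assumes that a pre-1.6 relative entry such as <code>/default_tablet/F000009y.rf</code> is relative to <code>instance.dfs.dir</code> + <code>/tables/</code> + table id (the layout used in the metadata example later in this manual) and that <code>instance.dfs.uri</code> has no trailing slash; other relative forms are not handled. The helper name is hypothetical and this is not Accumulo's resolution code.</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="highlight"><code class="java language-java">import java.net.URI;

// Hypothetical illustration of resolving a stored file reference; not Accumulo's code.
final class FileRefResolver {
  static String resolve(String stored, String dfsUri, String dfsDir, String tableId) {
    if (URI.create(stored).getScheme() != null) {
      // 1.6+ entry: already fully qualified, e.g. hdfs://ns1:9001/accumulo/tables/...
      return stored;
    }
    // Pre-1.6 entry: a path relative to the table directory under instance.dfs.dir
    return dfsUri + dfsDir + "/tables/" + tableId + stored;
  }
}
</code></pre>
</div>
</div>
<div class="paragraph">
<p>Because a 1.6+ entry carries its namenode authority inside the stored string, changing the Hadoop configuration alone cannot redirect it.</p>
</div>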
<div class="paragraph">
<p>If the URI for a namenode changes (e.g. the namenode was running on host1 and is
moved to host2), then Accumulo will no longer function. Even if the Hadoop and
Accumulo configurations are changed, the fully qualified URIs stored in
Accumulo will still contain the old URI. To handle this, Accumulo has the
following configuration property for replacing URIs stored in its metadata. The
example configuration below will replace ns1 with nsA and ns2 with nsB in
Accumulo metadata. For this property to take effect, Accumulo will need to be
restarted.</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="highlight"><code class="xml language-xml">&lt;property&gt;
&lt;name&gt;instance.volumes.replacements&lt;/name&gt;
&lt;value&gt;hdfs://ns1:9001 hdfs://nsA:9001, hdfs://ns2:9001 hdfs://nsB:9001&lt;/value&gt;
&lt;/property&gt;</code></pre>
</div>
</div>
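<div class="paragraph">
<p>One way to picture the replacement is as a prefix rewrite applied to each stored URI. The Java sketch below parses a property value like the one above and rewrites a URI whose volume matches an <code>old</code> entry. It assumes whole-volume prefix matching; the class name and the sample file path are hypothetical, and Accumulo's actual matching rules may differ.</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="highlight"><code class="java language-java">import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of volume replacement as a prefix rewrite; not Accumulo's code.
final class VolumeReplacer {
  private final Map&lt;String, String&gt; replacements = new LinkedHashMap&lt;&gt;();

  VolumeReplacer(String propertyValue) {
    // Comma-separated "old new" pairs, as in instance.volumes.replacements
    for (String pair : propertyValue.split(",")) {
      String[] parts = pair.trim().split("\\s+");
      if (parts.length != 2) {
        throw new IllegalArgumentException("expected 'old new' but got: " + pair);
      }
      replacements.put(parts[0], parts[1]);
    }
  }

  String apply(String fileUri) {
    for (Map.Entry&lt;String, String&gt; e : replacements.entrySet()) {
      String old = e.getKey();
      // Match whole volumes only, so hdfs://ns1:9001 does not match hdfs://ns1:90010
      if (fileUri.equals(old) || fileUri.startsWith(old + "/")) {
        return e.getValue() + fileUri.substring(old.length());
      }
    }
    return fileUri; // not on a replaced volume; left unchanged
  }
}
</code></pre>
</div>
</div>
<div class="paragraph">
<p>For example, applying the value above to a made-up file <code>hdfs://ns2:9001/accumulo/tables/1/t-0001/F0001.rf</code> yields the same path on <code>hdfs://nsB:9001</code>.</p>
</div>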
<div class="paragraph">
<p>Using viewfs or an HA namenode, both introduced in Hadoop 2, offers another option for
managing the fully qualified URIs stored in Accumulo. Viewfs and HA namenode
both introduce a level of indirection in the Hadoop configuration. For
example, assume <code>viewfs:///nn1</code> maps to <code>hdfs://nn1</code> in the Hadoop configuration.
If <code>viewfs:///nn1</code> is used by Accumulo, then it&#8217;s easy to map <code>viewfs:///nn1</code> to
<code>hdfs://nnA</code> by changing the Hadoop configuration without doing anything to Accumulo.
A production system should probably use an HA namenode. Viewfs may be useful on
a test system with a single non-HA namenode.</p>
</div>
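<div class="paragraph">
<p>The indirection can be illustrated with a toy mount table in Java. This is not Hadoop's <code>ViewFileSystem</code> API; it only models the idea that Accumulo keeps storing <code>viewfs:///nn1/...</code> while the Hadoop-side link for <code>/nn1</code> changes from <code>hdfs://nn1</code> to <code>hdfs://nnA</code>. Class and method names are hypothetical.</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="highlight"><code class="java language-java">import java.util.HashMap;
import java.util.Map;

// Toy model of a viewfs-style mount table; NOT Hadoop's ViewFileSystem API.
final class ToyMountTable {
  private final Map&lt;String, String&gt; links = new HashMap&lt;&gt;();

  ToyMountTable link(String mountPoint, String target) {
    links.put(mountPoint, target);
    return this;
  }

  String resolve(String viewfsUri) {
    String scheme = "viewfs://";
    if (!viewfsUri.startsWith(scheme)) {
      return viewfsUri; // not a viewfs URI; nothing to resolve
    }
    // "viewfs:///nn1/accumulo" leaves the path "/nn1/accumulo"
    String path = viewfsUri.substring(scheme.length());
    for (Map.Entry&lt;String, String&gt; e : links.entrySet()) {
      String mountPoint = e.getKey();
      if (path.equals(mountPoint) || path.startsWith(mountPoint + "/")) {
        return e.getValue() + path.substring(mountPoint.length());
      }
    }
    throw new IllegalArgumentException("no mount point for " + viewfsUri);
  }
}
</code></pre>
</div>
</div>
<div class="paragraph">
<p>The same stored URI <code>viewfs:///nn1/accumulo</code> resolves to <code>hdfs://nn1/accumulo</code> or <code>hdfs://nnA/accumulo</code> depending only on the link, which is the point of the indirection.</p>
</div>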
<div class="paragraph">
<p>You may also want to configure your cluster to use Federation,
available in Hadoop 2.0, which allows DataNodes to respond to multiple
NameNode servers, so you do not have to partition your DataNodes by
NameNode.</p>
</div>
</div>
</div>
<div class="sect1">
<h2 id="_troubleshooting">15. Troubleshooting</h2>
<div class="sectionbody">
<div class="sect2">
<h3 id="_logs">15.1. Logs</h3>
<div class="paragraph">
<p><strong>Q</strong>: The tablet server does not seem to be running!? What happened?</p>
</div>
<div class="paragraph">
<p>Accumulo is a distributed system. It is supposed to run on remote
equipment, across hundreds of computers. Each program that runs on
these remote computers writes down events as they occur into a local
file. The directory for these files is defined in
<code>$ACCUMULO_HOME/conf/accumulo-env.sh</code> as <code>ACCUMULO_LOG_DIR</code>.</p>
</div>
<div class="paragraph">
<p><strong>A</strong>: Look in the <code>$ACCUMULO_LOG_DIR/tserver*.log</code> file. Specifically, check the end of the file.</p>
</div>
<div class="paragraph">
<p><strong>Q</strong>: The tablet server did not start and the debug log does not exist! What happened?</p>
</div>
<div class="paragraph">
<p>When the individual programs are started, the stdout and stderr output
of these programs is stored in <code>.out</code> and <code>.err</code> files in
<code>$ACCUMULO_LOG_DIR</code>. Often, when there are missing configuration
options, files or permissions, messages will be left in these files.</p>
</div>
<div class="paragraph">
<p><strong>A</strong>: Probably a start-up problem. Look in <code>$ACCUMULO_LOG_DIR/tserver*.err</code>.</p>
</div>
</div>
<div class="sect2">
<h3 id="_monitor_2">15.2. Monitor</h3>
<div class="paragraph">
<p><strong>Q</strong>: Accumulo is not working, what&#8217;s wrong?</p>
</div>
<div class="paragraph">
<p>The monitor is a small web server that collects information about all the
components that make up a running Accumulo instance. It will highlight
unusual or unexpected conditions.</p>
</div>
<div class="paragraph">
<p><strong>A</strong>: Point your browser to the monitor (typically the master host, on port 50095). Is anything red or yellow?</p>
</div>
<div class="paragraph">
<p><strong>Q</strong>: My browser is reporting connection refused, and I cannot get to the monitor.</p>
</div>
<div class="paragraph">
<p>The monitor program&#8217;s output is also written to .err and .out files in
<code>$ACCUMULO_LOG_DIR</code>. Look for problems in these files if the
<code>$ACCUMULO_LOG_DIR/monitor*.log</code> file does not exist.</p>
</div>
<div class="paragraph">
<p><strong>A</strong>: The monitor program is probably not running. Check the log files for errors.</p>
</div>
<div class="paragraph">
<p><strong>Q</strong>: My browser hangs trying to talk to the monitor.</p>
</div>
<div class="paragraph">
<p>Your browser needs to be able to reach the monitor program. Often
large clusters are firewalled, or use a VPN for internal
communications. You can use SSH to proxy your browser to the cluster,
or consult with your system administrator to gain access to the server
from your browser.</p>
</div>
<div class="paragraph">
<p>It is sometimes helpful to use a text-only browser to sanity-check the
monitor while on the machine running the monitor:</p>
</div>
<div class="literalblock">
<div class="content">
<pre>$ links http://localhost:50095</pre>
</div>
</div>
<div class="paragraph">
<p><strong>A</strong>: Verify that you are not firewalled from the monitor if it is running on a remote host.</p>
</div>
<div class="paragraph">
<p><strong>Q</strong>: The monitor responds, but there are no numbers for tservers and tables. The summary page says the master is down.</p>
</div>
<div class="paragraph">
<p>The monitor program gathers all the details about the master and the
tablet servers through the master. It will be mostly blank if the
master is down.</p>
</div>
<div class="paragraph">
<p><strong>A</strong>: Check for a running master.</p>
</div>
</div>
<div class="sect2">
<h3 id="_hdfs">15.3. HDFS</h3>
<div class="paragraph">
<p>Accumulo reads from and writes to the Hadoop Distributed File System.
Accumulo needs this file system available at all times for normal operations.</p>
</div>
<div class="paragraph">
<p><strong>Q</strong>: Accumulo is having problems &#8220;getting a block blk_1234567890123.&#8221; How do I fix it?</p>
</div>
<div class="paragraph">
<p>This troubleshooting guide does not cover HDFS, but in general, you
want to make sure that all the datanodes are running and an fsck check
finds the file system clean:</p>
</div>
<div class="literalblock">
<div class="content">
<pre>$ hadoop fsck /accumulo</pre>
</div>
</div>
<div class="paragraph">
<p>You can use:</p>
</div>
<div class="literalblock">
<div class="content">
<pre>$ hadoop fsck /accumulo/path/to/corrupt/file -locations -blocks -files</pre>
</div>
</div>
<div class="paragraph">
<p>to locate the block references of individual corrupt files. Use those
references to search the NameNode and individual DataNode logs to determine which
servers those blocks have been assigned to, and then try to fix any underlying file
system issues on those nodes.</p>
</div>
<div class="paragraph">
<p>On a larger cluster, you may need to increase the number of Xcievers for HDFS DataNodes:</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="highlight"><code class="xml language-xml">&lt;property&gt;
&lt;name&gt;dfs.datanode.max.xcievers&lt;/name&gt;
&lt;value&gt;4096&lt;/value&gt;
&lt;/property&gt;</code></pre>
</div>
</div>
<div class="paragraph">
<p><strong>A</strong>: Verify HDFS is healthy, check the datanode logs.</p>
</div>
</div>
<div class="sect2">
<h3 id="_zookeeper">15.4. Zookeeper</h3>
<div class="paragraph">
<p><strong>Q</strong>: <code>accumulo init</code> is hanging. It says something about talking to zookeeper.</p>
</div>
<div class="paragraph">
<p>Zookeeper is also a distributed service. You will need to ensure that
it is up. You can run the zookeeper command line tool to connect to
any one of the zookeeper servers:</p>
</div>
<div class="literalblock">
<div class="content">
<pre>$ zkCli.sh -server zoohost
...
[zk: zoohost:2181(CONNECTED) 0]</pre>
</div>
</div>
<div class="paragraph">
<p>It is important to see the word <code>CONNECTED</code>! If you only see
<code>CONNECTING</code> you will need to diagnose zookeeper errors.</p>
</div>
<div class="paragraph">
<p><strong>A</strong>: Check to make sure that zookeeper is up, and that
<code>$ACCUMULO_HOME/conf/accumulo-site.xml</code> has been pointed to
your zookeeper server(s) through the <code>instance.zookeeper.host</code> property.</p>
</div>
<div class="paragraph">
<p><strong>Q</strong>: Zookeeper is running, but it does not say <code>CONNECTED</code>.</p>
</div>
<div class="paragraph">
<p>Zookeeper processes talk to each other to elect a leader. All updates
go through the leader and propagate to a majority of all the other
nodes. If a majority of the nodes cannot be reached, zookeeper will
not allow updates. Zookeeper also limits the number of connections to a
server from any other single host (the <code>maxClientCnxns</code> setting). By default, this limit can be as small as 10
and can be reached in some everything-on-one-machine test configurations.</p>
</div>
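<div class="paragraph">
<p>The majority rule above is simple arithmetic, which the following Java sketch spells out. The class and method names are hypothetical; this is not part of the ZooKeeper API.</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="highlight"><code class="java language-java">// Majority arithmetic for a ZooKeeper ensemble; a hypothetical helper, not a ZooKeeper API.
final class Quorum {
  /** True when the reachable servers form a strict majority of the ensemble. */
  static boolean hasQuorum(int reachable, int ensembleSize) {
    if (ensembleSize &lt;= 0 || reachable &lt; 0 || reachable &gt; ensembleSize) {
      throw new IllegalArgumentException("invalid server counts");
    }
    // Integer division: a 3-server ensemble needs 2, a 4-server ensemble needs 3
    return reachable &gt; ensembleSize / 2;
  }

  /** How many servers can be lost while a majority remains. */
  static int toleratedFailures(int ensembleSize) {
    return (ensembleSize - 1) / 2;
  }
}
</code></pre>
</div>
</div>
<div class="paragraph">
<p><code>toleratedFailures(3)</code> and <code>toleratedFailures(4)</code> both return 1, which is why ensembles are usually sized with an odd number of servers.</p>
</div>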
<div class="paragraph">
<p>You can check the election status and connection status of clients by
asking the zookeeper nodes for their status. You connect to zookeeper
and ask it with the four-letter <code>stat</code> command:</p>
</div>
<div class="listingblock">
<div class="content">
<pre>$ nc zoohost 2181
stat
Zookeeper version: 3.4.5-1392090, built on 09/30/2012 17:52 GMT
Clients:
/127.0.0.1:58289[0](queued=0,recved=1,sent=0)
/127.0.0.1:60231[1](queued=0,recved=53910,sent=53915)
Latency min/avg/max: 0/5/3008
Received: 1561459
Sent: 1561592
Connections: 2
Outstanding: 0
Zxid: 0x621a3b
Mode: standalone
Node count: 22524</pre>
</div>
</div>
<div class="paragraph">
<p><strong>A</strong>: Check zookeeper status, verify that it has a quorum, and has not exceeded maxClientCnxns.</p>
</div>
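<div class="paragraph">
<p>If you would rather collect this from a monitoring script than with <code>nc</code>, the same request can be made from Java. This sketch assumes the four-letter <code>stat</code> command is enabled on the server (newer ZooKeeper releases may require listing it in <code>4lw.commands.whitelist</code>), needs Java 9 or later for <code>InputStream.readAllBytes</code>, and simply picks <code>Name: value</code> lines out of the response. Class and method names are hypothetical.</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="highlight"><code class="java language-java">import java.io.IOException;
import java.io.OutputStream;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

// Hypothetical helper mirroring the "nc zoohost 2181" / "stat" session above.
final class ZkStat {
  /** Sends the four-letter "stat" command and returns the raw text response. */
  static String fetchStat(String host, int port) throws IOException {
    try (Socket socket = new Socket(host, port)) {
      socket.setSoTimeout(5000); // fail instead of hanging forever
      OutputStream out = socket.getOutputStream();
      out.write("stat".getBytes(StandardCharsets.US_ASCII));
      out.flush();
      // The server closes the connection after replying, so read until end of stream
      return new String(socket.getInputStream().readAllBytes(), StandardCharsets.US_ASCII);
    }
  }

  /** Returns the value of a "Name: value" line such as "Mode: standalone", or null. */
  static String field(String statOutput, String name) {
    for (String line : statOutput.split("\\R")) {
      if (line.startsWith(name + ":")) {
        return line.substring(name.length() + 1).trim();
      }
    }
    return null;
  }
}
</code></pre>
</div>
</div>
<div class="paragraph">
<p>For the server shown above, <code>field(fetchStat("zoohost", 2181), "Mode")</code> would return <code>standalone</code>, and the <code>Connections</code> field can be compared against your <code>maxClientCnxns</code> setting.</p>
</div>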
<div class="paragraph">
<p><strong>Q</strong>: My tablet server crashed! The logs say that it lost its zookeeper lock.</p>
</div>
<div class="paragraph">
<p>Tablet servers reserve a lock in zookeeper to maintain their ownership
over the tablets that have been assigned to them. Part of their
responsibility for keeping the lock is to send zookeeper a keep-alive
message periodically. If the tablet server fails to send a message in
a timely fashion, zookeeper will remove the lock and notify the tablet
server. If the tablet server does not receive a message from
zookeeper, it will assume its lock has been lost, too. If a tablet
server loses its lock, it kills itself, because the rest of the system already assumes it is
dead.</p>
</div>
<div class="paragraph">
<p><strong>A</strong>: Investigate why the tablet server did not send a timely message to
zookeeper.</p>
</div>
<div class="sect3">
<h4 id="_keeping_the_tablet_server_lock">15.4.1. Keeping the tablet server lock</h4>
<div class="paragraph">
<p><strong>Q</strong>: My tablet server lost its lock. Why?</p>
</div>
<div class="paragraph">
<p>The primary reason a tablet server loses its lock is that it has been pushed into swap.</p>
</div>
<div class="paragraph">
<p>A large Java program (like the tablet server) may have a large portion
of its memory image unused. The operating system will favor pushing
this allocated, but unused, memory into swap so that the memory can be
re-used as a disk buffer. When the Java virtual machine decides to
access this memory, the OS will begin flushing disk buffers to return that
memory to the VM. This can cause the entire process to block long
enough for the zookeeper lock to be lost.</p>
</div>
<div class="paragraph">
<p><strong>A</strong>: Configure your system to reduce the kernel parameter <em>swappiness</em> from the default (60) to zero.</p>
</div>
<div class="paragraph">
<p><strong>Q</strong>: My tablet server lost its lock, and I have already set swappiness to
zero. Why?</p>
</div>
<div class="paragraph">
<p>Be careful not to over-subscribe memory. This can be easy to do if
your Accumulo processes run on the same nodes as Hadoop&#8217;s MapReduce
framework. Remember to add up:</p>
</div>
<div class="ulist">
<ul>
<li>
<p>size of the JVM for the tablet server</p>
</li>
<li>
<p>size of the in-memory map, if using the native map implementation</p>
</li>
<li>
<p>size of the JVM for the data node</p>
</li>
<li>
<p>size of the JVM for the task tracker</p>
</li>
<li>
<p>size of each map or reduce task JVM times the maximum number of mappers and reducers</p>
</li>
<li>
<p>size of the kernel and any support processes</p>
</li>
</ul>
</div>
<div class="paragraph">
<p>If a 16G node can run 2 mappers and 2 reducers, and each can be 2G,
then there is only 8G for the data node, tserver, task tracker and OS.</p>
</div>
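<div class="paragraph">
<p>The arithmetic in this example can be written down as a quick check. In this Java sketch the node size, task counts and per-task heap come from the example; any per-component sizes you pass to <code>fits</code> are your own estimates, and the helper names are hypothetical.</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="highlight"><code class="java language-java">// Back-of-the-envelope memory budget; hypothetical helper, all sizes in GB.
final class MemoryBudget {
  /** Memory left for the data node, tserver, task tracker and OS after the task JVMs. */
  static int remainingGb(int nodeGb, int mappers, int reducers, int taskJvmGb) {
    return nodeGb - (mappers + reducers) * taskJvmGb;
  }

  /** True if the given components, plus some headroom, fit within the available memory. */
  static boolean fits(int availableGb, int headroomGb, int... componentGb) {
    int total = headroomGb;
    for (int c : componentGb) {
      total += c;
    }
    return total &lt;= availableGb;
  }
}
</code></pre>
</div>
</div>
<div class="paragraph">
<p><code>remainingGb(16, 2, 2, 2)</code> returns 8, matching the example. A hypothetical split of a 3G tserver JVM, 1G native map, 1G data node, 1G task tracker and 1G for the kernel, plus 1G of headroom, exactly fits: <code>fits(8, 1, 3, 1, 1, 1, 1)</code> is true.</p>
</div>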
<div class="paragraph">
<p><strong>A</strong>: Reduce the memory footprint of each component until it fits comfortably.</p>
</div>
<div class="paragraph">
<p><strong>Q</strong>: My tablet server lost its lock, swappiness is zero, and my node has lots of unused memory!</p>
</div>
<div class="paragraph">
<p>The JVM memory garbage collector may fall behind and cause a
"stop-the-world" garbage collection. On a JVM with a large heap,
this collection can take a long time. This happens more
frequently when the JVM is getting low on free memory. Check the logs
of the tablet server. You will see lines like this:</p>
</div>
<div class="literalblock">
<div class="content">
<pre>2013-06-20 13:43:20,607 [tabletserver.TabletServer] DEBUG: gc ParNew=0.00(+0.00) secs
ConcurrentMarkSweep=0.00(+0.00) secs freemem=1,868,325,952(+1,868,325,952) totalmem=2,040,135,680</pre>
</div>
</div>
<div class="paragraph">
<p>When <code>freemem</code> becomes small relative to the amount of memory
needed, the JVM will spend more time finding free memory than
performing work. This can cause long delays in sending keep-alive
messages to zookeeper.</p>
</div>
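<div class="paragraph">
<p>To watch this trend over time, the two numbers can be pulled out of the debug line. The Java sketch below assumes the line format shown above (comma-grouped byte counts after <code>freemem=</code> and <code>totalmem=</code>); what free fraction counts as too low for your workload is your call, and the helper names are hypothetical.</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="highlight"><code class="java language-java">import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper that reads the tserver "gc" debug line shown above.
final class GcLine {
  private static final Pattern FREE = Pattern.compile("freemem=([0-9,]+)");
  private static final Pattern TOTAL = Pattern.compile("totalmem=([0-9,]+)");

  private static long bytes(Pattern p, String line) {
    Matcher m = p.matcher(line);
    if (!m.find()) {
      throw new IllegalArgumentException("pattern " + p + " not found in: " + line);
    }
    // Drop the thousands separators before parsing
    return Long.parseLong(m.group(1).replace(",", ""));
  }

  static long freeBytes(String line) {
    return bytes(FREE, line);
  }

  static long totalBytes(String line) {
    return bytes(TOTAL, line);
  }

  /** Free memory as a fraction of totalmem reported on the same line. */
  static double freeFraction(String line) {
    return (double) freeBytes(line) / totalBytes(line);
  }
}
</code></pre>
</div>
</div>
<div class="paragraph">
<p>For the sample line above, <code>freeFraction</code> is about 0.92 (1,868,325,952 / 2,040,135,680), so that tablet server was far from running out of memory.</p>
</div>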
<div class="paragraph">
<p><strong>A</strong>: Ensure the tablet server JVM is not running low on memory.</p>
</div>
</div>
</div>
<div class="sect2">
<h3 id="_tools">15.5. Tools</h3>
<div class="paragraph">
<p>The accumulo script can be used to run classes from the command line.
This section shows how a few of the utilities work, but there are many
more.</p>
</div>
<div class="paragraph">
<p>There&#8217;s a class that will examine an Accumulo storage file (RFile) and print
out basic metadata.</p>
</div>
<div class="listingblock">
<div class="content">
<pre>$ ./bin/accumulo org.apache.accumulo.core.file.rfile.PrintInfo /accumulo/tables/1/default_tablet/A000000n.rf
2013-07-16 08:17:14,778 [util.NativeCodeLoader] INFO : Loaded the native-hadoop library
Locality group : &lt;DEFAULT&gt;
Start block : 0
Num blocks : 1
Index level 0 : 62 bytes 1 blocks
First key : 288be9ab4052fe9e span:34078a86a723e5d3:3da450f02108ced5 [] 1373373521623 false
Last key : start:13fc375709e id:615f5ee2dd822d7a [] 1373373821660 false
Num entries : 466
Column families : [waitForCommits, start, md major compactor 1, md major compactor 2, md major compactor 3,
bringOnline, prep, md major compactor 4, md major compactor 5, md root major compactor 3,
minorCompaction, wal, compactFiles, md root major compactor 4, md root major compactor 1,
md root major compactor 2, compact, id, client:update, span, update, commit, write,
majorCompaction]
Meta block : BCFile.index
Raw size : 4 bytes
Compressed size : 12 bytes
Compression type : gz
Meta block : RFile.index
Raw size : 780 bytes
Compressed size : 344 bytes
Compression type : gz</pre>
</div>
</div>
<div class="paragraph">
<p>When trying to diagnose problems related to key size, the <code>PrintInfo</code> tool can provide a histogram of the individual key sizes:</p>
</div>
<div class="literalblock">
<div class="content">
<pre>$ ./bin/accumulo org.apache.accumulo.core.file.rfile.PrintInfo --histogram /accumulo/tables/1/default_tablet/A000000n.rf
...
Up to size count %-age
10 : 222 28.23%
100 : 244 71.77%
1000 : 0 0.00%
10000 : 0 0.00%
100000 : 0 0.00%
1000000 : 0 0.00%
10000000 : 0 0.00%
100000000 : 0 0.00%
1000000000 : 0 0.00%
10000000000 : 0 0.00%</pre>
</div>
</div>
<div class="paragraph">
<p>Likewise, <code>PrintInfo</code> will dump the key-value pairs and show you the contents of the RFile:</p>
</div>
<div class="literalblock">
<div class="content">
<pre>$ ./bin/accumulo org.apache.accumulo.core.file.rfile.PrintInfo --dump /accumulo/tables/1/default_tablet/A000000n.rf
row columnFamily:columnQualifier [visibility] timestamp deleteFlag -&gt; Value
...</pre>
</div>
</div>
<div class="paragraph">
<p><strong>Q</strong>: Accumulo is not showing me any data!</p>
</div>
<div class="paragraph">
<p><strong>A</strong>: Do you have your auths set so that they match your visibilities?</p>
</div>
<div class="paragraph">
<p><strong>Q</strong>: What are my visibilities?</p>
</div>
<div class="paragraph">
<p><strong>A</strong>: Use <code>PrintInfo</code> on a representative file to get some idea of the visibilities in the underlying data.</p>
</div>
<div class="paragraph">
<p>Note that <code>PrintInfo</code> is an administrative tool and can only
be used by someone who can access the underlying Accumulo data files. It
does not enforce the normal access controls in Accumulo.</p>
</div>
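<div class="paragraph">
<p>To make &#8220;matches&#8221; concrete: a key is visible when the user&#8217;s authorizations satisfy its visibility expression. The Java sketch below evaluates a simplified grammar (unquoted labels combined with <code>&amp;</code>, <code>|</code> and parentheses). It is an illustration only, not Accumulo&#8217;s <code>ColumnVisibility</code>/<code>VisibilityEvaluator</code> implementation, and the class name is hypothetical.</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="highlight"><code class="java language-java">import java.util.Set;

// Simplified visibility evaluator for illustration; not Accumulo's parser.
// Grammar: expr := term (op term)*, term := '(' expr ')' | label, op := '&amp;' | '|'
// Like Accumulo, it rejects mixing '&amp;' and '|' at one level without parentheses.
// Quoted labels are not supported.
final class VisibilityCheck {
  private final String expr;
  private final Set&lt;String&gt; auths;
  private int pos;

  private VisibilityCheck(String expr, Set&lt;String&gt; auths) {
    this.expr = expr;
    this.auths = auths;
  }

  static boolean canSee(String expr, Set&lt;String&gt; auths) {
    if (expr.isEmpty()) {
      return true; // an empty visibility is readable by everyone
    }
    VisibilityCheck parser = new VisibilityCheck(expr, auths);
    boolean result = parser.parseExpr();
    if (parser.pos != expr.length()) {
      throw new IllegalArgumentException("unexpected character at " + parser.pos);
    }
    return result;
  }

  private boolean parseExpr() {
    boolean value = parseTerm();
    char op = 0;
    while (pos &lt; expr.length() &amp;&amp; (expr.charAt(pos) == '&amp;' || expr.charAt(pos) == '|')) {
      char c = expr.charAt(pos++);
      if (op != 0 &amp;&amp; c != op) {
        throw new IllegalArgumentException("mixing operators requires parentheses");
      }
      op = c;
      boolean rhs = parseTerm(); // always parsed, so the rest of the expression is validated
      value = (op == '&amp;') ? (value &amp;&amp; rhs) : (value || rhs);
    }
    return value;
  }

  private boolean parseTerm() {
    if (pos &lt; expr.length() &amp;&amp; expr.charAt(pos) == '(') {
      pos++;
      boolean value = parseExpr();
      if (pos &gt;= expr.length() || expr.charAt(pos) != ')') {
        throw new IllegalArgumentException("missing ')'");
      }
      pos++;
      return value;
    }
    int start = pos;
    while (pos &lt; expr.length() &amp;&amp; isLabelChar(expr.charAt(pos))) {
      pos++;
    }
    if (start == pos) {
      throw new IllegalArgumentException("expected a label at " + start);
    }
    return auths.contains(expr.substring(start, pos));
  }

  // Unquoted labels: ASCII letters, digits and _ - : . /
  private static boolean isLabelChar(char c) {
    return (c &gt;= 'a' &amp;&amp; c &lt;= 'z') || (c &gt;= 'A' &amp;&amp; c &lt;= 'Z')
        || (c &gt;= '0' &amp;&amp; c &lt;= '9') || "_-:./".indexOf(c) &gt;= 0;
  }
}
</code></pre>
</div>
</div>
<div class="paragraph">
<p>For instance, a user holding <code>B</code> and <code>C</code> can see a key labeled <code>(A|B)&amp;C</code>, while a user holding only <code>A</code> cannot see a key labeled <code>A&amp;B</code>.</p>
</div>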
<div class="paragraph">
<p>If you would like to back up, or otherwise examine, the contents of Zookeeper, there are commands to dump and load to/from XML.</p>
</div>
<div class="literalblock">
<div class="content">
<pre>$ ./bin/accumulo org.apache.accumulo.server.util.DumpZookeeper --root /accumulo &gt;dump.xml
$ ./bin/accumulo org.apache.accumulo.server.util.RestoreZookeeper --overwrite &lt; dump.xml</pre>
</div>
</div>
<div class="paragraph">
<p><strong>Q</strong>: How can I get the information in the monitor page for my cluster monitoring system?</p>
</div>
<div class="paragraph">
<p><strong>A</strong>: Use GetMasterStats:</p>
</div>
<div class="literalblock">
<div class="content">
<pre>$ ./bin/accumulo org.apache.accumulo.test.GetMasterStats | grep Load
OS Load Average: 0.27</pre>
</div>
</div>
<div class="paragraph">
<p><strong>Q</strong>: The monitor page is showing an offline tablet. How can I find out which tablet it is?</p>
</div>
<div class="paragraph">
<p><strong>A</strong>: Use FindOfflineTablets:</p>
</div>
<div class="literalblock">
<div class="content">
<pre>$ ./bin/accumulo org.apache.accumulo.server.util.FindOfflineTablets
2&lt;&lt;@(null,null,localhost:9997) is UNASSIGNED #walogs:2</pre>
</div>
</div>
<div class="paragraph">
<p>Here&#8217;s what the output means:</p>
</div>
<div class="dlist">
<dl>
<dt class="hdlist1"><code>2&lt;&lt;</code></dt>
<dd>
<p>This is the tablet from (-inf, +inf) for the
table with id 2. The command <code>tables -l</code> in the shell will show table ids for
tables.</p>
</dd>
<dt class="hdlist1"><code>@(null, null, localhost:9997)</code></dt>
<dd>
<p>Location information. The
format is <code>@(assigned, hosted, last)</code>. In this case, the
tablet has not been assigned, is not hosted anywhere, and was once
hosted on localhost.</p>
</dd>
<dt class="hdlist1"><code>#walogs:2</code></dt>
<dd>
<p>The number of write-ahead logs that this tablet requires for recovery.</p>
</dd>
</dl>
</div>
<div class="paragraph">
<p>An unassigned tablet with write-ahead logs is probably waiting for
logs to be sorted for efficient recovery.</p>
</div>
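<div class="paragraph">
<p>If you want to feed this output into a monitoring script, a line can be split into the fields described above. The Java sketch below assumes the line shape of the single example shown (extent, then <code>@(assigned,hosted,last)</code>, then <code>is STATE #walogs:n</code>); other output variations are not handled, and the class name is hypothetical.</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="highlight"><code class="java language-java">import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical parser for one FindOfflineTablets output line; the shape is assumed
// from the example above: extent@(assigned,hosted,last) is STATE #walogs:n
final class OfflineTablet {
  private static final Pattern LINE = Pattern.compile(
      "^(.+?)@\\(([^,]*),\\s*([^,]*),\\s*([^)]*)\\) is (\\S+) #walogs:(\\d+)$");

  final String extent;
  final String assigned; // null when the output says "null"
  final String hosted;
  final String last;
  final String state;
  final int walogs;

  private OfflineTablet(Matcher m) {
    extent = m.group(1);
    assigned = orNull(m.group(2));
    hosted = orNull(m.group(3));
    last = orNull(m.group(4));
    state = m.group(5);
    walogs = Integer.parseInt(m.group(6));
  }

  static OfflineTablet parse(String line) {
    Matcher m = LINE.matcher(line.trim());
    if (!m.matches()) {
      throw new IllegalArgumentException("unrecognized line: " + line);
    }
    return new OfflineTablet(m);
  }

  private static String orNull(String s) {
    String t = s.trim();
    return "null".equals(t) ? null : t;
  }

  /** Unassigned, not hosted, with write-ahead logs: probably waiting on log sorting. */
  boolean likelyWaitingOnRecovery() {
    return assigned == null &amp;&amp; hosted == null &amp;&amp; walogs &gt; 0;
  }
}
</code></pre>
</div>
</div>
<div class="paragraph">
<p>For the example line, the parsed extent is <code>2&lt;&lt;</code>, the last location is <code>localhost:9997</code>, and <code>likelyWaitingOnRecovery()</code> is true.</p>
</div>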
<div class="paragraph">
<p><strong>Q</strong>: How can I be sure that the metadata tables are up and consistent?</p>
</div>
<div class="paragraph">
<p><strong>A</strong>: <code>CheckForMetadataProblems</code> will verify that the start and end rows of
adjacent tablets match, and that the start of the first tablet and the end of the last tablet in each table are empty:</p>
</div>
<div class="literalblock">
<div class="content">
<pre>$ ./bin/accumulo org.apache.accumulo.server.util.CheckForMetadataProblems -u root --password
Enter the connection password:
All is well for table !0
All is well for table 1</pre>
</div>
</div>
<div class="paragraph">
<p><strong>Q</strong>: My Hadoop cluster has lost a file due to a NameNode failure. How can I remove the file?</p>
</div>
<div class="paragraph">
<p><strong>A</strong>: There&#8217;s a utility that will check every file reference and ensure
that the file exists in HDFS. Optionally, it will remove the
reference:</p>
</div>
<div class="literalblock">
<div class="content">
<pre>$ ./bin/accumulo org.apache.accumulo.server.util.RemoveEntriesForMissingFiles -u root --password
Enter the connection password:
2013-07-16 13:10:57,293 [util.RemoveEntriesForMissingFiles] INFO : File /accumulo/tables/2/default_tablet/F0000005.rf
is missing
2013-07-16 13:10:57,296 [util.RemoveEntriesForMissingFiles] INFO : 1 files of 3 missing</pre>
</div>
</div>
<div class="paragraph">
<p><strong>Q</strong>: I have many entries in zookeeper for old instances I no longer need. How can I remove them?</p>
</div>
<div class="paragraph">
<p><strong>A</strong>: Use CleanZookeeper:</p>
</div>
<div class="literalblock">
<div class="content">
<pre>$ ./bin/accumulo org.apache.accumulo.server.util.CleanZookeeper</pre>
</div>
</div>
<div class="paragraph">
<p>This command will not delete the instance pointed to by the local <code>conf/accumulo-site.xml</code> file.</p>
</div>
<div class="paragraph">
<p><strong>Q</strong>: I need to decommission a node. How do I stop the tablet server on it?</p>
</div>
<div class="paragraph">
<p><strong>A</strong>: Use the admin command:</p>
</div>
<div class="literalblock">
<div class="content">
<pre>$ ./bin/accumulo admin stop hostname:9997
2013-07-16 13:15:38,403 [util.Admin] INFO : Stopping server 12.34.56.78:9997</pre>
</div>
</div>
<div class="paragraph">
<p><strong>Q</strong>: I cannot log in to a tablet server host, and the tablet server will not shut down. How can I kill the server?</p>
</div>
<div class="paragraph">
<p><strong>A</strong>: Sometimes you can kill a "stuck" tablet server by deleting its lock in zookeeper:</p>
</div>
<div class="literalblock">
<div class="content">
<pre>$ ./bin/accumulo org.apache.accumulo.server.util.TabletServerLocks -list
127.0.0.1:9997 TSERV_CLIENT=127.0.0.1:9997
$ ./bin/accumulo org.apache.accumulo.server.util.TabletServerLocks -delete 127.0.0.1:9997
$ ./bin/accumulo org.apache.accumulo.server.util.TabletServerLocks -list
127.0.0.1:9997 null</pre>
</div>
</div>
<div class="paragraph">
<p>You can find the master and instance ID for all Accumulo instances that use the same ZooKeeper ensemble:</p>
</div>
<div class="listingblock">
<div class="content">
<pre>$ ./bin/accumulo org.apache.accumulo.server.util.ListInstances
INFO : Using ZooKeepers localhost:2181
Instance Name | Instance ID | Master
---------------------+--------------------------------------+-------------------------------
"test" | 6140b72e-edd8-4126-b2f5-e74a8bbe323b | 127.0.0.1:9999</pre>
</div>
</div>
</div>
<div class="sect2">
<h3 id="metadata">15.6. System Metadata Tables</h3>
<div class="paragraph">
<p>Accumulo tracks information about tables in metadata tables. The metadata for
most tables is contained within the metadata table in the accumulo namespace,
while metadata for that table is contained in the root table in the accumulo
namespace. The root table is composed of a single tablet, which does not
split, so it is also called the root tablet. Information about the root
table, such as its location and write-ahead logs, is stored in ZooKeeper.</p>
</div>
<div class="paragraph">
<p>Let&#8217;s create a table and put some data into it:</p>
</div>
<div class="listingblock">
<div class="content">
<pre>shell&gt; createtable test
shell&gt; tables -l
accumulo.metadata =&gt; !0
accumulo.root =&gt; +r
test =&gt; 2
trace =&gt; 1
shell&gt; insert a b c d
shell&gt; flush -w</pre>
</div>
</div>
<div class="paragraph">
<p>Now let&#8217;s take a look at the metadata for this table:</p>
</div>
<div class="literalblock">
<div class="content">
<pre>shell&gt; table accumulo.metadata
shell&gt; scan -b 3; -e 3&lt;
3&lt; file:/default_tablet/F000009y.rf [] 186,1
3&lt; last:13fe86cd27101e5 [] 127.0.0.1:9997
3&lt; loc:13fe86cd27101e5 [] 127.0.0.1:9997
3&lt; log:127.0.0.1+9997/0cb7ce52-ac46-4bf7-ae1d-acdcfaa97995 [] 127.0.0.1+9997/0cb7ce52-ac46-4bf7-ae1d-acdcfaa97995|6
3&lt; srv:dir [] /default_tablet
3&lt; srv:flush [] 1
3&lt; srv:lock [] tservers/127.0.0.1:9997/zlock-0000000001$13fe86cd27101e5
3&lt; srv:time [] M1373998392323
3&lt; ~tab:~pr [] \x00</pre>
</div>
</div>
<div class="paragraph">
<p>Let&#8217;s decode this little session:</p>
</div>
<div class="dlist">
<dl>
<dt class="hdlist1"><code>scan -b 3; -e 3&lt;</code></dt>
<dd>
<p>Every tablet gets its own row. Every row starts with the table id, followed by either
<code>;</code> and the end row split point for that tablet, or <code>&lt;</code> for the last tablet of the table, which has no end row.</p>
</dd>
<dt class="hdlist1"><code>file:/default_tablet/F000009y.rf [] 186,1</code></dt>
<dd>
<p>File entry for this tablet. This tablet contains a single file reference. The
file is <code>/accumulo/tables/3/default_tablet/F000009y.rf</code>. It contains 1
key/value pair, and is 186 bytes long.</p>
</dd>
<dt class="hdlist1"><code>last:13fe86cd27101e5 [] 127.0.0.1:9997</code></dt>
<dd>
<p>Last location for this tablet. It was last held on 127.0.0.1:9997, and the
unique tablet server lock data was <code>13fe86cd27101e5</code>. The default balancer
will tend to put tablets back on their last location.</p>
</dd>
<dt class="hdlist1"><code>loc:13fe86cd27101e5 [] 127.0.0.1:9997</code></dt>
<dd>
<p>The current location of this tablet.</p>
</dd>
<dt class="hdlist1"><code>log:127.0.0.1+9997/0cb7ce52-ac46-4bf7-ae1d-acdcfaa97995 [] 127.0. &#8230;</code></dt>
<dd>
<p>This tablet has a reference to a single write-ahead log. This file can be found in
<code>/accumulo/wal/127.0.0.1+9997/0cb7ce52-ac46-4bf7-ae1d-acdcfaa97995</code>. The value
of this entry could refer to multiple files. This tablet&#8217;s data is encoded as
<code>6</code> within the log.</p>
</dd>
<dt class="hdlist1"><code>srv:dir [] /default_tablet</code></dt>
<dd>
<p>Files written for this tablet will be placed into
<code>/accumulo/tables/3/default_tablet</code>.</p>
</dd>
<dt class="hdlist1"><code>srv:flush [] 1</code></dt>
<dd>
<p>Flush id. This tablet has successfully completed the flush with the id of <code>1</code>.</p>
</dd>
<dt class="hdlist1"><code>srv:lock [] tservers/127.0.0.1:9997/zlock-0000000001$13fe86cd27101e5</code></dt>
<dd>
<p>This is the lock information for the tablet server holding the present lock. This
information is checked against zookeeper whenever this entry is updated, which
prevents a metadata update from a tablet server that no longer holds its
lock.</p>
</dd>
<dt class="hdlist1"><code>srv:time [] M1373998392323</code></dt>
<dd>
<p>This indicates the time type (<code>M</code> for milliseconds or <code>L</code> for logical) and the timestamp of the most recently written key in this tablet. It is used to ensure automatically assigned key timestamps are strictly increasing for the tablet, regardless of the tablet server&#8217;s system time.</p>
</dd>
<dt class="hdlist1"><code>~tab:~pr [] \x00</code></dt>
<dd>
<p>The end-row marker for the previous tablet (prev-row). The first byte
indicates the presence of a prev-row. This tablet has the range (-inf, +inf),
so it has no prev-row (or end row).</p>
</dd>
</dl>
</div>
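<div class="paragraph">
<p>The row layout and value encodings described above can be summarized in a small Java sketch. It treats row text as UTF-8 strings for simplicity, which is an assumption; the class and method names are hypothetical, and this is not Accumulo&#8217;s <code>KeyExtent</code> code. It also shows why <code>scan -b 3; -e 3&lt;</code> covers every tablet of table 3: <code>;</code> (0x3B) sorts immediately before <code>&lt;</code> (0x3C).</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="highlight"><code class="java language-java">import java.nio.charset.StandardCharsets;

// Hypothetical helpers for reading the metadata entries shown above; not Accumulo's code.
final class MetadataEntries {
  /** Row for a tablet: tableId + ";" + endRow, or tableId + "&lt;" for the last tablet. */
  static String row(String tableId, String endRow) {
    return endRow == null ? tableId + "&lt;" : tableId + ";" + endRow;
  }

  /** Decodes ~tab:~pr: a leading 0 byte means no prev-row; otherwise the prev-row follows. */
  static String prevRow(byte[] value) {
    if (value.length == 0) {
      throw new IllegalArgumentException("empty prev-row value");
    }
    return value[0] == 0 ? null : new String(value, 1, value.length - 1, StandardCharsets.UTF_8);
  }

  /** Splits a file: value such as "186,1" into {size in bytes, number of entries}. */
  static long[] sizeAndEntries(String fileValue) {
    String[] parts = fileValue.split(",");
    return new long[] {Long.parseLong(parts[0]), Long.parseLong(parts[1])};
  }

  /** Time type of a srv:time value such as "M1373998392323": 'M' millis, 'L' logical. */
  static char timeType(String timeValue) {
    return timeValue.charAt(0);
  }

  /** Timestamp part of a srv:time value. */
  static long timestamp(String timeValue) {
    return Long.parseLong(timeValue.substring(1));
  }
}
</code></pre>
</div>
</div>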
<div class="paragraph">
<p>Besides these columns, you may see:</p>
</div>
<div class="dlist">
<dl>
<dt class="hdlist1"><code>rowId future:zooKeeperID location</code></dt>
<dd>
<p>Tablet has been assigned to a tablet server, but not yet loaded.</p>
</dd>
<dt class="hdlist1"><code>~del:filename</code></dt>
<dd>
<p>When a tablet server is done using a file, it will create a delete marker in the appropriate metadata table, unassociated with any tablet. The garbage collector will remove the marker, and the file, when no other reference to the file exists.</p>
</dd>
<dt class="hdlist1"><code>~blip:txid</code></dt>
<dd>
<p>Bulk-Load In Progress marker.</p>
</dd>
<dt class="hdlist1"><code>rowId loaded:filename</code></dt>
<dd>
<p>A file has been bulk-loaded into this tablet; however, the bulk load has not yet completed on other tablets, so this marker prevents the file from being loaded multiple times.</p>
</dd>
<dt class="hdlist1"><code>rowId !cloned</code></dt>
<dd>
<p>A marker that indicates that this tablet has been successfully cloned.</p>
</dd>
<dt class="hdlist1"><code>rowId splitRatio:ratio</code></dt>
<dd>
<p>A marker that indicates a split is in progress, and the files are being split at the given ratio.</p>
</dd>
<dt class="hdlist1"><code>rowId chopped</code></dt>
<dd>
<p>A marker that indicates that the files in the tablet do not contain keys outside the range of the tablet.</p>
</dd>
<dt class="hdlist1"><code>rowId scan</code></dt>
<dd>
<p>A marker that prevents a file from being removed while there are still active scans using it.</p>
</dd>
</dl>
</div>
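<div class="paragraph">
<p>The rule the garbage collector applies to a <code>~del</code> marker can be sketched as follows. This is a simplified model that assumes the set of referenced files has already been collected from the tablets&#8217; file entries and <code>scan</code> markers. <code>GcSketch</code> and <code>canDelete</code> are hypothetical names, and the real collector does considerably more work.</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="highlight"><code class="java language-java">/** Hypothetical model of the garbage collector's check for one ~del marker. */
final class GcSketch {
  /**
   * A delete candidate (the marker and its file) may be removed only when no
   * tablet file entry or scan reference still names the file.
   */
  static boolean canDelete(String candidate, String[] referencedFiles) {
    for (String ref : referencedFiles) {
      if (ref.equals(candidate)) {
        return false; // still in use: keep both the marker and the file
      }
    }
    return true;
  }
}
</code></pre>
</div>
</div>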
</div>
<div class="sect2">
<h3 id="_simple_system_recovery">15.7. Simple System Recovery</h3>
<div class="paragraph">
<p><strong>Q</strong>: One of my Accumulo processes died. How do I bring it back?</p>
</div>
<div class="paragraph">
<p>The easiest way to bring all services online for an Accumulo instance is to run the <code>start-all.sh</code> script.</p>
</div>
<div class="literalblock">
<div class="content">
<pre>$ bin/start-all.sh</pre>
</div>
</div>
<div class="paragraph">
<p>This script checks the process listing, using <code>jps</code> on each host, before attempting to restart a service on that host.
Typically, this check is sufficient, except in the case of a hung or zombie process. For large clusters, it may be
undesirable to ssh to every node in the cluster to ensure that all hosts are running the appropriate processes; in that case, <code>start-here.sh</code> may be of use.</p>
</div>
<div class="literalblock">
<div class="content">
<pre>$ ssh host_with_dead_process
$ bin/start-here.sh</pre>
</div>
</div>
<div class="paragraph">
<p><code>start-here.sh</code> should be invoked on the host that is missing a given process. Like <code>start-all.sh</code>, it will start all
necessary processes that are not currently running, but only on the current host and not cluster-wide. Tools such as <code>pssh</code> or
<code>pdsh</code> can be used to automate this process.</p>
</div>
<div class="paragraph">
<p><code>start-server.sh</code> can also be used to start a process on a given host; however, it is not generally recommended for
users to issue this directly as the <code>start-all.sh</code> and <code>start-here.sh</code> scripts provide the same functionality with
more automation and are less prone to user error.</p>
</div>
<div class="paragraph">
<p><strong>A</strong>: Use <code>start-all.sh</code> or <code>start-here.sh</code>.</p>
</div>
<div class="paragraph">
<p><strong>Q</strong>: My process died again. Should I restart it via <code>cron</code> or tools like <code>supervisord</code>?</p>
</div>
<div class="paragraph">
<p><strong>A</strong>: A repeatedly dying Accumulo process is a sign of a larger problem. Typically these problems are due to a
misconfiguration of Accumulo or over-saturation of resources. Blindly automating the restart of any Accumulo service
is generally undesirable, because it masks the underlying problem instead of addressing it. Accumulo
processes should be stable on the order of months and should not require frequent restarts.</p>
</div>
</div>
<div class="sect2">
<h3 id="_advanced_system_recovery">15.8. Advanced System Recovery</h3>
<div class="sect3">
<h4 id="_hdfs_failure">15.8.1. HDFS Failure</h4>
<div class="paragraph">
<p><strong>Q</strong>: I had a disastrous HDFS failure. After bringing everything back up, several tablets refuse to go online.</p>
</div>
<div class="paragraph">
<p>Data written to tablets is written into memory before being written into indexed files. In case the server
is lost before the data is saved into an indexed file, all data is also written to a
write-ahead log (WAL) before it is applied in memory. When a tablet is re-assigned to a new tablet server, the write-ahead logs are read to
recover any mutations that were in memory when the tablet was last hosted.</p>
</div>
<div class="paragraph">
<p>If a write-ahead log cannot be read, then the tablet is not re-assigned. All it takes is for one of
the blocks in the write-ahead log to be missing. This is unlikely unless multiple data nodes in HDFS have been
lost.</p>
</div>
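<div class="paragraph">
<p>The write-ahead ordering described above can be illustrated with a toy model. It is a minimal sketch that treats a tablet as a sorted map of rows to values. <code>WalSketch</code> is a hypothetical class, not Accumulo code. Because recovery replays every logged mutation in order, a log that cannot be fully read leaves no safe state to rebuild. This is one way to see why such a tablet is not reassigned.</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="highlight"><code class="java language-java">import java.util.List;
import java.util.TreeMap;

/** Toy model of a tablet: every mutation goes to the WAL before memory. */
final class WalSketch {
  private final List&lt;String[]&gt; wal;                         // durable: survives a crash
  final TreeMap&lt;String, String&gt; memory = new TreeMap&lt;&gt;(); // lost if the server dies

  WalSketch(List&lt;String[]&gt; wal) {
    this.wal = wal;
  }

  void write(String row, String value) {
    wal.add(new String[] {row, value}); // 1. append the mutation to the log
    memory.put(row, value);             // 2. only then apply it in memory
  }

  /** A new server rebuilds the in-memory state by replaying the log in order. */
  static WalSketch recover(List&lt;String[]&gt; wal) {
    WalSketch tablet = new WalSketch(wal);
    for (String[] mutation : wal) {
      tablet.memory.put(mutation[0], mutation[1]);
    }
    return tablet;
  }
}
</code></pre>
</div>
</div>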
<div class="paragraph">
<p><strong>A</strong>: Get the WAL files online and healthy. Restore any data nodes that may be down.</p>
</div>
<div class="paragraph">
<p><strong>Q</strong>: How do I find out which tablets are offline?</p>
</div>
<div class="paragraph">
<p><strong>A</strong>: Use <code>accumulo admin checkTablets</code></p>
</div>
<div class="literalblock">
<div class="content">
<pre>$ bin/accumulo admin checkTablets</pre>
</div>
</div>
<div class="paragraph">
<p><strong>Q</strong>: I lost three data nodes, and I&#8217;m missing blocks in a WAL. I don&#8217;t care about data loss, how
can I get those tablets online?</p>
</div>
<div class="paragraph">
<p>See the discussion in <a href="#metadata">System Metadata Tables</a>, which shows a typical metadata table listing.
The entries with a column family of <code>log</code> are references to the WAL for that tablet.
If you know which WAL is bad, you can find all the references with a grep in the shell, with the metadata table (<code>accumulo.metadata</code>) selected as the current table:</p>
</div>
<div class="literalblock">
<div class="content">
<pre>shell&gt; grep 0cb7ce52-ac46-4bf7-ae1d-acdcfaa97995
3&lt; log:127.0.0.1+9997/0cb7ce52-ac46-4bf7-ae1d-acdcfaa97995 [] 127.0.0.1+9997/0cb7ce52-ac46-4bf7-ae1d-acdcfaa97995|6</pre>
</div>
</div>
<div class="paragraph">
<p><strong>A</strong>: You can remove the WAL references in the metadata table.</p>
</div>
<div class="literalblock">
<div class="content">
<pre>shell&gt; grant -u root Table.WRITE -t accumulo.metadata
shell&gt; delete 3&lt; log 127.0.0.1+9997/0cb7ce52-ac46-4bf7-ae1d-acdcfaa97995</pre>
</div>
</div>
<div class="paragraph">
<p>Note: the colon (<code>:</code>) is omitted when specifying the <em>row cf cq</em> for the delete command.</p>
</div>
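<div class="paragraph">
<p>When many tablets reference the same bad WAL, the delete commands can be derived mechanically from the <code>grep</code> output. The following is a minimal sketch. It assumes each output line has the form shown above and that the row, column, and visibility contain no spaces. <code>WalRefDelete</code> is a hypothetical helper.</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="highlight"><code class="java language-java">/** Hypothetical helper: turns one metadata scan line into a shell delete command. */
final class WalRefDelete {
  /**
   * Expects the row, then family:qualifier, then [visibility], then the value,
   * separated by single spaces. Returns "delete row family qualifier".
   */
  static String toDeleteCommand(String scanLine) {
    String[] fields = scanLine.trim().split(" ");
    String row = fields[0];
    String column = fields[1]; // e.g. log:host+port/wal-id
    int colon = column.indexOf(':');
    String family = column.substring(0, colon);
    String qualifier = column.substring(colon + 1);
    // The shell's delete command takes family and qualifier as separate
    // arguments, so the colon is dropped.
    return "delete " + row + " " + family + " " + qualifier;
  }
}
</code></pre>
</div>
</div>
<div class="paragraph">
<p>Applied to the sample line above, it produces <code>delete 3&lt; log 127.0.0.1+9997/0cb7ce52-ac46-4bf7-ae1d-acdcfaa97995</code>, which matches the command shown earlier.</p>
</div>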
<div class="paragraph">
<p>The master will automatically discover the tablet no longer has a bad WAL reference and will
assign the tablet. You will need to remove the reference from all the tablets to get them
online.</p>
</div>
<div class="paragraph">
<p><strong>Q</strong>: The metadata (or root) table has references to a corrupt WAL.</p>
</div>
<div class="paragraph">
<p>This is a much more serious state, since losing updates to the metadata table will result
in references to old files which may not exist, or lost references to new files, resulting
in tablets that cannot be read, or large amounts of data loss.</p>
</div>
<div class="paragraph">
<p>The best hope is to restore the WAL by fixing HDFS data nodes and bringing the data back online.
If this is not possible, the best approach is to re-create the instance and bulk import all files from
the old instance into new tables.</p>
</div>
<div class="paragraph">
<p>A complete set of instructions for doing this is outside the scope of this guide,
but the basic approach is:</p>
</div>
<div class="ulist">
<ul>
<li>
<p>Use <code>tables -l</code> in the shell to discover the table name to table id mapping</p>
</li>
<li>
<p>Stop all accumulo processes on all nodes</p>
</li>
<li>
<p>Move the accumulo directory in HDFS out of the way: <code>$ hadoop fs -mv /accumulo /corrupt</code></p>
</li>
<li>
<p>Re-initialize accumulo</p>
</li>
<li>
<p>Recreate tables, users and permissions</p>
</li>
<li>
<p>Import the directories under <code>/corrupt/tables/&lt;id&gt;</code> into the new instance</p>
</li>
</ul>
</div>
<div class="paragraph">
<p><strong>Q</strong>: One or more HDFS Files under /accumulo/tables are corrupt</p>
</div>
<div class="paragraph">
<p>Accumulo maintains multiple references to the tablet files, both in the metadata
tables and within the tablet server hosting the file. This makes it difficult to
reliably remove those references.</p>
</div>
<div class="paragraph">
<p>The directory structure in HDFS for tables will follow the general structure:</p>
</div>
<div class="literalblock">
<div class="content">
<pre>/accumulo
/accumulo/tables/
/accumulo/tables/!0
/accumulo/tables/!0/default_tablet/A000001.rf
/accumulo/tables/!0/t-00001/A000002.rf
/accumulo/tables/1
/accumulo/tables/1/default_tablet/A000003.rf
/accumulo/tables/1/t-00001/A000004.rf
/accumulo/tables/1/t-00001/A000005.rf
/accumulo/tables/2/default_tablet/A000006.rf
/accumulo/tables/2/t-00001/A000007.rf</pre>
</div>
</div>
<div class="paragraph">
<p>If files under <code>/accumulo/tables</code> are corrupt, the best course of action is to
recover those files in HDFS (see <a href="#_hdfs_failure">HDFS Failure</a>). Once these recovery efforts
have been exhausted, the next step depends on where the missing file(s) are
located. Different actions are required when the bad files are Accumulo data
table files than when they are metadata table files.</p>
</div>
<div class="paragraph">
<p><strong>Data File Corruption</strong></p>
</div>
<div class="paragraph">
<p>When an Accumulo data file is corrupt, the most reliable way to restore Accumulo
operations is to replace the missing file with an &#8220;empty&#8221; file so that
references to the file in the METADATA table and within the tablet server
hosting the file can be resolved by Accumulo. An empty file can be created using
the CreateEmpty utility:</p>
</div>
<div class="literalblock">
<div class="content">
<pre>$ accumulo org.apache.accumulo.core.file.rfile.CreateEmpty /path/to/empty/file/empty.rf</pre>
</div>
</div>
<div class="paragraph">
<p>The process is to delete the corrupt file and then move the empty file into its
place. (The generated empty file can be copied and used multiple times if necessary; it does not need
to be regenerated each time.)</p>
</div>
<div class="literalblock">
<div class="content">
<pre>$ hadoop fs -rm /accumulo/tables/corrupt/file/thename.rf; \
hadoop fs -mv /path/to/empty/file/empty.rf /accumulo/tables/corrupt/file/thename.rf</pre>
</div>
</div>
<div class="paragraph">
<p><strong>Metadata File Corruption</strong></p>
</div>
<div class="paragraph">
<p>If the corrupt files are metadata files (under the path <code>/accumulo/tables/!0</code>; see <a href="#metadata">System Metadata Tables</a>),
then you will need to rebuild
the metadata table by initializing a new instance of Accumulo and then importing
all of the existing data into the new instance. This is the same procedure as
recovering from a ZooKeeper failure (see <a href="#zookeeper_failure">ZooKeeper Failure</a>), except that
you will have the benefit of the existing user and table authorizations
that are maintained in ZooKeeper.</p>
</div>
<div class="paragraph">
<p>You can use the DumpZookeeper utility to save this information for reference
before creating the new instance. You will not be able to use RestoreZookeeper
because the table names and references are likely to be different between the
original and the new instances, but it can serve as a reference.</p>
</div>
<div class="paragraph">
<p><strong>A</strong>: If the files cannot be recovered, replace corrupt data files with empty
RFiles to allow references in the metadata table and in the tablet servers to be
resolved. Rebuild the metadata table if the corrupt files are metadata files.</p>
</div>
</div>
<div class="sect3">
<h4 id="zookeeper_failure">15.8.2. ZooKeeper Failure</h4>
<div class="paragraph">
<p><strong>Q</strong>: I lost my ZooKeeper quorum (hardware failure), but HDFS is still intact. How can I recover my Accumulo instance?</p>
</div>
<div class="paragraph">
<p>ZooKeeper, in addition to its lock-service capabilities, also serves to bootstrap an Accumulo
instance from some location in HDFS. It contains the pointers to the root tablet in HDFS, which
is used to load the Accumulo metadata tablets, which in turn are used to load all user tables. ZooKeeper
also stores, across Accumulo restarts, all namespace and table configuration, the user database, the mapping of table IDs to
table names, and more.</p>
</div>
<div class="paragraph">
<p>Presently, the only way to recover such an instance is to initialize a new instance and import all
of the old data into the new instance. The easiest way to tackle this problem is to first recreate
the mapping of table ID to table name and then recreate each of those tables in the new instance.
Set any necessary configuration on the new tables, and add split points to them so that each new table
starts out with roughly as many splits as the old table had, rather than none.</p>
</div>
<div class="paragraph">
<p>The directory structure in HDFS for tables will follow the general structure:</p>
</div>
<div class="literalblock">
<div class="content">
<pre>/accumulo
/accumulo/tables/
/accumulo/tables/1
/accumulo/tables/1/default_tablet/A000001.rf
/accumulo/tables/1/t-00001/A000002.rf
/accumulo/tables/1/t-00001/A000003.rf
/accumulo/tables/2/default_tablet/A000004.rf
/accumulo/tables/2/t-00001/A000005.rf</pre>
</div>
</div>
<div class="paragraph">
<p>For each table, make a new directory that you can move (or copy if you have the HDFS space to do so)
all of the rfiles for a given table into. For example, to process the table with an ID of <code>1</code>, make a new directory,
say <code>/new-table-1</code> and then copy all files from <code>/accumulo/tables/1/*/*.rf</code> into that directory. Additionally,
make a directory, <code>/new-table-1-failures</code>, for any failures during the import process. Then, issue the import
command using the Accumulo shell into the new table, telling Accumulo not to reset the timestamps:</p>
</div>
<div class="literalblock">
<div class="content">
<pre>user@instance new_table&gt; importdirectory /new-table-1 /new-table-1-failures false</pre>
</div>
</div>
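<div class="paragraph">
<p>Because the same steps are repeated for every table, it can help to generate them from the table ID to table name mapping. The sketch below simply builds the HDFS and Accumulo shell commands described above for one table. The class <code>ImportPlan</code> and its methods are hypothetical. The sketch assumes the <code>/new-table-&lt;id&gt;</code> directory naming used in this example and selects the new table explicitly with the shell&#8217;s <code>table</code> command.</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="highlight"><code class="java language-java">/** Hypothetical helper that prints the recovery commands for one old table. */
final class ImportPlan {
  /** HDFS commands: create the import and failure directories, then copy the RFiles. */
  static String[] hdfsCommands(String tableId) {
    String dir = "/new-table-" + tableId;
    return new String[] {
      "hadoop fs -mkdir " + dir,
      "hadoop fs -mkdir " + dir + "-failures",
      // Quote the glob so that HDFS, not the local shell, expands it.
      "hadoop fs -cp '/accumulo/tables/" + tableId + "/*/*.rf' " + dir
    };
  }

  /** Accumulo shell commands: create and select the table, then import keeping timestamps. */
  static String[] shellCommands(String tableId, String newTableName) {
    String dir = "/new-table-" + tableId;
    return new String[] {
      "createtable " + newTableName,
      "table " + newTableName,
      "importdirectory " + dir + " " + dir + "-failures false"
    };
  }

  public static void main(String[] args) {
    // Example: the old table with ID 1 becomes new_table in the new instance.
    for (String cmd : hdfsCommands("1")) {
      System.out.println(cmd);
    }
    for (String cmd : shellCommands("1", "new_table")) {
      System.out.println(cmd);
    }
  }
}
</code></pre>
</div>
</div>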
<div class="paragraph">
<p>Any RFiles that fail to load will be placed in <code>/new-table-1-failures</code>. RFiles that were successfully
imported will no longer exist in <code>/new-table-1</code>. For failures, move them back to the import directory and retry
the <code>importdirectory</code> command.</p>
</div>
<div class="paragraph">
<p>It is <strong>extremely</strong> important to note that this approach may introduce stale data back into
the tables. For a few reasons, RFiles may exist in the table directory which are candidates for deletion but have
not yet been deleted. Additionally, deleted data that was not compacted away, and whose deletion was recorded only in write-ahead logs,
will be re-introduced in the new instance. Table splits and merges
(which also include the deleteRows API call on TableOperations) are also vulnerable to this problem. This process should
<strong>not</strong> be used if these are unacceptable risks. It is possible to try to re-create a view of the <code>accumulo.metadata</code>
table to prune out files that are candidates for deletion, but this is a difficult task that also may not be entirely accurate.</p>
</div>
<div class="paragraph">
<p>Likewise, it is also possible that data loss may occur from write-ahead log (WAL) files which existed on the old table but
were not minor-compacted into an RFile. Again, it may be possible to reconstruct the state of these WAL files to
replay data not yet in an RFile; however, this is a difficult task and is not implemented in any automated fashion.</p>
</div>
<div class="paragraph">
<p><strong>A</strong>: The <code>importdirectory</code> shell command can be used to import RFiles from the old instance into a newly created instance,
but extreme care should go into the decision to do this as it may result in reintroduction of stale data or the
omission of new data.</p>
</div>
</div>
</div>
<div class="sect2">
<h3 id="_upgrade_issues">15.9. Upgrade Issues</h3>
<div class="paragraph">
<p><strong>Q</strong>: I upgraded from 1.4 to 1.5 to 1.6 but still have some WAL files on local disk. Do I have any way
to recover them?</p>
</div>
<div class="paragraph">
<p><strong>A</strong>: Yes, you can recover them by running the LocalWALRecovery utility on each node that needs
recovery performed. The utility will default to using the directory specified by <code>logger.dir.walog</code> in your
configuration, or it can be overridden by using the <code>--local-wal-directories</code> option on the tool. It can be
invoked as follows:</p>
<div class="literalblock">
<div class="content">
<pre>$ACCUMULO_HOME/bin/accumulo org.apache.accumulo.tserver.log.LocalWALRecovery</pre>
</div>
</div>
</div>
</div>
<div class="sect2">
<h3 id="_file_naming_conventions">15.10. File Naming Conventions</h3>
<div class="paragraph">
<p><strong>Q</strong>: Why are files named like they are? Why do some start with <code>C</code> and others with <code>F</code>?</p>
</div>
<div class="paragraph">
<p><strong>A</strong>: The file names give you a basic idea for the source of the file.</p>
</div>
<div class="paragraph">
<p>The base of the filename is a base-36 unique number. All filenames in accumulo are coordinated
with a counter in zookeeper, so they are always unique, which is useful for debugging.</p>
</div>
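<div class="paragraph">
<p>The base-36 counter can be recovered from a file name with standard Java radix conversion. This is a minimal sketch. It assumes the name is a single type letter followed by the base-36 digits and an extension, as in the listings above. <code>FileNumber</code> is a hypothetical class.</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="highlight"><code class="java language-java">/** Hypothetical helper for the base-36 counter in Accumulo file names. */
final class FileNumber {
  /** Decodes the counter in a name such as "A000003.rf" or "F00000a2.rf". */
  static long counterOf(String fileName) {
    int dot = fileName.indexOf('.');
    String digits = fileName.substring(1, dot); // skip the leading type letter
    return Long.parseLong(digits, 36);
  }

  /** Encodes a counter value as lowercase base-36 digits (no padding). */
  static String encode(long counter) {
    return Long.toString(counter, 36);
  }
}
</code></pre>
</div>
</div>
<div class="paragraph">
<p>For example, <code>counterOf("F00000a2.rf")</code> returns 362, and <code>encode(362)</code> returns <code>a2</code>.</p>
</div>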
<div class="paragraph">
<p>The leading letter gives you an idea of how the file was created:</p>
</div>
<div class="dlist">
<dl>
<dt class="hdlist1"><code>F</code></dt>
<dd>
<p>Flush: entries in memory were written to a file (Minor Compaction)</p>
</dd>
<dt class="hdlist1"><code>M</code></dt>
<dd>
<p>Merging compaction: entries in memory were combined with the smallest file to create one new file</p>
</dd>
<dt class="hdlist1"><code>C</code></dt>
<dd>
<p>Several files, but not all files, were combined to produce this file (Major Compaction)</p>
</dd>
<dt class="hdlist1"><code>A</code></dt>
<dd>
<p>All files were compacted, delete entries were dropped</p>
</dd>
<dt class="hdlist1"><code>I</code></dt>
<dd>
<p>Bulk import, complete, sorted index files. Always in a directory starting with <code>b-</code></p>
</dd>
</dl>
</div>
<div class="paragraph">
<p>This simple file naming convention lets you see the basic structure of the files from their
names alone, and reason about what should happen to them next, simply
by scanning their entries in the metadata tables.</p>
</div>
<div class="paragraph">
<p>For example, if you see multiple files with <code>M</code> prefixes, the tablet is, or was, up against its
maximum file limit, so it began merging memory updates with files to keep the file count reasonable. This
slows down ingest performance, so seeing many files like this tells you that the compactions that reduce
the number of files are struggling to keep up with ingest.</p>
</div>
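<div class="paragraph">
<p>The prefix table and the <code>M</code>-file heuristic above can be turned into a small helper for reading a tablet&#8217;s file list. This is a sketch. It assumes each entry is either a bare file name or a path or URI whose last segment is the file name. <code>FilePrefix</code> is a hypothetical class.</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="highlight"><code class="java language-java">/** Hypothetical helper that interprets Accumulo file-name prefixes. */
final class FilePrefix {
  /** Strips any directory or URI portion, leaving e.g. "M000012.rf". */
  static String baseName(String path) {
    return path.substring(path.lastIndexOf('/') + 1);
  }

  /** Describes how a file was created, based on its leading letter. */
  static String describe(String path) {
    switch (baseName(path).charAt(0)) {
      case 'F': return "flush (minor compaction)";
      case 'M': return "merging minor compaction";
      case 'C': return "major compaction of some files";
      case 'A': return "major compaction of all files";
      case 'I': return "bulk import";
      default: return "unknown";
    }
  }

  /** Counts files produced by merging compactions among a tablet's files. */
  static int countMerging(String[] paths) {
    int count = 0;
    for (String path : paths) {
      if (baseName(path).charAt(0) == 'M') {
        count++;
      }
    }
    return count;
  }
}
</code></pre>
</div>
</div>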
</div>
</div>
</div>
<div class="sect1">
<h2 id="configuration">16. Appendix A: Configuration Management</h2>
<div class="sectionbody">
<div class="sect2">
<h3 id="_configuration_overview">16.1. Configuration Overview</h3>
<div class="paragraph">
<p>All accumulo properties have a default value in the source code. Properties can also be set
in accumulo-site.xml and in zookeeper on a per-table or system-wide basis. If a property is set in more than one location,
accumulo will use the value with the highest precedence. This order of precedence is described
below (from highest to lowest):</p>
</div>
<div class="sect3">
<h4 id="_zookeeper_table_properties">16.1.1. Zookeeper table properties</h4>
<div class="paragraph">
<p>Table properties are applied to the entire cluster when set in zookeeper using the accumulo API or shell. While table properties take precedence over system properties, both will override properties set in accumulo-site.xml.</p>
</div>
<div class="paragraph">
<p>Table properties consist of all properties with the <code>table.*</code> prefix. Table properties are configured on a per-table basis using the following shell command:</p>
</div>
<div class="literalblock">
<div class="content">
<pre>config -t TABLE -s PROPERTY=VALUE</pre>
</div>
</div>
</div>
<div class="sect3">
<h4 id="_zookeeper_system_properties">16.1.2. Zookeeper system properties</h4>
<div class="paragraph">
<p>System properties are applied to the entire cluster when set in zookeeper using the accumulo API or shell. System properties consist of all properties with a <code>yes</code> in the <em>Zookeeper Mutable</em> column in the table below. They are set with the following shell command:</p>
</div>
<div class="literalblock">
<div class="content">
<pre>config -s PROPERTY=VALUE</pre>
</div>
</div>
<div class="paragraph">
<p>If a <code>table.*</code> property is set using this method, the value will apply to all tables except those configured on a per-table basis (which have higher precedence).</p>
</div>
<div class="paragraph">
<p>While most system properties take effect immediately, some require a restart of the process; these are indicated in the <em>Zookeeper Mutable</em> column.</p>
</div>
</div>
<div class="sect3">
<h4 id="_accumulo_site_xml">16.1.3. accumulo-site.xml</h4>
<div class="paragraph">
<p>Accumulo processes (master, tserver, etc.) read their local accumulo-site.xml on start up. Therefore, changes made to accumulo-site.xml must be rsynced across the cluster, and processes must be restarted to apply the changes.</p>
</div>
<div class="paragraph">
<p>Certain properties (indicated by a <code>no</code> in <em>Zookeeper Mutable</em>) cannot be set in zookeeper and can only be set in this file. The accumulo-site.xml also allows you to configure tablet servers with different settings.</p>
</div>
</div>
<div class="sect3">
<h4 id="_default_values">16.1.4. Default Values</h4>
<div class="paragraph">
<p>All properties have a default value in the source code. This value has the lowest precedence and is overridden if set in accumulo-site.xml or zookeeper.</p>
</div>
<div class="paragraph">
<p>While the default value is usually optimal, there are cases where a change can increase query and ingest performance.</p>
</div>
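<div class="paragraph">
<p>The four levels of precedence described in this section can be modeled with chained <code>java.util.Properties</code> objects. A <code>Properties</code> lookup falls through to its defaults list when a key is not found. This is a sketch of the lookup order only, and it assumes each scope has already been loaded into a <code>Properties</code> object. <code>ConfigPrecedence</code> is a hypothetical class, and Accumulo&#8217;s own configuration classes work differently.</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="highlight"><code class="java language-java">import java.util.Properties;

/** Hypothetical model of Accumulo's property precedence (highest scope wins). */
final class ConfigPrecedence {
  /**
   * Chains the scopes so that a lookup checks the table scope first, then the
   * system scope, then accumulo-site.xml, and finally the built-in default.
   */
  static Properties chain(Properties defaults, Properties site, Properties system,
      Properties table) {
    Properties siteLayer = new Properties(defaults);    // falls back to defaults
    siteLayer.putAll(site);
    Properties systemLayer = new Properties(siteLayer); // falls back to site
    systemLayer.putAll(system);
    Properties tableLayer = new Properties(systemLayer); // falls back to system
    tableLayer.putAll(table);
    return tableLayer;
  }

  public static void main(String[] args) {
    String key = "table.compaction.major.ratio";
    Properties defaults = new Properties();
    defaults.setProperty(key, "1.3");
    Properties site = new Properties();
    site.setProperty(key, "1.4");
    Properties system = new Properties();
    system.setProperty(key, "1.5");
    Properties table = new Properties();
    table.setProperty(key, "1.6");
    System.out.println(chain(defaults, site, system, table).getProperty(key)); // 1.6
  }
}
</code></pre>
</div>
</div>
<div class="paragraph">
<p>The example values mirror the <code>config -t foo</code> listing in the next section, where the default of 1.3 is overridden by site, system, and table values.</p>
</div>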
</div>
</div>
<div class="sect2">
<h3 id="_configuration_in_the_shell">16.2. Configuration in the Shell</h3>
<div class="paragraph">
<p>The <code>config</code> command in the shell allows you to view the current system configuration. You can also use the <code>-t</code> option to view a table&#8217;s configuration as below:</p>
</div>
<div class="listingblock">
<div class="content">
<pre>$ ./bin/accumulo shell -u root
Enter current password for 'root'@'accumulo-instance': ******
Shell - Apache Accumulo Interactive Shell
-
- version: 1.6.0
- instance name: accumulo-instance
- instance id: 4f48fa03-f692-43ce-ae03-94c9ea8b7181
-
- type 'help' for a list of available commands
-
root@accumulo-instance&gt; config -t foo
---------+---------------------------------------------+------------------------------------------------------
SCOPE    | NAME                                        | VALUE
---------+---------------------------------------------+------------------------------------------------------
default  | table.balancer ............................ | org.apache.accumulo.server.master.balancer.DefaultLoadBalancer
default  | table.bloom.enabled ....................... | false
default  | table.bloom.error.rate .................... | 0.5%
default  | table.bloom.hash.type ..................... | murmur
default  | table.bloom.key.functor ................... | org.apache.accumulo.core.file.keyfunctor.RowFunctor
default  | table.bloom.load.threshold ................ | 1
default  | table.bloom.size .......................... | 1048576
default  | table.cache.block.enable .................. | false
default  | table.cache.index.enable .................. | false
default  | table.compaction.major.everything.at ...... | 19700101000000GMT
default  | table.compaction.major.everything.idle .... | 1h
default  | table.compaction.major.ratio .............. | 1.3
site     |    @override .............................. | 1.4
system   |    @override .............................. | 1.5
table    |    @override .............................. | 1.6
default  | table.compaction.minor.idle ............... | 5m
default  | table.compaction.minor.logs.threshold ..... | 3
default  | table.failures.ignore ..................... | false</pre>
</div>
</div>
</div>
<div class="sect2">
<h3 id="_available_properties">16.3. Available Properties</h3>
<div class="paragraph">
<p>Jump to:
<a href="#RPC_PREFIX">rpc.*</a> | <a href="#INSTANCE_PREFIX">instance.*</a> | <a href="#GENERAL_PREFIX">general.*</a> | <a href="#MASTER_PREFIX">master.*</a> | <a href="#TSERV_PREFIX">tserver.*</a> | <a href="#LOGGER_PREFIX">logger.*</a> | <a href="#GC_PREFIX">gc.*</a> | <a href="#MONITOR_PREFIX">monitor.*</a> | <a href="#TRACE_PREFIX">trace.*</a> | <a href="#TRACE_TOKEN_PROPERTY_PREFIX">trace.token.property.*</a> | <a href="#TABLE_PREFIX">table.*</a> | <a href="#TABLE_CONSTRAINT_PREFIX">table.constraint.*</a> | <a href="#TABLE_ITERATOR_PREFIX">table.iterator.*</a> | <a href="#TABLE_LOCALITY_GROUP_PREFIX">table.group.*</a> | <a href="#TABLE_COMPACTION_STRATEGY_PREFIX">table.majc.compaction.strategy.opts.*</a> | <a href="#VFS_CONTEXT_CLASSPATH_PROPERTY">general.vfs.context.classpath.*</a></p>
</div>
<div class="sect3">
<h4 id="RPC_PREFIX">16.3.1. rpc.*</h4>
<div class="paragraph">
<p>Properties in this category relate to the configuration of SSL keys for RPC. See also <code>instance.rpc.ssl.enabled</code>.</p>
</div>
<div class="sect4">
<h5 id="_rpc_javax_net_ssl_keystore">rpc.javax.net.ssl.keyStore</h5>
<div class="paragraph">
<p>Path of the keystore file for the servers' private SSL key</p>
</div>
<div class="paragraph">
<p><em>Type:</em> PATH<br>
<em>Zookeeper Mutable:</em> no<br>
<em>Default Value:</em> <code>$ACCUMULO_CONF_DIR/ssl/keystore.jks</code></p>
</div>
</div>
<div class="sect4">
<h5 id="_rpc_javax_net_ssl_keystorepassword">rpc.javax.net.ssl.keyStorePassword</h5>
<div class="paragraph">
<p>Password used to encrypt the SSL private keystore. Leave blank to use the Accumulo instance secret</p>
</div>
<div class="paragraph">
<p><em>Type:</em> STRING<br>
<em>Zookeeper Mutable:</em> no<br>
<em>Default Value:</em> <em>empty</em></p>
</div>
</div>
<div class="sect4">
<h5 id="_rpc_javax_net_ssl_keystoretype">rpc.javax.net.ssl.keyStoreType</h5>
<div class="paragraph">
<p>Type of SSL keystore</p>
</div>
<div class="paragraph">
<p><em>Type:</em> STRING<br>
<em>Zookeeper Mutable:</em> no<br>
<em>Default Value:</em> <code>jks</code></p>
</div>
</div>
<div class="sect4">
<h5 id="_rpc_javax_net_ssl_truststore">rpc.javax.net.ssl.trustStore</h5>
<div class="paragraph">
<p>Path of the truststore file for the root cert</p>
</div>
<div class="paragraph">
<p><em>Type:</em> PATH<br>
<em>Zookeeper Mutable:</em> no<br>
<em>Default Value:</em> <code>$ACCUMULO_CONF_DIR/ssl/truststore.jks</code></p>
</div>
</div>
<div class="sect4">
<h5 id="_rpc_javax_net_ssl_truststorepassword">rpc.javax.net.ssl.trustStorePassword</h5>
<div class="paragraph">
<p>Password used to encrypt the SSL truststore. Leave blank to use no password</p>
</div>
<div class="paragraph">
<p><em>Type:</em> STRING<br>
<em>Zookeeper Mutable:</em> no<br>
<em>Default Value:</em> <em>empty</em></p>
</div>
</div>
<div class="sect4">
<h5 id="_rpc_javax_net_ssl_truststoretype">rpc.javax.net.ssl.trustStoreType</h5>
<div class="paragraph">
<p>Type of SSL truststore</p>
</div>
<div class="paragraph">
<p><em>Type:</em> STRING<br>
<em>Zookeeper Mutable:</em> no<br>
<em>Default Value:</em> <code>jks</code></p>
</div>
</div>
<div class="sect4">
<h5 id="_rpc_usejsse">rpc.useJsse</h5>
<div class="paragraph">
<p>Use JSSE system properties to configure SSL rather than the rpc.javax.net.ssl.* Accumulo properties</p>
</div>
<div class="paragraph">
<p><em>Type:</em> BOOLEAN<br>
<em>Zookeeper Mutable:</em> no<br>
<em>Default Value:</em> <code>false</code></p>
</div>
</div>
</div>
<div class="sect3">
<h4 id="INSTANCE_PREFIX">16.3.2. instance.*</h4>
<div class="paragraph">
<p>Properties in this category must be consistent throughout a cloud. This is enforced and servers won&#8217;t be able to communicate if these differ.</p>
</div>
<div class="sect4">
<h5 id="_instance_dfs_dir">instance.dfs.dir</h5>
<div class="paragraph">
<p><span class="line-through"><em>Deprecated.</em> HDFS directory in which accumulo instance will run. Do not change after accumulo is initialized.</span></p>
</div>
<div class="paragraph">
<p><span class="line-through"><em>Type:</em> ABSOLUTEPATH</span><br>
<span class="line-through"><em>Zookeeper Mutable:</em> no</span><br>
<span class="line-through"><em>Default Value:</em> <code>/accumulo</code></span></p>
</div>
</div>
<div class="sect4">
<h5 id="_instance_dfs_uri">instance.dfs.uri</h5>
<div class="paragraph">
<p><span class="line-through"><em>Deprecated.</em> A url accumulo should use to connect to DFS. If this is empty, accumulo will obtain this information from the hadoop configuration. This property will only be used when creating new files if instance.volumes is empty. After an upgrade to 1.6.0 Accumulo will start using absolute paths to reference files. Files created before a 1.6.0 upgrade are referenced via relative paths. Relative paths will always be resolved using this config (if empty using the hadoop config).</span></p>
</div>
<div class="paragraph">
<p><span class="line-through"><em>Type:</em> URI</span><br>
<span class="line-through"><em>Zookeeper Mutable:</em> no</span><br>
<span class="line-through"><em>Default Value:</em> <em>empty</em></span></p>
</div>
</div>
<div class="sect4">
<h5 id="_instance_rpc_ssl_clientauth">instance.rpc.ssl.clientAuth</h5>
<div class="paragraph">
<p>Require clients to present certs signed by a trusted root</p>
</div>
<div class="paragraph">
<p><em>Type:</em> BOOLEAN<br>
<em>Zookeeper Mutable:</em> no<br>
<em>Default Value:</em> <code>false</code></p>
</div>
</div>
<div class="sect4">
<h5 id="_instance_rpc_ssl_enabled">instance.rpc.ssl.enabled</h5>
<div class="paragraph">
<p>Use SSL for socket connections from clients and among accumulo services</p>
</div>
<div class="paragraph">
<p><em>Type:</em> BOOLEAN<br>
<em>Zookeeper Mutable:</em> no<br>
<em>Default Value:</em> <code>false</code></p>
</div>
</div>
<div class="sect4">
<h5 id="_instance_secret">instance.secret</h5>
<div class="paragraph">
<p>A secret unique to a given instance that all servers must know in order to communicate with one another. It should be changed prior to the initialization of Accumulo. To change after Accumulo has been initialized, use the ChangeSecret tool and then update conf/accumulo-site.xml everywhere. Before using the ChangeSecret tool, make sure Accumulo is not running and you are logged in as the user that controls Accumulo files in HDFS. To use the ChangeSecret tool, run the command: <code>./bin/accumulo org.apache.accumulo.server.util.ChangeSecret</code></p>
</div>
<div class="paragraph">
<p><em>Type:</em> STRING<br>
<em>Zookeeper Mutable:</em> no<br>
<em>Default Value:</em> <code>DEFAULT</code></p>
</div>
</div>
<div class="sect4">
<h5 id="_instance_security_authenticator">instance.security.authenticator</h5>
<div class="paragraph">
<p>The authenticator class that accumulo will use to verify a user&#8217;s identity (credentials)</p>
</div>
<div class="paragraph">
<p><em>Type:</em> CLASSNAME<br>
<em>Zookeeper Mutable:</em> no<br>
<em>Default Value:</em> <code>org.apache.accumulo.server.security.handler.ZKAuthenticator</code></p>
</div>
</div>
<div class="sect4">
<h5 id="_instance_security_authorizor">instance.security.authorizor</h5>
<div class="paragraph">
<p>The authorizor class that accumulo will use to determine what labels a user has privilege to see</p>
</div>
<div class="paragraph">
<p><em>Type:</em> CLASSNAME<br>
<em>Zookeeper Mutable:</em> no<br>
<em>Default Value:</em> <code>org.apache.accumulo.server.security.handler.ZKAuthorizor</code></p>
</div>
</div>
<div class="sect4">
<h5 id="_instance_security_permissionhandler">instance.security.permissionHandler</h5>
<div class="paragraph">
<p>The permission handler class that accumulo will use to determine if a user has privilege to perform an action</p>
</div>
<div class="paragraph">
<p><em>Type:</em> CLASSNAME<br>
<em>Zookeeper Mutable:</em> no<br>
<em>Default Value:</em> <code>org.apache.accumulo.server.security.handler.ZKPermHandler</code></p>
</div>
</div>
<div class="sect4">
<h5 id="_instance_volumes">instance.volumes</h5>
<div class="paragraph">
<p>A comma-separated list of DFS URIs to use. Files will be stored across these filesystems. If this is empty, then instance.dfs.uri will be used. After adding URIs to this list, run <em>accumulo init --add-volume</em> and then restart tservers. If entries are removed from this list, then tservers will need to be restarted. After a URI is removed from the list, Accumulo will not create new files in that location; however, Accumulo can still reference files created at that location before the config change. To use a comma or other reserved characters in a URI, use standard URI hex encoding. For example, replace commas with %2C.</p>
</div>
<div class="paragraph">
<p><em>Type:</em> STRING<br>
<em>Zookeeper Mutable:</em> no<br>
<em>Default Value:</em> <em>empty</em></p>
</div>
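<div class="paragraph">
<p>To illustrate the encoding rule above, the following minimal Java sketch splits an <code>instance.volumes</code> value into URIs. It is not Accumulo&#8217;s own parsing code, and the <code>VolumesProperty</code> class is hypothetical. Because any comma inside a URI is written as %2C, splitting on literal commas is safe, and <code>java.net.URI.getPath()</code> returns the decoded path.</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="highlight"><code class="language-java">import java.net.URI;

// Hypothetical helper, not Accumulo's parser: splits an instance.volumes
// value into URIs. Commas inside a URI are expected to be hex-encoded as %2C.
class VolumesProperty {
  static URI[] parse(String value) {
    if (value == null || value.trim().isEmpty()) {
      return new URI[0]; // empty: instance.dfs.uri is used instead
    }
    String[] parts = value.split(",");
    URI[] volumes = new URI[parts.length];
    int i = 0;
    for (String part : parts) {
      volumes[i++] = URI.create(part.trim()); // %2C stays encoded in the URI
    }
    return volumes;
  }

  public static void main(String[] args) {
    for (URI volume : parse("hdfs://nn1/accumulo,hdfs://nn2/acc%2Cumulo")) {
      // getPath() decodes %2C back to a literal comma
      System.out.println(volume.getHost() + " " + volume.getPath());
    }
  }
}</code></pre>
</div>
</div>
<div class="paragraph">
<p>Running <code>main</code> prints <code>nn1 /accumulo</code> and <code>nn2 /acc,umulo</code>: two volumes, even though the second path contains a comma.</p>
</div>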
</div>
<div class="sect4">
<h5 id="_instance_volumes_replacements">instance.volumes.replacements</h5>
<div class="paragraph">
<p>Since Accumulo stores absolute URIs, changing the location of a namenode could prevent Accumulo from starting. This property helps deal with that situation. If a namenode location changes, provide a comma-separated list of URI replacement pairs here, separating the two URIs in each pair with a space. For example, if hdfs://nn1 was replaced with hdfs://nnA and hdfs://nn2 was replaced with hdfs://nnB, then set this property to <em>hdfs://nn1 hdfs://nnA,hdfs://nn2 hdfs://nnB</em>. Replacements must be configured for use. To see which volumes are currently in use, run <em>accumulo admin volumes -l</em>. To use a comma or other reserved characters in a URI, use standard URI hex encoding; for example, replace commas with %2C.</p>
</div>
<div class="paragraph">
<p><em>Type:</em> STRING<br>
<em>Zookeeper Mutable:</em> no<br>
<em>Default Value:</em> <em>empty</em></p>
</div>
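<div class="paragraph">
<p>The pair format above can be sketched in Java as follows. This is a hypothetical illustration, not Accumulo&#8217;s implementation: <code>VolumeReplacements</code> parses each <code>old new</code> pair and rewrites a stored absolute file URI whose volume matches an old entry. Matching on the old volume plus a trailing <code>/</code> is a design choice of the sketch so that <code>hdfs://nn1</code> does not also match <code>hdfs://nn10</code>.</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="highlight"><code class="language-java">// Hypothetical sketch of parsing and applying instance.volumes.replacements.
class VolumeReplacements {
  // Parses "old new,old2 new2" into {old, new} pairs.
  static String[][] parse(String value) {
    if (value == null || value.trim().isEmpty()) {
      return new String[0][];
    }
    String[] entries = value.split(",");
    String[][] pairs = new String[entries.length][];
    int i = 0;
    for (String entry : entries) {
      String[] pair = entry.trim().split("\\s+"); // the two URIs are space-separated
      if (pair.length != 2) {
        throw new IllegalArgumentException("Expected 'old new' but got: " + entry);
      }
      pairs[i++] = pair;
    }
    return pairs;
  }

  // Returns the file URI with its volume replaced, or unchanged if no pair matches.
  static String apply(String[][] pairs, String fileUri) {
    for (String[] pair : pairs) {
      if (fileUri.startsWith(pair[0] + "/")) {
        return pair[1] + fileUri.substring(pair[0].length());
      }
    }
    return fileUri;
  }

  public static void main(String[] args) {
    String[][] pairs = parse("hdfs://nn1 hdfs://nnA,hdfs://nn2 hdfs://nnB");
    System.out.println(apply(pairs, "hdfs://nn2/accumulo/tables/1/default_tablet/F0000.rf"));
  }
}</code></pre>
</div>
</div>
<div class="paragraph">
<p>With the example value from the description, <code>main</code> prints <code>hdfs://nnB/accumulo/tables/1/default_tablet/F0000.rf</code>; the file path itself is made up for illustration.</p>
</div>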
</div>
<div class="sect4">
<h5 id="_instance_zookeeper_host">instance.zookeeper.host</h5>
<div class="paragraph">
<p>Comma separated list of zookeeper servers</p>
</div>
<div class="paragraph">
<p><em>Type:</em> HOSTLIST<br>
<em>Zookeeper Mutable:</em> no<br>
<em>Default Value:</em> <code>localhost:2181</code></p>
</div>
</div>
<div class="sect4">
<h5 id="_instance_zookeeper_timeout">instance.zookeeper.timeout</h5>
<div class="paragraph">
<p>Zookeeper session timeout; when expressed in milliseconds, the value must be no larger than 2147483647 (the largest 32-bit signed integer).</p>
</div>
<div class="paragraph">
<p><em>Type:</em> TIMEDURATION<br>
<em>Zookeeper Mutable:</em> no<br>
<em>Default Value:</em> <code>30s</code></p>
</div>
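<div class="paragraph">
<p>The 2147483647 limit exists because the ZooKeeper client constructor takes the session timeout as a Java <code>int</code> of milliseconds. The minimal sketch below converts a TIMEDURATION string to milliseconds and rejects values that do not fit. The <code>ZooKeeperTimeout</code> class is hypothetical, and the unit rules (a bare number means seconds; <code>ms</code>, <code>s</code>, <code>m</code> and <code>h</code> are accepted) are assumptions of the sketch rather than a statement of Accumulo&#8217;s parser.</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="highlight"><code class="language-java">// Hypothetical sketch, not Accumulo's parser. Assumptions: a bare number
// means seconds, and the suffixes ms, s, m and h are accepted.
class ZooKeeperTimeout {
  static long toMillis(String value) {
    String v = value.trim();
    String unit = v.replaceFirst("^[0-9]+", "");       // e.g. "s" in "30s"
    String digits = v.substring(0, v.length() - unit.length());
    long millisPerUnit;
    switch (unit) {
      case "ms": millisPerUnit = 1L; break;
      case "":
      case "s": millisPerUnit = 1_000L; break;
      case "m": millisPerUnit = 60_000L; break;
      case "h": millisPerUnit = 3_600_000L; break;
      default: throw new IllegalArgumentException("Unknown time unit in " + value);
    }
    return Math.multiplyExact(Long.parseLong(digits), millisPerUnit);
  }

  // The session timeout must fit in a Java int (at most 2147483647 ms).
  static int sessionTimeoutMillis(String value) {
    long millis = toMillis(value);
    if (millis > Integer.MAX_VALUE) {
      throw new IllegalArgumentException("instance.zookeeper.timeout too large: " + millis + " ms");
    }
    return (int) millis;
  }
}</code></pre>
</div>
</div>
<div class="paragraph">
<p>For example, the default <code>30s</code> becomes 30000 ms, <code>596h</code> (2145600000 ms) is still accepted, and <code>597h</code> (2149200000 ms) is rejected.</p>
</div>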
</div>
</div>
<div class="sect3">
<h4 id="GENERAL_PREFIX">16.3.3. general.*</h4>
<div class="paragraph">
<p>Properties in this category affect the behavior of Accumulo overall, but do not have to be consistent across the cluster.</p>
</div>
<div class="sect4">
<h5 id="_general_classpaths">general.classpaths</h5>
<div class="paragraph">
<p>A list of all of the places to look for a class. Order matters: jars are searched from the first location to the last. The Hadoop conf and Hadoop lib directories must be listed here, along with the Accumulo lib and ZooKeeper directories. Full regular expressions are supported, but only in the file name portion of each entry.</p>
</div>
<div class="paragraph">
<p><em>Type:</em> STRING<br>
<em>Zookeeper Mutable:</em> no<br>
<em>Default Value:</em></p>
</div>
<div class="listingblock">
<div class="content">
<pre>$ACCUMULO_CONF_DIR,
$ACCUMULO_HOME/lib/[^.].*.jar,
$ZOOKEEPER_HOME/zookeeper[^.].*.jar,
$HADOOP_CONF_DIR,
$HADOOP_PREFIX/[^.].*.jar,
$HADOOP_PREFIX/lib/[^.].*.jar,
$HADOOP_PREFIX/share/hadoop/common/.*.jar,
$HADOOP_PREFIX/share/hadoop/common/lib/.*.jar,
$HADOOP_PREFIX/share/hadoop/hdfs/.*.jar,
$HADOOP_PREFIX/share/hadoop/mapreduce/.*.jar,
/usr/lib/hadoop/[^.].*.jar,
/usr/lib/hadoop/lib/[^.].*.jar,
/usr/lib/hadoop-hdfs/[^.].*.jar,
/usr/lib/hadoop-mapreduce/[^.].*.jar,
/usr/lib/hadoop-yarn/[^.].*.jar,</pre>
</div>
</div>
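<div class="paragraph">
<p>Because the regular expression applies to the file name alone, an entry such as <code>$ACCUMULO_HOME/lib/[^.].*.jar</code> is effectively a directory plus a file-name pattern. The hypothetical <code>ClasspathEntry</code> sketch below shows only that matching rule. It does not expand environment variables or list directories, and it is not Accumulo&#8217;s classloader code.</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="highlight"><code class="language-java">import java.util.regex.Pattern;

// Hypothetical sketch: everything after the last '/' in a classpath entry is
// treated as a regular expression that must match the whole file name.
class ClasspathEntry {
  static boolean matches(String entry, String fileName) {
    String regex = entry.substring(entry.lastIndexOf('/') + 1);
    return Pattern.matches(regex, fileName);
  }

  public static void main(String[] args) {
    String entry = "$ACCUMULO_HOME/lib/[^.].*.jar";
    for (String name : new String[] {"accumulo-core.jar", ".accumulo-core.jar", "README.txt"}) {
      System.out.println(name + " -> " + matches(entry, name));
    }
  }
}</code></pre>
</div>
</div>
<div class="paragraph">
<p>Only <code>accumulo-core.jar</code> matches. In <code>[^.].*.jar</code>, the leading <code>[^.]</code> skips hidden files whose names start with a dot, while the unescaped dot before <code>jar</code> matches any character; <code>\.jar</code> would be the stricter form.</p>
</div>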
</div>
<div class="sect4">
<h5 id="_general_dynamic_classpaths">general.dynamic.classpaths</h5>
<div class="paragraph">
<p>A list of all of the places where changes in jars or classes will force a reload of the classloader.</p>
</div>
<div class="paragraph">
<p><em>Type:</em> STRING<br>
<em>Zookeeper Mutable:</em> no<br>
<em>Default Value:</em> <code>$ACCUMULO_HOME/lib/ext/[^.].*.jar</code></p>
</div>
</div>
<div class="sect4">
<h5 id="_general_kerberos_keytab">general.kerberos.keytab</h5>
<div class="paragraph">
<p>Path to the Kerberos keytab to use. Leave blank if not using Kerberized HDFS.</p>
</div>
<div class="paragraph">
<p><em>Type:</em> PATH<br>
<em>Zookeeper Mutable:</em> no<br>
<em>Default Value:</em> <em>empty</em></p>
</div>
</div>
<div class="sect4">
<h5 id="_general_kerberos_principal">general.kerberos.principal</h5>
<div class="paragraph">
<p>Name of the Kerberos principal to use. _HOST will automatically be replaced by the machine&#8217;s hostname in the hostname portion of the principal. Leave blank if not using Kerberized HDFS.</p>
</div>
<div class="paragraph">
<p><em>Type:</em> STRING<br>
<em>Zookeeper Mutable:</em> no<br>
<em>Default Value:</em> <em>empty</em></p>
</div>
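<div class="paragraph">
<p>The <code>_HOST</code> substitution can be pictured with the hypothetical sketch below, assuming a principal of the form <code>primary/host@REALM</code>. It is a standalone illustration rather than the code Accumulo uses, and details such as hostname case may differ in practice. <code>EXAMPLE.COM</code> and <code>tserver1.example.com</code> are placeholder names.</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="highlight"><code class="language-java">// Hypothetical sketch: replaces _HOST only in the host component of a
// principal of the form primary/host@REALM; other principals are unchanged.
class KerberosPrincipal {
  static String expand(String principal, String hostname) {
    int slash = principal.indexOf('/');
    int at = principal.lastIndexOf('@');
    // no host component, or no realm after it: nothing to substitute
    if (slash == -1 || slash > at) {
      return principal;
    }
    String host = principal.substring(slash + 1, at);
    if (!host.equals("_HOST")) {
      return principal;
    }
    return principal.substring(0, slash + 1) + hostname + principal.substring(at);
  }

  public static void main(String[] args) {
    // prints accumulo/tserver1.example.com@EXAMPLE.COM
    System.out.println(expand("accumulo/_HOST@EXAMPLE.COM", "tserver1.example.com"));
  }
}</code></pre>
</div>
</div>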
</div>
<div class="sect4">
<h5 id="_general_rpc_timeout">general.rpc.timeout</h5>
<div class="paragraph">
<p>Time to wait on I/O for simple, short RPC calls</p>
</div>
<div class="paragraph">
<p><em>Type:</em> TIMEDURATION<br>
<em>Zookeeper Mutable:</em> no<br>
<em>Default Value:</em> <code>120s</code></p>
</div>
</div>
<div class="sect4">
<h5 id="_general_server_message_size_max">general.server.message.size.max</h5>
<div class="paragraph">
<p>The maximum size of a message that can be sent to a server.</p>
</div>
<div class="paragraph">
<p><em>Type:</em> MEMORY<br>
<em>Zookeeper Mutable:</em> no<br>
<em>Default Value:</em> <code>1G</code></p>
</div>
</div>
<div class="sect4">
<h5 id="_general_server_simpletimer_threadpool_size">general.server.simpletimer.threadpool.size</h5>
<div class="paragraph">
<p>The number of threads to use for server-internal scheduled tasks</p>
</div>
<div class="paragraph">
<p><em>Type:</em> COUNT<br>
<em>Zookeeper Mutable:</em> no<br>
<em>Default Value:</em> <code>1</code></p>
</div>
</div>
<div class="sect4">
<h5 id="_general_vfs_cache_dir">general.vfs.cache.dir</h5>
<div class="paragraph">
<p>Directory to use for the vfs cache. The cache will keep a soft reference to all of the classes loaded in the VM. This should be on local disk on each node with sufficient space. It defaults to ${java.io.tmpdir}/accumulo-vfs-cache-${user.name}</p>
</div>
<div class="paragraph">
<p><em>Type:</em> ABSOLUTEPATH<br>
<em>Zookeeper Mutable:</em> no<br>
<em>Default Value:</em> <code>${java.io.tmpdir}/accumulo-vfs-cache-${user.name}</code></p>
</div>
</div>
<div class="sect4">
<h5 id="_general_vfs_classpaths">general.vfs.classpaths</h5>
<div class="paragraph">
<p>Configuration for a system-level VFS classloader. Accumulo jars can be configured here and loaded from HDFS.</p>
</div>
<div class="paragraph">
<p><em>Type:</em> STRING<br>
<em>Zookeeper Mutable:</em> no<br>
<em>Default Value:</em> <em>empty</em></p>
</div>
</div>
</div>
<div class="sect3">
<h4 id="MASTER_PREFIX">16.3.4. master.*</h4>
<div class="paragraph">
<p>Properties in this category affect the behavior of the master server</p>
</div>
<div class="sect4">
<h5 id="_master_bulk_retries">master.bulk.retries</h5>
<div class="paragraph">
<p>The number of attempts to bulk-load a file before giving up.</p>
</div>
<div class="paragraph">
<p><em>Type:</em> COUNT<br>
<em>Zookeeper Mutable:</em> yes<br>
<em>Default Value:</em> <code>3</code></p>
</div>
</div>
<div class="sect4">
<h5 id="_master_bulk_threadpool_size">master.bulk.threadpool.size</h5>
<div class="paragraph">
<p>The number of threads to use when coordinating a bulk-import.</p>
</div>
<div class="paragraph">
<p><em>Type:</em> COUNT<br>
<em>Zookeeper Mutable:</em> yes<br>
<em>Default Value:</em> <code>5</code></p>
</div>
</div>
<div class="sect4">
<h5 id="_master_bulk_timeout">master.bulk.timeout</h5>
<div class="paragraph">
<p>The time to wait for a tablet server to process a bulk import request</p>
</div>
<div class="paragraph">
<p><em>Type:</em> TIMEDURATION<br>
<em>Zookeeper Mutable:</em> yes<br>
<em>Default Value:</em> <code>5m</code></p>
</div>
</div>
<div class="sect4">
<h5 id="_master_fate_threadpool_size">master.fate.threadpool.size</h5>
<div class="paragraph">
<p>The number of threads used to run Fault-Tolerant Executions (FATE). These are primarily table operations like merge.</p>
</div>
<div class="paragraph">
<p><em>Type:</em> COUNT<br>
<em>Zookeeper Mutable:</em> yes<br>
<em>Default Value:</em> <code>4</code></p>
</div>
</div>
<div class="sect4">
<h5 id="_master_lease_recovery_interval">master.lease.recovery.interval</h5>
<div class="paragraph">
<p>The amount of time to wait after requesting that a WAL file be recovered</p>
</div>
<div class="paragraph">
<p><em>Type:</em> TIMEDURATION<br>
<em>Zookeeper Mutable:</em> yes<br>
<em>Default Value:</em> <code>5s</code></p>
</div>
</div>
<div class="sect4">
<h5 id="_master_port_client">master.port.client</h5>
<div class="paragraph">
<p>The port used for handling client connections on the master</p>
</div>
<div class="paragraph">
<p><em>Type:</em> PORT<br>
<em>Zookeeper Mutable:</em> yes but requires restart of the master<br>
<em>Default Value:</em> <code>9999</code></p>
</div>
</div>
<div class="sect4">
<h5 id="_master_recovery_delay">master.recovery.delay</h5>
<div class="paragraph">
<p>When a tablet server&#8217;s lock is deleted, it takes time for it to completely quit. This delay gives it time before log recoveries begin.</p>
</div>
<div class="paragraph">
<p><em>Type:</em> TIMEDURATION<br>
<em>Zookeeper Mutable:</em> yes<br>
<em>Default Value:</em> <code>10s</code></p>
</div>
</div>
<div class="sect4">
<h5 id="_master_recovery_max_age">master.recovery.max.age</h5>
<div class="paragraph">
<p>Recovery files older than this age will be removed.</p>
</div>
<div class="paragraph">
<p><em>Type:</em> TIMEDURATION<br>
<em>Zookeeper Mutable:</em> yes<br>
<em>Default Value:</em> <code>60m</code></p>
</div>
</div>
<div class="sect4">
<h5 id="_master_recovery_time_max">master.recovery.time.max</h5>
<div class="paragraph">
<p>The maximum time to attempt recovery before giving up</p>
</div>
<div class="paragraph">
<p><em>Type:</em> TIMEDURATION<br>
<em>Zookeeper Mutable:</em> yes<br>
<em>Default Value:</em> <code>30m</code></p>
</div>
</div>
<div class="sect4">
<h5 id="_master_server_threadcheck_time">master.server.threadcheck.time</h5>
<div class="paragraph">
<p>The time between adjustments of the server thread pool.</p>
</div>
<div class="paragraph">
<p><em>Type:</em> TIMEDURATION<br>
<em>Zookeeper Mutable:</em> yes<br>
<em>Default Value:</em> <code>1s</code></p>
</div>
</div>
<div class="sect4">
<h5 id="_master_server_threads_minimum">master.server.threads.minimum</h5>
<div class="paragraph">
<p>The minimum number of threads to use to handle incoming requests.</p>
</div>
<div class="paragraph">
<p><em>Type:</em> COUNT<br>
<em>Zookeeper Mutable:</em> yes<br>
<em>Default Value:</em> <code>20</code></p>
</div>
</div>
<div class="sect4">
<h5 id="_master_tablet_balancer">master.tablet.balancer</h5>
<div class="paragraph">
<p>The balancer class that accumulo will use to make tablet assignment and migration decisions.</p>
</div>
<div class="paragraph">
<p><em>Type:</em> CLASSNAME<br>
<em>Zookeeper Mutable:</em> yes<br>
<em>Default Value:</em> <code>org.apache.accumulo.server.master.balancer.TableLoadBalancer</code></p>
</div>
</div>
<div class="sect4">
<h5 id="_master_walog_closer_implementation">master.walog.closer.implementation</h5>
<div class="paragraph">
<p>A class that implements a mechanism to steal write access to a file</p>
</div>
<div class="paragraph">
<p><em>Type:</em> CLASSNAME<br>
<em>Zookeeper Mutable:</em> yes<br>
<em>Default Value:</em> <code>org.apache.accumulo.server.master.recovery.HadoopLogCloser</code></p>
</div>
</div>
</div>
<div class="sect3">
<h4 id="TSERV_PREFIX">16.3.5. tserver.*</h4>
<div class="paragraph">
<p>Properties in this category affect the behavior of the tablet servers</p>
</div>
<div class="sect4">
<h5 id="_tserver_archive_walogs">tserver.archive.walogs</h5>
<div class="paragraph">
<p>Keep copies of the WALOGs for debugging purposes</p>
</div>
<div class="paragraph">
<p><em>Type:</em> BOOLEAN<br>
<em>Zookeeper Mutable:</em> yes<br>
<em>Default Value:</em> <code>false</code></p>
</div>
</div>
<div class="sect4">
<h5 id="_tserver_bloom_load_concurrent_max">tserver.bloom.load.concurrent.max</h5>
<div class="paragraph">
<p>The number of concurrent threads that will load bloom filters in the background. Setting this to zero will make bloom filters load in the foreground.</p>
</div>
<div class="paragraph">
<p><em>Type:</em> COUNT<br>
<em>Zookeeper Mutable:</em> yes<br>
<em>Default Value:</em> <code>4</code></p>
</div>
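<div class="paragraph">
<p>The difference between a positive value and zero can be sketched with a plain Java thread pool. In this hypothetical <code>BloomFilterLoader</code>, a positive limit runs load tasks on that many background threads. A limit of zero runs each task on the calling thread, so the caller waits for the filter. This illustrates the described behavior only; it is not Accumulo&#8217;s implementation.</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="highlight"><code class="language-java">import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical sketch of tserver.bloom.load.concurrent.max semantics.
class BloomFilterLoader {
  private final ExecutorService pool; // null means foreground loading

  BloomFilterLoader(int concurrentMax) {
    pool = concurrentMax == 0 ? null : Executors.newFixedThreadPool(concurrentMax);
  }

  void load(Runnable loadTask) {
    if (pool == null) {
      loadTask.run(); // foreground: the caller waits until the filter is loaded
    } else {
      pool.execute(loadTask); // background: at most concurrentMax loads at once
    }
  }

  void shutdown() {
    if (pool != null) {
      pool.shutdown();
    }
  }
}</code></pre>
</div>
</div>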
</div>
<div class="sect4">
<h5 id="_tserver_bulk_assign_threads">tserver.bulk.assign.threads</h5>
<div class="paragraph">
<p>The master delegates bulk file processing and assignment to tablet servers. After the bulk file has been processed, the tablet server will assign the file to the appropriate tablets on all servers. This property controls the number of threads used to communicate to the other servers.</p>
</div>
<div class="paragraph">
<p><em>Type:</em> COUNT<br>
<em>Zookeeper Mutable:</em> yes<br>
<em>Default Value:</em> <code>1</code></p>
</div>
</div>
<div class="sect4">
<h5 id="_tserver_bulk_process_threads">tserver.bulk.process.threads</h5>
<div class="paragraph">
<p>The master will task a tablet server with pre-processing a bulk file prior to assigning it to the appropriate tablet servers. This configuration value controls the number of threads used to process the files.</p>
</div>
<div class="paragraph">
<p><em>Type:</em> COUNT<br>
<em>Zookeeper Mutable:</em> yes<br>
<em>Default Value:</em> <code>1</code></p>
</div>
</div>
<div class="sect4">
<h5 id="_tserver_bulk_retry_max">tserver.bulk.retry.max</h5>
<div class="paragraph">
<p>The number of times the tablet server will attempt to assign a file to a tablet as it migrates and splits.</p>
</div>
<div class="paragraph">
<p><em>Type:</em> COUNT<br>
<em>Zookeeper Mutable:</em> yes<br>
<em>Default Value:</em> <code>5</code></p>
</div>
</div>
<div class="sect4">
<h5 id="_tserver_bulk_timeout">tserver.bulk.timeout</h5>
<div class="paragraph">
<p>The time to wait for a tablet server to process a bulk import request.</p>
</div>
<div class="paragraph">
<p><em>Type:</em> TIMEDURATION<br>
<em>Zookeeper Mutable:</em> yes<br>
<em>Default Value:</em> <code>5m</code></p>
</div>
</div>
<div class="sect4">
<h5 id="_tserver_cache_data_size">tserver.cache.data.size</h5>
<div class="paragraph">
<p>Specifies the size of the cache for file data blocks.</p>
</div>
<div class="paragraph">
<p><em>Type:</em> MEMORY<br>
<em>Zookeeper Mutable:</em> yes<br>
<em>Default Value:</em> <code>128M</code></p>
</div>
</div>
<div class="sect4">
<h5 id="_tserver_cache_index_size">tserver.cache.index.size</h5>
<div class="paragraph">
<p>Specifies the size of the cache for file indices.</p>
</div>
<div class="paragraph">
<p><em>Type:</em> MEMORY<br>
<em>Zookeeper Mutable:</em> yes<br>
<em>Default Value:</em> <code>512M</code></p>
</div>
</div>
<div class="sect4">
<h5 id="_tserver_client_timeout">tserver.client.timeout</h5>
<div class="paragraph">
<p>Time to wait for clients to continue scans before closing a session.</p>
</div>
<div class="paragraph">
<p><em>Type:</em> TIMEDURATION<br>
<em>Zookeeper Mutable:</em> yes<br>
<em>Default Value:</em> <code>3s</code></p>
</div>
</div>
<div class="sect4">
<h5 id="_tserver_compaction_major_concurrent_max">tserver.compaction.major.concurrent.max</h5>
<div class="paragraph">
<p>The maximum number of concurrent major compactions for a tablet server</p>
</div>
<div class="paragraph">
<p><em>Type:</em> COUNT<br>
<em>Zookeeper Mutable:</em> yes<br>
<em>Default Value:</em> <code>3</code></p>
</div>
</div>
<div class="sect4">
<h5 id="_tserver_compaction_major_delay">tserver.compaction.major.delay</h5>
<div class="paragraph">
<p>Time a tablet server will sleep between checking which tablets need compaction.</p>
</div>
<div class="paragraph">
<p><em>Type:</em> TIMEDURATION<br>
<em>Zookeeper Mutable:</em> yes<br>
<em>Default Value:</em> <code>30s</code></p>
</div>
</div>
<div class="sect4">
<h5 id="_tserver_compaction_major_thread_files_open_max">tserver.compaction.major.thread.files.open.max</h5>
<div class="paragraph">
<p>Max number of files a major compaction thread can open at once.</p>
</div>
<div class="paragraph">
<p><em>Type:</em> COUNT<br>
<em>Zookeeper Mutable:</em> yes<br>
<em>Default Value:</em> <code>10</code></p>
</div>
</div>
<div class="sect4">
<h5 id="_tserver_compaction_minor_concurrent_max">tserver.compaction.minor.concurrent.max</h5>
<div class="paragraph">
<p>The maximum number of concurrent minor compactions for a tablet server</p>
</div>
<div class="paragraph">
<p><em>Type:</em> COUNT<br>
<em>Zookeeper Mutable:</em> yes<br>
<em>Default Value:</em> <code>4</code></p>
</div>
</div>
<div class="sect4">
<h5 id="_tserver_compaction_warn_time">tserver.compaction.warn.time</h5>
<div class="paragraph">
<p>When a compaction has not made progress for this time period, a warning will be logged</p>
</div>
<div class="paragraph">
<p><em>Type:</em> TIMEDURATION<br>
<em>Zookeeper Mutable:</em> yes<br>
<em>Default Value:</em> <code>10m</code></p>
</div>
</div>
<div class="sect4">
<h5 id="_tserver_default_blocksize">tserver.default.blocksize</h5>
<div class="paragraph">
<p>Specifies a default blocksize for the tserver caches</p>
</div>
<div class="paragraph">
<p><em>Type:</em> MEMORY<br>
<em>Zookeeper Mutable:</em> yes<br>
<em>Default Value:</em> <code>1M</code></p>
</div>
</div>
<div class="sect4">
<h5 id="_tserver_dir_memdump">tserver.dir.memdump</h5>
<div class="paragraph">
<p>A long-running scan could hold memory that has already been minor compacted. To prevent this, the in-memory map is dumped to a local file and the scan is switched to that local file. The scan cannot be switched to the minor compacted file because that file may have been modified by iterators. The file dumped to the local directory is an exact copy of what was in memory.</p>
</div>
<div class="paragraph">
<p><em>Type:</em> PATH<br>
<em>Zookeeper Mutable:</em> yes<br>
<em>Default Value:</em> <code>/tmp</code></p>
</div>
</div>
<div class="sect4">
<h5 id="_tserver_files_open_idle">tserver.files.open.idle</h5>
<div class="paragraph">
<p>Tablet servers leave previously used files open for future queries. This setting determines how much time an unused file should be kept open until it is closed.</p>
</div>
<div class="paragraph">
<p><em>Type:</em> TIMEDURATION<br>
<em>Zookeeper Mutable:</em> yes<br>
<em>Default Value:</em> <code>1m</code></p>
</div>
</div>
<div class="sect4">
<h5 id="_tserver_hold_time_max">tserver.hold.time.max</h5>
<div class="paragraph">
<p>The maximum time for a tablet server to be in the "memory full" state. If the tablet server cannot write out memory in this much time, it will assume there is some failure local to its node, and quit. A value of zero is equivalent to forever.</p>
</div>
<div class="paragraph">
<p><em>Type:</em> TIMEDURATION<br>
<em>Zookeeper Mutable:</em> yes<br>
<em>Default Value:</em> <code>5m</code></p>
</div>
</div>
<div class="sect4">
<h5 id="_tserver_memory_manager">tserver.memory.manager</h5>
<div class="paragraph">
<p>An implementation of MemoryManager that Accumulo will use.</p>
</div>
<div class="paragraph">
<p><em>Type:</em> CLASSNAME<br>
<em>Zookeeper Mutable:</em> yes<br>
<em>Default Value:</em> <code>org.apache.accumulo.server.tabletserver.LargestFirstMemoryManager</code></p>
</div>
</div>
<div class="sect4">
<h5 id="_tserver_memory_maps_max">tserver.memory.maps.max</h5>
<div class="paragraph">
<p>Maximum amount of memory that can be used to buffer data written to a tablet server. Two other properties can effectively limit memory usage: table.compaction.minor.logs.threshold and tserver.walog.max.size. Ensure that table.compaction.minor.logs.threshold * tserver.walog.max.size &gt;= this property.</p>
</div>
<div class="paragraph">
<p><em>Type:</em> MEMORY<br>
<em>Zookeeper Mutable:</em> yes<br>
<em>Default Value:</em> <code>1G</code></p>
</div>
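<div class="paragraph">
<p>The inequality above can be checked mechanically. The hypothetical sketch below parses MEMORY values and tests whether <code>table.compaction.minor.logs.threshold * tserver.walog.max.size</code> covers <code>tserver.memory.maps.max</code>. It assumes the suffixes B, K, M and G are powers of 1024 and that a bare number means bytes. The threshold values passed in <code>main</code> are illustrative, not documented defaults.</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="highlight"><code class="language-java">import java.util.Locale;

// Hypothetical sketch, not Accumulo's own validation. Assumptions: MEMORY
// values use B, K, M, G as powers of 1024, and a bare number means bytes.
class MemoryMapsCheck {
  static long toBytes(String value) {
    String v = value.trim().toUpperCase(Locale.ROOT);
    long multiplier;
    switch (v.charAt(v.length() - 1)) {
      case 'B': multiplier = 1L; break;
      case 'K': multiplier = 1024L; break;
      case 'M': multiplier = 1024L * 1024; break;
      case 'G': multiplier = 1024L * 1024 * 1024; break;
      default: return Long.parseLong(v); // no unit: bytes
    }
    return Math.multiplyExact(Long.parseLong(v.substring(0, v.length() - 1)), multiplier);
  }

  // True when logsThreshold * walogMaxSize >= memoryMapsMax.
  static boolean walogsCoverMemoryMaps(int logsThreshold, String walogMaxSize, String memoryMapsMax) {
    long walogCapacity = Math.multiplyExact((long) logsThreshold, toBytes(walogMaxSize));
    return walogCapacity >= toBytes(memoryMapsMax);
  }

  public static void main(String[] args) {
    System.out.println(walogsCoverMemoryMaps(3, "1G", "1G"));  // true
    System.out.println(walogsCoverMemoryMaps(1, "512M", "1G")); // false
  }
}</code></pre>
</div>
</div>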
</div>
<div class="sect4">
<h5 id="_tserver_memory_maps_native_enabled">tserver.memory.maps.native.enabled</h5>
<div class="paragraph">
<p>An in-memory data store for Accumulo, implemented in C++, that increases the amount of data Accumulo can hold in memory and avoids Java GC pauses.</p>
</div>
<div class="paragraph">
<p><em>Type:</em> BOOLEAN<br>
<em>Zookeeper Mutable:</em> yes but requires restart of the tserver<br>
<em>Default Value:</em> <code>true</code></p>
</div>
</div>
<div class="sect4">
<h5 id="_tserver_metadata_readahead_concurrent_max">tserver.metadata.readahead.concurrent.max</h5>
<div class="paragraph">
<p>The maximum number of concurrent metadata read-ahead operations that will execute.</p>
</div>
<div class="paragraph">
<p><em>Type:</em> COUNT<br>
<em>Zookeeper Mutable:</em> yes<br>
<em>Default Value:</em> <code>8</code></p>
</div>
</div>
<div class="sect4">
<h5 id="_tserver_migrations_concurrent_max">tserver.migrations.concurrent.max</h5>
<div class="paragraph">
<p>The maximum number of concurrent tablet migrations for a tablet server</p>
</div>
<div class="paragraph">
<p><em>Type:</em> COUNT<br>
<em>Zookeeper Mutable:</em> yes<br>
<em>Default Value:</em> <code>1</code></p>
</div>
</div>
<div class="sect4">
<h5 id="_tserver_monitor_fs">tserver.monitor.fs</h5>
<div class="paragraph">
<p>When enabled, the tserver will monitor file systems and kill itself when one switches from read-write (rw) to read-only (ro). This is usually an indication that Linux has detected a bad disk.</p>
</div>
<div class="paragraph">
<p><em>Type:</em> BOOLEAN<br>
<em>Zookeeper Mutable:</em> yes<br>
<em>Default Value:</em> <code>true</code></p>
</div>
</div>
<div class="sect4">
<h5 id="_tserver_mutation_queue_max">tserver.mutation.queue.max</h5>
<div class="paragraph">
<p>The amount of memory to use to store write-ahead-log mutations per session before flushing them. Since the buffer is per write session, consider the maximum number of concurrent writers when configuring. When using Hadoop 2, Accumulo will call hsync() on the WAL. For a small number of concurrent writers, increasing this buffer size decreases the frequency of hsync calls. For a large number of concurrent writers, a small buffer size is acceptable because of group commit.</p>
</div>
<div class="paragraph">
<p><em>Type:</em> MEMORY<br>
<em>Zookeeper Mutable:</em> yes<br>
<em>Default Value:</em> <code>1M</code></p>
</div>
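<div class="paragraph">
<p>Because the buffer is allocated per write session, the worst-case memory use grows linearly with the number of concurrent writers. The back-of-the-envelope Java sketch below makes that arithmetic explicit. The class name and the writer count of 50 are hypothetical.</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="highlight"><code class="language-java">// Hypothetical sketch: upper bound on memory held by per-session WAL buffers.
class MutationQueueBudget {
  static long worstCaseBytes(long perSessionBytes, int concurrentWriters) {
    return Math.multiplyExact(perSessionBytes, (long) concurrentWriters);
  }

  public static void main(String[] args) {
    long oneMiB = 1024L * 1024; // the 1M default
    // 50 concurrent writers, each with a full 1M buffer
    System.out.println(worstCaseBytes(oneMiB, 50) + " bytes"); // 52428800 bytes
  }
}</code></pre>
</div>
</div>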
</div>
<div class="sect4">
<h5 id="_tserver_port_client">tserver.port.client</h5>
<div class="paragraph">
<p>The port used for handling client connections on the tablet servers</p>
</div>
<div class="paragraph">
<p><em>Type:</em> PORT<br>
<em>Zookeeper Mutable:</em> yes but requires restart of the tserver<br>
<em>Default Value:</em> <code>9997</code></p>
</div>
</div>
<div class="sect4">
<h5 id="_tserver_port_search">tserver.port.search</h5>
<div class="paragraph">
<p>If the configured ports are in use, search higher ports until one is available.</p>
</div>
<div class="paragraph">
<p><em>Type:</em> BOOLEAN<br>
<em>Zookeeper Mutable:</em> yes<br>
<em>Default Value:</em> <code>false</code></p>
</div>
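<div class="paragraph">
<p>The search behavior can be sketched with <code>java.net.ServerSocket</code>. The hypothetical <code>PortSearch.bind</code> tries the configured port first. If the port is taken and searching is enabled, it moves to the next higher port. The <code>maxAttempts</code> cap (at least 1) is an assumption of the sketch, not a documented Accumulo setting.</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="highlight"><code class="language-java">import java.io.IOException;
import java.net.BindException;
import java.net.ServerSocket;

// Hypothetical sketch of tserver.port.search: bind the configured port, and
// if it is already in use, optionally try successively higher ports.
class PortSearch {
  static ServerSocket bind(int configuredPort, boolean search, int maxAttempts) throws IOException {
    int port = configuredPort;
    int attempts = 0;
    while (true) {
      try {
        return new ServerSocket(port);
      } catch (BindException e) {
        attempts++;
        if (!search || attempts == maxAttempts) {
          throw e; // searching disabled, or the attempt cap was reached
        }
        port++; // try the next higher port
      }
    }
  }
}</code></pre>
</div>
</div>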
</div>
<div class="sect4">
<h5 id="_tserver_readahead_concurrent_max">tserver.readahead.concurrent.max</h5>
<div class="paragraph">
<p>The maximum number of concurrent read-ahead operations that will execute. This effectively limits the number of long-running scans that can run concurrently per tserver.</p>
</div>
<div class="paragraph">
<p><em>Type:</em> COUNT<br>
<em>Zookeeper Mutable:</em> yes<br>
<em>Default Value:</em> <code>16</code></p>
</div>
</div>
<div class="sect4">
<h5 id="_tserver_recovery_concurrent_max">tserver.recovery.concurrent.max</h5>
<div class="paragraph">
<p>The maximum number of threads to use to sort logs during recovery</p>
</div>
<div class="paragraph">
<p><em>Type:</em> COUNT<br>
<em>Zookeeper Mutable:</em> yes<br>
<em>Default Value:</em> <code>2</code></p>
</div>
</div>
<div class="sect4">
<h5 id="_tserver_scan_files_open_max">tserver.scan.files.open.max</h5>
<div class="paragraph">
<p>Maximum total files that all tablets in a tablet server can open for scans.</p>
</div>
<div class="paragraph">
<p><em>Type:</em> COUNT<br>
<em>Zookeeper Mutable:</em> yes but requires restart of the tserver<br>
<em>Default Value:</em> <code>100</code></p>
</div>
</div>
<div class="sect4">
<h5 id="_tserver_server_message_size_max">tserver.server.message.size.max</h5>
<div class="paragraph">
<p>The maximum size of a message that can be sent to a tablet server.</p>
</div>
<div class="paragraph">
<p><em>Type:</em> MEMORY<br>
<em>Zookeeper Mutable:</em> yes<br>
<em>Default Value:</em> <code>1G</code></p>
</div>
</div>
<div class="sect4">
<h5 id="_tserver_server_threadcheck_time">tserver.server.threadcheck.time</h5>
<div class="paragraph">
<p>The time between adjustments of the server thread pool.</p>
</div>
<div class="paragraph">
<p><em>Type:</em> TIMEDURATION<br>
<em>Zookeeper Mutable:</em> yes<br>
<em>Default Value:</em> <code>1s</code></p>
</div>
</div>
<div class="sect4">
<h5 id="_tserver_server_threads_minimum">tserver.server.threads.minimum</h5>
<div class="paragraph">
<p>The minimum number of threads to use to handle incoming requests.</p>
</div>
<div class="paragraph">
<p><em>Type:</em> COUNT<br>
<em>Zookeeper Mutable:</em> yes<br>
<em>Default Value:</em> <code>20</code></p>
</div>
</div>
<div class="sect4">
<h5 id="_tserver_session_idle_max">tserver.session.idle.max</h5>
<div class="paragraph">
<p>Maximum idle time for a session</p>
</div>
<div class="paragraph">
<p><em>Type:</em> TIMEDURATION<br>
<em>Zookeeper Mutable:</em> yes<br>
<em>Default Value:</em> <code>1m</code></p>
</div>
</div>
<div class="sect4">
<h5 id="_tserver_sort_buffer_size">tserver.sort.buffer.size</h5>
<div class="paragraph">
<p>The amount of memory to use when sorting logs during recovery.</p>
</div>
<div class="paragraph">
<p><em>Type:</em> MEMORY<br>
<em>Zookeeper Mutable:</em> yes<br>
<em>Default Value:</em> <code>200M</code></p>
</div>
</div>
<div class="sect4">
<h5 id="_tserver_tablet_split_midpoint_files_max">tserver.tablet.split.midpoint.files.max</h5>
<div class="paragraph">
<p>To find a tablet&#8217;s split points, all index files are opened. This setting determines how many index files can be opened at once. When there are more index files than this setting, multiple passes must be made, which is slower. However, opening too many files at once can cause problems.</p>
</div>
<div class="paragraph">
<p><em>Type:</em> COUNT<br>
<em>Zookeeper Mutable:</em> yes<br>
<em>Default Value:</em> <code>30</code></p>
</div>
</div>
<div class="sect4">
<h5 id="_tserver_wal_blocksize">tserver.wal.blocksize</h5>
<div class="paragraph">
<p>The size of the HDFS blocks used to write to the Write-Ahead log. If zero, it will be 110% of tserver.walog.max.size (that is, try to use just one block)</p>
</div>
<div class="paragraph">
<p><em>Type:</em> MEMORY<br>
<em>Zookeeper Mutable:</em> yes<br>
<em>Default Value:</em> <code>0</code></p>
</div>
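<div class="paragraph">
<p>The zero default can be expressed as a one-line rule. The hypothetical sketch below returns the configured block size when it is non-zero. Otherwise it returns 110% of <code>tserver.walog.max.size</code>, using integer arithmetic. Accumulo&#8217;s own code may round differently.</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="highlight"><code class="language-java">// Hypothetical sketch of the tserver.wal.blocksize default described above.
class WalBlockSize {
  static long effectiveBlockSize(long walBlockSizeBytes, long walogMaxSizeBytes) {
    if (walBlockSizeBytes != 0) {
      return walBlockSizeBytes; // explicitly configured
    }
    // 0 means 110% of tserver.walog.max.size, so one WAL fits in one block
    return walogMaxSizeBytes + walogMaxSizeBytes / 10;
  }

  public static void main(String[] args) {
    long oneGiB = 1024L * 1024 * 1024; // tserver.walog.max.size default of 1G
    System.out.println(effectiveBlockSize(0, oneGiB)); // 1181116006
  }
}</code></pre>
</div>
</div>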
</div>
<div class="sect4">
<h5 id="_tserver_wal_replication">tserver.wal.replication</h5>
<div class="paragraph">
<p>The replication to use when writing the Write-Ahead log to HDFS. If zero, it will use the HDFS default replication setting.</p>
</div>
<div class="paragraph">
<p><em>Type:</em> COUNT<br>
<em>Zookeeper Mutable:</em> yes<br>
<em>Default Value:</em> <code>0</code></p>
</div>
</div>
<div class="sect4">
<h5 id="_tserver_wal_sync">tserver.wal.sync</h5>
<div class="paragraph">
<p>Use the SYNC_BLOCK create flag to sync WAL writes to disk. Prevents problems recovering from sudden system resets.</p>
</div>
<div class="paragraph">
<p><em>Type:</em> BOOLEAN<br>
<em>Zookeeper Mutable:</em> yes<br>
<em>Default Value:</em> <code>true</code></p>
</div>
</div>
<div class="sect4">
<h5 id="_tserver_wal_sync_method">tserver.wal.sync.method</h5>
<div class="paragraph">
<p>The method to invoke when syncing WALs. HSync will provide resiliency in the face of unexpected power outages, at the cost of speed. If the method is not available, the legacy 'sync' method will be used to ensure backwards compatibility with older Hadoop versions. A value of 'hflush' is the alternative to the default value of 'hsync'; it results in faster writes, but with less durability. This property was added in Accumulo 1.6.1.</p>
</div>
<div class="paragraph">
<p><em>Type:</em> STRING<br>
<em>Zookeeper Mutable:</em> yes<br>
<em>Default Value:</em> <code>hsync</code></p>
</div>
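<div class="paragraph">
<p>The fallback can be sketched with Java reflection. The sketch looks up the configured no-argument method (<code>hsync</code> or <code>hflush</code>) on the WAL output stream. If that method is missing, it uses the legacy <code>sync</code> method instead. <code>WalSyncMethod</code> is a hypothetical name, and the anonymous class in <code>main</code> stands in for an older Hadoop stream. Treat this as an illustration of the described behavior, not as Accumulo&#8217;s code.</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="highlight"><code class="language-java">import java.lang.reflect.Method;

// Hypothetical sketch: resolve the configured sync method by name, falling
// back to the legacy "sync" method when the stream does not provide it.
class WalSyncMethod {
  static Method resolve(Object walStream, String configured) throws NoSuchMethodException {
    try {
      return walStream.getClass().getMethod(configured);
    } catch (NoSuchMethodException e) {
      return walStream.getClass().getMethod("sync"); // older Hadoop versions
    }
  }

  public static void main(String[] args) throws Exception {
    Object legacyStream = new Object() {
      public void sync() { System.out.println("sync called"); }
    };
    // "hsync" is missing here, so the legacy sync() is invoked
    resolve(legacyStream, "hsync").invoke(legacyStream);
  }
}</code></pre>
</div>
</div>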
</div>
<div class="sect4">
<h5 id="_tserver_walog_max_size">tserver.walog.max.size</h5>
<div class="paragraph">
<p>The maximum size for each write-ahead log. See comment for property tserver.memory.maps.max</p>
</div>
<div class="paragraph">
<p><em>Type:</em> MEMORY<br>
<em>Zookeeper Mutable:</em> yes<br>
<em>Default Value:</em> <code>1G</code></p>
</div>
</div>
<div class="sect4">
<h5 id="_tserver_walog_max_age">tserver.walog.max.age</h5>
<div class="paragraph">
<p>The maximum age for each write-ahead log.</p>
</div>
<div class="paragraph">
<p><em>Type:</em> TIMEDURATION<br>
<em>Zookeeper Mutable:</em> yes<br>
<em>Default Value:</em> <code>24h</code></p>
</div>
</div>
<div class="sect4">
<h5 id="_tserver_workq_threads">tserver.workq.threads</h5>
<div class="paragraph">
<p>The number of threads for the distributed work queue. These threads are used for copying failed bulk files.</p>
</div>
<div class="paragraph">
<p><em>Type:</em> COUNT<br>
<em>Zookeeper Mutable:</em> yes<br>
<em>Default Value:</em> <code>2</code></p>
</div>
</div>
</div>
<div class="sect3">
<h4 id="LOGGER_PREFIX">16.3.6. logger.*</h4>
<div class="paragraph">
<p>Properties in this category affect the behavior of the write-ahead logger servers</p>
</div>
<div class="sect4">
<h5 id="_logger_dir_walog">logger.dir.walog</h5>
<div class="paragraph">
<p>This property only needs to be set when upgrading from 1.4, which stored write-ahead logs on the local filesystem. In 1.5, write-ahead logs are stored in DFS. When 1.5 is started for the first time, it will copy any 1.4 write-ahead logs into DFS. A comma-separated list of directories may be specified.</p>
</div>
<div class="paragraph">
<p><em>Type:</em> PATH<br>
<em>Zookeeper Mutable:</em> yes<br>
<em>Default Value:</em> <code>walogs</code></p>
</div>
</div>
</div>
<div class="sect3">
<h4 id="GC_PREFIX">16.3.7. gc.*</h4>
<div class="paragraph">
<p>Properties in this category affect the behavior of the accumulo garbage collector.</p>
</div>
<div class="sect4">
<h5 id="_gc_cycle_delay">gc.cycle.delay</h5>
<div class="paragraph">
<p>Time between garbage collection cycles. In each cycle, old files no longer in use are removed from the filesystem.</p>
</div>
<div class="paragraph">
<p><em>Type:</em> TIMEDURATION<br>
<em>Zookeeper Mutable:</em> yes<br>
<em>Default Value:</em> <code>5m</code></p>
</div>
</div>
<div class="sect4">
<h5 id="_gc_cycle_start">gc.cycle.start</h5>
<div class="paragraph">
<p>Time to wait before attempting to garbage collect any old files.</p>
</div>
<div class="paragraph">
<p><em>Type:</em> TIMEDURATION<br>
<em>Zookeeper Mutable:</em> yes<br>
<em>Default Value:</em> <code>30s</code></p>
</div>
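<div class="paragraph">
<p>Taken together, <code>gc.cycle.start</code> and <code>gc.cycle.delay</code> describe an initial wait followed by a fixed pause between cycles. The hypothetical sketch below models that with <code>ScheduledExecutorService.scheduleWithFixedDelay</code>, which waits the delay after each run finishes. It assumes the garbage collector measures the delay from the end of one cycle to the start of the next, and it is not the collector&#8217;s actual loop.</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="highlight"><code class="language-java">import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: wait gc.cycle.start before the first cycle, then wait
// gc.cycle.delay after each cycle completes before starting the next one.
class GcCycleSchedule {
  static ScheduledExecutorService start(Runnable gcCycle, long cycleStartMillis, long cycleDelayMillis) {
    ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor();
    executor.scheduleWithFixedDelay(gcCycle, cycleStartMillis, cycleDelayMillis, TimeUnit.MILLISECONDS);
    return executor; // caller shuts it down when the collector stops
  }
}</code></pre>
</div>
</div>
<div class="paragraph">
<p>With the defaults, this would be <code>start(cycle, 30_000, 300_000)</code>: the first cycle after 30 seconds, then a five-minute pause after each cycle.</p>
</div>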
</div>
<div class="sect4">
<h5 id="_gc_port_client">gc.port.client</h5>
<div class="paragraph">
<p>The listening port for the garbage collector&#8217;s monitor service</p>
</div>
<div class="paragraph">
<p><em>Type:</em> PORT<br>
<em>Zookeeper Mutable:</em> yes but requires restart of the gc<br>
<em>Default Value:</em> <code>50091</code></p>
</div>
</div>
<div class="sect4">
<h5 id="_gc_threads_delete">gc.threads.delete</h5>
<div class="paragraph">
<p>The number of threads used to delete files</p>
</div>
<div class="paragraph">
<p><em>Type:</em> COUNT<br>
<em>Zookeeper Mutable:</em> yes<br>
<em>Default Value:</em> <code>16</code></p>
</div>
</div>
<div class="sect4">
<h5 id="_gc_trash_ignore">gc.trash.ignore</h5>
<div class="paragraph">
<p>Do not use the Trash, even if it is configured</p>
</div>
<div class="paragraph">
<p><em>Type:</em> BOOLEAN<br>
<em>Zookeeper Mutable:</em> yes<br>
<em>Default Value:</em> <code>false</code></p>
</div>
</div>
<div class="sect4">
<h5 id="_gc_file_archive">gc.file.archive</h5>
<div class="paragraph">
<p>Archive any files/directories instead of moving to the HDFS trash or deleting</p>
</div>
<div class="paragraph">
<p><em>Type:</em> BOOLEAN<br>
<em>Zookeeper Mutable:</em> yes<br>
<em>Default Value:</em> <code>false</code></p>
</div>
</div>
<div class="sect4">
<h5 id="_gc_wal_dead_server_wait">gc.wal.dead.server.wait</h5>
<div class="paragraph">
<p>Time to wait after a tablet server is first seen as dead before removing its associated write-ahead log (WAL) files.</p>
</div>
<div class="paragraph">
<p><em>Type:</em> TIMEDURATION<br>
<em>Zookeeper Mutable:</em> yes<br>
<em>Default Value:</em> <code>1h</code></p>
</div>
</div>
</div>
<div class="sect3">
<h4 id="MONITOR_PREFIX">16.3.8. monitor.*</h4>
<div class="paragraph">
<p>Properties in this category affect the behavior of the monitor web server.</p>
</div>
<div class="sect4">
<h5 id="_monitor_banner_background">monitor.banner.background</h5>
<div class="paragraph">
<p>The background color of the banner text displayed on the monitor page.</p>
</div>
<div class="paragraph">
<p><em>Type:</em> STRING<br>
<em>Zookeeper Mutable:</em> yes<br>
<em>Default Value:</em> <code>#304065</code></p>
</div>
</div>
<div class="sect4">
<h5 id="_monitor_banner_color">monitor.banner.color</h5>
<div class="paragraph">
<p>The color of the banner text displayed on the monitor page.</p>
</div>
<div class="paragraph">
<p><em>Type:</em> STRING<br>
<em>Zookeeper Mutable:</em> yes<br>
<em>Default Value:</em> <code>#c4c4c4</code></p>
</div>
</div>
<div class="sect4">
<h5 id="_monitor_banner_text">monitor.banner.text</h5>
<div class="paragraph">
<p>The banner text displayed on the monitor page.</p>
</div>
<div class="paragraph">
<p><em>Type:</em> STRING<br>
<em>Zookeeper Mutable:</em> yes<br>
<em>Default Value:</em> <em>empty</em></p>
</div>
</div>
<div class="sect4">
<h5 id="_monitor_lock_check_interval">monitor.lock.check.interval</h5>
<div class="paragraph">
<p>The amount of time to sleep between checks for the Monitor ZooKeeper lock.</p>
</div>
<div class="paragraph">
<p><em>Type:</em> TIMEDURATION<br>
<em>Zookeeper Mutable:</em> no<br>
<em>Default Value:</em> <code>5s</code></p>
</div>
</div>
<div class="sect4">
<h5 id="_monitor_port_client">monitor.port.client</h5>
<div class="paragraph">
<p>The listening port for the monitor&#8217;s HTTP service.</p>
</div>
<div class="paragraph">
<p><em>Type:</em> PORT<br>
<em>Zookeeper Mutable:</em> no<br>
<em>Default Value:</em> <code>50095</code></p>
</div>
</div>
<div class="sect4">
<h5 id="_monitor_port_log4j">monitor.port.log4j</h5>
<div class="paragraph">
<p>The listening port for the monitor&#8217;s log4j logging collection.</p>
</div>
<div class="paragraph">
<p><em>Type:</em> PORT<br>
<em>Zookeeper Mutable:</em> no<br>
<em>Default Value:</em> <code>4560</code></p>
</div>
</div>
</div>
<div class="sect3">
<h4 id="TRACE_PREFIX">16.3.9. trace.*</h4>
<div class="paragraph">
<p>Properties in this category affect the behavior of distributed tracing.</p>
</div>
<div class="sect4">
<h5 id="_trace_password">trace.password</h5>
<div class="paragraph">
<p>The password for the user used to store distributed traces.</p>
</div>
<div class="paragraph">
<p><em>Type:</em> STRING<br>
<em>Zookeeper Mutable:</em> no<br>
<em>Default Value:</em> <code>secret</code></p>
</div>
</div>
<div class="sect4">
<h5 id="_trace_port_client">trace.port.client</h5>
<div class="paragraph">
<p>The listening port for the trace server.</p>
</div>
<div class="paragraph">
<p><em>Type:</em> PORT<br>
<em>Zookeeper Mutable:</em> no<br>
<em>Default Value:</em> <code>12234</code></p>
</div>
</div>
<div class="sect4">
<h5 id="_trace_table">trace.table</h5>
<div class="paragraph">
<p>The name of the table in which to store distributed traces.</p>
</div>
<div class="paragraph">
<p><em>Type:</em> STRING<br>
<em>Zookeeper Mutable:</em> no<br>
<em>Default Value:</em> <code>trace</code></p>
</div>
</div>
<div class="sect4">
<h5 id="_trace_token_type">trace.token.type</h5>
<div class="paragraph">
<p>An AuthenticationToken type supported by the authorizer.</p>
</div>
<div class="paragraph">
<p><em>Type:</em> CLASSNAME<br>
<em>Zookeeper Mutable:</em> no<br>
<em>Default Value:</em> <code>org.apache.accumulo.core.client.security.tokens.PasswordToken</code></p>
</div>
</div>
<div class="sect4">
<h5 id="_trace_user">trace.user</h5>
<div class="paragraph">
<p>The name of the user used to store distributed traces.</p>
</div>
<div class="paragraph">
<p><em>Type:</em> STRING<br>
<em>Zookeeper Mutable:</em> no<br>
<em>Default Value:</em> <code>root</code></p>
</div>
</div>
</div>
<div class="sect3">
<h4 id="TRACE_TOKEN_PROPERTY_PREFIX">16.3.10. trace.token.property.*</h4>
<div class="paragraph">
<p>The prefix used to create a token for storing distributed traces. For each property required by trace.token.type, place this prefix in front of it.</p>
</div>
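<div class="paragraph">
<p>As an illustration of the prefix convention, the sketch below collects every property that starts with trace.token.property. and strips the prefix, producing the name/value pairs a token would receive. It assumes the configuration is available as a plain Java Map; the class TraceTokenProperties and its method are hypothetical and not part of the Accumulo API, and the entry trace.token.property.password is only an assumed example.</p>
</div>

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helper: collects trace token properties by stripping the category prefix. */
final class TraceTokenProperties {
  static final String PREFIX = "trace.token.property.";

  private TraceTokenProperties() {}

  /** Returns every property under PREFIX, keyed by its name with the prefix removed. */
  static Map<String, String> tokenProperties(Map<String, String> config) {
    Map<String, String> result = new TreeMap<>();
    for (Map.Entry<String, String> e : config.entrySet()) {
      String key = e.getKey();
      // Only keys that start with the prefix and have a non-empty suffix qualify.
      if (key.startsWith(PREFIX) && key.length() > PREFIX.length()) {
        result.put(key.substring(PREFIX.length()), e.getValue());
      }
    }
    return result;
  }
}
```

<div class="paragraph">
<p>With that assumed entry set to secret, tokenProperties returns a map containing password=secret and ignores unrelated keys such as trace.user.</p>
</div>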
</div>
<div class="sect3">
<h4 id="TABLE_PREFIX">16.3.11. table.*</h4>
<div class="paragraph">
<p>Properties in this category affect tablet server treatment of tablets, but can be configured on a per-table basis. Setting these properties in the site file will override the default globally for all tables, not for any specific table. However, both the default and the global setting can be overridden per table using the table operations API or in the shell, which sets the overridden value in ZooKeeper. Restarting Accumulo tablet servers after setting these properties in the site file will cause the global setting to take effect. However, you must use the API or the shell to change properties in ZooKeeper that are set on a table.</p>
</div>
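<div class="paragraph">
<p>The precedence described above can be modeled as a lookup that checks the most specific source first. This is a simplified sketch, not Accumulo&#8217;s configuration classes: it assumes the per-table values stored in ZooKeeper, the site-file values, and the built-in defaults are each available as a plain Map, and the class name TablePropertyResolver is hypothetical.</p>
</div>

```java
import java.util.Map;

/** Hypothetical, simplified model of how a table.* property value is resolved. */
final class TablePropertyResolver {
  private final Map<String, String> defaults; // built-in defaults
  private final Map<String, String> siteFile; // global values from the site file
  private final Map<String, String> perTable; // values set for one table via the API or shell

  TablePropertyResolver(Map<String, String> defaults, Map<String, String> siteFile,
      Map<String, String> perTable) {
    this.defaults = defaults;
    this.siteFile = siteFile;
    this.perTable = perTable;
  }

  /** Returns the per-table value if set, else the site-file value, else the default. */
  String resolve(String property) {
    if (perTable.containsKey(property)) {
      return perTable.get(property);
    }
    if (siteFile.containsKey(property)) {
      return siteFile.get(property);
    }
    return defaults.get(property); // null if the property is unknown
  }
}
```

<div class="paragraph">
<p>The actual configuration hierarchy may involve additional layers; this sketch only mirrors the three sources named in the paragraph above.</p>
</div>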
<div class="sect4">
<h5 id="_table_balancer">table.balancer</h5>
<div class="paragraph">
<p>When the LoadBalanceByTable load balancer is in use, this property can be set to change the load balancer that is called for this table.</p>
</div>
<div class="paragraph">
<p><em>Type:</em> STRING<br>
<em>Zookeeper Mutable:</em> yes<br>
<em>Default Value:</em> <code>org.apache.accumulo.server.master.balancer.DefaultLoadBalancer</code></p>
</div>
</div>
<div class="sect4">
<h5 id="_table_bloom_enabled">table.bloom.enabled</h5>
<div class="paragraph">
<p>Use bloom filters on this table.</p>
</div>
<div class="paragraph">
<p><em>Type:</em> BOOLEAN<br>
<em>Zookeeper Mutable:</em> yes<br>
<em>Default Value:</em> <code>false</code></p>
</div>
</div>
<div class="sect4">
<h5 id="_table_bloom_error_rate">table.bloom.error.rate</h5>
<div class="paragraph">
<p>Bloom filter error rate.</p>
</div>
<div class="paragraph">
<p><em>Type:</em> FRACTION<br>
<em>Zookeeper Mutable:</em> yes<br>
<em>Default Value:</em> <code>0.5%</code></p>
</div>
</div>
<div class="sect4">
<h5 id="_table_bloom_hash_type">table.bloom.hash.type</h5>
<div class="paragraph">
<p>The bloom filter hash type.</p>
</div>
<div class="paragraph">
<p><em>Type:</em> STRING<br>
<em>Zookeeper Mutable:</em> yes<br>
<em>Default Value:</em> <code>murmur</code></p>
</div>
</div>
<div class="sect4">
<h5 id="_table_bloom_key_functor">table.bloom.key.functor</h5>
<div class="paragraph">
<p>A function that can transform the key prior to insertion into and checking of the bloom filter. org.apache.accumulo.core.file.keyfunctor.RowFunctor, org.apache.accumulo.core.file.keyfunctor.ColumnFamilyFunctor, and org.apache.accumulo.core.file.keyfunctor.ColumnQualifierFunctor are allowable values. Any of these classes can be extended to perform specialized parsing of the key.</p>
</div>
<div class="paragraph">
<p><em>Type:</em> CLASSNAME<br>
<em>Zookeeper Mutable:</em> yes<br>
<em>Default Value:</em> <code>org.apache.accumulo.core.file.keyfunctor.RowFunctor</code></p>
</div>
</div>
<div class="sect4">
<h5 id="_table_bloom_load_threshold">table.bloom.load.threshold</h5>
<div class="paragraph">
<p>A file&#8217;s bloom filter is loaded only after this many seeks that would actually use it have occurred. Set this to zero to load bloom filters as soon as a file is opened.</p>
</div>
<div class="paragraph">
<p><em>Type:</em> COUNT<br>
<em>Zookeeper Mutable:</em> yes<br>
<em>Default Value:</em> <code>1</code></p>
</div>
</div>
<div class="sect4">
<h5 id="_table_bloom_size">table.bloom.size</h5>
<div class="paragraph">
<p>Bloom filter size, as number of keys.</p>
</div>
<div class="paragraph">
<p><em>Type:</em> COUNT<br>
<em>Zookeeper Mutable:</em> yes<br>
<em>Default Value:</em> <code>1048576</code></p>
</div>
</div>
<div class="sect4">
<h5 id="_table_cache_block_enable">table.cache.block.enable</h5>
<div class="paragraph">
<p>Determines whether file block cache is enabled.</p>
</div>
<div class="paragraph">
<p><em>Type:</em> BOOLEAN<br>
<em>Zookeeper Mutable:</em> yes<br>
<em>Default Value:</em> <code>false</code></p>
</div>
</div>
<div class="sect4">
<h5 id="_table_cache_index_enable">table.cache.index.enable</h5>
<div class="paragraph">
<p>Determines whether index cache is enabled.</p>
</div>
<div class="paragraph">
<p><em>Type:</em> BOOLEAN<br>
<em>Zookeeper Mutable:</em> yes<br>
<em>Default Value:</em> <code>true</code></p>
</div>
</div>
<div class="sect4">
<h5 id="_table_classpath_context">table.classpath.context</h5>
<div class="paragraph">
<p>Per-table classpath context.</p>
</div>
<div class="paragraph">
<p><em>Type:</em> STRING<br>
<em>Zookeeper Mutable:</em> yes<br>
<em>Default Value:</em> <em>empty</em></p>
</div>
</div>
<div class="sect4">
<h5 id="_table_compaction_major_everything_idle">table.compaction.major.everything.idle</h5>
<div class="paragraph">
<p>After a tablet has been idle (no mutations) for this time period it may have all of its files compacted into one. There is no guarantee an idle tablet will be compacted. Compactions of idle tablets are only started when regular compactions are not running. Idle compactions only take place for tablets that have one or more files.</p>
</div>
<div class="paragraph">
<p><em>Type:</em> TIMEDURATION<br>
<em>Zookeeper Mutable:</em> yes<br>
<em>Default Value:</em> <code>1h</code></p>
</div>
</div>
<div class="sect4">
<h5 id="_table_compaction_major_ratio">table.compaction.major.ratio</h5>
<div class="paragraph">
<p>Minimum ratio of total input size to maximum input file size for running a major compaction. When adjusting this property you may want to also adjust table.file.max, to avoid the situation where only merging minor compactions occur.</p>
</div>
<div class="paragraph">
<p><em>Type:</em> FRACTION<br>
<em>Zookeeper Mutable:</em> yes<br>
<em>Default Value:</em> <code>3</code></p>
</div>
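<div class="paragraph">
<p>The ratio test in the description can be sketched for a single candidate set of files. This is a simplified illustration: the default compaction strategy may consider different subsets of a tablet&#8217;s files, and the class MajorCompactionRatio is hypothetical.</p>
</div>

```java
/** Hypothetical sketch of the table.compaction.major.ratio test for one set of files. */
final class MajorCompactionRatio {
  private MajorCompactionRatio() {}

  /**
   * Returns true when the total size of the candidate files is at least
   * ratio times the size of the largest candidate file.
   */
  static boolean meetsRatio(long[] fileSizes, double ratio) {
    if (fileSizes.length == 0) {
      return false; // nothing to compact
    }
    long total = 0;
    long largest = 0;
    for (long size : fileSizes) {
      total += size;
      largest = Math.max(largest, size);
    }
    return total >= ratio * largest;
  }
}
```

<div class="paragraph">
<p>With the default ratio of 3, three equal-sized files qualify, while one 250-unit file plus two small files (300 in total, less than 3 &#215; 250) do not. A higher ratio therefore triggers major compactions less often and leaves more files per tablet, which is why table.file.max may also need adjusting.</p>
</div>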
</div>
<div class="sect4">
<h5 id="_table_compaction_minor_idle">table.compaction.minor.idle</h5>
<div class="paragraph">
<p>After a tablet has been idle (no mutations) for this time period it may have its in-memory map flushed to disk in a minor compaction. There is no guarantee an idle tablet will be compacted.</p>
</div>
<div class="paragraph">
<p><em>Type:</em> TIMEDURATION<br>
<em>Zookeeper Mutable:</em> yes<br>
<em>Default Value:</em> <code>5m</code></p>
</div>
</div>
<div class="sect4">
<h5 id="_table_compaction_minor_logs_threshold">table.compaction.minor.logs.threshold</h5>
<div class="paragraph">
<p>When there are more than this many write-ahead logs against a tablet, it will be minor compacted. See the comment for property tserver.memory.maps.max.</p>
</div>
<div class="paragraph">
<p><em>Type:</em> COUNT<br>
<em>Zookeeper Mutable:</em> yes<br>
<em>Default Value:</em> <code>3</code></p>
</div>
</div>
<div class="sect4">
<h5 id="_table_failures_ignore">table.failures.ignore</h5>
<div class="paragraph">
<p>If you want queries for your table to hang or fail when data is missing from the system, set this to false. When this is set to true, missing data will be reported, but queries will still run, possibly returning a subset of the data.</p>
</div>
<div class="paragraph">
<p><em>Type:</em> BOOLEAN<br>
<em>Zookeeper Mutable:</em> yes<br>
<em>Default Value:</em> <code>false</code></p>
</div>
</div>
<div class="sect4">
<h5 id="_table_file_blocksize">table.file.blocksize</h5>
<div class="paragraph">
<p>Overrides the Hadoop dfs.block.size setting so that files have better query performance. The maximum value for this is 2147483647.</p>
</div>
<div class="paragraph">
<p><em>Type:</em> MEMORY<br>
<em>Zookeeper Mutable:</em> yes<br>
<em>Default Value:</em> <code>0B</code></p>
</div>
</div>
<div class="sect4">
<h5 id="_table_file_compress_blocksize">table.file.compress.blocksize</h5>
<div class="paragraph">
<p>Similar to the Hadoop io.seqfile.compress.blocksize setting, so that files have better query performance. The maximum value for this is 2147483647. (This setting is the size threshold prior to compression, and applies even when compression is disabled.)</p>
</div>
<div class="paragraph">
<p><em>Type:</em> MEMORY<br>
<em>Zookeeper Mutable:</em> yes<br>
<em>Default Value:</em> <code>100K</code></p>
</div>
</div>
<div class="sect4">
<h5 id="_table_file_compress_blocksize_index">table.file.compress.blocksize.index</h5>
<div class="paragraph">
<p>Determines how large index blocks can be in files that support multilevel indexes. The maximum value for this is 2147483647. (This setting is the size threshold prior to compression, and applies even when compression is disabled.)</p>
</div>
<div class="paragraph">
<p><em>Type:</em> MEMORY<br>
<em>Zookeeper Mutable:</em> yes<br>
<em>Default Value:</em> <code>128K</code></p>
</div>
</div>
<div class="sect4">
<h5 id="_table_file_compress_type">table.file.compress.type</h5>
<div class="paragraph">
<p>The compression algorithm used for the table&#8217;s files: one of gz, lzo, or none.</p>
</div>
<div class="paragraph">
<p><em>Type:</em> STRING<br>
<em>Zookeeper Mutable:</em> yes<br>
<em>Default Value:</em> <code>gz</code></p>
</div>
</div>
<div class="sect4">
<h5 id="_table_file_max">table.file.max</h5>
<div class="paragraph">
<p>Determines the maximum number of files each tablet in a table can have. When adjusting this property you may want to consider adjusting table.compaction.major.ratio also. Setting this property to 0 will make it default to tserver.scan.files.open.max-1, which prevents a tablet from having more files than can be opened. Setting this property low may throttle ingest and increase query performance.</p>
</div>
<div class="paragraph">
<p><em>Type:</em> COUNT<br>
<em>Zookeeper Mutable:</em> yes<br>
<em>Default Value:</em> <code>15</code></p>
</div>
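<div class="paragraph">
<p>The rule for a value of 0 can be expressed as a small helper. The class and method names are hypothetical, and the sketch assumes the value of tserver.scan.files.open.max has already been read from the configuration.</p>
</div>

```java
/** Hypothetical helper computing the effective table.file.max limit. */
final class EffectiveFileMax {
  private EffectiveFileMax() {}

  /**
   * A configured value of 0 means one less than tserver.scan.files.open.max,
   * so a tablet never has more files than a scan can open.
   */
  static int effectiveFileMax(int tableFileMax, int scanFilesOpenMax) {
    if (tableFileMax == 0) {
      return scanFilesOpenMax - 1;
    }
    return tableFileMax;
  }
}
```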
</div>
<div class="sect4">
<h5 id="_table_file_replication">table.file.replication</h5>
<div class="paragraph">
<p>Determines how many replicas to keep of a table&#8217;s files in HDFS. When this value is less than or equal to 0, HDFS defaults are used.</p>
</div>
<div class="paragraph">
<p><em>Type:</em> COUNT<br>
<em>Zookeeper Mutable:</em> yes<br>
<em>Default Value:</em> <code>0</code></p>
</div>
</div>
<div class="sect4">
<h5 id="_table_file_type">table.file.type</h5>
<div class="paragraph">
<p>Change the type of file a table writes.</p>
</div>
<div class="paragraph">
<p><em>Type:</em> STRING<br>
<em>Zookeeper Mutable:</em> yes<br>
<em>Default Value:</em> <code>rf</code></p>
</div>
</div>
<div class="sect4">
<h5 id="_table_formatter">table.formatter</h5>
<div class="paragraph">
<p>The Formatter class to apply to results in the shell.</p>
</div>
<div class="paragraph">
<p><em>Type:</em> STRING<br>
<em>Zookeeper Mutable:</em> yes<br>
<em>Default Value:</em> <code>org.apache.accumulo.core.util.format.DefaultFormatter</code></p>
</div>
</div>
<div class="sect4">
<h5 id="_table_groups_enabled">table.groups.enabled</h5>
<div class="paragraph">
<p>A comma separated list of locality group names to enable for this table.</p>
</div>
<div class="paragraph">
<p><em>Type:</em> STRING<br>
<em>Zookeeper Mutable:</em> yes<br>
<em>Default Value:</em> <em>empty</em></p>
</div>
</div>
<div class="sect4">
<h5 id="_table_interepreter">table.interepreter</h5>
<div class="paragraph">
<p>The ScanInterpreter class to apply to scan arguments in the shell.</p>
</div>
<div class="paragraph">
<p><em>Type:</em> STRING<br>
<em>Zookeeper Mutable:</em> yes<br>
<em>Default Value:</em> <code>org.apache.accumulo.core.util.interpret.DefaultScanInterpreter</code></p>
</div>
</div>
<div class="sect4">
<h5 id="_table_majc_compaction_strategy">table.majc.compaction.strategy</h5>
<div class="paragraph">
<p>A customizable major compaction strategy.</p>
</div>
<div class="paragraph">
<p><em>Type:</em> CLASSNAME<br>
<em>Zookeeper Mutable:</em> yes<br>
<em>Default Value:</em> <code>org.apache.accumulo.tserver.compaction.DefaultCompactionStrategy</code></p>
</div>
</div>
<div class="sect4">
<h5 id="_table_scan_max_memory">table.scan.max.memory</h5>
<div class="paragraph">
<p>The maximum amount of memory that will be used to cache results of a client query/scan. Once this limit is reached, the buffered data is sent to the client.</p>
</div>
<div class="paragraph">
<p><em>Type:</em> MEMORY<br>
<em>Zookeeper Mutable:</em> yes<br>
<em>Default Value:</em> <code>512K</code></p>
</div>
</div>
<div class="sect4">
<h5 id="_table_security_scan_visibility_default">table.security.scan.visibility.default</h5>
<div class="paragraph">
<p>The security label that will be assumed at scan time if an entry does not have a visibility set.
Note: An empty security label is displayed as []. The scan results will show an empty visibility even if the visibility from this setting is applied to the entry.
CAUTION: If a particular key has an empty security label AND its table&#8217;s default visibility is also empty, access will ALWAYS be granted for users with permission to that table. Additionally, if this field is changed, all existing data with an empty visibility label will be interpreted with the new label on the next scan.</p>
</div>
<div class="paragraph">
<p><em>Type:</em> STRING<br>
<em>Zookeeper Mutable:</em> yes<br>
<em>Default Value:</em> <em>empty</em></p>
</div>
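<div class="paragraph">
<p>The substitution rule can be sketched by treating visibility labels as plain strings. This is a hypothetical illustration of the behavior described above, not Accumulo&#8217;s scan code; the class DefaultVisibility and its method are assumptions.</p>
</div>

```java
/** Hypothetical sketch of how table.security.scan.visibility.default is applied at scan time. */
final class DefaultVisibility {
  private DefaultVisibility() {}

  /** Returns the label used for the authorization check of one entry. */
  static String effectiveLabel(String entryLabel, String tableDefault) {
    // Only entries written without a visibility pick up the table default.
    if (entryLabel == null || entryLabel.isEmpty()) {
      return tableDefault == null ? "" : tableDefault;
    }
    return entryLabel;
  }
}
```

<div class="paragraph">
<p>An empty effective label means the entry is visible to anyone who can read the table, which is the case the caution above warns about. Because the default is applied at scan time rather than at write time, changing it affects existing entries on the next scan.</p>
</div>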
</div>
<div class="sect4">
<h5 id="_table_split_threshold">table.split.threshold</h5>
<div class="paragraph">
<p>When the combined size of a tablet&#8217;s files exceeds this amount, the tablet is split.</p>
</div>
<div class="paragraph">
<p><em>Type:</em> MEMORY<br>
<em>Zookeeper Mutable:</em> yes<br>
<em>Default Value:</em> <code>1G</code></p>
</div>
</div>
<div class="sect4">
<h5 id="_table_walog_enabled">table.walog.enabled</h5>
<div class="paragraph">
<p>Use the write-ahead log to prevent the loss of data.</p>
</div>
<div class="paragraph">
<p><em>Type:</em> BOOLEAN<br>
<em>Zookeeper Mutable:</em> yes<br>
<em>Default Value:</em> <code>true</code></p>
</div>
</div>
</div>
<div class="sect3">
<h4 id="TABLE_CONSTRAINT_PREFIX">16.3.12. table.constraint.*</h4>
<div class="paragraph">
<p>Properties in this category are per-table properties that add constraints to a table. These properties start with the category prefix, followed by a number, and their values correspond to a fully qualified Java class that implements the Constraint interface.
For example, <code>table.constraint.1 = org.apache.accumulo.core.constraints.MyCustomConstraint</code>
and <code>table.constraint.2 = my.package.constraints.MySecondConstraint</code>.</p>
</div>
</div>
<div class="sect3">
<h4 id="TABLE_ITERATOR_PREFIX">16.3.13. table.iterator.*</h4>
<div class="paragraph">
<p>Properties in this category specify iterators that are applied at various stages (scopes) of interaction with a table. These properties start with the category prefix, followed by a scope (minc, majc, scan, etc.), followed by a period, followed by a name, as in table.iterator.scan.vers or table.iterator.scan.custom. The value of each property is a number indicating the order in which the iterator is applied, followed by a comma and a class name, such as
<code>table.iterator.scan.vers = 10,org.apache.accumulo.core.iterators.VersioningIterator</code>.
These iterators can take options if additional properties are set that look like this property but are suffixed with a period, followed by <em>opt</em>, followed by another period and an option name.
For example, <code>table.iterator.minc.vers.opt.maxVersions = 3</code>.</p>
</div>
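<div class="paragraph">
<p>To illustrate the naming scheme, the following sketch reads one iterator setting and its options from a map of table properties. It is a hypothetical helper written only for this example (Accumulo provides its own APIs for attaching iterators), and it assumes property values follow the priority,class format shown above.</p>
</div>

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical parser for table.iterator.<scope>.<name> properties and their options. */
final class IteratorConfigSketch {
  final int priority;
  final String className;
  final Map<String, String> options = new TreeMap<>();

  private IteratorConfigSketch(int priority, String className) {
    this.priority = priority;
    this.className = className;
  }

  /** Reads the iterator called name in scope (e.g. "scan", "minc", "majc"), or null if absent. */
  static IteratorConfigSketch parse(Map<String, String> props, String scope, String name) {
    String base = "table.iterator." + scope + "." + name;
    String value = props.get(base);
    if (value == null) {
      return null; // iterator not configured for this scope
    }
    // The value is "<priority>,<fully qualified class name>".
    int comma = value.indexOf(',');
    if (comma < 0) {
      throw new IllegalArgumentException("expected <priority>,<class>: " + value);
    }
    IteratorConfigSketch it = new IteratorConfigSketch(
        Integer.parseInt(value.substring(0, comma).trim()), value.substring(comma + 1).trim());
    // Options look like <base>.opt.<optionName> = <optionValue>.
    String optPrefix = base + ".opt.";
    for (Map.Entry<String, String> e : props.entrySet()) {
      if (e.getKey().startsWith(optPrefix)) {
        it.options.put(e.getKey().substring(optPrefix.length()), e.getValue());
      }
    }
    return it;
  }
}
```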
</div>
<div class="sect3">
<h4 id="TABLE_LOCALITY_GROUP_PREFIX">16.3.14. table.group.*</h4>
<div class="paragraph">
<p>Properties in this category are per-table properties that define locality groups in a table. These properties start with the category prefix, followed by a group name.
For example, <code>table.group.group1=x,y,z</code> sets the column families for a group called group1. Once configured, group1 can be enabled by adding it to the list of groups in the table.groups.enabled property.
Additional group options may be specified for a named group by setting <code>table.group.&lt;name&gt;.opt.&lt;key&gt;=&lt;value&gt;</code>.</p>
</div>
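<div class="paragraph">
<p>The sketch below shows how the group definitions and the enabled list fit together: only groups named in table.groups.enabled are returned, each mapped to its column families. The class LocalityGroupSketch is hypothetical, it assumes column family names contain no commas, and it simply skips an enabled group that has no definition (real behavior in that case may differ).</p>
</div>

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper that resolves enabled locality groups from table properties. */
final class LocalityGroupSketch {
  private LocalityGroupSketch() {}

  /** Maps each enabled group name to the column families listed in table.group.<name>. */
  static Map<String, List<String>> enabledGroups(Map<String, String> props) {
    Map<String, List<String>> groups = new LinkedHashMap<>();
    String enabled = props.getOrDefault("table.groups.enabled", "");
    for (String rawName : enabled.split(",")) {
      String name = rawName.trim();
      if (name.isEmpty()) {
        continue; // tolerate empty entries such as a trailing comma
      }
      String families = props.get("table.group." + name);
      if (families == null) {
        continue; // enabled but not defined: this sketch skips it
      }
      List<String> columnFamilies = new ArrayList<>();
      for (String cf : families.split(",")) {
        if (!cf.trim().isEmpty()) {
          columnFamilies.add(cf.trim());
        }
      }
      groups.put(name, columnFamilies);
    }
    return groups;
  }
}
```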
</div>
<div class="sect3">
<h4 id="TABLE_COMPACTION_STRATEGY_PREFIX">16.3.15. table.majc.compaction.strategy.opts.*</h4>
<div class="paragraph">
<p>Properties in this category are used to configure the compaction strategy.</p>
</div>
</div>
<div class="sect3">
<h4 id="VFS_CONTEXT_CLASSPATH_PROPERTY">16.3.16. general.vfs.context.classpath.*</h4>
<div class="paragraph">
<p>Properties in this category define a classpath. These properties start with the category prefix, followed by a context name. The value is a comma-separated list of URIs. Full regular expressions are supported on the filename alone. For example, general.vfs.context.classpath.cx1=hdfs://nn1:9902/mylibdir/*.jar. You can enable post delegation for a context, which will load classes from the context first instead of the parent first. Do this by setting general.vfs.context.classpath.&lt;name&gt;.delegation=post, where &lt;name&gt; is your context name. If delegation is not specified, it defaults to loading from the parent classloader first.</p>
</div>
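<div class="paragraph">
<p>A minimal sketch of reading a context definition: it splits the comma-separated URI list and reports post delegation only when it is explicitly requested, matching the parent-first default described above. The class VfsContextSketch is hypothetical; it does not load classes or evaluate the filename patterns, and it assumes the delegation value is compared case-sensitively.</p>
</div>

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical reader for general.vfs.context.classpath.<name> settings. */
final class VfsContextSketch {
  static final String PREFIX = "general.vfs.context.classpath.";

  private VfsContextSketch() {}

  /** Splits the comma-separated URI list configured for a context (empty if none). */
  static List<String> uris(Map<String, String> props, String context) {
    List<String> result = new ArrayList<>();
    String value = props.get(PREFIX + context);
    if (value == null) {
      return result;
    }
    for (String uri : value.split(",")) {
      if (!uri.trim().isEmpty()) {
        result.add(uri.trim());
      }
    }
    return result;
  }

  /** True only when delegation is explicitly "post"; parent-first is the default. */
  static boolean isPostDelegation(Map<String, String> props, String context) {
    return "post".equals(props.get(PREFIX + context + ".delegation"));
  }
}
```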
</div>
</div>
<div class="sect2">
<h3 id="_property_types">16.4. Property Types</h3>
<div class="sect3">
<h4 id="_duration">16.4.1. duration</h4>
<div class="paragraph">
<p>A non-negative integer optionally followed by a unit of time (whitespace disallowed), as in 30s.
If no unit of time is specified, seconds are assumed. Valid units are <em>ms</em>, <em>s</em>, <em>m</em>, <em>h</em>, and <em>d</em> for milliseconds, seconds, minutes, hours, and days.
Examples of valid durations are <em>600</em>, <em>30s</em>, <em>45m</em>, <em>30000ms</em>, <em>3d</em>, and <em>1h</em>.
Examples of invalid durations are <em>1w</em>, <em>1h30m</em>, <em>1s 200ms</em>, <em>ms</em>, the empty string, and <em>a</em>.
Unless otherwise stated, the maximum value for the duration represented in milliseconds is 9223372036854775807.</p>
</div>
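<div class="paragraph">
<p>The duration rules translate into a short parser. This is a sketch for illustration, not the code Accumulo uses, and the class DurationSketch is hypothetical. Because the examples list 3d as valid, the sketch also accepts d (days).</p>
</div>

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical parser for the duration format; not Accumulo's own implementation. */
final class DurationSketch {
  // Digits, then an optional unit; no whitespace allowed anywhere.
  private static final Pattern FORMAT = Pattern.compile("(\\d+)(ms|s|m|h|d)?");

  private DurationSketch() {}

  /** Converts a duration string to milliseconds; seconds are assumed when no unit is given. */
  static long toMillis(String text) {
    Matcher m = FORMAT.matcher(text);
    if (!m.matches()) {
      throw new IllegalArgumentException("invalid duration: '" + text + "'");
    }
    long amount = Long.parseLong(m.group(1));
    String unit = m.group(2) == null ? "s" : m.group(2);
    switch (unit) {
      case "ms": return amount;
      case "s": return Math.multiplyExact(amount, 1_000L);
      case "m": return Math.multiplyExact(amount, 60_000L);
      case "h": return Math.multiplyExact(amount, 3_600_000L);
      default: return Math.multiplyExact(amount, 86_400_000L); // "d"
    }
  }
}
```

<div class="paragraph">
<p>For example, the default gc.cycle.delay of 5m becomes 300000 milliseconds under this sketch, while 1h30m is rejected because only a single unit is allowed.</p>
</div>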
</div>
<div class="sect3">
<h4 id="_date_time">16.4.2. date/time</h4>
<div class="paragraph">
<p>A date/time string in the format YYYYMMDDhhmmssTTT, where TTT is the 3-character time zone.</p>
</div>
</div>
<div class="sect3">
<h4 id="_memory">16.4.3. memory</h4>
<div class="paragraph">
<p>A positive integer optionally followed by a unit of memory (whitespace disallowed), as in 2G.
If no unit is specified, bytes are assumed. Valid units are <em>B</em>, <em>K</em>, <em>M</em>, and <em>G</em> for bytes, kilobytes, megabytes, and gigabytes.
Examples of valid memory values are <em>1024</em>, <em>20B</em>, <em>100K</em>, <em>1500M</em>, and <em>2G</em>.
Examples of invalid memory values are <em>1M500K</em>, <em>1M 2K</em>, <em>1MB</em>, <em>1.5G</em>, <em>1,024K</em>, the empty string, and <em>a</em>.
Unless otherwise stated, the maximum value for the memory represented in bytes is 9223372036854775807.</p>
</div>
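<div class="paragraph">
<p>A comparable sketch for memory values follows. It assumes binary multiples (1K = 1024 bytes), and it accepts 0 because some defaults in this section, such as 0B, use it. The class MemorySketch is hypothetical and not Accumulo&#8217;s own parser.</p>
</div>

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical parser for the memory format; not Accumulo's own implementation. */
final class MemorySketch {
  // Digits, then an optional single-letter unit; no whitespace, decimals, or separators.
  private static final Pattern FORMAT = Pattern.compile("(\\d+)([BKMG])?");

  private MemorySketch() {}

  /** Converts a memory string to bytes, assuming binary multiples (1K = 1024 bytes). */
  static long toBytes(String text) {
    Matcher m = FORMAT.matcher(text);
    if (!m.matches()) {
      throw new IllegalArgumentException("invalid memory value: '" + text + "'");
    }
    long amount = Long.parseLong(m.group(1));
    String unit = m.group(2) == null ? "B" : m.group(2);
    switch (unit) {
      case "K": return Math.multiplyExact(amount, 1L << 10);
      case "M": return Math.multiplyExact(amount, 1L << 20);
      case "G": return Math.multiplyExact(amount, 1L << 30);
      default: return amount; // "B": already in bytes
    }
  }
}
```

<div class="paragraph">
<p>Under the binary assumption, the default table.split.threshold of 1G corresponds to 1073741824 bytes.</p>
</div>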
</div>
<div class="sect3">
<h4 id="_host_list">16.4.4. host list</h4>
<div class="paragraph">
<p>A comma-separated list of hostnames or IP addresses, with optional port numbers.
Examples of valid host lists are <em>localhost:2000,www.example.com,10.10.1.1:500</em> and <em>localhost</em>.
Examples of invalid host lists are the empty string, <em>:1000</em>, and <em>localhost:80000</em>.</p>
</div>
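<div class="paragraph">
<p>The host list examples can be checked with a structural validator like the one below. It is a hypothetical sketch: it only checks that each entry has a non-empty host and, if present, a port from 1 to 65535 (an assumed range); it does not validate hostname characters and does not handle IPv6 literals.</p>
</div>

```java
/** Hypothetical structural check for the host list format (IPv6 literals are not handled). */
final class HostListSketch {
  private HostListSketch() {}

  static boolean isValid(String text) {
    if (text.isEmpty()) {
      return false;
    }
    // A limit of -1 keeps trailing empty entries, so "a," is rejected.
    for (String entry : text.split(",", -1)) {
      String host = entry;
      int colon = entry.lastIndexOf(':');
      if (colon >= 0) {
        host = entry.substring(0, colon);
        String port = entry.substring(colon + 1);
        // The port must be 1 to 5 digits and fall within 1..65535.
        if (!port.matches("\\d{1,5}")) {
          return false;
        }
        int portNumber = Integer.parseInt(port);
        if (portNumber < 1 || portNumber > 65535) {
          return false;
        }
      }
      if (host.isEmpty()) {
        return false;
      }
    }
    return true;
  }
}
```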
</div>
<div class="sect3">
<h4 id="_port">16.4.5. port</h4>
<div class="paragraph">
<p>A positive integer in the range 1024-65535, not already in use or specified elsewhere in the configuration.</p>
</div>
</div>
<div class="sect3">
<h4 id="_count">16.4.6. count</h4>
<div class="paragraph">
<p>A non-negative integer in the range of 0-2147483647.</p>
</div>
</div>
<div class="sect3">
<h4 id="_fraction_percentage">16.4.7. fraction/percentage</h4>
<div class="paragraph">
<p>A floating point number that represents either a fraction or, if suffixed with the <em>%</em> character, a percentage.
Examples of valid fractions/percentages are <em>10</em>, <em>1000%</em>, <em>0.05</em>, <em>5%</em>, <em>0.2%</em>, <em>0.0005</em>.
Examples of invalid fractions/percentages are the empty string, <em>10 percent</em>, and <em>Hulk Hogan</em>.</p>
</div>
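<div class="paragraph">
<p>The fraction/percentage rules can be sketched as follows: a trailing % divides the number by 100, otherwise the number is used as is. The class FractionSketch is hypothetical, and it deliberately accepts only plain decimals (no signs, exponents, or spaces), which may be stricter than the real parser.</p>
</div>

```java
/** Hypothetical parser for the fraction/percentage format. */
final class FractionSketch {
  private FractionSketch() {}

  /** Returns the value as a fraction, converting a trailing % into a division by 100. */
  static double parse(String text) {
    boolean percent = text.endsWith("%");
    String number = percent ? text.substring(0, text.length() - 1) : text;
    // Accept plain decimal numbers only, such as 10, 0.05, or .5.
    if (!number.matches("\\d+(\\.\\d+)?|\\.\\d+")) {
      throw new IllegalArgumentException("invalid fraction/percentage: '" + text + "'");
    }
    double value = Double.parseDouble(number);
    return percent ? value / 100.0 : value;
  }
}
```

<div class="paragraph">
<p>Under this sketch, the default table.bloom.error.rate of 0.5% corresponds to the fraction 0.005, and 1000% and 10 denote the same value.</p>
</div>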
</div>
<div class="sect3">
<h4 id="_path">16.4.8. path</h4>
<div class="paragraph">
<p>A string that represents a filesystem path, which can be either relative (to some directory) or absolute. The filesystem depends on the property. The following environment variables will be substituted: [ACCUMULO_HOME, ACCUMULO_CONF_DIR].</p>
</div>
</div>
<div class="sect3">
<h4 id="_absolute_path">16.4.9. absolute path</h4>
<div class="paragraph">
<p>An absolute filesystem path. The filesystem depends on the property. This is the same as path, but enforces that its root is explicitly specified.</p>
</div>
</div>
<div class="sect3">
<h4 id="_java_class">16.4.10. java class</h4>
<div class="paragraph">
<p>A fully qualified Java class name representing a class on the classpath.
An example is <em>java.lang.String</em>, rather than <em>String</em>.</p>
</div>
</div>
<div class="sect3">
<h4 id="_string">16.4.11. string</h4>
<div class="paragraph">
<p>An arbitrary string of characters whose format is unspecified and interpreted based on the context of the property to which it applies.</p>
</div>
</div>
<div class="sect3">
<h4 id="_boolean">16.4.12. boolean</h4>
<div class="paragraph">
<p>Has a value of either <em>true</em> or <em>false</em>.</p>
</div>
</div>
<div class="sect3">
<h4 id="_uri">16.4.13. uri</h4>
<div class="paragraph">
<p>A valid URI.</p>
</div>
</div>
</div>
</div>
</div>
</div>
<div id="footer">
<div id="footer-text">
Last updated 2016-02-09 15:30:00 EST
</div>
</div>
</body>
</html>