Title: Apache Accumulo RegexGroupBalancer Example
Notice: Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at
.
http://www.apache.org/licenses/LICENSE-2.0
.
Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
For some data access patterns, it's important to spread groups of tablets within
a table evenly across tablet servers. Accumulo has a balancer that does this,
using a regular expression to assign tablets to groups. This example shows how
that balancer spreads 4 groups of tablets within a table evenly across 17 tablet
servers.
The commands below create a table and add splits. For this example, we would
like tablets whose split points start with the same two digits to be on
different tservers. This gives us four groups of tablets: 01, 02, 03, and 04.
root@accumulo> createtable testRGB
root@accumulo testRGB> addsplits -t testRGB 01b 01m 01r 01z 02b 02m 02r 02z 03b 03m 03r 03z 04a 04b 04c 04d 04e 04f 04g 04h 04i 04j 04k 04l 04m 04n 04o 04p
root@accumulo testRGB> tables -l
accumulo.metadata => !0
accumulo.replication => +rep
accumulo.root => +r
testRGB => 2
trace => 1
After adding the splits we look at the locations in the metadata table.
root@accumulo testRGB> scan -t accumulo.metadata -b 2; -e 2< -c loc
2;01b loc:34a5f6e086b000c [] ip-10-1-2-25:9997
2;01m loc:34a5f6e086b000c [] ip-10-1-2-25:9997
2;01r loc:14a5f6e079d0011 [] ip-10-1-2-15:9997
2;01z loc:14a5f6e079d000f [] ip-10-1-2-13:9997
2;02b loc:34a5f6e086b000b [] ip-10-1-2-26:9997
2;02m loc:14a5f6e079d000c [] ip-10-1-2-28:9997
2;02r loc:14a5f6e079d0012 [] ip-10-1-2-27:9997
2;02z loc:14a5f6e079d0012 [] ip-10-1-2-27:9997
2;03b loc:14a5f6e079d000d [] ip-10-1-2-21:9997
2;03m loc:14a5f6e079d000e [] ip-10-1-2-20:9997
2;03r loc:14a5f6e079d000d [] ip-10-1-2-21:9997
2;03z loc:14a5f6e079d000e [] ip-10-1-2-20:9997
2;04a loc:34a5f6e086b000b [] ip-10-1-2-26:9997
2;04b loc:14a5f6e079d0010 [] ip-10-1-2-17:9997
2;04c loc:14a5f6e079d0010 [] ip-10-1-2-17:9997
2;04d loc:24a5f6e07d3000c [] ip-10-1-2-16:9997
2;04e loc:24a5f6e07d3000d [] ip-10-1-2-29:9997
2;04f loc:24a5f6e07d3000c [] ip-10-1-2-16:9997
2;04g loc:24a5f6e07d3000a [] ip-10-1-2-14:9997
2;04h loc:14a5f6e079d000c [] ip-10-1-2-28:9997
2;04i loc:34a5f6e086b000d [] ip-10-1-2-19:9997
2;04j loc:34a5f6e086b000d [] ip-10-1-2-19:9997
2;04k loc:24a5f6e07d30009 [] ip-10-1-2-23:9997
2;04l loc:24a5f6e07d3000b [] ip-10-1-2-22:9997
2;04m loc:24a5f6e07d30009 [] ip-10-1-2-23:9997
2;04n loc:24a5f6e07d3000b [] ip-10-1-2-22:9997
2;04o loc:34a5f6e086b000a [] ip-10-1-2-18:9997
2;04p loc:24a5f6e07d30008 [] ip-10-1-2-24:9997
2< loc:24a5f6e07d30008 [] ip-10-1-2-24:9997
Below, the scan output above has been rearranged to show which tablet groups are
on each tserver. The four tablets in group 03 are on only two tservers; ideally
they would be spread across four. Note that the default tablet (2<) is counted
as group 04 below.
ip-10-1-2-13:9997 01
ip-10-1-2-14:9997 04
ip-10-1-2-15:9997 01
ip-10-1-2-16:9997 04 04
ip-10-1-2-17:9997 04 04
ip-10-1-2-18:9997 04
ip-10-1-2-19:9997 04 04
ip-10-1-2-20:9997 03 03
ip-10-1-2-21:9997 03 03
ip-10-1-2-22:9997 04 04
ip-10-1-2-23:9997 04 04
ip-10-1-2-24:9997 04 04
ip-10-1-2-25:9997 01 01
ip-10-1-2-26:9997 02 04
ip-10-1-2-27:9997 02 02
ip-10-1-2-28:9997 02 04
ip-10-1-2-29:9997 04
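The rearrangement above can be reproduced with a short script. The following is a minimal Python sketch, not part of Accumulo: the function name `groups_by_tserver` is hypothetical, and it assumes each scan line has the whitespace-separated form shown above (extent, `loc:` column, visibility, tserver), that end rows contain no spaces, and that the group is simply the first two characters of the end row.

```python
from collections import defaultdict

DEFAULT_GROUP = "04"  # the default tablet (2<) is counted as group 04 in this example


def groups_by_tserver(scan_lines, table_id="2"):
    """Map each tserver to the sorted list of groups of the tablets it hosts.

    Hypothetical helper: expects lines like
    '2;03b loc:14a5f6e079d000d [] ip-10-1-2-21:9997'.
    """
    result = defaultdict(list)
    for line in scan_lines:
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or unexpected lines
        extent, tserver = fields[0], fields[-1]
        if extent == table_id + "<":
            group = DEFAULT_GROUP  # default tablet has no end row
        elif extent.startswith(table_id + ";"):
            end_row = extent[len(table_id) + 1:]
            group = end_row[:2]  # first two characters of the end row
        else:
            continue  # row belongs to another table
        result[tserver].append(group)
    return {ts: sorted(gs) for ts, gs in sorted(result.items())}


# A few lines copied from the scan output above.
sample = [
    "2;03b loc:14a5f6e079d000d [] ip-10-1-2-21:9997",
    "2;03m loc:14a5f6e079d000e [] ip-10-1-2-20:9997",
    "2;03r loc:14a5f6e079d000d [] ip-10-1-2-21:9997",
    "2;03z loc:14a5f6e079d000e [] ip-10-1-2-20:9997",
    "2;04p loc:24a5f6e07d30008 [] ip-10-1-2-24:9997",
    "2< loc:24a5f6e07d30008 [] ip-10-1-2-24:9997",
]

for tserver, groups in groups_by_tserver(sample).items():
    print(tserver, " ".join(groups))
```

Run on these six sample lines, the script lists group 03 twice on each of ip-10-1-2-20 and ip-10-1-2-21, which is the clumping described above.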
To remedy this situation, the RegexGroupBalancer is configured with the
commands below. The configured regular expression selects the first two digits
of a tablet's end row as the group id. Tablets whose end row doesn't match, as
well as the default tablet, are placed in group 04.
root@accumulo testRGB> config -t testRGB -s table.custom.balancer.group.regex.pattern=(\\d\\d).*
root@accumulo testRGB> config -t testRGB -s table.custom.balancer.group.regex.default=04
root@accumulo testRGB> config -t testRGB -s table.balancer=org.apache.accumulo.server.master.balancer.RegexGroupBalancer
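To make the grouping rule concrete, here is a small Python sketch of how such a pattern could map end rows to group ids. It is an illustration only, not the balancer's implementation: `group_for` is a hypothetical name, and the sketch assumes that the doubled backslashes in the shell command produce the pattern `(\d\d).*`, that the pattern must match the whole end row, and that the first capture group is the group id.

```python
import re

# Assumed effective value of table.custom.balancer.group.regex.pattern
PATTERN = re.compile(r"(\d\d).*")
# Value of table.custom.balancer.group.regex.default
DEFAULT_GROUP = "04"


def group_for(end_row):
    """Return the group id for a tablet end row (None means the default tablet)."""
    if end_row is None:
        return DEFAULT_GROUP  # default tablet has no end row
    match = PATTERN.fullmatch(end_row)  # assumption: whole-string match
    return match.group(1) if match else DEFAULT_GROUP


for end_row in ["01b", "02z", "03m", "04p", "abc", None]:
    print(end_row, "->", group_for(end_row))
```

Under these assumptions, `01b` maps to group 01, while a non-matching end row such as `abc` and the default tablet both fall into group 04.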
After waiting a little while for the balancer to move tablets, we look at the
tablet locations again.
root@accumulo testRGB> scan -t accumulo.metadata -b 2; -e 2< -c loc
2;01b loc:34a5f6e086b000a [] ip-10-1-2-18:9997
2;01m loc:34a5f6e086b000c [] ip-10-1-2-25:9997
2;01r loc:14a5f6e079d0011 [] ip-10-1-2-15:9997
2;01z loc:14a5f6e079d000f [] ip-10-1-2-13:9997
2;02b loc:34a5f6e086b000b [] ip-10-1-2-26:9997
2;02m loc:14a5f6e079d000c [] ip-10-1-2-28:9997
2;02r loc:34a5f6e086b000d [] ip-10-1-2-19:9997
2;02z loc:14a5f6e079d0012 [] ip-10-1-2-27:9997
2;03b loc:24a5f6e07d3000d [] ip-10-1-2-29:9997
2;03m loc:24a5f6e07d30009 [] ip-10-1-2-23:9997
2;03r loc:14a5f6e079d000d [] ip-10-1-2-21:9997
2;03z loc:14a5f6e079d000e [] ip-10-1-2-20:9997
2;04a loc:34a5f6e086b000b [] ip-10-1-2-26:9997
2;04b loc:34a5f6e086b000c [] ip-10-1-2-25:9997
2;04c loc:14a5f6e079d0010 [] ip-10-1-2-17:9997
2;04d loc:14a5f6e079d000e [] ip-10-1-2-20:9997
2;04e loc:24a5f6e07d3000d [] ip-10-1-2-29:9997
2;04f loc:24a5f6e07d3000c [] ip-10-1-2-16:9997
2;04g loc:24a5f6e07d3000a [] ip-10-1-2-14:9997
2;04h loc:14a5f6e079d000c [] ip-10-1-2-28:9997
2;04i loc:14a5f6e079d0011 [] ip-10-1-2-15:9997
2;04j loc:34a5f6e086b000d [] ip-10-1-2-19:9997
2;04k loc:14a5f6e079d0012 [] ip-10-1-2-27:9997
2;04l loc:14a5f6e079d000f [] ip-10-1-2-13:9997
2;04m loc:24a5f6e07d30009 [] ip-10-1-2-23:9997
2;04n loc:24a5f6e07d3000b [] ip-10-1-2-22:9997
2;04o loc:34a5f6e086b000a [] ip-10-1-2-18:9997
2;04p loc:14a5f6e079d000d [] ip-10-1-2-21:9997
2< loc:24a5f6e07d30008 [] ip-10-1-2-24:9997
Once again, the data above is transformed to make it easier to see which groups
are on which tservers. The transformed data below shows that all groups are now
evenly spread: no tserver hosts more than one tablet from any group.
ip-10-1-2-13:9997 01 04
ip-10-1-2-14:9997 04
ip-10-1-2-15:9997 01 04
ip-10-1-2-16:9997 04
ip-10-1-2-17:9997 04
ip-10-1-2-18:9997 01 04
ip-10-1-2-19:9997 02 04
ip-10-1-2-20:9997 03 04
ip-10-1-2-21:9997 03 04
ip-10-1-2-22:9997 04
ip-10-1-2-23:9997 03 04
ip-10-1-2-24:9997 04
ip-10-1-2-25:9997 01 04
ip-10-1-2-26:9997 02 04
ip-10-1-2-27:9997 02 04
ip-10-1-2-28:9997 02 04
ip-10-1-2-29:9997 03 04
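A listing like the one above can also be checked mechanically. The Python sketch below uses one possible definition of "evenly spread", which is an assumption of this sketch rather than something stated by the balancer: for every group, the number of tablets per tserver differs by at most one across all tservers. The function name `is_evenly_spread` is hypothetical.

```python
from collections import Counter


def is_evenly_spread(groups_by_tserver):
    """True if, for every group, per-tserver tablet counts differ by at most one.

    groups_by_tserver maps a tserver name to the list of groups it hosts.
    The evenness criterion is an assumption made for this sketch.
    """
    counters = [Counter(groups) for groups in groups_by_tserver.values()]
    all_groups = {g for groups in groups_by_tserver.values() for g in groups}
    for group in all_groups:
        counts = [c[group] for c in counters]  # a missing group counts as 0
        if max(counts) - min(counts) > 1:
            return False
    return True


# Four tservers taken from the listings before and after rebalancing.
before = {
    "ip-10-1-2-20:9997": ["03", "03"],
    "ip-10-1-2-21:9997": ["03", "03"],
    "ip-10-1-2-23:9997": ["04", "04"],
    "ip-10-1-2-29:9997": ["04"],
}
after = {
    "ip-10-1-2-20:9997": ["03", "04"],
    "ip-10-1-2-21:9997": ["03", "04"],
    "ip-10-1-2-23:9997": ["03", "04"],
    "ip-10-1-2-29:9997": ["03", "04"],
}

print("before:", is_evenly_spread(before))
print("after:", is_evenly_spread(after))
```

With this criterion, the subset taken before rebalancing fails because group 03 sits twice on two tservers and not at all on the others, while the subset taken afterwards passes.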
If you need this functionality but a regular expression does not meet your
needs, extend GroupBalancer. This allows you to specify a partitioning
function in Java. Use the RegexGroupBalancer source as an example.