<!--
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
# Apache Accumulo Balancer Example
For some data access patterns, it's important to spread groups of tablets within
a table evenly across tablet servers. Accumulo has a balancer that can do this
using a regular expression to group tablets. This example shows how this
balancer spreads 4 groups of tablets within a table evenly across 17 tablet
servers.
The commands below create a table and add splits. For this example, we would
like all of the tablets whose split points (end rows) start with the same two
digits to be on different tservers. This gives us four groups of tablets: 01,
02, 03, and 04.
```
root@accumulo> createnamespace examples
root@accumulo> createtable examples.testRGB
root@accumulo examples.testRGB> addsplits -t examples.testRGB 01b 01m 01r 01z 02b 02m 02r 02z 03b 03m 03r 03z 04a 04b 04c 04d 04e 04f 04g 04h 04i 04j 04k 04l 04m 04n 04o 04p
```
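Each split point becomes the end row of a tablet, and a tablet holds the rows after the previous tablet's end row up to and including its own. The 28 splits above therefore produce 29 tablets, the last one being the default tablet, which has no end row. The minimal Java sketch below models only that bookkeeping; the `TabletExtents` class and its `Extent` record are hypothetical and are not part of the Accumulo API.

```java
import java.util.ArrayList;
import java.util.List;

public class TabletExtents {
  // A tablet holds rows after prevEndRow up to and including endRow; null means unbounded.
  public record Extent(String prevEndRow, String endRow) {
    @Override
    public String toString() {
      String start = prevEndRow == null ? "(-inf" : "(" + prevEndRow;
      String end = endRow == null ? "+inf)" : endRow + "]";
      return start + ", " + end;
    }
  }

  // n sorted split points produce n + 1 tablets; the last is the default tablet (no end row).
  public static List<Extent> fromSplits(List<String> sortedSplits) {
    List<Extent> extents = new ArrayList<>();
    String prev = null;
    for (String split : sortedSplits) {
      extents.add(new Extent(prev, split));
      prev = split;
    }
    extents.add(new Extent(prev, null));
    return extents;
  }

  public static void main(String[] args) {
    List<String> splits = List.of("01b", "01m", "01r", "01z", "02b", "02m", "02r", "02z",
        "03b", "03m", "03r", "03z", "04a", "04b", "04c", "04d", "04e", "04f", "04g", "04h",
        "04i", "04j", "04k", "04l", "04m", "04n", "04o", "04p");
    List<Extent> extents = fromSplits(splits);
    System.out.println(extents.size() + " tablets");    // 29 tablets
    System.out.println(extents.get(0));                  // (-inf, 01b]
    System.out.println(extents.get(extents.size() - 1)); // (04p, +inf)
  }
}
```

Counting by group, that is 4 tablets each in groups 01, 02, and 03, and 17 in group 04 once the default tablet is included, which matches the 17 tablet servers in this example.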
Run the tables command with the "-l" option to find the table ID.
```
root@accumulo examples.testRGB> tables -l
accumulo.metadata => !0
accumulo.replication => +rep
accumulo.root => +r
testRGB => 2
trace => 1
```
Use the table ID to form the begin and end rows of a scan over the metadata
table, and look at the tablet locations (the "loc" column).
```
root@accumulo examples.testRGB> scan -t accumulo.metadata -b 2; -e 2< -c loc
2;01b loc:34a5f6e086b000c [] ip-10-1-2-25:9997
2;01m loc:34a5f6e086b000c [] ip-10-1-2-25:9997
2;01r loc:14a5f6e079d0011 [] ip-10-1-2-15:9997
2;01z loc:14a5f6e079d000f [] ip-10-1-2-13:9997
2;02b loc:34a5f6e086b000b [] ip-10-1-2-26:9997
2;02m loc:14a5f6e079d000c [] ip-10-1-2-28:9997
2;02r loc:14a5f6e079d0012 [] ip-10-1-2-27:9997
2;02z loc:14a5f6e079d0012 [] ip-10-1-2-27:9997
2;03b loc:14a5f6e079d000d [] ip-10-1-2-21:9997
2;03m loc:14a5f6e079d000e [] ip-10-1-2-20:9997
2;03r loc:14a5f6e079d000d [] ip-10-1-2-21:9997
2;03z loc:14a5f6e079d000e [] ip-10-1-2-20:9997
2;04a loc:34a5f6e086b000b [] ip-10-1-2-26:9997
2;04b loc:14a5f6e079d0010 [] ip-10-1-2-17:9997
2;04c loc:14a5f6e079d0010 [] ip-10-1-2-17:9997
2;04d loc:24a5f6e07d3000c [] ip-10-1-2-16:9997
2;04e loc:24a5f6e07d3000d [] ip-10-1-2-29:9997
2;04f loc:24a5f6e07d3000c [] ip-10-1-2-16:9997
2;04g loc:24a5f6e07d3000a [] ip-10-1-2-14:9997
2;04h loc:14a5f6e079d000c [] ip-10-1-2-28:9997
2;04i loc:34a5f6e086b000d [] ip-10-1-2-19:9997
2;04j loc:34a5f6e086b000d [] ip-10-1-2-19:9997
2;04k loc:24a5f6e07d30009 [] ip-10-1-2-23:9997
2;04l loc:24a5f6e07d3000b [] ip-10-1-2-22:9997
2;04m loc:24a5f6e07d30009 [] ip-10-1-2-23:9997
2;04n loc:24a5f6e07d3000b [] ip-10-1-2-22:9997
2;04o loc:34a5f6e086b000a [] ip-10-1-2-18:9997
2;04p loc:24a5f6e07d30008 [] ip-10-1-2-24:9997
2< loc:24a5f6e07d30008 [] ip-10-1-2-24:9997
```
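The begin and end rows in the scan work because each tablet's metadata row is the table ID, a `;`, and the tablet's end row, while the default tablet's row is the table ID followed by `<`. Since `;` (0x3B) sorts just before `<` (0x3C), the inclusive range from `2;` to `2<` covers every tablet of table 2 and excludes rows of other tables such as a hypothetical table `20` (its rows start with `20`, and `0` sorts before `;`). The sketch below demonstrates the ordering with a plain `TreeSet`; the class name and the sample rows are hypothetical, and Java string comparison matches byte order here only because the rows are ASCII.

```java
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

public class MetadataRowRange {
  // Tablet rows of one table: "<id>;<endRow>" for each tablet plus "<id><" for the
  // default tablet. ';' (0x3B) sorts just before '<' (0x3C), so the inclusive range
  // [id + ";", id + "<"] holds exactly those rows for alphanumeric table IDs.
  public static NavigableSet<String> rowsForTable(NavigableSet<String> metadataRows, String tableId) {
    return metadataRows.subSet(tableId + ";", true, tableId + "<", true);
  }

  public static void main(String[] args) {
    // Hypothetical metadata rows for tables 1, 2, and 20.
    NavigableSet<String> rows = new TreeSet<>(List.of(
        "1;abc", "1<", "2;01b", "2;04p", "2<", "20;x", "20<"));
    System.out.println(rowsForTable(rows, "2")); // [2;01b, 2;04p, 2<]
  }
}
```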
Below, the information above has been rearranged to show which tablet groups
are on each tserver. The tablets are not evenly spread. For example, the four
tablets in group 03 are on only two tservers; ideally, they would be spread
across four. Note that the default tablet (2<) is counted as part of group 04.
```
ip-10-1-2-13:9997 01
ip-10-1-2-14:9997 04
ip-10-1-2-15:9997 01
ip-10-1-2-16:9997 04 04
ip-10-1-2-17:9997 04 04
ip-10-1-2-18:9997 04
ip-10-1-2-19:9997 04 04
ip-10-1-2-20:9997 03 03
ip-10-1-2-21:9997 03 03
ip-10-1-2-22:9997 04 04
ip-10-1-2-23:9997 04 04
ip-10-1-2-24:9997 04 04
ip-10-1-2-25:9997 01 01
ip-10-1-2-26:9997 02 04
ip-10-1-2-27:9997 02 02
ip-10-1-2-28:9997 02 04
ip-10-1-2-29:9997 04
```
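A listing like this can be produced from the scan output with a few lines of Java. This sketch assumes each scan line has the form `row loc:session [] host:port`, takes the group ID from the first two characters after the `;` in the metadata row, and, as in the listing, counts the default tablet (row `2<`) as group 04. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class GroupsByTserver {
  static final String DEFAULT_GROUP = "04"; // group used for the default tablet in this example

  // "2;03b" -> "03"; "2<" (default tablet, no end row) -> DEFAULT_GROUP.
  public static String groupOf(String metadataRow) {
    int semi = metadataRow.indexOf(';');
    if (semi < 0 || metadataRow.length() < semi + 3) {
      return DEFAULT_GROUP;
    }
    return metadataRow.substring(semi + 1, semi + 3);
  }

  // Maps each tserver (last field of a scan line) to the sorted groups of its tablets.
  public static Map<String, List<String>> groupsByTserver(List<String> scanLines) {
    Map<String, List<String>> result = new TreeMap<>();
    for (String line : scanLines) {
      String[] fields = line.trim().split("\\s+");
      String row = fields[0];
      String tserver = fields[fields.length - 1];
      result.computeIfAbsent(tserver, k -> new ArrayList<>()).add(groupOf(row));
    }
    for (List<String> groups : result.values()) {
      Collections.sort(groups);
    }
    return result;
  }

  public static void main(String[] args) {
    // The four group 03 lines from the scan above.
    List<String> scan = List.of(
        "2;03b loc:14a5f6e079d000d [] ip-10-1-2-21:9997",
        "2;03m loc:14a5f6e079d000e [] ip-10-1-2-20:9997",
        "2;03r loc:14a5f6e079d000d [] ip-10-1-2-21:9997",
        "2;03z loc:14a5f6e079d000e [] ip-10-1-2-20:9997");
    groupsByTserver(scan).forEach((tserver, groups) ->
        System.out.println(tserver + " " + String.join(" ", groups)));
  }
}
```

Run on those four lines, it prints `ip-10-1-2-20:9997 03 03` and `ip-10-1-2-21:9997 03 03`, the same two entries shown for group 03 in the listing.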
To remedy this situation, configure the RegexGroupBalancer with the commands
below. The configured regular expression selects the first two digits of a
tablet's end row as the group ID. Tablets whose end rows don't match, and the
default tablet, are placed in group 04.
```
root@accumulo examples.testRGB> config -t examples.testRGB -s table.custom.balancer.group.regex.pattern=(\d\d).*
root@accumulo examples.testRGB> config -t examples.testRGB -s table.custom.balancer.group.regex.default=04
root@accumulo examples.testRGB> config -t examples.testRGB -s table.balancer=org.apache.accumulo.core.spi.balancer.RegexGroupBalancer
```
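Before applying these settings, it can help to check which group each end row would land in. The plain Java sketch below mirrors the grouping described above; it is not the RegexGroupBalancer source. It assumes, as the trailing `.*` suggests, that the pattern must match the whole end row, that capture group 1 is the group ID, and that non-matching rows and the default tablet (modeled here as a `null` end row) fall back to the default group. Note that the shell value `(\d\d).*` becomes `"(\\d\\d).*"` as a Java string literal. The class name is hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexGrouping {
  private final Pattern pattern;
  private final String defaultGroup;

  public RegexGrouping(String regex, String defaultGroup) {
    this.pattern = Pattern.compile(regex);
    this.defaultGroup = defaultGroup;
  }

  // Returns the group for a tablet's end row; null stands for the default tablet.
  public String groupOf(String endRow) {
    if (endRow == null) {
      return defaultGroup;
    }
    Matcher matcher = pattern.matcher(endRow);
    // Assumption: the whole end row must match, and capture group 1 is the group ID.
    if (matcher.matches() && matcher.groupCount() >= 1) {
      return matcher.group(1);
    }
    return defaultGroup;
  }

  public static void main(String[] args) {
    // Same values as the pattern and default properties set above.
    RegexGrouping grouping = new RegexGrouping("(\\d\\d).*", "04");
    for (String endRow : new String[] {"01b", "03z", "04p", "xyz", null}) {
      System.out.println(endRow + " -> " + grouping.groupOf(endRow));
    }
  }
}
```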
After giving the balancer some time to migrate tablets, look at the tablet
locations again.
```
root@accumulo examples.testRGB> scan -t accumulo.metadata -b 2; -e 2< -c loc
2;01b loc:34a5f6e086b000a [] ip-10-1-2-18:9997
2;01m loc:34a5f6e086b000c [] ip-10-1-2-25:9997
2;01r loc:14a5f6e079d0011 [] ip-10-1-2-15:9997
2;01z loc:14a5f6e079d000f [] ip-10-1-2-13:9997
2;02b loc:34a5f6e086b000b [] ip-10-1-2-26:9997
2;02m loc:14a5f6e079d000c [] ip-10-1-2-28:9997
2;02r loc:34a5f6e086b000d [] ip-10-1-2-19:9997
2;02z loc:14a5f6e079d0012 [] ip-10-1-2-27:9997
2;03b loc:24a5f6e07d3000d [] ip-10-1-2-29:9997
2;03m loc:24a5f6e07d30009 [] ip-10-1-2-23:9997
2;03r loc:14a5f6e079d000d [] ip-10-1-2-21:9997
2;03z loc:14a5f6e079d000e [] ip-10-1-2-20:9997
2;04a loc:34a5f6e086b000b [] ip-10-1-2-26:9997
2;04b loc:34a5f6e086b000c [] ip-10-1-2-25:9997
2;04c loc:14a5f6e079d0010 [] ip-10-1-2-17:9997
2;04d loc:14a5f6e079d000e [] ip-10-1-2-20:9997
2;04e loc:24a5f6e07d3000d [] ip-10-1-2-29:9997
2;04f loc:24a5f6e07d3000c [] ip-10-1-2-16:9997
2;04g loc:24a5f6e07d3000a [] ip-10-1-2-14:9997
2;04h loc:14a5f6e079d000c [] ip-10-1-2-28:9997
2;04i loc:14a5f6e079d0011 [] ip-10-1-2-15:9997
2;04j loc:34a5f6e086b000d [] ip-10-1-2-19:9997
2;04k loc:14a5f6e079d0012 [] ip-10-1-2-27:9997
2;04l loc:14a5f6e079d000f [] ip-10-1-2-13:9997
2;04m loc:24a5f6e07d30009 [] ip-10-1-2-23:9997
2;04n loc:24a5f6e07d3000b [] ip-10-1-2-22:9997
2;04o loc:34a5f6e086b000a [] ip-10-1-2-18:9997
2;04p loc:14a5f6e079d000d [] ip-10-1-2-21:9997
2< loc:24a5f6e07d30008 [] ip-10-1-2-24:9997
```
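Once again, the data above is transformed to make it easier to see which groups
are on each tserver. The transformed data below shows that every group is now
evenly spread: the four tablets of each of groups 01, 02, and 03 are on four
different tservers, and each of the 17 tservers hosts exactly one group 04
tablet.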
```
ip-10-1-2-13:9997 01 04
ip-10-1-2-14:9997 04
ip-10-1-2-15:9997 01 04
ip-10-1-2-16:9997 04
ip-10-1-2-17:9997 04
ip-10-1-2-18:9997 01 04
ip-10-1-2-19:9997 02 04
ip-10-1-2-20:9997 03 04
ip-10-1-2-21:9997 03 04
ip-10-1-2-22:9997 04
ip-10-1-2-23:9997 03 04
ip-10-1-2-24:9997 04
ip-10-1-2-25:9997 01 04
ip-10-1-2-26:9997 02 04
ip-10-1-2-27:9997 02 04
ip-10-1-2-28:9997 02 04
ip-10-1-2-29:9997 03 04
```
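To check a listing like this mechanically, count, for each group, how many of its tablets the busiest tserver holds. The sketch below treats a group of n tablets as evenly spread when no tserver holds more than ceil(n / t) of them, where t is the number of lines in the listing. That criterion is an assumption made for this check, not necessarily the exact rule GroupBalancer uses, and tservers hosting no tablets of the table are not counted because they do not appear in the listing. The class name is hypothetical.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class GroupSpreadCheck {
  // Parses lines like "ip-10-1-2-13:9997 01 04" into tserver -> groups.
  public static Map<String, List<String>> parse(String listing) {
    Map<String, List<String>> result = new TreeMap<>();
    for (String line : listing.strip().split("\n")) {
      String[] fields = line.strip().split("\\s+");
      result.put(fields[0], Arrays.asList(fields).subList(1, fields.length));
    }
    return result;
  }

  // For each group, true if no tserver holds more than ceil(n / t) of its n tablets.
  public static Map<String, Boolean> evenlySpread(Map<String, List<String>> groupsByTserver) {
    int tservers = groupsByTserver.size();
    Map<String, Integer> total = new TreeMap<>();
    Map<String, Integer> maxOnOne = new TreeMap<>();
    for (List<String> groups : groupsByTserver.values()) {
      Map<String, Integer> counts = new HashMap<>();
      for (String group : groups) {
        counts.merge(group, 1, Integer::sum);
      }
      counts.forEach((group, count) -> {
        total.merge(group, count, Integer::sum);
        maxOnOne.merge(group, count, Math::max);
      });
    }
    Map<String, Boolean> result = new TreeMap<>();
    total.forEach((group, n) ->
        result.put(group, maxOnOne.get(group) <= (n + tservers - 1) / tservers));
    return result;
  }

  public static void main(String[] args) {
    String after = """
        ip-10-1-2-13:9997 01 04
        ip-10-1-2-14:9997 04
        ip-10-1-2-15:9997 01 04
        ip-10-1-2-16:9997 04
        ip-10-1-2-17:9997 04
        ip-10-1-2-18:9997 01 04
        ip-10-1-2-19:9997 02 04
        ip-10-1-2-20:9997 03 04
        ip-10-1-2-21:9997 03 04
        ip-10-1-2-22:9997 04
        ip-10-1-2-23:9997 03 04
        ip-10-1-2-24:9997 04
        ip-10-1-2-25:9997 01 04
        ip-10-1-2-26:9997 02 04
        ip-10-1-2-27:9997 02 04
        ip-10-1-2-28:9997 02 04
        ip-10-1-2-29:9997 03 04
        """;
    System.out.println(evenlySpread(parse(after))); // {01=true, 02=true, 03=true, 04=true}
  }
}
```

Run against the earlier listing instead, every group reports false, since each group has some tserver holding two of its tablets (for example, ip-10-1-2-25 holds both 01b and 01m).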
If you need this functionality but a regular expression does not meet your
needs, extend GroupBalancer. This lets you specify the partitioning function in
Java. Use the RegexGroupBalancer source as an example.
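As an illustration of a partitioning rule that would be impractical as a regular expression, the hypothetical function below groups tablets by the ISO week of a `yyyyMMdd` date assumed to prefix each end row. It is written as a plain `Function<String, String>` over end rows only to show the partitioning logic; in GroupBalancer the partitioner is supplied by overriding a method of the balancer itself, so check the RegexGroupBalancer source for the exact method and types before adapting it. The class name, row layout, and group naming are all assumptions.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.time.temporal.IsoFields;
import java.util.Locale;
import java.util.function.Function;

public class WeekPartitioner {
  // Hypothetical row design: each end row starts with a yyyyMMdd date.
  // Tablets are grouped by ISO week, e.g. "2024-W03"; anything else goes to defaultGroup.
  public static Function<String, String> partitioner(String defaultGroup) {
    return endRow -> {
      if (endRow == null || endRow.length() < 8) {
        return defaultGroup; // default tablet (no end row) or too short to hold a date
      }
      try {
        LocalDate date = LocalDate.parse(endRow.substring(0, 8), DateTimeFormatter.BASIC_ISO_DATE);
        return String.format(Locale.ROOT, "%d-W%02d",
            date.get(IsoFields.WEEK_BASED_YEAR), date.get(IsoFields.WEEK_OF_WEEK_BASED_YEAR));
      } catch (DateTimeParseException e) {
        return defaultGroup; // first 8 characters are not a valid date
      }
    };
  }

  public static void main(String[] args) {
    Function<String, String> partition = partitioner("other");
    for (String endRow : new String[] {"20240115_a", "20210101_b", "misc", null}) {
      System.out.println(endRow + " -> " + partition.apply(endRow));
    }
  }
}
```

Here 2021-01-01 lands in group `2020-W53`, because ISO weeks that begin in one year can end in the next; computing that kind of boundary is the sort of logic a Java partitioner can handle and a regular expression cannot.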