////
/**
*
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
////
[[cp]]
= Apache HBase Coprocessors
:doctype: book
:numbered:
:toc: left
:icons: font
:experimental:
HBase coprocessors are modeled after the coprocessors in Google's BigTable (see http://www.scribd.com/doc/21631448/Dean-Keynote-Ladis2009, pages 66-67). Coprocessors function in a similar way to Linux kernel modules.
They provide a way to run server-level code against locally-stored data.
The functionality they provide is very powerful, but it also carries great risk: like a faulty kernel module, a misbehaving coprocessor can adversely affect the entire server process.
The information in this chapter is primarily sourced and heavily reused from Mingjie Lai's blog post at https://blogs.apache.org/hbase/entry/coprocessor_introduction.
Coprocessors are not designed to be used by end users of HBase, but by HBase developers who need to add specialized functionality to HBase.
One example of the use of coprocessors is pluggable compaction and scan policies, which are provided as coprocessors in link:https://issues.apache.org/jira/browse/HBASE-6427[HBASE-6427].
== Coprocessor Framework
The implementation of HBase coprocessors diverges from the BigTable implementation.
The HBase framework provides a library and runtime environment for executing user code within the HBase region server and master processes.
The framework API is provided in the link:https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/coprocessor/package-summary.html[coprocessor] package.
The framework provides two different types of coprocessors, distinguished by their scope.
.Types of Coprocessors
System Coprocessors::
System coprocessors are loaded globally on all tables and regions hosted by a region server.
Table Coprocessors::
You can specify, on a per-table basis, which coprocessors should be loaded on all regions of a table.
The framework also provides two different kinds of extension points: _observers_ and _endpoints_.
Observers::
Observers are analogous to triggers in conventional databases.
They allow you to insert user code by overriding upcall methods provided by the coprocessor framework.
Callback functions are executed from core HBase code when events occur.
Callbacks are handled by the framework, and the coprocessor itself only needs to insert the extended or alternate functionality (a minimal sketch follows this list).
Endpoints (HBase 0.96.x and later)::
The implementation for endpoints changed significantly in HBase 0.96.x due to the introduction of protocol buffers (protobufs) (link:https://issues.apache.org/jira/browse/HBASE-5448[HBASE-5448]). If you created endpoints before 0.96.x, you will need to rewrite them.
Endpoints are now defined and callable as protobuf services, rather than endpoint invocations passed through as Writable blobs.
Endpoints (HBase 0.94.x and earlier)::
Dynamic RPC endpoints resemble stored procedures.
An endpoint can be invoked at any time from the client.
When it is invoked, it is executed remotely at the target region or regions, and results of the executions are returned to the client.
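The following is a minimal observer sketch, assuming the 0.94.x API and the `BaseRegionObserver` convenience base class (the package and class names are hypothetical, and method signatures changed in later versions):
[source,java]
----
package org.myname.hbase.coprocessor;  // hypothetical package

import java.io.IOException;

import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.coprocessor.BaseRegionObserver;
import org.apache.hadoop.hbase.coprocessor.ObserverContext;
import org.apache.hadoop.hbase.coprocessor.RegionCoprocessorEnvironment;
import org.apache.hadoop.hbase.regionserver.wal.WALEdit;

// Loaded by the framework; only the overridden upcalls add behavior.
public class ExampleObserver extends BaseRegionObserver {
  // Invoked by core HBase code before each Put reaches the region.
  @Override
  public void prePut(ObserverContext<RegionCoprocessorEnvironment> ctx,
      Put put, WALEdit edit, boolean writeToWAL) throws IOException {
    // Validate or augment the mutation here; throwing an IOException
    // rejects the operation.
  }
}
----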
== Examples
An example of an observer is included in _hbase-examples/src/test/java/org/apache/hadoop/hbase/coprocessor/example/TestZooKeeperScanPolicyObserver.java_.
Several endpoint examples are included in the same directory.
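As a sketch of how a 0.94.x-style dynamic RPC endpoint looks from the client side (the protocol interface and its method are hypothetical, and the server-side implementation is omitted):
[source,java]
----
import java.io.IOException;
import java.util.Map;

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.coprocessor.Batch;
import org.apache.hadoop.hbase.ipc.CoprocessorProtocol;

// Hypothetical client-visible protocol; a server-side endpoint
// coprocessor would implement this interface.
interface RowCountProtocol extends CoprocessorProtocol {
  long getRowCount() throws IOException;
}

public class RowCountClient {
  public static void main(String[] args) throws Throwable {
    HTable table = new HTable(HBaseConfiguration.create(), "t1");
    try {
      // Executes remotely on every region of the table (null start/end keys)
      // and returns one result per region, keyed by region name.
      Map<byte[], Long> results = table.coprocessorExec(
          RowCountProtocol.class, null, null,
          new Batch.Call<RowCountProtocol, Long>() {
            public Long call(RowCountProtocol instance) throws IOException {
              return instance.getRowCount();
            }
          });
      long total = 0;
      for (Long count : results.values()) {
        total += count;
      }
      System.out.println("total rows: " + total);
    } finally {
      table.close();
    }
  }
}
----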
== Building A Coprocessor
Before you can load a coprocessor, it must be developed, compiled, and packaged in a JAR file.
The next step is to configure the coprocessor framework to use your coprocessor.
You can load the coprocessor from your HBase configuration, so that the coprocessor starts with HBase, or you can configure the coprocessor from the HBase shell, as a table attribute, so that it is loaded dynamically when the table is opened or reopened.
=== Load from Configuration
To configure a coprocessor to be loaded when HBase starts, modify the RegionServer's _hbase-site.xml_ and configure one of the following properties, based on the type of coprocessor you are configuring:
* `hbase.coprocessor.region.classes` for RegionObservers and Endpoints
* `hbase.coprocessor.wal.classes` for WALObservers
* `hbase.coprocessor.master.classes` for MasterObservers
.Example RegionObserver Configuration
====
In this example, one RegionObserver is configured for all the HBase tables.
[source,xml]
----
<property>
<name>hbase.coprocessor.region.classes</name>
<value>org.apache.hadoop.hbase.coprocessor.AggregateImplementation</value>
</property>
----
====
If multiple classes are specified for loading, the class names must be comma-separated.
The framework attempts to load all the configured classes using the default class loader.
Therefore, the jar file must reside on the server-side HBase classpath.
Coprocessors which are loaded in this way will be active on all regions of all tables.
These are the system coprocessors introduced earlier.
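For example, the following configuration (the class names are hypothetical) loads two RegionObservers on every region:
[source,xml]
----
<property>
<name>hbase.coprocessor.region.classes</name>
<value>org.myname.hbase.Observer1,org.myname.hbase.Observer2</value>
</property>
----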
The first listed coprocessor will be assigned the priority `Coprocessor.Priority.SYSTEM`.
Each subsequent coprocessor in the list will have its priority value incremented by one (which
reduces its priority, because priorities have the natural sort order of Integers).
These priority values can be manually overridden in _hbase-site.xml_. This can be useful if you
want to guarantee that a coprocessor will execute after another. For example, in the following
configuration `SumEndPoint` would be guaranteed to go last, except in the case of a tie with
another coprocessor:
[source,xml]
----
<property>
<name>hbase.coprocessor.region.classes</name>
<value>org.myname.hbase.coprocessor.endpoint.SumEndPoint|2147483647</value>
</property>
----
When calling out to registered observers, the framework executes their callback methods in the sorted order of their priority.
Ties are broken arbitrarily.
=== Load from the HBase Shell
You can load a coprocessor on a specific table via a table attribute.
The following example will load the `FooRegionObserver` observer when table `t1` is opened or reopened.
.Load a Coprocessor On a Table Using HBase Shell
====
----
hbase(main):005:0> alter 't1', METHOD => 'table_att',
'coprocessor'=>'hdfs:///foo.jar|com.foo.FooRegionObserver|1001|arg1=1,arg2=2'
Updating all regions with the new schema...
1/1 regions updated.
Done.
0 row(s) in 1.0730 seconds
hbase(main):006:0> describe 't1'
DESCRIPTION ENABLED
{NAME => 't1', coprocessor$1 => 'hdfs:///foo.jar|com.foo.FooRegio false
nObserver|1001|arg1=1,arg2=2', FAMILIES => [{NAME => 'c1', DATA_B
LOCK_ENCODING => 'NONE', BLOOMFILTER => 'NONE', REPLICATION_SCOPE
=> '0', VERSIONS => '3', COMPRESSION => 'NONE', MIN_VERSIONS =>
'0', TTL => '2147483647', KEEP_DELETED_CELLS => 'false', BLOCKSIZ
E => '65536', IN_MEMORY => 'false', ENCODE_ON_DISK => 'true', BLO
CKCACHE => 'true'}, {NAME => 'f1', DATA_BLOCK_ENCODING => 'NONE',
BLOOMFILTER => 'NONE', REPLICATION_SCOPE => '0', VERSIONS => '3'
, COMPRESSION => 'NONE', MIN_VERSIONS => '0', TTL => '2147483647'
, KEEP_DELETED_CELLS => 'false', BLOCKSIZE => '65536', IN_MEMORY
=> 'false', ENCODE_ON_DISK => 'true', BLOCKCACHE => 'true'}]}
1 row(s) in 0.0190 seconds
----
====
The coprocessor framework will try to read the class information from the coprocessor table attribute value.
The value contains four pieces of information which are separated by the `|` character.
* File path: The jar file containing the coprocessor implementation must be in a location where all region servers can read it.
You could copy the file onto the local disk on each region server, but it is recommended to store it in HDFS.
* Class name: The full class name of the coprocessor.
* Priority: An integer.
The framework will determine the execution sequence of all configured observers registered at the same hook using priorities.
This field can be left blank.
In that case the framework will assign a default priority value.
* Arguments: This field is passed to the coprocessor implementation.
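The same attribute can also be set through the Java client API. The following is a sketch assuming the 0.94.x `HBaseAdmin` and `HTableDescriptor` APIs; the table name, jar path, priority, and arguments mirror the shell example above:
[source,java]
----
import java.util.HashMap;
import java.util.Map;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.util.Bytes;

public class LoadCoprocessor {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HBaseAdmin admin = new HBaseAdmin(conf);
    byte[] tableName = Bytes.toBytes("t1");

    admin.disableTable(tableName);
    HTableDescriptor desc = admin.getTableDescriptor(tableName);

    // The same four fields as the shell attribute: path|class|priority|args.
    Map<String, String> kvs = new HashMap<String, String>();
    kvs.put("arg1", "1");
    kvs.put("arg2", "2");
    desc.addCoprocessor("com.foo.FooRegionObserver",
        new Path("hdfs:///foo.jar"), 1001, kvs);

    admin.modifyTable(tableName, desc);
    admin.enableTable(tableName);
    admin.close();
  }
}
----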
.Unload a Coprocessor From a Table Using HBase Shell
====
----
hbase(main):007:0> alter 't1', METHOD => 'table_att_unset',
hbase(main):008:0* NAME => 'coprocessor$1'
Updating all regions with the new schema...
1/1 regions updated.
Done.
0 row(s) in 1.1130 seconds
hbase(main):009:0> describe 't1'
DESCRIPTION ENABLED
{NAME => 't1', FAMILIES => [{NAME => 'c1', DATA_BLOCK_ENCODING => false
'NONE', BLOOMFILTER => 'NONE', REPLICATION_SCOPE => '0', VERSION
S => '3', COMPRESSION => 'NONE', MIN_VERSIONS => '0', TTL => '214
7483647', KEEP_DELETED_CELLS => 'false', BLOCKSIZE => '65536', IN
_MEMORY => 'false', ENCODE_ON_DISK => 'true', BLOCKCACHE => 'true
'}, {NAME => 'f1', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER =>
'NONE', REPLICATION_SCOPE => '0', VERSIONS => '3', COMPRESSION =>
'NONE', MIN_VERSIONS => '0', TTL => '2147483647', KEEP_DELETED_C
ELLS => 'false', BLOCKSIZE => '65536', IN_MEMORY => 'false', ENCO
DE_ON_DISK => 'true', BLOCKCACHE => 'true'}]}
1 row(s) in 0.0180 seconds
----
====
WARNING: There is no guarantee that the framework will load a given coprocessor successfully.
For example, the shell command neither guarantees a jar file exists at a particular location nor verifies whether the given class is actually contained in the jar file.
== Check the Status of a Coprocessor
To check the status of a coprocessor after it has been configured, use the `status` HBase Shell command.
----
hbase(main):020:0> status 'detailed'
version 0.92-tm-6
0 regionsInTransition
master coprocessors: []
1 live servers
localhost:52761 1328082515520
requestsPerSecond=3, numberOfOnlineRegions=3, usedHeapMB=32, maxHeapMB=995
-ROOT-,,0
numberOfStores=1, numberOfStorefiles=1, storefileUncompressedSizeMB=0, storefileSizeMB=0, memstoreSizeMB=0,
storefileIndexSizeMB=0, readRequestsCount=54, writeRequestsCount=1, rootIndexSizeKB=0, totalStaticIndexSizeKB=0,
totalStaticBloomSizeKB=0, totalCompactingKVs=0, currentCompactedKVs=0, compactionProgressPct=NaN, coprocessors=[]
.META.,,1
numberOfStores=1, numberOfStorefiles=0, storefileUncompressedSizeMB=0, storefileSizeMB=0, memstoreSizeMB=0,
storefileIndexSizeMB=0, readRequestsCount=97, writeRequestsCount=4, rootIndexSizeKB=0, totalStaticIndexSizeKB=0,
totalStaticBloomSizeKB=0, totalCompactingKVs=0, currentCompactedKVs=0, compactionProgressPct=NaN, coprocessors=[]
t1,,1328082575190.c0491168a27620ffe653ec6c04c9b4d1.
numberOfStores=2, numberOfStorefiles=1, storefileUncompressedSizeMB=0, storefileSizeMB=0, memstoreSizeMB=0,
storefileIndexSizeMB=0, readRequestsCount=0, writeRequestsCount=0, rootIndexSizeKB=0, totalStaticIndexSizeKB=0,
totalStaticBloomSizeKB=0, totalCompactingKVs=0, currentCompactedKVs=0, compactionProgressPct=NaN,
coprocessors=[AggregateImplementation]
0 dead servers
----
== Monitor Time Spent in Coprocessors
HBase 0.98.5 introduced the ability to monitor some statistics relating to the amount of time spent executing a given coprocessor.
You can see these statistics via the HBase Metrics framework (see <<hbase_metrics>>) or the Web UI for a given Region Server, via the _Coprocessor Metrics_ tab.
These statistics are valuable for debugging and benchmarking the performance impact of a given coprocessor on your cluster.
Tracked statistics include min, max, average, and 90th, 95th, and 99th percentiles.
All times are shown in milliseconds.
The statistics are calculated over coprocessor execution samples recorded during the reporting interval, which is 10 seconds by default.
The metrics sampling rate is described in <<hbase_metrics>>.
.Coprocessor Metrics UI
image::coprocessor_stats.png[]