Welcome to the HBase kafka proxy. The purpose of this proxy is to act as a fake peer. It receives replication events from a peer cluster and applies a set of rules (stored in a kafka-route-rules.xml file) to determine if the event should be forwarded to a kafka topic. If the mutation matches a rule, the mutation is converted to an avro object and the item is placed into the topic.
The service sets up a bare-bones RegionServer, so it will use the values in any hbase-site.xml it finds on the CLASSPATH. If you wish to override those values, pass them as properties on the command line, i.e. -Dkey=value.
Make sure the hbase command is in your path before starting the service; the proxy runs hbase classpath to find the HBase libraries.

Routing rules live in the kafka-route-rules.xml file. There are two kinds of rules: route, which forwards a matching mutation to the configured topic, and drop, which discards it. Each rule has the following parameters: table, columnFamily, and qualifier (route rules also take a topic). The qualifier parameter can contain simple wildcard expressions (start and end only). Some examples:
<rules> <rule action="route" table="default:mytable" topic="foo" /> </rules>
This causes all mutations to default:mytable to be routed to the kafka topic foo.
<rules> <rule action="route" table="default:mytable" columnFamily="mycf" qualifier="myqualifier" topic="mykafkatopic"/> </rules>
This will cause all mutations to default:mytable columnFamily mycf and qualifier myqualifier to be routed to mykafkatopic.
<rules> <rule action="drop" table="default:mytable" columnFamily="mycf" qualifier="secret*"/> <rule action="route" table="default:mytable" columnFamily="mycf" topic="mykafkatopic"/> </rules>
This combination will route all mutations from default:mytable columnFamily mycf to mykafkatopic unless the qualifier starts with 'secret'; mutations matching the drop rule are discarded. The way the rules are written, all other mutations for column family mycf will be routed to the mykafkatopic topic.
Replication must be enabled on the source HBase cluster (hbase.replication=true), and REPLICATION_SCOPE must be set on every column family you want to export. The table is table and the column family is cf in the following example:

    disable 'table'
    alter 'table', {NAME => 'cf', REPLICATION_SCOPE => 1}
    enable 'table'
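For reference, a minimal sketch of the hbase-site.xml entry behind hbase.replication=true (a standard configuration property block; your file will contain many other settings):

    <property>
      <name>hbase.replication</name>
      <value>true</value>
    </property>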
The service accepts the following arguments:

    --kafkabrokers (or -b)    <kafka brokers (comma delimited)>
    --routerulesfile (or -r)  <file with rules to route to kafka (defaults to kafka-route-rules.xml)>
    --kafkaproperties (or -f) <path to a properties file with the kafka connection properties>
    --peername (or -p)        name of the hbase peer to use (defaults to hbasekafka)
    --znode (or -z)           root znode (defaults to /kafkaproxy)
    --enablepeer (or -e)      enable the peer on startup (defaults to false)
    --auto (or -a)            auto create the peer
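The file passed with --kafkaproperties is a plain Java properties file holding the Kafka client connection settings. A minimal sketch, assuming only the broker list is needed (the host names are placeholders; add whatever security or client options your Kafka cluster requires):

    # kafka connection properties (example values)
    bootstrap.servers=kafkahost1:9092,kafkahost2:9092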
To start the proxy, make sure the hbase command is in your path. You can point the proxy at a different rules file with the -r argument (it defaults to kafka-route-rules.xml). For example:

    $ bin/hbase-connectors-daemon.sh start kafkaproxy -a -e -p <peer> -b <kafka.address>:<kafka.port>
This:
- starts the kafka proxy
- passes -a so the proxy will create the replication peer specified by -p if it does not exist (not required, but it saves some busy work)
- enables the peer (-e) when the service starts (not required, you can manually enable the peer in the hbase shell)
- uses the zookeeper quorum from hbase-site.xml by default; you can override this by passing -Dhbase.zookeeper.quorum, as in the command sketched below
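For instance, a start command that overrides the quorum could look like the following; the quorum hosts are placeholder values, and this assumes -D properties are passed through to the proxy as described above:

    $ bin/hbase-connectors-daemon.sh start kafkaproxy -a -e -p <peer> -b <kafka.address>:<kafka.port> -Dhbase.zookeeper.quorum=zkhost1:2181,zkhost2:2181,zkhost3:2181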
Messages are in Avro format; this is the schema:

    {
      "type": "record",
      "name": "HBaseKafkaEvent",
      "fields": [
        {"name": "key", "type": "bytes"},
        {"name": "timestamp", "type": "long"},
        {"name": "delete", "type": "boolean"},
        {"name": "value", "type": "bytes"},
        {"name": "qualifier", "type": "bytes"},
        {"name": "family", "type": "bytes"},
        {"name": "table", "type": "bytes"}
      ]
    }
Any language that supports Avro should be able to consume the messages off the topic.
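As a rough sketch, a Java consumer could look like the following. It assumes each Kafka record value is the plain Avro binary encoding of the schema above (no schema registry wrapper); the broker address, group id, topic name, and class name are placeholders:

    import java.nio.ByteBuffer;
    import java.nio.charset.StandardCharsets;
    import java.time.Duration;
    import java.util.Collections;
    import java.util.Properties;

    import org.apache.avro.Schema;
    import org.apache.avro.generic.GenericDatumReader;
    import org.apache.avro.generic.GenericRecord;
    import org.apache.avro.io.BinaryDecoder;
    import org.apache.avro.io.DecoderFactory;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;

    public class HBaseKafkaEventConsumer {

      // Schema copied from the proxy documentation above.
      private static final Schema SCHEMA = new Schema.Parser().parse(
          "{\"type\": \"record\", \"name\": \"HBaseKafkaEvent\", \"fields\": ["
              + "{\"name\": \"key\", \"type\": \"bytes\"},"
              + "{\"name\": \"timestamp\", \"type\": \"long\"},"
              + "{\"name\": \"delete\", \"type\": \"boolean\"},"
              + "{\"name\": \"value\", \"type\": \"bytes\"},"
              + "{\"name\": \"qualifier\", \"type\": \"bytes\"},"
              + "{\"name\": \"family\", \"type\": \"bytes\"},"
              + "{\"name\": \"table\", \"type\": \"bytes\"}]}");

      public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");    // placeholder broker address
        props.put("group.id", "hbase-kafka-event-example");  // placeholder group id
        props.put("key.deserializer",
            "org.apache.kafka.common.serialization.ByteArrayDeserializer");
        props.put("value.deserializer",
            "org.apache.kafka.common.serialization.ByteArrayDeserializer");

        GenericDatumReader<GenericRecord> reader = new GenericDatumReader<>(SCHEMA);

        try (KafkaConsumer<byte[], byte[]> consumer = new KafkaConsumer<>(props)) {
          consumer.subscribe(Collections.singletonList("mykafkatopic")); // topic from your rules file
          while (true) {
            ConsumerRecords<byte[], byte[]> records = consumer.poll(Duration.ofSeconds(1));
            for (ConsumerRecord<byte[], byte[]> record : records) {
              // Decode the raw Avro binary payload into a generic record.
              BinaryDecoder decoder = DecoderFactory.get().binaryDecoder(record.value(), null);
              GenericRecord event = reader.read(null, decoder);
              System.out.println("table=" + toStr(event.get("table"))
                  + " family=" + toStr(event.get("family"))
                  + " qualifier=" + toStr(event.get("qualifier"))
                  + " delete=" + event.get("delete")
                  + " timestamp=" + event.get("timestamp"));
            }
          }
        }
      }

      // Avro "bytes" fields come back as ByteBuffers; copy them out for display.
      private static String toStr(Object avroBytes) {
        ByteBuffer buf = ((ByteBuffer) avroBytes).duplicate();
        byte[] bytes = new byte[buf.remaining()];
        buf.get(bytes);
        return new String(bytes, StandardCharsets.UTF_8);
      }
    }

If you generate an HBaseKafkaEvent class from the schema, a SpecificDatumReader can replace the generic reader.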
A utility is included to test the routing rules.
    $ bin/hbase-connectors-daemon.sh start kafkaproxytest -k <kafka.broker> -t <topic to listen to>
The messages will be dumped in string format under logs/.