HBase Kafka Proxy
Welcome to the HBase Kafka Proxy. The purpose of this proxy is to act as a 'fake peer': it
receives replication events from its HBase peer and applies a set of rules (stored in
kafka-route-rules.xml) to determine whether the event should be forwarded to a topic. If the
mutation matches one of the rules, the mutation is converted to an Avro object and placed
into the topic.
The service sets up a bare-bones region server, so it will use the values in hbase-site.xml. If
you wish to override those values, pass them as -Dkey=value arguments.
To Use:
1. Make sure the hbase command is in your path. The proxy uses the 'hbase classpath' command to
find the HBase libraries.
2. Create any topics that you wish to use in your Kafka broker.
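For example, with the stock CLI tools from your Kafka installation (this assumes Kafka 2.2+;
older releases take --zookeeper instead of --bootstrap-server):
bin/kafka-topics.sh --create --bootstrap-server localhost:9092 --partitions 1 --replication-factor 1 --topic mykafkatopic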
3. Set up kafka-route-rules.xml. This file controls how the mutations are routed. There are
two kinds of rules: route and drop.
- drop: any mutation that matches this rule will be dropped.
- route: any mutation that matches this rule will be routed to the configured topic.
Each rule has the following parameters:
- table
- columnFamily
- qualifier
Route rules additionally take a topic parameter that names the destination Kafka topic, as the
examples below show.
The qualifier parameter can contain simple wildcard expressions (wildcards at the start and end
of the expression only).
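For example, qualifier="secret*" matches qualifiers that start with 'secret', and
qualifier="*secret" matches qualifiers that end with 'secret'.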
Examples
<rules>
<rule action="route" table="default:mytable" topic="foo" />
</rules>
This causes all mutations done to default:mytable to be routed to the Kafka topic 'foo'.
<rules>
<rule action="route" table="default:mytable" columnFamily="mycf" qualifier="myqualifier"
topic="mykafkatopic"/>
</rules>
This will cause all mutations from default:mytable, column family mycf, and qualifier myqualifier
to be routed to mykafkatopic.
<rules>
<rule action="drop" table="default:mytable" columnFamily="mycf" qualifier="secret*"/>
<rule action="route" table="default:mytable" columnFamily="mycf" topic="mykafkatopic"/>
</rules>
This combination will route all mutations from default:mytable column family mycf to mykafkatopic
unless the qualifier starts with 'secret'; mutations matching the drop rule will be dropped. As the
rules are written, all other mutations for column family mycf will be routed to the 'mykafkatopic'
topic.
4. Service arguments
--kafkabrokers (or -b) <kafka brokers (comma delimited)>
--routerulesfile (or -r) <file with rules to route to kafka (defaults to kafka-route-rules.xml)>
--kafkaproperties (or -f) <path to a properties file with the kafka connection properties (see the sample below)>
--peername (or -p) <name of the hbase peer to use (defaults to hbasekafka)>
--znode (or -z) <root znode (defaults to /kafkaproxy)>
--enablepeer (or -e) enable the peer on startup (defaults to false)
--auto (or -a) auto create the peer
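The kafkaproperties file is a standard Java properties file holding Kafka connection settings
(security options and the like). A minimal sketch, with illustrative values:
security.protocol=SASL_SSL
sasl.kerberos.service.name=kafka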
5. Start the service.
- make sure the hbase command is in your path
- by default, the service looks for kafka-route-rules.xml in the conf directory; you can specify a
different file or location with the -r argument
bin/hbase-connectors-daemon.sh start kafkaproxy -a -e -p wootman -b localhost:9092 -r ~/kafka-route-rules.xml
This:
- starts the kafka proxy
- passes -a, so the proxy will create the replication peer specified by -p if it does not exist
(not required, but it saves some busy work)
- passes -e, so the proxy will enable the peer when the service starts (not required; you can
manually enable the peer in the hbase shell, as shown below)
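If you skip -e, the peer can be enabled later from the hbase shell using the standard replication
commands (the peer name is whatever you passed with -p):
hbase shell
hbase> list_peers
hbase> enable_peer 'wootman'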
Notes:
1. The proxy will connect to the ZooKeeper quorum configured in hbase-site.xml by default. You can
override this by passing -Dhbase.zookeeper.quorum
bin/hbase-connectors-daemon.sh start kafkaproxy -Dhbase.zookeeper.quorum=localhost:1234 ..... other args ....
2. Route rules only support Unicode characters.
3. I do not have access to a secured Hadoop cluster to test this on.
Message format
Messages are in Avro format; this is the schema:
{"namespace": "org.apache.hadoop.hbase.kafka",
"type": "record",
"name": "HBaseKafkaEvent",
"fields": [
{"name": "key", "type": "bytes"},
{"name": "timestamp", "type": "long" },
{"name": "delete", "type": "boolean" },
{"name": "value", "type": "bytes"},
{"name": "qualifier", "type": "bytes"},
{"name": "family", "type": "bytes"},
{"name": "table", "type": "bytes"}
]
}
Any language that supports Avro should be able to consume the messages off the topic.
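For example, here is a minimal sketch of a plain Java consumer built on the kafka-clients and Avro
libraries. The class name, group id, and topic are illustrative, and it assumes each Kafka message
value is a single Avro-binary-encoded HBaseKafkaEvent (no container-file header):

import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.BinaryDecoder;
import org.apache.avro.io.DecoderFactory;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class HBaseKafkaEventReader {
  // The HBaseKafkaEvent schema from this README.
  private static final Schema SCHEMA = new Schema.Parser().parse(
      "{\"namespace\": \"org.apache.hadoop.hbase.kafka\", \"type\": \"record\","
      + " \"name\": \"HBaseKafkaEvent\", \"fields\": ["
      + "{\"name\": \"key\", \"type\": \"bytes\"},"
      + "{\"name\": \"timestamp\", \"type\": \"long\"},"
      + "{\"name\": \"delete\", \"type\": \"boolean\"},"
      + "{\"name\": \"value\", \"type\": \"bytes\"},"
      + "{\"name\": \"qualifier\", \"type\": \"bytes\"},"
      + "{\"name\": \"family\", \"type\": \"bytes\"},"
      + "{\"name\": \"table\", \"type\": \"bytes\"}]}");

  public static void main(String[] args) throws IOException {
    Properties props = new Properties();
    props.put("bootstrap.servers", "localhost:9092");
    props.put("group.id", "hbase-kafka-example"); // illustrative group id
    props.put("auto.offset.reset", "earliest");
    props.put("key.deserializer", "org.apache.kafka.common.serialization.ByteArrayDeserializer");
    props.put("value.deserializer", "org.apache.kafka.common.serialization.ByteArrayDeserializer");

    GenericDatumReader<GenericRecord> reader = new GenericDatumReader<>(SCHEMA);
    try (KafkaConsumer<byte[], byte[]> consumer = new KafkaConsumer<>(props)) {
      consumer.subscribe(Collections.singletonList("mykafkatopic")); // topic from the rules above
      while (true) {
        ConsumerRecords<byte[], byte[]> records = consumer.poll(Duration.ofSeconds(1));
        for (ConsumerRecord<byte[], byte[]> rec : records) {
          // Decode the message value as one Avro-binary HBaseKafkaEvent.
          BinaryDecoder decoder = DecoderFactory.get().binaryDecoder(rec.value(), null);
          GenericRecord event = reader.read(null, decoder);
          ByteBuffer table = (ByteBuffer) event.get("table"); // Avro 'bytes' maps to ByteBuffer
          System.out.println("table=" + StandardCharsets.UTF_8.decode(table)
              + " delete=" + event.get("delete")
              + " timestamp=" + event.get("timestamp"));
        }
      }
    }
  }
}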
Testing Utility
A utility is included to test the routing rules.
bin/hbase-connectors-daemon.sh start kafkaproxytest -k <kafka.broker> -t <topic to listen to>
The messages will be dumped in string format under logs/.
TODO:
1. Some properties passed into the region server are hard-coded.
2. The Avro objects should be generic.
3. Allow rules to be refreshed without a restart.
4. Get this tested on a secure (TLS & Kerberos) enabled cluster.