| //// |
| /** |
| * |
| * Licensed to the Apache Software Foundation (ASF) under one |
| * or more contributor license agreements. See the NOTICE file |
| * distributed with this work for additional information |
| * regarding copyright ownership. The ASF licenses this file |
| * to you under the Apache License, Version 2.0 (the |
| * "License"); you may not use this file except in compliance |
| * with the License. You may obtain a copy of the License at |
| * |
| * http://www.apache.org/licenses/LICENSE-2.0 |
| * |
| * Unless required by applicable law or agreed to in writing, software |
| * distributed under the License is distributed on an "AS IS" BASIS, |
| * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. |
| * See the License for the specific language governing permissions and |
| * limitations under the License. |
| */ |
| //// |
| |
| [[protobuf]] |
| = Protobuf in HBase |
| :doctype: book |
| :numbered: |
| :toc: left |
| :icons: font |
| :experimental: |
| |
| |
| == Protobuf |
| HBase uses Google's link:https://developers.google.com/protocol-buffers/[protobufs] wherever |
| it persists metadata -- in the tail of hfiles or Cells written by |
| HBase into the system hbase:meta table or when HBase writes znodes |
| to zookeeper, etc. -- and when it passes objects over the wire making |
| xref:hbase.rpc[RPCs]. HBase also uses protobufs to describe the RPC |
| Interfaces (Services) we expose to clients, for example the `Admin` and `Client` |
| Interfaces that the RegionServer fields, and to specify |
| the arbitrary extensions added by developers via our |
| xref:cp[Coprocessor Endpoint] mechanism. |
| |
| In this chapter we go into detail for developers who are looking to |
| understand better how it all works. This chapter is of particular |
| use to those who would amend or extend HBase functionality. |
| |
| With protobuf, you describe serializations and services in a `.proto` file. |
| You then feed these descriptors to a protobuf tool, the `protoc` binary, |
| to generate classes that can marshal and unmarshal the described serializations |
| and field the specified Services. |
| |
| See the `README.txt` in the HBase sub-modules for details on how |
| to run the class generation on a per-module basis; |
| e.g. see `hbase-protocol/README.txt` for how to generate protobuf classes |
| in the hbase-protocol module. |
| |
| In HBase, `.proto` files live in one of two places. Common proto files, and the |
| protoc-generated classes that HBase uses internally to serialize metadata, are in |
| the `hbase-protocol` module, which is dedicated to hosting them. Extensions to |
| HBase such as REST or Coprocessor Endpoints that need their own descriptors keep |
| their protos inside the module that hosts the functionality: e.g. `hbase-rest` |
| is home to the REST proto files and the `hbase-rsgroup` table grouping |
| Coprocessor Endpoint has all protos that have to do with table grouping. |
| |
| Protos are hosted by the module that makes use of them. This means the |
| generation of protobuf classes is distributed, done per module, but we do it |
| this way so that each module encapsulates everything to do with |
| the functionality it brings to HBase. |
| |
| Extensions, whether REST or Coprocessor Endpoints, will make use |
| of core HBase protos found back in the hbase-protocol module. They'll |
| use these core protos when they want to serialize a Cell or a Put or |
| refer to a particular node via ServerName, etc., as part of providing the |
| CPEP Service. Going forward, after the release of hbase-2.0.0, this |
| practice needs to wither. We'll explain why in the later |
| xref:shaded.protobuf[hbase-2.0.0] section. |
| |
| [[shaded.protobuf]] |
| === hbase-2.0.0 and the shading of protobufs (HBASE-15638) |
| |
| As of hbase-2.0.0, our protobuf usage gets a little more involved. HBase |
| core protobuf references are offset so as to refer to a private, |
| bundled protobuf. Core stops referring to protobuf |
| classes at com.google.protobuf.* and instead references protobuf at |
| the HBase-specific offset |
| org.apache.hadoop.hbase.shaded.com.google.protobuf.*. We do this indirection |
| so hbase core can evolve its protobuf version independent of whatever our |
| dependencies rely on. For instance, HDFS serializes using protobuf. |
| HDFS is on our CLASSPATH. Without the above described indirection, our |
| protobuf versions would have to align. HBase would be stuck |
| on the HDFS protobuf version until HDFS decided to upgrade. The HBase |
| and HDFS protobuf versions would be tied. |
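
The offset described above is only a change of package prefix, which a short Java sketch can make concrete. This is an illustration of the name mapping, not of how the shading itself is performed at build time. The class `ShadedNameSketch` and its `relocate` helper are hypothetical names invented for this example; only the two package prefixes come from the text.

```java
public class ShadedNameSketch {
    // Upstream protobuf package, as seen by HDFS and other dependencies.
    static final String UPSTREAM_PREFIX = "com.google.protobuf.";
    // HBase-specific offset used by hbase-2.0.0 core, as named in the text above.
    static final String HBASE2_SHADED_PREFIX =
        "org.apache.hadoop.hbase.shaded.com.google.protobuf.";

    /** Maps an upstream protobuf class name to its relocated name; other names pass through. */
    static String relocate(String className) {
        if (className.startsWith(UPSTREAM_PREFIX)) {
            return HBASE2_SHADED_PREFIX + className.substring(UPSTREAM_PREFIX.length());
        }
        return className;
    }

    public static void main(String[] args) {
        // A protobuf class is moved under the HBase offset.
        System.out.println(relocate("com.google.protobuf.Message"));
        // A non-protobuf class name is left untouched.
        System.out.println(relocate("org.apache.hadoop.hdfs.DFSClient"));
    }
}
```

Because the two prefixes never collide, both protobuf versions can sit on the same CLASSPATH without either shadowing the other.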
| |
| We had to move on from protobuf-2.5.0 because we needed facilities |
| added in protobuf-3.1.0, in particular being able to save on |
| copies and to avoid bringing protobufs onheap for |
| serialization/deserialization. |
| |
| In hbase-2.0.0, we introduced a new module, `hbase-protocol-shaded`, |
| which contains everything to do with protobuf and its subsequent |
| relocation/shading. This module is in essence a copy of much of the old |
| `hbase-protocol` but with an extra shading/relocation step. |
| Core was moved to depend on this new module. |
| |
| That said, a complication arises around Coprocessor Endpoints (CPEPs). |
| CPEPs depend on public HBase APIs that reference protobuf classes at |
| `com.google.protobuf.*` explicitly. For example, in our Table Interface |
| we have the below as the means by which you obtain a CPEP Service |
| to make invocations against: |
| |
| [source,java] |
| ---- |
| ... |
| <T extends com.google.protobuf.Service,R> Map<byte[],R> coprocessorService( |
|     Class<T> service, byte[] startKey, byte[] endKey, |
|     org.apache.hadoop.hbase.client.coprocessor.Batch.Call<T,R> callable) |
|     throws com.google.protobuf.ServiceException, Throwable |
| ---- |
| |
| Existing CPEPs will have made reference to core HBase protobufs |
| specifying ServerNames or carrying Mutations. |
| So as to continue being able to service CPEPs and their references |
| to `com.google.protobuf.*` across the upgrade to hbase-2.0.0 and beyond, |
| HBase needs to be able to deal with both |
| `com.google.protobuf.*` references and its internal offset |
| `org.apache.hadoop.hbase.shaded.com.google.protobuf.*` protobufs. |
| |
| The `hbase-protocol-shaded` module hosts all |
| protobufs used by HBase core. |
| |
| But for the vestigial CPEP references to the (non-shaded) content of |
| `hbase-protocol`, we keep around most of this module going forward |
| just so it is available to CPEPs. Retaining most of `hbase-protocol` |
| makes for overlapping, 'duplicated' proto instances where some exist as |
| non-shaded/non-relocated here in their old module |
| location but also in the new location, shaded under |
| `hbase-protocol-shaded`. In other words, there is an instance |
| of the generated protobuf class |
| `org.apache.hadoop.hbase.protobuf.generated.ServerName` |
| in hbase-protocol and another generated instance that is the same in all |
| regards except its protobuf references are to the internal shaded |
| version at `org.apache.hadoop.hbase.shaded.protobuf.generated.ServerName` |
| (note the 'shaded' addition in the middle of the package name). |
| |
| If you extend a proto in `hbase-protocol-shaded` for internal use, |
| consider extending it also in |
| `hbase-protocol` (and regenerating). |
| |
| Going forward, we will provide a new module of common types for use |
| by CPEPs that will have the same guarantees against change as does our |
| public API. TODO. |
| |
| === protobuf changes for hbase-3.0.0 (HBASE-23797) |
| Since Hadoop (starting from 3.3.x) also shades protobuf and bumps the version to |
| 3.x, there is no reason for us to stay on protobuf 2.5.0 any more. |
| |
| In HBase 3.0.0, the hbase-protocol module has been purged. CPEP |
| implementations should use the protos in the hbase-protocol-shaded module and |
| the shaded protobuf in hbase-thirdparty. In general, we will keep |
| the protobuf version compatible for a whole major release unless there are |
| critical problems, for example a critical CVE in protobuf. |
| |
| Add this dependency to your pom: |
| [source,xml] |
| ---- |
| <dependency> |
|   <groupId>org.apache.hbase.thirdparty</groupId> |
|   <artifactId>hbase-shaded-protobuf</artifactId> |
|   <!-- use the version that your target hbase cluster uses --> |
|   <version>${hbase-thirdparty.version}</version> |
|   <scope>provided</scope> |
| </dependency> |
| ---- |
| |
| Typically you also need to add this plugin to your pom so that your |
| generated protobuf code uses the shaded and relocated protobuf version |
| in hbase-thirdparty. |
| [source,xml] |
| ---- |
| <plugin> |
|   <groupId>com.google.code.maven-replacer-plugin</groupId> |
|   <artifactId>replacer</artifactId> |
|   <version>1.5.3</version> |
|   <executions> |
|     <execution> |
|       <phase>process-sources</phase> |
|       <goals> |
|         <goal>replace</goal> |
|       </goals> |
|     </execution> |
|   </executions> |
|   <configuration> |
|     <basedir>${basedir}/target/generated-sources/</basedir> |
|     <includes> |
|       <include>**/*.java</include> |
|     </includes> |
|     <!-- Ignore errors when missing files, because it means this build |
|          was run with -Dprotoc.skip and there is no -Dreplacer.skip --> |
|     <ignoreErrors>true</ignoreErrors> |
|     <replacements> |
|       <replacement> |
|         <token>([^\.])com.google.protobuf</token> |
|         <value>$1org.apache.hbase.thirdparty.com.google.protobuf</value> |
|       </replacement> |
|       <replacement> |
|         <token>(public)(\W+static)?(\W+final)?(\W+class)</token> |
|         <value>@javax.annotation.Generated("proto") $1$2$3$4</value> |
|       </replacement> |
|       <!-- replacer doesn't support anchoring or negative lookbehind --> |
|       <replacement> |
|         <token>(@javax.annotation.Generated\("proto"\) ){2}</token> |
|         <value>$1</value> |
|       </replacement> |
|     </replacements> |
|   </configuration> |
| </plugin> |
| ---- |
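
To get a feel for what the three `<replacement>` rules in the configuration above do, the following Java sketch applies the same token/value pairs to a small, made-up piece of generated source. It assumes the plugin applies the rules in the listed order with Java regular-expression semantics. The class `ReplacerSketch`, its `rewrite` helper, and the `DemoProtos` input are hypothetical and are not part of HBase or of the plugin.

```java
import java.util.regex.Pattern;

public class ReplacerSketch {
    // Token/value pairs copied from the plugin configuration above.
    private static final String[][] RULES = {
        // Relocate protobuf references that are not already behind another package segment.
        {"([^\\.])com.google.protobuf",
         "$1org.apache.hbase.thirdparty.com.google.protobuf"},
        // Mark public classes as generated code.
        {"(public)(\\W+static)?(\\W+final)?(\\W+class)",
         "@javax.annotation.Generated(\"proto\") $1$2$3$4"},
        // Collapse a doubled annotation left behind by a repeated run.
        {"(@javax.annotation.Generated\\(\"proto\"\\) ){2}", "$1"},
    };

    /** Applies every rule in order to the given source text (assumed plugin behaviour). */
    static String rewrite(String source) {
        String out = source;
        for (String[] rule : RULES) {
            out = Pattern.compile(rule[0]).matcher(out).replaceAll(rule[1]);
        }
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical stand-in for a protoc-generated file.
        String generated =
            "package demo;\n"
            + "public final class DemoProtos {\n"
            + "  private com.google.protobuf.ByteString payload;\n"
            + "}\n";
        String once = rewrite(generated);
        System.out.println(once);
        // Running the rules a second time should change nothing.
        System.out.println(once.equals(rewrite(once)));
    }
}
```

The third rule is what keeps a repeated run from stacking annotations: the second rule adds the annotation again on every pass, and the third collapses the resulting pair back to one.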
| |
| The hbase-examples module has some examples under the |
| `org.apache.hadoop.hbase.coprocessor.example` package. See |
| `BulkDeleteEndpoint` and `BulkDelete.proto` for more details, and |
| check the `pom.xml` of the hbase-examples module to see how to make use of the above |
| plugin. |