Notices For Client Library Development

Currently, C++ and Java clients are supported for accessing a Pegasus cluster. Refer to the C++ documentation or the Java documentation for details.

Here are some notes on the client library that may help when developing bindings for other languages.

message protocol

request to server

The format of a message sent to a single server of a Pegasus cluster is:

(A special header of 48 bytes) + (body)

header

The 48-byte header consists of:

bytes    type         comments
0~3      “THFT”       header_type
4~7      32bit int    header version
8~11     32bit int    header_length
12~15    32bit int    header_crc32
16~19    32bit int    body_length
20~23    32bit int    body_crc32
24~27    32bit int    app_id (rDSN related concept)
28~31    32bit int    partition_index (rDSN related concept)
32~35    32bit int    client_timeout
36~39    32bit int    thread_hash (rDSN related concept)
40~47    64bit long   partition_hash (rDSN related concept)

Some notes on the above “header”:

  • All ints and longs in the header are in network byte order.
  • When sending a request to a meta server, app_id and partition_index should be set to 0; when sending a request to a replica server, these fields should be set to the values of the target replica.
  • A Pegasus server may use client_timeout for optimization. For example, a server may simply discard a request that has already expired by the time the server receives it.
  • thread_hash = app_id * 7919 + partition_index. The server uses it to decide in which thread to queue the request.
  • partition_hash should be set to 0; the field is only useful to RPC clients of the rDSN framework.
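The header layout above can be sketched with Python's struct module. This is a minimal sketch and not the official client code: the function names are hypothetical, header_length is assumed to be the fixed header size of 48, and header_version, header_crc32, body_crc32 and client_timeout are left as caller-supplied placeholders because this page does not say how they are computed or in which unit the timeout is expressed.

```python
import struct

# ">" selects network (big-endian) order with no padding:
# 4-byte tag, nine 32-bit ints, one 64-bit long.
HEADER_FORMAT = ">4sIIIIIIIIIq"
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)  # 4 + 9 * 4 + 8 bytes


def thread_hash(app_id, partition_index):
    """thread_hash as described above, truncated to fit a 32-bit field."""
    return (app_id * 7919 + partition_index) & 0xFFFFFFFF


def pack_header(body_length, app_id=0, partition_index=0, client_timeout=0,
                header_version=0, header_crc32=0, body_crc32=0):
    """Pack the 48-byte request header.

    app_id and partition_index default to 0, which is what a request
    to a meta server uses. The version and CRC defaults are placeholders
    (assumptions), not values confirmed by the protocol description.
    """
    return struct.pack(
        HEADER_FORMAT,
        b"THFT",                               # header_type
        header_version,
        HEADER_SIZE,                           # header_length (assumed to be 48)
        header_crc32,
        body_length,
        body_crc32,
        app_id,
        partition_index,
        client_timeout,
        thread_hash(app_id, partition_index),
        0,                                     # partition_hash is always 0
    )
```

A full request would then be pack_header(len(body), ...) followed by the body bytes described in the next section.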

body

The body is a standard Thrift struct in binary protocol:

TMessageBegin + args + TMessageEnd.

You should write a Thrift “TMessage” in TMessageBegin. The structure of TMessage is:

  • name: the RPC name (please refer to Java client for detail)
  • type: TMessage.CALL
  • seqid: the sequence ID, an integer
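As an illustration of the body layout, the sketch below encodes TMessageBegin by hand in the strict form of the Thrift binary protocol: a 32-bit word combining the version 0x80010000 with the message type, then the length-prefixed name, then the seqid. It assumes the server expects the strict encoding, which this page does not state. The function names and the RPC name "RPC_EXAMPLE" are hypothetical; the real names should be taken from the Java client. The serialized args struct is supplied by the caller, and TMessageEnd writes no bytes in the binary protocol.

```python
import struct

VERSION_1 = 0x80010000  # version marker of the strict Thrift binary protocol
T_CALL = 1              # Thrift message type for a call


def write_message_begin(name, message_type, seqid):
    """Encode TMessageBegin: version|type, length-prefixed name, seqid."""
    name_bytes = name.encode("utf-8")
    return (
        struct.pack(">I", VERSION_1 | message_type)
        + struct.pack(">i", len(name_bytes))
        + name_bytes
        + struct.pack(">i", seqid)
    )


def build_body(rpc_name, seqid, args_struct_bytes):
    """Concatenate TMessageBegin and the already serialized args struct.

    TMessageEnd is a no-op in the binary protocol, so nothing is appended.
    """
    return write_message_begin(rpc_name, T_CALL, seqid) + args_struct_bytes


# An empty Thrift struct is just the one-byte field-stop marker.
EMPTY_STRUCT = b"\x00"
example_body = build_body("RPC_EXAMPLE", 1, EMPTY_STRUCT)
```

In practice a generated Thrift library would produce these bytes; writing them out by hand is mainly useful for checking a new binding against a packet capture.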

response from server

The response received from a Pegasus server has the following format:

total_response_length(4 bytes) + error_code_thrift_struct + response_body

Some notes on the above response:

  • error_code_thrift_struct is an error_code struct in Thrift binary protocol. It usually indicates an error in service status rather than the result of a specific RPC call. For example, for a meta server, this error may indicate that “the meta server is not leader”; for a read/write request to a replica server, it may indicate that the replica server is not the primary or doesn't serve the partition.
  • response_body is a standard thrift rpc response of the rpc call, with the format as follows:
    • TMessageBegin: rpc_name, TMessage.T_REPLY, seqid_integer
    • response_args
    • TMessageEnd
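The response framing can be sketched as follows. Two points are assumptions rather than facts from this page: the length prefix is read in network byte order like the request header, and whether total_response_length counts its own 4 bytes is unknown, so the sketch takes that as an explicit argument. Decoding error_code_thrift_struct is left out because it is an rDSN-specific structure. The helper names are hypothetical, and read_message_begin assumes the strict Thrift binary encoding.

```python
import io
import struct

VERSION_MASK = 0xFFFF0000
VERSION_1 = 0x80010000
T_REPLY = 2  # Thrift message type for a reply


def read_exact(stream, n):
    """Read exactly n bytes from a buffered binary stream or fail."""
    data = stream.read(n)
    if len(data) != n:
        raise EOFError("stream ended after %d of %d bytes" % (len(data), n))
    return data


def read_frame(stream, *, length_includes_prefix):
    """Read the 4-byte length, then return the rest of the response."""
    (total,) = struct.unpack(">I", read_exact(stream, 4))
    remaining = total - 4 if length_includes_prefix else total
    return read_exact(stream, remaining)


def read_message_begin(buf, offset=0):
    """Decode TMessageBegin at offset; return (name, type, seqid, next_offset)."""
    (word,) = struct.unpack_from(">I", buf, offset)
    if (word & VERSION_MASK) != VERSION_1:
        raise ValueError("not a strict binary protocol message")
    message_type = word & 0xFF
    (name_len,) = struct.unpack_from(">i", buf, offset + 4)
    start = offset + 8
    name = buf[start:start + name_len].decode("utf-8")
    (seqid,) = struct.unpack_from(">i", buf, start + name_len)
    return name, message_type, seqid, start + name_len + 4
```

A caller would first decode the error_code struct from the frame, then pass the offset where it ends to read_message_begin and check that the returned type is T_REPLY and that the seqid matches the request.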

write/read request process

Refer to TableHandler.java for the detailed process of write/read request RPCs.

how to generate code in thrift

There are 3 IDL files for the RPC client:

  • base.thrift: a placeholder for rDSN-specific structures (blob, error_code, task_code, RPC_address, gpid). You may use Thrift to generate a skeleton and implement the details yourself.
  • replication.thrift: messages and RPCs used to communicate with the meta server. Using the generated code is fine.
  • rrdb.thrift: messages and RPCs used to communicate with the replica server. Using the generated code is fine.

For historical reasons, the RPC names defined in the IDL are not currently recognized by the server. A proper name must be set manually; please refer to operators for details. You may also need to refer to base for how to implement serialization for rDSN-specific structures.