Thrift Remote Procedure Call

This document describes the high-level message exchange between the Thrift RPC client and server. See [thrift-binary-protocol.md] and [thrift-compact-protocol.md] for a description of how the exchanges are encoded on the wire.

In addition, this document compares the binary protocol with the compact protocol. Finally, it describes the framed vs. unframed transport.

The information here is mostly based on the Java implementation in the Apache thrift library (version 0.9.1 and 0.9.3). Other implementation, however, should behave the same.

For background on Thrift see the Thrift whitepaper (pdf).

Contents

  • Thrift Message exchange for Remote Procedure Call
    • Message
    • Request struct
    • Response struct
  • Protocol considerations
    • Comparing binary and compact protocol
    • Compatibility
    • Framed vs unframed transport

Thrift Remote Procedure Call Message exchange

Both the binary protocol and the compact protocol assume a transport layer that exposes a bi-directional byte stream, for example a TCP socket. Both use the following exchange:

  1. Client sends a Message (type Call or Oneway). The TMessage contains some metadata and the name of the method to invoke.
  2. Client sends method arguments (a struct defined by the generate code).
  3. Server sends a Message (type Reply or Exception) to start the response.
  4. Server sends a struct containing the method result or exception.

The pattern is a simple half duplex protocol where the parties alternate in sending a Message followed by a struct. What these are is described below.

Although the standard Apache Thrift Java clients do not support pipelining (sending multiple requests without waiting for an response), the standard Apache Thrift Java servers do support it.

Message

A Message contains:

  • Name, a string (can be empty).
  • Message type, a message types, one of Call, Reply, Exception and Oneway.
  • Sequence id, a signed int32 integer.

The sequence id is a simple message id assigned by the client. The server will use the same sequence id in the message of the response. The client uses this number to detect out of order responses. Each client has an int32 field which is increased for each message. The sequence id simply wraps around when it overflows.

The name indicates the service method name to invoke. The server copies the name in the response message.

When the multiplexed protocol is used, the name contains the service name, a colon : and the method name. The multiplexed protocol is not compatible with other protocols.

The message type indicates what kind of message is sent. Clients send requests with TMessages of type Call or Oneway (step 1 in the protocol exchange). Servers send responses with messages of type Exception or Reply (step 3).

Type Reply is used when the service method completes normally. That is, it returns a value or it throws one of the exceptions defined in the Thrift IDL file.

Type Exception is used for other exceptions. That is: when the service method throws an exception that is not declared in the Thrift IDL file, or some other part of the Thrift stack throws an exception. For example when the server could not encode or decode a message or struct.

In the Java implementation (0.9.3) there is different behavior for the synchronous and asynchronous server. In the async server all exceptions are sent as a TApplicationException (see ‘Response struct’ below). In the synchronous Java implementation only (undeclared) exceptions that extend TException are send as a TApplicationException. Unchecked exceptions lead to an immediate close of the connection.

Type Oneway is only used starting from Apache Thrift 0.9.3. Earlier versions do not send TMessages of type Oneway, even for service methods defined with the oneway modifier.

When the client sends a request with type Oneway, the server must not send a response (steps 3 and 4 are skipped). Note that the Thrift IDL enforces a return type of void and does not allow exceptions for oneway services.

Request struct

The struct that follows the message of type Call or Oneway contains the arguments of the service method. The argument ids correspond to the field ids. The name of the struct is the name of the method with _args appended. For methods without arguments an struct is sent without fields.

Response struct

The struct that follows the message of type Reply are structs in which exactly 1 of the following fields is encoded:

  • A field with name success and id 0, used in case the method completed normally.
  • An exception field, name and id are as defined in the throws clause in the Thrift IDL's service method definition.

When the message is of type Exception the struct is encoded as if it was declared by the following IDL:

exception TApplicationException {
  1: string message,
  2: i32 type
}

The following exception types are defined in the java implementation (0.9.3):

  • unknown: 0, used in case the type from the peer is unknown.
  • unknown method: 1, used in case the method requested by the client is unknown by the server.
  • invalid message type: 2, no usage was found.
  • wrong method name: 3, no usage was found.
  • bad sequence id: 4, used internally by the client to indicate a wrong sequence id in the response.
  • missing result: 5, used internally by the client to indicate a response without any field (result nor exception).
  • internal error: 6, used when the server throws an exception that is not declared in the Thrift IDL file.
  • protocol error: 7, used when something goes wrong during decoding. For example when a list is too long or a required field is missing.
  • invalid transform: 8, no usage was found.
  • invalid protocol: 9, no usage was found.
  • unsupported client type: 10, no usage was found.

Protocol considerations

Comparing binary and compact protocol

The binary protocol is fairly simple and therefore easy to process. The compact protocol needs less bytes to send the same data at the cost of additional processing. As bandwidth is usually the bottleneck, the compact protocol is almost always slightly faster.

Compatibility

A server could automatically determine whether a client talks the binary protocol or the compact protocol by investigating the first byte. If the value is 1000 0001 or 0000 0000 (assuming a name shorter than ±16 MB) it is the binary protocol. When the value is 1000 0010 it is talking the compact protocol.

Framed vs. unframed transport

The first thrift binary wire format was unframed. This means that information is sent out in a single stream of bytes. With unframed transport the (generated) processors will read directly from the socket (though Apache Thrift does try to grab all available bytes from the socket in a buffer when it can).

Later, Thrift introduced the framed transport.

With framed transport the full request and response (the TMessage and the following struct) are first written to a buffer. Then when the struct is complete (transport method flush is hijacked for this), the length of the buffer is written to the socket first, followed by the buffered bytes. The combination is called a frame. On the receiver side the complete frame is first read in a buffer before the message is passed to a processor.

The length prefix is a 4 byte signed int, send in network (big endian) order. The following must be true: 0 <= length <= 16384000 (16M).

Framed transport was introduced to ease the implementation of async processors. An async processor is only invoked when all data is received. Unfortunately, framed transport is not ideal for large messages as the entire frame stays in memory until the message has been processed. In addition, the java implementation merges the incoming data to a single, growing byte array. Every time the byte array is full it needs to be copied to a new larger byte array.

Framed and unframed transports are not compatible with each other.