This is an advanced topic. Before reading further, you may want to check whether an instrumentation library for your platform [already exists]({{ site.github.url }}/pages/existing_instrumentations). If not and if you want to take on creating an instrumentation library, first things first; jump on Zipkin Gitter chat channel and let us know. We'll be extremely happy to help you along the way. {: .message}
To instrument a library, you'll need to understand and create the following elements:
Alright, ready? Here we go.
First, there are a core set of structures that we need:
Annotation
An Annotation is used to record an occurance in time. There's a set of core annotations used to define the beginning and end of a request:
cs
will be combination of network latency and clock jitter.ss
will be the amount of time it took the server to process the request.Other annotations can be recorded during the request's lifetime in order to provide further insight. For instance adding an annotation when a server begins and ends an expensive computation may provide insight into how much time is being spent pre and post processing the request versus how much time is spent running the calculation.
BinaryAnnotation
Binary annotations do not have a time component. They are meant to provide extra information about the RPC. For instance when calling an HTTP service, providing the URI of the call will help with later analysis of requests coming into the service.
Span
A set of Annotations and BinaryAnnotations that correspond to a particular RPC. Spans contain identifying information such as traceId, spandId, parentId, and RPC name.
Trace
A set of spans that share a single root span. Traces are built by collecting all Spans that share a traceId. The spans are then arranged in a tree based on spanId and parentId thus providing an overview of the path a request takes through the system.
In order to reassemble a set of spans into a full trace three pieces of information are required. These are all 64 bits long.
Trace Id
The overall ID of the trace. Every span in a trace will share this ID.
Span Id
The ID for a particular span. This may or may not be the same as the trace id.
Parent Id
This is an optional ID that will only be present on child spans. That is the span without a parent id is considered the root of the trace.
Let's walk through how Spans are identified.
For the initial receipt of a request no trace information exists. So we create a trace id and span id. These should be 64 random bits. The span id can be the same as the trace id.
If the request already has trace information attached to it, the service should use that information as server receive and server send events are part of the same span as the client send and client receive events
If the service calls out to a downstream service a new span is created as a child of the former span. It is identified by the same trace id, a new span id, and the parent id is set to the span id of the previous span. The new span id should be 64 random bits.
Note This process must be repeated if the service makes multiple downstream calls. That is each subsequent span will have the same trace id and parent id, but a new and different span id.
Trace information needs to be passed between upstream and downstream services in order to reassemble a complete trace. Five pieces of information are required:
Check here for the format
Finagle provides mechanisms for passing this information with HTTP and Thrift requests. Other protocols will need to be augmented with the information for tracing to be effective.
HTTP Tracing
HTTP headers are used to pass along trace information.
The B3 portion of the header is so named for the original name of Zipkin: BigBrotherBird.
Ids are encoded as hex strings:
Thrift Tracing
Finagle clients and servers negotate whether they can handle extra information in the header of the thrift message when a connection is established. Once negotiated trace data is packed into the front of each thrift message.