name: Opentracing support
about: Adopt industry standard distributed tracing solution
title: 'Opentracing support'
labels: rfc, discussion
assignees: ''
Adopt industry-standard, vendor-neutral APIs and instrumentation for distributed tracing.
Collecting profiling data is very tricky at the moment. Developers have to run generic profiling tools which are not aware of CouchDB specifics. This makes it hard to do the performance optimization work. We need a tool which would allow us to get profiling data from specific points in the codebase. This means code instrumentation.
The OpenTracing project (https://opentracing.io/) provides vendor-neutral APIs and instrumentation for distributed tracing. In Erlang it is implemented by one of the following libraries:
- otters
- otter
- the otter version donated to the opentracing project
- passage

The opentracing philosophy is founded on three pillars:
The main addition is to include one of the above-mentioned libraries and add instrumentation points to the codebase. In the initial implementation, a new span would be started on every HTTP request. The following HTTP headers would be used to link the tracing span with application-specific traces.
More information about the use of these headers can be found here. The OpenTracing specification has a number of conventions which would be good to follow.
In a nutshell the idea is:
- Call span_start from chttpd:handle_request_int/1.
- Pass the span in the #httpd{} record.
- Pass trace_id and parent_span_id through the stack (extend records if needed).

The key words “MUST”, “MUST NOT”, “REQUIRED”, “SHALL”, “SHALL NOT”, “SHOULD”, “SHOULD NOT”, “RECOMMENDED”, “MAY”, and “OPTIONAL” in this document are to be interpreted as described in RFC 2119.
In the OpenTracing data model, the state of a span includes:
- Span Tags
- Span Logs
- a SpanContext
- References to zero or more causally-related Spans
As mentioned earlier, there are two flavours of libraries. Neither is perfect for all use cases. The biggest differences between otters and passage are:
feature | otters | passage |
---|---|---|
reporting protocol | http | udp |
filtering | custom DSL | sampling callback module |
reporter | zipkin only | jaeger or plugin |
functional API | + | + |
process dictionary | + | + |
process based span storage | + | - |
send event in batches | + | - |
sender overload detection | - | + |
report batches based on | timer | spans of single operation |
design for performance | + | - |
design for robustness at scale | - | + |
counters | + | - |
sampling based on duration | + | - |
number of extra dependencies | 1 | 3 |
To allow the tracing library to be replaced in the future, it would be desirable to create an interface module, couch_trace. The otters library would be used for the first iteration.
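A minimal sketch of what such an interface module could look like is given below. The module name couch_trace_sketch, its function names, and the map-based span representation are all assumptions made for illustration. A real couch_trace would delegate each call to the chosen library (otters in the first iteration) rather than build the span itself.

```erlang
%% Hypothetical sketch of a tracing facade; none of these names come from
%% otters or passage. Callers only see this API, so the backing library
%% can be swapped without touching the instrumentation points.
-module(couch_trace_sketch).
-export([start_span/3, tag/3, log/2, finish_span/1]).

%% Start a span. TraceId and ParentId link it to the rest of the trace;
%% ParentId is 'undefined' for a root span.
start_span(Name, TraceId, ParentId) ->
    #{name => Name,
      trace_id => TraceId,
      parent_id => ParentId,
      start => erlang:monotonic_time(microsecond),
      tags => #{},
      logs => []}.

%% Attach a key/value tag to the span.
tag(#{tags := Tags} = Span, Key, Value) ->
    Span#{tags := Tags#{Key => Value}}.

%% Attach a structured log entry (a map of fields) to the span.
log(#{logs := Logs} = Span, Fields) when is_map(Fields) ->
    Span#{logs := [Fields | Logs]}.

%% Finish the span and record its duration in microseconds.
finish_span(#{start := Start} = Span) ->
    Span#{duration => erlang:monotonic_time(microsecond) - Start}.
```

Because instrumentation points would depend only on these four functions, replacing the library later would be limited to this one module.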
The otters library uses the application environment to store its configuration. It also has a facility to compile a filtering DSL into a beam module. The filtering DSL looks like the following: <name>([<condition>]) -> <action>. The safety of the DSL compiler is unknown, so modifying tracing settings via configuration over HTTP would not be possible. The otter-related config section tracing.filters would be protected by BLACKLIST_CONFIG_SECTIONS. Tracing could only be configured from remsh or by editing the ini file. The configuration for otter filters would be stored in couch_config as follows:

    [tracing.filters]
    <name> = ([<condition>]) -> <action>.
The following headers on the request would be supported. SamplingState would be ignored.

The following headers on the response would be supported.
The conventions below are based on the conventions from opentracing. All tags are optional, since they are only a recommendation from opentracing to give hints to visualization and filtering tools.
Span tag name | Type | Notes and examples |
---|---|---|
component | string | couchdb.<component> (e.g. couchdb.chttpd, couchdb.fabric) |
db.instance | string | for fdb-layer would be fdb connection string |
db.type | string | for fdb-layer would be fdb |
error | bool | true if operation failed |
http.method | string | HTTP method of the request for the associated Span |
http.status_code | integer | HTTP response status code for the associated Span |
http.url | string | sanitized URL of the request in URI format |
span.kind | string | Either client or server (RPC roles). |
user | string | Authenticated user name |
db.name | string | Name of the accessed database |
db.shard | string | Name of the accessed shard |
nonce | string | Nonce used for the request |
Span log field name | Type | Notes and examples |
---|---|---|
error.kind | string | The “kind” of an error (error, exit, throw) |
message | string | human-readable, one-line message |
stack | string | A stack trace (\n between lines) |
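As an illustration of the log fields above, the sketch below builds them from a caught exception. The module and function names are hypothetical. It also assumes that field values are binaries and that Erlang's ~w formatting is acceptable for the one-line message.

```erlang
%% Hypothetical helper: turn a caught exception into span log fields
%% named after the table above (error.kind, message, stack).
-module(span_log_sketch).
-export([error_fields/3]).

error_fields(Class, Reason, Stack)
  when Class =:= error; Class =:= exit; Class =:= throw ->
    %% ~w never inserts line breaks, so each frame stays on one line.
    Frames = [io_lib:format("~w", [Frame]) || Frame <- Stack],
    #{<<"error.kind">> => atom_to_binary(Class, utf8),
      <<"message">> => iolist_to_binary(io_lib:format("~w", [Reason])),
      %% Frames are separated by \n, as the stack field requires.
      <<"stack">> => iolist_to_binary(lists:join("\n", Frames))}.
```

The function would typically be called from a catch Class:Reason:Stack clause at an instrumentation point, and the resulting map attached to the current span as a log entry.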
CouchDB has a complex architecture. Request handling crosses layer and component boundaries. Every component or layer would start a new span. It MUST specify its parent span in order for visualization tools to work. The value of TraceId MUST be included in every span start. The values of TraceId and SpanId MAY be passed to FDB when foundationdb#2085 is resolved.
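The parent and trace linkage described above can be sketched as follows. The module name, the map keys, and the use of random 64-bit integers as identifiers are assumptions made for this example only; they are not taken from otters or passage.

```erlang
%% Hypothetical sketch of span context propagation between layers.
-module(span_ctx_sketch).
-export([root/0, child/1]).

%% Context for the first span of a request: no parent yet.
root() ->
    #{trace_id => new_id(),
      span_id => new_id(),
      parent_span_id => undefined}.

%% Context for a span started by the next component or layer. It keeps
%% the TraceId and records the caller's span as its parent.
child(#{trace_id := TraceId, span_id := ParentSpanId}) ->
    #{trace_id => TraceId,
      span_id => new_id(),
      parent_span_id => ParentSpanId}.

%% Random 64-bit identifier (the size is an assumption of this sketch).
new_id() ->
    <<Id:64>> = crypto:strong_rand_bytes(8),
    Id.
```

Each layer would receive the context of its caller, derive its own with child/1, and pass that one further down the stack, so every span carries the same TraceId and a reference to its parent.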
- Rework otters_conn_zipkin:send_buffer/0 to make it more robust
- Switch otters_conn_zipkin from thrift to gRPC
Specifically for the otters library, there are the following concerns:
Support for the following headers would be added:
N/A
The security risk of injecting a malicious payload into the ini config is mitigated by placing the section into BLACKLIST_CONFIG_SECTIONS.