{"name":"Htrace","tagline":"","body":"HTrace\r\n======\r\nHTrace is a tracing framework intended for use with distributed systems written in Java. \r\n\r\nThe project is hosted at http://github.com/cloudera/htrace. \r\nThe project is available in Maven Central with groupId: org.htrace and name: htrace. \r\n(It was formerly published under groupId: org.cloudera.htrace and name: htrace.) \r\n\r\nAPI\r\n---\r\nUsing HTrace requires adding some instrumentation to your application. \r\nBefore we get into that, we need to review some terminology. HTrace\r\nborrows [Dapper's](http://research.google.com/pubs/pub36356.html)\r\nterminology. \r\n \r\n<b>Span:</b> The basic unit of work. For example, sending an RPC is a\r\nnew span, as is sending a response to an RPC. \r\nSpans are identified by a unique 64-bit ID for the span and another\r\n64-bit ID for the trace the span is a part of. Spans also have other\r\ndata, such as descriptions, key-value annotations, the ID of the span\r\nthat caused them, and process IDs (normally an IP address). \r\n<br>\r\nSpans are started and stopped, and they keep track of their timing\r\ninformation. Once you create a span, you must stop it at some point\r\nin the future. \r\n \r\n<b>Trace:</b> A set of spans forming a tree-like structure. For\r\nexample, if you are running a distributed big-data store, a trace\r\nmight be formed by a put request. \r\n\r\nTo instrument your system you must: \r\n<br>\r\n<b>1. Attach additional information to your RPCs.</b> \r\nTo link spans into a trace, HTrace needs to know about the causal\r\nrelationships between them. The only information you need to add to\r\nyour RPCs is two 64-bit longs. If tracing is enabled (`Trace.isTracing()`\r\nreturns true) when you send an RPC, attach the ID of the current span\r\nand the ID of the current trace to the message. \r\nOn the receiving end of the RPC, check whether the message carries this\r\nadditional tracing information. 
If it does, start a new span\r\nwith the information given (more on that in a bit). \r\n<br>\r\n<b>2. Wrap your thread changes.</b> \r\nHTrace stores span information in Java `ThreadLocal`s, which causes\r\nthe trace to be \"lost\" on thread changes. The only way to prevent\r\nthis is to \"wrap\" your thread changes. For example, if your code looks\r\nlike this:\r\n\r\n````java\r\n Thread t1 = new Thread(new MyRunnable());\r\n ... \r\n````\r\n\r\nJust change it to look like this: \r\n\r\n````java\r\n Thread t1 = new Thread(Trace.wrap(new MyRunnable()));\r\n````\r\n\r\nThat's it! `Trace.wrap()` takes a single argument (a `Runnable` or a\r\n`Callable`) and, if the current thread is part of a trace, returns a\r\nwrapped version of the argument. The wrapped `Runnable` or `Callable`\r\nremembers the span that was current when it was created and, when it runs,\r\nstarts a new span in the new thread as a child of that span. There may be\r\nsituations in which a simple `Trace.wrap()` does not suffice. In these cases,\r\nkeep a reference to the \"parent span\" (the span that was current before the\r\nthread change) and, once you're in the new thread, start a new span that\r\nis the \"child\" of the parent span you stored. \r\n<br>\r\nFor example: \r\n<br>\r\nSay you have some object representing a \"put\" operation. When the\r\nclient does a \"put,\" the put is first added to a list so another\r\nthread can batch the puts together. In this situation, you\r\nmight add a field to the `Put` class that stores the\r\ncurrent span at the time the put was created. Then, when the put is\r\npulled out of the list to be processed, you can start a new span as\r\nthe child of the span stored in the `Put`. \r\n<br>\r\n<b>3. Add custom spans and annotations.</b> \r\nOnce you've augmented your RPCs and wrapped the necessary thread\r\nchanges, you can add more spans and annotations wherever you want. 
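Before adding custom spans, it may help to see what augmenting RPCs and wrapping thread changes amount to. The sketch below is not HTrace code: `MiniTrace`, `SpanRecord` and `RpcMessage` are hypothetical stand-ins, and the sketch assumes only what the steps above describe: the current span lives in a `ThreadLocal`, an RPC needs two extra longs, and a wrapped task starts a child of the span that was current when it was wrapped.

````java
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical span record (NOT the HTrace Span class): trace ID, span ID,
// and the parent span ID, where 0 means the span is a root.
final class SpanRecord {
    final long traceId;
    final long spanId;
    final long parentId;

    SpanRecord(long traceId, long spanId, long parentId) {
        this.traceId = traceId;
        this.spanId = spanId;
        this.parentId = parentId;
    }
}

// Hypothetical RPC message: step 1 only needs two extra longs.
final class RpcMessage {
    boolean hasTraceInfo;
    long traceId;
    long parentSpanId;
}

// Hypothetical tracer that keeps the current span in a ThreadLocal.
final class MiniTrace {
    private static final ThreadLocal<SpanRecord> CURRENT = new ThreadLocal<>();

    static SpanRecord current() { return CURRENT.get(); }

    static void clear() { CURRENT.remove(); }

    // Random non-zero 64-bit ID (0 is reserved for no parent).
    static long newId() {
        long id;
        do {
            id = ThreadLocalRandom.current().nextLong();
        } while (id == 0);
        return id;
    }

    // Starts a brand-new trace in the calling thread.
    static SpanRecord startRoot() {
        SpanRecord root = new SpanRecord(newId(), newId(), 0);
        CURRENT.set(root);
        return root;
    }

    // Step 1, sending side: attach the current trace ID and span ID, if tracing.
    static void attach(RpcMessage msg) {
        SpanRecord cur = CURRENT.get();
        if (cur != null) {
            msg.hasTraceInfo = true;
            msg.traceId = cur.traceId;
            msg.parentSpanId = cur.spanId;
        }
    }

    // Step 1, receiving side: continue the caller's trace if the IDs are present.
    static SpanRecord continueFrom(RpcMessage msg) {
        if (!msg.hasTraceInfo) {
            return null;
        }
        SpanRecord child = new SpanRecord(msg.traceId, newId(), msg.parentSpanId);
        CURRENT.set(child);
        return child;
    }

    // Step 2: capture the parent span now; start a child of it in whichever
    // thread eventually runs the task, then restore that thread's old span.
    static Runnable wrap(Runnable task) {
        SpanRecord parent = CURRENT.get();
        if (parent == null) {
            return task; // not tracing: nothing to carry across threads
        }
        return () -> {
            SpanRecord previous = CURRENT.get();
            CURRENT.set(new SpanRecord(parent.traceId, newId(), parent.spanId));
            try {
                task.run();
            } finally {
                CURRENT.set(previous);
            }
        };
    }
}
````

The Put-batching case follows the same pattern by hand: a hypothetical `Put` would store `MiniTrace.current()` in a field when it is created, and the batching thread would build a `SpanRecord` that reuses that span's trace ID and takes that span's ID as its parent ID.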
\r\nFor example, you might do some expensive computation that you want to\r\nsee on your traces. In this case, you could start a new span before\r\nthe computation and stop it after the computation has\r\nfinished. It might look like this: \r\n\r\n````java\r\n Span computationSpan = Trace.startSpan(\"Expensive computation.\"); \r\n try { \r\n // expensive computation here \r\n } finally { \r\n computationSpan.stop(); // always stop the span, even if the computation throws \r\n } \r\n````\r\n\r\nHTrace also supports key-value annotations on the current span. \r\n<br>\r\nExample:\r\n\r\n````java\r\n Trace.currentTrace().addAnnotation(\"faultyRecordCounter\".getBytes(), \"1\".getBytes());\r\n````\r\n\r\n`Trace.currentTrace()` does not return `null` if the current thread is\r\nnot tracing; instead it returns a `NullSpan`, which does\r\nnothing on any of its method calls. This means you can call\r\nmethods on the result of `currentTrace()` without fear of a `NullPointerException`.\r\n\r\n### Samplers \r\n`Sampler` is an interface that defines one method: \r\n\r\n````java\r\n boolean next(T info);\r\n````\r\n\r\nAll of the `Trace.startSpan()` methods can take an optional sampler. \r\nA new span is only created if the sampler's `next()` method returns\r\ntrue. If the sampler returns false, the `NullSpan` is returned from\r\n`startSpan()`, so it's safe to call `stop()` or `addAnnotation()` on it.\r\nAs you may have noticed from the `next()` method signature, `Sampler` is\r\ngeneric. The argument to `next()` is whatever piece of\r\ninformation you might need for sampling. See `Sampler.java` for an\r\nexample of this. If you do not require any additional information,\r\njust ignore the parameter. \r\nHTrace includes a sampler that always returns true, a\r\nsampler that always returns false, and a sampler that returns true some\r\npercentage of the time (you pass in the percentage as a decimal, such as 0.25 for 25%, at construction). 
\r\n\r\n### `Trace.startSpan()` \r\nThere is a single method to create and start spans: `startSpan()`. 
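Together with the sampler rules above, the decisions a `startSpan()` call makes can be modelled in a few lines. The sketch below is not HTrace code: `Samplers`, `SketchSpan` and `SpanStarter` are hypothetical names, its `Sampler` interface only mirrors the one-method signature quoted above, and IDs come from a counter purely for simplicity. It encodes the rules spelled out in the rest of this section: a span is created only if the sampler returns true, the default sampler returns true only if the thread is already tracing, and a new span is a child of the current span unless there is none, in which case it starts a new trace.

````java
import java.util.Random;
import java.util.concurrent.atomic.AtomicLong;

// Stand-in for the Sampler interface described above: one generic method.
interface Sampler<T> {
    boolean next(T info);
}

// Sketches of the three kinds of sampler the text says HTrace includes.
final class Samplers {
    static <T> Sampler<T> always() { return info -> true; }

    static <T> Sampler<T> never() { return info -> false; }

    // Samples about `fraction` of calls; fraction is a decimal such as 0.25.
    static <T> Sampler<T> probability(double fraction, Random rng) {
        if (fraction < 0.0 || fraction > 1.0) {
            throw new IllegalArgumentException();
        }
        return info -> rng.nextDouble() < fraction;
    }
}

// Hypothetical span; NULL_SPAN plays the role of HTrace's NullSpan.
final class SketchSpan {
    static final SketchSpan NULL_SPAN = new SketchSpan(0L, 0L, null);

    final long traceId;
    final long spanId;
    final SketchSpan parent; // null for a root span (and for NULL_SPAN)

    SketchSpan(long traceId, long spanId, SketchSpan parent) {
        this.traceId = traceId;
        this.spanId = spanId;
        this.parent = parent;
    }

    boolean isNull() { return this == NULL_SPAN; }
}

// Hypothetical model of the decisions a startSpan() call makes.
final class SpanStarter {
    private static final ThreadLocal<SketchSpan> CURRENT =
            ThreadLocal.withInitial(() -> SketchSpan.NULL_SPAN);
    // A counter keeps the sketch simple; real IDs must be unique across processes.
    private static final AtomicLong IDS = new AtomicLong();

    // Default sampler: true if and only if this thread is already tracing.
    static final Sampler<Object> DEFAULT = info -> !CURRENT.get().isNull();

    static <T> SketchSpan startSpan(Sampler<T> sampler, T info) {
        if (!sampler.next(info)) {
            return SketchSpan.NULL_SPAN; // safe to stop or annotate
        }
        SketchSpan current = CURRENT.get();
        SketchSpan span = current.isNull()
                ? new SketchSpan(IDS.incrementAndGet(), IDS.incrementAndGet(), null) // new trace
                : new SketchSpan(current.traceId, IDS.incrementAndGet(), current);   // child
        CURRENT.set(span);
        return span;
    }

    // No explicit sampler: use the default one and pass null as the info object.
    static SketchSpan startSpan() {
        return startSpan(DEFAULT, null);
    }

    // Stopping a span makes its parent current again; stopping NULL_SPAN does nothing.
    static void stop(SketchSpan span) {
        if (!span.isNull()) {
            CURRENT.set(span.parent == null ? SketchSpan.NULL_SPAN : span.parent);
        }
    }
}
````

In this model, `startSpan()` on a thread that is not tracing returns `NULL_SPAN`, which is why the default sampler suits adding detail to traces already in progress; starting a trace needs an explicit sampler such as `Samplers.always()` or `Samplers.probability(0.01, rng)`.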
\r\nFor the `startSpan()` methods that do not take an explicit Sampler, the\r\ndefault Sampler is used. The default sampler returns true if and only\r\nif tracing is already on in the current thread. So calling `startSpan()`\r\nwithout an explicit Sampler is a good choice when you have information\r\nthat you would like to add to a trace that is already in progress, but\r\nthat does not justify starting a whole new trace. \r\n<br>\r\nIf you are using a sampler that makes use of the `T info` parameter to\r\n`next()`, just pass in the object as the last argument. If you leave it\r\nout, HTrace will pass `null` for you (so make sure your Samplers can\r\nhandle `null`). \r\n<br>\r\nBesides choosing whether to pass an explicit `Sampler`, you have\r\nother options when calling `startSpan()`. \r\nThe rest of this section assumes you are familiar with passing\r\n`Sampler` and `info` parameters, so \"no additional arguments\" means no\r\narguments other than a description and whatever `Sampler`/`info`\r\nparameters you deem necessary. \r\n<br>\r\nYou can call `startSpan()` with no additional arguments.\r\nIn this case, `Trace.java` will start a span if the sampler (explicit\r\nor default) returns true. If the current span is not the `NullSpan`, the span\r\nreturned will be a child of the current span; otherwise it will start\r\na new trace in the current thread (it will be a\r\n`ProcessRootMilliSpan`). All of the other `startSpan()` methods take some\r\nparameter describing the parent span of the span to be created. The\r\nversions that take a `TraceInfo` or a `long traceId` and `long\r\nparentId` will mostly be used when continuing a trace over RPC. The\r\nreceiver of the RPC will check the message for the two additional\r\n`long`s and will call `startSpan()` if they are attached. The last\r\n`startSpan()` takes a `Span parent`. The result of `parent.child()`\r\nwill be used for the new span. 
`Span.child()` simply returns a span\r\nthat is a child of `this`. \r\n\r\nTesting Information\r\n-------------------------------\r\n\r\nThe test that creates a sample trace (`TestHTrace`) takes a command-line argument telling it where to write span information. Run `mvn test -DspanFile=\"FILE_PATH\"` to write span information to `FILE_PATH`. If no file is specified, span information is written to standard out. If span information is written to a file, you can use the included `graphDrawer` Python script in `tools/` to create a simple visualization of the trace. Or you could write some JavaScript to make a better visualization, and send a pull request if you do :). \r\n","google":"","note":"Don't delete this file! It's used internally to help with page regeneration."}