commit | 8f8ae10b8cd1f3c7fe2094a75183d797087563a9 | [log] [tgz] |
---|---|---|
author | Fred Hebert <mononcqc@ferd.ca> | Mon Jul 08 10:08:36 2013 -0400 |
committer | Fred Hebert <mononcqc@ferd.ca> | Mon Jul 08 10:08:36 2013 -0400 |
tree | b678fc3ec34cd1db2d8f2858d5302f6c6be16c16 | |
parent | d058bacb6638483b79d6256f1288127d5e2c24a1 [diff] |
More efficient way to look up process_info Erlang already supports looking up lists of attributes rather than making N calls for each attribute individually.
Recon wants to be a set of tools usable in production to diagnose Erlang problems or inspect production environment safely.
Functions will be added as the need arise (see: not covered by other tools far and wide for the time being).
To get the number of reduction of a sliding window in time on a given node. The call to recon:reductions(5000, 5)
will check for the biggest reduction users for 5 seconds, and return the 5 bigger consumers:
1> recon:reductions(5000, 5). [{<0.32.0>, {2348,{dets,open_file_loop2,2},{proc_lib,init_p,5}}}, {<0.48.0>, {1977,{recon,'-sample/0-lc$^1/1-1-',3},{erlang,apply,2}}}, {<0.15.0>,{0,{gen_server,loop,6},{proc_lib,init_p,5}}}, {<0.31.0>,{0,{gen_server,loop,6},{proc_lib,init_p,5}}}, {<0.47.0>,{0,{gen_server,loop,6},{proc_lib,init_p,5}}}] 2> sys:get_status(pid(0,32,0)). {status,<0.32.0>, {module,dets}, [[{1336,3384}, {'$ancestors',[dets_sup,kernel_safe_sup,kernel_sup,<0.9.0>]}, {312,1336}, {'$initial_call',{dets,init,2}}, {1340,5432}], running,<0.30.0>,[], {head,512,1024,512, {file_descriptor,prim_file,{#Port<0.760>,21}}, 479,479,9,0,set,1, {v,v,v,v,v,{l,55320},v,{...},...}, 57043, [{5,21},{6,75},{7,123},{8,76},{9,...}], 500,dirty,false,phash2,true,512,...}]}
This function is particularly useful when processes are short-lived, usually too short to inspect through other tools, in order to figure out what kind of code is taking up all the time on a given node.
It is important to see its use as a sliding window. A program's timeline during sampling might look like this:
--w---- [Sample1] ---x-------------y----- [Sample2] ---z--->
Some processes will live between w
and die at x
, some between y
and z
, and some between w
and z
. These samples won‘t be too significant as they’re incomplete. If the majority of your processes run in a time interval x
...y
, you should make sure that your sampling time is smaller than this, and do it many times. This will allow to take snapshots that are more representative, both for short-lived and long-lived processes, in order to observe a trend. Not doing this can skew the results: long-lived processes that have 10 times the time to accumulate reductions will look like a bottleneck when they're not one.