Fix recon_lib failing on long ports or proc lists

Nodes with lots of processes or ports can see their proc_count,
proc_window, inet_count, or inet_window functions fail due to a race
condition where:

1. The list of ports or processes is created;
2. The ports or processes are iteratively polled for their properties;
3. Some port or process closes;
4. A badmatch error occurs and the entire function fails.

The error specifically happens in the functions of arity 2 in recon_lib
that fetch the properties of each port or process.

The interface of these functions is changing to:

- account for the error
- return {ok, State} or {error, Reason} depending on the case

Moreover, the functions of arity 1 in recon_lib that make use of them
are changing so that their list comprehension filters bad data --
which we do not care about anyway.
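
As a rough illustration of the two changes described above, here is a
minimal sketch. The module name attr_sketch, the simplified return shape
{Pid, Value}, and the noproc reason are assumptions made for this example
and are not the actual recon_lib code. It also assumes Attr is an item
such as memory or message_queue_len, for which erlang:process_info/2
returns either {Attr, Value} or undefined.

```erlang
-module(attr_sketch).
-export([proc_attrs/1, proc_attrs/2]).

%% Hypothetical arity-2 fetch (not the real recon_lib signature).
%% Returns {ok, {Pid, Value}} while the process is alive, and
%% {error, noproc} if it exited after the pid list was built,
%% instead of crashing with a badmatch.
proc_attrs(Attr, Pid) ->
    case erlang:process_info(Pid, Attr) of
        {Attr, Value} -> {ok, {Pid, Value}};
        undefined -> {error, noproc}
    end.

%% Hypothetical arity-1 wrapper. The second generator only matches
%% {ok, Entry}, so any {error, _} result is silently skipped.
proc_attrs(Attr) ->
    [Entry || Pid <- erlang:processes(),
              {ok, Entry} <- [proc_attrs(Attr, Pid)]].
```

With this shape, a process that dies between erlang:processes() and the
per-pid fetch simply drops out of the result rather than failing the
whole call.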

A similar change is included to respect the new API in recon's refc
binary leak function.
4 files changed
tree: c5b0f83ccc1924b27ec3bbd2365803669a4aae56
README.md

recon

Recon aims to be a set of tools that can be used safely in production to diagnose Erlang problems or inspect production environments.

To build the library:

rebar compile

Documentation for the library can be obtained at http://ferd.github.io/recon/

Changelog

  • 0.4.0: fixed a bug where nodes with lots of processes or ports could see their count or window functions fail because a process or socket closed between the time the function started and the time it finished. The fix changes the API in recon_lib for the window and count functions that take a specific pid as an argument.
  • 0.3.1: factored out some logic from recon:info/1 into recon_lib:term_to_pid and allowed arbitrary terms to be used for pids in recon:get_state/1.