Improve couch_proc_manager

The main improvement is faster process lookup. This should reduce latency for
concurrent requests which quickly acquire and release couchjs processes.
Testing with concurrent vdu and map/reduce calls showed a 1.6x to 6x speedup
[1].

Previously, couch_proc_manager linearly searched through all the processes and
executed a custom callback function for each to match design doc IDs. Instead,
use a separate ets table index for idle processes to avoid scanning assigned
processes.
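A minimal Python sketch of the idea, not the actual Erlang code. Idle processes live in their own index keyed by language, so acquiring a process never walks over assigned ones. The class and field names are hypothetical. The `in ddocs` check stands in for the ddoc id match that the commit performs with a `map_get` guard inside the ets select.

```python
class ProcPool:
    """Toy model: all processes in one table, idle ones in a separate index."""

    def __init__(self):
        self.procs = {}  # pid -> (lang, set of ddoc ids); every process
        self.idle = {}   # lang -> {pid: set of ddoc ids}; idle processes only

    def add(self, pid, lang, ddoc_ids=()):
        self.procs[pid] = (lang, set(ddoc_ids))
        self.release(pid)  # a freshly added process starts out idle

    def acquire(self, lang, ddoc_id):
        idle = self.idle.get(lang, {})
        # Only idle processes of this language are visited. Assigned
        # processes are not in the index, so they cost nothing here.
        match = next((pid for pid, ddocs in idle.items() if ddoc_id in ddocs), None)
        if match is not None:
            del idle[match]
        return match

    def release(self, pid):
        lang, ddocs = self.procs[pid]
        self.idle.setdefault(lang, {})[pid] = ddocs
```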

Use a db tag in addition to a ddoc id to quickly find idle processes. This
should improve performance. In case it doesn't for a given workload, the
tagging scheme can be configured to use only a db prefix, or be disabled
altogether.
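One way to picture the configurable tagging, as a hypothetical Python sketch. The mode names, the `/` separator used to cut a db prefix, and the shape of the lookup key are all assumptions for illustration, not the actual config keys or tag format.

```python
def proc_tag(db_name, mode, separator="/"):
    """Hypothetical tag attached to a process, derived from the db name."""
    if mode == "db":
        return db_name                         # full db name
    if mode == "db_prefix":
        return db_name.split(separator, 1)[0]  # only the part before the separator
    if mode == "none":
        return None                            # tagging disabled
    raise ValueError(f"unknown tagging mode: {mode!r}")


def idle_key(lang, db_name, ddoc_id, mode):
    # In this sketch, idle processes are looked up by language, tag and ddoc id.
    return (lang, proc_tag(db_name, mode), ddoc_id)
```

A coarser tag lets more processes share a key, so a lookup is presumably more likely to find an idle match, at the cost of reusing processes across more databases.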

Use the new `map_get` ets select guard [2] to perform ddoc id lookups during
the ets select traversal without a custom matcher callback.

For ordered ets tables, use the partially bound key trick [3]. This lets the
traversal skip processes for a different query language altogether.
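A Python approximation of the partially bound key trick. In an `ordered_set`, a key pattern with a bound first element lets the traversal start at the first matching key and stop after the last one. Here a sorted list and `bisect` stand in for the ets table, and the `(lang, pid)` key layout is an assumption.

```python
import bisect


class OrderedIdleIndex:
    """Sorted (lang, pid) keys, loosely modelling an ordered_set ets table."""

    def __init__(self):
        self.keys = []

    def insert(self, lang, pid):
        bisect.insort(self.keys, (lang, pid))

    def pids_for_lang(self, lang):
        # (lang,) sorts before every (lang, pid), so bisect lands on the
        # first key with this language without visiting earlier keys.
        i = bisect.bisect_left(self.keys, (lang,))
        pids = []
        while i < len(self.keys) and self.keys[i][0] == lang:
            pids.append(self.keys[i][1])
            i += 1
        # Keys for other languages after this range are never examined.
        return pids
```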

Waiting clients used `os:timestamp/0` as a unique client identifier. It turns
out `os:timestamp/0` is not guaranteed to be unique, which could result in some
clients never getting a response. This bug was most likely the reason the
"fifo client order" test had to be commented out. Fix the issue by using a
newer monotonic timestamp function and, for uniqueness, appending the client's
gen_server return tag. Re-enable the previously commented-out test so it can
hopefully run again.
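A small Python model of why the client key needs a uniqueness component. With a coarse or repeated timestamp, two waiting clients keyed by time alone overwrite each other, and one of them never gets a reply. Here `itertools.count` stands in for the gen_server return tag. This counter also happens to break ties in arrival order, which is a property of the sketch and not necessarily of the real tag.

```python
import itertools
import time

_tags = itertools.count()  # stand-in for a unique per-call tag


def client_key(now_ns=None):
    """Waiting-client key: monotonic timestamp first, unique tag second."""
    ts = time.monotonic_ns() if now_ns is None else now_ns
    return (ts, next(_tags))


def pop_oldest(waiting):
    # FIFO: the smallest key has the earliest timestamp (ties broken by tag).
    key = min(waiting)
    return waiting.pop(key)
```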

When clients tag a previously untagged process, asynchronously replace the
untagged process with a new process. This happens in the background and the
client doesn't have to wait for it.

When a ddoc tagged process cannot be found, before giving up, stop the oldest
unused ddoc processes to make room for spawning fresh ones. To avoid a linear
scan here, keep a separate `?IDLE_ACCESS` index of idle ddoc processes ordered
by their last usage time.
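A Python sketch of such an idle-access index. Entries are kept sorted by `(last_used, pid)`, so the least recently used idle processes sit at the front and can be taken without scanning every process. The class and method names are hypothetical; the real index is an ets table.

```python
import bisect


class IdleAccessIndex:
    """Idle processes ordered by last usage time (oldest first)."""

    def __init__(self):
        self.by_time = []    # sorted list of (last_used, pid)
        self.last_used = {}  # pid -> last_used, used to locate an entry

    def touch(self, pid, now):
        """Record that `pid` became idle (or was used) at time `now`."""
        self.remove(pid)
        self.last_used[pid] = now
        bisect.insort(self.by_time, (now, pid))

    def remove(self, pid):
        old = self.last_used.pop(pid, None)
        if old is not None:
            del self.by_time[bisect.bisect_left(self.by_time, (old, pid))]

    def take_oldest(self, n):
        """Remove and return up to `n` least recently used pids."""
        oldest = self.by_time[:n]
        del self.by_time[:n]
        for _, pid in oldest:
            del self.last_used[pid]
        return [pid for _, pid in oldest]
```

In this model, the caller would stop the returned processes and then retry spawning a fresh one.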

When processes are returned to the pool, reply to the client immediately
instead of forcing it to wait until the process is re-inserted into the idle
ets table. This should improve client latency.

If the waiting client list grows long enough that a client waits longer than
the gen_server get_proc timeout, do not waste time assigning or spawning a new
process for that client, since it has already timed out.
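A hedged Python sketch of skipping clients that have already given up. Waiting clients are queued with their enqueue time, and any client that has waited at least the call timeout is dropped instead of being served. The `(enqueued_at, client)` layout and the function name are assumptions.

```python
from collections import deque


def next_live_client(waiting, now, timeout):
    """Pop waiting clients in FIFO order, discarding any that already timed out."""
    while waiting:
        enqueued_at, client = waiting.popleft()
        if now - enqueued_at < timeout:
            return client
        # This client's call has already timed out; don't assign or spawn for it.
    return None
```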

When gathering stats, avoid making gen_server calls, at least for the metric of
total processes spawned. Table sizes can be easily computed with
`ets:info(Table, size)` from outside the main process.

In addition to performance improvements, clean up the couch_proc_manager API by
forcing all calls to go through properly exported functions instead of direct
gen_server calls.

Remove `#proc_int{}` and use only `#proc{}`. Casting between `#proc_int{}` and
`#proc{}` via lists/tuples was dangerous because it prevented the compiler from
checking that we're using the proper fields. Adding an extra field to the record
resulted in mismatched fields being assigned.
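The record-cast hazard carries over to any positional record type. In this Python sketch, hypothetical namedtuples stand in for `#proc_int{}` and `#proc{}`: once a field is inserted into one of them, a positional conversion silently shifts values into the wrong fields, while a by-name conversion does not. The commit sidesteps the problem entirely by keeping a single record.

```python
from collections import namedtuple

# Hypothetical shapes; the internal record has gained a db_key field mid-record.
Proc = namedtuple("Proc", ["pid", "lang", "client", "ddoc_keys"])
ProcInt = namedtuple("ProcInt", ["pid", "lang", "client", "db_key", "ddoc_keys", "t0"])


def to_proc_positional(p):
    # Fragile: assumes both records list their shared fields in the same order.
    return Proc(*tuple(p)[:len(Proc._fields)])


def to_proc_by_name(p):
    # Safer: copies each field by name, so added or reordered fields can't shift values.
    return Proc(**{name: getattr(p, name) for name in Proc._fields})
```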

To simplify the code a bit, keep the per-language count in an ets table. This
avoids having to thread the old and updated state everywhere. Everything else
was mostly kept in ets tables anyway, so this stays consistent with that
general pattern.

Improve test coverage and convert the tests to use the `?TDEF_FE` macro so
there is no need for the awkward `?_test(begin ... end)` construct.

[1] https://gist.github.com/nickva/f088accc958f993235e465b9591e5fac
[2] https://www.erlang.org/doc/apps/erts/match_spec.html
[3] https://www.erlang.org/doc/man/ets.html#table-traversal