Fix race condition in worker release on connection_closing state.
This is exposed in the replicator large attachments tests case,
replicating from local to remote. In the current test configuration
it appears roughly once in every 20-40 runs. The failure manifests
as an {error, req_timedout} exception in the logs from one of the
PUT requests during push replication. The subsequent database comparison
then fails because not all documents made it to the target.
Gory details:
After ibrowse receives a Connection: Close header, it goes into
the 'connection_closing' shutdown state.
couch_replicator_httpc handles that state by trying to close
the socket and retrying, hoping to pick up a new worker from
the pool on the next retry, in couch_replicator_httpc.erl:
```
process_response({error, connection_closing}, Worker, HttpDb, Params, _Cb) ->
...
```
But it had no way to ensure the socket was really closed;
instead it called ibrowse_http_client:stop(Worker), which did not
wait for the worker to die. The worker was also returned to the pool
asynchronously, in the 'after' clause of couch_replicator_httpc:send_req/3.
A worker that was still alive, but in a dying process, could therefore
be picked up immediately during the retry.
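The core race can be reproduced in isolation. The sketch below (a
standalone, hypothetical module, not code from this patch) shows that an
asynchronous stop message returns before the target process has actually
exited, leaving a window in which the "stopped" process is still alive:

```
-module(stop_race_demo).
-export([demo/0]).

%% Spawn a process that delays briefly before exiting once told to stop,
%% mimicking a worker that is shutting down but not yet dead.
demo() ->
    Pid = spawn(fun() -> receive stop -> timer:sleep(50) end end),
    Pid ! stop,                    %% asynchronous stop request returns at once
    is_process_alive(Pid).         %% typically still true: the race window
```

Handing such a pid back to the pool during this window is exactly how a
dying worker could be picked up on the retry.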
ibrowse, in ibrowse:do_send_req/7, treats a dead worker
process as {error, req_timedout}, which is what the intermittent
test failure showed in the log.
The fix:
* Make sure the worker is really stopped after calling stop.
* Make sure the worker is returned to the pool synchronously, so that
  on retry a worker in a known good state is picked up.
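A minimal sketch of the synchronous variant, assuming a hypothetical
pool helper (release_worker_sync is illustrative, not necessarily the
name used in the patch): monitor the worker, ask it to stop, and block
until the 'DOWN' message confirms it is dead before releasing it back
to the pool:

```
%% Sketch only: stop the worker and wait until it has actually died,
%% then return it to the pool synchronously.
stop_and_release_worker(Pool, Worker) ->
    Ref = erlang:monitor(process, Worker),
    ibrowse_http_client:stop(Worker),
    receive
        {'DOWN', Ref, process, Worker, _Reason} ->
            ok
    after 1000 ->
        %% Fallback: force-kill if the worker does not exit in time.
        erlang:demonitor(Ref, [flush]),
        exit(Worker, kill)
    end,
    couch_replicator_httpc_pool:release_worker_sync(Pool, Worker).
```

With this, a retry can only ever check out a worker that is either
freshly started or known to be in a good state.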
COUCHDB-2833
2 files changed