c69e16b297fdc359126d0fa0cd7142603e82ac0d - hadoop

commit	c69e16b297fdc359126d0fa0cd7142603e82ac0d
author	Steve Loughran <stevel@cloudera.com>	Wed Aug 31 11:16:52 2022 +0100
committer	GitHub <noreply@github.com>	Wed Aug 31 11:16:52 2022 +0100
tree	c8b042d2dc4dbed2a2659db8c84e90670c4634b1
parent	c334ba89ada065be74fbe449e115dc56ab51dd38

HADOOP-18410. S3AInputStream.unbuffer() does not release http connections (#4766)


HADOOP-16202 "Enhance openFile()" added asynchronous draining of the 
remaining bytes of an S3 HTTP input stream for those operations
(unbuffer, seek) where it could avoid blocking the active
thread.

This patch fixes asynchronous stream draining so that it actually
completes and returns the connection to the HTTP pool. Without this
fix, whenever unbuffer() or seek() was called on a stream and an
asynchronous drain was triggered, the connection was never returned;
eventually the pool would be exhausted and subsequent S3 requests
would fail with the message "Timeout waiting for connection from pool".
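The failure mode can be modelled with a counting semaphore. This is a hypothetical sketch, not the AWS SDK or HTTP client pool: PoolLeakDemo, borrow() and giveBack() are invented names, one permit stands for one pooled connection, and a timed tryAcquire() stands for the pool's wait-for-connection timeout. When each drain keeps its connection, only as many requests succeed as the pool has slots.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

// Hypothetical model of an HTTP connection pool: one semaphore permit per connection.
// It only mimics the leak described above; it is not the real SDK pool.
public class PoolLeakDemo {
    private final Semaphore permits;

    public PoolLeakDemo(int poolSize) {
        this.permits = new Semaphore(poolSize);
    }

    /** Borrow a connection, waiting briefly; false models "Timeout waiting for connection from pool". */
    public boolean borrow() {
        try {
            return permits.tryAcquire(10, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    /** What a working drain does once the remaining bytes are read: hand the connection back. */
    public void giveBack() {
        permits.release();
    }

    /** Count how many requests succeed before the pool times out. */
    public static int requestsBeforeTimeout(int poolSize, int attempts, boolean drainReturns) {
        PoolLeakDemo pool = new PoolLeakDemo(poolSize);
        int ok = 0;
        for (int i = 0; i < attempts; i++) {
            if (!pool.borrow()) {
                break; // pool exhausted: the request fails
            }
            ok++;
            // unbuffer()/seek() triggers an asynchronous drain of the remaining bytes.
            if (drainReturns) {
                pool.giveBack(); // fixed behaviour: connection returned to the pool
            }
            // buggy behaviour: the connection is never returned
        }
        return ok;
    }

    public static void main(String[] args) {
        System.out.println("leaking drain: " + requestsBeforeTimeout(3, 10, false) + " of 10 requests");
        System.out.println("working drain: " + requestsBeforeTimeout(3, 10, true) + " of 10 requests");
    }
}
```

With a pool of three, the leaking variant serves three requests and then times out, while the working variant serves all ten.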

The root cause: although drain() receives the stream and object as
method parameters, the lambda expression passed to submit() read the
input stream's fields directly:

operation = client.submit(
 () -> drain(uri, streamStatistics,
       false, reason, remaining,
       object, wrappedStream));  /* here */

Those fields were only read when the asynchronous task executed, by
which time they would have been set to null (or even reassigned to the
stream of a subsequent read).
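The difference between reading a field inside the lambda and capturing its value beforehand can be shown deterministically. In this hypothetical sketch, LambdaCaptureDemo, closeStreamBuggy(), closeStreamFixed() and the string "streams" are invented stand-ins for S3AInputStream and its fields, and a queueing Executor delays every task until runQueued(), which plays the role of the thread pool running the drain later.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Executor;

public class LambdaCaptureDemo {
    // Hypothetical stand-in for the wrappedStream field of the input stream.
    private String wrappedStream = "stream-for-read-1";
    final List<String> drained = new ArrayList<>();

    // Executor that queues tasks and runs them later, so the race is deterministic.
    private final List<Runnable> queue = new ArrayList<>();
    private final Executor deferred = queue::add;

    private void drain(String stream) {
        drained.add(stream == null ? "<null: connection leaked>" : stream);
    }

    void closeStreamBuggy() {
        // The lambda captures `this`; wrappedStream is only read when the task runs.
        deferred.execute(() -> drain(wrappedStream));
        wrappedStream = null;
    }

    void closeStreamFixed() {
        // Copy the field into a local first; the lambda captures the current value.
        final String stream = wrappedStream;
        deferred.execute(() -> drain(stream));
        wrappedStream = null;
    }

    void reopen(String stream) {
        wrappedStream = stream;
    }

    void runQueued() {
        queue.forEach(Runnable::run);
        queue.clear();
    }

    static List<String> scenario(boolean fixed, boolean reopenBeforeDrain) {
        LambdaCaptureDemo s = new LambdaCaptureDemo();
        if (fixed) {
            s.closeStreamFixed();
        } else {
            s.closeStreamBuggy();
        }
        if (reopenBeforeDrain) {
            s.reopen("stream-for-read-2"); // a subsequent read opens a new stream
        }
        s.runQueued();
        return s.drained;
    }

    public static void main(String[] args) {
        System.out.println("buggy:            " + scenario(false, false));
        System.out.println("buggy + new read: " + scenario(false, true));
        System.out.println("fixed + new read: " + scenario(true, true));
    }
}
```

The buggy variant drains either nothing (null) or the next read's stream, while the fixed variant always drains the stream that was open when the task was submitted.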

A new SDKStreamDrainer class performs the draining; it is a Callable
and can be submitted directly to the executor pool.
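A minimal sketch of the idea, assuming only what the message states: the stream to drain is captured when the drainer object is constructed, and the object is a Callable handed straight to an executor. StreamDrainerSketch and getDrained() are hypothetical names; this is not the actual SDKStreamDrainer API, and the real class handles details not shown here.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical drainer: the stream reference is fixed at construction time,
// so later changes to the caller's fields cannot affect what gets drained.
public class StreamDrainerSketch implements Callable<Boolean> {
    private final InputStream stream;
    private final long remaining;
    private long drained;

    public StreamDrainerSketch(InputStream stream, long remaining) {
        this.stream = stream;
        this.remaining = remaining;
    }

    @Override
    public Boolean call() {
        byte[] buffer = new byte[8192];
        try (InputStream in = stream) {
            // Read and discard up to `remaining` bytes, then close the stream.
            while (drained < remaining) {
                int n = in.read(buffer, 0, (int) Math.min(buffer.length, remaining - drained));
                if (n < 0) {
                    break; // stream ended early
                }
                drained += n;
            }
            return true;
        } catch (IOException e) {
            return false; // on failure the connection cannot be safely reused
        }
    }

    public long getDrained() {
        return drained;
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            StreamDrainerSketch drainer =
                new StreamDrainerSketch(new ByteArrayInputStream(new byte[20_000]), 20_000);
            Future<Boolean> result = pool.submit(drainer); // submitted directly as a Callable
            System.out.println("ok=" + result.get() + ", drained=" + drainer.getDrained());
        } finally {
            pool.shutdown();
        }
    }
}
```

Because the drainer owns its own reference, the input stream is free to null out or reassign its fields immediately after submission.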

The class is used in both the classic and prefetching s3a input streams.

Also, calling unbuffer() switches the S3AInputStream from adaptive
to random IO mode; that is, it is considered a cue that future
IO will not be sequential, whole-file reads.
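The policy switch can be pictured as a small state change inside unbuffer(). This is a hypothetical sketch: InputPolicySketch and its Policy values are invented names rather than Hadoop's own types, and the assumption that any non-adaptive policy is left unchanged is ours, inferred from the message saying only that adaptive mode becomes random.

```java
// Hypothetical sketch of unbuffer() acting as a hint for the IO policy.
public class InputPolicySketch {
    /** Invented policy names; not Hadoop's real enum. */
    public enum Policy { SEQUENTIAL, RANDOM, ADAPTIVE }

    private Policy policy;
    private boolean streamOpen = true;

    public InputPolicySketch(Policy initial) {
        this.policy = initial;
    }

    public Policy getPolicy() {
        return policy;
    }

    public boolean isStreamOpen() {
        return streamOpen;
    }

    public void unbuffer() {
        // Release the HTTP stream (the real client may drain it asynchronously).
        streamOpen = false;
        // Assumption: only adaptive mode switches; explicit policies are kept.
        if (policy == Policy.ADAPTIVE) {
            policy = Policy.RANDOM;
        }
    }

    public static void main(String[] args) {
        InputPolicySketch s = new InputPolicySketch(Policy.ADAPTIVE);
        s.unbuffer();
        System.out.println("after unbuffer(): " + s.getPolicy() + ", open=" + s.isStreamOpen());
    }
}
```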

Contributed by Steve Loughran.

10 files changed