Description
I have a system that periodically downloads tens of thousands of small (roughly 300-400 KB) objects. Sometimes it hits an exception partway through, which triggers a torrent of further exceptions on most (and sometimes all) subsequent requests. Since the files are small, the per-file request overhead is relatively high, so I keep 40 requests in flight (increasing concurrency past that point didn't improve overall performance).
Expected Behavior
If there's a timeout or other issue, it should only affect that request, and subsequent requests should continue to work.
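To make the expectation concrete, here is a minimal stdlib-only sketch of the access pattern. It is not the code from the repro project: `BoundedDownloads`, `Result`, and the `fetch` function are hypothetical names, with `fetch` standing in for whatever async call actually retrieves an object. It caps the number of requests in flight and counts each failure individually, without letting it affect the others.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

class BoundedDownloads {
    /** Counts of requests that completed normally vs. exceptionally. */
    static final class Result {
        final int succeeded;
        final int failed;

        Result(int succeeded, int failed) {
            this.succeeded = succeeded;
            this.failed = failed;
        }
    }

    /**
     * Calls fetch for every key, keeping at most maxInFlight requests outstanding.
     * fetch is a hypothetical stand-in for the real async download call.
     */
    static Result run(List<String> keys, int maxInFlight,
                      Function<String, CompletableFuture<byte[]>> fetch) {
        Semaphore permits = new Semaphore(maxInFlight);
        AtomicInteger ok = new AtomicInteger();
        AtomicInteger failed = new AtomicInteger();

        for (String key : keys) {
            // Blocks until one of the in-flight requests has finished.
            permits.acquireUninterruptibly();
            CompletableFuture<byte[]> future;
            try {
                future = fetch.apply(key);
            } catch (RuntimeException e) {
                // Treat a synchronous failure like any other failed request.
                future = CompletableFuture.failedFuture(e);
            }
            future.whenComplete((bytes, err) -> {
                // A failure is recorded for this request only; the loop keeps going.
                if (err == null) {
                    ok.incrementAndGet();
                } else {
                    failed.incrementAndGet();
                }
                permits.release();
            });
        }

        // Reclaiming every permit means all outstanding requests have completed.
        permits.acquireUninterruptibly(maxInFlight);
        return new Result(ok.get(), failed.get());
    }

    public static void main(String[] args) {
        List<String> keys = IntStream.range(0, 1000)
                .mapToObj(i -> "key-" + i)
                .collect(Collectors.toList());

        // Simulate a single request failing; every other request should still succeed.
        Result r = run(keys, 40, key -> CompletableFuture.supplyAsync(() -> {
            if (key.equals("key-7")) {
                throw new IllegalStateException("simulated read timeout");
            }
            return new byte[0];
        }));

        System.out.println(r.succeeded + " succeeded, " + r.failed + " failed");
    }
}
```

With the simulated failure above, I'd expect one failed request and 999 successful ones. That's the behavior I'm hoping for from the SDK: one bad request, not a cascade.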
Current Behavior
Sometimes the first exception happens almost instantly, and other times it happens 20,000 requests in. I made a simplified example of the workload and was able to reproduce the issue with it twice; in each case the first exception was a ReadTimeoutException. Perhaps that's always the cause, which would also explain why there's no clear pattern to when it occurs, but usually I only notice the issue after it has generated so much logging (from stacktraces) that I've lost the start of the log.
Usually when the first exception happens all subsequent requests immediately fail, but in the particular case I captured the logs on, several more requests completed.
(I also see exceptions frequently when closing the client when the work is done, but that's probably a separate issue.)
Steps to Reproduce (for bugs)
Use https://bitbucket.org/marshallpierce/s3-sdk-exception-repro/src/master/ on a bucket with lots of smallish files. I've been able to intermittently reproduce the issue with n=40 concurrent downloads.
See the attached log for (lightly anonymized) output from the above tool. I stopped it after a second once it started vigorously barfing exceptions.
Context
I want to download a bunch of small files concurrently without having it crash half the time.
Your Environment
- AWS Java SDK version used: 2.5.29
- JDK version used: Zulu 11.0.3
- Operating System and version: macOS 10.14.4