Skip to content

Field eventsToDeliver is a LinkedList, i.e., not thread-safe. Accesses to field eventsToDeliver are protected by synchronization on itself, but not in 1 location. #1972

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
adriannistor opened this issue Aug 3, 2020 · 1 comment
Labels
bug This issue is a bug.

Comments

@adriannistor
Copy link
Contributor

I created a Pull Request for this report at:

#1971

Description

Field eventsToDeliver is a LinkedList (line 128), i.e., not thread-safe.

Accesses to eventsToDeliver are protected by synchronization on itself (synchronized (eventsToDeliver)), e.g., at lines: 373-376, 424-426, 493-494, 493-498, and 493-506.

Note that line 506 (a remove()) and line 374 (decoder calls handleMessage() as per decoder's definition at line 110, and handleMessage() calls add() at line 271) modify the eventsToDeliver (and are protected by synchronization at line 493 and line 373 respectively).

However, line 397 (an add(), i.e., modifies eventsToDeliver) is not protected.

I see line 397 adds a ON_COMPLETE_EVENT, which, when processed in the eventsToDeliver, will signal termination (in isCompletedOrDeliverEvent() at line 494).

However, I think the add() at line 397 can occur in parallel with the remove() at line 506. This is because the remove() at line 506 occurs for other messages in the eventsToDeliver that were already enqueued before the ON_COMPLETE_EVENT.

I.e., when ON_COMPLETE_EVENT is being added by line 397 other messages may still be getting processed from the eventsToDeliver queue.

Note how isCompletedOrDeliverEvent() is getting called (by itself, recursively) at lines 505-509 until eventsToDeliver becomes empty (line 498) or ON_COMPLETE_EVENT is encountered (line 494).

Note also how the recursive calls at lines 505-509 are asynchronous (which is likely part of the reason for which we have the synchronization).

Even if somehow we know that there is a guarantee that all calls to line 506 were finished before line 397 is called (seems unlikely, with all those asynchronous calls above---also, if we could enforce such a guarantee, we may have better ways to communicate completion than to enqueue ON_COMPLETE_EVENT in the shared eventsToDeliver queue), it still seems like a fragile guarantee to make, because external callers may not be aware of it (e.g., it is not documented) or about other timing assumptions.

This Fix Code

This fix is very simple: just surround the add() at line 397 with a synchronized (eventsToDeliver), just like the other lines discussed above (373-376, 424-426, 493-494, 493-498, and 493-506).

Testing

mvn clean install

@adriannistor adriannistor added bug This issue is a bug. needs-triage This issue or PR still needs to be triaged. labels Aug 3, 2020
@debora-ito debora-ito added investigating This issue is being investigated and/or work is in progress to resolve the issue. and removed needs-triage This issue or PR still needs to be triaged. investigating This issue is being investigated and/or work is in progress to resolve the issue. labels Aug 26, 2020
@debora-ito
Copy link
Member

The fix was release in SDK version 2.15.20.

aws-sdk-java-automation added a commit that referenced this issue Mar 25, 2022
…1d919f4c1

Pull request: release <- staging/f170c0d8-fca0-4a74-9526-e011d919f4c1
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug This issue is a bug.
Projects
None yet
Development

No branches or pull requests

2 participants