Skip to content

The select algorithm gets into live-lock in a corner case #504

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
ndkoval opened this issue Aug 21, 2018 · 0 comments
Closed

The select algorithm gets into live-lock in a corner case #504

ndkoval opened this issue Aug 21, 2018 · 0 comments
Assignees
Labels

Comments

@ndkoval
Copy link
Member

ndkoval commented Aug 21, 2018

In case we have two channels (c1 and c2) and two coroutines, which perform select on this channel but in the different direction, a livelock (StackOverflowError in practice) can occur.

Let's assume that the 1st coroutine processed c1 and then c2, while the second --- c2 and then c1. The following interleaving can occur.

  1. Coroutine 1 goes to c1 and adds itself to the waiting queue;
  2. Coroutine 2 goes to c2 and adds itself to the waiting queue;
  3. Coroutine 1 goes to c2, decides to make a rendezvous, creates the corresponding descriptor and write it to its state;
  4. Coroutine 2 does the same (wants to make a rendezvous on c1).

After that, both coroutines tries to select each other, for what each of them has to update the state of the opposite coroutine to SELECTED. However, they see descriptors, so they try to help each other, see descriptors again, try to help each other again, ... This way, they get into a livelock, what causes a StackOverflowError, because helping is written using recursion.

This could be easily fixed by having the total order on all SelectInstance-s, and writing a descriptor to the first in this order SelectInstance at first, and only then to the second one.

@elizarov elizarov added the bug label Aug 25, 2018
elizarov added a commit that referenced this issue Aug 7, 2019
The fix is based on the sequential numbering of atomic select operation.
Deadlock is detected and the operation with the lower sequential
number is aborted and restarted (with a larger number).

Fixes #504
qwwdfsad pushed a commit that referenced this issue Aug 21, 2019
The fix is based on the sequential numbering of atomic select operation.
Deadlock is detected and the operation with the lower sequential
number is aborted and restarted (with a larger number).

Fixes #504
elizarov added a commit that referenced this issue Aug 22, 2019
* onSend/onReceive clauses on the same channel: Instead of
  StackOverflowError we throw IllegalStateException and leave
  the channel in the original state.
* Fix SOE in select with "opposite channels" stress-test. The fix is
  based on the sequential numbering of atomic select operation.
  Deadlock is detected and the operation with the lower sequential
  number is aborted and restarted (with a larger number).

Fixes #504
Fixes #1411
elizarov added a commit that referenced this issue Aug 23, 2019
* onSend/onReceive clauses on the same channel: Instead of
  StackOverflowError we throw IllegalStateException and leave
  the channel in the original state.
* Fix SOE in select with "opposite channels" stress-test. The fix is
  based on the sequential numbering of atomic select operation.
  Deadlock is detected and the operation with the lower sequential
  number is aborted and restarted (with a larger number).

Fixes #504
Fixes #1411
elizarov added a commit that referenced this issue Aug 30, 2019
* onSend/onReceive clauses on the same channel: Instead of
  StackOverflowError we throw IllegalStateException and leave
  the channel in the original state.
* Fix SOE in select with "opposite channels" stress-test. The fix is
  based on the sequential numbering of atomic select operation.
  Deadlock is detected and the operation with the lower sequential
  number is aborted and restarted (with a larger number).

* TODO:NOTE: SelectDeadlockLFStressTest does not pass and is disabled.
  !!! DO NOT MERGE !!!

Fixes #504
Fixes #1411
elizarov added a commit that referenced this issue Sep 6, 2019
* onSend/onReceive clauses on the same channel: Instead of
  StackOverflowError we throw IllegalStateException and leave
  the channel in the original state.
* Fix SOE in select with "opposite channels" stress-test. The fix is
  based on the sequential numbering of atomic select operation.
  Deadlock is detected and the operation with the lower sequential
  number is aborted and restarted (with a larger number).

* TODO:NOTE: SelectDeadlockLFStressTest does not pass and is disabled.
  !!! DO NOT MERGE !!!

Fixes #504
Fixes #1411
elizarov added a commit that referenced this issue Sep 6, 2019
* onSend/onReceive clauses on the same channel: Instead of
  StackOverflowError we throw IllegalStateException and leave
  the channel in the original state.
* Fix SOE in select with "opposite channels" stress-test. The fix is
  based on the sequential numbering of atomic select operation.
  Deadlock is detected and the operation with the lower sequential
  number is aborted and restarted (with a larger number).

* TODO:NOTE: SelectDeadlockLFStressTest does not pass and is disabled.
  !!! DO NOT MERGE !!!

Fixes #504
Fixes #1411
elizarov added a commit that referenced this issue Sep 7, 2019
* onSend/onReceive clauses on the same channel: Instead of
  StackOverflowError we throw IllegalStateException and leave
  the channel in the original state.
* Fix SOE in select with "opposite channels" stress-test. The fix is
  based on the sequential numbering of atomic select operation.
  Deadlock is detected and the operation with the lower sequential
  number is aborted and restarted (with a larger number).

* TODO:NOTE: SelectDeadlockLFStressTest does not pass and is disabled.
  !!! DO NOT MERGE !!!

Fixes #504
Fixes #1411
elizarov added a commit that referenced this issue Sep 7, 2019
* onSend/onReceive clauses on the same channel: Instead of
  StackOverflowError we throw IllegalStateException and leave
  the channel in the original state.
* Fix SOE in select with "opposite channels" stress-test. The fix is
  based on the sequential numbering of atomic select operation.
  Deadlock is detected and the operation with the lower sequential
  number is aborted and restarted (with a larger number).

Fixes #504
Fixes #1411
elizarov added a commit that referenced this issue Sep 10, 2019
* onSend/onReceive clauses on the same channel: Instead of
  StackOverflowError we throw IllegalStateException and leave
  the channel in the original state.
* Fix SOE in select with "opposite channels" stress-test. The fix is
  based on the sequential numbering of atomic select operation.
  Deadlock is detected and the operation with the lower sequential
  number is aborted and restarted (with a larger number).

Fixes #504
Fixes #1411
elizarov added a commit that referenced this issue Sep 10, 2019
* onSend/onReceive clauses on the same channel: Instead of
  StackOverflowError we throw IllegalStateException and leave
  the channel in the original state.
* Fix SOE in select with "opposite channels" stress-test. The fix is
  based on the sequential numbering of atomic select operation.
  Deadlock is detected and the operation with the lower sequential
  number is aborted and restarted (with a larger number).

Fixes #504
Fixes #1411
elizarov added a commit that referenced this issue Sep 19, 2019
* onSend/onReceive clauses on the same channel: Instead of
  StackOverflowError we throw IllegalStateException and leave
  the channel in the original state.
* Fix SOE in select with "opposite channels" stress-test. The fix is
  based on the sequential numbering of atomic select operation.
  Deadlock is detected and the operation with the lower sequential
  number is aborted and restarted (with a larger number).

Fixes #504
Fixes #1411
elizarov added a commit that referenced this issue Sep 21, 2019
* onSend/onReceive clauses on the same channel: Instead of
  StackOverflowError we throw IllegalStateException and leave
  the channel in the original state.
* Fix SOE in select with "opposite channels" stress-test. The fix is
  based on the sequential numbering of atomic select operation.
  Deadlock is detected and the operation with the lower sequential
  number is aborted and restarted (with a larger number).

Fixes #504
Fixes #1411
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants