Skip to content

Add a documentation for JobSupport.addLastAtomic #3892

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 5 commits into from

Conversation

qwwdfsad
Copy link
Member

  • Document internal invariants
  • Add memory footprint test
  • This explanation can be used to reasonably review follow-up PR that gets rid of addLastIf

* Document internal invariants
* Add memory footprint test
* This explanation can be used to reasonably review follow-up PR that gets rid of addLastIf
* If it returns `false` then there is no children to wait for and it's safe to finalize the final job's state.
* Note that at this point, new children can appear:
* * For `cancelling` state they are immediately cancelled.
* * For `completing` state they are not cancelled and left hanging out to dry.
Copy link
Member Author

@qwwdfsad qwwdfsad Sep 19, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wanted to write something meaningful here, but failed to do so.

Here is a more formal description of what it means: #3893

@qwwdfsad
Copy link
Member Author

qwwdfsad commented Oct 17, 2023

@dkhalanskyjb while you are here, here is the rough outline of the migration plan that minimizes the impact on the codebase (e.g. we don't have to rewrite everything in JobSupport), and helps us to get rid of DCSS, pave a road towards concurrent linked list without doing a lot of redundant work:

  • Provide a concept of "closing" the list. Luckily, we already have that (addLastIfPrev in previous versions, currently removed), it's local (i.e. operation affects only the list itself and not the memory locations around) and battle-tested, this is basically how old channels were implemented
  • Finalization of the job state (figuring the precise definition is the trick here. TL;DR currently it's when DCSS fails) is now a multi-step operation:
    • Close the list with the new state (e.g. node Closed(newProposedState))
    • Update the state
    • Any observer of the list's tail (e.g. the caller of invokeOfCompletion) first helps with the state (list: [xs] :: Closed(newProposedState) -> newProposedState) and only then launches their operation
  • Now we can safely remove DCSS from the codebase and have a more "local" reasoning about job states.
  • Provide the same concept of closing to ConcurrentLinkedList and seamlessly migrate

The second step is the trickiest one and other is a chance it won't work out, so any critique is welcomed

Copy link
Collaborator

@dkhalanskyjb dkhalanskyjb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's also (re)move the // This is Harris's RDCSS (Restricted Double-Compare Single Swap) operation comment, as it's currently pointing to nothing.

Regarding the proposed migration path from DCSS, I think this would lead to the following non-linearizable behavior:

// THREAD A
job.complete()

// THREAD B
job.cancel()
job.isCancelled // true
Job(job)
job.children // doesn't have the new `Job`
job.isCompleted // false

How this could come to be:

  • Thread A wants to complete a job. It closes the list but stops before updating the state.
  • Thread B wants to cancel a job. It doesn't modify the list, because new children should still be admissible after cancellation, but does update the state. Then, it looks at the job's state and observes that, yes, it's cancelling. Then it attempts to add a new child, but can't, as the list is closed. Regardless of what it does when it observes this inconsistency, then, it observes that the state is not final yet.

Do you see any errors in this logic?

@qwwdfsad
Copy link
Member Author

@dkhalanskyjb I see, thanks for the trace.
It seems like the root of the problem is that we can update the state without actually touching the tail of the list (or without touching the list whatsoever), creating a race. Seems like concurrent helping has to be performed unconditionally

@dkhalanskyjb
Copy link
Collaborator

dkhalanskyjb commented Oct 20, 2023 via email

@dkhalanskyjb
Copy link
Collaborator

dkhalanskyjb commented Oct 23, 2023

There's a scheme for finalization I have far more faith in. It's similar to how DCSS behaves now but without the overly general consensus mechanism:

  • First, update the state to ListClosingInProgress(new state) (which is not Incomplete). The state is the sole source of truth; the other memory locations are (potentially outdated) reflections of it.
  • Anyone who sees this state (including the one who initially set it) must attempt to close the list (without any special markers).
  • Anyone who knows that the list is closed (including the one who initially closed it) must attempt to update the state to new state. The operation that succeeds is responsible for the finalization: notifying everyone on the list, etc.

In essence, ListClosingInProgress(new state) behaves exactly like new state, except it imposes some cleanup work on other operations.

If an operation successfully appends to the list, this must have happened before the finalization, as we have an event ordering updating the list -> closing the list -> sending out notifications. In particular, this approach would fix #3893: there's no way for notifications not to be sent to some newly added element.

@dkhalanskyjb dkhalanskyjb mentioned this pull request Dec 19, 2023
@qwwdfsad qwwdfsad closed this Mar 11, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants