Conversation
wlan0 commented Mar 8, 2021
- backoff exponentially on errors
- ensure correct behavior when multiple operations on the same object pile on top of one another
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: wlan0. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing `/approve` in a comment.
// Ensure that multiple operations on different versions of the same object
// do not happen in parallel
c.OpLock(op)
defer c.OpUnlock(op)
c.OpLock(uuid)
Sorry, I am missing something here. How does tracking uuid prevent operating in parallel on different versions of the same object? Wouldn't this serialize on all versions of the object?
In our controller, we run a single shared informer across multiple CRD resource event handlers. The shared informer reports changes to bucket* objects as they occur. The events are processed by first encoding them in operation structs and putting them at the tail of a workqueue. The workqueue implementation is what this PR mostly touches.
The workqueue previously held operation objects (addOp, updateOp, deleteOp). On any event, these operation objects were put into the queue and waited there to be processed by the next available thread.
These objects were not comparable with one another, and that led to a few problems:
- Given an object, if an update occurs while an add is waiting in the queue, it should result in only one event: an add carrying the updated fields
- Given an object, if an update occurs while another update is waiting in the queue, only the second update should be processed
The previous implementation, albeit technically eventually consistent, could not guarantee the above behavior, simply because there was no way to detect new events for an object already present in the queue.
Now, for the `uuid`: this is a string field that is easily comparable (`==`), and by using the Kubernetes UID as the unique value in this field, it is possible to compare two events occurring on the same object.
This change also allowed me to improve the exponential backoff characteristics. If a new event occurs while an operation on the same object is in back-off, the new event will not be processed before the back-off period is complete, i.e. the uuid's position in the queue is not changed. Instead, the value of `opMap[uuid]` is set to the new operation structure denoting the event. By decoupling the operation itself from the queuing order, we gain a lot of flexibility.
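As a rough illustration of that pattern, a minimal sketch (the names opQueue, Enqueue, RequeueWithBackoff and the exact backoff math are simplified assumptions, not this PR's code):

```go
package controller

import (
	"sync"
	"time"

	"k8s.io/apimachinery/pkg/types"
)

// operation stands in for the controller's addOp/updateOp/deleteOp structs.
type operation interface{}

// opQueue orders UIDs only; the latest pending operation for each UID lives
// in opMap, so a newer event can replace an older one without changing its
// position in the queue or resetting an in-progress back-off.
type opQueue struct {
	mu      sync.Mutex
	order   []types.UID             // queue order: one entry per pending UID
	opMap   map[types.UID]operation // latest pending operation per UID
	backoff map[types.UID]time.Time // earliest time the UID may be retried
}

func newOpQueue() *opQueue {
	return &opQueue{
		opMap:   map[types.UID]operation{},
		backoff: map[types.UID]time.Time{},
	}
}

// Enqueue records the newest operation for uid. If the uid is already queued
// (or backing off), only opMap is updated; the queue position is kept.
func (q *opQueue) Enqueue(uid types.UID, op operation) {
	q.mu.Lock()
	defer q.mu.Unlock()
	if _, queued := q.opMap[uid]; !queued {
		q.order = append(q.order, uid)
	}
	q.opMap[uid] = op // coalesce: the newer event replaces the older one
}

// RequeueWithBackoff puts uid back after a failed attempt with an exponentially
// growing delay; events arriving meanwhile still only update opMap.
func (q *opQueue) RequeueWithBackoff(uid types.UID, op operation, attempt int) {
	q.mu.Lock()
	defer q.mu.Unlock()
	if attempt > 10 {
		attempt = 10 // cap the exponential growth
	}
	q.backoff[uid] = time.Now().Add(time.Duration(1<<uint(attempt)) * 100 * time.Millisecond)
	if _, queued := q.opMap[uid]; !queued {
		q.order = append(q.order, uid)
	}
	q.opMap[uid] = op
}

// Next pops the first UID whose back-off (if any) has expired and returns the
// most recent operation recorded for it.
func (q *opQueue) Next() (types.UID, operation, bool) {
	q.mu.Lock()
	defer q.mu.Unlock()
	for i, uid := range q.order {
		if until, ok := q.backoff[uid]; ok && time.Now().Before(until) {
			continue
		}
		q.order = append(q.order[:i], q.order[i+1:]...)
		op := q.opMap[uid]
		delete(q.opMap, uid)
		delete(q.backoff, uid)
		return uid, op, true
	}
	return "", nil, false
}
```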
Now to answer your direct question,
Sorry, I am missing something here. How does tracking uuid prevent operating in parallel on different versions of the same object? Wouldn't this serialize on all versions of the object?
By keying on the uuid, we do not serialize processing across versions. We either replace the old event with the new one if both are updates, or handle them differently based on the types of the events.
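Continuing the rough sketch above, the per-type merge could look something like this (the delete-wins rule is an assumption added for illustration, not taken from the PR):

```go
// addOp, updateOp and deleteOp are stand-ins for the controller's operation structs.
type addOp struct{ object interface{} }
type updateOp struct{ object interface{} }
type deleteOp struct{ object interface{} }

// coalesce decides what should remain in opMap[uuid] when a new event arrives
// for an object that already has a pending operation.
func coalesce(pending, incoming operation) operation {
	switch in := incoming.(type) {
	case updateOp:
		if add, ok := pending.(addOp); ok {
			// add + update: a single add carrying the updated fields
			add.object = in.object
			return add
		}
		// update + update: only the newest update needs to be processed
		return in
	case deleteOp:
		// assumption for illustration: a delete supersedes any pending operation
		return in
	default:
		return incoming
	}
}
```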
		o.Indexer.Delete(o.Object)
	} else {
		glog.Errorf("Error deleting %s %s: %v", ns, name, err)
	}
In the `default` block, can we insert the operator that was not handled, or is that logged earlier?
sure
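For illustration, the requested change could look something like this (the surrounding switch over operation types and the process function are assumptions based on the discussion above, not the file's actual code):

```go
import "github.com/golang/glog"

// process dispatches on the operation type; the default branch now logs the
// operation that was not handled.
func process(op operation) {
	switch o := op.(type) {
	case addOp:
		// handle add
	case updateOp:
		// handle update
	case deleteOp:
		// handle delete
	default:
		glog.Errorf("unhandled operation type: %T (%+v)", o, o)
	}
}
```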
func (c *ObjectStorageController) GetOpLock(op types.UID) *sync.Mutex {
	lockKey := op
	c.lockerLock.Lock()
	defer c.lockerLock.Unlock()
Should the `defer` be after L326 to handle the case of `c.locker == nil`?
Even though the docs say that deferred funcs can be called any time after the defer statement itself, my understanding is that a deferred function only runs when the surrounding function returns, i.e. only when the defer goes out of scope.
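To make that concrete, here is a sketch of a GetOpLock along those lines (only the two relevant fields are shown, and the lazy map initialization is an assumption): the deferred Unlock only fires when GetOpLock returns, so it still covers the nil-map check even though the defer comes first.

```go
package controller

import (
	"sync"

	"k8s.io/apimachinery/pkg/types"
)

type ObjectStorageController struct {
	lockerLock sync.Mutex
	locker     map[types.UID]*sync.Mutex
}

// GetOpLock returns the per-object mutex for op, creating the map and the
// entry lazily. The deferred Unlock runs only when GetOpLock returns, so it
// also protects the nil-map initialization below.
func (c *ObjectStorageController) GetOpLock(op types.UID) *sync.Mutex {
	lockKey := op

	c.lockerLock.Lock()
	defer c.lockerLock.Unlock()

	if c.locker == nil {
		c.locker = map[types.UID]*sync.Mutex{}
	}
	if _, ok := c.locker[lockKey]; !ok {
		c.locker[lockKey] = &sync.Mutex{}
	}
	return c.locker[lockKey]
}
```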
- backoff exponentially on errors - ensure correct behavior when multiple operations on the same object pile on top of one another
/lgtm
Fix context definition
remove opaque parameters from delete request and return bucket id o…