-
Notifications
You must be signed in to change notification settings - Fork 3.3k
fix leaderelection: release lock when error occour and graceful shotd… #1747
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix leaderelection: release lock when error occour and graceful shotd… #1747
Conversation
Welcome @fangfenghuang! |
/assign @roycaihw |
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs. This bot triages issues and PRs according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community. /lifecycle stale |
/remove-lifecycle stale |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's unclear to me how the graceful shotdown works. Could you elaborate?
if lock_status: | ||
logging.info("release lock: {}".format(old_election_record.holder_identity)) | ||
old_election_record.lease_duration = 1 | ||
old_election_record.holder_identity = None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the lock may have been successfully acquired by another client. Doing so may force that client to lose the leadership. Why do we want to do this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this happens when lock has been acquired and exception occurs that lock cannot be normally release, if not release the lock here, the next leader election must wait for timeout
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i fix it just like client-go did: https://github.com/kubernetes/kubernetes/blob/master/staging/src/k8s.io/client-go/tools/leaderelection/leaderelection.go#L294
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. LGTM otherwise
if lock_status: | ||
logging.info("release lock: {}".format(old_election_record.holder_identity)) | ||
old_election_record.lease_duration = 1 | ||
old_election_record.holder_identity = None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please also set renew time and acquire time
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
def release(self): | ||
lock_status, old_election_record = self.election_config.lock.get(self.election_config.lock.name, | ||
self.election_config.lock.namespace) | ||
if lock_status: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let's check if the elector is the leader. We can short-circuit the release method if not
retry_period, | ||
onstarted_leading, | ||
onstopped_leading, | ||
release_on_cancel=False): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please document the caveat
nit: also "cancel" may not be the most accurate word here, since it was from the Golang context idea.
…own #1669 Signed-off-by fangfenghuang <[email protected]>
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: fangfenghuang The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs. This bot triages issues and PRs according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community. /lifecycle stale |
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs. This bot triages issues and PRs according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community. /lifecycle rotten |
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs. This bot triages PRs according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community. /close |
@k8s-triage-robot: Closed this PR. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
…own #1669
Signed-off-by fangfenghuang [email protected]
What type of PR is this?
/kind bug
What this PR does / why we need it:
release lock when error occour and graceful shotdown
Which issue(s) this PR fixes:
Fixes #1669
Special notes for your reviewer:
Does this PR introduce a user-facing change?
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.: