-
Notifications
You must be signed in to change notification settings - Fork 2k
Leader Election #523 #538
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Leader Election #523 #538
Conversation
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: atttx123 If they are not already assigned, you can assign the PR to them by writing The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
Thanks for your pull request. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA). 📝 Please follow instructions at https://git.k8s.io/community/CLA.md#the-contributor-license-agreement to sign the CLA. It may take a couple minutes for the CLA signature to be fully registered; after that, please reply here with a new comment and we'll verify. Thanks.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thank you for contributing this!
i left some comments and will try running/testing the implementation later in my local environment
/** | ||
* EventRecorder is optional. | ||
*/ | ||
private EventRecorder eventRecorder; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove this. we should design/implement event utilities in another pull. i updated the issue to reflect the task.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
EventRecorder
is just a interface for ResourceLock to record the event,
client-go/leaderelection/resroucelock/interface.go
,
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
don't design it if it doesn't intend to use it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, i will delete this interface.
} | ||
|
||
private String getRootUrl() { | ||
return String.format("/api/%s/namespaces/%s/%s", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we're introducing coordination
api in the future releases, so it doesn't have to be a legacy/api
path. dealing w/ format printing url here doesn't make sense. better make it an interface or lambda returning an (generic?) object
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good idea, i will change this and add more unit test, it will take a time to complete
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there is a not good design in k8s-java-client
, because all k8s generated resource type do not have an common parent class. if ConfigMap
and Endpoint
extends HashedMetaObject
, then this interface will be more meaningful.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if ConfigMap and Endpoint extends HashedMetaObject, then this interface will be more meaningful.
we have limitted kinds of resources for leader-electing. i presume just multiple independant resource lock classes is fine. separated class like ConfigMapResourceLock
, EndpointsResourceLock
, CoordinationResourceLock
..
@brendandburns for thoughts
this.name = name; | ||
this.namespace = namespace; | ||
this.client = client; | ||
this.apiVersion = apiVersion; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we don't have a second version for /api
since that it's a legacy path
try { | ||
leaderElector.run(); | ||
} finally { | ||
leaderElector.cancel(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sorry to be dense. what's the point of cancelling if the elector.run
already returned?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i just fellow the logic in leaderelection.go
, the Run
function behave like this:
- block the call thread
- if itself get the leader lock, run a goroutine for
OnStartedLeading
- if others get the leader lock, this lock will fail into a while sleep loop, unit itself get the leader lock
so, theleaderElector.run()
will block the thread and will not return, so thecancel()
function is designed for elegant exit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cancel
here seems supposed to called in another thread. what'll read the boolean if run
already exit?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes,cancel()
set an AtomicBoolean like a stop channel in golang
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
while the example suggests that "we're supposed to call cancel
after the elector exited". i'm confused about the order.
import java.util.concurrent.TimeUnit; | ||
|
||
/** | ||
* Created by IDEA on 2019-03-27 16:06 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: minimize docs
and btw, cncf seems unhappy /check-cla |
I signed it |
@yue9944882 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sorry to be dense. how can i solve the Non complying file error ?
it's the formatter unhappy. https://github.com/coveo/fmt-maven-plugin#command-line
private String renewTime; | ||
private int leaderTransitions = 0; | ||
|
||
public LeaderElectionRecord() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LeaderElectionRecord
means record saved in crd.metadata.annotations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please remove the unused non-param constructor
try { | ||
leaderElector.run(); | ||
} finally { | ||
leaderElector.cancel(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
while the example suggests that "we're supposed to call cancel
after the elector exited". i'm confused about the order.
/** | ||
* LeaderElector is a leader election client. | ||
*/ | ||
public class LeaderElector implements Runnable { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what's the point of implementing runnable?
if (!acquire(cancel)) { | ||
return; | ||
} | ||
Runnable runnable = () -> config.getCallbacks().onStartedLeading(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: make it an anonymous lambda and fold it into the following line
return; | ||
} | ||
reportedLeader = observedRecord.getHolderIdentity(); | ||
Callable<Boolean> callable = () -> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: make it an anonymous lambda and fold it into the following line
this comment also applies here
|
||
private AtomicBoolean cancel = new AtomicBoolean(false); | ||
|
||
private ExecutorService executor; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
makes it a ScheduledExecutorService
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
additionally, i think we should split it into two separate pools, one for running lease and one for hooks. the latter one should be a fixed pool at size 1.
* LeaseDuration is the duration that non-leader candidates will wait to force acquire leadership. | ||
* This is measured against time of last observed ack. | ||
*/ | ||
private long leaseDuration; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
java.time.Duration
looks better?
} | ||
return done; | ||
}; | ||
executor.invokeAll( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
executor.invokeAll( | |
executor.scheduleAtFixedRate(() -> { | |
// ... tryAcquireOrRenew | |
}, 0, getRetryPeriod().toMillis(), TimeUnit.MILLISECONDS) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
they are two different functions:
invokeAll(Collection<? extends Callable<T>> tasks, long timeout, TimeUnit unit)
scheduleAtFixedRate(Runnable command, long initialDelay, long period, TimeUnit unit)
the logic there:
- the
tryAcquireOrRenew
function will tried everygetRetryPeriod (example 5s)
, and if total cost up togetRenewDeadline (example 15s)
will throwInterruptedException
- this will avoid dead lock or Infinite waiting in logic code
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
isn't it better if we retry w/ the help of a scheduled thread pool?
* internal bookkeeping | ||
*/ | ||
private LeaderElectionRecord observedRecord; | ||
private Long observedTime; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: makes it either a Date
or a long
(built-in)
} catch (ResourceNotFoundException e) { | ||
try { | ||
config.getLock().create(leaderElectionRecord); | ||
} catch (ApiException e1) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
a better name for e1
. we don't like to do extra reverse engineering work, do we?
/close closing in favor of #590 |
@yue9944882: Closed this PR. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
leader election #523
@yue9944882 @brendandburns
by the way, i'm not sure how to write the unit test code for the leader election.