Skip to content

Leader Election #523 #538

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 3 commits into from
Closed

Leader Election #523 #538

wants to merge 3 commits into from

Conversation

atttx123
Copy link

leader election #523
@yue9944882 @brendandburns

by the way, i'm not sure how to write the unit test code for the leader election.

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: atttx123
To fully approve this pull request, please assign additional approvers.
We suggest the following additional approver: mbohlool

If they are not already assigned, you can assign the PR to them by writing /assign @mbohlool in a comment when ready.

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot
Copy link
Contributor

Thanks for your pull request. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

📝 Please follow instructions at https://git.k8s.io/community/CLA.md#the-contributor-license-agreement to sign the CLA.

It may take a couple minutes for the CLA signature to be fully registered; after that, please reply here with a new comment and we'll verify. Thanks.


  • If you've already signed a CLA, it's possible we don't have your GitHub username or you're using a different email address. Check your existing CLA data and verify that your email is set on your git commits.
  • If you signed the CLA as a corporation, please sign in with your organization's credentials at https://identity.linuxfoundation.org/projects/cncf to be authorized.
  • If you have done the above and are still having issues with the CLA being reported as unsigned, please email the CNCF helpdesk: [email protected]

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@k8s-ci-robot k8s-ci-robot added cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Mar 27, 2019
Copy link
Member

@yue9944882 yue9944882 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thank you for contributing this!

i left some comments and will try running/testing the implementation later in my local environment

/**
* EventRecorder is optional.
*/
private EventRecorder eventRecorder;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove this. we should design/implement event utilities in another pull. i updated the issue to reflect the task.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

EventRecorderis just a interface for ResourceLock to record the event,
client-go/leaderelection/resroucelock/interface.go,

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

don't design it if it doesn't intend to use it

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, i will delete this interface.

}

private String getRootUrl() {
return String.format("/api/%s/namespaces/%s/%s",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we're introducing coordination api in the future releases, so it doesn't have to be a legacy/api path. dealing w/ format printing url here doesn't make sense. better make it an interface or lambda returning an (generic?) object

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea, i will change this and add more unit test, it will take a time to complete

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there is a not good design in k8s-java-client, because all k8s generated resource type do not have an common parent class. if ConfigMap and Endpoint extends HashedMetaObject, then this interface will be more meaningful.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if ConfigMap and Endpoint extends HashedMetaObject, then this interface will be more meaningful.

we have limitted kinds of resources for leader-electing. i presume just multiple independant resource lock classes is fine. separated class like ConfigMapResourceLock, EndpointsResourceLock, CoordinationResourceLock..

@brendandburns for thoughts

this.name = name;
this.namespace = namespace;
this.client = client;
this.apiVersion = apiVersion;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we don't have a second version for /api since that it's a legacy path

try {
leaderElector.run();
} finally {
leaderElector.cancel();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry to be dense. what's the point of cancelling if the elector.run already returned?

Copy link
Author

@atttx123 atttx123 Mar 28, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i just fellow the logic in leaderelection.go, the Run function behave like this:

  • block the call thread
  • if itself get the leader lock, run a goroutine for OnStartedLeading
  • if others get the leader lock, this lock will fail into a while sleep loop, unit itself get the leader lock
    so, the leaderElector.run() will block the thread and will not return, so the cancel() function is designed for elegant exit.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cancel here seems supposed to called in another thread. what'll read the boolean if run already exit?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes,cancel() set an AtomicBoolean like a stop channel in golang

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

while the example suggests that "we're supposed to call cancel after the elector exited". i'm confused about the order.

import java.util.concurrent.TimeUnit;

/**
* Created by IDEA on 2019-03-27 16:06
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: minimize docs

@yue9944882
Copy link
Member

and btw, cncf seems unhappy

/check-cla

@atttx123
Copy link
Author

I signed it

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. and removed cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. labels Mar 28, 2019
@atttx123
Copy link
Author

@yue9944882
sorry to be dense. how can i solve the Non complying file error ?

Copy link
Member

@yue9944882 yue9944882 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry to be dense. how can i solve the Non complying file error ?

it's the formatter unhappy. https://github.com/coveo/fmt-maven-plugin#command-line

private String renewTime;
private int leaderTransitions = 0;

public LeaderElectionRecord() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove this

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LeaderElectionRecord means record saved in crd.metadata.annotations.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please remove the unused non-param constructor

try {
leaderElector.run();
} finally {
leaderElector.cancel();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

while the example suggests that "we're supposed to call cancel after the elector exited". i'm confused about the order.

/**
* LeaderElector is a leader election client.
*/
public class LeaderElector implements Runnable {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what's the point of implementing runnable?

if (!acquire(cancel)) {
return;
}
Runnable runnable = () -> config.getCallbacks().onStartedLeading();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: make it an anonymous lambda and fold it into the following line

return;
}
reportedLeader = observedRecord.getHolderIdentity();
Callable<Boolean> callable = () -> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: make it an anonymous lambda and fold it into the following line

this comment also applies here


private AtomicBoolean cancel = new AtomicBoolean(false);

private ExecutorService executor;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

makes it a ScheduledExecutorService

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

additionally, i think we should split it into two separate pools, one for running lease and one for hooks. the latter one should be a fixed pool at size 1.

* LeaseDuration is the duration that non-leader candidates will wait to force acquire leadership.
* This is measured against time of last observed ack.
*/
private long leaseDuration;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

java.time.Duration looks better?

}
return done;
};
executor.invokeAll(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
executor.invokeAll(
executor.scheduleAtFixedRate(() -> {
// ... tryAcquireOrRenew
}, 0, getRetryPeriod().toMillis(), TimeUnit.MILLISECONDS)

Copy link
Author

@atttx123 atttx123 Apr 11, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

they are two different functions:

invokeAll(Collection<? extends Callable<T>> tasks, long timeout, TimeUnit unit)
scheduleAtFixedRate(Runnable command, long initialDelay, long period, TimeUnit unit)

the logic there:

  • the tryAcquireOrRenew function will tried every getRetryPeriod (example 5s), and if total cost up to getRenewDeadline (example 15s) will throw InterruptedException
  • this will avoid dead lock or Infinite waiting in logic code

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

isn't it better if we retry w/ the help of a scheduled thread pool?

* internal bookkeeping
*/
private LeaderElectionRecord observedRecord;
private Long observedTime;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: makes it either a Date or a long(built-in)

} catch (ResourceNotFoundException e) {
try {
config.getLock().create(leaderElectionRecord);
} catch (ApiException e1) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a better name for e1. we don't like to do extra reverse engineering work, do we?

@yue9944882
Copy link
Member

/close

closing in favor of #590

@k8s-ci-robot
Copy link
Contributor

@yue9944882: Closed this PR.

In response to this:

/close

closing in favor of #590

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants