This repository was archived by the owner on Jul 30, 2021. It is now read-only.

🐛 Support running alongside other Cluster API pods in the same namespace with leader election enabled #273

Merged
k8s-ci-robot merged 1 commit into kubernetes-retired:master from the 271_uniqueID branch
Oct 7, 2019

Conversation

noamran
Contributor

@noamran noamran commented Oct 4, 2019

What this PR does / why we need it:
Currently LeaderElectionID is set to the default, controller-leader-election-helper. When several Cluster API pods are deployed in the same namespace, they all contend for the same leader election lock. Setting a unique ID per service should resolve this.
Which issue(s) this PR fixes:
Fixes #271

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Oct 4, 2019
@k8s-ci-robot k8s-ci-robot requested review from chuckha and ncdc October 4, 2019 22:24
@k8s-ci-robot k8s-ci-robot added size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Oct 4, 2019
@k8s-ci-robot
Contributor

Hi @noamran. Thanks for your PR.

I'm waiting for a kubernetes-sigs or kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

main.go Outdated
@@ -103,6 +103,7 @@ func main() {
 		Scheme:             scheme,
 		MetricsBindAddress: metricsAddr,
 		LeaderElection:     enableLeaderElection,
+		LeaderElectionID:   "controller-leader-election-helper-cabpk",

Non-blocking but it would be nice to make this into a constant
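
A rough sketch of what that suggestion could look like. The `managerOptions` struct is a local stand-in for the `ctrl.Options` fields visible in the diff so that the example compiles without controller-runtime; the constant name `leaderElectionID` and the helper `newManagerOptions` are hypothetical, and the value is the one from the diff under review.

```go
package main

import "fmt"

// leaderElectionID is the literal from the diff, lifted into a constant.
// The constant name is an assumption made for this sketch.
const leaderElectionID = "controller-leader-election-helper-cabpk"

// managerOptions is a local stand-in for the ctrl.Options fields shown in
// the diff; it is not the controller-runtime type.
type managerOptions struct {
	MetricsBindAddress string
	LeaderElection     bool
	LeaderElectionID   string
}

// newManagerOptions (hypothetical helper) builds the options with the
// fixed, non-configurable leader election ID.
func newManagerOptions(metricsAddr string, enableLeaderElection bool) managerOptions {
	return managerOptions{
		MetricsBindAddress: metricsAddr,
		LeaderElection:     enableLeaderElection,
		LeaderElectionID:   leaderElectionID,
	}
}

func main() {
	fmt.Printf("%+v\n", newManagerOptions(":8080", true))
}
```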

@ncdc
Contributor

ncdc commented Oct 5, 2019

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Oct 5, 2019
Contributor

@vincepri vincepri left a comment

/approve

Do we want to make this configurable in the future?

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: noamran, vincepri

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Oct 5, 2019
@detiber
Contributor

detiber commented Oct 7, 2019

> Do we want to make this configurable in the future?

I don't believe so, since it is scoped to a namespace (we can set ctrl.Options.LeaderElectionNamespace when watchNamespace != ""). Otherwise we are setting users up to deploy competing controllers.

It looks like we can just set ctrl.Options.LeaderElectionNamespace to watchNamespace, since controller-runtime handles defaulting if it is "": https://github.com/kubernetes-sigs/controller-runtime/blob/master/pkg/leaderelection/leader_election.go#L61-L67
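
The defaulting behavior mentioned above can be sketched roughly as follows. This is a simplified model written for illustration, not code copied from controller-runtime: `resolveLeaderElectionNamespace`, `errNotInCluster`, and the injected `inCluster` lookup are hypothetical names, and the namespaces are made up.

```go
package main

import (
	"errors"
	"fmt"
)

// errNotInCluster models a failed in-cluster namespace lookup.
var errNotInCluster = errors.New("not running in-cluster")

// resolveLeaderElectionNamespace returns configured when it is non-empty and
// otherwise falls back to the namespace reported by inCluster.
func resolveLeaderElectionNamespace(configured string, inCluster func() (string, error)) (string, error) {
	if configured != "" {
		return configured, nil
	}
	ns, err := inCluster()
	if err != nil {
		return "", fmt.Errorf("unable to find leader election namespace: %w", err)
	}
	return ns, nil
}

func main() {
	podNamespace := func() (string, error) { return "capi-system", nil }
	outsideCluster := func() (string, error) { return "", errNotInCluster }

	// watchNamespace set: the lock lives in the watched namespace.
	ns, err := resolveLeaderElectionNamespace("team-a", podNamespace)
	fmt.Println(ns, err)

	// watchNamespace empty: fall back to the pod's own namespace.
	ns, err = resolveLeaderElectionNamespace("", podNamespace)
	fmt.Println(ns, err)

	// watchNamespace empty and no in-cluster namespace available.
	ns, err = resolveLeaderElectionNamespace("", outsideCluster)
	fmt.Println(ns, err)
}
```

Passing watchNamespace straight through therefore only changes where the lock lives when the user has restricted the controller to a single namespace.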

@ncdc
Contributor

ncdc commented Oct 7, 2019

Do you think it makes more sense to match the configmap's namespace to the pod's namespace or the watchNamespace? I guess I would expect it to match the pod.

@detiber
Contributor

detiber commented Oct 7, 2019

> Do you think it makes more sense to match the configmap's namespace to the pod's namespace or the watchNamespace? I guess I would expect it to match the pod.

Maybe... This would be the default semantics today based on using the in cluster client for getting the Namespace.

@detiber
Contributor

detiber commented Oct 7, 2019

Using watchNamespace is nice, since we know the user is specifically limiting the watchNamespace to a single Namespace, but it also then possibly exposes the leader election configmap to users, which might not be desirable...

That said, I wouldn't expect users to override the namespace for the Deployment unless they knew they wanted to run multiple controllers scoped down to a single namespace...

@ncdc
Contributor

ncdc commented Oct 7, 2019

I would recommend (ordered by preference, high to low):

  1. Match pod namespace, don't expose --leader-election-namespace
  2. Match pod namespace, expose --leader-election-namespace to allow the user to customize
  3. Match watchNamespace, don't expose --leader-election-namespace
  4. Match watchNamespace, expose --leader-election-namespace to allow the user to customize

And 💯 that users 99.999% of the time don't need to know about the leader election namespace, or need to customize it.

@ncdc ncdc added this to the v0.1.x milestone Oct 7, 2019
@detiber
Contributor

detiber commented Oct 7, 2019

> Match pod namespace, don't expose --leader-election-namespace
> Match pod namespace, expose --leader-election-namespace to allow the user to customize

This exposes a similar footgun to overriding leaderElectionID

> Match watchNamespace, don't expose --leader-election-namespace
> Match watchNamespace, expose --leader-election-namespace to allow the user to customize

This exposes a similar footgun to overriding leaderElectionID

Another question I just thought of: Do we ever think it should be possible to deploy multiple CABPK (or any Cluster API controller) instances in the same namespace with different watchNamespaces? If so, maybe it would make sense to alter leaderElectionID to include the watchNamespace if set?

@vincepri
Contributor

vincepri commented Oct 7, 2019

Should we move this conversation to an issue? I'd propose to merge this change as-is given that we're planning to release CABPK today, and explore different options (as outlined above) in a new issue, possibly making it consistent for all of CAP* projects.

How does that sound?

@ncdc
Contributor

ncdc commented Oct 7, 2019

Yeah, I don't think we should expose any of this as flags.

I think including the watchNamespace in the name of the configmap could be a slippery slope - what if the watchNamespace's length is 250 characters (with the max being 253)? Then we can't have both controller-leader-election-cabpk and the watchNamespace in the name...

+1 to merging as-is (pending @detiber's suggestion to remove helper). Note, we'd have to think very carefully before we change the naming convention in this minor version (upgrade impact).
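
The length concern can be made concrete with a small sketch. It assumes the 253-character limit quoted above and a hypothetical scheme that joins the base ID and the watchNamespace with a hyphen; `leaderElectionIDFor`, `baseID`, and `maxNameLen` are names invented for this example.

```go
package main

import (
	"fmt"
	"strings"
)

const (
	maxNameLen = 253 // limit quoted in the comment above
	baseID     = "controller-leader-election-cabpk"
)

// leaderElectionIDFor appends watchNamespace to baseID (hypothetical naming
// scheme) and rejects results that exceed maxNameLen.
func leaderElectionIDFor(watchNamespace string) (string, error) {
	if watchNamespace == "" {
		return baseID, nil
	}
	id := baseID + "-" + watchNamespace
	if len(id) > maxNameLen {
		return "", fmt.Errorf("leader election ID would be %d characters, limit is %d", len(id), maxNameLen)
	}
	return id, nil
}

func main() {
	for _, ns := range []string{"", "team-a", strings.Repeat("n", 250)} {
		id, err := leaderElectionIDFor(ns)
		fmt.Printf("watchNamespace length %d: id=%q err=%v\n", len(ns), id, err)
	}
}
```

Any scheme of this kind needs a fallback such as truncation or hashing for long namespaces, and that fallback would then become part of the naming convention that upgrades have to preserve.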

@detiber
Contributor

detiber commented Oct 7, 2019

So, the tricky part is that any change to the leaderElectionID and/or leaderElectionNamespace is a dangerous change... The release note should include:

Before upgrading the controller deployment, first scale to 0, then apply the updated manifest, and then scale back to 1.

Otherwise there will be competing controllers using different resource locks during the course of the deployment.

I think we should be careful to limit these types of disruptive changes. If we are going to release today, then we should likely not include this change, or we should be good with the naming/namespacing convention we choose for at least a few releases.
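
A toy model of why the scale-to-0 step matters, using only the standard library. The `pod` type and `activeLeaders` function are hypothetical and only capture one idea: replicas that compete for different lock IDs do not exclude each other. The pod names are made up; the two IDs are the old default and the value from the diff.

```go
package main

import "fmt"

// pod is a toy model of one controller replica and the lock it competes for.
type pod struct {
	name   string
	lockID string
}

// activeLeaders returns the pods that would lead at the same time: the first
// pod seen for each distinct lock ID wins that lock.
func activeLeaders(running []pod) []string {
	seen := map[string]bool{}
	var leaders []string
	for _, p := range running {
		if !seen[p.lockID] {
			seen[p.lockID] = true
			leaders = append(leaders, p.name)
		}
	}
	return leaders
}

func main() {
	oldPod := pod{name: "cabpk-old", lockID: "controller-leader-election-helper"}
	newPod := pod{name: "cabpk-new", lockID: "controller-leader-election-helper-cabpk"}

	// Rolling update: old and new replicas overlap and both become leader.
	fmt.Println(activeLeaders([]pod{oldPod, newPod}))

	// Scale to 0 first, then apply the new manifest: a single leader.
	fmt.Println(activeLeaders([]pod{newPod}))
}
```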

@ncdc
Contributor

ncdc commented Oct 7, 2019

@noamran FYI, the title of the PR is used in the release notes. I would recommend retitling to something like "Support running alongside other Cluster API pods in the same namespace with leader election enabled"

@noamran noamran changed the title from "🐛 Hard coding the LeaderElectionID" to "🐛 Support running alongside other Cluster API pods in the same namespace with leader election enabled" Oct 7, 2019
@ncdc
Contributor

ncdc commented Oct 7, 2019

LGTM. Could you please squash down to 1 commit?

Co-Authored-By: Jason DeTiberus <[email protected]>
@ncdc
Contributor

ncdc commented Oct 7, 2019

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Oct 7, 2019
@k8s-ci-robot k8s-ci-robot merged commit 175bf91 into kubernetes-retired:master Oct 7, 2019
@noamran noamran deleted the 271_uniqueID branch October 9, 2019 18:43
Development

Successfully merging this pull request may close these issues.

CABPK controller contending lock with CAPI
6 participants