This repository was archived by the owner on Jul 30, 2021. It is now read-only.

🐛 Refresh token for provisioning machines #250

Merged

Conversation

sethp-nr
Contributor

What this PR does / why we need it:

Refreshes the bootstrap token for a machine that hasn't used it yet.

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #248

@k8s-ci-robot
Contributor

Welcome @sethp-nr!

It looks like this is your first PR to kubernetes-sigs/cluster-api-bootstrap-provider-kubeadm 🎉. Please refer to our pull request process documentation to help your PR have a smooth ride to approval.

You will be prompted by a bot to use commands during the review process. Do not be afraid to follow the prompts! It is okay to experiment. Here is the bot commands documentation.

You can also check if kubernetes-sigs/cluster-api-bootstrap-provider-kubeadm has its own contribution guidelines.

You may want to refer to our testing guide if you run into trouble with your tests not passing.

If you are having difficulty getting your pull request seen, please follow the recommended escalation practices. Also, for tips and tricks in the contribution process you may want to read the Kubernetes contributor cheat sheet. We want to make sure your contribution gets all the attention it needs!

Thank you, and welcome to Kubernetes. 😃

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Sep 23, 2019
@k8s-ci-robot k8s-ci-robot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label Sep 23, 2019
@sethp-nr
Contributor Author

I'm not attached to the new 9-minute sync interval, btw – I was considering trying to get clever with the requeueAfter duration for newly-minted tokens, but this was the simplest thing I could think of to do.

-	defaultTokenTTL = 10 * time.Minute
+var (
+	// DefaultTokenTTL is the amount of time a bootstrap token (and therefore a KubeadmConfig) will be valid
+	DefaultTokenTTL = 10 * time.Minute
Contributor

@chuckha chuckha Sep 24, 2019

I think bumping this to 15m and leaving the sync interval at 10m is fine

Contributor

Yes, 9 minutes is too uneven for me as well 😂
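
The constraint behind this exchange is that a token refreshed on every sync only stays valid until the next sync if its TTL is longer than the sync interval. A minimal sketch of that check, assuming a hypothetical helper `tokenOutlivesSync` that is not part of the PR:

```go
package main

import (
	"fmt"
	"time"
)

// tokenOutlivesSync reports whether a token minted at one sync is still
// valid when the next sync runs. Hypothetical helper, not from the PR.
func tokenOutlivesSync(tokenTTL, syncInterval time.Duration) bool {
	return tokenTTL > syncInterval
}

func main() {
	// Values from the discussion: the original 10m TTL with a 10m sync,
	// the 9m sync proposed by the author, and the 15m TTL suggested in review.
	fmt.Println(tokenOutlivesSync(10*time.Minute, 10*time.Minute)) // false
	fmt.Println(tokenOutlivesSync(10*time.Minute, 9*time.Minute))  // true
	fmt.Println(tokenOutlivesSync(15*time.Minute, 10*time.Minute)) // true
}
```

Both proposals satisfy the check; the 15m TTL simply leaves a larger margin than the 9m sync interval does.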

Contributor

@chuckha chuckha left a comment

I'd like to see a test case for this new behavior. Restating it to make sure I'm reading it correctly:

The bootstrap token will be regenerated if the machine infrastructure is not ready

@@ -140,6 +140,23 @@ func (r *KubeadmConfigReconciler) Reconcile(req ctrl.Request) (_ ctrl.Result, re
return ctrl.Result{}, err
}

// If we've already embedded a time-limited join token into a config, but are still waiting for the token to be used, refresh it
if config.Status.Ready {
Contributor

I wonder if this is an appropriate time for a switch? It can be out of scope for this PR.

Something like this:

switch {
  case machine.Spec.Bootstrap.Data != nil && !config.Status.Ready:
  case config.Status.Ready && machine.Status.InfrastructureReady:
  case config.Status.Ready && !machine.Status.InfrastructureReady:
  default:
}

I find it easier to see edge cases and make sure every path is handled explicitly
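
To illustrate the point about seeing every path, here is a rough, self-contained sketch of the suggested switch. Plain booleans stand in for `machine.Spec.Bootstrap.Data != nil`, `config.Status.Ready`, and `machine.Status.InfrastructureReady`; the `reconcileState` type, its constants, and `classify` are illustrative names, not code from the PR.

```go
package main

import "fmt"

// reconcileState names the four branches of the suggested switch.
// These names are illustrative only.
type reconcileState string

const (
	stateForeignData  reconcileState = "bootstrap data set, config not ready"
	stateDone         reconcileState = "config ready, infrastructure ready"
	stateRefreshToken reconcileState = "config ready, infrastructure not ready"
	stateGenerate     reconcileState = "default"
)

// classify follows the case order of the suggested switch.
func classify(hasBootstrapData, configReady, infraReady bool) reconcileState {
	switch {
	case hasBootstrapData && !configReady:
		return stateForeignData
	case configReady && infraReady:
		return stateDone
	case configReady && !infraReady:
		return stateRefreshToken
	default:
		return stateGenerate
	}
}

func main() {
	// Enumerate all eight input combinations to see which branch each takes.
	bools := []bool{false, true}
	for _, data := range bools {
		for _, ready := range bools {
			for _, infra := range bools {
				fmt.Printf("data=%t configReady=%t infraReady=%t -> %s\n",
					data, ready, infra, classify(data, ready, infra))
			}
		}
	}
}
```

Printing the full table makes it easy to confirm that only the combinations with no bootstrap data and an unready config reach the default branch.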

@SataQiu
Contributor

SataQiu commented Sep 25, 2019

/retitle 🐛 Refresh token for provisioning machines

@k8s-ci-robot k8s-ci-robot changed the title 🐛 refresh token for provisioning machines 🐛 Refresh token for provisioning machines Sep 25, 2019
@@ -30,8 +30,9 @@ import (
"sigs.k8s.io/controller-runtime/pkg/client"
)

-const (
-	defaultTokenTTL = 10 * time.Minute
+var (
Contributor

Should this still be a const rather than a var?

Contributor Author

Oh, that was a half-completed thought to make it configurable via the CLI. I've finished it, but I have no objection to tearing it back out and making it a const again.

Contributor

until we do make it configurable let's leave it as a const

Contributor

oh can you make this change @sethp-nr ?

@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Sep 27, 2019
@sethp-nr
Contributor Author

@chuckha I added a test case for the new behavior. I've started trying to reconcile the current logic into discrete switch-shaped states; I'll let you know how it goes.

@chuckha
Contributor

chuckha commented Sep 27, 2019

sounds great, no worries if it doesn't work out

@sethp-nr
Contributor Author

I think it worked out: see 794be6e. I left in the "early out" bits at the top to avoid patching / lookup on the cluster when we don't need to, but everything else was easy enough. I'm not sure if it's exactly what you were thinking, though.

@sethp-nr
Contributor Author

/test pull-cluster-api-bootstrap-provider-kubeadm-verify

@@ -160,7 +154,29 @@ func (r *KubeadmConfigReconciler) Reconcile(req ctrl.Request) (_ ctrl.Result, re
}
}()

if !cluster.Status.ControlPlaneInitialized {
switch {
Contributor

@chuckha chuckha Sep 30, 2019

I was looking more for a switch to organize the quick-to-return cases before generating the cloud-init

What do you think about restructuring this a little bit to include just a few cases that return early like:

case: is the cluster's infrastructure ready?
    if it's not ready then we should return
case: is the config ready but the infrastructure not?
    refresh the token and return
case: is the config ready and the infrastructure ready?
    we're done; return

This switch wouldn't need a default case, since it encapsulates the question "should we return before doing the work we want to be doing?" The default answer is no, we should not return, which is the same as doing nothing.
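
The three early-return cases above could look roughly like the sketch below. It is a simplified stand-in rather than the reconciler itself: `earlyReturn`, its boolean parameters, and the `refreshToken` callback are hypothetical names chosen for illustration.

```go
package main

import "fmt"

// earlyReturn decides whether reconciliation can stop before any bootstrap
// data is generated. It returns a description of what happened and true when
// the caller should return. All names here are hypothetical.
func earlyReturn(clusterInfraReady, configReady, machineInfraReady bool, refreshToken func()) (string, bool) {
	switch {
	case !clusterInfraReady:
		// Nothing can be generated until the cluster infrastructure exists.
		return "waiting for cluster infrastructure", true
	case configReady && !machineInfraReady:
		// The token has been issued but not consumed yet: keep it alive.
		refreshToken()
		return "refreshed token", true
	case configReady && machineInfraReady:
		return "nothing to do", true
	}
	// No default case: falling out of the switch means "do not return early".
	return "", false
}

func main() {
	refreshes := 0
	refresh := func() { refreshes++ }

	action, done := earlyReturn(true, true, false, refresh)
	fmt.Println(action, done, refreshes)

	action, done = earlyReturn(true, false, false, refresh)
	fmt.Println(action, done, refreshes)
}
```

Leaving out the default branch keeps the "carry on with the real work" path outside the switch, which matches the intent described in the comment.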

Contributor Author

Oooh, yeah, I like that much more. I'll give it a shot.

@chuckha
Contributor

chuckha commented Sep 30, 2019

/approve

this looks great!

want to give this a look @detiber?

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: chuckha, sethp-nr

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Sep 30, 2019
return ctrl.Result{}, nil
// Ignore machines that already have bootstrap data we didn't generate
case machine.Spec.Bootstrap.Data != nil && !config.Status.Ready:
// TODO: mark the config as ready?
Contributor

+1 to marking the Config as ready in this case. This will also help with rebuilding Status if lost.

Contributor

@vincepri vincepri Sep 30, 2019

If this provider wasn't used to generate the user data, we shouldn't set it as ready. There is only one case I can think of where this is appropriate: the bootstrap config reference points to this kubeadm config object and the bootstrap.data field is already populated, which covers the loss of Status.

Contributor

If we have an ownerRef to a Machine, I'd assume that it would be because the Machine Controller set the ownerRef based on the bootstrap Config Reference, so us getting to this point should already mean that is the case.

Contributor

I was more replying to the comment above the case condition, which seems misleading. +1 to what @detiber pointed out

Contributor Author

Ah, that makes way more sense than what I was thinking (that case was a defense against a particular kind of user error). I'll change the comment / patch the "ready" status.

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Oct 1, 2019
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Oct 1, 2019
@chuckha
Contributor

chuckha commented Oct 2, 2019

If you don't want to make the var-to-const change, that's OK; we can merge without it. But can you squash your commits down to 1?

Includes:

Configurable token duration to extend the required sync interval when
necessary.

Also adds a requeueAfter parameter, because I realized while doing this
work that we should get an immediate second crack at the config before
it's consumed.

Finally, reconciles status when the config has already been used. In
order for us to get to that point the owner ref must be set by the
machine controller, which implies that the machine has a pointer to this
config object. So either it's an extremely unlikely user error, or we
mistakenly dropped the "ready" flag for this config.
@sethp-nr
Contributor Author

sethp-nr commented Oct 2, 2019

I squashed the commits. I didn't turn it back into a const because I made it configurable by CLI flag instead, so I can set it to a longer value if we end up needing to push out the sync interval.
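
For readers wondering what a CLI-configurable token TTL might look like, here is a minimal sketch using only the standard `flag` package. The flag name `token-ttl`, the `parseTokenTTL` helper, and the 15m default (the value suggested in review) are assumptions for illustration; the thread does not show the actual flag wiring.

```go
package main

import (
	"flag"
	"fmt"
	"time"
)

// defaultTokenTTL uses the 15m value suggested in review; the default that
// was actually merged is not shown in this thread.
const defaultTokenTTL = 15 * time.Minute

// parseTokenTTL reads a hypothetical --token-ttl flag from args and falls
// back to defaultTokenTTL when the flag is absent.
func parseTokenTTL(args []string) (time.Duration, error) {
	fs := flag.NewFlagSet("manager", flag.ContinueOnError)
	ttl := fs.Duration("token-ttl", defaultTokenTTL, "how long a bootstrap token stays valid")
	if err := fs.Parse(args); err != nil {
		return 0, err
	}
	return *ttl, nil
}

func main() {
	ttl, err := parseTokenTTL([]string{"--token-ttl=30m"})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(ttl) // 30m0s
}
```

A flag like this would let an operator lengthen the TTL alongside a longer sync interval without rebuilding the binary.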

@chuckha
Contributor

chuckha commented Oct 2, 2019

/lgtm

Thank you!

Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm Indicates that a PR is ready to be merged. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

bootstrap token can be expired even before infra exists
6 participants