🐛 Refresh token for provisioning machines #250
Conversation
Welcome @sethp-nr!
I'm not attached to the new 9-minute sync interval, btw – I was considering trying to get clever with the
controllers/token.go
Outdated
-	defaultTokenTTL = 10 * time.Minute
+var (
+	// DefaultTokenTTL is the amount of time a bootstrap token (and therefore a KubeadmConfig) will be valid
+	DefaultTokenTTL = 10 * time.Minute
I think bumping this to 15m and leaving the sync interval at 10m is fine
Yes, 9 minutes is too uneven for me as well 😂
I'd like to see a test case for this new behavior. Restating the behavior to make sure I'm reading it correctly:
The bootstrap token will be regenerated if the machine infrastructure is not ready
@@ -140,6 +140,23 @@ func (r *KubeadmConfigReconciler) Reconcile(req ctrl.Request) (_ ctrl.Result, re
	return ctrl.Result{}, err
}

// If we've already embedded a time-limited join token into a config, but are still waiting for the token to be used, refresh it
if config.Status.Ready {
I wonder if this is an appropriate time for a switch? It can be out of scope for this PR.
Something like this:
switch {
case machine.Spec.Bootstrap.Data != nil && !config.Status.Ready:
case config.Status.Ready && machine.Status.InfrastructureReady:
case config.Status.Ready && !machine.Status.InfrastructureReady:
default:
}
I find it easier to see edge cases and make sure every path is handled explicitly
/retitle 🐛 Refresh token for provisioning machines
@@ -30,8 +30,9 @@ import (
 	"sigs.k8s.io/controller-runtime/pkg/client"
 )

-const (
-	defaultTokenTTL = 10 * time.Minute
+var (
Should this still be a const rather than a var?
Oh, that was a half-completed thought to make it configurable via the CLI. I've finished it, but I have no objection to tearing it back out and making it a const again.
until we do make it configurable let's leave it as a const
oh can you make this change @sethp-nr?
@chuckha I added a test case for the new behavior. I've started trying to reconcile the current logic into discrete switch-shaped states; I'll let you know how it goes.
sounds great, no worries if it doesn't work out
I think it worked out: see 794be6e. I left in the "early out" bits at the top to avoid patching / lookup on the cluster when we don't need to, but everything else was easy enough. I'm not sure if it's exactly what you were thinking though
/test pull-cluster-api-bootstrap-provider-kubeadm-verify
@@ -160,7 +154,29 @@ func (r *KubeadmConfigReconciler) Reconcile(req ctrl.Request) (_ ctrl.Result, re
 		}
 	}()

-	if !cluster.Status.ControlPlaneInitialized {
+	switch {
I was looking more for a switch to organize the quick-to-return cases before generating the cloud-init
What do you think about restructuring this a little bit to include just a few cases that return early like:
case: is the cluster's infrastructure ready?
if it's not ready then we should return
case: is the config ready but the infrastructure not?
refresh the token and return
case: is the config ready and the infrastructure ready?
we're done; return
this switch wouldn't need a default case since it's encapsulating the idea "should we return before doing the work we want to be doing?". The default is no, we should not return (which is the same as doing nothing)
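As a minimal sketch of that shape (refreshBootstrapToken is a hypothetical helper standing in for the token-refresh logic in controllers/token.go; the status fields follow the types quoted in the diffs above):

switch {
// The cluster's infrastructure isn't ready yet, so there is nothing to join.
case !cluster.Status.InfrastructureReady:
	return ctrl.Result{}, nil
// The config was generated but the machine's infrastructure is still coming
// up: the embedded join token may expire before first use, so refresh it.
case config.Status.Ready && !machine.Status.InfrastructureReady:
	return r.refreshBootstrapToken(config, cluster)
// Both the config and the machine's infrastructure are ready: nothing to do.
case config.Status.Ready && machine.Status.InfrastructureReady:
	return ctrl.Result{}, nil
}
// No default case: falling through means "do the work", i.e. generate cloud-init.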
Oooh, yeah, I like that much more. I'll give it a shot.
/approve this looks great! want to give this a look @detiber?
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: chuckha, sethp-nr
The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing /approve in a comment.
		return ctrl.Result{}, nil
	// Ignore machines that already have bootstrap data we didn't generate
	case machine.Spec.Bootstrap.Data != nil && !config.Status.Ready:
		// TODO: mark the config as ready?
+1 to marking the Config as ready in this case. This will also help with rebuilding Status if lost.
If this provider wasn't used to generate the user data, we shouldn't set it as ready. There is only one case I can think of where this is appropriate: if the bootstrap config reference points to this kubeadm config object and the bootstrap.data field is already populated, which covers the loss of Status
If we have an ownerRef to a Machine, I'd assume that it would be because the Machine Controller set the ownerRef based on the bootstrap Config Reference, so us getting to this point should already mean that is the case.
I was more replying to the comment above the case condition, which seems misleading. +1 to what @detiber pointed out
Ah, that makes way more sense than what I was thinking (that case was a defense against a particular kind of user error). I'll change the comment / patch the "ready" status.
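Sketching the resolved case (the comment text is illustrative; the deferred patch earlier in Reconcile would persist the status change):

	// The machine already has bootstrap data and its owner ref points at this
	// config, so we must have generated it and then lost the Status: restore it.
	case machine.Spec.Bootstrap.Data != nil && !config.Status.Ready:
		config.Status.Ready = true
		return ctrl.Result{}, nil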
Force-pushed from 25767b3 to 8e94b30
If you don't want to make the var-to-const change that's ok, we can merge without it, but can you squash your commits down to 1?
Includes: Configurable token duration to extend the required sync interval when necessary. Also adds a requeueAfter parameter, because I realized while doing this work that we should get an immediate second crack at the config before it's consumed. Finally, reconciles status when the config has already been used. In order for us to get to that point the owner ref must be set by the machine controller, which implies that the machine has a pointer to this config object. So either it's an extremely unlikely user error, or we mistakenly dropped the "ready" flag for this config.
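The requeueAfter piece of that commit amounts to roughly this sketch (the interval is illustrative):

	// Ask controller-runtime to reconcile this config again before the token's
	// TTL lapses, so we get another look at it even without a watch event.
	return ctrl.Result{RequeueAfter: DefaultTokenTTL / 2}, nil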
Force-pushed from 20b41df to c562833
I squashed the commits. I didn't turn it back into a const because I made it configurable by CLI flag instead, so I could set it longer if we end up needing to push out the sync interval.
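That wiring could look roughly like this, assuming the standard flag package (the flag name, default, and package path are illustrative):

// Let operators extend the bootstrap token TTL from the command line if the
// sync interval ever needs to grow past the default.
flag.DurationVar(&controllers.DefaultTokenTTL, "bootstrap-token-ttl", 15*time.Minute,
	"amount of time a generated bootstrap token (and its KubeadmConfig) remains valid")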
/lgtm Thank you!
What this PR does / why we need it:
Refreshes the token for a machine that hasn't used it yet.
Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #248