🐛 Fix nil pointer reference during bastion deletion #1231

Merged

mkjpryor

What this PR does / why we need it:
Fixes a nil pointer dereference that occurs when a cluster does not have a bastion configured.

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #1230

Special notes for your reviewer:

  1. Please confirm that if this PR changes any image versions, then that's the sole change this PR makes.

TODOs:

  • squashed commits
  • if necessary:
    • includes documentation
    • adds unit tests

/hold

@k8s-ci-robot added the do-not-merge/hold label (indicates that a PR should not merge because someone has issued a /hold command) on Apr 29, 2022
@netlify

netlify bot commented Apr 29, 2022

Deploy Preview for kubernetes-sigs-cluster-api-openstack ready!

Name Link
🔨 Latest commit 7715d0c
🔍 Latest deploy log https://app.netlify.com/sites/kubernetes-sigs-cluster-api-openstack/deploys/626bdeb6395e9b0007c42c60
😎 Deploy Preview https://deploy-preview-1231--kubernetes-sigs-cluster-api-openstack.netlify.app

@k8s-ci-robot added the cncf-cla: yes label (indicates the PR's author has signed the CNCF CLA) and the needs-ok-to-test label (indicates a PR that requires an org member to verify it is safe to test) on Apr 29, 2022
@k8s-ci-robot
Contributor

Hi @mkjpryor. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot added the size/S label (denotes a PR that changes 10-29 lines, ignoring generated files) on Apr 29, 2022
@mkjpryor
Author

This is currently blocking me from upgrading to the v1alpha5 resources. I'd appreciate it if this could be rolled into a 0.6.2 release ASAP.

@mdbooth
Contributor

mdbooth commented Apr 29, 2022

/ok-to-test

@k8s-ci-robot added the ok-to-test label (indicates a non-member PR verified by an org member that is safe to test) and removed the needs-ok-to-test label (indicates a PR that requires an org member to verify it is safe to test) on Apr 29, 2022
@seanschneeweiss
Contributor

/ok-to-test

@mkjpryor
Author

mkjpryor commented Apr 29, 2022

@mdbooth

Error: UPGRADE FAILED: cannot patch "matt-dev" with kind OpenStackCluster: admission webhook "validation.openstackcluster.infrastructure.cluster.x-k8s.io" denied the request: OpenStackCluster.infrastructure.cluster.x-k8s.io "matt-dev" is invalid: spec: Forbidden: cannot be modified

So once openStackCluster.Spec.Bastion is set, it cannot then become nil again, and vice-versa by the looks of it.

@mkjpryor
Author

So the edge case I think you were worried about is where openStackCluster.Spec.Bastion is set but instance creation fails, so instanceStatus is nil, right? That would be a problem with my patch.

However, given that the spec is immutable, it seems we can simply skip reconciling the bastion when openStackCluster.Spec.Bastion is nil. Or does that sound dodgy?

@mkjpryor
Author

mkjpryor commented Apr 29, 2022

However the case where openStackCluster.Spec.Bastion.Enabled changes between true and false (which is allowed to happen) would still be fine, because the gate for bastion reconciliation would be openStackCluster.Spec.Bastion != nil.
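
For illustration, the gate described above might look roughly like the following. This is a minimal sketch, not the code in this PR: the `Bastion` and `OpenStackClusterSpec` types are trimmed-down stand-ins, and `bastionAction` and `decideBastionAction` are hypothetical names. Treating "present but disabled" as a delete is also an assumption.

```go
package main

// Bastion and OpenStackClusterSpec are hypothetical, trimmed-down stand-ins
// for the provider's API types; only the fields discussed here are kept.
type Bastion struct {
	Enabled bool
}

type OpenStackClusterSpec struct {
	Bastion *Bastion // nil when the cluster has no bastion block at all
}

// bastionAction is a hypothetical enum naming what one reconcile pass should do.
type bastionAction int

const (
	bastionSkip   bastionAction = iota // no bastion block: do not touch bastion resources
	bastionEnsure                      // bastion block present and enabled
	bastionDelete                      // bastion block present but disabled (assumed behaviour)
)

// decideBastionAction checks for nil before reading Enabled, so a cluster
// without a bastion block never dereferences a nil pointer.
func decideBastionAction(spec OpenStackClusterSpec) bastionAction {
	if spec.Bastion == nil {
		return bastionSkip
	}
	if spec.Bastion.Enabled {
		return bastionEnsure
	}
	return bastionDelete
}
```

With this shape, toggling `Enabled` between true and false still reaches the reconcile or delete path, while a nil `Bastion` bypasses both.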

@mkjpryor
Author

mkjpryor commented Apr 29, 2022

So it turns out you can go from openStackCluster.Spec.Bastion.Enabled = false to openStackCluster.Spec.Bastion = nil, but not from openStackCluster.Spec.Bastion.Enabled = true.

I'm not sure what the implications of that are for the edge cases, and I don't have time to think about them until after the weekend, unfortunately. It seems like the proper solution would be to allow computeService.DeleteInstance to accept instanceSpec = nil if possible, and in that case just do the volume cleanup.

@mkjpryor
Author

Having said all that, at the moment it isn’t even possible to make a cluster without a bastion which seems worse than these edge cases… 🤕

@mkjpryor
Author

Looking at the code for DeleteInstance, it doesn’t seem like it uses instanceSpec for much except the name. Maybe we can modify it to just accept a name…
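
As a rough sketch of what a name-based, idempotent delete could look like (this is not the provider's actual DeleteInstance): the `cloud` interface, the `errNotFound` sentinel, and the `fakeCloud` type below are all hypothetical and exist only to make the example self-contained.

```go
package main

import "errors"

// errNotFound is a hypothetical sentinel meaning "no such instance".
var errNotFound = errors.New("instance not found")

// cloud is a hypothetical interface; it is not the provider's compute service API.
type cloud interface {
	InstanceIDByName(name string) (string, error) // returns errNotFound if absent
	DeleteInstanceByID(id string) error
	DeleteVolumesByInstanceName(name string) error // must tolerate nothing to delete
}

// deleteInstanceByName needs only a name, so it still works when no instance
// spec or status exists. A missing instance is treated as already deleted,
// and volume cleanup runs in every non-error case.
func deleteInstanceByName(c cloud, name string) error {
	id, err := c.InstanceIDByName(name)
	switch {
	case errors.Is(err, errNotFound):
		// Never created or already gone: go straight to volume cleanup.
	case err != nil:
		return err
	default:
		if err := c.DeleteInstanceByID(id); err != nil && !errors.Is(err, errNotFound) {
			return err
		}
	}
	return c.DeleteVolumesByInstanceName(name)
}

// fakeCloud is an in-memory stand-in used only to exercise the sketch.
type fakeCloud struct {
	instances map[string]string // instance name -> ID
	volumes   map[string]bool   // instance name -> has leftover volumes
}

func (f *fakeCloud) InstanceIDByName(name string) (string, error) {
	id, ok := f.instances[name]
	if !ok {
		return "", errNotFound
	}
	return id, nil
}

func (f *fakeCloud) DeleteInstanceByID(id string) error {
	for name, got := range f.instances {
		if got == id {
			delete(f.instances, name)
			return nil
		}
	}
	return errNotFound
}

func (f *fakeCloud) DeleteVolumesByInstanceName(name string) error {
	delete(f.volumes, name)
	return nil
}
```

Calling `deleteInstanceByName` twice with the same name returns nil both times, and a bastion whose instance creation failed still gets its volumes cleaned up.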

@mdbooth
Contributor

mdbooth commented Apr 29, 2022

I'm going to revisit this because idempotent delete is something I'm very keen on for a number of reasons (for example, moving the somewhat complicated port cleanup out of create). However, let's not hold up this real fix for that.

/approve
/lgtm

@k8s-ci-robot added the lgtm label ("Looks good to me", indicates that a PR is ready to be merged) and the approved label (indicates a PR has been approved by an approver from all required OWNERS files) on Apr 29, 2022
@mkjpryor
Author

mkjpryor commented Apr 30, 2022

@mdbooth Thanks for approving. I appreciate that this still leaves some edge cases when bastion instance creation fails that may cause issues, but that is better than not being able to create clusters without bastions at all IMHO. Is it worth just creating an issue for the idempotent instance deletion, which as you say is the real fix, so it doesn’t get forgotten about?

I think it just needs a second review and the hold removing then it can be merged. Any thoughts on when we can do a 0.6.2 containing this change? It is a bit of a showstopper for us using the 0.6.x releases.

@mkjpryor
Author

mkjpryor commented May 3, 2022

@mdbooth @seanschneeweiss

Any chance we can get this reviewed, merged and in a point release ASAP? It is a bit of a showstopper so hopefully you agree getting that done quickly is appropriate 🙂

I’m currently running the fix in my staging environment with no issues, but would prefer to run an actual release in prod.

@seanschneeweiss
Contributor

/lgtm
/approve

@apricote
Member

apricote commented May 3, 2022

/lgtm

@mkjpryor, once you remove the hold, the PR will be merged. IMO we can cut another patch release in the next few days; this is a critical bug.

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: mdbooth, mkjpryor, seanschneeweiss

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [mdbooth,seanschneeweiss]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@mkjpryor
Author

mkjpryor commented May 3, 2022

/hold cancel

@seanschneeweiss @apricote Thanks!

As @mdbooth says, this is not the final fix, as it leaves some edge cases unsolved when bastion creation fails. However, fixing that properly would require rewriting machine deletion to be idempotent, I think.

@k8s-ci-robot removed the do-not-merge/hold label (indicates that a PR should not merge because someone has issued a /hold command) on May 3, 2022
@k8s-ci-robot merged commit b59feea into kubernetes-sigs:main on May 3, 2022
@mdbooth
Contributor

mdbooth commented May 3, 2022

/cherry-pick release-0.6

@k8s-infra-cherrypick-robot

@mdbooth: new pull request created: #1232

In response to this:

/cherry-pick release-0.6

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. size/S Denotes a PR that changes 10-29 lines, ignoring generated files.
Development

Successfully merging this pull request may close these issues.

Clusters without bastions do not reconcile
6 participants