Skip to content

🐛 Fix port name after port creation failure #1941

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Mar 16, 2024

Conversation

mdbooth
Copy link
Contributor

@mdbooth mdbooth commented Mar 12, 2024

CreatePorts was creating a port name based on the index of the port in the current reconcile. This could be different to the absolute index of the port if ports had been partially created in a previous reconcile.

We fix this by passing all current state into CreatePorts so it can create an absolute index. This also ensures that partially created ports will be persisted on failure so we don't have to rely on adoption in the next reconcile.

Fixes: #1940

This PR includes a second commit which refactors and simplifies how we initialises Bastion resources. The motivation for it was that first commit introduces a bug which causes a panic due to an assumption about state having been initialised. The new commit makes this mistake harder to make. It's worth looking at separately.

@k8s-ci-robot k8s-ci-robot requested a review from EmilienM March 12, 2024 17:00
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: mdbooth

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot requested a review from jichenjc March 12, 2024 17:00
@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. approved Indicates a PR has been approved by an approver from all required OWNERS files. labels Mar 12, 2024
Copy link

netlify bot commented Mar 12, 2024

Deploy Preview for kubernetes-sigs-cluster-api-openstack ready!

Name Link
🔨 Latest commit 801f5ef
🔍 Latest deploy log https://app.netlify.com/sites/kubernetes-sigs-cluster-api-openstack/deploys/65f49cc79b5aab00084bea2a
😎 Deploy Preview https://deploy-preview-1941--kubernetes-sigs-cluster-api-openstack.netlify.app
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify site configuration.

@k8s-ci-robot k8s-ci-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Mar 12, 2024
@@ -58,7 +58,7 @@ func (s *Service) GetPortFromInstanceIP(instanceID string, ip string) ([]ports.P
return s.client.ListPort(portOpts)
}

func (s *Service) CreatePort(eventObject runtime.Object, clusterName string, portName string, portOpts *infrav1.PortOpts, instanceSecurityGroups []string, instanceTags []string) (*ports.Port, error) {
func (s *Service) CreatePort(eventObject runtime.Object, clusterName string, portName string, portOpts *infrav1.PortOpts, defaultSecurityGroups []string, baseTags []string) (*ports.Port, error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a minor notice about the name defaultSecurityGroups. Since there is a default security group inside Openstack this may cause confusion at first glance.

Maybe there is a good reason for not sticking with the managedSecurityGroups name from the openstackmachine_controller?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good idea 👍

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because they're not managed security groups. They're the managed security group if defined + any security groups defined explicitly on the parent object (machine or bastion).

The problem is they really are 'default security groups'. From the existing code:

// inherit port security groups from the instance if not explicitly specified
if len(securityGroups) == 0 {
securityGroups = instanceSecurityGroups
}

So... if the port has no security groups defined, use these ones instead.

I appreciate the overloaded term isn't ideal, though. I'll have a think, but I'll gladly take suggestions. In the absence of anything better, though, this is what I've got.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's internal API anyway, I'm fine.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree, we can levar it as is after the explanation

@EmilienM
Copy link
Contributor

/test pull-cluster-api-provider-openstack-e2e-full-test
/lgtm
/hold

Feel free to hold cancel when the upgrade passed and when you think it's ready to be merged. Thanks for that work Matt, sorry I missed it in the initial iteration.

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Mar 14, 2024
@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Mar 14, 2024
@mdbooth
Copy link
Contributor Author

mdbooth commented Mar 14, 2024

It looks like the full test failed to clean up a couple of ports on deletion. I'm not convinced that was a flake, but running it again to get another data point:

/test pull-cluster-api-provider-openstack-e2e-full-test

Unfortunately, if this is a bug on deletion we don't have controller logs from CI because it happens after the logs are collected.

@mdbooth
Copy link
Contributor Author

mdbooth commented Mar 14, 2024

#1949 should fix the panic seen in the upgrade test. I don't understand what's triggering it, though.

@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Mar 15, 2024
@mdbooth
Copy link
Contributor Author

mdbooth commented Mar 15, 2024

/test pull-cluster-api-provider-openstack-e2e-full-test

1 similar comment
@mdbooth
Copy link
Contributor Author

mdbooth commented Mar 15, 2024

/test pull-cluster-api-provider-openstack-e2e-full-test

@mdbooth
Copy link
Contributor Author

mdbooth commented Mar 15, 2024

This PR broke cluster deletion when the cluster had a bastion, and the only test which caught it was the upgrade test. This is a bit worrying.

/test pull-cluster-api-provider-openstack-e2e-full-test

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Mar 15, 2024
mdbooth added 2 commits March 15, 2024 18:31
CreatePorts was creating a port name based on the index of the port in
the *current reconcile*. This could be different to the absolute index
of the port if ports had been partially created in a previous reconcile.

We fix this by passing all current state into CreatePorts so it can
create an absolute index. This also ensures that partially created ports
will be persisted on failure so we don't have to rely on adoption in the
next reconcile.
Firstly, we remove the resource reconcile calls from the cluster flow
before calling reconcileNormal/reconcileDelete because the guards around
them and various other guards throughout the code are heavily
inter-depdendent and hard to reason about. Instead, we push them to the
the places we:
* know they are required
* know we are sufficiently initialised that they can work

Firstly we resolve references at the top of reconcileBastion. We know
the cluster has been initialised at this point, so we don't need to
guard against it. This also means that it is always called when entering
that function, so we don't need to guard against it not having been
called during first cluster initialisation.

We also force that function to re-reconcile if it calls deleteBastion(),
because deleteBastion() removes the bastion status. We reconcile again,
so we always know that it is set.

We also add an explicit call to resource reconcile in the
reconcileDelete flow. This is the only place we now need a 'weird' guard
against the cluster network not having been set. We add a comment about
that appropriate to its weirdness.
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Mar 15, 2024
@mdbooth
Copy link
Contributor Author

mdbooth commented Mar 15, 2024

/test pull-cluster-api-provider-openstack-e2e-full-test

@mdbooth
Copy link
Contributor Author

mdbooth commented Mar 15, 2024

Note that #1951 fixes this much more comprehensively by pre-resolving everything about the port prior to creation, including its name.

That PR is based on this one, though, so we still need to merge this first.

@mdbooth
Copy link
Contributor Author

mdbooth commented Mar 15, 2024

/hold cancel

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Mar 15, 2024
@EmilienM
Copy link
Contributor

Thanks for this work, I'm glad you sorted this out.
/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Mar 16, 2024
@k8s-ci-robot k8s-ci-robot merged commit 6b87daf into kubernetes-sigs:main Mar 16, 2024
@mdbooth mdbooth deleted the issue1940 branch March 16, 2024 18:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

Port can be created with incorrect name
4 participants