bug: failed instances with no port breaks the whole CAPO controller #1805

Closed · okozachenko1203 opened this issue Dec 28, 2023 · 0 comments · Fixed by #1818
Labels
kind/bug Categorizes issue or PR as related to a bug.


/kind bug

What steps did you take and what happened:
A VM instance (corresponding to an OpenStackMachine CR) failed to create for some reason, and no VIF was bound to that instance. The OpenStackMachine is marked as unhealthy and the capo-controller tries to replace it. During the deletion reconcile loop there is a task to remove the ports bound to the instance, and that port deletion task fails with the following errors in the capo-controller:

I1218 12:40:46.844270       1 controller.go:114] "Observed a panic in reconciler: runtime error: index out of range [0] with length 0" controller="openstackmachine" controllerGroup="infrastructure.cluster.x-k8s.io" controllerKind="OpenStackMachine" OpenStackMachine="magnum-system/kube-a1d9n-default-worker-infra-t4kdt-f9xfs" namespace="magnum-system" name="kube-a1d9n-default-worker-infra-t4kdt-f9xfs" reconcileID=67e5c4eb-0d10-4d4f-ad5d-acd383736303
panic: runtime error: index out of range [0] with length 0 [recovered]
        panic: runtime error: index out of range [0] with length 0
 
goroutine 460 [running]:
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile.func1()
        /go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:115 +0x1fa
panic({0x19480c0, 0xc0014870f8})
        /usr/local/go/src/runtime/panic.go:884 +0x212
sigs.k8s.io/cluster-api-provider-openstack/pkg/cloud/services/networking.(*Service).GarbageCollectErrorInstancesPort(0xc008e98f40, {0x1d09a90, 0xc00b825900}, {0xc000b7ecc0, 0x2b}, {0xc001cdc200, 0x1, 0x0?})
        /workspace/pkg/cloud/services/networking/port.go:318 +0x249
sigs.k8s.io/cluster-api-provider-openstack/pkg/cloud/services/compute.(*Service).DeleteInstance(0xc000c99800, 0xc00af36280?, {0x1d09a90, 0xc00b825900}, 0xc008e988e0, 0xc00067e690)
        /workspace/pkg/cloud/services/compute/instance.go:620 +0x60c
sigs.k8s.io/cluster-api-provider-openstack/controllers.(*OpenStackMachineReconciler).reconcileDelete(0x1d1cfa0?, {0x1d1de00, 0xc001f53a40}, 0xc007fa9520, 0xc0039b8000, 0xc00af36280, 0xc00b825900)
        /workspace/controllers/openstackmachine_controller.go:278 +0x61d
sigs.k8s.io/cluster-api-provider-openstack/controllers.(*OpenStackMachineReconciler).Reconcile(0xc00048c360, {0x1d19838, 0xc00aa251d0}, {{{0xc000c00e10?, 0x10?}, {0xc000b7ecc0?, 0x40dc07?}}})
        /workspace/controllers/openstackmachine_controller.go:150 +0xa8d
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile(0x1d19838?, {0x1d19838?, 0xc00aa251d0?}, {{{0xc000c00e10?, 0x17a1c00?}, {0xc000b7ecc0?, 0x10?}}})
        /go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:118 +0xc8
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler(0xc0009780a0, {0x1d19790, 0xc00053e080}, {0x18bd520?, 0xc000bd2960?})
        /go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:314 +0x3a5
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem(0xc0009780a0, {0x1d19790, 0xc00053e080})
        /go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:265 +0x1d9
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2()
        /go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:226 +0x85
created by sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2
        /go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:222 +0x333

Now a single failed OpenStackMachine breaks the whole capo-controller: the controller ends up in CrashLoopBackOff, which in turn causes validation webhook failures for operations against OpenStack-related CRs.
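
For illustration, the panic in `GarbageCollectErrorInstancesPort` is the classic pattern of indexing into an empty slice. A minimal sketch of that pattern (hypothetical names, not the actual CAPO code):

```go
package main

import "fmt"

// Hypothetical illustration of the failure mode (not the actual CAPO code):
// indexing the first element of a possibly empty slice panics with
// "index out of range [0] with length 0".
func firstPortID(portIDs []string) string {
	return portIDs[0] // panics when the failed instance has no ports attached
}

func main() {
	var noPorts []string // a failed instance with no VIF yields no port IDs
	fmt.Println(firstPortID(noPorts))
}
```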

What did you expect to happen:
Failed instances with no port should be deleted without any issue.
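
One way to get that behaviour is to guard the port cleanup with a length check and treat an instance without ports as having nothing to clean up. A minimal sketch under that assumption (hypothetical names; not necessarily what #1818 does):

```go
package main

import "fmt"

// Hypothetical guard sketching the expected behaviour; the names and the
// Neutron call are placeholders, not the actual fix.
func garbageCollectPorts(portIDs []string) error {
	if len(portIDs) == 0 {
		return nil // nothing to delete; let instance deletion continue
	}
	for _, id := range portIDs {
		fmt.Printf("deleting port %s\n", id) // real code would call Neutron here
	}
	return nil
}

func main() {
	if err := garbageCollectPorts(nil); err != nil { // no panic for an instance without ports
		fmt.Println(err)
	}
}
```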

Anything else you would like to add:

Environment:

  • Cluster API Provider OpenStack version (Or git rev-parse HEAD if manually built): >= 0.9.0
  • Cluster-API version: 1.5.x
  • OpenStack version: stable/zed
  • Minikube/KIND version:
  • Kubernetes version (use kubectl version): 1.27
  • OS (e.g. from /etc/os-release): Ubuntu 22.04