Missing disks causes nodes to stick due to backoff #1029

Closed
mattcary opened this issue Jul 23, 2022 · 0 comments · Fixed by #1028


Around 07/13, the issue started when a Pod attempted to use a non-existent volume, which we'll denote X (a statically created PD, not one dynamically provisioned by the driver). Presumably the disk existed at some point, because it had been correctly attached to a node, and the PV, PVC and volume attachment (VA) still exist.

The VA csi-966b5b879f8fb9137a305680db1aff4e38231718d4dbc7883410d7a58530f161 was created pointing to that volume and a node Y. The attach failed with a missing disk error, which put a backoff condition on the node.

The pod was then deleted, which added a deletion timestamp to the VA object and triggered the detach workflow in csi-attacher. The detach kept failing indefinitely because of the same missing disk error, and each failure kept the node in a backoff condition.

At the same time, another pod and PVC were scheduled to the node, but the repeated detach attempts happened too quickly for the new attach to ever succeed; instead, the attach kept hitting the backoff.

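We've decided on two actions. The first, tracked in this issue, is to key the backoff by both disk and node, so that a failing operation on one disk no longer blocks operations for other disks on the same node. See the PR that is about to mention this issue.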

The other, which will be tracked in a different issue, will be to assume a missing disk has been detached if it remains not found for a long enough time. Watch for the new issue.
