MountDevice doesn't recover with temp failures #1123

Closed
easeway opened this issue Feb 3, 2023 · 2 comments
Labels
triage/duplicate Indicates an issue is a duplicate of other open issue.

Comments

@easeway

easeway commented Feb 3, 2023

This is on GKE, and the CSI driver is deployed by GKE. Some Pods get stuck in the ContainerCreating state forever, and the relevant events show:

MountVolume.MountDevice failed for volume "pvc-xxxxxxx-xxxx-..." : rpc error: code = Internal desc = Error when getting device path: rpc error: code = Internal desc = error verifying GCE PD ("pvc-xxxxxxx-xxxx-...") is attached: failed to find and re-link disk pvc-xxxxxxx-xxxx-... with udevadm after retrying for 3s: failed to trigger udevadm fix of non existent disk for "pvc-xxxxxxx-xxxx-...": udevadm --trigger requested to fix disk pvc-xxxxxxx-xxxx-... but no such disk was found

It seems the driver gave up and the Pod stays stuck forever. This never happened with the old in-tree kubernetes.io/gce-pd provisioner.

@mattcary
Contributor

mattcary commented Feb 3, 2023

I suspect this is #608 and the virtio deadlock problem. See the details in that bug about identifying a force detach or some other detach that could get the node into a bad state. Also, are you observing this for all mounts on a particular node?

/triage duplicate #608
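
One way to check whether the failures cluster on a particular node is to tally the failing events per node. The following is only a minimal sketch: it assumes the events have already been exported as (node name, event message) pairs, and the function name and input shape are hypothetical rather than part of any Kubernetes or driver API. The two marker strings are copied from the event message quoted in this issue.

```python
from collections import Counter

# Substrings copied from the event message quoted in this issue.
MOUNT_DEVICE_MARKER = "MountVolume.MountDevice failed"
UDEVADM_MARKER = "failed to find and re-link disk"


def count_udevadm_failures_by_node(events):
    """Count MountDevice udevadm failures per node.

    `events` is an iterable of (node_name, event_message) pairs. This input
    shape is an assumption for illustration, not a Kubernetes API type.
    """
    counts = Counter()
    for node, message in events:
        # Only count events that match both markers from the reported error.
        if MOUNT_DEVICE_MARKER in message and UDEVADM_MARKER in message:
            counts[node] += 1
    return counts
```

If one node accounts for most of the matches, that would be consistent with a node-level bad state of the kind discussed in #608, though it would not confirm it on its own.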

@mattcary mattcary closed this as completed Feb 3, 2023
@k8s-ci-robot k8s-ci-robot added the triage/duplicate Indicates an issue is a duplicate of other open issue. label Feb 3, 2023
@k8s-ci-robot
Contributor

@mattcary: The label(s) triage/#608 cannot be applied, because the repository doesn't have them.
