MountDevice doesn't recover with temp failures #1123

easeway · 2023-02-03T05:08:11Z

It's on GKE and the CSI driver is deployed by GKE. Some Pods get stuck in the ContainerCreating state forever, and the relevant events show:

MountVolume.MountDevice failed for volume "pvc-xxxxxxx-xxxx-..." : rpc error: code = Internal desc = Error when getting device path: rpc error: code = Internal desc = error verifying GCE PD ("pvc-xxxxxxx-xxxx-...") is attached: failed to find and re-link disk pvc-xxxxxxx-xxxx-... with udevadm after retrying for 3s: failed to trigger udevadm fix of non existent disk for "pvc-xxxxxxx-xxxx-...": udevadm --trigger requested to fix disk pvc-xxxxxxx-xxxx-... but no such disk was found

It seems the driver gave up and the pod stays stuck forever. This never happened with the old in-tree kubernetes.io/gce-pd provisioner.

mattcary · 2023-02-03T16:29:44Z

I suspect this is #608 and the virtio deadlock problem. See the details in that bug about identifying a force detach or some other detach that could get the node into a bad state. Also, are you observing this for all mounts on a particular node?

/triage duplicate #608
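
To answer the per-node question above, one option is to group the failing events by node. The following is only a minimal sketch, not part of the driver: `MountEvent`, `FailuresByNode`, and `SortedNodes` are hypothetical names, the node names are made up, and the events are assumed to have been collected already (for example from the pod event list). The matching relies only on substrings of the error text quoted in this issue.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// MountEvent is a hypothetical, simplified record of a pod event.
// It is not a Kubernetes API type.
type MountEvent struct {
	Node    string // node the pod was scheduled on
	Volume  string // volume named in the event
	Message string // raw event message
}

// udevadmMarker is a substring of the error quoted in this issue.
const udevadmMarker = "but no such disk was found"

// FailuresByNode counts, per node, the MountDevice failures that
// carry the udevadm "no such disk" error.
func FailuresByNode(events []MountEvent) map[string]int {
	counts := make(map[string]int)
	for _, e := range events {
		if strings.Contains(e.Message, "MountVolume.MountDevice failed") &&
			strings.Contains(e.Message, udevadmMarker) {
			counts[e.Node]++
		}
	}
	return counts
}

// SortedNodes returns node names ordered by descending failure count,
// breaking ties by name so the output is deterministic.
func SortedNodes(counts map[string]int) []string {
	nodes := make([]string, 0, len(counts))
	for n := range counts {
		nodes = append(nodes, n)
	}
	sort.Slice(nodes, func(i, j int) bool {
		if counts[nodes[i]] != counts[nodes[j]] {
			return counts[nodes[i]] > counts[nodes[j]]
		}
		return nodes[i] < nodes[j]
	})
	return nodes
}

func main() {
	// Illustrative data only; node and volume names are invented.
	failure := "MountVolume.MountDevice failed for volume \"pvc-a\" : ... " +
		"udevadm --trigger requested to fix disk pvc-a but no such disk was found"
	events := []MountEvent{
		{Node: "node-a", Volume: "pvc-a", Message: failure},
		{Node: "node-a", Volume: "pvc-b", Message: failure},
		{Node: "node-b", Volume: "pvc-c", Message: "Successfully assigned pod"},
	}
	counts := FailuresByNode(events)
	for _, n := range SortedNodes(counts) {
		fmt.Printf("%s: %d\n", n, counts[n])
	}
}
```

If every failing mount lands on the same node, that would be consistent with the node itself being in a bad state, as suggested above; failures spread across many nodes would point elsewhere.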

k8s-ci-robot · 2023-02-03T16:29:47Z

@mattcary: The label(s) triage/#608 cannot be applied, because the repository doesn't have them.

mattcary closed this as completed Feb 3, 2023

k8s-ci-robot added the triage/duplicate label Feb 3, 2023
