
dial timeouts causing test flakes #789


Closed
mattcary opened this issue Jun 10, 2021 · 6 comments
Assignees
Labels
lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed.

Comments

@saikat-royc

Starting with one of the basic tests.
Log:
https://prow.k8s.io/view/gs/kubernetes-jenkins/pr-logs/pull/sigs.k8s.io_gcp-compute-persistent-disk-csi-driver/788/pull-gcp-compute-persistent-disk-csi-driver-kubernetes-integration/1403383839518101504/

test: Kubernetes e2e suite: External Storage [Driver: csi-gcepd-sc-standard] [Testpattern: Dynamic PV (ext3)] volumes should store data

Persistent Volume: pvc-bfb47f28-5884-427a-b1a9-095692b9fc74
on node: csi-gce-pd-node-94p2r-gce-pd-driver

First writer pod on node-94p2r (node stage/publish succeeded):

I0611 16:44:43.516018       1 utils.go:67] /csi.v1.Node/NodeStageVolume called with request: volume_id:"projects/gce-up-c1-3-glat-up-clu-n/zones/us-central1-b/disks/pvc-bfb47f28-5884-427a-b1a9-095692b9fc74" staging_target_path:"/var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-bfb47f28-5884-427a-b1a9-095692b9fc74/globalmount" volume_capability:<mount:<fs_type:"ext3" > access_mode:<mode:SINGLE_NODE_WRITER > > volume_context:<key:"storage.kubernetes.io/csiProvisionerIdentity" value:"1623429450560-8081-pd.csi.storage.gke.io" > 
I0611 16:44:44.018461       1 node.go:287] Successfully found attached GCE PD "pvc-bfb47f28-5884-427a-b1a9-095692b9fc74" at device path /dev/disk/by-id/google-pvc-bfb47f28-5884-427a-b1a9-095692b9fc74.
I0611 16:44:44.018602       1 node.go:74] NodePublishVolume check volume path /var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-bfb47f28-5884-427a-b1a9-095692b9fc74/globalmount is mounted false: error <nil>
I0611 16:44:44.029781       1 mount_linux.go:390] Disk "/dev/disk/by-id/google-pvc-bfb47f28-5884-427a-b1a9-095692b9fc74" appears to be unformatted, attempting to format as type: "ext3" with options: [-F -m0 /dev/disk/by-id/google-pvc-bfb47f28-5884-427a-b1a9-095692b9fc74]
I0611 16:44:44.473564       1 mount_linux.go:400] Disk successfully formatted (mkfs): ext3 - /dev/disk/by-id/google-pvc-bfb47f28-5884-427a-b1a9-095692b9fc74 /var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-bfb47f28-5884-427a-b1a9-095692b9fc74/globalmount
I0611 16:44:44.531676       1 node.go:321] NodeStageVolume succeeded on projects/gce-up-c1-3-glat-up-clu-n/zones/us-central1-b/disks/pvc-bfb47f28-5884-427a-b1a9-095692b9fc74 to /var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-bfb47f28-5884-427a-b1a9-095692b9fc74/globalmount
I0611 16:44:44.531704       1 utils.go:72] /csi.v1.Node/NodeStageVolume returned with response: 
I0611 16:44:44.543615       1 utils.go:67] /csi.v1.Node/NodeGetCapabilities called with request: 
I0611 16:44:44.543676       1 utils.go:72] /csi.v1.Node/NodeGetCapabilities returned with response: capabilities:<rpc:<type:STAGE_UNSTAGE_VOLUME > > capabilities:<rpc:<type:EXPAND_VOLUME > > capabilities:<rpc:<type:GET_VOLUME_STATS > > 
I0611 16:44:44.554693       1 utils.go:67] /csi.v1.Node/NodePublishVolume called with request: volume_id:"projects/gce-up-c1-3-glat-up-clu-n/zones/us-central1-b/disks/pvc-bfb47f28-5884-427a-b1a9-095692b9fc74" staging_target_path:"/var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-bfb47f28-5884-427a-b1a9-095692b9fc74/globalmount" target_path:"/var/lib/kubelet/pods/ebc9bc9b-125d-47a3-a1ea-779a6f96e920/volumes/kubernetes.io~csi/pvc-bfb47f28-5884-427a-b1a9-095692b9fc74/mount" volume_capability:<mount:<fs_type:"ext3" > access_mode:<mode:SINGLE_NODE_WRITER > > volume_context:<key:"storage.kubernetes.io/csiProvisionerIdentity" value:"1623429450560-8081-pd.csi.storage.gke.io" > 

Second validator pod (node stage/publish succeeded):

I0611 16:45:31.512322       1 utils.go:67] /csi.v1.Node/NodeStageVolume called with request: volume_id:"projects/gce-up-c1-3-glat-up-clu-n/zones/us-central1-b/disks/pvc-bfb47f28-5884-427a-b1a9-095692b9fc74" staging_target_path:"/var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-bfb47f28-5884-427a-b1a9-095692b9fc74/globalmount" volume_capability:<mount:<fs_type:"ext3" > access_mode:<mode:SINGLE_NODE_WRITER > > volume_context:<key:"storage.kubernetes.io/csiProvisionerIdentity" value:"1623429450560-8081-pd.csi.storage.gke.io" > 
I0611 16:45:32.014601       1 node.go:287] Successfully found attached GCE PD "pvc-bfb47f28-5884-427a-b1a9-095692b9fc74" at device path /dev/disk/by-id/google-pvc-bfb47f28-5884-427a-b1a9-095692b9fc74.
I0611 16:45:32.014660       1 node.go:74] NodePublishVolume check volume path /var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-bfb47f28-5884-427a-b1a9-095692b9fc74/globalmount is mounted false: error <nil>
I0611 16:45:32.289570       1 node.go:321] NodeStageVolume succeeded on projects/gce-up-c1-3-glat-up-clu-n/zones/us-central1-b/disks/pvc-bfb47f28-5884-427a-b1a9-095692b9fc74 to /var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-bfb47f28-5884-427a-b1a9-095692b9fc74/globalmount
I0611 16:45:32.289608       1 utils.go:72] /csi.v1.Node/NodeStageVolume returned with response: 
I0611 16:45:32.300365       1 utils.go:67] /csi.v1.Node/NodeGetCapabilities called with request: 
I0611 16:45:32.300445       1 utils.go:72] /csi.v1.Node/NodeGetCapabilities returned with response: capabilities:<rpc:<type:STAGE_UNSTAGE_VOLUME > > capabilities:<rpc:<type:EXPAND_VOLUME > > capabilities:<rpc:<type:GET_VOLUME_STATS > > 
I0611 16:45:32.306553       1 utils.go:67] /csi.v1.Node/NodePublishVolume called with request: volume_id:"projects/gce-up-c1-3-glat-up-clu-n/zones/us-central1-b/disks/pvc-bfb47f28-5884-427a-b1a9-095692b9fc74" staging_target_path:"/var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-bfb47f28-5884-427a-b1a9-095692b9fc74/globalmount" target_path:"/var/lib/kubelet/pods/2e19bbb8-1e74-4d7e-802e-4aaaf64c0a9e/volumes/kubernetes.io~csi/pvc-bfb47f28-5884-427a-b1a9-095692b9fc74/mount" volume_capability:<mount:<fs_type:"ext3" > access_mode:<mode:SINGLE_NODE_WRITER > > volume_context:<key:"storage.kubernetes.io/csiProvisionerIdentity" value:"1623429450560-8081-pd.csi.storage.gke.io" > 

Right after that, the kubectl exec reported a dial timeout:

Jun 11 16:45:35.790: INFO: ExecWithOptions {Command:[/bin/sh -c test -d /opt/0] Namespace:volume-9719 PodName:external-client ContainerName:external-client Stdin:<nil> CaptureStdout:true CaptureStderr:true PreserveWhitespace:false Quiet:false}
Jun 11 16:45:35.790: INFO: >>> kubeConfig: /root/.kube/config
Jun 11 16:46:05.808: FAIL: "test -d /opt/0" should succeed, but failed with error message "error dialing backend: dial timeout"
stdout: 
stderr: 
Unexpected error:
    <*errors.StatusError | 0xc000c36640>: {
        ErrStatus: {
            TypeMeta: {Kind: "", APIVersion: ""},
            ListMeta: {
                SelfLink: "",
                ResourceVersion: "",
                Continue: "",
                RemainingItemCount: nil,
            },
            Status: "Failure",
            Message: "error dialing backend: dial timeout",
            Reason: "",
            Details: nil,
            Code: 500,
        },
    }
    error dialing backend: dial timeout
occurred

The persistent volume workflows are working as expected, so this looks like a real dial timeout.
The suggestion from the GKE master team is to use verbosity level -v=6 on the kubectl exec and also to investigate the konnectivity server.

It could be one of two things:
1. There is sporadically heightened latency in konnectivity's setup (the konnectivity server and clients have not yet established connections when the test runs, so there are no "backends" to connect to from the konnectivity server's point of view).
2. The konnectivity server/client connection isn't robust.

Next steps are to try the test in my local setup and dump the konnectivity server logs.
FYI @mattcary
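As a concrete starting point, the higher-verbosity exec can be scripted like this (a sketch only: the namespace and pod name are taken from the failing e2e log above, and the `KUBECTL` override hook is something I added for dry runs, not part of the test suite):

```shell
# run_exec_v6 re-runs the failing "test -d /opt/0" check with kubectl client
# verbosity 6, so each API-server round trip (including the proxied
# konnectivity dial) is printed to stderr.
# ${KUBECTL:-kubectl} allows substituting e.g. `echo` for a dry run.
run_exec_v6() {
  ns=volume-9719        # namespace from the failing e2e log above
  pod=external-client   # client pod from the same log
  "${KUBECTL:-kubectl}" -v=6 exec -n "$ns" "$pod" -- /bin/sh -c 'test -d /opt/0'
}
```

With -v=6, kubectl logs each HTTP request URL and response code with latency, which should show whether the 500 comes back immediately or only after the ~30s window seen in the failure above.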

saikat-royc commented Jun 14, 2021

Running the tests locally, I do see a lot of failures in the konnectivity server.

The pdcsi tests were running with ginkgo parallelism of 10.

First I tried the kubectl logs command, which itself timed out:

saikatroyc@saikatroyc:~$ k logs konnectivity-server-e2e-test-saikatroyc-master  -n kube-system
Unable to connect to the server: dial tcp 34.123.40.157:443: connect: connection timed out

Then I logged into the node and checked the logs (sudo cat /var/log/pods/kube-system_konnectivity-server-e2e-test-saikatroyc-master_4e890bd5a46419844f64db6c409b0337/konnectivity-server-container/0.log). The logs reported many of the following errors:

2021-06-14T20:38:20.445052833Z stderr F E0614 20:38:20.444895       1 server.go:355] "Stream read from frontend failure" err="rpc error: code = Canceled desc = context canceled"
2021-06-14T20:38:20.445158169Z stderr F E0614 20:38:20.445044       1 server.go:711] "could not get frontend client" err="can't find connID 2 in the frontends[2ba47144-02bc-420a-b114-62f08c3593f8]" connectionID=2
2021-06-14T20:38:20.447858494Z stderr F E0614 20:38:20.447546       1 server.go:711] "could not get frontend client" err="can't find agentID d86c3bc3-cf02-4501-b9bb-b42d0f6aebff in the frontends" connectionID=1
2021-06-14T20:38:20.469056106Z stderr F E0614 20:38:20.468419       1 server.go:355] "Stream read from frontend failure" err="rpc error: code = Canceled desc = context canceled"
2021-06-14T20:38:20.473663156Z stderr F E0614 20:38:20.473491       1 server.go:711] "could not get frontend client" err="can't find connID 1 in the frontends[01aca44a-53f6-4816-abf3-6b1bf897b493]" connectionID=1
2021-06-14T20:38:20.475114051Z stderr F E0614 20:38:20.474933       1 server.go:355] "Stream read from frontend failure" err="rpc error: code = Canceled desc = context canceled"
2021-06-14T20:38:20.47585891Z stderr F E0614 20:38:20.475763       1 server.go:711] "could not get frontend client" err="can't find agentID 01aca44a-53f6-4816-abf3-6b1bf897b493 in the frontends" connectionID=2
2021-06-14T20:38:39.373851054Z stderr F E0614 20:38:39.373557       1 server.go:672] "DIAL_RSP contains failure" err="dial tcp 10.64.3.4:443: connect: connection refused"
2021-06-14T20:38:39.374044045Z stderr F E0614 20:38:39.373946       1 server.go:672] "DIAL_RSP contains failure" err="dial tcp 10.64.3.4:443: connect: connection refused"
2021-06-14T20:38:39.374364357Z stderr F E0614 20:38:39.374281       1 server.go:355] "Stream read from frontend failure" err="rpc error: code = Canceled desc = context canceled"
2021-06-14T20:38:39.375002577Z stderr F E0614 20:38:39.374913       1 server.go:672] "DIAL_RSP contains failure" err="dial tcp 10.64.3.4:443: connect: connection refused"
2021-06-14T20:38:39.375213934Z stderr F E0614 20:38:39.375136       1 server.go:672] "DIAL_RSP contains failure" err="dial tcp 10.64.3.4:443: connect: connection refused"
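For triage, the repeated error kinds in a dump like the one above can be bucketed with a small filter (a sketch; the quoted-message pattern is an assumption based on the structured log lines shown here):

```shell
# count_errors reads konnectivity-server log text on stdin and prints a
# ranked count per structured error message (the quoted string after the
# source-line reference, e.g. "DIAL_RSP contains failure").
count_errors() {
  grep -o '"[A-Za-z_ ]*"' |   # extract the quoted message field
    sort | uniq -c | sort -rn # count occurrences, most frequent first
}
```

Run it against the node-side log file, e.g. `count_errors < 0.log`, to see whether "DIAL_RSP contains failure" or the frontend-lookup errors dominate.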

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Sep 12, 2021
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Oct 12, 2021
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue or PR with /reopen
  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close

@k8s-ci-robot

@k8s-triage-robot: Closing this issue.

In response to this:

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
