failed to provision volume with StorageClass "gce-fast-regional": rpc error: code = DeadlineExceeded desc = context deadline exceeded #576

ajaykmis · 2020-08-11T18:02:56Z

Hi,

I am trying to create a PVC from a VolumeSnapshot in GKE 1.17, but the PVC always fails on the first attempt with this event:

Type Reason Age From Message
Warning ProvisioningFailed 3s pd.csi.storage.gke.io_gke-3292dc0cc953259ae863-6e51-ee44-vm_49f3a555-6e32-44d2-b129-36dc0b8c3a00 failed to provision volume with StorageClass "gce-fast-regional": rpc error: code = DeadlineExceeded desc = context deadline exceeded

It eventually succeeds, but I am trying to understand why it failed the first time.

This is on a GKE cluster running "1.17.9-gke.600". I am using regional-pd as the replication type in my storage class.

Appreciate the help.

msau42 · 2020-08-11T18:09:39Z

The default timeout configured is only 10s, which is very short. But the operations should be idempotent, so subsequent calls should eventually succeed.

@saikat-royc can you verify if the fixes you made to increase the timeout and check the disk status before returning from CreateVolume are available in GKE yet?
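
To illustrate why a short per-call timeout produces a failed event but not a failed volume, here is a minimal simulation. It is only a sketch: `FakeDiskBackend`, `provision_with_retries`, and the 25-second restore time are hypothetical and are not part of the driver or the sidecars. It uses a simulated clock and retries immediately, without the backoff a real provisioner would apply between attempts.

```python
from dataclasses import dataclass, field


class DeadlineExceeded(Exception):
    """Raised when a call does not finish within its per-call timeout."""


@dataclass
class FakeDiskBackend:
    """Hypothetical stand-in for a cloud disk API, driven by a simulated clock."""
    restore_seconds: float                          # time needed to populate a disk
    now: float = 0.0                                # simulated clock, in seconds
    started_at: dict = field(default_factory=dict)  # disk name -> creation start time

    def create_volume(self, name: str, timeout: float) -> str:
        # Idempotent: a repeated request for the same name reuses the
        # in-flight disk instead of starting a second one.
        start = self.started_at.setdefault(name, self.now)
        ready_at = start + self.restore_seconds
        if ready_at - self.now > timeout:
            self.now += timeout                     # caller waited the full timeout
            raise DeadlineExceeded(f"{name}: context deadline exceeded")
        self.now = max(self.now, ready_at)
        return name


def provision_with_retries(backend, name, timeout, max_attempts=50):
    """Retry create_volume until it succeeds; return (volume, failed_attempts)."""
    failures = 0
    for _ in range(max_attempts):
        try:
            return backend.create_volume(name, timeout), failures
        except DeadlineExceeded:
            failures += 1
    raise RuntimeError(f"gave up on {name} after {max_attempts} attempts")


if __name__ == "__main__":
    backend = FakeDiskBackend(restore_seconds=25)
    volume, failures = provision_with_retries(backend, "pvc-from-snapshot", timeout=10)
    print(volume, failures, len(backend.started_at))
```

Each failed attempt corresponds to one ProvisioningFailed event, while the backend only ever tracks a single disk for the requested name.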

ajaykmis · 2020-08-11T18:13:47Z

Thanks @msau42. It's interesting to note that when I create a fresh PVC (not from a snapshot), it always succeeds within 10s. Do you mean that creating a PVC from a snapshot takes longer than 10s, while provisioning a fresh one takes less?

Could you point me to @saikat-royc's PR?

msau42 · 2020-08-11T18:17:48Z

Yes, creating from a snapshot takes longer because it also has to populate the data into the disk.

saikat-royc · 2020-08-11T18:20:31Z

Here is the patch to increase timeout for the sidecars: https://github.com/kubernetes-sigs/gcp-compute-persistent-disk-csi-driver/pull/542/files
As far as I can see, it is available only in the staging-head and rc builds as of now.

As @msau42 pointed out, increasing the timeouts should only reduce the number of retries. The operations are idempotent, so timeouts should not affect the outcome of the operation.
We have observed that provisioning from a snapshot does take on the order of minutes, which is expected.
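
As a rough back-of-the-envelope check of "a longer timeout only reduces the number of retries", the sketch below counts how many attempts time out before the disk is ready. The function name, the backoff parameters (start at 1s, double each time, cap at 300s), the 120-second restore time, and the 300-second "longer" timeout are all assumptions chosen for illustration; they are not taken from the patch or from the sidecar defaults.

```python
def failed_attempts(operation_seconds, timeout_seconds,
                    backoff_start=1.0, backoff_factor=2.0, backoff_max=300.0):
    """Count attempts that hit the deadline before the operation completes.

    Assumes the operation keeps running in the background between attempts
    (idempotent create), and that the caller waits `delay` seconds after each
    timed-out attempt, with the delay growing exponentially up to backoff_max.
    """
    elapsed = 0.0
    delay = backoff_start
    failures = 0
    # An attempt fails if the remaining work does not fit into one timeout.
    while operation_seconds - elapsed > timeout_seconds:
        elapsed += timeout_seconds + delay
        delay = min(delay * backoff_factor, backoff_max)
        failures += 1
    return failures


if __name__ == "__main__":
    # Hypothetical 2-minute restore from a snapshot.
    print(failed_attempts(120, timeout_seconds=10))
    print(failed_attempts(120, timeout_seconds=300))
```

Under these assumptions the short timeout yields several failed events before success, and the longer one yields none; in both cases the volume ends up provisioned.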

ajaykmis · 2020-08-11T20:08:29Z

Thanks @saikat-royc for the explanation. Any idea when the fix (increased timeout) might be available in the main GKE release?

saikat-royc · 2020-08-11T20:55:42Z

Requesting @msau42 to answer the release timeline question.

msau42 · 2020-08-11T21:16:34Z

We are close to cutting a 1.0 release (within a week or two). Once 1.0 is cut, we will roll out the new driver to the GKE rapid channel first and, once that looks stable, to the regular channel. The whole process will take a few weeks.

Do you see any issues with the first timeout? It should resolve itself after a few retries.

ajaykmis · 2020-08-13T06:44:53Z

@msau42 : Thanks for the information.

No, the AttachVolume timeout is not causing any issues; it is just that we see failures in the k8s event logs. I look forward to the increased timeout.

Thanks!

saikat-royc mentioned this issue Aug 11, 2020

Promote increase timeout change to stable #577

Merged

k8s-ci-robot closed this as completed in #577 Aug 12, 2020

msau42 mentioned this issue Aug 21, 2020

failed to provision volume with StorageClass "gce-fast-regional": rpc error: code = DeadlineExceeded desc = context deadline exceeded kubernetes-csi/external-provisioner#462

Closed
