failed to provision volume with StorageClass "gce-fast-regional": rpc error: code = DeadlineExceeded desc = context deadline exceeded #576

Closed
ajaykmis opened this issue Aug 11, 2020 · 8 comments · Fixed by #577

ajaykmis commented Aug 11, 2020

Hi,

I am trying to create a PVC from a VolumeSnapshot on GKE 1.17, but provisioning always fails the first time with this error:
Warning ProvisioningFailed 3s pd.csi.storage.gke.io_gke-3292dc0cc953259ae863-6e51-ee44-vm_49f3a555-6e32-44d2-b129-36dc0b8c3a00 failed to provision volume with StorageClass "gce-fast-regional": rpc error: code = DeadlineExceeded desc = context deadline exceeded

It eventually succeeds, but I am trying to understand why it failed the first time.
[Screenshot: PVC events table with columns Type, Reason, Age, From, Message]

This is on a GCP cluster running "1.17.9-gke.600". My storage class uses regional-pd as the replication type.

Appreciate the help.

@msau42
Contributor

msau42 commented Aug 11, 2020

The default timeout configured is only 10s, which is very short. But the operations should be idempotent, so subsequent calls should eventually succeed.

@saikat-royc can you verify if the fixes you made to increase the timeout and check the disk status before returning from CreateVolume are available in GKE yet?

@ajaykmis
Author

Thanks @msau42. It's interesting to note that when I create a fresh PVC (not from a snapshot), it always succeeds within 10s. Do you mean that creating a PVC from a snapshot takes longer than 10s, while provisioning a fresh one takes less?

Could you point me to @saikat-royc's PR?

@msau42
Contributor

msau42 commented Aug 11, 2020

Yes, creating from a snapshot takes longer because it also has to populate the data into the disk.

@saikat-royc
Member

saikat-royc commented Aug 11, 2020

Here is the patch to increase timeout for the sidecars: https://github.com/kubernetes-sigs/gcp-compute-persistent-disk-csi-driver/pull/542/files
As far as I can see, it is currently available only in the staging-head and rc builds.

As @msau42 pointed out, increasing the timeouts should only reduce the number of retries. The operations are idempotent, so timeouts should not affect the outcome of the operation.
We have observed that provisioning from a snapshot does take on the order of minutes, which is expected.

@ajaykmis
Author

Thanks @saikat-royc for the explanation. Any idea when the fix (increased timeout) might be available in the main GKE release?

@saikat-royc
Member

Requesting @msau42 to answer the release timeline question.

@msau42
Contributor

msau42 commented Aug 11, 2020

We are close to cutting a 1.0 release (within a week or two). Once 1.0 is cut, we will roll out the new driver to the GKE rapid channel first and, once that looks stable, to the regular channel. The whole process will take a few weeks.

Do you see any issues with the first timeout? It should resolve itself after a few retries.

@ajaykmis
Author

@msau42 : Thanks for the information.

No, the AttachVolume timeout is not causing any issues; it is just that we see failures in the k8s event logs. I look forward to the increased timeout.

Thanks!
