Behaviour of controller.CreateVolume when a Snapshot is Not Ready? #694
Diving into the code here, we see snapshots are assigned like so:
This snapshotID eventually gets passed to:
which calls:
and eventually calls:
So at the end of the day, the snapshotId gets passed to a Google Cloud Disk Create call, setting the VolumeSource. cloud.waitForZonalOp waits for the operation to complete (up to a maximum of 5 minutes). What happens when you call a Google Cloud Disk Create with a snapshot ID that is not ready: does it fail, or does it succeed and eventually create the volume once the snapshot is ready? Understanding this would be nice; that is basically the core question I am trying to ask.
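To make that flow concrete, here is a minimal, hypothetical sketch of creating a disk from a snapshot with the public compute/v1 Go client and then polling the zonal operation. The project, zone, disk, and snapshot names are placeholders, and this is only an illustration of the underlying API, not the driver's actual code.

```go
// Hypothetical sketch: create a GCE disk whose source is a snapshot, then
// poll the zonal operation until it finishes. Not the driver's code.
package main

import (
	"context"
	"fmt"
	"log"
	"time"

	compute "google.golang.org/api/compute/v1"
)

func main() {
	ctx := context.Background()
	svc, err := compute.NewService(ctx) // uses Application Default Credentials
	if err != nil {
		log.Fatal(err)
	}

	project, zone := "my-project", "us-central1-a" // placeholders
	disk := &compute.Disk{
		Name:           "restored-disk",
		SizeGb:         100,
		SourceSnapshot: "projects/my-project/global/snapshots/my-snapshot", // placeholder
	}

	op, err := svc.Disks.Insert(project, zone, disk).Context(ctx).Do()
	if err != nil {
		log.Fatal(err) // a bad source may be rejected here...
	}

	// ...or the failure may only show up on the operation, which the driver
	// waits on (bounded, roughly 5 minutes).
	for {
		got, err := svc.ZoneOperations.Get(project, zone, op.Name).Context(ctx).Do()
		if err != nil {
			log.Fatal(err)
		}
		if got.Status == "DONE" {
			if got.Error != nil {
				log.Fatalf("operation failed: %+v", got.Error.Errors)
			}
			fmt.Println("disk created")
			return
		}
		time.Sleep(2 * time.Second)
	}
}
```

Whether an error for a not-ready snapshot surfaces on the Insert call or only on the operation itself is exactly the open question above.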
It looks like the behaviour, generally, from looking at #482, #541 and #527, is that the csi-sidecar (as I discussed in 1) has a default timeout, and I am unsure on which line it returns the error. We want to set some timeouts for bigger PVCs; is the sidecar the best place to do that, leaving the timeouts inside the driver as they are?
The CSI external provisioner has a check for the snapshot content being ready before calling the PD-specific CSI sidecar, so the call does not reach the pd controller.CreateVolume logic linked above. For the code path, see these controller.go lines:
From the "Provision" call, the state controller.ProvisioningNoChange is returned to a separate work-item queue handler in controller.go. Walking through the handler code, this results in retries, as it is a non-finalizing error and gets re-queued:
The number of retries is defined by the external provisioner's "DefaultFailedProvisionThreshold". The current default in the code is 15; setting it to 0 makes reconciliation retry indefinitely. The external provisioner skips this default setting by instantiating the controller.Provisioner separately from that construction call, implicitly setting those values to 0 by not setting them here.
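To illustrate the retry behaviour described above, here is a small, hypothetical sketch (not the provisioner's actual code) of a work queue with an exponential-backoff rate limiter and a failure threshold, where a threshold of 0 means the item is retried indefinitely.

```go
// Hypothetical sketch of requeue-with-backoff plus a failure threshold,
// using client-go's workqueue. A threshold of 0 means "retry forever",
// mirroring the behaviour described for the external provisioner.
package main

import (
	"fmt"
	"time"

	"k8s.io/client-go/util/workqueue"
)

func main() {
	// Backoff per failed item: starts at 1s, doubles up to 5m (placeholder values).
	limiter := workqueue.NewItemExponentialFailureRateLimiter(time.Second, 5*time.Minute)
	queue := workqueue.NewRateLimitingQueue(limiter)
	defer queue.ShutDown()

	const failedProvisionThreshold = 15 // 0 would mean indefinite retries

	provision := func(claim string) error {
		return fmt.Errorf("snapshot not ready") // placeholder non-finalizing error
	}

	queue.Add("pvc-1")

	for i := 0; i < 5; i++ {
		item, shutdown := queue.Get()
		if shutdown {
			return
		}
		key := item.(string)

		if err := provision(key); err != nil {
			if failedProvisionThreshold == 0 || queue.NumRequeues(key) < failedProvisionThreshold {
				queue.AddRateLimited(key) // non-finalizing error: re-queue with backoff
			} else {
				fmt.Println("giving up on", key)
				queue.Forget(key)
			}
		} else {
			queue.Forget(key) // success: reset the backoff counter
		}
		queue.Done(item)
	}
}
```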
TL;DR: you should not have to do anything extra to get your desired behavior; the CSI driver should cover your use case.
Here's the line in csi that sets the threshold to 0: https://github.com/kubernetes-csi/external-provisioner/blob/41af8a3920bf305cfa2da19f87d62d954c37fc98/cmd/csi-provisioner/csi-provisioner.go#L320 So it retries indefinitely.
@annapendleton @msau42 Thank you so much for the detailed response, that's great news!
I'm having a lot of trouble discovering the behaviour of CSI when it comes to Snapshots that have readyToUse set to false. Specifically, I have two basic questions:
1) If I apply a PVC whose data source is a VolumeSnapshot that is not yet readyToUse, what is the behaviour of the CSI driver? From what I have looked at, it seems like https://github.com/kubernetes-csi/external-provisioner#csi-error-and-timeout-handling defines the timing out. By default, it retries every --retry-interval-start, doubling every time until it hits --retry-interval-max.
2) What if becoming readyToUse may take an hour? Is the expected behaviour that the user would manually check the VolumeSnapshot and wait until its readyToUse flag is set to true? So, for example, the user would wait 20 minutes, then apply the PVC to the cluster pointing to the VolumeSnapshot?
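For question 2), one way to automate the "wait until readyToUse" step is to poll the VolumeSnapshot status before creating the PVC. Below is a minimal, hypothetical sketch using client-go's dynamic client against the snapshot.storage.k8s.io/v1 API; the namespace and snapshot name are placeholders, and this is only an illustration, not something the driver requires.

```go
// Hypothetical helper that polls a VolumeSnapshot until status.readyToUse is
// true, after which a PVC referencing it can be applied.
package main

import (
	"context"
	"fmt"
	"log"
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
	"k8s.io/apimachinery/pkg/runtime/schema"
	"k8s.io/client-go/dynamic"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		log.Fatal(err)
	}
	dyn, err := dynamic.NewForConfig(cfg)
	if err != nil {
		log.Fatal(err)
	}

	gvr := schema.GroupVersionResource{
		Group:    "snapshot.storage.k8s.io",
		Version:  "v1",
		Resource: "volumesnapshots",
	}
	ns, name := "default", "my-snapshot" // placeholders

	ctx, cancel := context.WithTimeout(context.Background(), time.Hour)
	defer cancel()

	for {
		snap, err := dyn.Resource(gvr).Namespace(ns).Get(ctx, name, metav1.GetOptions{})
		if err != nil {
			log.Fatal(err)
		}
		ready, found, err := unstructured.NestedBool(snap.Object, "status", "readyToUse")
		if err != nil {
			log.Fatal(err)
		}
		if found && ready {
			fmt.Println("snapshot is readyToUse; safe to apply the PVC now")
			return
		}
		fmt.Println("snapshot not ready yet; retrying in 30s")
		time.Sleep(30 * time.Second)
	}
}
```

That said, per the maintainers' answer above, this manual wait should not be necessary: the external provisioner already checks readiness and keeps retrying until the snapshot becomes readyToUse.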