Flaky unit test #1137
/cc @amacaskill
For zonal-to-zonal cloning, the zones need to match, so I think `pickZones` […]
I think the zonal test could also be flaky if we had different test cases, so this change is needed for both regional and zonal volumes. The problem is that the `volKey` is calculated before we determine that we are creating the volume by cloning. I think we should check whether we are using volume cloning, and what the source volume's replication type and zone are, before we calculate the `volKey`. That way, when we are cloning, we can calculate the `volKey` differently so that it follows the [volume cloning restrictions](https://cloud.google.com/kubernetes-engine/docs/how-to/persistent-volumes/volume-cloning#limitations).
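A minimal Go sketch of the ordering proposed above: inspect the clone source first, then pick the zone, then build the key. The names `cloneSource`, `volumeKey`, `pickZone`, and `newVolumeKey` are hypothetical and are not the driver's actual types or functions. The sketch only covers the zonal-to-zonal restriction stated in this thread (the clone must be created in the source disk's zone); the regional case is deliberately left out.

```go
package main

import (
	"errors"
	"fmt"
)

// cloneSource describes the disk a new volume is cloned from.
// Hypothetical type for illustration only.
type cloneSource struct {
	Zone     string // zone of the source disk
	Regional bool   // true if the source disk is regional
}

// volumeKey identifies the disk to create. Hypothetical; not the driver's key type.
type volumeKey struct {
	Name string
	Zone string
}

// pickZone chooses a zone for a new zonal volume. For a clone of a zonal disk,
// the clone must land in the source disk's zone, so that zone is returned if it
// is among the allowed zones; otherwise an error is returned.
func pickZone(allowed []string, src *cloneSource) (string, error) {
	if len(allowed) == 0 {
		return "", errors.New("no zones available")
	}
	if src == nil {
		// Not a clone: any allowed zone works; take the first one deterministically.
		return allowed[0], nil
	}
	if src.Regional {
		return "", errors.New("regional clone source is not handled in this sketch")
	}
	for _, z := range allowed {
		if z == src.Zone {
			return z, nil
		}
	}
	return "", fmt.Errorf("source zone %q is not among allowed zones %v", src.Zone, allowed)
}

// newVolumeKey looks at the clone source *before* building the key, so the
// key's zone always satisfies the zonal-to-zonal cloning restriction.
func newVolumeKey(name string, allowed []string, src *cloneSource) (volumeKey, error) {
	zone, err := pickZone(allowed, src)
	if err != nil {
		return volumeKey{}, err
	}
	return volumeKey{Name: name, Zone: zone}, nil
}

func main() {
	allowed := []string{"us-central1-a", "us-central1-b"}
	key, err := newVolumeKey("pvc-clone", allowed, &cloneSource{Zone: "us-central1-b"})
	fmt.Println(key, err)
}
```

Because the zone comes from the clone source rather than from the order of the allowed list, the result no longer depends on which zone happens to be picked first, which is the kind of nondeterminism that can make a test flaky.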
When I initially looked at this issue, I thought the volume cloning implementation was wrong and that this was what made the test flaky. However, I now think the implementation is working as expected and the test's parameters are wrong. The `request.AccessibilityRequirements` is created in the external provisioner, in `GenerateAccessibilityRequirements`. If the `selectedNode` param passed to […] For volume cloning in Immediate binding mode, users need to set `allowedTopologies` in their StorageClass to make sure that the correct requisite/preferred topologies get passed to the CSI driver. If there were some signal in the `CreateVolumeRequest` telling us that the request is for a PVC with Immediate binding mode (and not WaitForFirstConsumer), we could use the volume cloning fix I described in my previous comment. However, I don't think such a signal exists, so I don't think we can do anything here. The fix I made will definitely not work for WaitForFirstConsumer, because there we need to respect the zone the pod is scheduled to. Therefore, users need to either set node/pod affinity or set `allowedTopologies` in their StorageClass to make sure they do not get this error. So in summary:
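To illustrate why restricting `allowedTopologies` helps, here is a Go sketch that checks whether a topology requirement leaves the driver any way to place a zonal clone outside the source disk's zone. The `topology` and `topologyRequirement` types are simplified local stand-ins for the CSI spec's `Topology` and `TopologyRequirement` messages, `cloneSafe` and `zoneTopo` are hypothetical helpers, and the zone key `topology.gke.io/zone` is an assumption about this driver's topology key.

```go
package main

import "fmt"

// topology is a simplified stand-in for a CSI Topology: segment key -> value.
type topology struct {
	Segments map[string]string
}

// topologyRequirement is a simplified stand-in for CSI's TopologyRequirement.
type topologyRequirement struct {
	Requisite []topology
	Preferred []topology
}

// zoneKey is ASSUMED to be the driver's zone topology key.
const zoneKey = "topology.gke.io/zone"

// zonesFrom extracts the zone values from a topology list, in order.
func zonesFrom(ts []topology) []string {
	var zones []string
	for _, t := range ts {
		if z, ok := t.Segments[zoneKey]; ok {
			zones = append(zones, z)
		}
	}
	return zones
}

// cloneSafe reports whether every requisite zone equals the source disk's zone,
// i.e. a zonal-to-zonal clone cannot land in the wrong zone no matter which
// requisite zone the driver picks. An empty requisite list imposes no
// constraint, so it is not considered safe.
func cloneSafe(req topologyRequirement, srcZone string) bool {
	zones := zonesFrom(req.Requisite)
	if len(zones) == 0 {
		return false
	}
	for _, z := range zones {
		if z != srcZone {
			return false
		}
	}
	return true
}

// zoneTopo builds a single-zone topology (hypothetical helper).
func zoneTopo(z string) topology {
	return topology{Segments: map[string]string{zoneKey: z}}
}

func main() {
	src := "us-central1-a"
	// ASSUMPTION: with Immediate binding and no allowedTopologies, the requisite
	// list can contain several zones, not just the source disk's zone.
	open := topologyRequirement{Requisite: []topology{zoneTopo("us-central1-a"), zoneTopo("us-central1-b")}}
	// StorageClass whose allowedTopologies is limited to the source zone.
	pinned := topologyRequirement{
		Requisite: []topology{zoneTopo("us-central1-a")},
		Preferred: []topology{zoneTopo("us-central1-a")},
	}
	fmt.Println(cloneSafe(open, src), cloneSafe(pinned, src))
}
```

The check is deliberately conservative: a requirement counts as safe only when it pins the volume to the source zone, which is what limiting `allowedTopologies` to that zone should produce, assuming the external provisioner derives the requisite list from it.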
Unit tests have been flaky: https://prow.k8s.io/job-history/gs/kubernetes-jenkins/pr-logs/directory/pull-gcp-compute-persistent-disk-csi-driver-unit
Here is an example log: