change GetDisk error reporting to temporary in CreateVolume codepath #1558
@mattcary I've addressed the comments; can you take another look?
@@ -324,7 +324,8 @@ func (gceCS *GCEControllerServer) CreateVolume(ctx context.Context, req *csi.Cre
 	existingDisk, err := gceCS.CloudProvider.GetDisk(ctx, gceCS.CloudProvider.GetDefaultProject(), volKey, gceAPIVersion)
 	if err != nil {
 		if !gce.IsGCEError(err, "notFound") {
-			return nil, common.LoggedError("CreateVolume, failed to getDisk when validating: ", err)
+			// failed to GetDisk, however the Disk may already be created, the error code should be non-Final
+			return nil, common.LoggedError("CreateVolume, failed to getDisk when validating: ", status.Error(codes.Unavailable, err.Error()))
 		}
 	}
 	if err == nil {
(I hate that you can't comment on lines outside the PR...)
On line 345 there's a !ready check that returns an Internal error code; we should probably turn this into a temporary error too?
You know, I think this is going to screw up our SLO filtering :-/
And on line 423.
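The reviewer's suggestion for the !ready check can be sketched the same way. A minimal sketch, assuming the check compares the disk's status string against "READY"; `codedErr` and `diskReadyErr` are hypothetical names, and the error code is carried as a plain string rather than a real gRPC code.

```go
package main

import "fmt"

// codedErr is a hypothetical error type that carries a gRPC-style code name.
// The driver itself would use status.Error(codes.Unavailable, ...) instead.
type codedErr struct {
	code string
	msg  string
}

func (e codedErr) Error() string { return e.code + ": " + e.msg }

// diskReadyErr returns nil for a READY disk. Any other status (assumed here to
// mean the disk may still be transitioning, e.g. "CREATING") is reported with
// the temporary code "Unavailable" instead of "Internal", so the provisioner
// keeps retrying rather than giving up on a disk that may already exist.
func diskReadyErr(name, diskStatus string) error {
	if diskStatus == "READY" {
		return nil
	}
	return codedErr{
		code: "Unavailable",
		msg:  fmt.Sprintf("disk %s is not ready, status %q", name, diskStatus),
	}
}
```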
done
(Offline discussion: our SLO filtering will be fine.)
/lgtm thanks!
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: leiyiz, mattcary.
/cherry-pick release-1.12
@mattcary: new pull request created: #1600 In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
/cherry-pick release-1.11
/cherry-pick release-1.10
@mattcary: new pull request created: #1601 In response to this:
@mattcary: new pull request created: #1602 In response to this:
/cherry-pick release-1.9
/cherry-pick release-1.8
@mattcary: new pull request created: #1603 In response to this:
@mattcary: #1558 failed to apply on top of branch "release-1.8":
In response to this:
Cherry-pick #1558 to release-1.8
What type of PR is this?
/kind bug
What this PR does / why we need it:
During CreateVolume, the CSI provisioner classifies errors as either "final" or "temporary". If we report a final error while disk creation could still be in progress, or after the disk has already been created, then deleting the PVC before volume creation succeeds can leak the disk, because DeleteVolume will never be called.
The errors the provisioner treats as temporary are defined here: https://github.com/kubernetes-csi/external-provisioner/blob/master/pkg/controller/controller.go#L1904
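A rough sketch of the classification the provisioner applies, based on our reading of the isFinalError helper in the linked controller.go. The code list below is an assumption from that reading and may drift over time; the linked source is authoritative. `temporaryCodes` and `isFinal` are hypothetical names, and codes are plain strings to keep the sketch dependency-free.

```go
package main

// temporaryCodes lists the gRPC status codes that, in our reading of the
// linked controller.go, are treated as non-final: the provisioner assumes a
// previous CreateVolume may still be in progress and keeps retrying.
var temporaryCodes = map[string]bool{
	"Canceled":          true,
	"DeadlineExceeded":  true,
	"Unavailable":       true,
	"ResourceExhausted": true,
	"Aborted":           true,
}

// isFinal reports whether an error carrying the given gRPC code name would be
// treated as final, i.e. provisioning is assumed not to be in progress.
func isFinal(code string) bool {
	return !temporaryCodes[code]
}
```

Under this reading, the Unavailable code used in this PR keeps the provisioner retrying, whereas an Internal code would be treated as final.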
This PR changes every GCE API call in the CreateVolume code path that can run after the insertDisk API call so that it reports a non-final error.
Which issue(s) this PR fixes:
Fixes #
Special notes for your reviewer:
Does this PR introduce a user-facing change?: