-
Notifications
You must be signed in to change notification settings - Fork 159
Separate user errors from internal errors #1092
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Separate user errors from internal errors #1092
Conversation
Hi @sunnylovestiramisu. Thanks for your PR. I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
/ok-to-test |
pkg/gce-pd-csi-driver/controller.go
Outdated
@@ -212,7 +212,11 @@ func (gceCS *GCEControllerServer) CreateVolume(ctx context.Context, req *csi.Cre | |||
|
|||
volumeID, err := common.KeyToVolumeID(volKey, gceCS.CloudProvider.GetDefaultProject()) | |||
if err != nil { | |||
return nil, status.Errorf(codes.Internal, "Failed to convert volume key to volume ID: %v", err) | |||
klog.Errorf("Failed to convert volume key to volume ID: %w", err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, so much boilerplate for every error. It will be hard to be consistent about all of this in the future. Instead of IsUserError could we have something like CodeForError so the whole thing looks like the following?
if err != nil {
return nil, status.Errorf(CodeForError(err), "Failed to convert volume key to volume ID: %v", err)
}
That does mean that we don't get the klog---but we weren't getting that before anyway. So maybe we could have
if err != nil {
return nil, LoggedError("Failed to convert volume key to volume ID: %w", err)
}
and then LoggedError will klog as well as produce the status.Errorf.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there anything in go that is similar to java's global exception handler style at all? https://stackoverflow.com/questions/48785878/spring-boot-intercept-all-exception-handlers
I looked at the codebase most of the error handling seems very boilerplate compared to the example in java.
Here we want to avoid printing error in %v so that the error object does not change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
golang error handling is crap as far as I can tell---in addition to all the boilerplate, go's aversion to stack traces mean you never really know where an error originally came from.
In particular I like doing a klog in addition to an error return, because the klog gives you generally better information about where the error came from---but the error return is necessary as well because that will get plumbed into an RPC reply which will give clues in the sidecars or kubelet logs.
Maybe if LoggedError(msg, err) looks like { klog(msg, err); return status.Errorf(code, err)}, and we use %v in msg, then it will be enough? We don't care about the %v conversion in the klog, as we only human-read that, it's the status.Errorf call that's important?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In general, if we are passing the error as a return value, I prefer to leave it up to the caller to log and only log if we are going to swallow an error and not pass it up. That helps avoid duplicate logging of errors at multiple levels.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generally I've had more problems with seeing an error up high in the stack, and not being able to trace back to where it was emitted, than any ambiguities around duplicated messages in the logs. That may be the style in go but if it's possible to push back against it, I'd like to....
Hey there! Just letting you know that I have just fixed and re-enabled the sanity tests for PD in go/pdcsi-oss-driver/issues/990 so if you experience failures in your sanity test PR gate build because you're on an old branch before the tests were fixed, merging the latest commits from master into this branch it should make them pass. Let me know if you have any questions! |
716b1da
to
1809ff1
Compare
/retest |
/retest pull-gcp-compute-persistent-disk-csi-driver-sanity I see the fix is in your history of the branch so this is concerning. The error is coming from the backoff condition threshold so wondering if 1ms could cause flakes. Retest should let us know. |
@judemars: The
The following commands are available to trigger optional jobs:
Use
In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
Woops: /test pull-gcp-compute-persistent-disk-csi-driver-sanity |
Failed again. Will try again: /test pull-gcp-compute-persistent-disk-csi-driver-sanity |
kubernetes/test-infra#28158 should disable the sanity PR gate, sorry for the hassle! |
/assign @mattcary |
pkg/gce-pd-csi-driver/node.go
Outdated
*/ | ||
return true | ||
if err != nil { | ||
klog.V(4).Infof("NodePublishVolume check volume path %s is mounted %t: error %v", path, !notMnt, err.Error()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we do
errInfo = "nil"
if err != nil {
errInfo = err.Error()
}
klog.V(4).Infof("NodePublishVolume check volume path %s is mounted %t: error %v", path, !notMnt, errInfo)
to log everything?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you do a %v on err does that print out the same thing as err.Error()?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we print out %v the error cannot get unwrap properly anymore. That is what we are trying to solve?
336c354
to
602b3d0
Compare
pkg/gce-pd-csi-driver/controller.go
Outdated
} | ||
default: | ||
return nil, status.Error(codes.InvalidArgument, fmt.Sprintf("CreateVolume replication type '%s' is not supported", params.ReplicationType)) | ||
} | ||
|
||
ready, err := isDiskReady(disk) | ||
if err != nil { | ||
return nil, status.Error(codes.Internal, fmt.Sprintf("CreateVolume disk %v had error checking ready status: %v", volKey, err)) | ||
return nil, status.Error(codes.Internal, fmt.Sprintf("CreateVolume disk %v had error checking ready status: %v", volKey, err.Error())) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we LoggedError here too?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
isDiskReady does not throw user error we defined in this pr:
func isDiskReady(disk *gce.CloudDisk) (bool, error) {
status := disk.GetStatus()
switch status {
case "READY":
return true, nil
case "FAILED":
return false, fmt.Errorf("Disk %s status is FAILED", disk.GetName())
case "CREATING":
klog.V(4).Infof("Disk %s status is CREATING", disk.GetName())
return false, nil
case "DELETING":
klog.V(4).Infof("Disk %s status is DELETING", disk.GetName())
return false, nil
case "RESTORING":
klog.V(4).Infof("Disk %s status is RESTORING", disk.GetName())
return false, nil
default:
klog.V(4).Infof("Disk %s status is: %s", disk.GetName(), status)
}
return false, nil
}
Do we still want to use the LoggedError(which is logging error for user error case)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah, then you're right it's not necessary.
It would be nice to have a more uniform location to log errors so its clear what is available for debugging. But that's out of scope here.
} | ||
return nil, status.Errorf(codes.Internal, "ControllerUnpublishVolume error repairing underspecified volume key: %v", err) | ||
return nil, LoggedError("ControllerUnpublishVolume error repairing underspecified volume key: ", err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are we logging only here, and not when the volume is not found? Probably we should log everything (?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can add logging for notFound. I was trying to keep the pr focused. Initially this pr is just fix unwrap error. But at this point it is a blend of adding logs + fixing error unwrap + fixing status.Error to status.Errorf(?).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, fair enough.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay i added the change for fixing status.Error to status.Errorf as well in this pr but committed it in a second commit. It looks much simpler :D
/lgtm |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: mattcary, sunnylovestiramisu The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
/release-note-none |
…k-of-#1092-upstream-release-1.7 Automated cherry pick of #1092: Separate user errors from internal errors
/cherry-pick release-1.10 |
@judemars: #1092 failed to apply on top of branch "release-1.10":
In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
What type of PR is this?
/kind bug
What this PR does / why we need it:
Exclude user errors for googleapis from codes.internal errors for better error categorization
Conversation from #1084
Which issue(s) this PR fixes:
Fixes #
Special notes for your reviewer:
You are reviewing three different things:
Does this PR introduce a user-facing change?:
NONE