
Improve error messages for ControllerExpandVolume / CreateSnapshot of multi-zone PV. #1718

Merged: 5 commits merged into kubernetes-sigs:master on May 28, 2024


@hungnguyen243 (Contributor) commented May 21, 2024

What type of PR is this?

/kind cleanup

What this PR does / why we need it:

Improve error messages for ControllerExpandVolume / CreateSnapshot of multi-zone PV.

Before:
CreateSnapshot: "Failed to create snapshot: failed to take snapshot of the volume projects/psch-gke-dev/zones/multi-zone/disks/hdml-llama2-70b-hf: "rpc error: code = Unknown desc = CreateSnapshot, failed to getDisk: googleapi: Error 400: Invalid value for field 'zone': 'multi-zone'. Unknown zone., invalid"
ControllerExpandVolume: "resize volume "my-disk-pv" by resizer "pd.csi.storage.gke.io" failed: rpc error: code = Unknown desc = ControllerExpandVolume failed to resize disk: failed to get disk: googleapi: Error 400: Invalid value for field 'zone': 'multi-zone'. Unknown zone., invalid"

After:
CreateSnapshot: "Snapshots are not supported with the multi-zone PV volumeHandle feature"
ControllerExpandVolume: "Resize operation is not supported with the multi-zone PVC volumeHandle feature. Please re-create the disk from source if you want a larger size."
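The "Before" messages show the multi-zone handle reaching the Compute Engine API with the literal zone 'multi-zone', which the API rejects as an unknown zone. A minimal Go sketch of catching such a handle before any API call, assuming the projects/<project>/zones/<zone>/disks/<name> shape visible in those messages. The names parseZonalHandle, volumeRef, and isMultiZone are hypothetical and are not the driver's helpers; the diffs in this PR call isMultiZoneVolKey on an already-parsed key.

```go
package main

import (
	"fmt"
	"strings"
)

// multiZoneSentinel is the placeholder zone seen in the multi-zone volume
// handles quoted in the error messages above.
const multiZoneSentinel = "multi-zone"

// volumeRef is a hypothetical parsed form of a zonal volume handle.
type volumeRef struct {
	Project, Zone, Name string
}

// parseZonalHandle is a hypothetical parser for handles shaped like
// "projects/<project>/zones/<zone>/disks/<name>".
func parseZonalHandle(handle string) (volumeRef, error) {
	parts := strings.Split(handle, "/")
	if len(parts) != 6 || parts[0] != "projects" || parts[2] != "zones" || parts[4] != "disks" {
		return volumeRef{}, fmt.Errorf("unexpected volume handle %q", handle)
	}
	return volumeRef{Project: parts[1], Zone: parts[3], Name: parts[5]}, nil
}

// isMultiZone reports whether the handle carries the multi-zone sentinel
// in place of a real zone.
func (v volumeRef) isMultiZone() bool {
	return v.Zone == multiZoneSentinel
}

func main() {
	for _, h := range []string{
		"projects/psch-gke-dev/zones/multi-zone/disks/hdml-llama2-70b-hf",
		"projects/psch-gke-dev/zones/us-central1-a/disks/my-disk",
	} {
		ref, err := parseZonalHandle(h)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%s multi-zone=%v\n", ref.Name, ref.isMultiZone())
	}
}
```

Checking the zone before calling the API is what lets the driver replace the opaque googleapi 400 with a targeted message.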

Which issue(s) this PR fixes:

Fixes #

Special notes for your reviewer:

Does this PR introduce a user-facing change?

Update logging on multi-zone feature support for volume snapshot and resize

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:

@k8s-ci-robot k8s-ci-robot added do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels May 21, 2024
@k8s-ci-robot k8s-ci-robot requested review from leiyiz and mattcary May 21, 2024 20:53
@k8s-ci-robot k8s-ci-robot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label May 21, 2024
@k8s-ci-robot (Contributor)

Hi @hungnguyen243. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added the size/S Denotes a PR that changes 10-29 lines, ignoring generated files. label May 21, 2024
@Sneha-at (Contributor)

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels May 21, 2024
@Sneha-at (Contributor)

/retest-required

1 similar comment
@hungnguyen243 (Contributor, Author)

/retest-required

@amacaskill (Member)

I checked the e2e snapshot failure: the snapshot operation had not returned success before we tried to delete the disk, which is why we got the operation error: https://screenshot.googleplex.com/7aHcpQdMhw4sBr5

@hungnguyen243 (Contributor, Author) commented May 23, 2024

/retest-required

@hungnguyen243 (Contributor, Author)

/retest

@Sneha-at (Contributor)

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label May 23, 2024
@@ -1158,6 +1158,11 @@ func (gceCS *GCEControllerServer) CreateSnapshot(ctx context.Context, req *csi.C
		return nil, status.Errorf(codes.InvalidArgument, "CreateSnapshot Volume ID is invalid: %v", err.Error())
	}

	volumeIsMultiZone := isMultiZoneVolKey(volKey)
	if gceCS.multiZoneVolumeHandleConfig.Enable && volumeIsMultiZone {
		return nil, fmt.Errorf("Snapshots are not supported with the `multi-zone` PV volumeHandle feature.")
Contributor

These should return a proper status.Error (e.g., InvalidArgument).

Contributor

I would also omit the backticks. This will be printed to the terminal, so it won't be rendered as Markdown.


	volumeIsMultiZone := isMultiZoneVolKey(volKey)
	if gceCS.multiZoneVolumeHandleConfig.Enable && volumeIsMultiZone {
		return nil, fmt.Errorf("Resize operation is not supported with the `multi-zone` PVC volumeHandle feature. Please re-create the disk from source if you want a larger size.")
Contributor

I think naming this "ControllerExpandVolume" instead of "Resize operation" is better: it names the RPC, whereas "Resize" is more of a Container Orchestrator term.


Contributor

Use "volume" instead of "disk". Also, print the volumeID in the error message.
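Taken together, the review suggestions on these lines (name the RPC, say "volume" rather than "disk", include the volumeID, drop the backticks) could be folded into a single message builder. A sketch, assuming a hypothetical helper named unsupportedMultiZoneMsg; the wording is illustrative, not the merged text.

```go
package main

import "fmt"

// unsupportedMultiZoneMsg is a hypothetical helper that applies the review
// suggestions: it names the CSI RPC, refers to a "volume", includes the
// volume ID, and uses no Markdown backticks since the text goes to a terminal.
func unsupportedMultiZoneMsg(rpc, volumeID string) string {
	return fmt.Sprintf("%s is not supported with the multi-zone PV volumeHandle feature (volume ID: %s)", rpc, volumeID)
}

func main() {
	fmt.Println(unsupportedMultiZoneMsg("ControllerExpandVolume", "projects/p/zones/multi-zone/disks/d"))
	fmt.Println(unsupportedMultiZoneMsg("CreateSnapshot", "projects/p/zones/multi-zone/disks/d"))
}
```

Sharing one builder across both RPCs keeps the two messages consistent as the wording evolves.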

@pwschuurman (Contributor) left a comment

Please add some tests that validate that passing in a "multi-zone" volumeID returns the appropriate error (e.g., InvalidArgument) for these two RPCs.

@k8s-ci-robot k8s-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed lgtm "Looks good to me", indicates that a PR is ready to be merged. size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels May 24, 2024
@@ -1158,6 +1158,11 @@ func (gceCS *GCEControllerServer) CreateSnapshot(ctx context.Context, req *csi.C
		return nil, status.Errorf(codes.InvalidArgument, "CreateSnapshot Volume ID is invalid: %v", err.Error())
	}

	volumeIsMultiZone := isMultiZoneVolKey(volKey)
	if gceCS.multiZoneVolumeHandleConfig.Enable && volumeIsMultiZone {
		return nil, status.Errorf(codes.InvalidArgument, "Snapshots are not supported with the multi-zone PV volumeHandle feature")
Contributor

Can you also emit the volumeID here?

@@ -255,6 +255,77 @@ func TestCreateSnapshotArguments(t *testing.T) {
	}
}

func TestUnsupporteddMultiZoneCreateSnapshot(t *testing.T) {
Contributor

s/TestUnsupporteddMultiZoneCreateSnapshot/TestUnsupportedMultiZoneCreateSnapshot

@pwschuurman (Contributor)

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label May 24, 2024
@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label May 24, 2024
@pwschuurman (Contributor)

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label May 24, 2024
@k8s-ci-robot (Contributor)

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: hungnguyen243, pwschuurman

The full list of commands accepted by this bot can be found here.

The pull request process is described here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added approved Indicates a PR has been approved by an approver from all required OWNERS files. release-note-none Denotes a PR that doesn't merit a release note. release-note Denotes a PR that will be considered when it comes time to generate release notes. and removed do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. release-note-none Denotes a PR that doesn't merit a release note. labels May 24, 2024
@hungnguyen243 (Contributor, Author)

/retest

@k8s-ci-robot k8s-ci-robot merged commit 82c6deb into kubernetes-sigs:master May 28, 2024
7 checks passed
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. release-note Denotes a PR that will be considered when it comes time to generate release notes. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.

5 participants