Skip to content

Feature: Changes to support Hyperdisk Multi-Writer mode by moving to Beta APIs. #1919

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 16 commits into from
Feb 13, 2025

Conversation

sjswerdlow
Copy link
Contributor

What type of PR is this?

Uncomment only one /kind <> line, hit enter to put that in a new line, and remove leading whitespaces from that line:
/kind bug

What this PR does / why we need it:
Hyper disk multi-writer support is now in compute/v1 but the persistent disk csi driver still calls v0.beta which leads a error while creating a hyperdisk with multi-writer mode.

Error:

"balanced-storage": rpc error: code = InvalidArgument desc = CreateVolume failed: rpc error: code = InvalidArgument desc = CreateVolume failed to create single zonal disk pvc-0b65d680-636b-46c6-876d-d3a6c412c3ef: failed to insert zonal disk: unknown Insert disk error:
googleapi: Error 400: Invalid value for field 'resource.multiWriter': 'true'.
Cannot specify the multi writer field for 'hyperdisk-balanced' disks,
please use access mode instead., invalid%!v(MISSING)

Which issue(s) this PR fixes:
Fixes: #1863

Special notes for your reviewer:

Implementation of Proposal V2 from Matt. Changes done on top of #1901.

Does this PR introduce a user-facing change?:

This PR introduces READ_WRITE_MANY AccessMode also know as Multi-writer support for hyperdisks. Refer [here](https://cloud.google.com/compute/docs/disks/sharing-disks-between-vms#hd-multi-writer) for details about the multi-writer mode support, disk and machine types that support it.

Users can enable this mode by setting `access-mode: "READ_WRITE_MANY"` for the corresponding disk section to attach it in multi-writer mode.

karkunpavan and others added 10 commits January 10, 2025 12:56
Added fixes for test key handling.
* Changes update the tests to use two contexts, one for multiwriter and one for the existing tests. This was deemed necessary as only some disks can support multi-writer, and only some VM shapes can support said disks.
Fixing a linting issue.
* Removing alpha disk from tests.

* Changes setup the test to run with hyperdisk extreme.

* Updates the disk type back to balanced, as HDX doesn't support multi writer.

* Moving over to m1 megamem as thats the only type of machine that can support all needed disk types.

* Changes update the tests to use two contexts, one for multiwriter and one for the existing tests. This was deemed necessary as only some disks can support multi-writer, and only some VM shapes can support said disks.

* Changes update the tests to use two contexts, one for multiwriter and one for the existing tests. This was deemed necessary as only some disks can support multi-writer, and only some VM shapes can support said disks.

* Fixing some git oddness

* Fixing some formatting.

* More formatting fixes.

* Hopefully last changes for formatting.

* Fixing linting issue.

* Cleaning up un-used / un-needed GetMultiWriter test function.
@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. kind/bug Categorizes issue or PR as related to a bug. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Feb 5, 2025
@k8s-ci-robot
Copy link
Contributor

Hi @sjswerdlow. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Feb 5, 2025
@mattcary
Copy link
Contributor

mattcary commented Feb 7, 2025

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Feb 7, 2025
@@ -212,11 +212,6 @@ func (cloud *FakeCloudProvider) ValidateExistingDisk(ctx context.Context, resp *
params.DiskType, respType[len(respType)-1])
}

// We are assuming here that a multiWriter disk could be used as non-multiWriter
if multiWriter && !resp.GetMultiWriter() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need to keep this check, to validate that we aren't trying to use a non-multiwriter disk when we need a multiwriter one?

@@ -407,11 +395,6 @@ func (cloud *CloudProvider) ValidateExistingDisk(ctx context.Context, resp *Clou
reqBytes, common.GbToBytes(resp.GetSizeGb()), limBytes)
}

// We are assuming here that a multiWriter disk could be used as non-multiWriter
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto, we should keep the multiwriter check.

Hmm, this won't verify that a hyperdisk is multiwriter, however, since we aren't passing in the access mode. It looks like it's just called from the one place in pk/gce-pd-csi-driver/controller.go, so should be easy to add accessMode to the method (this makes it parallel to the insert disk call. Part of me thinks we should push this detail of how multiwriter is declared down into cloud disk, but we can defer that for now, HdHA might clarify the question).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

HdHA PR has already been merged by the way: #1915

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Feb 11, 2025
@karkunpavan
Copy link
Contributor

/test pull-gcp-compute-persistent-disk-csi-driver-kubernetes-integration

@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Feb 11, 2025
@mattcary
Copy link
Contributor

/lgtm
/approve

Thanks for working though this!

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Feb 12, 2025
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: mattcary, sjswerdlow

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Feb 12, 2025
@mattcary
Copy link
Contributor

/retest

1 similar comment
@karkunpavan
Copy link
Contributor

/retest

@karkunpavan
Copy link
Contributor

/retest-required

2 similar comments
@karkunpavan
Copy link
Contributor

/retest-required

@karkunpavan
Copy link
Contributor

/retest-required

@sjswerdlow
Copy link
Contributor Author

/retest

@karkunpavan
Copy link
Contributor

/retest-required

@k8s-ci-robot
Copy link
Contributor

@sjswerdlow: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
pull-gcp-compute-persistent-disk-csi-driver-kubernetes-integration 10375f2 link true /test pull-gcp-compute-persistent-disk-csi-driver-kubernetes-integration

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@mattcary
Copy link
Contributor

The integration test errors are known problems with the snapshot tests. Force-merging.

@mattcary mattcary merged commit 6e43a14 into kubernetes-sigs:master Feb 13, 2025
5 of 7 checks passed
@mattcary
Copy link
Contributor

/cherry-pick release-1.16

@k8s-infra-cherrypick-robot

@mattcary: #1919 failed to apply on top of branch "release-1.16":

Applying: changes to support hyperdisk multi-writer mode
Using index info to reconstruct a base tree...
M	pkg/common/parameters.go
M	pkg/gce-cloud-provider/compute/cloud-disk.go
M	pkg/gce-cloud-provider/compute/gce-compute.go
M	pkg/gce-pd-csi-driver/controller.go
M	test/e2e/tests/single_zone_e2e_test.go
Falling back to patching base and 3-way merge...
Auto-merging test/e2e/tests/single_zone_e2e_test.go
Auto-merging pkg/gce-pd-csi-driver/controller.go
Auto-merging pkg/gce-cloud-provider/compute/gce-compute.go
Auto-merging pkg/gce-cloud-provider/compute/cloud-disk.go
Auto-merging pkg/common/parameters.go
Applying: Added fixes for test key handling.
Applying: Multiwriter Test Update (#3)
Using index info to reconstruct a base tree...
M	test/e2e/tests/setup_e2e_test.go
M	test/e2e/tests/single_zone_e2e_test.go
Falling back to patching base and 3-way merge...
Auto-merging test/e2e/tests/single_zone_e2e_test.go
Auto-merging test/e2e/tests/setup_e2e_test.go
CONFLICT (content): Merge conflict in test/e2e/tests/setup_e2e_test.go
error: Failed to merge in the changes.
hint: Use 'git am --show-current-patch=diff' to see the failed patch
hint: When you have resolved this problem, run "git am --continue".
hint: If you prefer to skip this patch, run "git am --skip" instead.
hint: To restore the original branch and stop patching, run "git am --abort".
hint: Disable this message with "git config advice.mergeConflict false"
Patch failed at 0003 Multiwriter Test Update (#3)

In response to this:

/cherry-pick release-1.16

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@mattcary mattcary mentioned this pull request Feb 13, 2025
dfajmon pushed a commit to dfajmon/gcp-pd-csi-driver that referenced this pull request Feb 14, 2025
Upstream PR: kubernetes-sigs#1919

Added fixes for test key handling.

Multiwriter Test Update (openshift#3)

* Changes update the tests to use two contexts, one for multiwriter and one for the existing tests. This was deemed necessary as only some disks can support multi-writer, and only some VM shapes can support said disks.

Update parameters.go

Fixing a linting issue.

Karkunpavan (openshift#4)

* Removing alpha disk from tests.

* Changes setup the test to run with hyperdisk extreme.

* Updates the disk type back to balanced, as HDX doesn't support multi writer.

* Moving over to m1 megamem as thats the only type of machine that can support all needed disk types.

* Changes update the tests to use two contexts, one for multiwriter and one for the existing tests. This was deemed necessary as only some disks can support multi-writer, and only some VM shapes can support said disks.

* Changes update the tests to use two contexts, one for multiwriter and one for the existing tests. This was deemed necessary as only some disks can support multi-writer, and only some VM shapes can support said disks.

* Fixing some git oddness

* Fixing some formatting.

* More formatting fixes.

* Hopefully last changes for formatting.

* Fixing linting issue.

* Cleaning up un-used / un-needed GetMultiWriter test function.

Completley removing the multiwriter check from fake-gce.

Changes remove the API version concept from GetDisk and InsertDisk, now just defaults to using the Beta API.

Cleaning up un-used variables.

pkg/gce-cloud-provider/compute/fake-gce.go

Addressing comments around GetMultiWriter checks.

Removing multi-writer as a disk paramter.

Fixing rebasing issues.

Merge branch 'master' into followup

Merge branch 'kubernetes-sigs:master' into hd-mw

Merge branch 'kubernetes-sigs:master' into hd-mw

Merge pull request openshift#1 from karkunpavan/karkunpavan

Added fixes for test key handling.
@tonyzhc
Copy link
Contributor

tonyzhc commented Feb 26, 2025

/cherry-pick release-1.16

@k8s-infra-cherrypick-robot

@tonyzhc: #1919 failed to apply on top of branch "release-1.16":

Applying: changes to support hyperdisk multi-writer mode
Using index info to reconstruct a base tree...
M	pkg/common/parameters.go
M	pkg/gce-cloud-provider/compute/cloud-disk.go
M	pkg/gce-cloud-provider/compute/gce-compute.go
M	pkg/gce-pd-csi-driver/controller.go
M	test/e2e/tests/single_zone_e2e_test.go
Falling back to patching base and 3-way merge...
Auto-merging test/e2e/tests/single_zone_e2e_test.go
Auto-merging pkg/gce-pd-csi-driver/controller.go
Auto-merging pkg/gce-cloud-provider/compute/gce-compute.go
Auto-merging pkg/gce-cloud-provider/compute/cloud-disk.go
Auto-merging pkg/common/parameters.go
Applying: Added fixes for test key handling.
Applying: Multiwriter Test Update (#3)
Using index info to reconstruct a base tree...
M	test/e2e/tests/setup_e2e_test.go
M	test/e2e/tests/single_zone_e2e_test.go
Falling back to patching base and 3-way merge...
Auto-merging test/e2e/tests/single_zone_e2e_test.go
Auto-merging test/e2e/tests/setup_e2e_test.go
CONFLICT (content): Merge conflict in test/e2e/tests/setup_e2e_test.go
error: Failed to merge in the changes.
hint: Use 'git am --show-current-patch=diff' to see the failed patch
hint: When you have resolved this problem, run "git am --continue".
hint: If you prefer to skip this patch, run "git am --skip" instead.
hint: To restore the original branch and stop patching, run "git am --abort".
hint: Disable this message with "git config advice.mergeConflict false"
Patch failed at 0003 Multiwriter Test Update (#3)

In response to this:

/cherry-pick release-1.16

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/bug Categorizes issue or PR as related to a bug. lgtm "Looks good to me", indicates that a PR is ready to be merged. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. release-note Denotes a PR that will be considered when it comes time to generate release notes. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
7 participants