
Update RAID logic and post-RAID validation to integrate Datacache support for GKE nodes #1950


Merged
merged 2 commits into from
Feb 26, 2025

Conversation

hungnguyen243
Contributor

@hungnguyen243 hungnguyen243 commented Feb 24, 2025

What type of PR is this?

Uncomment only one /kind <> line, hit enter to put that in a new line, and remove leading whitespaces from that line:

/kind api-change
/kind bug
/kind cleanup
/kind design
/kind documentation
/kind failing-test

/kind feature

/kind flake

What this PR does / why we need it:
Update the RAID logic to fetch the requested local SSD count for Data Cache from the GKE node label, set up the RAID array for Data Cache with that number of local SSDs, and validate the RAID configuration afterwards.
Which issue(s) this PR fixes:

Fixes #

Special notes for your reviewer:

Does this PR introduce a user-facing change?:

Update RAID logic and post-RAID validation to integrate Data Cache support for GKE nodes

@k8s-ci-robot k8s-ci-robot added do-not-merge/release-note-label-needed (Indicates that a PR should not merge because it's missing one of the release note labels) and cncf-cla: yes (Indicates the PR's author has signed the CNCF CLA) labels Feb 24, 2025
@k8s-ci-robot k8s-ci-robot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Feb 24, 2025
@k8s-ci-robot
Contributor

Hi @hungnguyen243. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added the size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. label Feb 24, 2025
@Sneha-at
Contributor

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test (Indicates a non-member PR verified by an org member that is safe to test) and release-note-none (Denotes a PR that doesn't merit a release note), and removed needs-ok-to-test (Indicates a PR that requires an org member to verify it is safe to test) and do-not-merge/release-note-label-needed (Indicates that a PR should not merge because it's missing one of the release note labels) labels Feb 24, 2025
@hungnguyen243 hungnguyen243 force-pushed the release-999.2.new branch 2 times, most recently from 7e5a1e7 to 68e324e Compare February 24, 2025 22:04
@k8s-ci-robot k8s-ci-robot added size/L (Denotes a PR that changes 100-499 lines, ignoring generated files) and removed size/XL (Denotes a PR that changes 500-999 lines, ignoring generated files) labels Feb 24, 2025
@hungnguyen243
Contributor Author

/retest

@hungnguyen243 hungnguyen243 changed the title update RAIDing logic and validation for Datacache Update RAIDing logic and validation for Datacache Feb 25, 2025
@hungnguyen243 hungnguyen243 changed the title Update RAIDing logic and validation for Datacache Update RAIDing logic and post-RAID validation for Datacache to integrate with GKE nodes Feb 25, 2025
@hungnguyen243 hungnguyen243 changed the title Update RAIDing logic and post-RAID validation for Datacache to integrate with GKE nodes Update RAID logic and post-RAID validation to integrate Datacache support for GKE nodes Feb 25, 2025
@hungnguyen243
Contributor Author

/retest

@hungnguyen243
Contributor Author

/retest-required

@sunnylovestiramisu
Contributor

Please update release note in the PR description to:

Update RAID logic and post-RAID validation to integrate Data Cache support for GKE nodes

The script we use to create a change log for the release is reading these release notes to automatically generate the change list.

@hungnguyen243 hungnguyen243 force-pushed the release-999.2.new branch 2 times, most recently from 477d1cc to f2bb4b3 Compare February 25, 2025 22:19
@k8s-ci-robot k8s-ci-robot added release-note (Denotes a PR that will be considered when it comes time to generate release notes) and removed release-note-none (Denotes a PR that doesn't merit a release note) labels Feb 25, 2025
@sunnylovestiramisu sunnylovestiramisu self-assigned this Feb 25, 2025
infoSlice := strings.Split(infoString, " ")

// We want to get the second element in the array, which is the path to the RAIDed device
return infoSlice[1], nil
Contributor

@sunnylovestiramisu do you think we should add a regex or some other check here to ensure we always pick up the path?
Command output:

$ sudo mdadm --detail --scan
ARRAY /dev/md/kubelet_ephemeral_storage metadata=1.2 name=kubelet_ephemeral_storage UUID=xxxx
ARRAY /dev/md126 metadata=1.2 name=csi-driver-data-cache UUID=xxxx
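
For illustration only, here is a minimal sketch of one way to pull the device path out of scan output like the above without relying on a bare `infoSlice[1]` index. The helper name `raidedDevicePath` is hypothetical and is not the code in this PR; it assumes the `name=` field matches the array name exactly, as in the sample output.

```go
package main

import (
	"fmt"
	"strings"
)

// raidedDevicePath returns the device path of the array called raidName in
// the output of `mdadm --detail --scan`.
// Hypothetical helper for illustration; not the function from this PR.
func raidedDevicePath(scanOutput, raidName string) (string, error) {
	for _, line := range strings.Split(scanOutput, "\n") {
		// strings.Fields tolerates repeated spaces and trailing whitespace,
		// which strings.Split(line, " ") does not.
		fields := strings.Fields(line)
		if len(fields) < 2 || fields[0] != "ARRAY" {
			continue
		}
		// Assumption: the name field has the exact form name=<raidName>.
		for _, f := range fields[2:] {
			if f == "name="+raidName {
				return fields[1], nil
			}
		}
	}
	return "", fmt.Errorf("no array named %q in mdadm scan output", raidName)
}
```

Because the length check happens before indexing, an empty or malformed line yields an error rather than an index-out-of-range panic.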

Contributor

Since we are grepping for raidedLocalSsdName, we will always get exactly one array (hopefully):
ARRAY /dev/md126 metadata=1.2 name=csi-driver-data-cache UUID=xxxx

The path itself should match some format like /dev/md*, right?

Contributor Author

I think that if grepping for 'csi-driver-data-cache' returns the line, the path should be correct: a RAID array with that name could hardly have been created at some other, random path. One possible failure is that RAID creation fails and grep returns nothing, but that case is caught before we process infoString.

Contributor

It might also be /dev/csi-driver-data-cache in our E2E tests and in certain scenarios :( such as when ephemeral storage is not used. @hungnguyen243 to confirm.

Contributor Author

For our E2E tests, it will be /dev/md/csi-driver-datacache (I double-checked this), so adding a regex check should be fine, but as I mentioned above, it shouldn't be necessary?

Contributor

I am not a fan of regex checks without test coverage anyway, and for this one it does not really seem necessary.
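
The thread settled on not adding a check. Purely as a sketch of what a path-format check could look like if one were ever wanted, the following assumes md device paths take the form /dev/md<N> or /dev/md/<name>, as in the outputs quoted above. The names `mdPathPattern` and `looksLikeMdPath` are hypothetical and do not exist in the driver.

```go
package main

import "regexp"

// mdPathPattern encodes an assumed shape for md device paths:
// /dev/md followed by digits, or /dev/md/ followed by an array name.
// Hypothetical pattern for illustration; not part of this PR.
var mdPathPattern = regexp.MustCompile(`^/dev/md(\d+|/[A-Za-z0-9_.:-]+)$`)

// looksLikeMdPath reports whether path matches the assumed md device format.
func looksLikeMdPath(path string) bool {
	return mdPathPattern.MatchString(path)
}
```

Under this assumption, /dev/md126 and /dev/md/csi-driver-datacache would be accepted, while a path such as /dev/csi-driver-data-cache would be rejected, so any such check would need table-driven tests covering every environment the driver runs in.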

Contributor

@sunnylovestiramisu sunnylovestiramisu left a comment

Review for data cache


@sunnylovestiramisu
Contributor

/lgtm
/approve

We need @mattcary to force merge because the Windows test is not fixed yet.

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Feb 26, 2025
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: hungnguyen243, sunnylovestiramisu

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Feb 26, 2025
@k8s-ci-robot
Contributor

k8s-ci-robot commented Feb 26, 2025

@hungnguyen243: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| pull-gcp-compute-persistent-disk-csi-driver-e2e-windows-2019 | e33bb0d | link | false | /test pull-gcp-compute-persistent-disk-csi-driver-e2e-windows-2019 |

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.


@hungnguyen243
Contributor Author

/retest-required

@k8s-ci-robot k8s-ci-robot merged commit ed6b156 into kubernetes-sigs:master Feb 26, 2025
7 of 8 checks passed
Labels
approved (Indicates a PR has been approved by an approver from all required OWNERS files)
cncf-cla: yes (Indicates the PR's author has signed the CNCF CLA)
lgtm ("Looks good to me", indicates that a PR is ready to be merged)
ok-to-test (Indicates a non-member PR verified by an org member that is safe to test)
release-note (Denotes a PR that will be considered when it comes time to generate release notes)
size/L (Denotes a PR that changes 100-499 lines, ignoring generated files)

4 participants