NodeStage on arm64 occasionally failing while running e2e tests #1398
Comments
We're encountering similar issues with arm64 (t2a) instances. We have a StatefulSet with 22 replicas that initially worked well on ARM instances. After one day of working normally, I find all the pods in
If I change the StatefulSet definition so that pods are scheduled to AMD64 instances (n2 in this case), the same disks are mounted and the same workload starts normally.
The issue has been root caused to a bug in google_nvme_id introduced in GoogleCloudPlatform/guest-configs#49. The change has been reverted. The affected versions of guest-configs are:
All versions after 20230626.00 should not contain the breaking change.
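For anyone who wants to script a check against that cutoff, here is a minimal sketch. It assumes guest-configs version strings follow the `YYYYMMDD.NN` pattern seen in `20230626.00`; the helper names `parse_version` and `is_after_fix_cutoff` are hypothetical and not part of any Google tooling. A `False` result only means the version is not covered by the "after 20230626.00" statement, not that it is necessarily affected.

```python
import re

# Cutoff taken from the comment above: versions strictly after it
# should not contain the breaking change.
FIX_CUTOFF = "20230626.00"

# Assumed version format: an 8-digit date, a dot, and a 2-digit counter.
_VERSION_RE = re.compile(r"^(\d{8})\.(\d{2})$")


def parse_version(version: str) -> tuple[int, int]:
    """Split a version such as '20230626.00' into (date, counter) integers."""
    match = _VERSION_RE.match(version.strip())
    if match is None:
        raise ValueError(f"unrecognized guest-configs version: {version!r}")
    return int(match.group(1)), int(match.group(2))


def is_after_fix_cutoff(version: str) -> bool:
    """Return True if the version is strictly newer than the cutoff."""
    return parse_version(version) > parse_version(FIX_CUTOFF)
```

How to obtain the installed guest-configs version depends on the node image and is not covered by this sketch.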
The Kubernetes project currently lacks enough contributors to adequately respond to all issues. This bot triages un-triaged issues according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community. /lifecycle stale
/close The comment by @msau42 should resolve this issue.
@mattcary: Closing this issue.
We have noticed that a few of our e2e tests occasionally fail with the following error when running on the gcp-pd CSI driver:
Once the disk enters this state, it doesn't recover and hence the test fails.