Filter multiattach errors #1559

Merged (1 commit) on Jan 2, 2024

@mattcary mattcary commented Jan 2, 2024

/kind bug

What this PR does / why we need it:
Multiattach errors caused by user misconfiguration cloud up our SLO.

Filter out multiattach errors that result from user misconfiguration.

/assign @msau42

@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. kind/bug Categorizes issue or PR as related to a bug. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Jan 2, 2024
@k8s-ci-robot k8s-ci-robot added approved Indicates a PR has been approved by an approver from all required OWNERS files. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Jan 2, 2024
@@ -374,6 +377,17 @@ func isContextError(err error) (codes.Code, error) {
return codes.Unknown, fmt.Errorf("Not a context error: %w", err)
}

// isUserMultiAttachError returns an InvalidArgument if the error is
// multi-attach detected from the API server. If we get this error from the API
server, it means that the kubelet doesn't know about the multiattach so it is
Contributor

I think it is possible there could be a race condition in K8s that also triggers this.

For example, with StatefulSet, the replacement Pod is created with the same name when the old Pod is deleted. Pod deletion is blocked on pod-volume unmounting, but not node-level unmount or detach. So a replacement Pod can be created before we have successfully detached.

Contributor Author

Yes, but in that case the kubelet knows the volume is still attached, so the controller will figure out not to attach? https://github.com/kubernetes/kubernetes/blob/master/pkg/controller/volume/attachdetach/reconciler/reconciler.go#L341

Contributor Author

(I think the only time I've seen this error from GCP is when the user has made two static PVs that refer to the same disk. At least that's the case in the current SLOs that are firing.)

Contributor

Ah, I see. In the race condition I am thinking of, the ADC (attach/detach controller) prevents the attach call from getting down to the CSI driver. So filtering the error at the CSI driver level is fine.
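
For readers following this thread, here is a rough sketch of the kind of driver-level filter being discussed: a check that maps a multi-attach error from the cloud API to InvalidArgument, mirroring the `(codes.Code, error)` shape of `isContextError` in the diff. This is not the code merged in this PR. The `Code` type is a local stand-in for `google.golang.org/grpc/codes.Code` so the example builds with the standard library alone, and `userMultiAttachPattern` and `exampleMultiAttachErr` are assumed, illustrative values rather than a captured API response.

```go
package main

import (
	"errors"
	"fmt"
	"regexp"
)

// Code is a local stand-in for google.golang.org/grpc/codes.Code so this
// sketch compiles without external modules.
type Code int

const (
	Unknown Code = iota
	InvalidArgument
)

// userMultiAttachPattern is an ASSUMED message shape, used for illustration
// only. The real driver may match on a different string or on an error reason.
var userMultiAttachPattern = regexp.MustCompile(`The disk resource '.*' is already being used by '.*'`)

// exampleMultiAttachErr is a hypothetical error, not a recorded API response.
var exampleMultiAttachErr = errors.New("The disk resource 'disk-a' is already being used by 'instance-b'")

// isUserMultiAttachError returns InvalidArgument when err looks like a
// multi-attach rejection from the API server. Otherwise it returns Unknown
// and an error explaining that the input did not match.
func isUserMultiAttachError(err error) (Code, error) {
	if err == nil {
		return Unknown, errors.New("no error to classify")
	}
	// err.Error() includes the text of wrapped errors, so a wrapped
	// multi-attach error still matches.
	if userMultiAttachPattern.MatchString(err.Error()) {
		return InvalidArgument, nil
	}
	return Unknown, fmt.Errorf("not a user multiattach error: %w", err)
}

func main() {
	wrapped := fmt.Errorf("attach failed: %w", exampleMultiAttachErr)
	code, err := isUserMultiAttachError(wrapped)
	fmt.Println(code == InvalidArgument, err == nil)

	code, err = isUserMultiAttachError(errors.New("quota exceeded"))
	fmt.Println(code == Unknown, err != nil)
}
```

Because the check runs only on errors that actually reach the driver, the StatefulSet race described above never gets this far if the ADC stops the attach call first, which is why filtering at the CSI driver level does not hide it.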

msau42 commented Jan 2, 2024

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jan 2, 2024
@k8s-ci-robot

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: mattcary, msau42

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot merged commit bce4256 into kubernetes-sigs:master Jan 2, 2024
mattcary commented Jan 3, 2024

/cherry-pick release-1.12

@k8s-infra-cherrypick-robot

@mattcary: new pull request created: #1560

In response to this:

/cherry-pick release-1.12

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
