Skip to content

Filter multiattach errors #1559

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Jan 2, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
14 changes: 14 additions & 0 deletions pkg/common/utils.go
Original file line number Diff line number Diff line change
Expand Up @@ -337,6 +337,9 @@ func CodeForError(sourceError error) codes.Code {
return codes.Internal
}

if code, err := isUserMultiAttachError(sourceError); err == nil {
return code
}
if code, err := existingErrorCode(sourceError); err == nil {
return code
}
Expand Down Expand Up @@ -374,6 +377,17 @@ func isContextError(err error) (codes.Code, error) {
return codes.Unknown, fmt.Errorf("Not a context error: %w", err)
}

// isUserMultiAttachError returns an InvalidArgument if the error is
// multi-attach detected from the API server. If we get this error from the API
// server, it means that the kubelet doesn't know about the multiattch so it is
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is possible there could be a race condition in K8s that also triggers this.

For example, with StatefulSet, the replacement Pod is created with the same name when the old Pod is deleted. Pod deletion is blocked on pod-volume unmounting, but not node-level unmount or detach. So a replacement Pod can be created before we have successfully detached.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but in that cause the kubelet knows the volume is still attached and so the controller will figure out not to attach? https://github.com/kubernetes/kubernetes/blob/master/pkg/controller/volume/attachdetach/reconciler/reconciler.go#L341

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(I think that the only time I've seen this error from GCP is when the user has made two static PVs that refer to the same disk --- at least that's the case in the current SLOs that are firing).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah I see, in the race condition I am thinking of, ADC prevents the attach call from getting down to the CSI driver. So filtering the error at the CSI driver level is fine.

// due to user configuration.
func isUserMultiAttachError(err error) (codes.Code, error) {
if strings.Contains(err.Error(), "The disk resource") && strings.Contains(err.Error(), "is already being used") {
return codes.InvalidArgument, nil
}
return codes.Unknown, fmt.Errorf("Not a user multiattach error: %w", err)
}

func existingErrorCode(err error) (codes.Code, error) {
if err == nil {
return codes.Unknown, fmt.Errorf("null error")
Expand Down
33 changes: 33 additions & 0 deletions pkg/common/utils_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -1016,6 +1016,11 @@ func TestCodeForError(t *testing.T) {
inputErr: nil,
expCode: codes.Internal,
},
{
name: "user multiattach error",
inputErr: fmt.Errorf("The disk resource 'projects/foo/disk/bar' is already being used by 'projects/foo/instances/1'"),
expCode: codes.InvalidArgument,
},
}

for _, tc := range testCases {
Expand Down Expand Up @@ -1077,6 +1082,34 @@ func TestIsContextError(t *testing.T) {
}
}

func TestIsUserMultiAttachError(t *testing.T) {
cases := []struct {
errorString string
expectedCode codes.Code
expectCode bool
}{
{
errorString: "The disk resource 'projects/foo/disk/bar' is already being used by 'projects/foo/instance/biz'",
expectedCode: codes.InvalidArgument,
expectCode: true,
},
{
errorString: "The disk resource is ok!",
expectCode: false,
},
}
for _, test := range cases {
code, err := isUserMultiAttachError(fmt.Errorf(test.errorString))
if test.expectCode {
if err != nil || code != test.expectedCode {
t.Errorf("Failed with non-nil error %v or bad code %v: %s", err, code, test.errorString)
}
} else if err == nil {
t.Errorf("Expected error for test but got none: %s", test.errorString)
}
}
}

func TestIsValidDiskEncryptionKmsKey(t *testing.T) {
cases := []struct {
diskEncryptionKmsKey string
Expand Down