
[pvc] potential workaround for mount volume failure #13594


Merged
merged 1 commit into main from pavel/13353 on Oct 4, 2022

Conversation

@sagor999 sagor999 (Contributor) commented Oct 4, 2022

Description

Potential workaround for random mount volume failures.

Related Issue(s)

Fixes #13353

How to test

It is very hard to reproduce this issue,
so another way to test this PR is to make sure that opening a workspace still works.

Release Notes

none

Documentation

Werft options:

  • /werft with-local-preview
    If enabled this will build install/preview
  • /werft with-preview
  • /werft with-integration-tests=all
    Valid options are all, workspace, webapp, ide

@werft-gitpod-dev-com

started the job as gitpod-build-pavel-13353.1 because the annotations in the pull request description changed
(with .werft/ from main)

@sagor999 sagor999 marked this pull request as ready for review October 4, 2022 20:55
@sagor999 sagor999 requested a review from a team October 4, 2022 20:55
@github-actions github-actions bot added the team: workspace (Issue belongs to the Workspace team) label Oct 4, 2022
@roboquat roboquat merged commit efde427 into main Oct 4, 2022
@roboquat roboquat deleted the pavel/13353 branch October 4, 2022 21:02
// if we failed to mount volume, we need to re-create the pod
// ref: https://github.com/gitpod-io/gitpod/issues/13353
// ref: https://github.com/kubernetes-sigs/gcp-compute-persistent-disk-csi-driver/issues/608
if strings.Contains(pod.Status.Reason, "MountVolume.MountDevice failed for volume") {
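
For illustration, here is a minimal, self-contained sketch of the idea behind this check: detect the mount failure in the pod status, log it, and signal that the pod should be deleted and recreated. The package, helper names, and the plain log call are hypothetical and only approximate the surrounding ws-manager code:

package example

import (
	"log"
	"strings"

	corev1 "k8s.io/api/core/v1"
)

// podFailedVolumeMount reports whether the pod hit the CSI mount race
// described in gitpod-io/gitpod#13353.
func podFailedVolumeMount(pod *corev1.Pod) bool {
	return strings.Contains(pod.Status.Reason, "MountVolume.MountDevice failed for volume")
}

// handleFailedPod sketches how a reconcile loop could react: log the event
// (so occurrences can be counted) and tell the caller to recreate the pod.
func handleFailedPod(pod *corev1.Pod) (recreate bool) {
	if podFailedVolumeMount(pod) {
		log.Printf("pod %s failed to mount volume, recreating it", pod.Name)
		return true
	}
	return false
}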
Contributor
Could we have a log when ws-manager detects this issue? That way we would know how many times we hit it.

Contributor Author
We already have it. On line 361.

Contributor
Since we haven't reproduced it yet, and we did not dump the pod YAML manifest the last time Toru was able to reproduce it, I'm worried that ws-manager might not recreate the pod because one of these conditions does not match:

if c.Type == corev1.PodScheduled &&
c.Status == corev1.ConditionFalse &&
c.Reason == corev1.PodReasonUnschedulable {
return true
}
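
To make the concern concrete, here is a hedged sketch of how the quoted unschedulable-condition check and the new mount-failure check could feed a single recreate decision, so the mount-failure case does not depend on the PodScheduled condition. The function name and structure are assumptions for illustration, not the actual ws-manager implementation:

package example

import (
	"strings"

	corev1 "k8s.io/api/core/v1"
)

// shouldRecreatePod (hypothetical) is true when the pod is stuck as
// unschedulable or when it hit the MountVolume.MountDevice failure,
// so the mount-failure path triggers recreation even if the
// PodScheduled condition does not match.
func shouldRecreatePod(pod *corev1.Pod) bool {
	for _, c := range pod.Status.Conditions {
		if c.Type == corev1.PodScheduled &&
			c.Status == corev1.ConditionFalse &&
			c.Reason == corev1.PodReasonUnschedulable {
			return true
		}
	}
	return strings.Contains(pod.Status.Reason, "MountVolume.MountDevice failed for volume")
}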

@roboquat roboquat added the deployed: workspace (Workspace team change is running in production) and deployed (Change is completely running in production) labels Oct 5, 2022
Labels
deployed: workspace (Workspace team change is running in production), deployed (Change is completely running in production), release-note-none, size/XS, team: workspace (Issue belongs to the Workspace team)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[PVC] loadgen testing Pod can't mount Volume
4 participants