[PVC] orphan PVC left if the ws-manager unable to start workspace pod #13282
Comments
The original solution has a side effect: the workspace cannot be terminated. We delete the PVC object too early, which puts the PVC into a Terminating state (it is held there by its finalizer). When the snapshot controller then takes a snapshot, it tries to add a finalizer to the PVC object, but a finalizer cannot be added to an object that is already terminating, so the snapshot fails. I don't think this is a blocker, because the PVC is never mounted by the workspace pod and the PVC object is in the Pending state. What remains is to garbage-collect the Pending PVC object.
I see, thank you, @jenting.
Another scenario: the PVC stays in a Pending state for one hour because of the GCP limitation, and then becomes Bound once the limitation is gone. By then the workspace pod has already timed out, so the pod ends up in a Terminating state while the PVC is Bound.
Bug description
Two scenarios:
PVC bound, but workspace pod gone: the orphan PVC object still exists in the cluster even though the workspace pod is gone, and ws-manager logs the following error:
"error":"timed out waiting for the condition","instanceId":"ea1e5ba2-54d6-4611-b8a6-ba8e9d791cb7","level":"warning","message":"was unable to start workspace","pod":"ws-ea1e5ba2-54d6-4611-b8a6-ba8e9d791cb7"
PVC bound and workspace pod Terminating: the PVC goes from Pending to Bound after one hour because of the GCP limitation, but by then the workspace pod has timed out. The result is a workspace pod in the Terminating state and a PVC in the Bound state.
Steps to reproduce
Run loadgen with 200 workspaces simultaneously (100 regular workspaces + 100 regular workspaces with PVC). After the test finishes, some orphan PVCs are left behind, and ws-manager does not handle them. Reference code:
gitpod/components/ws-manager/pkg/manager/manager.go
Line 338 in ba74cdb
[PVC] orphan PVC left if the ws-manager unable to start workspace pod #13282 (comment)
Workspace affected
No response
Expected behavior
The workspace pod and PVC object should be removed.
Example repository
No response
Anything else?
#7901