Performance test PVC with a single node saturated with workspaces #12744

Closed
Tracked by #7901
kylos101 opened this issue Sep 7, 2022 · 8 comments
Labels
aspect: performance anything related to performance aspect: testing Anything related to testing Gitpod manually, automated integration tests or even unit tests

Comments

@kylos101 (Contributor) commented Sep 7, 2022

Is your feature request related to a problem? Please describe

We should test how a node behaves when it is full of workspaces that use PVCs and generate disk activity.

Describe the behaviour you'd like

  1. Start a regular workspace in the cluster, using PVC, on a single node.
  2. Cordon the other two nodes.
  3. Run loadgen once to fill half of the node (~9 workspaces).
  4. Begin stopping the first loadgen run and start a second loadgen run with ~20 workspaces (to fill the initial node and trigger scale-up of a new one).
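As a concrete sketch of the steps above: the node names and scenario files below are placeholders, and the exact loadgen invocation is an assumption (check the loadgen tool in the gitpod repo for the real flags), so the script only prints the commands it would run.

```shell
#!/usr/bin/env sh
# Dry-run sketch of steps 2-4 above. Node names and scenario files are
# placeholders; the real loadgen flags may differ.
run() { echo "+ $*"; }  # print commands instead of executing; remove to run for real

# Step 2: cordon the other two nodes so every workspace lands on one node.
run kubectl cordon workspace-node-2
run kubectl cordon workspace-node-3

# Step 3: first loadgen run, filling roughly half the node (~9 workspaces).
run loadgen benchmark half-node-scenario.yaml

# Step 4: while stopping the first run, start a second one with ~20 workspaces
# to saturate the node and trigger scale-up of a new node.
run loadgen benchmark full-node-scenario.yaml
```

Uncordoning afterwards (`kubectl uncordon <node>`) restores normal scheduling.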

Questions

  • What IOPS and bandwidth do we achieve on the initial node? How is it different from a normal cluster?
  • How does the regular workspace that was initially started respond as we go through the various phases described above?

Additional context

We're not sure whether the IO limiter is needed to cover these disks, or whether there will be excessive CPU usage as a result of using PVCs for /workspace.

@kylos101 kylos101 changed the title Manual loadgen test for a single node with 17-18 3gb workspaces, how does it perform? Performance test PVC with a single node saturated with workspaces Sep 7, 2022
@kylos101 kylos101 added aspect: testing Anything related to testing Gitpod manually, automated integration tests or even unit tests aspect: performance anything related to performance labels Sep 7, 2022
@sagor999 (Contributor) commented:
It worked without issues.
I did not see any IOPS spikes in Grafana either.
Tested with loadgen with 20 workspaces, then 30, then 100 (the 20- and 30-workspace runs with stop and start at the same time).

@kylos101 (Contributor, Author) commented:
Thanks, @sagor999! Could you share what IOPS and bandwidth were achieved? I'm curious to compare with what we have now.

> I did not see any IOPS spikes in grafana as well.

Sweet!! Could you share a link to the Grafana and some pictures for posterity? Just super excited, and am glad the test worked out well. 😁

@kylos101 (Contributor, Author) commented:
Also, @sagor999 could you add this to our project and update the status? 🙏

@sagor999 (Contributor) commented:
IOPS:
[image: IOPS graph from Grafana]
I believe the large spikes are from the RAID device when a node starts up (pulling images, my guess).
PVC disks seem to have around 100-200 IOPS on them (some workspaces were running stress-ng).
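For posterity, a quick way to sanity-check per-device IOPS directly on a Linux node, without Grafana, is to diff two snapshots of /proc/diskstats. This is a generic sketch, not part of the test above; field 4 is reads completed and field 8 is writes completed per the kernel's iostats documentation.

```shell
#!/usr/bin/env sh
# Approximate per-device IOPS by sampling /proc/diskstats twice.
# Field 3 is the device name, field 4 reads completed, field 8 writes completed.
INTERVAL="${INTERVAL:-2}"
snap() { awk '{ print $3, $4 + $8 }' /proc/diskstats; }

t1="$(mktemp)"; t2="$(mktemp)"
snap > "$t1"
sleep "$INTERVAL"
snap > "$t2"

# Join the two snapshots by device name and print the completed-I/O delta per second.
awk -v dt="$INTERVAL" '
  NR == FNR { first[$1] = $2; next }                       # first snapshot
  { printf "%-12s %8.1f iops\n", $1, ($2 - first[$1]) / dt }' "$t1" "$t2"
rm -f "$t1" "$t2"
```

Running it on a node while a stress-ng workspace is active should show the same order of magnitude as the Grafana panel.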

@sagor999 sagor999 moved this to In Progress in 🌌 Workspace Team Sep 23, 2022
@sagor999 sagor999 self-assigned this Sep 23, 2022
@kylos101 (Contributor, Author) commented Oct 3, 2022

Thanks, @sagor999. It looks like when a node is saturated with PVC workspaces and has heavy R/W activity, system load also goes up (which is expected, because we're not limiting I/O on each PVC).

How is CPU usage when there's heavy disk R/W? Can you share a link to the Node Resource Usage for the above node from Grafana and attach a picture to this issue for posterity? It should still be present because it is from 11 days ago.

Aside from the ☝ concern, I cannot think of anything else that warrants keeping this issue open. Once you've added that here, and assuming you see no issue with CPU, please close the issue. However, if you do see an issue, please loop in @Furisto to help examine it.

@kylos101 (Contributor, Author) commented:
@sagor999 is there anything left to do for stress testing individual nodes?

@sagor999 (Contributor) commented:
I don't think so; plus, individual PVCs should be less taxing on the system than the massive RAID drives that we have currently (especially since we use software RAID). 🙏

@kylos101 (Contributor, Author) commented:
Okay, thank you @sagor999 ! Closing. 😁

Repository owner moved this from In Progress to Awaiting Deployment in 🌌 Workspace Team Oct 14, 2022
@kylos101 kylos101 moved this from Awaiting Deployment to Done in 🌌 Workspace Team Oct 14, 2022