Manual Scale Test for Data and Control Plane Separation #3011
Comments
Unassigning myself from this story until we figure out the issue of pods restarting on rollback during leader election.
I have collected the results in this report. General overview: with one control plane instance, I was able to scale to 700 nginx instances without crashes. When I tried scaling to 1000 pods with 20 control plane instances, all NGINX pods terminated; only 913 pods became Running before they all crashed. Some of the error messages that seemed concerning with regard to lease-acquisition deadlocks have been noted in the report.
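For reference, a manual run like the one described above can be sketched roughly as follows. The deployment names, namespaces, and replica counts here are assumptions for illustration, not the exact manifests used in the test:

```shell
# Scale the control plane deployment (name assumed) to 20 replicas.
kubectl scale deployment/ngf-control-plane -n nginx-gateway --replicas=20

# Scale the data plane NGINX deployment (name assumed) to 1000 replicas.
kubectl scale deployment/nginx -n default --replicas=1000

# Count how many pods actually reach Running before any crash.
kubectl get pods -n default --field-selector=status.phase=Running --no-headers | wc -l
```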
Found some issues with locks in the code that I am fixing, which should help with scaling.
@sjberman, do we need to run the scale test again after the fixes are done?
@sindhushiv, I already ran this manually. Results look much better; no issues seen.
@sjberman @sindhushiv let me know if you need any help from my side |
As a maintainer of NGF,
I want to run a scale test with ~1000 Agent connections to our control plane,
So that we can ensure our control plane does not get overwhelmed with connections when deployed with a highly scaled data plane.
Acceptance