Set the resources in the integration tests properly #487

Closed
siegfriedweber opened this issue Mar 5, 2024 · 4 comments

siegfriedweber commented Mar 5, 2024

The integration tests request and reserve a lot of CPU resources. If too much is requested, not all pods can be scheduled and the tests fail. If too little is requested, start-up takes too long: pods are restarted because their liveness probes fail, the test timeouts are exceeded, and the tests fail.
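
For reference, both knobs live side by side in each container spec: the scheduler only looks at the requests, while the limits cap what a running container may use. A minimal sketch with illustrative values (not taken from the tests), written as a Python dict mirroring the YAML:

```python
# Sketch with illustrative values: the shape of a Kubernetes container
# "resources" stanza, as a dict mirroring the YAML.
container = {
    "name": "example",
    "resources": {
        # What the scheduler reserves on a node; if the sum of requests on a
        # node exceeds its allocatable capacity, pods stay Pending.
        "requests": {"cpu": "250m", "memory": "512Mi"},
        # Hard caps at runtime: CPU beyond the limit is throttled, which can
        # slow start-up enough to trip liveness probes; memory beyond the
        # limit gets the container OOM-killed.
        "limits": {"cpu": "500m", "memory": "512Mi"},
    },
}
print(container["resources"]["requests"]["cpu"])  # → 250m
```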

Pull request #472 lowered the requested resources so that the tests pass on the AKS cluster, but now they appear to fail on the OpenShift cluster.

| POD | CONTAINER | CPU REQUESTS | CPU LIMITS | MEMORY REQUESTS | MEMORY LIMITS |
| --- | --- | --- | --- | --- | --- |
| hdfs-test-runner-0 | hdfs-test-runner | 0m | 0m | 0Mi | 0Mi |
| hdfs-vector-aggregator-0 | vector | 0m | 0m | 0Mi | 0Mi |
| test-hdfs-automatic-log-datanode-default-0 | datanode | 50m | 250m | 512Mi | 512Mi |
| test-hdfs-automatic-log-datanode-default-0 | vector | 250m | 500m | 128Mi | 128Mi |
| test-hdfs-automatic-log-journalnode-default-0 | journalnode | 50m | 250m | 512Mi | 512Mi |
| test-hdfs-automatic-log-journalnode-default-0 | vector | 250m | 500m | 128Mi | 128Mi |
| test-hdfs-automatic-log-namenode-default-0 | namenode | 50m | 250m | 1024Mi | 1024Mi |
| test-hdfs-automatic-log-namenode-default-0 | vector | 250m | 500m | 128Mi | 128Mi |
| test-hdfs-automatic-log-namenode-default-0 | zkfc | 100m | 400m | 512Mi | 512Mi |
| test-hdfs-automatic-log-namenode-default-1 | namenode | 50m | 250m | 1024Mi | 1024Mi |
| test-hdfs-automatic-log-namenode-default-1 | vector | 250m | 500m | 128Mi | 128Mi |
| test-hdfs-automatic-log-namenode-default-1 | zkfc | 100m | 400m | 512Mi | 512Mi |
| test-hdfs-custom-log-datanode-default-0 | datanode | 50m | 250m | 512Mi | 512Mi |
| test-hdfs-custom-log-datanode-default-0 | vector | 250m | 500m | 128Mi | 128Mi |
| test-hdfs-custom-log-journalnode-default-0 | journalnode | 50m | 250m | 512Mi | 512Mi |
| test-hdfs-custom-log-journalnode-default-0 | vector | 250m | 500m | 128Mi | 128Mi |
| test-hdfs-custom-log-namenode-default-0 | namenode | 50m | 250m | 1024Mi | 1024Mi |
| test-hdfs-custom-log-namenode-default-0 | vector | 250m | 500m | 128Mi | 128Mi |
| test-hdfs-custom-log-namenode-default-0 | zkfc | 100m | 400m | 512Mi | 512Mi |
| test-hdfs-custom-log-namenode-default-1 | namenode | 50m | 250m | 1024Mi | 1024Mi |
| test-hdfs-custom-log-namenode-default-1 | vector | 250m | 500m | 128Mi | 128Mi |
| test-hdfs-custom-log-namenode-default-1 | zkfc | 100m | 400m | 512Mi | 512Mi |
| test-zk-server-default-0 | zookeeper | 200m | 800m | 512Mi | 512Mi |
| **Total** | | **3000m** | **8400m** | **9728Mi** | **9728Mi** |
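
For cross-checking, the column totals can be re-derived with a short script (a sketch; the per-container data is copied from the table above):

```python
# Sketch: re-derive the column totals of the pod table above.  Quantities use
# the Kubernetes conventions "m" (millicores) and "Mi" (MiB).

def cpu_m(q: str) -> int:
    """Parse a CPU quantity such as '250m' into millicores."""
    return int(q[:-1]) if q.endswith("m") else int(q) * 1000

def mem_mi(q: str) -> int:
    """Parse a memory quantity such as '512Mi' into MiB."""
    return int(q[:-2])

# (container, cpu request, cpu limit, memory request, memory limit),
# copied from the table; both logging tests deploy the same HDFS containers.
per_test = [
    ("datanode",    "50m",  "250m", "512Mi",  "512Mi"),
    ("vector",      "250m", "500m", "128Mi",  "128Mi"),
    ("journalnode", "50m",  "250m", "512Mi",  "512Mi"),
    ("vector",      "250m", "500m", "128Mi",  "128Mi"),
    ("namenode",    "50m",  "250m", "1024Mi", "1024Mi"),
    ("vector",      "250m", "500m", "128Mi",  "128Mi"),
    ("zkfc",        "100m", "400m", "512Mi",  "512Mi"),
    ("namenode",    "50m",  "250m", "1024Mi", "1024Mi"),
    ("vector",      "250m", "500m", "128Mi",  "128Mi"),
    ("zkfc",        "100m", "400m", "512Mi",  "512Mi"),
]
shared = [
    ("hdfs-test-runner",  "0m",   "0m",   "0Mi",   "0Mi"),
    ("vector-aggregator", "0m",   "0m",   "0Mi",   "0Mi"),
    ("zookeeper",         "200m", "800m", "512Mi", "512Mi"),
]
containers = shared + 2 * per_test  # automatic-log test + custom-log test

total_cpu_req = sum(cpu_m(c[1]) for c in containers)
total_cpu_lim = sum(cpu_m(c[2]) for c in containers)
total_mem_req = sum(mem_mi(c[3]) for c in containers)
total_mem_lim = sum(mem_mi(c[4]) for c in containers)
print(f"{total_cpu_req}m {total_cpu_lim}m {total_mem_req}Mi {total_mem_lim}Mi")
# → 3000m 8400m 9728Mi 9728Mi
```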

The resource requests and limits must be set to proper values, or the liveness probes must be adjusted; see #488.


siegfriedweber commented Mar 5, 2024

Resource consumption on the OpenShift cluster when the two logging tests run in parallel:

| NODE | CPU REQUESTS | CPU LIMITS | MEMORY REQUESTS | MEMORY LIMITS |
| --- | --- | --- | --- | --- |
| openshift412-master-0 | 2270m (28%) | 250m (3%) | 9402Mi (33%) | 1200Mi (4%) |
| openshift412-master-1 | 2245m (28%) | 200m (2%) | 9387Mi (33%) | 1000Mi (3%) |
| openshift412-master-2 | 1704m (21%) | 200m (2%) | 6902Mi (24%) | 1000Mi (3%) |
| openshift412-worker-1 | 3456m (88%) | 7750m (197%) | 11714Mi (91%) | 9280Mi (72%) |
| openshift412-worker-2 | 3344m (85%) | 7050m (179%) | 11288Mi (87%) | 9052Mi (70%) |
| openshift412-worker-3 | 3619m (92%) | 7700m (196%) | 9940Mi (77%) | 13288Mi (103%) |
| **Total** | 16638m (46%) | 23150m (65%) | 58633Mi (47%) | 34820Mi (28%) |
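
A back-of-the-envelope check on the worker rows (values copied from the table above): dividing the requested millicores by the request percentage puts each worker at roughly 3900m allocatable CPU, so the requests alone nearly fill the nodes, while the CPU limits are about twice what is physically available:

```python
# Sketch: estimate each OpenShift worker's allocatable CPU from the
# "CPU REQUESTS" column above (millicores, percentage of allocatable).
workers = {
    "openshift412-worker-1": (3456, 88),
    "openshift412-worker-2": (3344, 85),
    "openshift412-worker-3": (3619, 92),
}
for node, (millicores, pct) in workers.items():
    allocatable = millicores * 100 / pct
    print(f"{node}: ~{allocatable:.0f}m allocatable CPU")
# Each worker comes out at roughly 3900m, so CPU limits near 200% mean heavy
# throttling as soon as the containers actually try to use their limits.
```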

@siegfriedweber

Resource consumption on the AKS cluster when the two logging tests run in parallel:

| NODE | CPU REQUESTS | CPU LIMITS | MEMORY REQUESTS | MEMORY LIMITS |
| --- | --- | --- | --- | --- |
| aks-0 | 3530m (45%) | 9200m (117%) | 9712Mi (35%) | 12258Mi (44%) |
| aks-1 | 2832m (36%) | 11492m (146%) | 6956Mi (25%) | 10260Mi (37%) |
| aks-2 | 3430m (43%) | 8500m (108%) | 8560Mi (30%) | 11106Mi (40%) |
| **Total** | 9792m (41%) | 29192m (124%) | 25228Mi (30%) | 33624Mi (40%) |
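
A rough estimate of node size from the request column (value divided by percentage, data copied from the table above) suggests each AKS node offers around 7800–8000m of allocatable CPU, roughly twice an OpenShift worker, which may explain why the requests lowered in #472 still fit here:

```python
# Sketch: estimate each AKS node's allocatable CPU from the "CPU REQUESTS"
# column above (millicores, percentage of allocatable).
nodes = {
    "aks-0": (3530, 45),
    "aks-1": (2832, 36),
    "aks-2": (3430, 43),
}
for node, (millicores, pct) in nodes.items():
    allocatable = millicores * 100 / pct
    print(f"{node}: ~{allocatable:.0f}m allocatable CPU")
# Each node comes out at roughly 7800-8000m (about 8 cores).
```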

@siegfriedweber siegfriedweber moved this from Next to Refinement: In Progress in Stackable Engineering Mar 5, 2024

razvan commented Mar 5, 2024

Resource consumption on OKD 4.14 with two logging tests running in parallel:

| NODE | CPU REQUESTS | CPU LIMITS | MEMORY REQUESTS | MEMORY LIMITS |
| --- | --- | --- | --- | --- |
| master0 | 2958m (39%) | 500m (6%) | 10366Mi (33%) | 640Mi (2%) |
| worker-cm9uu5vw | 4004m (53%) | 9100m (121%) | 11412Mi (36%) | 10624Mi (34%) |
| worker-ibttvxjb | 1204m (16%) | 1500m (20%) | 2964Mi (9%) | 2176Mi (7%) |
| worker-sjmxj2mt | 3904m (52%) | 8400m (112%) | 11028Mi (35%) | 10240Mi (33%) |
| **Total** | 12070m (40%) | 19500m (65%) | 35770Mi (28%) | 23680Mi (19%) |

@siegfriedweber

Fixed in #491 by adjusting the liveness probes rather than the requested resources; see #491 (comment).
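
For context, relaxing a liveness probe means raising its timing parameters instead of raising the resource requests. A sketch with assumed values, not the actual change in #491 (the real numbers are in that pull request):

```python
# Sketch with assumed values (not the actual change in #491): relaxing a
# Kubernetes livenessProbe so that slow start-up does not trigger restarts.
liveness_probe = {
    "initialDelaySeconds": 60,  # wait longer before the first check
    "periodSeconds": 10,        # time between checks
    "timeoutSeconds": 5,        # per-check timeout
    "failureThreshold": 6,      # consecutive failures before a restart
}
# Grace period after initialDelaySeconds before the kubelet restarts the
# container: periodSeconds * failureThreshold.
grace = liveness_probe["periodSeconds"] * liveness_probe["failureThreshold"]
print(f"{grace}s")  # → 60s
```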

@siegfriedweber siegfriedweber moved this from Refinement: In Progress to Development: Done in Stackable Engineering Mar 6, 2024
@lfrancke lfrancke moved this from Development: Done to Done in Stackable Engineering Mar 11, 2024