
chore: Lower requested resources in the logging test #472

Merged
merged 4 commits into from
Feb 8, 2024


siegfriedweber
Member

Description

The logging test creates the following pods:

  • hdfs-test-runner-0
  • hdfs-vector-aggregator-0
  • test-hdfs-automatic-log-datanode-default-0
  • test-hdfs-automatic-log-journalnode-default-0
  • test-hdfs-automatic-log-namenode-default-0
  • test-hdfs-automatic-log-namenode-default-1
  • test-hdfs-custom-log-datanode-default-0
  • test-hdfs-custom-log-journalnode-default-0
  • test-hdfs-custom-log-namenode-default-0
  • test-hdfs-custom-log-namenode-default-1
  • test-zk-server-default-0

They request a lot of CPU resources, especially because a Vector container is started for every HDFS pod.

The logging test usually passes if the other test running in parallel does not request too much CPU. However, if two logging tests for different Hadoop versions run in parallel, not all pods can be scheduled (on our AKS cluster) due to insufficient CPU resources. If both tests have pods that cannot be scheduled, the tests are deadlocked and both will fail.

Failed test due to deadlock: https://ci.stackable.tech/view/02%20Operator%20Tests%20(custom)/job/hdfs-operator-it-custom/119/

The solution is to lower the requested resources.
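
The deadlock and the effect of lowering the requests can be illustrated with a toy model. This is only a sketch, not how the Kubernetes scheduler actually works: the cluster capacity (8000m), the per-pod requests (600m and 300m), and the function names `schedule` and `deadlocked` are all hypothetical and not taken from the test definitions. Only the pod count of 11 per test comes from the list above.

```python
from itertools import zip_longest


def schedule(cluster_cpu_m, tests):
    """Greedily place pods, alternating between tests, until CPU runs out.

    cluster_cpu_m: allocatable CPU in millicores (hypothetical value).
    tests: mapping of test name -> list of per-pod CPU requests in millicores.
    Returns a mapping of test name -> number of pods that were placed.
    """
    free = cluster_cpu_m
    placed = {name: 0 for name in tests}
    # Interleave the pods of all tests to mimic tests starting at the same time.
    for round_of_pods in zip_longest(*tests.values()):
        for name, request in zip(tests, round_of_pods):
            if request is not None and request <= free:
                free -= request
                placed[name] += 1
    return placed


def deadlocked(cluster_cpu_m, tests):
    """True if no test got all of its pods placed, so none can finish."""
    placed = schedule(cluster_cpu_m, tests)
    return all(placed[name] < len(pods) for name, pods in tests.items())


# Hypothetical numbers: 11 pods per test, as in the pod list above.
high = [600] * 11  # assumed per-pod CPU request before lowering
low = [300] * 11   # assumed per-pod CPU request after lowering

print(schedule(8000, {"a": high, "b": high}))    # {'a': 7, 'b': 6}
print(deadlocked(8000, {"a": high, "b": high}))  # True
print(deadlocked(8000, {"a": low, "b": low}))    # False
```

In this model a single test with the higher requests still fits on its own, which matches the observation that the logging test usually passes when the parallel test is less demanding.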

This pull request lowers only the CPU requests for the HDFS containers because this can easily be done in the configuration. Lowering the requests for the Vector containers would require pod overrides (https://github.com/stackabletech/hdfs-operator/blob/23.11.0/rust/operator-binary/src/container.rs#L181-L187). However, lowering only the HDFS container requests seems to be sufficient.

Successful test with lowered resources: https://ci.stackable.tech/view/02%20Operator%20Tests%20(custom)/job/hdfs-operator-it-custom/121/

@siegfriedweber siegfriedweber requested a review from a team February 7, 2024 16:49
@siegfriedweber siegfriedweber self-assigned this Feb 7, 2024
Member

@sbernauer sbernauer left a comment

Thanks for the writeup!

@siegfriedweber siegfriedweber added this pull request to the merge queue Feb 8, 2024
Merged via the queue into main with commit 730cc83 Feb 8, 2024
@siegfriedweber siegfriedweber deleted the chore/lower-requested-resources-in-tests branch February 8, 2024 09:15
2 participants