One of our services is failing due to out of memory #5973

Closed
1 task
sigs3gv opened this issue Mar 19, 2025 · 2 comments
Labels
bug This issue is a bug. closed-for-staleness p2 This is a standard priority issue response-requested Waiting on additional info and feedback. Will move to "closing-soon" in 10 days.

Comments

@sigs3gv

sigs3gv commented Mar 19, 2025

Bug Description

We are encountering a service failure due to a memory leak.

[Image: heap usage graph]
This graph shows the gradual increase in heap usage for our service. The service eventually OOMs and is restarted by Kubernetes, which is why a new line starts after the previous one ends.

Upon investigation, we found that AWS SDK v2 registers connection managers in a HashMap within IdleConnectionReaper and deregisters them later, but the heap dump shows a huge amount of retained space that is never collected by the GC.

[Image: heap dump in VisualVM]
The screenshot above shows the heap dump visualised in VisualVM. There are 15k+ PoolingHttpClientConnectionManager objects accounting for 512 MB+ of retained space that could have been garbage collected.

My suspicion is that the deregister method is not being called once the API call to AWS completes.

We are using AWS SDK v2 to talk to S3 and EMR Serverless API.
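For reference, here is a minimal sketch of the pattern we suspect is at play (this is not our actual code; the region and factory name are placeholders, and it assumes the Apache-based HTTP client is in use): each client built with its own Apache HTTP client owns a separate PoolingHttpClientConnectionManager, and that manager stays registered with IdleConnectionReaper until the client is closed.

```java
// Hypothetical sketch only: illustrates how an S3 client built with its own
// Apache HTTP client ends up owning its own PoolingHttpClientConnectionManager.
import software.amazon.awssdk.http.apache.ApacheHttpClient;
import software.amazon.awssdk.regions.Region;
import software.amazon.awssdk.services.s3.S3Client;

public class S3ClientFactory {

    // Every invocation builds a new ApacheHttpClient, and therefore a new
    // connection manager that IdleConnectionReaper starts tracking.
    public static S3Client newClient() {
        return S3Client.builder()
                .region(Region.US_EAST_1)                      // placeholder region
                .httpClientBuilder(ApacheHttpClient.builder()) // Apache-based HTTP client
                .build();
    }
}
```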

Regression Issue

  • Select this option if this issue appears to be a regression.

Expected Behavior

The deregisterConnectionManager method should be called every time a connection manager has completed its work. This would ensure proper memory management by allowing garbage collection to free the associated memory, preventing the leak that ultimately leads to service failures.
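Sketched at the public-API level (assuming the Apache-based HTTP client), the expected lifecycle looks like this: once the HTTP client is closed, its connection manager should be deregistered and become eligible for garbage collection.

```java
// Expected lifecycle (sketch): closing the HTTP client should result in
// deregisterConnectionManager being called, so the underlying
// PoolingHttpClientConnectionManager is no longer strongly referenced.
import software.amazon.awssdk.http.SdkHttpClient;
import software.amazon.awssdk.http.apache.ApacheHttpClient;

public class ExpectedLifecycle {
    public static void main(String[] args) {
        try (SdkHttpClient httpClient = ApacheHttpClient.builder().build()) {
            // ... make AWS calls through a service client built on httpClient ...
        }
        // After close(): the reaper should have dropped its reference to the
        // connection manager, allowing the GC to reclaim it.
    }
}
```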

Current Behavior

Currently, once a connectionManager is registered, the deregisterConnectionManager method is never called, so garbage collection cannot release the memory it holds. This results in a gradual memory buildup that eventually leads to an out-of-memory failure. (Reference: IdleConnectionReaper.java, line 36.)

Reproduction Steps

  1. Create a function that calls registerConnectionManager.
  2. After the connection manager has completed its work, verify whether the deregisterConnectionManager method is invoked (a minimal sketch follows below).
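A minimal reproduction sketch, assuming the Apache HTTP client (the loop count and sleep duration are arbitrary): repeatedly build and close clients, then take a heap dump in VisualVM and count PoolingHttpClientConnectionManager instances. If deregistration happens on close, the count should stay near zero.

```java
// Reproduction sketch: if deregisterConnectionManager is not invoked on close(),
// the heap dump will still show roughly one PoolingHttpClientConnectionManager
// per iteration instead of near zero.
import software.amazon.awssdk.http.SdkHttpClient;
import software.amazon.awssdk.http.apache.ApacheHttpClient;

public class ConnectionManagerLeakRepro {
    public static void main(String[] args) throws InterruptedException {
        for (int i = 0; i < 15_000; i++) {
            SdkHttpClient httpClient = ApacheHttpClient.builder().build();
            httpClient.close(); // should trigger deregistration of its connection manager
        }
        System.gc();          // hint only; the heap dump is the real check
        Thread.sleep(60_000); // keep the JVM alive while capturing the dump in VisualVM
    }
}
```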

Possible Solution

No response

Additional Information/Context

No response

AWS SDK for Java Version

awsV2SdkVers = '2.20.38'

JDK Version

ENV JAVA_VERSION="21.0.6+7-1~20.04.1"

Operating System and Version

Ubuntu 20.04

@sigs3gv sigs3gv added bug This issue is a bug. needs-triage This issue or PR still needs to be triaged. labels Mar 19, 2025
@zoewangg
Contributor

zoewangg commented Mar 19, 2025

This is likely fixed in #4087, which was released in 2.20.84. Could you try with the latest version?

@zoewangg zoewangg added response-requested Waiting on additional info and feedback. Will move to "closing-soon" in 10 days. and removed needs-triage This issue or PR still needs to be triaged. labels Mar 19, 2025
github-actions bot commented

It looks like this issue has not been active for more than five days. In the absence of more information, we will be closing this issue soon. If you find that this is still a problem, please add a comment to prevent automatic closure, or if the issue is already closed please feel free to reopen it.

@github-actions github-actions bot added the closing-soon This issue will close in 4 days unless further comments are made. label Mar 30, 2025
@bhoradc bhoradc added the p2 This is a standard priority issue label Apr 1, 2025
@github-actions github-actions bot added closed-for-staleness and removed closing-soon This issue will close in 4 days unless further comments are made. labels Apr 3, 2025
@github-actions github-actions bot closed this as completed Apr 3, 2025