Possible Memory Leak - Connection Reaper #1679

Closed
mtamassia opened this issue Mar 3, 2020 · 10 comments
Labels
  • closing-soon: This issue will close in 4 days unless further comments are made.
  • investigating: This issue is being investigated and/or work is in progress to resolve the issue.
  • response-requested: Waiting on additional info and feedback. Will move to "closing-soon" in 10 days.

Comments

@mtamassia

We have an application that creates and destroys EC2, S3, and several other SDK clients.

We build clients within a thread and close() the clients within the thread. Example:

Ec2Client ec2Client = Ec2Client.builder().region(Region.of(currentRegion)).credentialsProvider(InstanceProfileCredentialsProvider.builder().build()).build();

// DO SOMETHING

ec2Client.close();

Memory consumption grows over time. Here is a link to the memory heap dump, which is self-explanatory: https://heaphero.io/my-heap-report.jsp?p=YXJjaGl2ZWQvMjAyMC8wMy8yLy0tbWVtb3J5LnppcC0xOS01NC0zMi5qc29uLS0=

Expected Behavior

Calling close() on a client should not leave resources behind; threads and their associated data should be removed.

Current Behavior

Threads and their associated data are not removed when a client's close() method is called; memory consumption grows over time.

Your Environment

  • OpenJDK Runtime Environment (build 11.0.6+10-post-Ubuntu-1ubuntu118.04.1)
  • SDK 2.10.77
@dagnir
Contributor

dagnir commented Mar 4, 2020

Thanks for the report. It looks like the connection managers are not being removed from the map during client shutdown.

Looks like I spoke too soon; looking over the code again, everything seems to be in order. A simple test that continuously creates a client, makes a request, and closes it keeps the reaper map size at 0. Do you have a repro case that we can run?
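For reference, the shape of the test I ran was roughly the following (a minimal sketch, not the exact code; the class name, the S3 client, and the ListBuckets call are just stand-ins for any client and request):

import software.amazon.awssdk.regions.Region;
import software.amazon.awssdk.services.s3.S3Client;

public class ReaperMapCheck {
    public static void main(String[] args) {
        // Repeatedly create a client, make one request, and close it.
        // If close() behaves correctly, the IdleConnectionReaper's map of
        // connection managers stays empty between iterations.
        for (int i = 0; i < 1_000; i++) {
            S3Client s3 = S3Client.builder()
                    .region(Region.US_EAST_1)
                    .build();
            try {
                s3.listBuckets(); // any simple request
            } finally {
                s3.close();
            }
        }
    }
}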

@dagnir dagnir added bug This issue is a bug. investigating This issue is being investigated and/or work is in progress to resolve the issue. and removed bug This issue is a bug. labels Mar 4, 2020
@debora-ito debora-ito added the response-requested Waiting on additional info and feedback. Will move to "closing-soon" in 10 days. label Mar 5, 2020
@mtamassia
Author

Thank you for the quick response @dagnir

I have reviewed our code and found a couple of instances where close() was not being called properly; however, after the changes we still see the memory buildup, as you can see in the new report below. BTW, this time I forced a GC before taking the heap dump, just for sanity.

https://heaphero.io/my-heap-report.jsp?p=YXJjaGl2ZWQvMjAyMC8wMy81Ly0tbWVtb3J5My5kdW1wLmd6LTE2LTQ0LTMwLmpzb24tLQ==

Not sure if it matters, but here are the SDK clients we are using and how we instantiate each of them.

SecretsManagerClient smClient = SecretsManagerClient.builder().credentialsProvider(InstanceProfileCredentialsProvider.builder().build()).build();

Ec2Client ec2Client = Ec2Client.builder().region(Region.of(this.getRegion())).credentialsProvider(InstanceProfileCredentialsProvider.builder().build()).build();

PricingClient pClient = PricingClient.builder().region(Region.US_EAST_1).credentialsProvider(InstanceProfileCredentialsProvider.builder().build()).build();

RdsClient rdsClient = RdsClient.builder().region(Region.of(this.getRegion())).credentialsProvider(InstanceProfileCredentialsProvider.builder().build()).build();

CloudTrailClient client = CloudTrailClient.builder().credentialsProvider(InstanceProfileCredentialsProvider.builder().build()).overrideConfiguration(ClientOverrideConfiguration.builder().retryPolicy(RetryPolicy.builder().throttlingBackoffStrategy(BackoffStrategy.defaultStrategy()).build()).build()).build();

ElasticLoadBalancingV2Client client = ElasticLoadBalancingV2Client.builder().region(Region.of(this.getRegion())).credentialsProvider(InstanceProfileCredentialsProvider.builder().build()).build();

DirectConnectClient client = DirectConnectClient.builder().region(Region.of(currentRegion)).credentialsProvider(InstanceProfileCredentialsProvider.builder().build()).build();

S3Client client = S3Client.builder().region(Region.of(this.getRegion())).credentialsProvider(InstanceProfileCredentialsProvider.builder().build()).build();

EfsClient client = EfsClient.builder().region(Region.of(currentRegion)).credentialsProvider(InstanceProfileCredentialsProvider.builder().build()).build();

@debora-ito debora-ito removed the response-requested Waiting on additional info and feedback. Will move to "closing-soon" in 10 days. label Mar 5, 2020
@zoewangg
Contributor

I tried creating multiple SDK clients, making API calls, and then closing the clients, but was not able to reproduce the issue (I verified that IdleConnectionReaper#connectionManagers was empty at the end).

Could you provide a complete repro case?

@debora-ito debora-ito added response-requested Waiting on additional info and feedback. Will move to "closing-soon" in 10 days. closing-soon This issue will close in 4 days unless further comments are made. labels Mar 12, 2020
@debora-ito
Member

We were not able to reproduce it; closing this due to no response. Please feel free to reopen if you are still experiencing this issue and can provide us a complete repro case.

@gdhananjay

gdhananjay commented Apr 12, 2020

@debora-ito, @dagnir, @zoewangg, @mtamassia

I am facing the same issue; I also suspect a possible memory leak in the connection reaper.
My application ran fine for about a week and then crashed with an out-of-memory error.
I attached JProfiler and it appears the connection reaper entries were not being freed up.
Please look at the screenshot below:

Screenshot 2020-04-12 at 10 29 01 AM

Drilling further into this heavy object, I saw thousands of the instances below:

Screenshot 2020-04-12 at 10 30 36 AM

And here is the path to GC root for those thousands of objects:
Screenshot 2020-04-12 at 10 31 05 AM

My application is simple: it creates a thread => creates a credentials provider as shown in the code below => collects some data from CloudWatch using this credentials provider => and then the thread dies.

Code to create the credentials provider:

public static AwsCredentialsProvider getRoleCredentialsProvider(String _roleArn, String _roleSessionName,
			String _externalId, String _region) {

		AssumeRoleRequest assumeRoleRequest = null;

		if (_externalId == null || _externalId.isEmpty()) {
			assumeRoleRequest = AssumeRoleRequest.builder().roleArn(_roleArn).roleSessionName(_roleSessionName)
					.durationSeconds(900).build();
		} else {
			assumeRoleRequest = AssumeRoleRequest.builder().roleArn(_roleArn).externalId(_externalId)
					.roleSessionName(_roleSessionName).durationSeconds(900).build();
		}

		StsClient stsClient = null;

		// AWS recommends using Regional AWS STS endpoints hence region as input.
		if (_region == null || _region.isEmpty()) {
			stsClient = StsClient.builder().region(Region.US_EAST_1).build();
		} else {
			stsClient = StsClient.builder().region(Region.of(_region)).build();
		}

		return StsAssumeRoleCredentialsProvider.builder().stsClient(stsClient).refreshRequest(assumeRoleRequest)
				.asyncCredentialUpdateEnabled(true).build();
	}

Code in finally block of my application to close this provider:

			// Close StsAssumeRoleCredentialsProvider.
			if(credentialProvider != null && credentialProvider instanceof StsAssumeRoleCredentialsProvider) {
				((StsAssumeRoleCredentialsProvider)credentialProvider).close();
			}

I can easily reproduce this issue; with JProfiler running live, it clearly shows memory increasing because of these objects.

I am using the SDK version below:

<dependency>
    <groupId>software.amazon.awssdk</groupId>
    <artifactId>sts</artifactId>
    <version>2.5.40</version>
</dependency>

Please let me know if you need any additional info.

@zoewangg
Contributor

Hi @Dhananjay-Ghongane, closing the credentials provider itself does not close the stsClient.

So you need to close the stsClient too. It is not closed automatically because it is provided by the customer.
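In other words, something along these lines (a minimal sketch based on your getRoleCredentialsProvider above; the role ARN, session name, and region are placeholders):

import software.amazon.awssdk.regions.Region;
import software.amazon.awssdk.services.sts.StsClient;
import software.amazon.awssdk.services.sts.auth.StsAssumeRoleCredentialsProvider;
import software.amazon.awssdk.services.sts.model.AssumeRoleRequest;

public class AssumeRoleProviderExample {
    public static void main(String[] args) {
        StsClient stsClient = StsClient.builder()
                .region(Region.US_EAST_1)
                .build();

        StsAssumeRoleCredentialsProvider credentialsProvider = StsAssumeRoleCredentialsProvider.builder()
                .stsClient(stsClient)
                .refreshRequest(AssumeRoleRequest.builder()
                        .roleArn("arn:aws:iam::123456789012:role/my-role") // placeholder
                        .roleSessionName("my-session")                     // placeholder
                        .durationSeconds(900)
                        .build())
                .asyncCredentialUpdateEnabled(true)
                .build();

        try {
            // ... use credentialsProvider with your service clients ...
        } finally {
            credentialsProvider.close(); // stops the async credential refresh
            stsClient.close();           // the provider does not close the client you passed in
        }
    }
}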

@gdhananjay

@zoewangg,
Thanks, that should work. My jobs are short-lived, so I may not want a background thread periodically updating credentials. I am not going to use StsAssumeRoleCredentialsProvider; I'll simply call AssumeRole => take the credentials => use them => and repeat the same.
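For reference, the direct approach could look roughly like this (a sketch; the class and method names are illustrative, and the returned session credentials simply expire after durationSeconds):

import software.amazon.awssdk.auth.credentials.AwsSessionCredentials;
import software.amazon.awssdk.auth.credentials.StaticCredentialsProvider;
import software.amazon.awssdk.regions.Region;
import software.amazon.awssdk.services.sts.StsClient;
import software.amazon.awssdk.services.sts.model.Credentials;

public class DirectAssumeRole {
    public static StaticCredentialsProvider assumeRoleOnce(String roleArn, String sessionName) {
        // Call AssumeRole once, wrap the returned session credentials in a static
        // provider, and close the STS client right away; no background refresh
        // thread or long-lived STS client is left behind.
        try (StsClient sts = StsClient.builder().region(Region.US_EAST_1).build()) {
            Credentials c = sts.assumeRole(r -> r
                            .roleArn(roleArn)
                            .roleSessionName(sessionName)
                            .durationSeconds(900))
                    .credentials();
            return StaticCredentialsProvider.create(
                    AwsSessionCredentials.create(c.accessKeyId(), c.secretAccessKey(), c.sessionToken()));
        }
    }
}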

@mtamassia
Author

@zoewangg I think what you posted above might resolve my issue too.

Our code created the credentials provider inline, without keeping a reference to it:

Ec2Client ec2Client = Ec2Client.builder().region(Region.of(this.region)).credentialsProvider(InstanceProfileCredentialsProvider.builder().build()).build();

...

ec2Client.close();

so based on your message above, we changed it to this:

InstanceProfileCredentialsProvider credProvider = InstanceProfileCredentialsProvider.builder().build();
Ec2Client ec2Client = Ec2Client.builder().region(Region.of(this.region)).credentialsProvider(credProvider).build();

...

ec2Client.close();
credProvider.close();

In short, we are now closing the credentials provider. I have not tested this at scale but do you think that could explain the leak on our end?

@zoewangg
Contributor

@mtamassia I don't think not closing InstanceProfileCredentialsProvider could cause an IdleConnectionReaper memory leak, since it does not use IdleConnectionReaper at all. Also, asyncCredentialUpdateEnabled is false by default, so there's really nothing to close in InstanceProfileCredentialsProvider.

Is it possible that some SDK clients were not closed properly in your use case (i.e., the close failed silently)? Can you enable the DEBUG log level to see if there are any error messages?

It should be something like this:

logger.debug("Ignore failure in closing the Closeable", ex);
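For example, if Log4j 2 happens to be your logging backend (just an assumption; adjust for whatever SLF4J binding you actually use), you could raise the SDK loggers to DEBUG at application startup:

// With org.apache.logging.log4j:log4j-core on the classpath:
org.apache.logging.log4j.core.config.Configurator.setLevel("software.amazon.awssdk", org.apache.logging.log4j.Level.DEBUG);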

@BartXZX

BartXZX commented Nov 13, 2020

Hi @zoewangg, I'm running into a similar issue. The only difference is that I am using the default credentials provider chain, which in turn resolves to the StsWebIdentityCredentialsProvider. After a lot of debugging, I figured out that closing the client does not close the STS client that the StsWebIdentityCredentialsProvider created.

TL;DR:
The provider chain wrapped in Lazy<T> never gets closed.

Long version:
DefaultCredentialsProvider#close calls LazyAwsCredentialsProvider#close, which calls IoUtils.closeIfCloseable, but the provided delegate is not an AutoCloseable type. It is actually a Lazy<AwsCredentialsProvider>, and Lazy<T> does not implement AutoCloseable. So the chain (and consequently the STS client) in the delegate field never receives a close() call.

Would you like me to open a new issue?
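In the meantime, the workaround I'm considering is to cache and reuse long-lived SDK clients instead of creating and closing one per task, so the default chain (and the STS client inside it) is only ever built once. A sketch under that assumption (the class name and the choice of IamClient are illustrative):

import software.amazon.awssdk.regions.Region;
import software.amazon.awssdk.services.iam.IamClient;

public final class Clients {
    // One long-lived client per service per JVM instead of one per task;
    // the default credentials chain (and its internal STS client) is then
    // only built once.
    private static final IamClient IAM = IamClient.builder()
            .region(Region.AWS_GLOBAL)
            .build();

    public static IamClient iam() {
        return IAM;
    }
}

That doesn't fix the missing close() call, but it keeps the number of never-closed STS clients bounded at one per service client.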

Screenshot of the debugger
image

Stack dump from IntelliJ

closeIfCloseable:89, IoUtils (software.amazon.awssdk.utils) [2]
close:50, LazyAwsCredentialsProvider (software.amazon.awssdk.auth.credentials.internal)
close:109, DefaultCredentialsProvider (software.amazon.awssdk.auth.credentials)
closeQuietly:70, IoUtils (software.amazon.awssdk.utils)
closeIfCloseable:87, IoUtils (software.amazon.awssdk.utils) [1]
lambda$close$0:86, AttributeMap (software.amazon.awssdk.utils)
accept:-1, 912038923 (software.amazon.awssdk.utils.AttributeMap$$Lambda$1914)
forEach:981, HashMap$Values (java.util)
close:86, AttributeMap (software.amazon.awssdk.utils)
close:79, SdkClientConfiguration (software.amazon.awssdk.core.client.config)
close:80, HttpClientDependencies (software.amazon.awssdk.core.internal.http)
close:73, AmazonSyncHttpClient (software.amazon.awssdk.core.internal.http)
close:118, BaseSyncClientHandler (software.amazon.awssdk.core.internal.handler)
close:12198, DefaultIamClient (software.amazon.awssdk.services.iam)
discover:37, IamServiceRoleAwsDiscovery (com.planonsoftware.cloud.library.discovery.iamrole)
provideParameters:63, PlanonEnvironmentActivitiesImpl (com.planonsoftware.cloud.pco.orchestrator.controllers.planonenvironment.activities)
invoke0:-1, NativeMethodAccessorImpl (jdk.internal.reflect)
invoke:62, NativeMethodAccessorImpl (jdk.internal.reflect)
invoke:43, DelegatingMethodAccessorImpl (jdk.internal.reflect)
invoke:567, Method (java.lang.reflect)
execute:214, POJOActivityTaskHandler$POJOActivityImplementation (com.uber.cadence.internal.sync)
handle:190, POJOActivityTaskHandler (com.uber.cadence.internal.sync)
handle:175, ActivityWorker$TaskHandlerImpl (com.uber.cadence.internal.worker)
handle:146, ActivityWorker$TaskHandlerImpl (com.uber.cadence.internal.worker)
lambda$process$0:71, PollTaskExecutor (com.uber.cadence.internal.worker)
run:-1, 640627828 (com.uber.cadence.internal.worker.PollTaskExecutor$$Lambda$1491)
runWorker:1128, ThreadPoolExecutor (java.util.concurrent)
run:628, ThreadPoolExecutor$Worker (java.util.concurrent)
run:830, Thread (java.lang)
