
firehoseAsyncClient intermittently fails to publishBatchRecords #2312


Closed
aiasmartakis opened this issue Mar 4, 2021 · 9 comments
Labels
guidance Question that needs advice or information.

Comments

@aiasmartakis

Describe the issue

A Java client (running in a Docker container on a k8s cluster) sends metrics to Firehose through the SDK (v2.7.10). We use AWS Organizations, so we have separate production and QA (staging) environments. In our production environment (which carries significantly more load) this issue has not shown up, but in our QA environment it fails some of the time. I estimate over 80% is sent fine using this same code.
full_code.zip
The load per invocation of putRecordBatch is in the KBs, well below documented limits. Annoyingly, I also receive no details as to why it fails; using the AsyncClient I extended BiConsumer to override the accept method.

Steps to Reproduce

public abstract class KinesisFirehoseService implements BiConsumer&lt;FirehoseResponse, Throwable&gt; {
    ...
    public void putRecordBatch(List<String> records) {
        ...
        logger.debug("entry putRecordBatch {} nr of records {}", streamName, records.size());
        firehoseAsyncClient.putRecordBatch(builder ->
            builder
                .deliveryStreamName(streamName)
                .records(recordList)
                .build()).whenCompleteAsync(this);
    }

    @Override
    public final void accept(FirehoseResponse response, Throwable exception) {
        if (exception != null) {
            logger.error("Failed to publish records to kinesis with exception: {} - {} - {} response: {}",
                exception.getMessage(), exception.toString(), exception.getCause(), response);
        }
    }
}

Current Behavior

2021-03-04 19:30:56.332 DEBUG [AWSGateway,,,] 1 --- [ AWS_GATEWAY_QA] c.b.a.g.service.KinesisFirehoseService   : entry putRecordBatch power-all-stream nr of records 50
2021-03-04 19:30:56.414 ERROR [AWSGateway,,,] 1 --- [  Thread-477422] c.b.a.g.s.KinesisFirehosePowerService    : Failed to publish records to kinesis with exception: software.amazon.awssdk.core.exception.SdkClientException - java.util.concurrent.CompletionException: software.amazon.awssdk.core.exception.SdkClientException - software.amazon.awssdk.core.exception.SdkClientException response: null

Your Environment

  • AWS Java SDK version used: 2.7.10
  • JDK version used: Open JDK 1.8.0_212
  • Operating System and version: Linux Alpine 3.9.4
  • Hosted K8s cluster on EC2 (kops configured)
  • Note almost identical to production environment (that has no issues)
@aiasmartakis aiasmartakis added guidance Question that needs advice or information. needs-triage This issue or PR still needs to be triaged. labels Mar 4, 2021
@aiasmartakis
Author

Forgot to mention, K8s shows no 'Events' indicating an issue with the pod.

@aiasmartakis aiasmartakis changed the title Firehose intermittendly fails to publishBatchRecords firehoseAsyncClient intermittendly fails to publishBatchRecords Mar 4, 2021
@aiasmartakis aiasmartakis changed the title firehoseAsyncClient intermittendly fails to publishBatchRecords firehoseAsyncClient intermittently fails to publishBatchRecords Mar 4, 2021
@debora-ito
Member

debora-ito commented Mar 6, 2021

@aiasmartakis is it possible to log the whole exception object, instead of the exception's parts? Having the stack trace would be very helpful in the investigation.

@debora-ito debora-ito added response-requested Waiting on additional info and feedback. Will move to "closing-soon" in 10 days. and removed needs-triage This issue or PR still needs to be triaged. labels Mar 6, 2021
@aiasmartakis
Author

@debora-ito changing to:
logger.error("Failed to publish records to kinesis with exception:", exception);

I'll get back to you.
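[Editor's note: a minimal sketch of such a handler, not from the thread. It passes the Throwable as the final logger argument so SLF4J prints the full stack trace, and it also checks failedPutCount(), since PutRecordBatch can return success as a whole while individual records fail. The cast to PutRecordBatchResponse is an assumption based on the client call shown above.]

```java
@Override
public final void accept(FirehoseResponse response, Throwable exception) {
    if (exception != null) {
        // Passing the Throwable itself (not its parts) makes SLF4J emit the stack trace.
        logger.error("Failed to publish records to kinesis", exception);
    } else if (response instanceof PutRecordBatchResponse) {
        // A 200 response can still carry per-record failures.
        PutRecordBatchResponse batchResponse = (PutRecordBatchResponse) response;
        if (batchResponse.failedPutCount() > 0) {
            logger.warn("{} of the records failed to publish", batchResponse.failedPutCount());
        }
    }
}
```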

@github-actions github-actions bot removed the response-requested Waiting on additional info and feedback. Will move to "closing-soon" in 10 days. label Mar 9, 2021
@debora-ito
Member

Let us know what you find.

@debora-ito debora-ito added the response-requested Waiting on additional info and feedback. Will move to "closing-soon" in 10 days. label Mar 13, 2021
@aiasmartakis
Author

Apologies, it took longer than it should have.

This is the full error:

java.util.concurrent.CompletionException: software.amazon.awssdk.core.exception.SdkClientException
	at software.amazon.awssdk.utils.CompletableFutureUtils.errorAsCompletionException(CompletableFutureUtils.java:61)
	at software.amazon.awssdk.core.internal.http.pipeline.stages.AsyncExecutionFailureExceptionReportingStage.lambda$execute$0(AsyncExecutionFailureExceptionReportingStage.java:51)
	at java.util.concurrent.CompletableFuture.uniHandle(CompletableFuture.java:822)
	at java.util.concurrent.CompletableFuture$UniHandle.tryFire(CompletableFuture.java:797)
	at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
	at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1977)
	at software.amazon.awssdk.utils.CompletableFutureUtils.lambda$forwardExceptionTo$0(CompletableFutureUtils.java:75)
	at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:760)
	at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:736)
	at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
	at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1977)
	at software.amazon.awssdk.core.internal.http.pipeline.stages.AsyncRetryableStage$RetryExecutor.retryErrorIfNeeded(AsyncRetryableStage.java:175)
	at software.amazon.awssdk.core.internal.http.pipeline.stages.AsyncRetryableStage$RetryExecutor.retryIfNeeded(AsyncRetryableStage.java:126)
	at software.amazon.awssdk.core.internal.http.pipeline.stages.AsyncRetryableStage$RetryExecutor.lambda$execute$0(AsyncRetryableStage.java:107)
	at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:760)
	at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:736)
	at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
	at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1977)
	at software.amazon.awssdk.core.internal.http.pipeline.stages.MakeAsyncHttpRequestStage$ResponseHandler.onError(MakeAsyncHttpRequestStage.java:249)
	at software.amazon.awssdk.http.nio.netty.internal.NettyRequestExecutor.handleFailure(NettyRequestExecutor.java:253)
	at software.amazon.awssdk.http.nio.netty.internal.NettyRequestExecutor.lambda$writeRequest$7(NettyRequestExecutor.java:207)
	at io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:511)
	at io.netty.util.concurrent.DefaultPromise.notifyListeners0(DefaultPromise.java:504)
	at io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:483)
	at io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:424)
	at io.netty.util.concurrent.DefaultPromise.tryFailure(DefaultPromise.java:121)
	at io.netty.util.internal.PromiseNotificationUtil.tryFailure(PromiseNotificationUtil.java:64)
	at io.netty.channel.DelegatingChannelPromiseNotifier.operationComplete(DelegatingChannelPromiseNotifier.java:57)
	at io.netty.channel.DelegatingChannelPromiseNotifier.operationComplete(DelegatingChannelPromiseNotifier.java:31)
	at io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:511)
	at io.netty.util.concurrent.DefaultPromise.notifyListeners0(DefaultPromise.java:504)
	at io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:483)
	at io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:424)
	at io.netty.util.concurrent.DefaultPromise.tryFailure(DefaultPromise.java:121)
	at io.netty.channel.AbstractChannel$AbstractUnsafe.safeSetFailure(AbstractChannel.java:987)
	at io.netty.channel.AbstractChannel$AbstractUnsafe.write(AbstractChannel.java:869)
	at io.netty.channel.DefaultChannelPipeline$HeadContext.write(DefaultChannelPipeline.java:1391)
	at io.netty.channel.AbstractChannelHandlerContext.invokeWrite0(AbstractChannelHandlerContext.java:738)
	at io.netty.channel.AbstractChannelHandlerContext.invokeWrite(AbstractChannelHandlerContext.java:730)
	at io.netty.channel.AbstractChannelHandlerContext.write(AbstractChannelHandlerContext.java:816)
	at io.netty.channel.AbstractChannelHandlerContext.write(AbstractChannelHandlerContext.java:723)
	at io.netty.handler.timeout.WriteTimeoutHandler.write(WriteTimeoutHandler.java:113)
	at io.netty.channel.AbstractChannelHandlerContext.invokeWrite0(AbstractChannelHandlerContext.java:738)
	at io.netty.channel.AbstractChannelHandlerContext.invokeWrite(AbstractChannelHandlerContext.java:730)
	at io.netty.channel.AbstractChannelHandlerContext.write(AbstractChannelHandlerContext.java:816)
	at io.netty.channel.AbstractChannelHandlerContext.write(AbstractChannelHandlerContext.java:723)
	at io.netty.handler.ssl.SslHandler.finishWrap(SslHandler.java:890)
	at io.netty.handler.ssl.SslHandler.wrap(SslHandler.java:858)
	at io.netty.handler.ssl.SslHandler.wrapAndFlush(SslHandler.java:793)
	at io.netty.handler.ssl.SslHandler.flush(SslHandler.java:774)
	at io.netty.channel.AbstractChannelHandlerContext.invokeFlush0(AbstractChannelHandlerContext.java:776)
	at io.netty.channel.AbstractChannelHandlerContext.invokeFlush(AbstractChannelHandlerContext.java:768)
	at io.netty.channel.AbstractChannelHandlerContext.flush(AbstractChannelHandlerContext.java:749)
	at io.netty.channel.CombinedChannelDuplexHandler$DelegatingChannelHandlerContext.flush(CombinedChannelDuplexHandler.java:533)
	at io.netty.channel.ChannelOutboundHandlerAdapter.flush(ChannelOutboundHandlerAdapter.java:115)
	at io.netty.channel.CombinedChannelDuplexHandler.flush(CombinedChannelDuplexHandler.java:358)
	at io.netty.channel.AbstractChannelHandlerContext.invokeFlush0(AbstractChannelHandlerContext.java:776)
	at io.netty.channel.AbstractChannelHandlerContext.invokeFlush(AbstractChannelHandlerContext.java:768)
	at io.netty.channel.AbstractChannelHandlerContext.flush(AbstractChannelHandlerContext.java:749)
	at io.netty.channel.ChannelDuplexHandler.flush(ChannelDuplexHandler.java:117)
	at io.netty.channel.AbstractChannelHandlerContext.invokeFlush0(AbstractChannelHandlerContext.java:776)
	at io.netty.channel.AbstractChannelHandlerContext.invokeFlush(AbstractChannelHandlerContext.java:768)
	at io.netty.channel.AbstractChannelHandlerContext.flush(AbstractChannelHandlerContext.java:749)
	at io.netty.handler.logging.LoggingHandler.flush(LoggingHandler.java:265)
	at io.netty.channel.AbstractChannelHandlerContext.invokeFlush0(AbstractChannelHandlerContext.java:776)
	at io.netty.channel.AbstractChannelHandlerContext.invokeWriteAndFlush(AbstractChannelHandlerContext.java:802)
	at io.netty.channel.AbstractChannelHandlerContext.write(AbstractChannelHandlerContext.java:814)
	at io.netty.channel.AbstractChannelHandlerContext.writeAndFlush(AbstractChannelHandlerContext.java:794)
	at com.typesafe.netty.http.HttpStreamsHandler.completeBody(HttpStreamsHandler.java:298)
	at com.typesafe.netty.http.HttpStreamsHandler.access$400(HttpStreamsHandler.java:14)
	at com.typesafe.netty.http.HttpStreamsHandler$3$1.run(HttpStreamsHandler.java:276)
	at com.typesafe.netty.http.HttpStreamsHandler.executeInEventLoop(HttpStreamsHandler.java:342)
	at com.typesafe.netty.http.HttpStreamsHandler.access$300(HttpStreamsHandler.java:14)
	at com.typesafe.netty.http.HttpStreamsHandler$3.complete(HttpStreamsHandler.java:273)
	at com.typesafe.netty.HandlerSubscriber$3.operationComplete(HandlerSubscriber.java:244)
	at com.typesafe.netty.HandlerSubscriber$3.operationComplete(HandlerSubscriber.java:241)
	at io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:511)
	at io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:485)
	at io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:424)
	at io.netty.util.concurrent.DefaultPromise.addListener(DefaultPromise.java:162)
	at io.netty.channel.DefaultChannelPromise.addListener(DefaultChannelPromise.java:95)
	at io.netty.channel.DefaultChannelPromise.addListener(DefaultChannelPromise.java:30)
	at com.typesafe.netty.HandlerSubscriber.onComplete(HandlerSubscriber.java:241)
	at software.amazon.awssdk.http.nio.netty.internal.NettyRequestExecutor$StreamedRequest$1.onNext(NettyRequestExecutor.java:475)
	at software.amazon.awssdk.http.nio.netty.internal.NettyRequestExecutor$StreamedRequest$1.onNext(NettyRequestExecutor.java:450)
	at software.amazon.awssdk.core.internal.http.async.SimpleHttpContentPublisher$SubscriptionImpl.request(SimpleHttpContentPublisher.java:70)
	at com.typesafe.netty.HandlerSubscriber.maybeRequestMore(HandlerSubscriber.java:271)
	at com.typesafe.netty.HandlerSubscriber.maybeStart(HandlerSubscriber.java:207)
	at com.typesafe.netty.HandlerSubscriber.provideSubscription(HandlerSubscriber.java:196)
	at com.typesafe.netty.HandlerSubscriber.access$000(HandlerSubscriber.java:18)
	at com.typesafe.netty.HandlerSubscriber$1.run(HandlerSubscriber.java:184)
	at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:163)
	at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:404)
	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:466)
	at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:897)
	at java.lang.Thread.run(Thread.java:748)
Caused by: software.amazon.awssdk.core.exception.SdkClientException: null
	at software.amazon.awssdk.core.exception.SdkClientException$BuilderImpl.build(SdkClientException.java:97)
	at software.amazon.awssdk.core.internal.util.ThrowableUtils.asSdkException(ThrowableUtils.java:98)
	at software.amazon.awssdk.core.internal.http.pipeline.stages.AsyncRetryableStage$RetryExecutor.retryIfNeeded(AsyncRetryableStage.java:125)
	... 83 common frames omitted
Caused by: java.io.IOException: The channel was closed. This may have been done by the client (e.g. because the request was aborted), by the service (e.g. because the request took too long or the client tried to write on a read-only socket), or by an intermediary party (e.g. because the channel was idle for too long).
	at software.amazon.awssdk.http.nio.netty.internal.NettyRequestExecutor.decorateException(NettyRequestExecutor.java:267)
	at software.amazon.awssdk.http.nio.netty.internal.NettyRequestExecutor.handleFailure(NettyRequestExecutor.java:252)
	... 76 common frames omitted
Caused by: java.nio.channels.ClosedChannelException: null
	at io.netty.channel.AbstractChannel$AbstractUnsafe.write(...)(Unknown Source)

I also see these similar but slightly different traces:

Caused by: software.amazon.awssdk.core.exception.SdkClientException: null
	at software.amazon.awssdk.core.exception.SdkClientException$BuilderImpl.build(SdkClientException.java:97)
	at software.amazon.awssdk.core.internal.util.ThrowableUtils.asSdkException(ThrowableUtils.java:98)
	at software.amazon.awssdk.core.internal.http.pipeline.stages.AsyncRetryableStage$RetryExecutor.retryIfNeeded(AsyncRetryableStage.java:125)
	... 63 common frames omitted
Caused by: java.io.IOException: Server failed to send complete response
	at software.amazon.awssdk.http.nio.netty.internal.ResponseHandler.notifyIfResponseNotCompleted(ResponseHandler.java:396)
	... 54 common frames omitted

@github-actions github-actions bot removed the response-requested Waiting on additional info and feedback. Will move to "closing-soon" in 10 days. label Mar 19, 2021
@debora-ito debora-ito added the needs-triage This issue or PR still needs to be triaged. label Mar 23, 2021
@debora-ito
Member

@aiasmartakis I'm sorry for losing track of this. Are you still experiencing the issue?

java.io.IOException: The channel was closed can have several causes, as the message says. We recently made changes to how the connection pool is handled in the async clients, the changes were released in SDK version 2.17.101. Could you test a newer version and check if you see improvements on your side?
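[Editor's note: a hypothetical mitigation sketch, not confirmed as the fix in this thread. When "channel was closed" errors stem from an intermediary (NAT, load balancer) silently dropping idle pooled connections, shortening the Netty client's idle timeout so the SDK retires connections first is a common tuning step. The durations below are illustrative.]

```java
// Assumption: tune pooled-connection lifetimes on the async HTTP client.
FirehoseAsyncClient firehoseAsyncClient = FirehoseAsyncClient.builder()
        .httpClientBuilder(NettyNioAsyncHttpClient.builder()
                // Retire idle connections before an intermediary can close them.
                .connectionMaxIdleTime(Duration.ofSeconds(5))
                // Cap total connection lifetime regardless of activity.
                .connectionTimeToLive(Duration.ofMinutes(1)))
        .build();
```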

@debora-ito debora-ito added response-requested Waiting on additional info and feedback. Will move to "closing-soon" in 10 days. and removed needs-triage This issue or PR still needs to be triaged. labels Jan 14, 2022
@aiasmartakis
Author

I admit I also forgot about it. I don't think it has shown up again. We did increase memory etc. on our k8s cluster, but I'm hesitant to point to any one change as what fixed it.

@github-actions github-actions bot removed the response-requested Waiting on additional info and feedback. Will move to "closing-soon" in 10 days. label Jan 14, 2022
@debora-ito
Member

That's ok, glad you are not seeing this anymore.

I'll go ahead and close this; feel free to reach out if you have any other questions.

@github-actions

⚠️COMMENT VISIBILITY WARNING⚠️

Comments on closed issues are hard for our team to see.
If you need more assistance, please open a new issue that references this one.
If you wish to keep having a conversation with other community members under this issue feel free to do so.

aws-sdk-java-automation added a commit that referenced this issue Jan 3, 2023
…b9d95f04b

Pull request: release <- staging/224b5b02-7f71-46fc-9c45-a2fb9d95f04b