Bug: Retry of previously failed SQS record is skipped when skipGroupOnError is enabled #3673

Closed
mlrprananta opened this issue Feb 27, 2025 · 4 comments · Fixed by #3674
Labels: batch (This item relates to the Batch Processing Utility), bug (Something isn't working), completed (This item is complete and has been merged/shipped)

Comments

@mlrprananta

Expected Behavior

When an SQS record that failed in a previous Lambda invocation is retried and handled by the same Lambda instance, it should be processed again by the provided recordHandler.

Current Behavior

When an SQS record that failed in a previous Lambda batch invocation is retried and handled by the same Lambda instance, it is skipped and returned as a failure without going through the recordHandler.

The record is processed again only if the retry is handled by a different, concurrent Lambda instance.
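
The behavior described above is consistent with per-group failure state being kept on the processor instance, which is created once at module scope and so survives warm invocations. The sketch below is a minimal, self-contained model of that mechanism, not the library's actual code: `NaiveFifoProcessor`, `FifoRecord`, `processBatch`, and `simulateWarmRetries` are hypothetical names introduced for illustration, and the handler is synchronous for brevity.

```typescript
// Hypothetical, simplified model of a FIFO batch processor; NOT the Powertools source.
type FifoRecord = { messageId: string; groupId: string };
type RecordHandler = (record: FifoRecord) => void; // synchronous for brevity

class NaiveFifoProcessor {
  // State lives on the instance, so it survives across warm invocations
  // when the processor is created at module scope.
  private readonly failedGroupIds = new Set<string>();

  // Returns the message IDs that would be reported back to SQS as failures.
  processBatch(records: FifoRecord[], handler: RecordHandler): string[] {
    const failures: string[] = [];
    for (const record of records) {
      if (this.failedGroupIds.has(record.groupId)) {
        // Group already marked as failed: skip without calling the handler.
        failures.push(record.messageId);
        continue;
      }
      try {
        handler(record);
      } catch {
        this.failedGroupIds.add(record.groupId);
        failures.push(record.messageId);
      }
    }
    return failures;
  }
}

// Simulates two invocations handled by the same warm instance.
function simulateWarmRetries(): { handlerCalls: number; failures: string[][] } {
  const processor = new NaiveFifoProcessor(); // module scope in a real function
  let handlerCalls = 0;
  const alwaysFails: RecordHandler = () => {
    handlerCalls += 1;
    throw new Error("Random error occurred");
  };
  const batch: FifoRecord[] = [{ messageId: "m1", groupId: "g1" }];
  const first = processor.processBatch(batch, alwaysFails);
  const second = processor.processBatch(batch, alwaysFails); // retry after the visibility timeout
  return { handlerCalls, failures: [first, second] };
}
```

In this model the second call reports the record as failed even though the handler never runs again, which mirrors the log pattern in the report: the record handler is reached only on the first invocation.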

Code snippet

import { processPartialResponse, SqsFifoPartialProcessorAsync } from "@aws-lambda-powertools/batch";
import type { SQSHandler, SQSRecord } from "aws-lambda";

// Created once at module scope, so the same instance is reused across warm invocations.
const processor = new SqsFifoPartialProcessorAsync();

// Always fails, so that the record is retried after the visibility timeout.
const recordHandler = async (record: SQSRecord): Promise<void> => {
    console.debug("RECORD HANDLER INVOKED");
    throw new Error("Random error occurred");
};

export const handle: SQSHandler = async (event, context) => {
    console.debug("SQS HANDLER INVOKED");
    return processPartialResponse(event, recordHandler, processor, {
        context,
        skipGroupOnError: true,
        throwOnFullBatchFailure: false,
    });
};

Steps to Reproduce

  1. Set up a FIFO queue, a DLQ, and a Lambda function with the above handler code, and set the function's reserved concurrency to 1 so that retries are handled by the same instance (a value of 0 would throttle every invocation)
  2. Publish any message to the queue
  3. In the logs, the first invocation should show both SQS HANDLER INVOKED and RECORD HANDLER INVOKED. After the visibility timeout, follow-up invocations log only SQS HANDLER INVOKED, until maxReceiveCount is reached and the message is moved to the DLQ

Possible Solution

In SqsProcessor, the failedGroupIds should be cleared after each Lambda invocation.

Introduce a method to clear the failedGroupIds in SqsProcessor.

In SqsFifoPartialProcessorAsync, override the prepare method and add a call to clear failedGroupIds.
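
The proposal can be sketched as follows. This is an illustrative model under stated assumptions, not the change that was actually merged: `FailedGroupTracker`, `FifoProcessorSketch`, and `simulateRetriesWithReset` are hypothetical names, the `prepare` hook is assumed to run once at the start of every invocation, and the handler is synchronous for brevity.

```typescript
// Hypothetical sketch of the proposal; names and structure are illustrative, NOT the Powertools source.
type FifoRecord = { messageId: string; groupId: string };
type RecordHandler = (record: FifoRecord) => void; // synchronous for brevity

// Stand-in for the helper that tracks failed message groups.
class FailedGroupTracker {
  private readonly failedGroupIds = new Set<string>();
  add(groupId: string): void {
    this.failedGroupIds.add(groupId);
  }
  has(groupId: string): boolean {
    return this.failedGroupIds.has(groupId);
  }
  // Proposed addition: reset the state between invocations.
  clear(): void {
    this.failedGroupIds.clear();
  }
}

class FifoProcessorSketch {
  private readonly tracker = new FailedGroupTracker();

  // Assumed to run once at the start of every invocation.
  prepare(): void {
    this.tracker.clear();
  }

  // Returns the message IDs that would be reported back to SQS as failures.
  process(records: FifoRecord[], handler: RecordHandler): string[] {
    this.prepare();
    const failures: string[] = [];
    for (const record of records) {
      if (this.tracker.has(record.groupId)) {
        // Within one invocation, later records of a failed group are still skipped.
        failures.push(record.messageId);
        continue;
      }
      try {
        handler(record);
      } catch {
        this.tracker.add(record.groupId);
        failures.push(record.messageId);
      }
    }
    return failures;
  }
}

// Simulates two invocations on the same warm instance and counts handler calls.
function simulateRetriesWithReset(): number {
  const processor = new FifoProcessorSketch();
  let handlerCalls = 0;
  const alwaysFails: RecordHandler = () => {
    handlerCalls += 1;
    throw new Error("Random error occurred");
  };
  const batch: FifoRecord[] = [
    { messageId: "m1", groupId: "g1" },
    { messageId: "m2", groupId: "g1" },
  ];
  processor.process(batch, alwaysFails);
  processor.process(batch, alwaysFails); // retry after the visibility timeout
  return handlerCalls;
}
```

With the reset in place, the handler runs for the first record of the group on each invocation, while the skip-on-error behavior inside a single batch is unchanged.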

Powertools for AWS Lambda (TypeScript) version

latest

AWS Lambda function runtime

18.x

Packaging format used

npm

Execution logs

@mlrprananta mlrprananta added bug Something isn't working triage This item has not been triaged by a maintainer, please wait labels Feb 27, 2025

boring-cyborg bot commented Feb 27, 2025

Thanks for opening your first issue here! We'll come back to you as soon as we can.
In the meantime, check out the #typescript channel on our Powertools for AWS Lambda Discord: Invite link

@dreamorosi dreamorosi self-assigned this Feb 27, 2025
@dreamorosi dreamorosi added confirmed The scope is clear, ready for implementation batch This item relates to the Batch Processing Utility and removed triage This item has not been triaged by a maintainer, please wait labels Feb 27, 2025
@dreamorosi dreamorosi moved this from Triage to Working on it in Powertools for AWS Lambda (TypeScript) Feb 27, 2025
@dreamorosi
Contributor

Hi @mlrprananta, thank you for opening this issue and providing a potential solution.

I was able to reproduce the issue you described. I'll work on a PR to fix it.

@github-actions github-actions bot added pending-release This item has been merged and will be released soon and removed confirmed The scope is clear, ready for implementation labels Feb 27, 2025
Contributor

github-actions bot commented Mar 7, 2025

This is now released in version v2.16.0!

@github-actions github-actions bot added completed This item is complete and has been merged/shipped and removed pending-release This item has been merged and will be released soon labels Mar 7, 2025
@dreamorosi dreamorosi moved this from Coming soon to Shipped in Powertools for AWS Lambda (TypeScript) Mar 7, 2025