Feature request: Optimize Circuit-Breaking for SQS FIFO Partial Processor on records with different group IDs #2981
Comments
Here's our plan:
Hello @duc00! I'm adding this issue to our next iteration (09/09 - 22/09) so we have time to validate scenarios and decide the way forward. Thank you for your patience, but this is a complex case and may involve a breaking change in the current behavior.
Hello everybody! Updating this issue after some time. We continued working on this and we opened an internal ticket to the service team in order to fully understand this issue, as we found some more data that leaves me confused. As soon as we receive a response to this ticket, I will update this issue and we will work on it. Thank you, everyone, for your patience.
Hello everybody! After a long time without an update, we finally have a direction for this PR! We have received confirmation from the SQS team that a batch can be delivered to Lambda with different group ids. When we implemented this specialized batch for FIFO, we thought that each
Given all the information we have, our next steps will be: 1 - Add a flag to I'll work to send a PR until the next week. Thank you for everyone's patience until we could get the correct information.
Hey @leandrodamascena, great news, thanks. 😄 Let me know if I can help review the code once it's out.
Do you think the flag set to MessageGroupId only could be made the default in a future major release? This seems like a sensible default behaviour once the AWS SQS doc is updated.
Hello @duc00! I think we can wait for customer feedback before deciding whether to make this behavior the default in Powertools v3. However, I'll keep this idea in mind to ensure we don't lose track of it. The PR is all set for review. If you have any comments or spot any issues, feel free to share them. Your feedback is always appreciated!
This is now released in version 2.36.0!
Awesome! Thank you all 😊
And the Lambda team has updated the documentation to make it clear that a batch can have more than one https://docs.aws.amazon.com/lambda/latest/dg/with-sqs.html#events-sqs-scaling Before
Now
Use case
Discussed in #2936
We confirmed that SQS FIFO queues can deliver messages from different group IDs to the same Lambda invocation. Right now, when a record fails processing, we short-circuit and fail the rest of the items in the invocation, regardless of whether they belong to the same message group ID.
This suboptimal experience can lead to unrelated messages landing on DLQs or failing completely.
We need to explore the idea of continuing to process other group IDs on the same invocation.
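To make the current behaviour concrete, here is a minimal, self-contained sketch of the short-circuit logic described above. It is not the Powertools source: the function name `process_fifo_short_circuit` and the `handler` callback are hypothetical, and the only things taken from AWS are the record fields (`messageId`, `attributes.MessageGroupId`) and the `batchItemFailures` response shape.

```python
from typing import Callable


def process_fifo_short_circuit(
    records: list[dict], handler: Callable[[dict], None]
) -> dict:
    """Illustrative only: stop at the first failure and report that record
    plus every later record in the batch as failed, whatever its group ID."""
    for index, record in enumerate(records):
        try:
            handler(record)
        except Exception:
            # Everything from the failing record onwards is returned to the queue.
            remaining = records[index:]
            return {
                "batchItemFailures": [
                    {"itemIdentifier": r["messageId"]} for r in remaining
                ]
            }
    return {"batchItemFailures": []}
```

With a batch holding groups A and B, a failure in group A also sends the later group B records back to the queue, which is how valid messages can end up in a DLQ.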
Solution/User Experience
When a message fails processing on the SQS FIFO resolver, collect the remaining messages from the same group ID. When a new group ID is found, start processing again.
At the end, return all the failed messages from each message group ID.
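The proposal above could be sketched roughly as follows. This is an assumption-laden illustration rather than the eventual implementation: `process_fifo_skip_failed_groups` and `handler` are hypothetical names, and the sketch assumes each record carries its group ID under `attributes.MessageGroupId`, as SQS event records do.

```python
from typing import Callable


def process_fifo_skip_failed_groups(
    records: list[dict], handler: Callable[[dict], None]
) -> dict:
    """Illustrative only: once a record fails, skip (and report as failed)
    later records of the same group, but keep processing other groups."""
    failed_groups: set[str] = set()
    failures: list[dict] = []

    for record in records:
        group_id = record["attributes"]["MessageGroupId"]

        if group_id in failed_groups:
            # Preserve ordering within the group: do not process past a failure.
            failures.append({"itemIdentifier": record["messageId"]})
            continue

        try:
            handler(record)
        except Exception:
            failed_groups.add(group_id)
            failures.append({"itemIdentifier": record["messageId"]})

    # Failed messages from every affected group are returned together.
    return {"batchItemFailures": failures}
```

Ordering is still preserved per group, since nothing after a failed record in that group is processed, while records from unaffected groups complete normally.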
Alternative solutions
Depending on how the implementation and behaviour turn out, we might consider not making the new behavior the default one, and keeping it behind a feature flag.
Acknowledgement
From the discussion
Originally posted by duc00 August 8, 2023
Hello,
According to Implementing partial batch responses AWS doc:
I see that the Powertools implementation of the doc, with SqsFifoPartialProcessor, is strictly following this recommendation. This question is thus more specific to the AWS implementation of partial batch responses with FIFO. Posting the question on this repo seems to be a good entry point nonetheless, since both AWS developers and the community interact on those subjects.
My problem is the following:
I am processing an SQS FIFO queue with SqsFifoPartialProcessor. My batch size is 10. Since the queue is not high-scale, the batch often contains messages with different message group IDs. When a failure occurs, the rest of the processing is stopped and all remaining records are returned as failures to the queue. So I often end up with valid records in my dead-letter queue just because they were processed in a batch containing an unrelated invalid one.
My question:
Would it be valid, after a failure, to return as failures only the remaining records in the batch that share the failed record's group ID, rather than all remaining records? According to the doc, the current implementation is recommended to preserve the ordering of messages in your queue. I do not see why processing other records with a different group ID would go against that. Thus my question.
Many thanks!