"Batch" of SQS messages above 10 count causes problems during cleanup since boto3.client('sqs').delete_message_batch is limited to 10. #824
Hey @whardier, with this new launch we won't need this anymore, and it'll be less error prone, since deleting a large number of messages could catch customers off guard if their timeouts aren't adjusted accordingly.
Actually, there is still an issue at hand: when the batch deletion fails, the entire event breaks. I am not sure whether AWS Lambda Powertools will instead insist on a different return for the event that includes the failed messages. If so, cool, but it is a breaking change, and the code that deletes processed messages needs to be removed; otherwise 11 processed messages will error the entire event.

From the docs, for reference:

Reporting batch item failures

When your Lambda function encounters an error while processing a batch, all messages in that batch become visible in the queue again by default, including messages that Lambda processed successfully. As a result, your function can end up processing the same message several times.

To avoid reprocessing all messages in a failed batch, you can configure your event source mapping to make only the failed messages visible again. To do this, when configuring your event source mapping, include the value ReportBatchItemFailures in the FunctionResponseTypes list. This lets your function return a partial success, which can help reduce the number of unnecessary retries on records.

Report syntax

After you include ReportBatchItemFailures in your event source mapping configuration, you can return a list of the failed message IDs in your function response. For example, suppose you have a batch of five messages, with message IDs id1, id2, id3, id4, and id5. Your function successfully processes id1, id3, and id5. To make messages id2 and id4 visible again in your queue, your response syntax should look like the following:

{
  "BatchItemFailures": [
    {
      "ItemIdentifier": "id2"
    },
    {
      "ItemIdentifier": "id4"
    }
  ]
}
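For illustration only, a minimal handler honoring that contract might look like the sketch below; process_record is a hypothetical per-record function, not part of this project, and the field casing follows the docs quoted above:

import json

def process_record(record):
    # Hypothetical business logic; raise to mark this record as failed.
    payload = json.loads(record["body"])
    print(payload)

def lambda_handler(event, context):
    failures = []
    for record in event["Records"]:
        try:
            process_record(record)
        except Exception:
            # Report only this record, so successfully processed
            # messages are not made visible again.
            failures.append({"ItemIdentifier": record["messageId"]})
    return {"BatchItemFailures": failures}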
“but it is a breaking change”

That's what we've been discussing today - it'll likely be a flag in the existing sqs_batch_processor decorator, plus a new decorator that could do this by default and accept either SQS, DynamoDB Streams, or Kinesis Data Streams.

That said, we will still have to make sure this issue is solved too - the existing PR would need to use threads (Python futures) to delete more efficiently, and the docs would need to be updated to advise customers to increase their timeout due to the delete operation.

We should have a decision on how to move forward by Friday EOD.
From the docs: "Similar to Resource objects, Session objects are not thread safe and should not be shared across threads and processes. You should create a new Session object for each thread or process" I've not had a chance to test out performance between concurrent futures and simple threading using the threading module. I only have experience using the threading module for this. I'm not sure what approach aws_lambda_powertools_python would like to take as threading eeks its way into the codebase or how session setup should be passed around. |
You’d pass the client, not the session. This has been a common practice for over 5 years to parallelize S3 downloads, for example.

You can use concurrent.futures: create a dedicated function that accepts an SQS client and a maximum batch of messages that can be deleted at a time, and submit it to a thread pool executor - max workers would be defined as the number of messages to be deleted divided by the maximum number of messages that can be deleted per call.

There’s a crude example using concurrent.futures in an older PR clarifying the thread-safety aspects - we can help if you don’t feel comfortable with the approach:
https://github.com/boto/boto3/pull/2848/files
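A minimal sketch of that shape, assuming a boto3 SQS client shared across the threads (low-level clients are generally thread safe; sessions are not) and a placeholder queue_url - this is not the project's actual implementation:

from concurrent.futures import ThreadPoolExecutor

import boto3

MAX_ENTRIES = 10  # hard per-call limit for delete_message_batch

def delete_chunk(client, queue_url, entries):
    # entries: at most 10 dicts of {"Id": ..., "ReceiptHandle": ...}
    return client.delete_message_batch(QueueUrl=queue_url, Entries=entries)

def delete_all(client, queue_url, entries):
    if not entries:
        return []
    chunks = [entries[i:i + MAX_ENTRIES] for i in range(0, len(entries), MAX_ENTRIES)]
    # One worker per chunk: number of messages / max deletable per call.
    with ThreadPoolExecutor(max_workers=len(chunks)) as pool:
        futures = [pool.submit(delete_chunk, client, queue_url, chunk) for chunk in chunks]
        return [future.result() for future in futures]

Note that each response can still carry per-entry failures, so a real implementation would also inspect result.get("Failed", []) for every chunk.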
OK. I am mostly concerned with this:
https://github.com/boto/boto3/blob/develop/boto3/__init__.py#L87

I had been more familiar with passing botocore.config.Config or some specific session parameters rather than an instance of the boto3 client, since boto3.client uses a globally cached session.

I'll review things and push out an update to the PR.
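For context, a short sketch of the distinction under discussion; an explicit Session sidesteps the module-level default session that boto3.client reuses (illustrative only, not a change being proposed here):

import boto3

# Module-level helper: builds the client from a lazily created,
# globally cached default Session.
shared_client = boto3.client("sqs")

# Explicit Session: each thread or process can own its own Session,
# while the client it produces can be shared safely.
session = boto3.session.Session()
client = session.client("sqs")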
<https://about.me/ShaneSpencer?promo=email_sig&utm_source=product&utm_medium=email_sig&utm_campaign=gmail_api&utm_content=thumb>
Shane Spencer
about.me/ShaneSpencer
<https://about.me/ShaneSpencer?promo=email_sig&utm_source=product&utm_medium=email_sig&utm_campaign=gmail_api&utm_content=thumb>
Sponsor @whardier on GitHub Sponsors <https://github.com/sponsors/whardier>
On Wed, Nov 24, 2021 at 5:31 PM Heitor Lessa ***@***.***>
wrote:
… You’d pass the client not the session. This has been a common practice for
over 5 years to paralelize S3 download for example.
You can use concurrent.futures, create a dedicated function that accepts a
SQS Client and a maximum batch of messages that can be deleted at a time,
and submit it to thread pool executor - max workers would be defined based
on number of messages to be deleted / max no of messages that can be
deleted.
There’s a crude example using concurrent.futures in an older PR clarifying
the thread safe aspects - we can help if you don’t feel comfortable with
the approach.
https://github.com/boto/boto3/pull/2848/files
On Wed, 24 Nov 2021 at 21:08, Shane R. Spencer ***@***.***>
wrote:
> From the docs: "Similar to Resource objects, Session objects are not
> thread safe and should not be shared across threads and processes. You
> should create a new Session object for each thread or process"
>
> I've not had a chance to test out performance between concurrent futures
> and simple threading using the threading module. I only have experience
> using the threading module for this. I'm not sure what approach
> aws_lambda_powertools_python would like to take as threading eeks its way
> into the codebase or how session setup should be passed around.
>
> —
> You are receiving this because you commented.
>
>
> Reply to this email directly, view it on GitHub
> <
#824 (comment)
>,
> or unsubscribe
> <
https://github.com/notifications/unsubscribe-auth/AAZPQBGGOBOE7YK7DO7OJRDUNVA2ZANCNFSM5ICOM7OA
>
> .
>
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#824 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AACKRFIQPSF2PBO4XOKCPWDUNWNYBANCNFSM5ICOM7OA>
.
|
"Batch" of SQS messages above 10 count causes problems during cleanup since boto3.client('sqs').delete_message_batch is limited to 10. There is a hard limit of 10 when utilizing delete_message_batch that is not properly documented in boto3 (whereas other batch methods do show 10). There is also no documentation reflecting a limit of 10 when implementing the sqs batch processor provided in this project. Infact the SQS documentation itself does not mention a limit - https://docs.aws.amazon.com/AWSSimpleQueueService/latest/APIReference/API_DeleteMessageBatch.html but the return from this function when attempting to delete more than 10 messages is inline with errors for other messages when the records are greater than 10 (and those are properly documented).
I don't believe this is a documentation-related issue, however; this was simply an oversight where more than 10 records were expected to "just work".
What were you trying to accomplish?
I typically limit batches to between 10 and 20 messages when performing longer-running record-processing functions and set the timeout for my Lambda appropriately. I've toyed with adding a toggle to flush deletions one by one (by processing only one record as a list at a time) for these sorts of tasks, and I think that is something that may be appropriate some day.

I also have (had) a limit of 1000 records or 30 seconds for tracking data (route changes in a frontend app) that is aggregated and spooled to AWS Timestream. Processing time for 1000 records is typically less than 4 seconds; however, I wasn't properly deleting these messages and had accidentally squashed the exception that would have told me there were some limits.
Expected Behavior
delete_message_batch isn't documented as having a 10-entry limit. Ideally it would have deleted more than 10 messages at a time.
Current Behavior
delete_message_batch raises an exception when more than 10 entries are supplied, failing the entire call.
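A small repro sketch, with a placeholder queue URL and receipt handles (assumes credentials and region are configured):

import boto3
from botocore.exceptions import ClientError

sqs = boto3.client("sqs")

# Hypothetical values: 11 entries exceeds the per-call limit.
queue_url = "https://sqs.us-east-1.amazonaws.com/123456789012/my-queue"
entries = [{"Id": str(i), "ReceiptHandle": f"handle-{i}"} for i in range(11)]

try:
    sqs.delete_message_batch(QueueUrl=queue_url, Entries=entries)
except ClientError as err:
    # SQS rejects the whole call instead of deleting the first 10.
    print(err.response["Error"]["Code"])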
Possible Solution
I've implemented a patch to allow arbitrary numbers of records to be spooled to delete_message_batch.

Currently I limit processing to 10 records at a time (while also allowing for more than 10) as a pattern, when processing batches this way can be done without causing an upstream issue. It IS nice that later batches can fail while earlier successful ones are properly cleaned up.

The documentation and examples may need to be rethought to properly show support for more than 10 records per event. A sketch of the chunking idea follows.
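A minimal sequential sketch of that chunking idea (not the actual patch; names are illustrative):

def chunks(entries, size=10):
    # Yield successive slices of at most `size` entries.
    for i in range(0, len(entries), size):
        yield entries[i:i + size]

def delete_messages(client, queue_url, entries):
    failed = []
    for chunk in chunks(entries):
        response = client.delete_message_batch(QueueUrl=queue_url, Entries=chunk)
        # Collect per-entry failures instead of failing the whole event.
        failed.extend(response.get("Failed", []))
    return failed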
Steps to Reproduce (for bugs)
N/A
Environment