RFC: SQS partial batch failure middleware #92

Closed
@gmcrocetti

First things first: congratulations on the amazing repo.

Key information

  • RFC PR: (leave this empty)
  • Related issue(s), if known:
  • Area: Utilities
  • Meet tenets: Yes

Summary

Processing a batch of SQS messages in a Lambda function is a very common approach, and it works smoothly for most use cases. Now suppose we are processing a batch and one of the messages fails: the whole batch goes back to the queue and is retried, including the messages that were processed successfully. Re-running successful messages is not acceptable for every use case.

Motivation

A very common execution pattern is a Lambda function triggered by an SQS queue, in most cases with a batch size greater than one. In that setup, an error in any one message causes the whole batch to return to the queue. Some use cases, such as non-idempotent actions, cannot tolerate this behavior. Solving this problem would improve the experience of using Lambda with SQS.

Proposal

Here is a very simple sketch of the idea. It is not complete.

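import boto3

from aws_lambda_powertools.middleware_factory import lambda_handler_decorator


@lambda_handler_decorator
def sqs_partial_batch_failure(handler, event, context):
	sent_records = event['Records']
	sqs_client = boto3.client('sqs')

	response = handler(event, context)

	# get_successful is a placeholder (not implemented here): it should return
	# the records that were processed without error
	successful_messages = get_successful(response)
	# the queue name is the last segment of the event source ARN
	queue_name = sent_records[0]['eventSourceARN'].split(':')[-1]
	sqs_client.delete_message_batch(
		QueueUrl=sqs_client.get_queue_url(QueueName=queue_name)['QueueUrl'],
		Entries=[{'Id': m['messageId'], 'ReceiptHandle': m['receiptHandle']} for m in successful_messages],
	)  # deletes every message except the 3rd and 7th

	return response


# batch size of 10, fails for the 3rd and 7th messages
@sqs_partial_batch_failure
def handler(event, context):
	for record in event['Records']:
		do_sth(record)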
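
To make the intended flow concrete, here is a minimal, self-contained sketch of the same idea. It is not the package's API. `process_batch`, `BatchProcessingError`, and `FakeSQS` are hypothetical names, the SQS client is injected so the example runs without AWS access, and each record is assumed to carry the `messageId` and `receiptHandle` fields of a standard SQS Lambda event. Each record is processed individually, the successful ones are deleted with `delete_message_batch`, and an exception is raised at the end so that only the failed messages return to the queue.

```python
class BatchProcessingError(Exception):
    """Raised after cleanup so only the messages still in the queue are retried."""


def process_batch(records, record_handler, sqs_client, queue_url):
    """Run record_handler on each record; delete the successes if any record failed."""
    successes, failures = [], []
    for record in records:
        try:
            record_handler(record)
            successes.append(record)
        except Exception as exc:  # remember the failure, keep processing the rest
            failures.append((record, exc))

    if not failures:
        # Nothing failed: returning normally lets Lambda remove the whole batch.
        return successes

    entries = [
        {"Id": r["messageId"], "ReceiptHandle": r["receiptHandle"]}
        for r in successes
    ]
    # delete_message_batch accepts at most 10 entries per call, so send chunks.
    for start in range(0, len(entries), 10):
        sqs_client.delete_message_batch(
            QueueUrl=queue_url, Entries=entries[start:start + 10]
        )

    raise BatchProcessingError(f"{len(failures)} of {len(records)} records failed")


class FakeSQS:
    """Stand-in for boto3.client('sqs') that records deletions (illustration only)."""

    def __init__(self):
        self.deleted = []

    def delete_message_batch(self, QueueUrl, Entries):
        self.deleted.extend(entry["Id"] for entry in Entries)
        return {"Successful": [{"Id": entry["Id"]} for entry in Entries], "Failed": []}


def demo():
    """Batch of 10 where the 3rd and 7th messages fail; returns the deleted ids."""
    records = [
        {"messageId": str(i), "receiptHandle": f"rh-{i}", "body": f"payload {i}"}
        for i in range(1, 11)
    ]

    def do_sth(record):
        if record["messageId"] in ("3", "7"):
            raise ValueError("cannot process this message")

    client = FakeSQS()
    try:
        # The queue URL below is a placeholder, not a real queue.
        process_batch(records, do_sth, client, "https://example.invalid/my-queue")
    except BatchProcessingError:
        pass  # in Lambda this would propagate and trigger the retry
    return client.deleted
```

In a real handler the injected client would be `boto3.client('sqs')`. A fuller version would probably also inspect the `Failed` list in the `delete_message_batch` response, which this sketch ignores.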

Drawbacks

  • Adds boto3 as a dependency;
  • It may fall outside the scope of this package;
  • Adds a little performance overhead to track failed records and delete the successful ones.

Rationale and alternatives

  • What other designs have been considered? Why not them?
    The Lambda could be invoked with a single message (batch size of one), but that feels like a poor use of the full power of this integration. Processing as many messages as possible per Lambda invocation is the best scenario.
    Inspired by [middy](https://github.com/middyjs/middy/tree/master/packages)

  • What is the impact of not doing this?
    Providing it may attract more users to such a powerful option: batch processing with SQS.

Unresolved questions

Optional, stash area for topics that need further development e.g. TBD
