Feature request: add support for large message handling (SQS, SNS, EventBridge, etc.) #1615
Comments
Hi @heitorlessa, I'm happy to pick this one up. I am busy for a week but will have time after that to work on it.
Would Step Functions invocations of Lambda be a good candidate for this feature?
@ims-swilkinson you mean a Lambda function receiving a payload referencing an S3 object URL? As in, Step Functions -> Lambda? If so, that would be possible with a generic function a consumer could use to fetch data from S3, optionally deserialize it, and optionally transform it (maybe something else?). The reason I say generic is that the RFC @walmsles wrote focuses on the consumer angle within the Batch Processing feature. For Step Functions, they don't pass any particular metadata by default to make this experience nicer. The more I think about it, the more I find that if we create a new contract and suggest customers invoke Lambda functions using this contract as a payload, we could easily create a new Event Handler to decouple the function handler from tasks/steps, and bring in nice features like this.
Yes, it could work with a simple contract specifically for large payloads. But I understand how the lack of a contract for Step Functions invocations of Lambda is a problem here; I forgot there isn't one.
Oooooh - this makes so much sense... if Powertools could introduce an adapter pattern, it opens up so many possibilities! I have been thinking so much about this lately, and it fits with where I want to push serverless development in large enterprises today. It opens up not just idempotent handling but large message handling across EVERY service.
Will leave my 2 cents on this matter here, since I too suffer from this problem: payload too large. I believe having an established contract in the payload itself is a must, so we can use a specific handler based on some flag in that contract (assuming one is present).

One thing that comes to mind, which is specific to my use case but might well apply to others: for my workloads, I actually don't know the payload size ahead of time. The same pipeline operates on small and large payloads. For small payloads, I send the content in the message itself (call it the normal and usual pathway). For large payloads, I send a reference to an S3 location where the payload can be found. My handler then checks a flag in the payload itself to see where to grab the content from. It's an adapter pattern of sorts. This allows me to have a single Lambda handling both types of payloads, indiscriminately. I don't have to force the Lambda to use the S3 reference for all payloads even if they are small, which saves on cost $.
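The flag-based adapter described above could be sketched roughly as follows. This is a minimal illustration, not an established contract: the `offloaded` flag, the `s3_ref` shape, and the injected `fetch_from_s3` callable are all hypothetical names.

```python
import json

# Hypothetical contract: large payloads are replaced by an envelope like
# {"offloaded": true, "s3_ref": {"bucket": "...", "key": "..."}}
def resolve_payload(body: str, fetch_from_s3):
    """Return the real payload, fetching from S3 when the message is a reference.

    `fetch_from_s3(bucket, key) -> str` is injected so the same logic works
    with a real boto3 client or a stub in tests.
    """
    message = json.loads(body)
    if message.get("offloaded"):
        ref = message["s3_ref"]
        return json.loads(fetch_from_s3(ref["bucket"], ref["key"]))
    return message

# A single handler then treats small and large messages identically
def handler(record, fetch_from_s3):
    payload = resolve_payload(record["body"], fetch_from_s3)
    return payload["order_id"]
```

With a real client, `fetch_from_s3` might be something like `lambda b, k: s3.get_object(Bucket=b, Key=k)["Body"].read().decode()`.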
Not knowing the exact size of payloads is common, with larger payloads being the exception rather than the rule in most situations. The intention with this utility is to focus on the processor side first, with compatibility with the existing sqs-extended-client libraries for Java and Python. Fetching the large file would be handled by the Powertools batch processor where required, according to the additional metadata added to the SQS message content by the client side (refer to the links for Java and Python). So the intent of the feature would exactly match your requirement.
This functionality is already present in the Java version of Lambda Powertools: https://docs.powertools.aws.dev/lambda/java/utilities/large_messages/ That said, it would be awesome if this were not limited to SQS/SNS. My use case is Step Functions, where the returned payload from a Lambda task may exceed the 256KB threshold and would be truncated/swallowed by Step Functions.
AWS Labs have released Python extended clients for SNS (https://github.com/awslabs/amazon-sns-python-extended-client-lib) and SQS (https://github.com/awslabs/amazon-sqs-python-extended-client-lib), which is nice for sending large messages to SNS and SQS. At a minimum, the Powertools Batch utility should support these 2 use cases as a starting point, using a "LargeMessage" decorator similar to the Java implementation mentioned above (so it is opt-in). This issue should focus on these 2 use cases only, with all other large message handling split into another issue for clarity and simplicity. Other service use cases have no client implementations, which raises a separate discussion: where do the client components sit, and should Powertools maintain those? I am happy to take this on for SNS and SQS now that the clients are officially supported.
Let's revisit this in April - if we're going to take a dependency on the extended client lib, we need to be sure it will be available for TypeScript/Node/.NET too. Otherwise we need a good abstraction on top of it, ideally beyond SNS/SQS too, hence the need for time to do this properly. The higher demand right now is to allow customers to know when a transaction was idempotent. Event Handler for WebSockets also has higher demand; we can revisit the board in April to rank them.
Thanks @heitorlessa, makes sense. For when this comes up again, here is another example of an AWS integration looking to deal with large events: Support Sidelining large events (eventbridge-kafka connector)
Is this related to an existing feature request or issue?
No response
Which AWS Lambda Powertools utility does this relate to?
Batch processing
Summary
A common pain and friction point in developing event-driven architectures using AWS serverless managed services is the existing service limit on message size, which generally equates to 256KB for messages passing through any of the messaging services (SQS, SNS, EventBridge, and others).
In an enterprise integration landscape, 256KB is a challenging size. It requires building store-and-forward systems where the inbound message is stored in an S3 bucket (or another storage mechanism) and metadata is pushed through the messaging service; every consumer then needs to retrieve the actual message using that metadata before processing.
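The producer side of such a store-and-forward system can be sketched as follows. This is an illustrative sketch only: the `offloaded`/`s3_ref` envelope shape and the injected `store_in_s3` callable are assumptions, not an established contract.

```python
import json
import uuid

LIMIT_BYTES = 256 * 1024  # common message-size limit for SQS/SNS/EventBridge

def build_envelope(payload: dict, store_in_s3, limit: int = LIMIT_BYTES) -> str:
    """Return the message body to send: the payload inline if it fits,
    otherwise a small reference envelope after storing the payload in S3.

    `store_in_s3(key, body) -> (bucket, key)` is injected (e.g. a thin
    wrapper over boto3's put_object) so the logic stays testable.
    """
    body = json.dumps(payload)
    if len(body.encode("utf-8")) <= limit:
        return body
    bucket, key = store_in_s3(str(uuid.uuid4()), body)
    return json.dumps({"offloaded": True, "s3_ref": {"bucket": bucket, "key": key}})
```

The returned string could then be passed unchanged to `send_message` (SQS), `publish` (SNS), or an EventBridge entry.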
Use case
System integration use case: data is submitted to an integration service in AWS, which uses EventBridge to route event data to different destinations. An API Gateway to Lambda proxy integration allows up to 6MB of payload to be processed, which is far larger than the 256KB limit of EventBridge.
In this scenario, the Lambda behind the API is required to store the large payload in S3 first and then push metadata for routing through EventBridge. Consumers then need to read the original large message and process it.
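For the EventBridge leg, that Lambda could build an event whose detail carries only routing metadata plus a pointer to the stored payload. The entry fields below follow the EventBridge PutEvents API, but the `s3_ref` shape and function name are hypothetical.

```python
import json

def build_event_entry(metadata: dict, s3_ref: dict, source: str, detail_type: str) -> dict:
    """Build an EventBridge PutEvents entry whose Detail carries only
    routing metadata plus a pointer to the full payload in S3."""
    detail = dict(metadata)
    detail["s3_ref"] = s3_ref  # hypothetical pointer contract
    return {
        "Source": source,
        "DetailType": detail_type,
        "Detail": json.dumps(detail),
    }
```

With boto3, the entry would be sent via `events.put_events(Entries=[entry])`; rules can still match on the metadata fields in `Detail`.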
Proposal
Similar to the Idempotency utility, an abstract persistence class should be created to allow for storing and retrieving a message in an AWS storage service (defaulting to S3 seems sensible).
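Such a persistence layer might look like the sketch below, mirroring the shape of the Idempotency utility's persistence classes. All class and method names here are illustrative, and the in-memory implementation exists only to demonstrate the contract; a real default would wrap S3's `put_object`/`get_object`.

```python
from abc import ABC, abstractmethod

class BaseLargeMessagePersistence(ABC):
    """Abstract storage layer for large message bodies (names illustrative)."""

    @abstractmethod
    def store(self, key: str, body: bytes) -> str:
        """Persist the body and return an opaque reference."""

    @abstractmethod
    def retrieve(self, reference: str) -> bytes:
        """Fetch a previously stored body by its reference."""

class InMemoryPersistence(BaseLargeMessagePersistence):
    """Trivial implementation used here only to demonstrate the contract;
    an S3 persistence layer would call put_object/get_object instead."""

    def __init__(self):
        self._data = {}

    def store(self, key, body):
        self._data[key] = body
        return key

    def retrieve(self, reference):
        return self._data[reference]
```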
Build out a message client handler like sqs-extended-client-lib (which is based on the AWS Java implementation for large message sending to SQS and appears abandoned on GitHub) for storing the message and creating the metadata to forward through the messaging service.
As with idempotency, we would need to consider JMESPath for extracting metadata, or a mechanism for building the required message structure for submission to an AWS messaging service (SQS, SNS, EventBridge, etc.).
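The extraction idea might work as sketched below. To keep the sketch dependency-free, a minimal dotted-path extractor stands in for a real JMESPath library; the function names and the expression syntax shown are stand-ins, not the proposed API.

```python
def extract(expression: str, data: dict):
    """Minimal dotted-path extractor standing in for a JMESPath dependency:
    'order.customer.id' walks nested dicts, returning None when any
    segment is missing."""
    current = data
    for part in expression.split("."):
        if not isinstance(current, dict) or part not in current:
            return None
        current = current[part]
    return current

def build_metadata(payload: dict, expressions: dict) -> dict:
    """Project configured fields out of a large payload so only the
    routing metadata travels through the messaging service."""
    return {name: extract(expr, payload) for name, expr in expressions.items()}
```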
This feels like a nice utility that could cover more than just SQS, but ideally it should start with the most common use case. SQS seems logical in this regard, since well-known implementations already exist; matching the existing user experience of the Java utility as a starting point seems reasonable.
The capability of detecting large message events would also be integrated into the existing Powertools batch processing utilities, so that retrieval of the large message happens as part of batch processing when those utilities are used.
A stand-alone mechanism for retrieving large messages should also be provided for customers who only partially adopt Lambda Powertools, so there are pathways for everyone to benefit from this feature.
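A stand-alone path could be as simple as a decorator that resolves reference envelopes before the handler runs, with no batch-utility dependency. Everything below is a sketch under assumed names: the `offloaded`/`s3_ref` contract and the injected `fetch_from_s3` callable are hypothetical.

```python
import functools
import json

def large_message_handler(fetch_from_s3):
    """Stand-alone decorator sketch: rewrites each SQS record body from a
    reference envelope to the full payload before invoking the handler.

    `fetch_from_s3(bucket, key) -> str` is injected so it can be a boto3
    wrapper in production or a stub in tests.
    """
    def decorator(handler):
        @functools.wraps(handler)
        def wrapper(event, context):
            for record in event.get("Records", []):
                body = json.loads(record["body"])
                if body.get("offloaded"):
                    ref = body["s3_ref"]
                    record["body"] = fetch_from_s3(ref["bucket"], ref["key"])
            return handler(event, context)
        return wrapper
    return decorator
```

The decorated handler then sees only full payloads, whether or not a given message was offloaded.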
Producer:
Consumer:
Out of scope
Initially, a single messaging service should be targeted, but other messaging services (SNS, EventBridge, Kinesis, etc.) should be considered during design so the feature can be enabled across all batch processing utilities in the future.
Potential challenges
A customisable method is needed for taking the large message and creating the metadata for the messaging service, with data detailing how to retrieve the large message. JMESPath or JSONPath can be considered for simple use cases, but a custom function implementation should also be supported, since not every use case will be so straightforward.
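The custom-function hook could take the shape of a pluggable builder callable, as sketched below. All names (`MetadataBuilder`, `default_builder`, the `offloaded`/`s3_ref` keys) are illustrative assumptions.

```python
import json
from typing import Callable

# A metadata builder takes the full payload and the storage reference and
# returns the (small) message dict actually sent through the service.
MetadataBuilder = Callable[[dict, dict], dict]

def default_builder(payload: dict, s3_ref: dict) -> dict:
    """Default: carry nothing from the payload, just the pointer."""
    return {"offloaded": True, "s3_ref": s3_ref}

def with_tenant(payload: dict, s3_ref: dict) -> dict:
    """Example custom builder: also copy a routing field into the metadata."""
    return {**default_builder(payload, s3_ref), "tenant": payload["tenant"]}

def publish_body(payload: dict, s3_ref: dict,
                 builder: MetadataBuilder = default_builder) -> str:
    """Serialize the metadata message produced by the configured builder."""
    return json.dumps(builder(payload, s3_ref))
```

A JMESPath-driven builder would just be one more implementation of the same callable signature, which keeps simple and complex cases on one code path.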
We also need to consider how the consumer would know where to access the large message data, given that it will run in a different Lambda function, potentially without knowledge of the storage mechanism. If this could be embedded in the metadata for Powertools to determine on the consumer side, it would lessen boilerplate considerations.
Dependencies and Integrations
The batch utilities would need to understand how to retrieve large messages when this utility is in use. Making retrieval an automated mechanism removes a lot of boilerplate code from existing solutions.
Alternative solutions
Acknowledgment