
Feature request: Allow the recordHandler() function to receive all the event records at one time instead of one at a time #1658


Closed
2 tasks done
msbeeman opened this issue Aug 22, 2023 · 4 comments
Labels
  • batch: This item relates to the Batch Processing Utility
  • feature-request: This item refers to a feature request for an existing or new utility
  • rejected: This is something we will not be working on. At least, not in the measurable future

Comments


msbeeman commented Aug 22, 2023

Use case

As a user of the Batch Processing feature, I often want to process the batch of events the Lambda receives in batches (for example, by mapping over the records) rather than a single record at a time.

Currently, with the way things are set up, the user returns the result of the processPartialResponse() function from the Lambda handler:
[screenshot: Lambda handler returning processPartialResponse()]

The first parameter of the processPartialResponse() function is the event object containing the entire batch of records passed to the Lambda, while the second parameter, recordHandler, is a function used to process each record in the batch:
[screenshot: processPartialResponse() parameters, the event object and the recordHandler function]

Since the recordHandler() function can only be passed a single record at a time, the user can't process the entire batch at once or batch the downstream requests.
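
For reference, the current usage pattern looks roughly like the sketch below (a minimal example based on the Powertools for AWS Lambda (TypeScript) batch processing docs; the per-record body is left empty):

import { BatchProcessor, EventType, processPartialResponse } from '@aws-lambda-powertools/batch';
import type { DynamoDBRecord, DynamoDBStreamHandler } from 'aws-lambda';

const processor = new BatchProcessor(EventType.DynamoDBStreams);

// Called once per record in the batch; there is no way to receive the whole batch here.
const recordHandler = async (record: DynamoDBRecord): Promise<void> => {
  // process a single record, e.g. issue one PutItem request per record
};

export const handler: DynamoDBStreamHandler = async (event, context) =>
  processPartialResponse(event, recordHandler, processor, { context });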

For example, let's say I have a Lambda function that receives batches of DynamoDB stream records, and for each record I want my Lambda to do some processing and then make a put request to write an item to a table...

  • With the current behavior: if the Lambda receives 100 DynamoDB stream records, the recordHandler() has to process each item and make a separate put request for each record (100 PutItem requests). Since the recordHandler() function only processes a single item at a time, I can't batch those 100 put requests together into 4 BatchWriteItem requests.

  • With the desired behavior: the user should be able to pass in either a recordHandler() function that processes one record at a time (as today), or one that receives all records at once, giving them the flexibility to process the records concurrently and make batch requests to downstream services.

For an example of the desired behavior, see below:
The recordHandler() receives an array of all the records at once (DynamoDBRecord[]); a function maps over them to produce an item per record, turns each item into a put request, chunks the put requests into groups of 25, and writes each group to DynamoDB using the BatchWriteItem API.
[screenshot: example implementation of a batch-style recordHandler()]
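
Expressed in code, the desired handler could look something like the sketch below (assumptions: a hypothetical table name 'my-table', and the per-record processing is reduced to forwarding the stream record's new image; this is not something the utility supports today):

import {
  BatchWriteItemCommand,
  DynamoDBClient,
  type AttributeValue,
} from '@aws-sdk/client-dynamodb';
import type { DynamoDBRecord } from 'aws-lambda';

const client = new DynamoDBClient({});

// Desired shape: the handler receives the whole batch instead of one record.
const recordHandler = async (records: DynamoDBRecord[]): Promise<void> => {
  // Stand-in for the per-record processing step; here we just forward the new
  // image from the stream record. The cast is needed because the aws-lambda
  // and AWS SDK typings declare AttributeValue separately.
  const putRequests = records.map((record) => ({
    PutRequest: {
      Item: (record.dynamodb?.NewImage ?? {}) as unknown as Record<string, AttributeValue>,
    },
  }));

  // BatchWriteItem accepts at most 25 requests per call, so 100 records become 4 calls.
  const chunkSize = 25;
  for (let i = 0; i < putRequests.length; i += chunkSize) {
    await client.send(
      new BatchWriteItemCommand({
        RequestItems: { 'my-table': putRequests.slice(i, i + chunkSize) },
      })
    );
  }
};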

Solution/User Experience

Extend/modify the processPartialResponse() function to accept:

  • A recordHandler() function that receives and processes one record at a time (the current behavior)
  • OR a recordHandler() function that receives all the records from the Lambda event, so they can be processed concurrently and batched into write requests. For example, a Lambda that receives an event containing 100 DynamoDB stream records should be able to turn them into 4 BatchWriteItem requests of 25 put requests each. (A hypothetical signature sketch follows below.)
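
As a rough illustration of the requested developer experience, the utility could accept either handler shape. The names in this sketch are invented for illustration and are not part of the current API:

import type { DynamoDBRecord } from 'aws-lambda';

// Hypothetical handler shapes the utility could accept:
type SingleRecordHandler = (record: DynamoDBRecord) => Promise<void>;
type WholeBatchHandler = (records: DynamoDBRecord[]) => Promise<void>;

// Imagined overloads of processPartialResponse() (illustration only):
// processPartialResponse(event, handler: SingleRecordHandler, processor, options?)
// processPartialResponse(event, handler: WholeBatchHandler, processor, { ...options, wholeBatch: true })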

Alternative solutions

No response

Acknowledgment

Future readers

Please react with 👍 and your use case to help us understand customer demand.

msbeeman added the triage and feature-request labels Aug 22, 2023

boring-cyborg bot commented Aug 22, 2023

Thanks for opening your first issue here! We'll come back to you as soon as we can.
In the meantime, check out the #typescript channel on our Powertools for AWS Lambda Discord: Invite link

dreamorosi added the discussing and batch labels and removed the triage label Sep 4, 2023
dreamorosi (Contributor) commented Sep 4, 2023

Hi @msbeeman, apologies for taking so long to get back to you here on GitHub.

As mentioned in the response to your message on Discord, at the moment this type of pattern is not possible, as we designed the Batch Processing utility to receive a record handler function that is called once per record in the batch.

The main goal of the utility is to provide an easy way to handle partial failures, which require a special response shape, rather than simply iterating through a batch of records. If all you want is to chunk your batch into smaller batches and write them to DynamoDB, you may not need this utility at all.

In TypeScript you could handle this with logic similar to this:

import type { SQSEvent } from 'aws-lambda';

export const handler = async (event: SQSEvent) => {
  const records = event.Records;
  const chunkSize = 25;
  let failedIds: string[] = [];
  for (let i = 0; i < records.length; i += chunkSize) {
    const chunk = records.slice(i, i + chunkSize);
    try {
      // .. do DDB stuff or other
    } catch (err) {
      // SQS records are identified by their messageId
      failedIds = chunk.map((record) => record.messageId);
      break;
    }
  }

  return { partialFailures: failedIds };
};

Alternatively, if you still want to use the Batch Processing utility to perform some side effect in your record handler one at a time, and then write the results or the items themselves to DynamoDB in batches of 25, you could extend the successHandler() method of the BatchProcessor class to issue a DynamoDB write once there are 25 records in the successMessages array.

Our docs have an example of how to extend the BatchProcessor class; even though that example shows how to do something with failed messages, the concept is similar.
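
For illustration, a rough sketch of that approach is below. It assumes the successHandler(record, result) override point and the successMessages array mentioned above; the exact type names and import paths should be copied from the docs example for the version in use, and the flush helper is hypothetical.

import {
  BatchProcessor,
  type EventSourceDataClassTypes,
  type SuccessResponse,
} from '@aws-lambda-powertools/batch';

class BufferedWriteProcessor extends BatchProcessor {
  public successHandler(
    record: EventSourceDataClassTypes,
    result: unknown
  ): SuccessResponse {
    const response = super.successHandler(record, result);

    // Every time another 25 records have succeeded, write them downstream in a
    // single BatchWriteItem call. The helper below is hypothetical and would
    // need to queue the write, since this method is not awaited by the processor.
    if (this.successMessages.length % 25 === 0) {
      // flushToDynamoDB(this.successMessages.slice(-25));
    }

    return response;
  }
}

An instance of this class would then be passed to processPartialResponse() in place of the default BatchProcessor.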

dreamorosi added the need-customer-feedback and need-response labels Sep 4, 2023
github-actions bot commented Sep 19, 2023

This issue has not received a response in 2 weeks. If you still think there is a problem, please leave a comment to prevent the issue from closing automatically.

github-actions bot added the pending-close-response-required label Sep 19, 2023
github-actions bot commented Sep 27, 2023

Greetings! We are closing this issue because it has been open a long time, hasn't been updated in a while, and may not be getting the attention it deserves. We encourage you to check whether this is still an issue in the latest release; if it is, please feel free to comment or reopen the issue.

github-actions bot added the rejected label Sep 27, 2023
github-actions bot closed this as not planned Sep 27, 2023
dreamorosi removed the pending-close-response-required, need-customer-feedback, discussing, and need-response labels Feb 10, 2024
dreamorosi moved this from Coming soon to Closed in Powertools for AWS Lambda (TypeScript) Feb 10, 2024