Error: Using batch processing with Pydantic models doesn't follow the main idea of batch processing #2091

Closed
nepshshsh opened this issue Apr 6, 2023 · 6 comments · Fixed by #2099
Assignees
Labels
batch Batch processing utility bug Something isn't working

Comments

@nepshshsh
Contributor

Expected Behaviour

The Lambda shouldn't fail if at least one record is processed successfully; it should return { "batchItemFailures": [...] } instead.

In the example, the result of the Lambda should look like this:
{ "batchItemFailures": [ { "itemIdentifier": "messageId-2" } ] }
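
To make the expected contract concrete, here is a minimal standard-library sketch of how per-record outcomes could be folded into that response. The function name `build_partial_response` and the `(message_id, succeeded)` pair format are hypothetical and not part of Powertools; raising only when every record fails is an assumption that mirrors the sentence above.

```python
from typing import Dict, List, Tuple


def build_partial_response(outcomes: List[Tuple[str, bool]]) -> Dict[str, List[Dict[str, str]]]:
    """Fold (message_id, succeeded) pairs into an SQS partial batch response.

    Hypothetical helper for illustration only; not a Powertools API.
    """
    failures = [
        {"itemIdentifier": message_id}
        for message_id, succeeded in outcomes
        if not succeeded
    ]
    # Assumption: the invocation fails as a whole only when no record succeeded.
    if outcomes and len(failures) == len(outcomes):
        raise RuntimeError("All records failed processing")
    return {"batchItemFailures": failures}


response = build_partial_response([("messageId-1", True), ("messageId-2", False)])
print(response)
```

With one success and one failure, only the failed message ID is reported back, so only that message would be retried.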

Current Behaviour

The code throws an exception without processing all records in the batch.

Traceback (most recent call last):
  File "C:\Users\Crumpet\Desktop\AWS\powertools\model_test.py", line 90, in <module>
    lambda_handler(event,{})
  File "C:\Users\Crumpet\AppData\Local\Programs\Python39\lib\site-packages\aws_lambda_powertools\tracing\tracer.py", line 305, in decorate
    response = lambda_handler(event, context, **kwargs)
  File "C:\Users\Crumpet\Desktop\AWS\powertools\model_test.py", line 42, in lambda_handler
    processed_messages: List[Union[SuccessResponse, FailureResponse]] = processor.process()
  File "C:\Users\Crumpet\AppData\Local\Programs\Python39\lib\site-packages\aws_lambda_powertools\utilities\batch\base.py", line 97, in process
    return [self._process_record(record) for record in self.records]
  File "C:\Users\Crumpet\AppData\Local\Programs\Python39\lib\site-packages\aws_lambda_powertools\utilities\batch\base.py", line 97, in <listcomp>
    return [self._process_record(record) for record in self.records]
  File "C:\Users\Crumpet\AppData\Local\Programs\Python39\lib\site-packages\aws_lambda_powertools\utilities\batch\base.py", line 474, in _process_record
    data = self._to_batch_type(record=record, event_type=self.event_type, model=self.model)
  File "C:\Users\Crumpet\AppData\Local\Programs\Python39\lib\site-packages\aws_lambda_powertools\utilities\batch\base.py", line 347, in _to_batch_type
    return model.parse_obj(record)
  File "pydantic\main.py", line 526, in pydantic.main.BaseModel.parse_obj
  File "pydantic\main.py", line 341, in pydantic.main.BaseModel.__init__
pydantic.error_wrappers.ValidationError: 1 validation error for OrderSqsRecord
body -> item
  value is not a valid dict (type=type_error.dict)

Code snippet

import json

from typing import Any, List, Literal, Union
from aws_lambda_powertools import Logger, Tracer
from aws_lambda_powertools.utilities.batch import BatchProcessor, EventType, batch_processor
from aws_lambda_powertools.utilities.parser.models import SqsRecordModel
from aws_lambda_powertools.utilities.data_classes.sqs_event import SQSRecord
from aws_lambda_powertools.utilities.batch import (BatchProcessor,
                                                   EventType,
                                                   FailureResponse,
                                                   SuccessResponse,
                                                   batch_processor)

from aws_lambda_powertools.utilities.parser import BaseModel, validator


class Order(BaseModel):
    item: dict

class OrderSqsRecord(SqsRecordModel):
    body: Order

    # auto transform json string
    # so Pydantic can auto-initialize nested Order model
    @validator("body", pre=True)
    def transform_body_to_dict(cls, value: str):
        return json.loads(value)

processor = BatchProcessor(event_type=EventType.SQS, model=OrderSqsRecord)
tracer = Tracer()
logger = Logger()


@tracer.capture_method
def record_handler(record: OrderSqsRecord):
    return record.body.item

@tracer.capture_lambda_handler
def lambda_handler(event, context):
    batch = event["Records"]
    with processor(records=batch, handler=record_handler):
        processed_messages: List[Union[SuccessResponse, FailureResponse]] = processor.process()

    for message in processed_messages:
        status: Union[Literal["success"], Literal["fail"]] = message[0]
        result: Any = message[1]
        record: SQSRecord = message[2]

    logger.info(processor.response())
    return processor.response()


if __name__ == "__main__":
    event = {
        "Records": [
            {
            "messageId": "messageId-1",
            "receiptHandle": "AQEBwJnKyrHigUMZj6rYigCgxlaS3SLy0a...",
            "body": '{"item": 1}',
            "attributes": {
                "ApproximateReceiveCount": "1",
                "SentTimestamp": "1545082649183",
                "SenderId": "1",
                "ApproximateFirstReceiveTimestamp": "1545082649185"
            },
            "messageAttributes": {},
            "md5OfBody": "e4e68fb7bd0e697a0ae8f1bb342846b3",
            "eventSource": "aws:sqs",
            "eventSourceARN": "arn:aws:sqs:us-east-2:123456789012:my-queue",
            "awsRegion": "us-east-2"
            },
            {
            "messageId": "messageId-2",
            "receiptHandle": "AQEBwJnKyrHigUMZj6rYigCgxlaS3SLy0a...",
            "body": 'Hi',
            "attributes": {
                "ApproximateReceiveCount": "1",
                "SentTimestamp": "1545082649183",
                "SenderId": "AIDAIENQZJOLO23YVJ4VO",
                "ApproximateFirstReceiveTimestamp": "1545082649185"
            },
            "messageAttributes": {},
            "md5OfBody": "e4e68fb7bd0e697a0ae8f1bb342846b3",
            "eventSource": "aws:sqs",
            "eventSourceARN": "arn:aws:sqs:us-east-2:123456789012:my-queue",
            "awsRegion": "us-east-2"
            }
        ]
    }
    lambda_handler(event,{})

Possible Solution

I think this can be fixed easily by moving the code that converts the record to the model inside the try block (https://github.com/awslabs/aws-lambda-powertools-python/blob/develop/aws_lambda_powertools/utilities/batch/base.py#L474).

But perhaps there was a reason it wasn't done that way. Please share your thoughts.
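
The idea can be sketched with the standard library alone. This is a simplified stand-in, not the Powertools implementation: `parse_order`, `process_record`, and `process_batch` are hypothetical names, and `parse_order` only checks that the body is a JSON object containing `item`, which is looser than the Pydantic model in the snippet above. The point is that the conversion sits inside the `try`, so a record that fails validation is reported as a failure instead of aborting the whole batch.

```python
import json
from typing import Any, Callable, Dict, List, Tuple


def parse_order(record: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical stand-in for model.parse_obj(record); raises ValueError on a bad body."""
    body = json.loads(record["body"])  # json.JSONDecodeError is a ValueError subclass
    if not isinstance(body, dict) or "item" not in body:
        raise ValueError("body does not match the expected shape")
    return {"messageId": record["messageId"], "body": body}


def process_record(record: Dict[str, Any], handler: Callable[[Dict[str, Any]], Any]) -> Tuple[str, Any, Dict[str, Any]]:
    try:
        # The conversion happens inside the try block, so a validation
        # error marks only this record as failed.
        data = parse_order(record)
        return ("success", handler(data), record)
    except Exception as exc:
        return ("fail", str(exc), record)


def process_batch(records: List[Dict[str, Any]], handler: Callable[[Dict[str, Any]], Any]) -> Dict[str, List[Dict[str, str]]]:
    results = [process_record(record, handler) for record in records]
    failures = [
        {"itemIdentifier": record["messageId"]}
        for status, _, record in results
        if status == "fail"
    ]
    return {"batchItemFailures": failures}


records = [
    {"messageId": "messageId-1", "body": '{"item": 1}'},
    {"messageId": "messageId-2", "body": "Hi"},
]
response = process_batch(records, handler=lambda data: data["body"]["item"])
print(response)
```

Here the second record's body is not valid JSON, so only `messageId-2` ends up in `batchItemFailures` while the first record is still processed.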

Steps to Reproduce

Run the code snippet above. Alternatively, try any test case with at least two records in one batch where one record doesn't conform to your model.

AWS Lambda Powertools for Python version

latest

AWS Lambda function runtime

3.9

Packaging format used

PyPI

Debugging logs

Traceback (most recent call last):
  File "C:\Users\Crumpet\Desktop\AWS\powertools\model_test.py", line 90, in <module>
    lambda_handler(event,{})
  File "C:\Users\Crumpet\AppData\Local\Programs\Python39\lib\site-packages\aws_lambda_powertools\tracing\tracer.py", line 305, in decorate
    response = lambda_handler(event, context, **kwargs)
  File "C:\Users\Crumpet\Desktop\AWS\powertools\model_test.py", line 42, in lambda_handler
    processed_messages: List[Union[SuccessResponse, FailureResponse]] = processor.process()
  File "C:\Users\Crumpet\AppData\Local\Programs\Python39\lib\site-packages\aws_lambda_powertools\utilities\batch\base.py", line 97, in process
    return [self._process_record(record) for record in self.records]
  File "C:\Users\Crumpet\AppData\Local\Programs\Python39\lib\site-packages\aws_lambda_powertools\utilities\batch\base.py", line 97, in <listcomp>
    return [self._process_record(record) for record in self.records]
  File "C:\Users\Crumpet\AppData\Local\Programs\Python39\lib\site-packages\aws_lambda_powertools\utilities\batch\base.py", line 474, in _process_record
    data = self._to_batch_type(record=record, event_type=self.event_type, model=self.model)
  File "C:\Users\Crumpet\AppData\Local\Programs\Python39\lib\site-packages\aws_lambda_powertools\utilities\batch\base.py", line 347, in _to_batch_type
    return model.parse_obj(record)
  File "pydantic\main.py", line 526, in pydantic.main.BaseModel.parse_obj
  File "pydantic\main.py", line 341, in pydantic.main.BaseModel.__init__
pydantic.error_wrappers.ValidationError: 1 validation error for OrderSqsRecord
body -> item
  value is not a valid dict (type=type_error.dict)
@nepshshsh nepshshsh added bug Something isn't working triage Pending triage from maintainers labels Apr 6, 2023
@heitorlessa
Contributor

@LuckIlNe thank you so much for this super detailed bug report!

There was a reason for this before we added Pydantic support; now we should definitely move that line inside the try block as a fix.

Feel free to make a PR and we can review, amend any necessary details, and merge it tomorrow morning (CET tz).

I'm working on a new Batch improved experience tomorrow and can prioritise this bugfix if you're tight on bandwidth.

Thank you!

@heitorlessa heitorlessa added batch Batch processing utility and removed triage Pending triage from maintainers labels Apr 6, 2023
@heitorlessa heitorlessa self-assigned this Apr 6, 2023
heitorlessa added a commit that referenced this issue Apr 7, 2023
@heitorlessa
Contributor

Notes for maintainers:

Paid off additional tech debt as part of this PR.

@heitorlessa heitorlessa added the pending-release Fix or implementation already in dev waiting to be released label Apr 7, 2023
@heitorlessa
Contributor

Congrats on your first contribution, @LuckIlNe. We appreciate it a lot! I'm going to finish the newly improved batch experience to cut boilerplate, along with doc fixes, and will merge this today.

This is the new concise experience, with no need for a decorator or an additional return.

image

heitorlessa added a commit to heitorlessa/aws-lambda-powertools-python that referenced this issue Apr 7, 2023
* develop:
  fix(batch): handle early validation errors for pydantic models (poison pill) aws-powertools#2091 (aws-powertools#2099)
  update changelog with latest changes
  docs(homepage): remove banner for end-of-support v1 (aws-powertools#2098)
  chore(deps-dev): bump aws-cdk-lib from 2.72.1 to 2.73.0 (aws-powertools#2097)
  chore(deps-dev): bump filelock from 3.10.7 to 3.11.0 (aws-powertools#2094)
  chore(deps-dev): bump coverage from 7.2.2 to 7.2.3 (aws-powertools#2092)
  chore(deps-dev): bump aws-cdk from 2.72.1 to 2.73.0 (aws-powertools#2093)
  chore(deps-dev): bump mypy-boto3-cloudformation from 1.26.60 to 1.26.108 (aws-powertools#2095)

Signed-off-by: heitorlessa <[email protected]>
@github-actions
Contributor

github-actions bot commented Apr 7, 2023

This is now released under 2.12.0 version!

@github-actions github-actions bot removed the pending-release Fix or implementation already in dev waiting to be released label Apr 7, 2023