# Bug: Slow import triggered by ApigatewayRestResolver.enable_swagger (Pydantic 2) #4372
Thanks for opening your first issue here! We'll come back to you as soon as we can.
Hey @dajmeister, thank you for taking the time to submit a report, and I'm stoked you used Tuna to investigate it <3. I've created two examples to compare where the import time goes. Please let me know if I missed anything.

**Pydantic and Powertools** (`profile_pydantic_plus_data_validation.log`): *(profile screenshot not preserved)*

**Pydantic import alone:** *(profile screenshot not preserved)*
Hi @heitorlessa, thanks for responding! The pydantic import is definitely very slow when there is no compiled bytecode. Your `profile_pydantic.log` file shows an import time of 1.371 seconds. This however gets much better on subsequent imports, which reuse the bytecode: in your `profile_pydantic_plus_data_validation.log` file the pydantic import is only 0.073 seconds.

Your `profile_pydantic_plus_data_validation.log` example didn't actually trigger the import I was interested in. You hit `ModuleNotFoundError: No module named 'jmespath'` when running `from aws_lambda_powertools.event_handler import APIGatewayRestResolver` (see #4340).

`app.enable_swagger(path="dummy")` is the line of code that specifically triggers the "slow" import of interest. Per the profile, almost all of the time is spent in the `aws_lambda_powertools.event_handler.openapi.models` module. I'm not sure what aspect of that module is contributing to the long import time.
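The bytecode effect described above can be reproduced with the stdlib alone. This sketch (the module name and size are arbitrary, not from the thread) writes a large module, imports it cold (parse + compile + `__pycache__` write), then re-imports it warm from the cached bytecode:

```python
import importlib
import pathlib
import sys
import tempfile
import time

# Write a deliberately large module so compile time is measurable.
tmp = tempfile.mkdtemp()
pathlib.Path(tmp, "demo_mod.py").write_text(
    "\n".join(f"x{i} = {i}" for i in range(20_000))
)
sys.path.insert(0, tmp)

start = time.perf_counter()
importlib.import_module("demo_mod")  # cold: parse + compile (+ write .pyc)
cold = time.perf_counter() - start

del sys.modules["demo_mod"]  # force a re-import, but keep the cached .pyc
start = time.perf_counter()
importlib.import_module("demo_mod")  # warm: loads bytecode from __pycache__
warm = time.perf_counter() - start

print(f"cold: {cold:.4f}s, warm: {warm:.4f}s")
```

On a typical machine the warm import is noticeably faster, mirroring the 1.371s vs 0.073s pydantic numbers above (exact timings vary by machine and filesystem).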
To set expectations: I'm in between events and 30+ hour flights, so I'll dig deeper when I'm back home during the week of June 3rd. Thanks for coming back to me quickly!
My 2 cents here. I know that every millisecond matters when running on Lambda, but I believe we may be experiencing a more significant performance regression in some utility/function/class in v2. This could be challenging to detect using tools like Tuna or similar profiling utilities, because Pydantic v2 is a pre-compiled library and we can't inspect its methods/functions in detail. I'll continue digging into the codebase to see if we have a way to improve this.

**With model_rebuild:** *(profile screenshot not preserved)*

**Without model_rebuild:** *(profile screenshot not preserved)*
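Even though tuna can't look inside Pydantic v2's compiled extension, the interpreter's own `-X importtime` report (the same data tuna visualizes) still attributes time at module granularity, including for extension modules. A stdlib-only sketch, where `json` merely stands in for a heavy dependency:

```python
import subprocess
import sys

def import_time_report(module: str) -> str:
    """Run a child interpreter with -X importtime and return its per-module timing report."""
    result = subprocess.run(
        [sys.executable, "-X", "importtime", "-c", f"import {module}"],
        capture_output=True,
        text=True,
        check=True,
    )
    # The report is written to stderr, one "import time:" line per imported module.
    return result.stderr

report = import_time_report("json")
print(report)
```

Each line has the shape `import time: <self us> | <cumulative us> | <module>`, so the cumulative column of the last line gives the total cost of the import being investigated.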
Hey everyone! I spent some time investigating this a bit further and excluded Powertools from some tests to create a scenario where we are only using Pydantic, without any other dependencies. This helps isolate the issue and shows how Pydantic v1/v2 behaves on its own. From my investigation, it seems the way Pydantic handles importing the library and performing validation/serialization has changed between v1 and the current v2 release. This is my personal opinion and I can't say I'm 100% sure, but I think it could be due to the refactoring they did to generate bindings in Rust and/or some underlying architectural changes made by the Pydantic team.

The primary use of Powertools with Pydantic involves model validation and model serialization/deserialization. To reproduce the problem, I performed some tests. Please consider this event for all of the following tests:

```python
event = {
    "name": "company x",
    "address": "street y",
    "employees": [
        {
            "name": "Leandro",
            "roles": [
                {
                    "role_name": "DevOps",
                    "date_start": "2010-01-01",
                    "date_end": "2011-01-01",
                },
                {
                    "role_name": "Developer",
                    "date_start": "2011-01-01",
                    "date_end": "2014-01-01",
                },
            ],
        },
        {
            "name": "X",
            "roles": [
                {
                    "role_name": "DevOps",
                    "date_start": "2010-01-01",
                    "date_end": "2011-01-01",
                },
                {
                    "role_name": "Developer",
                    "date_start": "2011-01-01",
                    "date_end": "2014-01-01",
                },
            ],
        },
    ],
}
```

#### First test

This test uses only pydantic + tuna to see how the import behaves. In v2 they load plugins + core + pydantic + metadata + fields, while in v1 they just import Pydantic and that's it. In this test we are only exercising cold-start situations.

**Pydantic v1 - code**

```python
import datetime
from typing import List

import pydantic


class Role(pydantic.BaseModel):
    role_name: str
    date_start: datetime.date
    date_end: datetime.date


class Employees(pydantic.BaseModel):
    name: str
    roles: List[Role]


class Company(pydantic.BaseModel):
    name: str
    address: str
    employees: List[Employees]


parsed_model = Company.parse_obj(event)
dump_model = parsed_model.dict()
```

**Pydantic v1 - tuna:** import + execution time: 0.032s *(profile screenshot not preserved)*

**Pydantic v2 - code**

```python
import datetime
from typing import List

import pydantic


class Role(pydantic.BaseModel):
    role_name: str
    date_start: datetime.date
    date_end: datetime.date


class Employees(pydantic.BaseModel):
    name: str
    roles: List[Role]


class Company(pydantic.BaseModel):
    name: str
    address: str
    employees: List[Employees]


parsed_model = Company.model_validate(event)
dump_model = parsed_model.model_dump()
```

**Pydantic v2 - tuna:** import + execution time: 0.053s *(profile screenshot not preserved)*

#### Second test

This test uses only Pydantic + timeit to measure execution time when a cold start is not happening. We know cold starts are important when working with AWS Lambda, but we can see that even after the cold start (first interaction), v1 remains faster than v2 for simple operations.

**Pydantic v1 - code**

```python
import timeit

import pydantic

import_module = "import pydantic"
code = '''
import datetime
from typing import List

class Role(pydantic.BaseModel):
    role_name: str
    date_start: datetime.date
    date_end: datetime.date

class Employees(pydantic.BaseModel):
    name: str
    roles: List[Role]

class Company(pydantic.BaseModel):
    name: str
    address: str
    employees: List[Employees]

parsed_model = Company.parse_obj(event)
dump_model = parsed_model.dict()
'''

if __name__ == "__main__":
    # `event` (the sample payload above) must be defined in this file;
    # globals=globals() makes it visible inside the timed snippet.
    print("Pydantic version ---->", pydantic.__version__)
    print("Executions ----->", timeit.repeat(stmt=code, setup=import_module, repeat=20, number=1000, globals=globals()))
```

**timeit Pydantic v1 - results**

```
(venv) ➜ model-with-union python timeit_pydantic.py
Pydantic version ----> 1.10.16
Executions -----> [0.3602069579064846, 0.3270717919804156, 0.3260737080127001, 0.32723504211753607, 0.32821470801718533, 0.3474197092000395, 0.331315791932866, 0.3305481248535216, 0.34439937490969896, 0.38623699988238513, 0.3627227919641882, 0.41134462505578995, 0.34216895792633295, 0.3431239160709083, 0.331632292130962, 0.32780124992132187, 0.3368915000464767, 0.33176445798017085, 0.32961120805703104, 0.32948400010354817]
```

**Pydantic v2 - code**

```python
import timeit

import pydantic

import_module = "import pydantic"
code = '''
import datetime
from typing import List

class Role(pydantic.BaseModel):
    role_name: str
    date_start: datetime.date
    date_end: datetime.date

class Employees(pydantic.BaseModel):
    name: str
    roles: List[Role]

class Company(pydantic.BaseModel):
    name: str
    address: str
    employees: List[Employees]

parsed_model = Company.model_validate(event)
dump_model = parsed_model.model_dump()
'''

if __name__ == "__main__":
    # `event` (the sample payload above) must be defined in this file;
    # globals=globals() makes it visible inside the timed snippet.
    print("Pydantic version ---->", pydantic.__version__)
    print("Executions ----->", timeit.repeat(stmt=code, setup=import_module, repeat=20, number=1000, globals=globals()))
```

**timeit Pydantic v2 - results**

```
(venv) ➜ model-with-union python timeit_pydantic.py
Pydantic version ----> 2.7.4
Executions -----> [0.7977909999899566, 0.7425271249376237, 0.7596947909332812, 0.7473582909442484, 0.7580573339946568, 0.766909166937694, 0.7376072078477591, 0.7410291249398142, 0.7420287081040442, 0.7721679999958724, 0.7524414169602096, 0.7514087499585003, 0.7521860001143068, 0.7402477920986712, 0.745006832992658, 0.7440242499578744, 0.7420089999213815, 0.7393557080067694, 0.7388893750030547, 0.7491392500232905]
```

#### Third test

Using pyperf to run a benchmark and check minimum, median, mean, and maximum values.

**Pydantic v1 - code**

```python
import pyperf

runner = pyperf.Runner()

import_module = "import pydantic"
code = '''
import datetime
from typing import List

class Role(pydantic.BaseModel):
    role_name: str
    date_start: datetime.date
    date_end: datetime.date

class Employees(pydantic.BaseModel):
    name: str
    roles: List[Role]

class Company(pydantic.BaseModel):
    name: str
    address: str
    employees: List[Employees]

parsed_model = Company.parse_obj(event)
dump_model = parsed_model.dict()
'''

# `event` (the sample payload above) must be defined in this file.
runner.timeit(name="Pydantic test",
              stmt=code,
              setup=import_module)
```

**Pyperf pydantic v1 - result:** *(screenshot not preserved)*

**Pydantic v2 - code**

```python
import pyperf

runner = pyperf.Runner()

import_module = "import pydantic"
code = '''
import datetime
from typing import List

class Role(pydantic.BaseModel):
    role_name: str
    date_start: datetime.date
    date_end: datetime.date

class Employees(pydantic.BaseModel):
    name: str
    roles: List[Role]

class Company(pydantic.BaseModel):
    name: str
    address: str
    employees: List[Employees]

parsed_model = Company.model_validate(event)
dump_model = parsed_model.model_dump()
'''

# `event` (the sample payload above) must be defined in this file.
runner.timeit(name="Pydantic test",
              stmt=code,
              setup=import_module)
```

**Pyperf pydantic v2 - result:** *(screenshot not preserved)*

#### Fourth test

Using this project from @samuelcolvin, the creator of Pydantic, we get better performance in Pydantic v2 compared to v1. However, that usage scenario is not representative of how we use Pydantic in our application: we do not typically validate 100k+ records in our utilities. Our use case is more modest, around 10k+ records when customers use the BatchProcessor and Parser, and even that is considered an edge case. When running the project with 10 to 20 records, I don't observe any performance difference.

I hope to continue receiving feedback from the community about this issue, but from my perspective, there is not much we can do on the Powertools side.
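Since the pyperf screenshots didn't survive, comparable summary statistics (minimum, median, mean, maximum) can be recomputed with the stdlib from the timeit runs quoted in the second test:

```python
import statistics

# timeit results quoted in the second test (seconds per 1000 executions)
v1 = [0.3602069579064846, 0.3270717919804156, 0.3260737080127001, 0.32723504211753607,
      0.32821470801718533, 0.3474197092000395, 0.331315791932866, 0.3305481248535216,
      0.34439937490969896, 0.38623699988238513, 0.3627227919641882, 0.41134462505578995,
      0.34216895792633295, 0.3431239160709083, 0.331632292130962, 0.32780124992132187,
      0.3368915000464767, 0.33176445798017085, 0.32961120805703104, 0.32948400010354817]
v2 = [0.7977909999899566, 0.7425271249376237, 0.7596947909332812, 0.7473582909442484,
      0.7580573339946568, 0.766909166937694, 0.7376072078477591, 0.7410291249398142,
      0.7420287081040442, 0.7721679999958724, 0.7524414169602096, 0.7514087499585003,
      0.7521860001143068, 0.7402477920986712, 0.745006832992658, 0.7440242499578744,
      0.7420089999213815, 0.7393557080067694, 0.7388893750030547, 0.7491392500232905]

for label, runs in (("v1", v1), ("v2", v2)):
    print(f"{label}: min={min(runs):.3f} median={statistics.median(runs):.3f} "
          f"mean={statistics.mean(runs):.3f} max={max(runs):.3f}")
```

The mean goes from roughly 0.34s to 0.75s per 1000 executions, about a 2.2x slowdown, consistent with the cold-start tuna numbers from the first test.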
@dajmeister Not a solution, but at least you'll know that they are working on it:
I'm closing this issue because there is already a discussion going on in the Pydantic repository. I'm watching the Pydantic thread, and as soon as we see any progress on Pydantic's performance issues, we'll let customers know in our release notes. Pydantic thread: pydantic/pydantic#6748
### Expected Behaviour

The `enable_swagger` method can be used without such a large impact on initialization time.

### Current Behaviour

When running the `APIGatewayRestResolver.enable_swagger` function, an import of the `aws_lambda_powertools.event_handler.middlewares.openapi_validation` module is triggered. This import runs for ~300ms (when using Pydantic 2). I suspect this is due to the `model_rebuild()` calls executed at the end of the `aws_lambda_powertools.event_handler.openapi.models` module.

Attachment: `profile.log`
### Code snippet

### Possible Solution

No response

### Steps to Reproduce

1. Put this code in a Python file;
2. Install the tuna package using `pip install tuna`;
3. Run the command;
4. Observe the ~300ms runtime of the import of `aws_lambda_powertools.event_handler.middlewares.openapi_validation`.
### Powertools for AWS Lambda (Python) version

latest

### AWS Lambda function runtime

3.11

### Packaging format used

PyPi

### Debugging logs

No response
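For completeness, here is a minimal reproduction along the lines the report describes. The route path is hypothetical, and the import is guarded so the sketch degrades gracefully when aws-lambda-powertools or pydantic is not installed:

```python
import importlib.util

# Only attempt the repro when both dependencies are importable.
deps_available = all(
    importlib.util.find_spec(m) is not None
    for m in ("aws_lambda_powertools", "pydantic")
)

if deps_available:
    from aws_lambda_powertools.event_handler import APIGatewayRestResolver

    app = APIGatewayRestResolver()
    # Per the report, this call triggers the slow import of
    # aws_lambda_powertools.event_handler.middlewares.openapi_validation
    # (~300ms under Pydantic 2).
    app.enable_swagger(path="/swagger")
    repro_ran = True
else:
    repro_ran = False  # dependencies not installed in this environment

print("repro executed:", repro_ran)
```

Running it under `python -X importtime` (or tuna) should show the time concentrated in `aws_lambda_powertools.event_handler.openapi.models`, as the report describes.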