Skip to content

Pre-compile all python code and remove original .py files. #476

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 10 commits into from
Apr 22, 2024

Conversation

purple4reina
Copy link
Contributor

What does this PR do?

Ships python layer with only pre-built bytecode binaries and none of the original *.py files.

Motivation

For python 3.12, reduces cold start by >24%. Also reduces memory overhead and runtime duration by small but statistically significant amounts.

Screenshot 2024-04-17 at 3 15 40 PM

Similar overhead improvements are seen with python 3.9.

Screenshot 2024-04-17 at 2 55 32 PM

Additional Notes

Impact of prebuilding bytecode binaries

When the python interpreter starts up, it always first looks for any *.pyc files and if found, loads those instead of the *.py files. If it is loading any *.py files, it will first compile those into bytecode and save to disk so that the next call is faster. See flow chart:

flow

Note that these *.pyc files are not actually "compiled" code, because python is still an interpreted language. They are also not platform dependent, meaning we can build these bytecode files anywhere and expect them to work everywhere.

These bytecode files do not change the running/execution of the python code. The code will still run exactly the same and at the same speed. What changes instead is the time it takes to load the instructions, prior to them being executed.

Impact of removing *.py files

In addition to pre-compiling all *.pyc files, this PR also removes all *.py files. As can be seen from flow chart and discussion above, this will make no change to the execution of the code.

What is potentially lost is the ability for the interpreter to add proper line numbers and code samples to tracebacks. TODO: confirm this and add example here.

Impact of compiling at "optimization level" 2

In addition to pre-compiling bytecode files, this PR also builds those files using "optimization level" 2 by including -o 2 in the commandline args.

From the python docs

-O
Remove assert statements and any code conditional on the value of debug. Augment the filename for compiled (bytecode) files by adding .opt-1 before the .pyc extension (see PEP 488). See also PYTHONOPTIMIZE.

Changed in version 3.5: Modify .pyc filenames according to PEP 488.

-OO
Do -O and also discard docstrings. Augment the filename for compiled (bytecode) files by adding .opt-2 before the .pyc extension (see PEP 488).

Changed in version 3.5: Modify .pyc filenames according to PEP 488.

Assert statements. On April 18, 2024 I attended the python guild meeting and discussed this change with the team. They said that ddtrace should not require execution of assert statements in order to run properly. If there is a failure due to the remove of the assert statements, then this is considered a bug and should be filed with them directly. I ran several of the ddtrace unittest suites using python -OO -m pytest and found that the same tests failed using this as did when I ran the tests using python -m pytest.

__debug__. The -o 2 flag also removes all code conditional on __debug__. I did a grep of ddtrace and found no instances of this variable in the code base. I also did a grep for it in all repos I currently have checked out on my laptop and did not find any instances which would affect us.

Docstrings. It is safe to remove the docstrings. The only time the customer would care to use them is in a python shell where they can run help(my_func) which will then print the docstring to the screen. This is clearly not necessary for the running of any lambda function. Removing them also reduces the size of the layer binary by a noticeable amount.

Impact of compiling with env var PYTHONNODEBUGRANGES=1

This env var is only available in python 3.11 and newer. See python docs:

PYTHONNODEBUGRANGES
If this variable is set, it disables the inclusion of the tables mapping extra location information (end line, start column offset and end column offset) to every instruction in code objects. This is useful when smaller code objects and pyc files are desired as well as suppressing the extra visual location indicators when the interpreter displays tracebacks.

New in version 3.11.

Adding this env var leads to a noticeable reduction in cold start time (approx 0.5-1%) for python 3.12. I believe what is lost by including this is the column information displayed in any traceback. As these are not required, the trade off here (debug info for cold start improvement) I believe is worth it.

Testing Guidelines

Types of Changes

  • Bug fix
  • New feature
  • Breaking change
  • Misc (docs, refactoring, dependency upgrade, etc.)

Check all that apply

  • This PR's description is comprehensive
  • This PR contains breaking changes that are documented in the description
  • This PR introduces new APIs or parameters that are documented and unlikely to change in the foreseeable future
  • This PR impacts documentation, and it has been updated (or a ticket has been logged)
  • This PR's changes are covered by the automated tests
  • This PR collects user input/sensitive content into Datadog
  • This PR passes the integration tests (ask a Datadog member to run the tests)

https://datadoghq.atlassian.net/browse/SVLS-4713

@purple4reina
Copy link
Contributor Author

Errors

We can see what is lost because we've removed the *.py files. Line numbers are still present in the tracebacks. What is lost is the line of code itself. I would argue that these are unnecessary and losing them is worth the cost.

Chances are errors are not going to be caused by our code and thus would just be a distraction to helping the user find the real origin of their error. Also, if they really want to see the line of code, it's all open sourced.

Raw

Before
Screenshot 2024-04-18 at 1 50 24 PM

After
Screenshot 2024-04-18 at 1 51 45 PM

Pretty

Before
Screenshot 2024-04-18 at 1 50 30 PM

After
Screenshot 2024-04-18 at 1 51 51 PM

@purple4reina purple4reina marked this pull request as ready for review April 18, 2024 21:00
@purple4reina purple4reina requested a review from a team as a code owner April 18, 2024 21:01
Dockerfile Outdated
RUN rm ./python/lib/$runtime/site-packages/ddtrace/appsec/_iast/_taint_tracking/*.so
RUN rm ./python/lib/$runtime/site-packages/ddtrace/appsec/_iast/_stacktrace*.so
RUN rm ./python/lib/$runtime/site-packages/ddtrace/internal/datadog/profiling/libdd_wrapper.so
RUN rm ./python/lib/$runtime/site-packages/ddtrace/internal/datadog/profiling/ddup/_ddup.*.so
RUN rm ./python/lib/$runtime/site-packages/ddtrace/internal/datadog/profiling/stack_v2/_stack_v2.*.so

RUN PYTHONNODEBUGRANGES=1 python -m compileall -o 2 -b ./python/lib/$runtime/site-packages
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we add comments just to remind what's happening here?

Copy link
Contributor

@duncanista duncanista left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM – would like to see this in self monitoring for a while before we release, it might be amazing to see this changes go there!

@purple4reina purple4reina force-pushed the rey.abolofia/compileall branch 2 times, most recently from 8a73394 to df7bad0 Compare April 19, 2024 18:49
MAX_LAYER_COMPRESSED_SIZE_KB=$(expr 4 \* 1024)
MAX_LAYER_UNCOMPRESSED_SIZE_KB=$(expr 13 \* 1024)
MAX_LAYER_COMPRESSED_SIZE_KB=$(expr 5 \* 1024)
MAX_LAYER_UNCOMPRESSED_SIZE_KB=$(expr 12 \* 1024)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bytecode is harder to compress (hence increase from 4->5) but uncompressed is much smaller (hence decrease from 13->12)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good! Could we update this changes in serverless-tools once this is merged? Thanks!

@purple4reina purple4reina force-pushed the rey.abolofia/compileall branch from 583f37e to 582e612 Compare April 22, 2024 17:57
@purple4reina purple4reina force-pushed the rey.abolofia/compileall branch from 60a64f5 to 43947e9 Compare April 22, 2024 17:58
@purple4reina purple4reina merged commit 21012cb into main Apr 22, 2024
51 checks passed
@purple4reina purple4reina deleted the rey.abolofia/compileall branch April 22, 2024 18:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants