
Fix memory leak with ujson module #49466


Merged: mroeschke merged 4 commits into pandas-dev:main from WillAyd:ujson-state-cleanup on Nov 7, 2022


@WillAyd (Member) commented Nov 2, 2022

This is mostly vendored from upstream ujson:

https://github.com/ultrajson/ultrajson/blob/main/python/ujson.c

Using a static PyObject the way we did before is strongly discouraged. Keeping that state in the module object instead is more verbose but should be more correct, although, interestingly, it pushes us back to legacy (single-phase) module initialization instead of PEP 489 multi-phase initialization.

@WillAyd (Member, Author) commented Nov 2, 2022

@JelleZijlstra, if you are able to confirm that this patch fixes your issue, that would be appreciated.

@mroeschke added the labels Performance (Memory or execution speed performance) and IO JSON (read_json, to_json, json_normalize) on Nov 2, 2022
@akx (Contributor) commented Nov 4, 2022

As an aside, has pandas considered https://github.com/ijl/orjson (a Rust-based JSON (de)serializer)?

@mroeschke (Member) left a comment


Thanks @WillAyd. Do you think it's worth adding an ASV peakmem_* benchmark, or are we sufficiently covered?

@WillAyd (Member, Author) commented Nov 4, 2022

It's a pretty small memory leak; I don't think it would even register on a peakmem ASV benchmark. Let's see whether it helps @JelleZijlstra and go from there.

@JelleZijlstra left a comment


Thanks, I can confirm this removes the leak. I left a few comments on the code.

@mroeschke, this memory leak only appears when pandas is unloaded from a running process, so it is unlikely to affect many current use cases.

@mroeschke added this to the 2.0 milestone on Nov 7, 2022
@mroeschke (Member) commented

Looks good, @WillAyd. It would be good to have a whatsnew note for this; then LGTM.

@mroeschke merged commit ef23fc7 into pandas-dev:main on Nov 7, 2022
@mroeschke (Member) commented

Thanks @WillAyd

@WillAyd deleted the ujson-state-cleanup branch on November 8, 2022 16:23
phofl pushed a commit to phofl/pandas that referenced this pull request on Nov 9, 2022:
* Fix memory leak with ujson module

* fixups

* Whatsnew
noatamir pushed a commit to noatamir/pandas that referenced this pull request on Nov 9, 2022:
* Fix memory leak with ujson module

* fixups

* Whatsnew
@rhshadrach (Member) commented

This patch may have induced a potential regression. Please check the links below. If any ASVs are parameterized, the parameter combinations for which a regression has been detected appear as sub-bullets. This is a partially automated message.


Successfully merging this pull request may close these issues.

BUG: Reference leak in pandas/_libs/src/ujson/python/objToJSON.c
5 participants