BUG: Memory leak in json encoding for time related objects #40443
Comments
cc @WillAyd |
Wow thanks for the details - very nice debugging. At an initial glance I think you are correct. We might need to make those date time functions look similar to pandas/pandas/_libs/src/ujson/python/objToJSON.c Line 1989 in 7af47c9
|
Thanks for the reply. I think there is still an issue there. All the function linked to the As an example, pandas/pandas/_libs/src/ujson/python/objToJSON.c Lines 1560 to 1561 in 7af47c9
pandas/pandas/_libs/src/ujson/python/objToJSON.c Lines 321 to 326 in 3b4aec2
As specified by the cpython documentation:
On the contrary for the function May be, we could use a |
Oh sorry, now that I have read the code one more time, I understand why using the cStr struct member would be a good solution. Indeed. |
Hi, I'm gonna piggyback on this issue a bit to mention that I'm pretty sure it's not just time related objects. See my comment on a closed issue from quite a while ago (I wasn't sure whether to create a new ticket): #24889 (comment) There I used a DF which didn't have any time data, just numeric columns and the
|
@mikicz: I ran your small test script under Valgrind, and it appears the memory leak comes from numpy.
Clearly, the two issues do not have the same cause. |
Thanks for doing that! Do you think I should report this to numpy directly or make a new issue here? I don't really understand pandas internals, so am not sure if the issue is in numpy directly or in the usage of numpy here? |
I know I'm coming late to the party, but I would like to recommend memray. I would never have been able to detect this one (python-lz4/python-lz4#247) without it. |
Hello, indeed I have not done much about this issue at the moment. It is not always easy to find time 😅. The profiler memray looks very nice. |
Hello,
While using pandas in my project, I saw the memory usage of my process rising. After some digging, it looks like there is a memory leak in the JSON encoding code.
This simple test should be able to reproduce the issue:
Running it under Valgrind should show this kind of result:
This points to the int64ToIso() function in this case, but nearly every function used in the getStringValue() function allocates memory, and that memory does not appear to be freed afterwards (unless I'm missing something).
pandas/pandas/_libs/src/ujson/lib/ultrajsonenc.c Line 1082 in 2d51ebb
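As a rough cross-check alongside Valgrind, per-call growth can also be measured from Python with the standard library's tracemalloc module. The sketch below is not the original test script: traced_growth, leaky_encode and clean_encode are hypothetical names, and the two encoders are stand-ins rather than pandas functions. Note that tracemalloc only sees allocations made through Python's own memory allocators, so it may miss memory obtained with a raw malloc() in a C extension.

```python
import tracemalloc


def traced_growth(func, iterations=1000):
    """Return the net change in traced memory (bytes) after calling func repeatedly."""
    tracemalloc.start()
    try:
        func()  # warm-up call so one-time allocations are not counted as growth
        before, _ = tracemalloc.get_traced_memory()
        for _ in range(iterations):
            func()
        after, _ = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return after - before


# Hypothetical stand-ins (assumptions for illustration, not pandas code).
_retained = []


def leaky_encode():
    # Simulates an encoder that allocates a buffer per call and never releases it.
    _retained.append(bytearray(1024))


def clean_encode():
    # Allocates the same buffer, but it is released when the function returns.
    buf = bytearray(1024)
    return len(buf)


if __name__ == "__main__":
    print("leaky growth (bytes):", traced_growth(leaky_encode))
    print("clean growth (bytes):", traced_growth(clean_encode))
```

Assuming pandas is installed, the same helper could be pointed at a real call, for example traced_growth(lambda: df.to_json()) for some DataFrame df. Whether it exposes this particular leak depends on which allocator the C code uses.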
It would be great if someone can confirm my deduction. If I'm right, I will try to submit a PR.
Thanks.
The environment I used: