Skip to content

Fix Memory Leak in to_json with Numeric Values #26239

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Apr 30, 2019

Conversation

WillAyd
Copy link
Member

@WillAyd WillAyd commented Apr 29, 2019

It looks like the extension module is unnecessarily incrementing the reference count for numeric objects and never releasing it, which causes a leak in to_json.

Still trying to grok exactly what the purpose of npyCtxtPassthru (comment in same file mentions it is required when encoding multi-dimensional arrays).

Removing the check for PyArray_ISDATETIME caused segfaults but that doesn't appear to leak memory anyway so most likely intentional to increment that ref count.

ASV results:

       before           after         ratio
     [9feb3ad9]       [9bfd45d3]
     <master>         <json-mem-fix~1>
-           69.1M            57.7M     0.84  io.json.ToJSONMem.peakmem_float
-           69.1M            57.7M     0.83  io.json.ToJSONMem.peakmem_int

SOME BENCHMARKS HAVE CHANGED SIGNIFICANTLY.

@WillAyd WillAyd added the IO JSON read_json, to_json, json_normalize label Apr 29, 2019
@codecov
Copy link

codecov bot commented Apr 29, 2019

Codecov Report

Merging #26239 into master will decrease coverage by <.01%.
The diff coverage is n/a.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #26239      +/-   ##
==========================================
- Coverage   91.97%   91.96%   -0.01%     
==========================================
  Files         175      175              
  Lines       52368    52368              
==========================================
- Hits        48164    48160       -4     
- Misses       4204     4208       +4
Flag Coverage Δ
#multiple 90.52% <ø> (ø) ⬆️
#single 40.69% <ø> (-0.15%) ⬇️
Impacted Files Coverage Δ
pandas/io/gbq.py 78.94% <0%> (-10.53%) ⬇️
pandas/core/frame.py 96.9% <0%> (-0.12%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 9feb3ad...24e1a23. Read the comment docs.

1 similar comment
@codecov
Copy link

codecov bot commented Apr 29, 2019

Codecov Report

Merging #26239 into master will decrease coverage by <.01%.
The diff coverage is n/a.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #26239      +/-   ##
==========================================
- Coverage   91.97%   91.96%   -0.01%     
==========================================
  Files         175      175              
  Lines       52368    52368              
==========================================
- Hits        48164    48160       -4     
- Misses       4204     4208       +4
Flag Coverage Δ
#multiple 90.52% <ø> (ø) ⬆️
#single 40.69% <ø> (-0.15%) ⬇️
Impacted Files Coverage Δ
pandas/io/gbq.py 78.94% <0%> (-10.53%) ⬇️
pandas/core/frame.py 96.9% <0%> (-0.12%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 9feb3ad...24e1a23. Read the comment docs.

@jreback jreback added this to the 0.25.0 milestone Apr 30, 2019
@jreback jreback merged commit 5cb006f into pandas-dev:master Apr 30, 2019
@jreback
Copy link
Contributor

jreback commented Apr 30, 2019

thanks @WillAyd

@WillAyd WillAyd deleted the json-mem-fix branch January 16, 2020 00:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
IO JSON read_json, to_json, json_normalize
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Memory leak in df.to_json
2 participants