-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
BUG: read_json does not respect chunksize #38293
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…son chunksize.
Thanks. Mind adding a unit test and whatsnew entry for version 1.2 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah ideally this change should have broken a test (obviously) we don't have one. pls add a test which breaks on master and this change fixes (look in the original issues, where we want to replicate)
I've added a test and addressed the comments. |
can you also post the benchmarks for line read json (should show that this improves vs master) |
I don't expect it to improve raw performance, it's about memory usage for large files. |
we have memory benchmarks to measure this |
@robertwb if you can post the memory benchmarks |
the output is maybe we need to change the benchmark to see a bigger effect.
|
so how do we know that this is working (and the previous was not), sure this is a difference, but wouldn't you expect a HUGE diffrence here? (e.g. maybe the benchmark should be reading with chunksize=1 or something) vs no chunksize. |
we do a concat of the chunks in the benchmark. maybe could just iterate through the chunks without the concat. am currently running with chunksize=1 to compare. |
oh yeah the concat will increase memory :-> don't compare that |
changing the chunksize to 10 (1 fails) make a bit more difference (to the memory used on master!)
|
presumably we do the concat to compare the results with peakmem_read_json_lines
|
ok fair. @simonjayhawkins wouldn't object to changing the benchmarks (but this PR is fine). |
thanks @robertwb |
@meeseeksdev backport 1.2.x |
Co-authored-by: Robert Bradshaw <[email protected]>
black pandas
git diff upstream/master -u -- "*.py" | flake8 --diff