Add nrows to read json. #33962
Conversation
…d json for read_json pandas-dev#33916
Force-pushed from 569db1c to fc4993f
Hey @jreback
Force-pushed from f72ea60 to 896de23
Force-pushed from 896de23 to ca9c3e0
Co-authored-by: William Ayd <[email protected]>
Force-pushed from 40a5c0a to a0a55c9
Force-pushed from a0a55c9 to 74e9c2b
Force-pushed from d386e49 to ecdbc10
Hey @jreback, hope you are having a great day. This PR adds the ability to select the number of rows returned when the lines parameter is set. According to the benchmarks, I believe the feature is working fine. Could we merge this feature and create a new issue for the concerns we have with the chunksize feature? I have got a grip on this section of the codebase, so I should be able to do that too, but I believe they should be separate PRs. Thanks and regards
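For context, a rough sketch of the idea behind the feature, assuming only that nrows applies to the lines=True path: pull the first nrows lines from the file before parsing, so the rest of the document is never read or converted. The helper name below is hypothetical and is not the PR's actual code.

```python
# Hypothetical helper illustrating the idea; not the PR's actual implementation.
import itertools
from io import StringIO

import pandas as pd


def read_json_lines(path, nrows=None):
    """Read a line-delimited JSON file, optionally stopping after nrows lines."""
    with open(path) as handle:
        if nrows is None:
            lines = handle.readlines()
        else:
            # islice stops after nrows lines, so the remainder is never read
            lines = list(itertools.islice(handle, nrows))
    return pd.read_json(StringIO("".join(lines)), lines=True)
```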
@hasnain2808 yep that sounds fine. please add a whatsnew (about the nrows feature) and create an issue about chunksize & perf, which can certainly be a follow-up. ping on green (for the added whatsnew); put it in other enhancements in 1.1
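For reference, chunksize is the existing, separate option that the follow-up issue would cover: with lines=True it returns an iterator of DataFrames instead of a single truncated frame. A minimal sketch of that usage (the file path is a placeholder):

```python
import pandas as pd

# chunksize with lines=True returns an iterator of DataFrames;
# "data.json" is a placeholder path for illustration.
reader = pd.read_json("data.json", lines=True, chunksize=1000)
for chunk in reader:
    print(chunk.shape)  # each chunk holds at most 1000 rows
```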
…d-nrows-to-read-json issues with merging
Hey @jreback
@hasnain2808 looks like another merge conflict - can you fix up? So the benchmarks did show an improvement, right? Not entirely clear from the link that you sent; usually at the end of the run you should get something saying PERFORMANCE IMPROVED. I might be overlooking that in what you shared. As long as that shows an improvement, this lgtm.
Hope you're having a nice day. I will fix the merge conflicts. Yes, the memory consumption and time required are lower when we use the nrows parameter. As the nrows parameter is optional, we added a new benchmark for it; hence the benchmark results do not show the PERFORMANCE IMPROVED verdict. I pasted the results into the spreadsheet to compare them easily; the raw dumps are here. Please do tell me if there is anything else that needs to be done.
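To illustrate the point about the missing verdict: pandas benchmarks run under asv, and a benchmark that is new in the PR has no baseline on master, so asv cannot report PERFORMANCE IMPROVED for it. A sketch of what such a benchmark could look like (the class name, file name, and sizes are illustrative, not necessarily what the PR adds):

```python
import os

import numpy as np
import pandas as pd


class ReadJSONLinesNRows:
    # asv calls setup() before timing and teardown() afterwards
    def setup(self):
        self.fname = "__test_lines__.json"
        df = pd.DataFrame(np.random.randn(100_000, 5), columns=list("abcde"))
        df.to_json(self.fname, orient="records", lines=True)

    def time_read_json_lines_nrows(self):
        # only the first 1000 records should be parsed
        pd.read_json(self.fname, lines=True, nrows=1000)

    def teardown(self):
        os.remove(self.fname)
```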
…d-nrows-to-read-json "solve merge conflicts while merging to master"
Fixed the merge conflicts.
Force-pushed from e9c54fe to 2ce74db
I do not think this failure is related. The benchmark is failing by around 0.64%.
…d-nrows-to-read-json merge conflicts
Force-pushed from 18bf6e7 to 133aef9
ping
…d-nrows-to-read-json "get commits for solved benchmark issues"
Ping
thanks @hasnain2808
Thanks @jreback
black pandas
git diff upstream/master -u -- "*.py" | flake8 --diff
Add an nrows parameter to read_json that returns only the requested number of rows from a line-delimited JSON document.
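A minimal usage example of the parameter described above (the inline data is illustrative; nrows only applies when lines=True):

```python
from io import StringIO

import pandas as pd

data = StringIO('{"a": 1, "b": 2}\n{"a": 3, "b": 4}\n{"a": 5, "b": 6}\n')

# Only the first two records are read and parsed
df = pd.read_json(data, lines=True, nrows=2)
print(df)
#    a  b
# 0  1  2
# 1  3  4
```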