BUG: read_json chunksize parameter does not work #41905
Is there any way you could post the code and/or dataset that you attempted to run with? Just tested it myself using the same Yelp dataset as one of the comments from the issue (#36791) you linked, and it seems that chunksize worked with both a value of 100 and 1000.
This is the link:
http://download.companieshouse.gov.uk/en_pscdata.html
Below is an attempt to read with chunksize=10 on the full 6 GB file. I tried this on both Windows and WSL/Ubuntu.
For testing, try one of the smaller files from the link. That one fits in memory, but it still takes a very long time and returns 500K rows.
[image: image.png]
Oh, I thought you called read to get a chunk, but it is actually an iterator. Sorry, my mistake!
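To restate the resolution above: with chunksize set, read_json does not return a DataFrame but a JsonReader iterator, so a single chunk is pulled with next() rather than by calling a read method. A minimal sketch, using a small hypothetical in-memory JSON-lines sample in place of the real file:

```python
import pandas as pd
from io import StringIO

# Hypothetical JSON-lines sample (1000 rows) standing in for a real file.
lines = "\n".join('{"id": %d}' % i for i in range(1000))

# Without chunksize, read_json returns the full DataFrame at once.
df = pd.read_json(StringIO(lines), lines=True)
print(len(df))  # 1000

# With chunksize, it returns a JsonReader iterator; next() pulls just one
# chunk instead of materializing the whole file.
reader = pd.read_json(StringIO(lines), lines=True, chunksize=10)
first = next(reader)
print(len(first))  # 10
```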
If I set chunksize=100 without nrows, it crashes with a memory error. If I set both chunksize and nrows, it returns rows up to the larger of the two values. The workaround of setting nrows to a high number does not help: it still crashes with a memory error even with a low chunksize.
I am using Ubuntu 20 on WSL2/Windows 10, python=3.8, pandas=1.2.4.
This has been reported in other, now-closed issues but does not seem to be fixed:
#34548
#36791
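For reference, this is how chunked reading is documented to behave: pd.read_json with chunksize (which requires lines=True) yields one DataFrame per chunk. A minimal sketch, assuming a small hypothetical in-memory sample in place of the 6 GB PSC file:

```python
import pandas as pd
from io import StringIO

# Hypothetical JSON-lines sample standing in for the PSC data dump.
sample = "\n".join('{"company_number": "%07d"}' % i for i in range(10))

# chunksize requires lines=True; iterating yields one DataFrame per chunk.
reader = pd.read_json(StringIO(sample), lines=True, chunksize=4)
sizes = [len(chunk) for chunk in reader]
print(sizes)  # chunk sizes for 10 rows at chunksize=4: [4, 4, 2]
```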