BUG: read_json chunksize parameter does not work #41905

Closed
simonm3 opened this issue Jun 9, 2021 · 4 comments
Labels: Bug, Needs Triage (issue that has not been reviewed by a pandas team member)

simonm3 commented Jun 9, 2021

If I set chunksize=100 without nrows, it crashes with a memory error. If I set both chunksize and nrows, it returns whichever of the two row counts is larger. The workaround of setting nrows to a high number does not work either: it still crashes with a memory error, even with a low chunksize.

I am using Ubuntu 20 on WSL2/Windows 10, Python 3.8, pandas 1.2.4.

This has been reported in other closed issues but does not seem to be fixed.

#34548
#36791

simonm3 added the Bug and Needs Triage labels Jun 9, 2021

aacosta13 commented

Is there any way you could post the code and/or dataset that you attempted to run with? I just tested it myself using the same Yelp dataset mentioned in one of the comments on the issue you linked (#36791), and chunksize seemed to work with values of both 100 and 1000.

simonm3 commented Jun 10, 2021 via email

simonm3 commented Jun 10, 2021

This is the code:

import pandas as pd
#path = r"C:\Users\simon\Downloads\raw\persons-with-significant-control-snapshot-2021-06-05.txt"
path = r"C:\Users\simon\Downloads\raw\psc-snapshot-2021-06-10_1of19.txt"
df = pd.read_json(path, lines=True, chunksize=10)
df.read()

simonm3 commented Jun 10, 2021

Oh. I thought you called read() to read a single chunk, but what read_json returns with chunksize set is actually an iterator. Sorry, my mistake!
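
For anyone landing here later, a minimal sketch of the iterator usage, assuming pandas 1.2 or later (where the reader can be used in a `with` block). The sample file, its columns, and the variable names are made up for illustration and stand in for the psc-snapshot file above. My understanding, which I have not checked against every pandas version, is that calling read() on the reader concatenates all chunks into one DataFrame, which would explain the memory error on a large file.

```python
import json
import os
import tempfile

import pandas as pd

# Hypothetical sample data: 25 records written as JSON Lines,
# one JSON object per line, as lines=True expects.
records = [{"id": i, "name": f"person_{i}"} for i in range(25)]

chunk_sizes = []
total_rows = 0

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "sample.txt")
    with open(path, "w") as f:
        for rec in records:
            f.write(json.dumps(rec) + "\n")

    # With chunksize set, read_json returns a reader object rather than a
    # DataFrame. Iterating over it yields one DataFrame per chunk, so only
    # about chunksize rows need to be held in memory at a time.
    with pd.read_json(path, lines=True, chunksize=10) as reader:
        for chunk in reader:
            chunk_sizes.append(len(chunk))
            total_rows += len(chunk)

print(chunk_sizes)  # [10, 10, 5]
print(total_rows)   # 25
```

In real use, the loop body would process or aggregate each chunk and then discard it, instead of collecting every chunk into a list.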

simonm3 closed this as completed Jun 10, 2021