memory error when skipping rows #8681
Comments
if you specify … That said, it still could be a bug. Would appreciate you having a deeper look if you can.
cc @mdmueller
Hi,
I know that skiprows can be an array of ints or a single int. I think that if it's simply an int, we should use a more efficient code path to skip the rows instead of generating a 100M-element array.
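The more efficient path suggested above can be sketched as follows: when the count is a single int, consume that many lines from the file handle lazily before handing it to pandas, so no list is ever materialized. This is a hypothetical helper for illustration, not the actual pandas internals:

```python
import io
import pandas as pd

def read_csv_skip_n(buf, n, **kwargs):
    """Sketch of an O(1)-memory alternative to skiprows=list(range(n)):
    consume the first n lines lazily, then let pandas read the rest."""
    for _ in range(n):
        next(buf)  # discard one line; nothing is stored
    return pd.read_csv(buf, **kwargs)

# Small demo: 10 data lines, skip the first 5.
raw = "\n".join(f"{i},{i * 2}" for i in range(10))
df = read_csv_skip_n(io.StringIO(raw), 5, header=None, names=["a", "b"])
print(len(df))  # 5
```

pandas reads from the handle's current position, so the skipped lines never touch memory beyond one line buffer.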
yep, that sounds right, have a go!
Do you have any suggestions on how best to tackle this issue?
so currently the skipped rows are transformed to a list of rows with range (this is why this blows memory up). You could instead change the impl a bit to pass a list if a list was originally passed (e.g. …). You prob need to change the c-impl as well (so maybe make the original …). So a bit involved, but would be a nice change.
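The change described above amounts to keeping two representations and normalizing them into a cheap membership check instead of a materialized list. A minimal sketch, with hypothetical names rather than pandas' real internals:

```python
def normalize_skiprows(skiprows):
    """Return an O(1)-memory predicate deciding whether to skip a row.

    An int means 'skip the first n rows'; a collection means 'skip
    exactly these row indices'. (Hypothetical sketch, not pandas code.)
    """
    if isinstance(skiprows, int):
        n = skiprows
        return lambda row: row < n    # no range() list is ever built
    skip_set = set(skiprows)          # set membership is O(1) per row
    return lambda row: row in skip_set

# Int form: skip the first 100M rows without allocating anything.
should_skip_first = normalize_skiprows(100_000_000)
print(should_skip_first(5), should_skip_first(100_000_000))  # True False

# List form: skip specific rows only.
should_skip_some = normalize_skiprows([0, 2, 4])
print(should_skip_some(2), should_skip_some(3))  # True False
```

The parser loop then calls the predicate per row, so the int case costs constant memory regardless of how many rows are skipped.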
This bug is listed in the latest release notes as closed. Is it actually fixed?
it was linked incorrectly; the PR is here: #8752. Yes, this is fixed.
I have a file with over 100 million rows. When I do

pd.read_csv(filename, skiprows=100000000, iterator=True)

Python crashes with a memory error. I have 32 GB of memory and Python eats up all of it!
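A back-of-the-envelope estimate (assuming 64-bit CPython; the exact figures are mine, not from the report) shows why materializing the skip list alone is already expensive, before any further copies pandas' internals make:

```python
import sys

# Cost of list(range(100_000_000)): one pointer per list slot plus
# one distinct int object per element (small-int caching only covers
# values up to 256, so nearly all 100M ints are separate objects).
n = 100_000_000
bytes_per_pointer = 8                 # PyObject* per list slot
bytes_per_int = sys.getsizeof(10**6)  # ~28 bytes per int object
estimate_gb = n * (bytes_per_pointer + bytes_per_int) / 1e9
print(f"~{estimate_gb:.1f} GB just to hold the skip list")
```

That is several gigabytes for the list alone; downstream transformations of that list can multiply it further, which is consistent with the reporter's machine exhausting 32 GB.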