
read_csv: chunksize clashes with nrows #6774

Closed
michaelaye opened this issue Apr 3, 2014 · 4 comments · Fixed by #7085
Labels
Error Reporting (incorrect or improved errors from pandas), IO CSV (read_csv, to_csv)

@michaelaye
Contributor

If nrows is specified, passing the chunksize option does not create a TextFileReader object; a DataFrame is returned instead.

import pandas as pd

# fname is the path to the CSV file being read
reader = pd.read_table(fname, sep=',', chunksize=4, na_values=['null'],
                       nrows=20)
type(reader)
# output: pandas.core.frame.DataFrame

My suggestion:

  • Either document that the two options cannot be used together,
  • or treat this as a feature request to make them work together.

I would find it useful to get chunks of size x, but only for the first n rows of a huge file.
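The behaviour requested above (chunks of size x, limited to the first n rows) can be approximated on top of a chunked reader. The following is only a minimal sketch: the helper name iter_chunks_limited is hypothetical and not part of pandas, and it assumes that read_csv with chunksize alone returns an iterator of DataFrames.

```python
import io

import pandas as pd


def iter_chunks_limited(source, chunksize, nrows, **kwargs):
    """Yield DataFrame chunks of up to `chunksize` rows, stopping after
    `nrows` rows in total. Hypothetical helper, not a pandas API."""
    remaining = nrows
    if remaining <= 0:
        return
    # Only chunksize is passed to read_csv; the row limit is enforced here.
    for chunk in pd.read_csv(source, chunksize=chunksize, **kwargs):
        chunk = chunk.iloc[:remaining]  # trim the last chunk if it overshoots
        remaining -= len(chunk)
        yield chunk
        if remaining <= 0:
            break  # stop before reading any further chunks


# Usage with an in-memory CSV of 50 data rows
csv_text = "a,b\n" + "\n".join(f"{i},{i * 2}" for i in range(50))
sizes = [len(c) for c in iter_chunks_limited(io.StringIO(csv_text), chunksize=4, nrows=10)]
print(sizes)  # [4, 4, 2]
```

Because the limit is applied outside the parser, at most one chunk beyond the limit boundary is parsed and trimmed; the rest of the file is never read.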

@jreback jreback added this to the 0.14.0 milestone Apr 3, 2014
@jreback
Contributor

jreback commented Apr 3, 2014

I think the easiest thing ATM is to simply raise NotImplementedError if both nrows and chunksize are specified.

Implementing it is a bit non-trivial, but it would be useful, I agree.

Do you want to do a PR for the NotImplementedError? Then we'll create an issue to implement this at some point.
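The guard suggested here could look roughly like the sketch below. The function name check_nrows_chunksize is hypothetical, and where such a check would sit inside pandas is an assumption; only the error message is taken from what users later reported in this thread.

```python
def check_nrows_chunksize(nrows=None, chunksize=None):
    """Reject the unsupported combination of nrows and chunksize.

    Hypothetical stand-alone helper; pandas' actual check may be
    structured differently.
    """
    if nrows is not None and chunksize is not None:
        raise NotImplementedError(
            "'nrows' and 'chunksize' cannot be used together yet."
        )


# Either option on its own passes silently
check_nrows_chunksize(nrows=20)
check_nrows_chunksize(chunksize=4)

# Both together raise
try:
    check_nrows_chunksize(nrows=20, chunksize=4)
except NotImplementedError as exc:
    print(exc)
```

Comparing against None rather than testing truthiness keeps the guard from silently accepting a value such as nrows=0 alongside chunksize.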

@jreback jreback modified the milestones: 0.15.0, 0.14.0 Apr 21, 2014
@michaelaye
Contributor Author

I would like to, but I find io/parsers.py quite confusing.

read_csv is 'declared' here:

449 read_csv = _make_parser_function('read_csv', sep=',')
450 read_csv = Appender(_read_csv_doc)(read_csv)

_make_parser_function defines a parser_f on line 311, and that definition takes both the chunksize and nrows options. Is it true that the two never work together, for any of the generated parsers? If so, should I implement the check right there, in the parser_f definition?

@jreback
Contributor

jreback commented May 8, 2014

@jreback jreback modified the milestones: 0.14.1, 0.15.0 May 30, 2014
@jreback jreback modified the milestones: 0.15.0, 0.14.1 Jun 17, 2014
michaelaye added a commit to michaelaye/pandas that referenced this issue Jun 24, 2014
…hunksize.

For read_csv(), a user who passes the chunksize option most likely intends to get a TextFileReader, but using nrows at the same time is not implemented yet. This combination now raises a NotImplementedError. A test and an entry in the current whatsnew source (v0.14.1.txt) were added.
Fixes pandas-dev#6774
@jreback jreback modified the milestones: 0.14.1, 0.15.0 Jun 24, 2014
@myidealab

myidealab commented Sep 29, 2017

Is there a workaround for this issue? I am trying to pass in different parameters for testing and production.

Testing: chunksize=None, nrows=n
Production: chunksize=i, nrows=None

Production works fine, but when I try to implement the testing version, I receive the same error as others: NotImplementedError: 'nrows' and 'chunksize' cannot be used together yet.

Edit: I ended up using a conditional statement and added a parameter for the version type.
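The commenter did not share their code, so the following is only a guess at what such a conditional might look like. The helper build_read_kwargs and the mode names are hypothetical. The idea is to leave the unused option out of the call entirely rather than passing it as None.

```python
def build_read_kwargs(mode, nrows=20, chunksize=1000):
    """Return keyword arguments for read_csv for the given mode.

    Hypothetical helper: only one of nrows / chunksize is ever included,
    so the two options are never passed in the same call.
    """
    if mode == "testing":
        return {"nrows": nrows}          # small sample, single DataFrame
    if mode == "production":
        return {"chunksize": chunksize}  # full file, chunked iterator
    raise ValueError(f"unknown mode: {mode!r}")


print(build_read_kwargs("testing", nrows=5))             # {'nrows': 5}
print(build_read_kwargs("production", chunksize=500))    # {'chunksize': 500}

# Intended use (path is a placeholder):
#   result = pd.read_csv(path, **build_read_kwargs("testing"))
```

Note that the two modes return different kinds of object from read_csv (a DataFrame with nrows, an iterator of DataFrames with chunksize), so the calling code has to handle both.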
