read_csv: chunksize clashes with nrows #6774

michaelaye · 2014-04-03T05:14:21Z

If nrows is specified, using the chunksize option does not create a TextFileReader object.

reader = pd.read_table(fname, sep=',', chunksize=4, na_values=['null'],
                       nrows=20)
type(reader)

pandas.core.frame.DataFrame

My suggestion:

Either state in the docs that they are not to be used together,
or treat this as a feature request to make them work together.

I would find it useful to get chunks of x size, but only for the first n rows of a huge file.
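One way to get that behaviour without passing both options is to ask only for chunks and stop once enough rows have been seen. The sketch below assumes each chunk supports len() and row slicing with [:k], which holds for DataFrames as well as lists. limit_rows and read_head_in_chunks are hypothetical helper names, not pandas functions.

```python
def limit_rows(chunks, nrows):
    """Yield chunks from an iterable until `nrows` rows have been produced.

    The last chunk is truncated so the total never exceeds `nrows`.
    """
    remaining = nrows
    if remaining <= 0:
        return
    for chunk in chunks:
        if len(chunk) > remaining:
            chunk = chunk[:remaining]  # row slice; works for lists and DataFrames
        remaining -= len(chunk)
        yield chunk
        if remaining <= 0:
            return  # stop before pulling another chunk from the source


def read_head_in_chunks(fname, nrows, chunksize, **kwargs):
    """Hypothetical wrapper: chunked read_csv limited to the first `nrows` rows."""
    import pandas as pd  # imported here so limit_rows stays usable on its own

    reader = pd.read_csv(fname, chunksize=chunksize, **kwargs)
    return limit_rows(reader, nrows)
```

With the call from the report, read_head_in_chunks(fname, nrows=20, chunksize=4, na_values=['null']) would yield five 4-row chunks, provided the file has at least 20 rows.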

jreback · 2014-04-03T11:56:50Z

I think the easiest thing ATM is to simply raise NotImplementedError if both nrows and chunksize are specified.

Implementing it is a bit non-trivial, but it would be useful, I agree.

You want to do a PR for the NotImplementedError? Then we'll create an issue to implement this at some point.

michaelaye · 2014-05-08T18:51:49Z

I would like to but find io/parsers.py quite confusing.

read_csv is 'declared' here:

449 read_csv = _make_parser_function('read_csv', sep=',')
450 read_csv = Appender(_read_csv_doc)(read_csv)

_make_parser_function defines a parser_f on line 311, and that definition has both the chunksize and nrows options. Is it true that they never work together, for any of the generated parsers? In that case, should I implement the check right there, in the parser_f definition?
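To make the question concrete, here is a minimal sketch of the factory pattern being described: one inner function is shared by every generated reader, so a check placed inside it applies to all of them. This is a simplified, hypothetical stand-in and not the pandas source. make_parser_function, read_csv_like and read_table_like are illustrative names, the inner function returns its resolved options instead of parsing anything, and the sketch says nothing about where the check belongs in pandas itself.

```python
def make_parser_function(name, sep=","):
    """Build a reader-like function with a fixed default separator.

    Hypothetical illustration only: the returned function reports the
    options it received rather than reading a file.
    """

    def parser_f(filepath, sep=sep, nrows=None, chunksize=None):
        # A guard placed here runs for every function the factory generates.
        if nrows is not None and chunksize is not None:
            raise NotImplementedError(
                "'nrows' and 'chunksize' cannot be used together yet."
            )
        return {
            "filepath": filepath,
            "sep": sep,
            "nrows": nrows,
            "chunksize": chunksize,
        }

    parser_f.__name__ = name
    return parser_f


# Two generated functions that differ only in their default separator.
read_csv_like = make_parser_function("read_csv_like", sep=",")
read_table_like = make_parser_function("read_table_like", sep="\t")
```

Because both generated functions share parser_f, passing nrows and chunksize together fails in either one, while passing just one of them goes through.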

jreback · 2014-05-08T18:53:42Z

look here: https://github.com/pydata/pandas/blob/master/pandas/io/parsers.py#L230

…hunksize. For read_csv(), the user's intention when using the chunksize option is most likely to get a TextFileReader, but simultaneous use of nrows is not implemented yet. This now raises a NotImplementedError. Added a test and an entry in the current whatsnew source (v0.14.1.txt). Fixes pandas-dev#6774

myidealab · 2017-09-29T15:16:30Z

Is there a workaround for this issue? I am trying to pass in different parameters for testing and production.

Testing: chunksize=None, nrows=n
Production: chunksize=i, nrows=None

Production works fine, but when I try to implement the testing version, I receive the same error as others: NotImplementedError: 'nrows' and 'chunksize' cannot be used together yet.

Edit: I ended up using a conditional statement and added a parameter for the version type.
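A conditional of that kind might look like the sketch below, which builds the keyword arguments so that only one of the two options is ever passed. The function name reader_options and the mode strings "testing" and "production" are assumptions made for illustration; the original comment does not show its code.

```python
def reader_options(mode, nrows=None, chunksize=None):
    """Return read_csv keyword arguments with only one of the two options set."""
    if mode == "testing":
        # Read a fixed number of rows in one go.
        return {"nrows": nrows}
    if mode == "production":
        # Stream the whole file in chunks.
        return {"chunksize": chunksize}
    raise ValueError(f"unknown mode: {mode!r}")
```

The result can be unpacked into the call, as in pd.read_csv(fname, **reader_options("testing", nrows=n)). Note that the testing call then returns a DataFrame while the production call returns an iterator of chunks, so the calling code has to handle both shapes, for example by wrapping the DataFrame in a one-element list.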

jreback added CSV labels Apr 3, 2014

jreback added this to the 0.14.0 milestone Apr 3, 2014

jreback modified the milestones: 0.15.0, 0.14.0 Apr 21, 2014

michaelaye mentioned this issue May 9, 2014

adding a NotImplementedError for simultaneous use of nrows and chunksize... #7085

Merged

jreback added Error Reporting and removed Docs labels May 9, 2014

jreback modified the milestones: 0.14.1, 0.15.0 May 30, 2014

jreback modified the milestones: 0.15.0, 0.14.1 Jun 17, 2014

jreback modified the milestones: 0.14.1, 0.15.0 Jun 24, 2014

jreback closed this as completed in #7085 Jun 24, 2014

toobaz mentioned this issue Mar 21, 2017

nrows incompatible with chunksize in read_csv #15755

Closed
