BUG: Nrows cannot be zero for read_csv. Fixes #21141 #21431

cgopalan · 2018-06-11T22:24:38Z

- closes #21141 (read_csv errors when low_memory=True, index_col is not None, and nrows=0)
- tests added / passed
- passes git diff upstream/master -u -- "*.py" | flake8 --diff
- whatsnew entry

gfyoung · 2018-06-11T22:42:53Z

@jreback : Do we need to deprecate nrows=0 before disallowing it? It's a (small, yet annoying) corner case, which is why I ask.

@cgopalan : This will be important, since it determines the type of whatsnew entry you're going to add.

WillAyd · 2018-06-11T22:44:42Z

pandas/tests/io/test_excel.py

@@ -995,11 +995,18 @@ def test_read_excel_nrows_greater_than_nrows_in_file(self, ext):

    def test_read_excel_nrows_non_integer_parameter(self, ext):


Would be better to parametrize this test for cases like [-1, 0, '1'] instead of having separate tests

cgopalan · 2018-06-12

@WillAyd sure, I will make these changes as soon as we decide if we are keeping this PR :)
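
For illustration, the parametrization suggested in this review thread might look roughly like the sketch below. `check_nrows` is a hypothetical stand-in for the validation this PR proposed (it is not a pandas function), its error message is made up, and it rejects 0 only because that is what the PR under discussion wanted. The test name is also an assumption.

```python
import pytest


def check_nrows(nrows):
    """Hypothetical validator: nrows must be an integer >= 1 (the rule this PR proposed)."""
    # bool is a subclass of int, so reject it explicitly.
    if isinstance(nrows, bool) or not isinstance(nrows, int) or nrows < 1:
        raise ValueError("'nrows' must be an integer >=1")
    return nrows


@pytest.mark.parametrize("nrows", [-1, 0, "1"])
def test_nrows_invalid(nrows):
    # One test body covers every bad value instead of three separate tests.
    with pytest.raises(ValueError, match="'nrows' must be an integer"):
        check_nrows(nrows)
```

Each value in the list becomes its own test case in the pytest report, so a failure names the offending input directly.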

gfyoung · 2018-06-11T22:46:03Z

pandas/tests/io/parser/common.py

@@ -375,6 +375,9 @@ def test_read_nrows(self):
        with tm.assert_raises_regex(ValueError, msg):
            self.read_csv(StringIO(self.data1), nrows=-1)

+        with tm.assert_raises_regex(ValueError, msg):


Can you reference the issue number in a comment above this line.

jschendel · 2018-06-12T00:04:20Z

I'm not sure about disallowing nrows=0. It's generally useful when you need to dynamically read a subset of columns, where you do a two-pass procedure along the lines of:

1. Read in the columns only, e.g. all_cols = pd.read_csv('file.csv', nrows=0).columns
2. Dynamically filter the columns, e.g. good_cols = [c for c in all_cols if c.startswith('foo')]
3. Read in only the desired columns, e.g. df = pd.read_csv('file.csv', usecols=good_cols)
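
Put together, the procedure above could be sketched as follows. The file contents and the `foo` prefix are invented for the example, and the temporary file exists only so the snippet is self-contained.

```python
import os
import tempfile

import pandas as pd

with tempfile.TemporaryDirectory() as tmp:
    # Throwaway CSV with made-up column names.
    path = os.path.join(tmp, "file.csv")
    with open(path, "w", newline="") as fh:
        fh.write("foo_a,bar,foo_b\n1,2,3\n4,5,6\n")

    # Pass 1: nrows=0 yields an empty frame that still carries the column labels.
    all_cols = pd.read_csv(path, nrows=0).columns

    # Filter the labels however the caller needs.
    good_cols = [c for c in all_cols if c.startswith("foo")]

    # Pass 2: read only the selected columns.
    df = pd.read_csv(path, usecols=good_cols)
```

Swapping `nrows=0` for `nrows=1` in the first pass should give the same labels, which is the fallback mentioned in this comment.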

I doubt you lose much performance by using nrows=1 instead, so it's not the end of the world if this does get removed. However, there are some established StackOverflow answers with nrows=0 similar to my procedure above, so I imagine there are some users that have nrows=0 in their codebase. If this is going to be removed, it definitely seems like it should be deprecated first.

gfyoung · 2018-06-12T00:08:41Z

@jschendel : Good to know. In the case of your example, couldn't you just read the first line of the file using the builtin open? You don't really need pandas for that. 🤷‍♂️
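
A minimal sketch of that alternative, using only the standard library. `read_header` is a hypothetical helper name, and the sample file is invented for the example.

```python
import csv
import os
import tempfile


def read_header(path, delimiter=","):
    """Return the first row of a delimited file as a list of column names."""
    with open(path, newline="") as fh:
        # csv.reader copes with quoted names that contain the delimiter;
        # next(..., []) gives an empty list for an empty file.
        return next(csv.reader(fh, delimiter=delimiter), [])


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "file.csv")
    with open(path, "w", newline="") as fh:
        fh.write('foo_a,"bar,baz",foo_b\n1,2,3\n')

    header = read_header(path)
    good_cols = [c for c in header if c.startswith("foo")]
```

This returns the raw header fields, so anything pandas does to column labels on top of that (renaming duplicate names, for instance) is not reproduced here.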

jschendel · 2018-06-12T00:17:11Z

Yeah, I'm not saying that it's the right solution, just that it's a solution people are likely using. Certainly more concise than using open if you method chain to do the filtering. I wouldn't complain if this was removed, but doing so without deprecating seems likely to break things for users.

gfyoung · 2018-06-12T00:18:16Z

> I wouldn't complain if this was removed, but doing so without deprecating seems likely to break things for users.

@jschendel : Nope, that's totally fair. Let's deprecate then.

@cgopalan : Can you update your PR to deprecate nrows=0?

jreback · 2018-06-12

nrows=0 is valid
the combination of options is invalid

PR needs to limit scope

gfyoung · 2018-06-12T00:27:05Z

> nrows=0 is valid
> the combination of options is invalid

@jreback : nrows=0 has limited use and was a clean way to patch the corner case (though it seems like @jschendel has a case to keep it).

In any case, should we just disallow that combination of inputs then? If nrows=0 is considered valid, I'm a little uncertain as to why we would consider the combination invalid (it shouldn't depend on what the value of low_memory is IMO).

codecov · 2018-06-12T00:27:21Z

Codecov Report

Merging #21431 into master will not change coverage.
The diff coverage is 100%.

@@           Coverage Diff           @@
##           master   #21431   +/-   ##
=======================================
  Coverage   91.89%   91.89%           
=======================================
  Files         153      153           
  Lines       49596    49596           
=======================================
  Hits        45576    45576           
  Misses       4020     4020

Flag        Coverage Δ
#multiple   90.29% <100%> (ø)
#single     41.86% <100%> (ø)

Impacted Files          Coverage Δ
pandas/io/parsers.py    95.46% <100%> (ø)

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 4807905...947168b.

cgopalan · 2018-06-12T01:39:52Z

@jreback @gfyoung i too agree that just the value of low_memory should not render the combination of values invalid. Because of this corner case, this is actually exhibiting weird behavior on an unrelated set of conditions. Better to suppress this corner case by removing/deprecating nrows=0.
Is there a way to deprecate just a value for an arg? I would assume we want to keep nrows as an argument.
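
On the question of deprecating a single value rather than the whole argument: one common Python pattern is to emit a warning only when that value is passed. The sketch below uses a hypothetical `read_rows` function, not pandas code, and the warning text is invented.

```python
import warnings


def read_rows(lines, nrows=None):
    """Hypothetical reader used only to illustrate value-level deprecation."""
    if nrows == 0:
        # Warn about this one value; the argument itself stays supported.
        warnings.warn(
            "nrows=0 is deprecated and will raise in a future version",
            FutureWarning,
            stacklevel=2,
        )
    return lines if nrows is None else lines[:nrows]
```

Callers passing any other value see no warning, and `stacklevel=2` points the message at the caller's line rather than at the library function.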

jreback · 2018-06-12T11:23:17Z

> @jreback @gfyoung i too agree that just the value of low_memory should not render the combination of values invalid. Because of this corner case, this is actually exhibiting weird behavior on an unrelated set of conditions. Better to suppress this corner case by removing/deprecating nrows=0.
> Is there a way to deprecate just a value for an arg? I would assume we want to keep nrows as an argument.

no reason to remove or even deprecate nrows=0, you simply need to address this particular combination of options

cgopalan · 2018-06-12T14:29:52Z

@jreback so read_csv(StringIO(data), low_memory=False, index_col=0, nrows=0) is allowed but read_csv(StringIO(data), low_memory=True, index_col=0, nrows=0) is not?
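
The two calls being compared can be run side by side with a small script like the one below. The sample data is invented, and the script records whatever each call does instead of assuming an outcome, since the behaviour depends on the pandas version (#21141 reported the low_memory=True call as the failing one).

```python
from io import StringIO

import pandas as pd

data = "id,foo\n1,x\n2,y\n"  # made-up sample data

results = {}
for low_memory in (False, True):
    try:
        results[low_memory] = pd.read_csv(
            StringIO(data), low_memory=low_memory, index_col=0, nrows=0
        )
    except Exception as exc:
        # Keep the exception so both outcomes can be inspected afterwards.
        results[low_memory] = exc

for low_memory, outcome in results.items():
    print(f"low_memory={low_memory}: {type(outcome).__name__}")
```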

gfyoung · 2018-06-12T20:26:09Z

> no reason to remove or even deprecate nrows=0, you simply need to address this particular combination of options

@jreback : What you're saying sounds like we should revisit #21176?

jreback · 2018-06-12T23:00:28Z

yeah i thought that was fine

gfyoung · 2018-06-12T23:02:35Z

@jreback : 🤦‍♂️ well, I guess we're heading back there then. 😄

cgopalan · 2018-06-13T00:38:24Z

@gfyoung should i delete this branch?

gfyoung · 2018-06-13T00:39:26Z

@cgopalan : Sure thing.

cgopalan added 2 commits June 11, 2018 18:19

BUG: Nrows cannot be zero for read_csv. Fixes pandas-dev#21141

c22f6d4

Corrected issue number

947168b

gfyoung added the API Design label Jun 11, 2018

gfyoung added this to the 0.24.0 milestone Jun 11, 2018

gfyoung added the IO CSV read_csv, to_csv label Jun 11, 2018

WillAyd requested changes Jun 11, 2018


gfyoung mentioned this pull request Jun 11, 2018

BUG: read_csv with specified kwargs #21176

Merged

1 task

gfyoung reviewed Jun 11, 2018


gfyoung removed this from the 0.24.0 milestone Jun 11, 2018

gfyoung added the Deprecate Functionality to remove in pandas label Jun 12, 2018

jreback requested changes Jun 12, 2018


gfyoung closed this Jun 12, 2018

cgopalan deleted the read_csv_nrows_arg branch June 13, 2018 00:49
