BUG: Check that values for "nrows" and "chunksize" are valid #15774

toobaz · 2017-03-21T22:29:22Z

closes Proper checks on nrows and chunksize for read_csv #15767
tests added / passed
passes git diff master --name-only -- '*.py' | flake8 --diff
whatsnew entry

jreback

some minor comments

jreback · 2017-03-21T22:36:06Z

pandas/io/parsers.py

    an integer OR float that can SAFELY be cast to an integer
    without losing accuracy. Raises a ValueError if that is
    not the case.
    """
-    msg = "'nrows' must be an integer"
+    msg = "'%s' must be an integer >=%s" % (name, min_val)


can you use "{}".format(...) string formatting, ideally with named parameters
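To illustrate the requested style, here is a minimal sketch of the same message built with `str.format` and named fields. The helper name `build_msg` is hypothetical and is not taken from the PR; only the message text comes from the diff above.

```python
def build_msg(name, min_val):
    # Hypothetical helper: named fields make the template self-describing
    # and independent of argument order.
    return "'{name}' must be an integer >={min_val}".format(
        name=name, min_val=min_val
    )


print(build_msg("chunksize", 1))
```

Named fields also let the same template be reused for both `nrows` and `chunksize` without repeating positional arguments.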

jreback · 2017-03-21T22:36:58Z

pandas/io/parsers.py

    """
-    Checks whether the 'nrows' parameter for parsing is either
+    Checks whether the 'name' parameter for parsing is either
    an integer OR float that can SAFELY be cast to an integer
    without losing accuracy. Raises a ValueError if that is
    not the case.
    """


can you add a Parameters section to the doc-string.
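A docstring with a numpydoc-style Parameters section might look roughly like the sketch below. This is an assumption-laden illustration, not the code merged in this PR: the function name `validate_integer`, the pass-through of `None`, and the rejection of booleans are choices made for the example.

```python
def validate_integer(name, val, min_val=0):
    """
    Check that ``val`` is an integer, or a float that can safely be
    cast to an integer without losing accuracy.

    Parameters
    ----------
    name : str
        Parameter name, used only in the error message.
    val : int, float or None
        Value to check. None is passed through unchanged.
    min_val : int, default 0
        Smallest accepted value.

    Returns
    -------
    int or None

    Raises
    ------
    ValueError
        If ``val`` is not integral or is smaller than ``min_val``.
    """
    msg = "'{name}' must be an integer >={min_val}".format(
        name=name, min_val=min_val
    )
    if val is None:
        return None
    # Assumption of this sketch: booleans are rejected even though
    # bool is a subclass of int.
    if isinstance(val, bool) or not isinstance(val, (int, float)):
        raise ValueError(msg)
    if isinstance(val, float):
        # float.is_integer() is False for 0.5, inf and nan.
        if not val.is_integer():
            raise ValueError(msg)
        val = int(val)
    if val < min_val:
        raise ValueError(msg)
    return val
```

With this shape, `validate_integer("chunksize", 0, min_val=1)` raises, while `validate_integer("nrows", 3.0)` returns the integer `3`.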

jreback · 2017-03-21T22:38:07Z

pandas/tests/io/parser/common.py

@@ -402,6 +405,18 @@ def test_read_chunksize(self):
        tm.assert_frame_equal(chunks[1], df[2:4])
        tm.assert_frame_equal(chunks[2], df[4:])

+        # with invalid chunksize value:


is there a test for nrows that is negative? and same for chunksize?

toobaz · 2017-03-21

Well, there is a test for nrows=-1 and one for chunksize=0... which is already "too negative". So yes, there are tests for the positivity check.

jorisvandenbossche · 2017-03-21T22:40:55Z

Looks good!

Currently, chunksize=0 also 'works': only negatives and floats between 0 and 1 caused the infinite loop, while chunksize=0 is somehow checked for and just directly returns the full dataframe. I suppose that is because the 0 was interpreted as 'False'.
I don't think we need to keep supporting this, however, as this is rather strange?
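The suspected mechanism is easy to reproduce outside pandas. The sketch below is hypothetical and does not mirror the actual reader code: `read_truthy` and `read_checked` are stand-in names, and a plain list stands in for the parsed rows.

```python
def read_truthy(rows, chunksize=None):
    # Truthiness test: 0 is falsy, so chunksize=0 silently behaves
    # like chunksize=None and the whole input is returned.
    if chunksize:
        return [rows[i:i + chunksize] for i in range(0, len(rows), chunksize)]
    return rows


def read_checked(rows, chunksize=None):
    # Explicit None test plus a range check: chunksize=0 is now an error.
    if chunksize is None:
        return rows
    if chunksize < 1:
        raise ValueError("'chunksize' must be an integer >=1")
    return [rows[i:i + chunksize] for i in range(0, len(rows), chunksize)]


rows = [1, 2, 3, 4, 5]
print(read_truthy(rows, chunksize=0))  # whole list comes back, no error
print(read_truthy(rows, chunksize=2))  # [[1, 2], [3, 4], [5]]
```

Testing against `None` explicitly keeps "no chunking requested" distinct from an invalid chunk size.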

jreback · 2017-03-21T23:15:23Z

lgtm. ~~I agree with @jorisvandenbossche point, that chunksize==0 should be an error as well.~~

I see you did that.

ok ping on green.

codecov · 2017-03-22T00:24:47Z

Codecov Report

Merging #15774 into master will decrease coverage by 0.01%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master   #15774      +/-   ##
==========================================
- Coverage   91.01%   90.99%   -0.02%     
==========================================
  Files         143      143              
  Lines       49384    49384              
==========================================
- Hits        44947    44939       -8     
- Misses       4437     4445       +8

Impacted Files	Coverage Δ
pandas/io/parsers.py	`95.51% <100%> (ø)`	⬆️
pandas/io/gbq.py	`25% <0%> (-58.34%)`	⬇️
pandas/core/frame.py	`97.86% <0%> (-0.1%)`	⬇️
pandas/core/common.py	`91.3% <0%> (+0.33%)`	⬆️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 1c9d46a...b21fdcf.

jorisvandenbossche · 2017-03-22T08:03:40Z

@toobaz Thanks!

toobaz · 2017-03-22T08:33:23Z

You're welcome!

jorisvandenbossche added the "IO CSV" (read_csv, to_csv) and "Error Reporting" (Incorrect or improved errors from pandas) labels Mar 21, 2017

jorisvandenbossche added this to the 0.20.0 milestone Mar 21, 2017

jreback requested changes Mar 21, 2017

BUG: Check that values for "nrows" and "chunksize" are valid

b21fdcf

toobaz force-pushed the check_chunksize branch from b7a2957 to b21fdcf Compare March 21, 2017 22:53

jreback approved these changes Mar 21, 2017

jorisvandenbossche merged commit a20009f into pandas-dev:master Mar 22, 2017

toobaz deleted the check_chunksize branch March 22, 2017 08:33

mattip pushed a commit to mattip/pandas that referenced this pull request Apr 3, 2017

BUG: Check that values for "nrows" and "chunksize" are valid (pandas-…

0dabcd7

…dev#15774)
