Skip to content

BUG: Check that values for "nrows" and "chunksize" are valid #15774

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Mar 22, 2017

Conversation

toobaz
Copy link
Member

@toobaz toobaz commented Mar 21, 2017

@jorisvandenbossche jorisvandenbossche added IO CSV read_csv, to_csv Error Reporting Incorrect or improved errors from pandas labels Mar 21, 2017
@jorisvandenbossche jorisvandenbossche added this to the 0.20.0 milestone Mar 21, 2017
Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

some minor comments

an integer OR float that can SAFELY be cast to an integer
without losing accuracy. Raises a ValueError if that is
not the case.
"""
msg = "'nrows' must be an integer"
msg = "'%s' must be an integer >=%s" % (name, min_val)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you use "{}".format(...) string formatting, ideally with named parameters

"""
Checks whether the 'nrows' parameter for parsing is either
Checks whether the 'name' parameter for parsing is either
an integer OR float that can SAFELY be cast to an integer
without losing accuracy. Raises a ValueError if that is
not the case.
"""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add a Parameters section to the doc-string.

@@ -402,6 +405,18 @@ def test_read_chunksize(self):
tm.assert_frame_equal(chunks[1], df[2:4])
tm.assert_frame_equal(chunks[2], df[4:])

# with invalid chunksize value:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is there a test for nrows that is negative? and same for chunsize?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, there is a test for nrows=-1 and one for chunksize=0... which is already "too negative". So yes, there are tests for the positivity check.

@jorisvandenbossche
Copy link
Member

Looks good!

Currently, chunksize=0 also 'works' (it was only negatives and floats between 0 and 1 that caused the infinite loop, chunksize=0 is somehow checked for), but just directly returning the full dataframe. I suppose because the 0 was just interpreted as 'False'.
I don't think we need to keep supporting this, however, as this is rather strange?

@jreback
Copy link
Contributor

jreback commented Mar 21, 2017

lgtm. I agree with @jorisvandenbossche point, that chunksize==0 should be an error as well.

I c you did that.

ok ping on green.

@codecov
Copy link

codecov bot commented Mar 22, 2017

Codecov Report

Merging #15774 into master will decrease coverage by 0.01%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master   #15774      +/-   ##
==========================================
- Coverage   91.01%   90.99%   -0.02%     
==========================================
  Files         143      143              
  Lines       49384    49384              
==========================================
- Hits        44947    44939       -8     
- Misses       4437     4445       +8
Impacted Files Coverage Δ
pandas/io/parsers.py 95.51% <100%> (ø) ⬆️
pandas/io/gbq.py 25% <0%> (-58.34%) ⬇️
pandas/core/frame.py 97.86% <0%> (-0.1%) ⬇️
pandas/core/common.py 91.3% <0%> (+0.33%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 1c9d46a...b21fdcf. Read the comment docs.

@jorisvandenbossche jorisvandenbossche merged commit a20009f into pandas-dev:master Mar 22, 2017
@jorisvandenbossche
Copy link
Member

@toobaz Thanks!

@toobaz toobaz deleted the check_chunksize branch March 22, 2017 08:33
@toobaz
Copy link
Member Author

toobaz commented Mar 22, 2017

You're welcome!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Error Reporting Incorrect or improved errors from pandas IO CSV read_csv, to_csv
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Proper checks on nrows and chunksize for read_csv
3 participants