Skip to content

ENH: Improve error message for header argument containing non int types. GH16338 #16351

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

mjlove12
Copy link
Contributor

@mjlove12 mjlove12 commented May 14, 2017

Adds error "header must be integer or list of integers" when the header argument is a list, tuple or numpy array containing non-integers. Initially intended to read_csv, but applies to other functions with similar header arguments. GH16338 refers to a case in which the user mixes up the "names" and "header" arguments.

@TomAugspurger TomAugspurger added the Error Reporting Incorrect or improved errors from pandas label May 16, 2017
@TomAugspurger TomAugspurger added this to the 0.21.0 milestone May 16, 2017
@TomAugspurger
Copy link
Contributor

Lint failure here If you pip install flake8 you can run git diff upstream/master --name-only -- '*.py' | flake8 --diff locally.

The other failure looked unrelated, so I've restarted that job. Once you've fixed the linting error, you can push another commit to this branch.

@gfyoung
Copy link
Member

gfyoung commented May 16, 2017

Also need to test that you can hit this error message!

@pierre-haessig
Copy link
Contributor

Thanks @mjlove12 for picking this up.

The lint error is pandas/io/parsers.py:1188:1: W293 blank line contains whitespace. It will be fixed by removing the whitespace on line 1188 (which is a blank line).

@mjlove12
Copy link
Contributor Author

Thanks @pierre-haessig. I'm planning to make the changes this week,

@mjlove12 mjlove12 force-pushed the new-read_csv-header-comments branch from e5fd262 to 888bdd0 Compare May 20, 2017 13:58
@codecov
Copy link

codecov bot commented May 20, 2017

Codecov Report

Merging #16351 into master will decrease coverage by 0.04%.
The diff coverage is 50%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #16351      +/-   ##
==========================================
- Coverage   90.41%   90.36%   -0.05%     
==========================================
  Files         161      161              
  Lines       50997    50918      -79     
==========================================
- Hits        46107    46012      -95     
- Misses       4890     4906      +16
Flag Coverage Δ
#multiple 88.13% <50%> (-0.11%) ⬇️
#single 40.22% <0%> (+0.02%) ⬆️
Impacted Files Coverage Δ
pandas/io/parsers.py 95.26% <50%> (-0.06%) ⬇️
pandas/core/util/hashing.py 90.35% <0%> (-2.62%) ⬇️
pandas/plotting/_converter.py 63.23% <0%> (-1.82%) ⬇️
pandas/io/pickle.py 79.54% <0%> (-1.71%) ⬇️
pandas/io/feather_format.py 86.66% <0%> (-1.22%) ⬇️
pandas/io/formats/excel.py 73.15% <0%> (-1.1%) ⬇️
pandas/io/sas/sasreader.py 85.18% <0%> (-1.03%) ⬇️
pandas/core/common.py 90.68% <0%> (-0.71%) ⬇️
pandas/io/excel.py 61.68% <0%> (-0.58%) ⬇️
pandas/core/generic.py 91.94% <0%> (-0.19%) ⬇️
... and 10 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 0f55de1...888bdd0. Read the comment docs.

@codecov
Copy link

codecov bot commented May 20, 2017

Codecov Report

Merging #16351 into master will decrease coverage by 0.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #16351      +/-   ##
==========================================
- Coverage   90.42%   90.41%   -0.02%     
==========================================
  Files         161      161              
  Lines       51027    51003      -24     
==========================================
- Hits        46142    46113      -29     
- Misses       4885     4890       +5
Flag Coverage Δ
#multiple 88.24% <100%> (-0.02%) ⬇️
#single 40.19% <25%> (+0.01%) ⬆️
Impacted Files Coverage Δ
pandas/io/parsers.py 95.33% <100%> (+0.01%) ⬆️
pandas/io/formats/format.py 95.66% <0%> (-0.37%) ⬇️
pandas/core/reshape/merge.py 93.94% <0%> (-0.24%) ⬇️
pandas/io/excel.py 62.26% <0%> (-0.06%) ⬇️
pandas/io/formats/style.py 95.91% <0%> (-0.02%) ⬇️
pandas/core/reshape/reshape.py 99.28% <0%> (-0.01%) ⬇️
pandas/errors/__init__.py 100% <0%> (ø) ⬆️
pandas/core/dtypes/concat.py 98.07% <0%> (ø) ⬆️
pandas/core/missing.py 84.27% <0%> (ø) ⬆️
pandas/core/frame.py 97.69% <0%> (ø) ⬆️
... and 1 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 04356a8...de005a0. Read the comment docs.

@mjlove12
Copy link
Contributor Author

Okay, I hope I did this right.

  1. Added extra lines to account for cases when the header is non-array-like.
  2. Added a test (test_non_int_header()) in the header.py file under the Header_Tests class which includes a non-array-like case and an array-like case.
  3. Added changes with "git add ."
  4. Committed the changes to my branch with "git commit -m 'Revising PR16351 based on feedback'"
  5. Rebased with "git rebase -i origin/master" and squashing the second commit instead of "picking" it
  6. Forced a push to origin with "git push origin new-read_csv-header-comments -f"

Feedback appreciated if any of those steps are discouraged.

Also, random question:
Can anyone explain why the test cases are wrapped in classes and how that works? I tried to match some of the other tests, but didn't understand why we were using the test classes to call methods like "self.read_csv" and "self.read_table" instead of just pd.read_csv.

Thanks for all the help!
Matt

@gfyoung
Copy link
Member

gfyoung commented May 20, 2017

@mjlove12 : Your steps are perfectly fine. Don't worry about it too much. 😄

As for why we use self.read_csv, it's because there are multiple engines that can be used when parsing a CSV. The two main ones are:

  • C engine: this is pandas in-house parser backed up in C and Cython
  • Python engine: this leverages Python's csv library

As a result, we need to test both, and we have test classes that implement their own read_csv methods to specify which engine to use (FYI read_csv defaults to the C engine).

@mjlove12 mjlove12 force-pushed the new-read_csv-header-comments branch from 888bdd0 to 7d8c174 Compare May 20, 2017 23:39
@mjlove12
Copy link
Contributor Author

@TomAugspurger not entirely sure I'm understanding the failure this time. I ran the command "git diff master --name-only -- '*.py' | grep 'pandas/' | xargs flake8" on the branch and didn't get any issues. Any help would be appreciated.

@TomAugspurger
Copy link
Contributor

TomAugspurger commented May 23, 2017

@mjlove12 this test failure looks unrelated. I've restarted that worker. The other failure is a known issue we haven't been able to debug yet.

@TomAugspurger
Copy link
Contributor

You can also add a release note in doc/source/whatsnew/v0.21.0.txt if you want, under the bug fixes section.

…n int types. GH16338.

Adds error "header must be integer or list of integers" when the header argument is a list, tuple or numpy array containing non-integers. Initially intended to read_csv, but applies to other functions with similar header arguments. GH16338 refers to a case in which the user mixes up the "names" and "header" arguments.

Revising PR16351 based on feedback

Revising PR16351 lint issues

Adding release note in whatsnew v0.21.0 for PR16351
@mjlove12 mjlove12 force-pushed the new-read_csv-header-comments branch from 7d8c174 to de005a0 Compare May 23, 2017 22:52
@mjlove12
Copy link
Contributor Author

@TomAugspurger sounds good. I added a whatsnew entry under bug fixes.

@TomAugspurger TomAugspurger merged commit 7271f50 into pandas-dev:master May 24, 2017
@TomAugspurger
Copy link
Contributor

The appveyor failure is unrelated, and already fixed on master.

Thanks @mjlove12!

stangirala pushed a commit to stangirala/pandas that referenced this pull request Jun 11, 2017
…n int types. GH16338. (pandas-dev#16351)

Adds error "header must be integer or list of integers" when the header argument is a list, tuple or numpy array containing non-integers. Initially intended to read_csv, but applies to other functions with similar header arguments. GH16338 refers to a case in which the user mixes up the "names" and "header" arguments.

Revising PR16351 based on feedback

Revising PR16351 lint issues

Adding release note in whatsnew v0.21.0 for PR16351
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Error Reporting Incorrect or improved errors from pandas
Projects
None yet
Development

Successfully merging this pull request may close these issues.

unhelpful error message when header is a list of names in read_csv
4 participants