ENH: Improve error message for header argument containing non int types. GH16338 #16351

mjlove12 · 2017-05-14T05:42:09Z

Adds error "header must be integer or list of integers" when the header argument is a list, tuple or numpy array containing non-integers. Initially intended to read_csv, but applies to other functions with similar header arguments. GH16338 refers to a case in which the user mixes up the "names" and "header" arguments.

closes unhelpful error message when header is a list of names in read_csv #16338
tests added / passed
passes git diff upstream/master --name-only -- '*.py' | flake8 --diff
whatsnew entry

TomAugspurger · 2017-05-16T12:46:01Z

Lint failure here If you pip install flake8 you can run git diff upstream/master --name-only -- '*.py' | flake8 --diff locally.

The other failure looked unrelated, so I've restarted that job. Once you've fixed the linting error, you can push another commit to this branch.

gfyoung · 2017-05-16T19:46:52Z

Also need to test that you can hit this error message!

pierre-haessig · 2017-05-18T14:01:38Z

Thanks @mjlove12 for picking this up.

The lint error is pandas/io/parsers.py:1188:1: W293 blank line contains whitespace. It will be fixed by removing the whitespace on line 1188 (which is a blank line).

mjlove12 · 2017-05-18T14:17:30Z

Thanks @pierre-haessig. I'm planning to make the changes this week,

codecov · 2017-05-20T13:58:40Z

Codecov Report

Merging #16351 into master will decrease coverage by 0.04%.
The diff coverage is 50%.

@@            Coverage Diff             @@
##           master   #16351      +/-   ##
==========================================
- Coverage   90.41%   90.36%   -0.05%     
==========================================
  Files         161      161              
  Lines       50997    50918      -79     
==========================================
- Hits        46107    46012      -95     
- Misses       4890     4906      +16

Flag	Coverage Δ
#multiple	`88.13% <50%> (-0.11%)`	⬇️
#single	`40.22% <0%> (+0.02%)`	⬆️

Impacted Files	Coverage Δ
pandas/io/parsers.py	`95.26% <50%> (-0.06%)`	⬇️
pandas/core/util/hashing.py	`90.35% <0%> (-2.62%)`	⬇️
pandas/plotting/_converter.py	`63.23% <0%> (-1.82%)`	⬇️
pandas/io/pickle.py	`79.54% <0%> (-1.71%)`	⬇️
pandas/io/feather_format.py	`86.66% <0%> (-1.22%)`	⬇️
pandas/io/formats/excel.py	`73.15% <0%> (-1.1%)`	⬇️
pandas/io/sas/sasreader.py	`85.18% <0%> (-1.03%)`	⬇️
pandas/core/common.py	`90.68% <0%> (-0.71%)`	⬇️
pandas/io/excel.py	`61.68% <0%> (-0.58%)`	⬇️
pandas/core/generic.py	`91.94% <0%> (-0.19%)`	⬇️
... and 10 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 0f55de1...888bdd0. Read the comment docs.

codecov · 2017-05-20T13:58:47Z

Codecov Report

Merging #16351 into master will decrease coverage by 0.01%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master   #16351      +/-   ##
==========================================
- Coverage   90.42%   90.41%   -0.02%     
==========================================
  Files         161      161              
  Lines       51027    51003      -24     
==========================================
- Hits        46142    46113      -29     
- Misses       4885     4890       +5

Flag	Coverage Δ
#multiple	`88.24% <100%> (-0.02%)`	⬇️
#single	`40.19% <25%> (+0.01%)`	⬆️

Impacted Files	Coverage Δ
pandas/io/parsers.py	`95.33% <100%> (+0.01%)`	⬆️
pandas/io/formats/format.py	`95.66% <0%> (-0.37%)`	⬇️
pandas/core/reshape/merge.py	`93.94% <0%> (-0.24%)`	⬇️
pandas/io/excel.py	`62.26% <0%> (-0.06%)`	⬇️
pandas/io/formats/style.py	`95.91% <0%> (-0.02%)`	⬇️
pandas/core/reshape/reshape.py	`99.28% <0%> (-0.01%)`	⬇️
pandas/errors/__init__.py	`100% <0%> (ø)`	⬆️
pandas/core/dtypes/concat.py	`98.07% <0%> (ø)`	⬆️
pandas/core/missing.py	`84.27% <0%> (ø)`	⬆️
pandas/core/frame.py	`97.69% <0%> (ø)`	⬆️
... and 1 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 04356a8...de005a0. Read the comment docs.

mjlove12 · 2017-05-20T14:13:14Z

Okay, I hope I did this right.

Added extra lines to account for cases when the header is non-array-like.
Added a test (test_non_int_header()) in the header.py file under the Header_Tests class which includes a non-array-like case and an array-like case.
Added changes with "git add ."
Committed the changes to my branch with "git commit -m 'Revising PR16351 based on feedback'"
Rebased with "git rebase -i origin/master" and squashing the second commit instead of "picking" it
Forced a push to origin with "git push origin new-read_csv-header-comments -f"

Feedback appreciated if any of those steps are discouraged.

Also, random question:
Can anyone explain why the test cases are wrapped in classes and how that works? I tried to match some of the other tests, but didn't understand why we were using the test classes to call methods like "self.read_csv" and "self.read_table" instead of just pd.read_csv.

Thanks for all the help!
Matt

gfyoung · 2017-05-20T19:02:10Z

@mjlove12 : Your steps are perfectly fine. Don't worry about it too much. 😄

As for why we use self.read_csv, it's because there are multiple engines that can be used when parsing a CSV. The two main ones are:

C engine: this is pandas in-house parser backed up in C and Cython
Python engine: this leverages Python's csv library

As a result, we need to test both, and we have test classes that implement their own read_csv methods to specify which engine to use (FYI read_csv defaults to the C engine).

mjlove12 · 2017-05-22T22:58:45Z

@TomAugspurger not entirely sure I'm understanding the failure this time. I ran the command "git diff master --name-only -- '*.py' | grep 'pandas/' | xargs flake8" on the branch and didn't get any issues. Any help would be appreciated.

TomAugspurger · 2017-05-23T14:20:11Z

@mjlove12 this test failure looks unrelated. I've restarted that worker. The other failure is a known issue we haven't been able to debug yet.

TomAugspurger · 2017-05-23T14:25:05Z

You can also add a release note in doc/source/whatsnew/v0.21.0.txt if you want, under the bug fixes section.

…n int types. GH16338. Adds error "header must be integer or list of integers" when the header argument is a list, tuple or numpy array containing non-integers. Initially intended to read_csv, but applies to other functions with similar header arguments. GH16338 refers to a case in which the user mixes up the "names" and "header" arguments. Revising PR16351 based on feedback Revising PR16351 lint issues Adding release note in whatsnew v0.21.0 for PR16351

mjlove12 · 2017-05-23T22:54:52Z

@TomAugspurger sounds good. I added a whatsnew entry under bug fixes.

TomAugspurger · 2017-05-24T13:28:07Z

The appveyor failure is unrelated, and already fixed on master.

Thanks @mjlove12!

…n int types. GH16338. (pandas-dev#16351) Adds error "header must be integer or list of integers" when the header argument is a list, tuple or numpy array containing non-integers. Initially intended to read_csv, but applies to other functions with similar header arguments. GH16338 refers to a case in which the user mixes up the "names" and "header" arguments. Revising PR16351 based on feedback Revising PR16351 lint issues Adding release note in whatsnew v0.21.0 for PR16351

TomAugspurger added the Error Reporting Incorrect or improved errors from pandas label May 16, 2017

TomAugspurger added this to the 0.21.0 milestone May 16, 2017

mjlove12 force-pushed the new-read_csv-header-comments branch from e5fd262 to 888bdd0 Compare May 20, 2017 13:58

mjlove12 force-pushed the new-read_csv-header-comments branch from 888bdd0 to 7d8c174 Compare May 20, 2017 23:39

mjlove12 force-pushed the new-read_csv-header-comments branch from 7d8c174 to de005a0 Compare May 23, 2017 22:52

TomAugspurger merged commit 7271f50 into pandas-dev:master May 24, 2017

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

ENH: Improve error message for header argument containing non int types. GH16338 #16351

ENH: Improve error message for header argument containing non int types. GH16338 #16351

mjlove12 commented May 14, 2017 •

edited

Loading

TomAugspurger commented May 16, 2017

gfyoung commented May 16, 2017

pierre-haessig commented May 18, 2017

mjlove12 commented May 18, 2017

codecov bot commented May 20, 2017

codecov bot commented May 20, 2017 •

edited

Loading

mjlove12 commented May 20, 2017

gfyoung commented May 20, 2017 •

edited

Loading

mjlove12 commented May 22, 2017

TomAugspurger commented May 23, 2017 •

edited

Loading

TomAugspurger commented May 23, 2017

mjlove12 commented May 23, 2017

TomAugspurger commented May 24, 2017

ENH: Improve error message for header argument containing non int types. GH16338 #16351

ENH: Improve error message for header argument containing non int types. GH16338 #16351

Conversation

mjlove12 commented May 14, 2017 • edited Loading

TomAugspurger commented May 16, 2017

gfyoung commented May 16, 2017

pierre-haessig commented May 18, 2017

mjlove12 commented May 18, 2017

codecov bot commented May 20, 2017

Codecov Report

codecov bot commented May 20, 2017 • edited Loading

Codecov Report

mjlove12 commented May 20, 2017

gfyoung commented May 20, 2017 • edited Loading

mjlove12 commented May 22, 2017

TomAugspurger commented May 23, 2017 • edited Loading

TomAugspurger commented May 23, 2017

mjlove12 commented May 23, 2017

TomAugspurger commented May 24, 2017

mjlove12 commented May 14, 2017 •

edited

Loading

codecov bot commented May 20, 2017 •

edited

Loading

gfyoung commented May 20, 2017 •

edited

Loading

TomAugspurger commented May 23, 2017 •

edited

Loading