
unhelpful error message when header is a list of names in read_csv #16338


Closed
pierre-haessig opened this issue May 12, 2017 · 5 comments · Fixed by #16351
Labels
Error Reporting (Incorrect or improved errors from pandas), IO CSV (read_csv, to_csv)

Comments

@pierre-haessig
Contributor

This is a minor issue about error reporting for the mindless user (me...) who confuses the header and names arguments of read_csv. Basically, when calling read_csv with header=['a', 'b'] (when it should be names=['a', 'b']), the error message is cryptic:

TypeError: must be str, not int

(pandas 0.20.1, see details below)

Two issues:

  • The message is unhelpful and quite cryptic, and doesn't point in the right direction. E.g. it doesn't say which argument causes the problem. Of course in the dummy example below there is just one argument, but in the real case where I got bitten it was messier...
  • It is impossible to debug with the %debug magic, because the error is raised in the compiled code of parsers.pyx.

Here is code to reproduce the error message, taken from an IPython session. (The first line may be a bit Unix-specific, sorry. It's just there to create a dummy CSV file.)

In [] !echo '1,2\n3,4' > 1234.csv

In [] pd.read_csv('1234.csv')
	1 	2
0 	3 	4

In [] pd.read_csv('1234.csv', names=['a', 'b']) # proper call
 	a 	b
0 	1 	2
1 	3 	4

In [] pd.read_csv('1234.csv', header=['a', 'b']) # beginner's mistake

TypeError                                 Traceback (most recent call last)
<ipython-input-5-b065bd1f57c6> in <module>()
----> 1 pd.read_csv('1234.csv', header=['a', 'b'])

/home/pierre/Programmes/anaconda3/lib/python3.6/site-packages/pandas/io/parsers.py in parser_f(filepath_or_buffer, sep, delimiter, header, names, index_col, usecols, squeeze, prefix, mangle_dupe_cols, dtype, engine, converters, true_values, false_values, skipinitialspace, skiprows, nrows, na_values, keep_default_na, na_filter, verbose, skip_blank_lines, parse_dates, infer_datetime_format, keep_date_col, date_parser, dayfirst, iterator, chunksize, compression, thousands, decimal, lineterminator, quotechar, quoting, escapechar, comment, encoding, dialect, tupleize_cols, error_bad_lines, warn_bad_lines, skipfooter, skip_footer, doublequote, delim_whitespace, as_recarray, compact_ints, use_unsigned, low_memory, buffer_lines, memory_map, float_precision)
    653                     skip_blank_lines=skip_blank_lines)
    654 
--> 655         return _read(filepath_or_buffer, kwds)
    656 
    657     parser_f.__name__ = name

/home/pierre/Programmes/anaconda3/lib/python3.6/site-packages/pandas/io/parsers.py in _read(filepath_or_buffer, kwds)
    403 
    404     # Create the parser.
--> 405     parser = TextFileReader(filepath_or_buffer, **kwds)
    406 
    407     if chunksize or iterator:

/home/pierre/Programmes/anaconda3/lib/python3.6/site-packages/pandas/io/parsers.py in __init__(self, f, engine, **kwds)
    760             self.options['has_index_names'] = kwds['has_index_names']
    761 
--> 762         self._make_engine(self.engine)
    763 
    764     def close(self):

/home/pierre/Programmes/anaconda3/lib/python3.6/site-packages/pandas/io/parsers.py in _make_engine(self, engine)
    964     def _make_engine(self, engine='c'):
    965         if engine == 'c':
--> 966             self._engine = CParserWrapper(self.f, **self.options)
    967         else:
    968             if engine == 'python':

/home/pierre/Programmes/anaconda3/lib/python3.6/site-packages/pandas/io/parsers.py in __init__(self, src, **kwds)
   1580         kwds['allow_leading_cols'] = self.index_col is not False
   1581 
-> 1582         self._reader = parsers.TextReader(src, **kwds)
   1583 
   1584         # XXX

pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader.__cinit__ (pandas/_libs/parsers.c:5996)()

TypeError: must be str, not int
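The `!echo` line in the session above relies on the shell expanding `\n` into a line break, and whether `echo` does that differs between shells (bash's builtin `echo`, for example, does not by default). A minimal, portable sketch that creates the same two-line, header-less `1234.csv` from Python instead; the helper name `write_dummy_csv` is hypothetical and not part of the original report.

```python
from pathlib import Path


def write_dummy_csv(path="1234.csv"):
    # Hypothetical helper: writes the same content as `echo '1,2\n3,4'`
    # (two data rows, no header row, trailing newline) without depending
    # on how a given shell's echo handles backslash escapes.
    content = "1,2\n3,4\n"
    target = Path(path)
    target.write_text(content)
    return target


if __name__ == "__main__":
    csv_path = write_dummy_csv()
    # The proper call then passes column labels through `names`, not `header`:
    #   pd.read_csv(csv_path, names=['a', 'b'])
    print(csv_path.read_text())
```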

Expected Output

I'm not expecting a fancy AI-assistant-like error message. However, an early check of the header argument should verify, consistently with the docstring, that header is an int or a list of ints.

What do you think? Is it overkill?
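A minimal sketch of the kind of early check proposed above, in plain Python. It is not pandas' actual implementation, and the name `validate_header` is hypothetical. It assumes the accepted forms of `header` are `'infer'` (the documented default), `None`, a non-negative int, or a non-empty list of non-negative ints, and its message names the offending argument and hints at `names`, since mixing up the two is exactly the mistake described in this issue.

```python
from numbers import Integral


def _is_row_number(value):
    # bool is a subclass of int, but header=True is almost certainly a mistake,
    # so booleans are rejected, as are negative numbers (an assumption here).
    return (
        isinstance(value, Integral)
        and not isinstance(value, bool)
        and value >= 0
    )


def validate_header(header):
    """Hypothetical early check for the ``header`` argument of read_csv.

    Returns the value unchanged if it is valid; otherwise raises a
    ValueError whose message names the argument and suggests ``names``.
    """
    if header is None or (isinstance(header, str) and header == "infer"):
        return header
    if _is_row_number(header):
        return header
    if (
        isinstance(header, (list, tuple))
        and len(header) > 0
        and all(_is_row_number(h) for h in header)
    ):
        return header
    raise ValueError(
        "header must be an integer or a list of integers, got %r; "
        "to set column labels, pass them with 'names' instead" % (header,)
    )


if __name__ == "__main__":
    try:
        validate_header(['a', 'b'])  # the beginner's mistake from this issue
    except ValueError as exc:
        print(exc)
```

Running such a check before handing the options to the compiled parser would also make the failure debuggable with %debug, since it would be raised from Python code.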

Output of pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.6.1.final.0
python-bits: 64
OS: Linux
OS-release: 4.9.0-2-amd64
machine: x86_64
processor:
byteorder: little
LC_ALL: None
LANG: fr_FR.utf8
LOCALE: fr_FR.UTF-8

pandas: 0.20.1
pytest: 3.0.5
pip: 9.0.1
setuptools: 27.2.0
Cython: 0.25.2
numpy: 1.12.1
scipy: 0.19.0
xarray: None
IPython: 6.0.0
sphinx: 1.5.1
patsy: 0.4.1
dateutil: 2.6.0
pytz: 2016.10
blosc: None
bottleneck: 1.2.0
tables: 3.3.0
numexpr: 2.6.2
feather: None
matplotlib: 2.0.2
openpyxl: 2.4.1
xlrd: 1.0.0
xlwt: 1.2.0
xlsxwriter: 0.9.6
lxml: 3.7.2
bs4: 4.5.3
html5lib: None
sqlalchemy: 1.1.5
pymysql: None
psycopg2: None
jinja2: 2.9.4
s3fs: None
pandas_gbq: None
pandas_datareader: None

@jreback jreback added the Error Reporting and IO CSV labels May 12, 2017
@jreback
Contributor

jreback commented May 12, 2017

Yeah, this could be improved; it's always helpful to have unambiguous error messages. Pull requests welcome!

@mjlove12
Contributor

@jreback I submitted a pull request for this that raises an error very similar to the one @pierre-haessig suggests, but it looks like the Travis CI build failed. I'm not entirely sure how to proceed. It's my first PR, so apologies in advance if the process is a little painful.

@jaypeedevlin

@mjlove12

According to this test failure, the command "ci/lint.sh" exited with 1. If you don't know about linting, it checks that your code follows the project's layout and style conventions. I would make sure your code conforms to PEP 8, which will probably take care of this.

The second failure is an actual test failure. You can view the output here; it seems to be a plotting failure.

@mjlove12
Contributor

Thanks for the help! I'll take a look at those and see what needs to be done.

@smcinerney

smcinerney commented Feb 11, 2020

FYI this now gives ValueError: header must be integer or list of integers. Great.


5 participants