unhelpful error message when header is a list of names in read_csv #16338

pierre-haessig · 2017-05-12T10:00:50Z

This is a minor issue about error reporting to the mindless user (me...) who confuses the header and names arguments of read_csv. Basically, when calling read_csv with header=['a', 'b'] (whereas it should be names=['a', 'b']), the error message is cryptic:

TypeError: must be str, not int

(pandas 0.20.1, see details below)

Two issues:

- The message is unhelpful and quite cryptic, and doesn't point in the right direction. E.g. it doesn't explain which argument causes the problem. Of course in the dummy example below, there is just one argument, but in the real case where I got bitten it was messier...
- It is impossible to debug with the %debug magic, because the error is raised in the compiled code of parsers.pyx.

Here is code to reproduce the error message, taken from an IPython session. (The first line may be a bit Unix-specific, sorry. It's just there to create a dummy CSV file.)

In [] !echo '1,2\n3,4' > 1234.csv

In [] pd.read_csv('1234.csv')
	1 	2
0 	3 	4

In [] pd.read_csv('1234.csv', names=['a', 'b']) # proper call
 	a 	b
0 	1 	2
1 	3 	4

In [] pd.read_csv('1234.csv', header=['a', 'b']) # beginner's mistake

TypeError                                 Traceback (most recent call last)
<ipython-input-5-b065bd1f57c6> in <module>()
----> 1 pd.read_csv('1234.csv', header=['a', 'b'])

/home/pierre/Programmes/anaconda3/lib/python3.6/site-packages/pandas/io/parsers.py in parser_f(filepath_or_buffer, sep, delimiter, header, names, index_col, usecols, squeeze, prefix, mangle_dupe_cols, dtype, engine, converters, true_values, false_values, skipinitialspace, skiprows, nrows, na_values, keep_default_na, na_filter, verbose, skip_blank_lines, parse_dates, infer_datetime_format, keep_date_col, date_parser, dayfirst, iterator, chunksize, compression, thousands, decimal, lineterminator, quotechar, quoting, escapechar, comment, encoding, dialect, tupleize_cols, error_bad_lines, warn_bad_lines, skipfooter, skip_footer, doublequote, delim_whitespace, as_recarray, compact_ints, use_unsigned, low_memory, buffer_lines, memory_map, float_precision)
    653                     skip_blank_lines=skip_blank_lines)
    654 
--> 655         return _read(filepath_or_buffer, kwds)
    656 
    657     parser_f.__name__ = name

/home/pierre/Programmes/anaconda3/lib/python3.6/site-packages/pandas/io/parsers.py in _read(filepath_or_buffer, kwds)
    403 
    404     # Create the parser.
--> 405     parser = TextFileReader(filepath_or_buffer, **kwds)
    406 
    407     if chunksize or iterator:

/home/pierre/Programmes/anaconda3/lib/python3.6/site-packages/pandas/io/parsers.py in __init__(self, f, engine, **kwds)
    760             self.options['has_index_names'] = kwds['has_index_names']
    761 
--> 762         self._make_engine(self.engine)
    763 
    764     def close(self):

/home/pierre/Programmes/anaconda3/lib/python3.6/site-packages/pandas/io/parsers.py in _make_engine(self, engine)
    964     def _make_engine(self, engine='c'):
    965         if engine == 'c':
--> 966             self._engine = CParserWrapper(self.f, **self.options)
    967         else:
    968             if engine == 'python':

/home/pierre/Programmes/anaconda3/lib/python3.6/site-packages/pandas/io/parsers.py in __init__(self, src, **kwds)
   1580         kwds['allow_leading_cols'] = self.index_col is not False
   1581 
-> 1582         self._reader = parsers.TextReader(src, **kwds)
   1583 
   1584         # XXX

pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader.__cinit__ (pandas/_libs/parsers.c:5996)()

TypeError: must be str, not int
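For anyone not on a Unix shell, the dummy file from the first line of the session can also be created in plain Python. This is only a sketch; the helper name `write_dummy_csv` is hypothetical and not part of pandas.

```python
from pathlib import Path


def write_dummy_csv(path="1234.csv"):
    """Hypothetical helper: write the two-row CSV used in the session above."""
    path = Path(path)
    # Same content as the shell one-liner: two rows, two columns, no header row.
    path.write_text("1,2\n3,4\n", encoding="utf-8")
    return path
```

Calling `write_dummy_csv()` once before the `pd.read_csv('1234.csv', ...)` lines should leave the rest of the session unchanged.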

Expected Output

I'm not expecting a fancy AI-assistant-like error message. However, an early check of the header argument should verify, consistent with the docstring, that header is an int or a list of ints.
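To make the idea concrete, here is a minimal sketch of such an early check. It is not the code pandas uses: the function name `check_header_arg` and the wording of the message are assumptions for illustration, and the accepted values ('infer', None, an int, or a list of ints) are taken from the docstring.

```python
from numbers import Integral


def check_header_arg(header):
    """Hypothetical early check: accept 'infer', None, an int, or a list of ints."""
    # 'infer' is the documented default; None means the file has no header row.
    if header is None or header == "infer":
        return
    # Treat a single value and a list of values uniformly.
    items = header if isinstance(header, (list, tuple)) else [header]
    for item in items:
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(item, bool) or not isinstance(item, Integral):
            raise ValueError(
                "header must be an integer or a list of integers, got "
                f"{header!r}; to set column names, use the names argument"
            )
```

Run before the arguments reach the compiled parser, a check like this would raise from pure Python code, so the failing argument is named and the frame can be inspected with %debug.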

What do you think? Is it overkill?

Output of `pd.show_versions()`

INSTALLED VERSIONS
------------------
commit: None
python: 3.6.1.final.0
python-bits: 64
OS: Linux
OS-release: 4.9.0-2-amd64
machine: x86_64
processor:
byteorder: little
LC_ALL: None
LANG: fr_FR.utf8
LOCALE: fr_FR.UTF-8

pandas: 0.20.1
pytest: 3.0.5
pip: 9.0.1
setuptools: 27.2.0
Cython: 0.25.2
numpy: 1.12.1
scipy: 0.19.0
xarray: None
IPython: 6.0.0
sphinx: 1.5.1
patsy: 0.4.1
dateutil: 2.6.0
pytz: 2016.10
blosc: None
bottleneck: 1.2.0
tables: 3.3.0
numexpr: 2.6.2
feather: None
matplotlib: 2.0.2
openpyxl: 2.4.1
xlrd: 1.0.0
xlwt: 1.2.0
xlsxwriter: 0.9.6
lxml: 3.7.2
bs4: 4.5.3
html5lib: None
sqlalchemy: 1.1.5
pymysql: None
psycopg2: None
jinja2: 2.9.4
s3fs: None
pandas_gbq: None
pandas_datareader: None


jreback · 2017-05-12T10:08:44Z

yeah this could be improved, always helpful to have unambiguous error messages. pull-requests welcome!

mjlove12 · 2017-05-14T13:36:49Z

@jreback I submitted a pull-request for this that raises an error very similar to the one @pierre-haessig suggests, but it looks like the Travis CI Build failed. Not entirely sure how to proceed. It's my first PR so apologies in advance if the process is a little painful.

jaypeedevlin · 2017-05-15T22:57:28Z

@mjlove12

According to this test failure, the command "ci/lint.sh" exited with 1. If you don't know about linting, it checks that the layout of your code is correct. I would make sure you conform with PEP8, which will probably take care of this.

The second failure is more of an actual test failure. You can view the output here; it seems to be a plot failure.

mjlove12 · 2017-05-15T23:08:39Z

Thanks for the help! I'll take a look at those and see what needs to be done.

smcinerney · 2020-02-11T00:45:23Z

FYI this now gives ValueError: header must be integer or list of integers. Great.

jreback added the Error Reporting (Incorrect or improved errors from pandas) and IO CSV (read_csv, to_csv) labels May 12, 2017

jreback added Difficulty Novice labels May 12, 2017

jreback added this to the Next Major Release milestone May 12, 2017

mjlove12 mentioned this issue May 14, 2017

ENH: Improve error message for header argument containing non int types. GH16338 #16351

Merged

4 tasks

TomAugspurger closed this as completed in #16351 May 24, 2017
