
Unhelpful error message when loading a single column with read_csv and usecols #20529


Closed
mattmotoki opened this issue Mar 29, 2018 · 10 comments · Fixed by #20558
Labels
Error Reporting (Incorrect or improved errors from pandas), IO CSV (read_csv, to_csv)

Comments

@mattmotoki

Code

>>> import pandas as pd
>>> df = pd.DataFrame({'x': [0,1], 'x1': [2,3]})
>>> df.to_csv('tmp.csv', index=False)
>>> pd.read_csv('tmp.csv', usecols='x')
   x
0  0
1  1
>>> pd.read_csv('tmp.csv', usecols=['x1'])
   x1
0   2
1   3
>>> pd.read_csv('tmp.csv', usecols='x1')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/matt/anaconda3/lib/python3.6/site-packages/pandas/io/parsers.py", line 709, in parser_f
    return _read(filepath_or_buffer, kwds)
  File "/home/matt/anaconda3/lib/python3.6/site-packages/pandas/io/parsers.py", line 449, in _read
    parser = TextFileReader(filepath_or_buffer, **kwds)
  File "/home/matt/anaconda3/lib/python3.6/site-packages/pandas/io/parsers.py", line 818, in __init__
    self._make_engine(self.engine)
  File "/home/matt/anaconda3/lib/python3.6/site-packages/pandas/io/parsers.py", line 1049, in _make_engine
    self._engine = CParserWrapper(self.f, **self.options)
  File "/home/matt/anaconda3/lib/python3.6/site-packages/pandas/io/parsers.py", line 1740, in __init__
    raise ValueError("Usecols do not match names.")
ValueError: Usecols do not match names.

Problem description

When using usecols to load a single column, the column name must either be a single character or be wrapped in an array-like object. In the example above, pd.read_csv('tmp.csv', usecols='x') and pd.read_csv('tmp.csv', usecols=['x1']) work as expected; however, pd.read_csv('tmp.csv', usecols='x1') fails. The resulting error message, ValueError: Usecols do not match names., is not very helpful either.

Expected Output

It would be nice if there were some type checking done on usecols so that things don't break in the example above. At the very least, the error message should be a bit more helpful, e.g., ValueError: Usecols should be array-like.
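A minimal sketch of the kind of type check requested above. It assumes a hypothetical helper named validate_usecols, which is not part of pandas and is not necessarily what the eventual fix does. The helper rejects a bare string up front instead of letting it be iterated character by character.

```python
from collections.abc import Iterable


def validate_usecols(usecols):
    """Hypothetical check (not pandas API): refuse a bare string so it is
    never treated as a sequence of single-character column names."""
    if usecols is None or callable(usecols):
        # None (read every column) and callables are passed through unchanged.
        return usecols
    if isinstance(usecols, (str, bytes)):
        raise ValueError(
            "usecols must be list-like; got a string. "
            "Wrap a single column name in a list, e.g. usecols=['x1']."
        )
    if not isinstance(usecols, Iterable):
        raise ValueError("usecols must be list-like or a callable.")
    return list(usecols)


print(validate_usecols(['x1']))  # ['x1']
try:
    validate_usecols('x1')
except ValueError as exc:
    print(exc)  # usecols must be list-like; got a string. ...
```

Checking for str before the generic Iterable test matters here, because a Python string is itself iterable and would otherwise slip through.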

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.6.4.final.0
python-bits: 64
OS: Linux
OS-release: 4.13.0-37-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.22.0
pytest: None
pip: 9.0.1
setuptools: 39.0.1
Cython: None
numpy: 1.14.2
scipy: 1.0.0
pyarrow: None
xarray: None
IPython: 6.2.1
sphinx: 1.7.1
patsy: None
dateutil: 2.7.1
pytz: 2018.3
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: 2.2.2
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: 0.9999999
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None

@TomAugspurger
Contributor

The error message is improved on master:

ValueError: Usecols do not match columns, columns expected but not found: ['1']

I'm trying to figure out what's going on, I think this is just buggy.

> It would be nice if there were some type checking done on usecols so that things don't break in the example above.

Yeah, we should ensure that usecols is not a string, right?

cc @gfyoung do you have thoughts here?
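The ['1'] in the improved message is what you get if the string is iterated like a list. Below is a rough illustration of that idea. It uses a hypothetical missing_usecols helper, and it is not pandas' actual internal code. The helper lists the requested names that are absent from the header of the tmp.csv written in the original report.

```python
def missing_usecols(usecols, names):
    """Hypothetical helper: list the requested columns absent from the header.

    A bare string is iterated character by character, so usecols='x1'
    requests the columns 'x' and '1', not the single column 'x1'.
    """
    return [col for col in usecols if col not in names]


header = ['x', 'x1']  # header of tmp.csv from the original report

print(missing_usecols('x1', header))    # ['1']  ('x' is found, '1' is not)
print(missing_usecols(['x1'], header))  # []     (the list form matches)
print(missing_usecols('x', header))     # []     (single character: works by coincidence)
```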

@TomAugspurger TomAugspurger added IO Data IO issues that don't fit into a more specific label IO CSV read_csv, to_csv labels Mar 29, 2018
@gfyoung
Member

gfyoung commented Mar 29, 2018

> Yeah, we should ensure that usecols is not a string, right?

Agreed. This "array-like" should have been disallowed.

@gfyoung gfyoung added Bug and removed IO Data IO issues that don't fit into a more specific label labels Mar 29, 2018
@mattmotoki
Author

mattmotoki commented Mar 30, 2018

> I'm trying to figure out what's going on, I think this is just buggy.

@TomAugspurger It looks like pandas is iterating through the characters in the string. In particular, if usecols='x1', then it's first looking for the column 'x' then it's looking for the column '1'.

>>> import pandas as pd
>>> df = pd.DataFrame({'x': [0,1], '1':[2,3], 'x1': [4,5]})
>>> df.to_csv('tmp.csv', index=False)
>>> pd.read_csv('tmp.csv', usecols='x1')
   1  x
0  2  0
1  3  1
>>> 
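The character-by-character behaviour described above can be reproduced in plain Python without pandas. This sketch assumes only that the parser accepts any iterable usecols as a collection of column names. It does not reproduce pandas' internals, and it says nothing about the column order pandas returns.

```python
from collections.abc import Iterable

usecols = 'x1'
header = ['x', '1', 'x1']  # header of the second tmp.csv above

# A str passes a generic "is it iterable?" test...
print(isinstance(usecols, Iterable))  # True
# ...and iterating it yields one-character strings.
print(list(usecols))  # ['x', '1']

# Keeping the header columns named in usecols therefore selects 'x' and '1',
# while the intended column 'x1' is never requested.
wanted = set(usecols)
selected = [name for name in header if name in wanted]
print(selected)  # ['x', '1']

# A single-character name only works by coincidence: list('x') == ['x'].
print(list('x'))  # ['x']
```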

@minggli
Contributor

minggli commented Mar 30, 2018

Happy to work on this issue, if @mattmotoki is happy to let me :)

@mattmotoki
Author

@minggli I'd be more than happy if you worked on this, but I don't think I have any authority on that.

@gfyoung
Member

gfyoung commented Mar 30, 2018

@mattmotoki: Actually, you do, since it was your issue 😄. @minggli, go for it!

@jreback jreback added this to the Next Major Release milestone Mar 30, 2018
@jreback jreback added Error Reporting Incorrect or improved errors from pandas Difficulty Intermediate and removed Bug labels Mar 30, 2018
@jreback jreback modified the milestones: Next Major Release, 0.23.0 Mar 30, 2018
@mattmotoki
Author

@gfyoung @minggli Oops, sorry about that; I'm new to the open source community.
Is there anything else that I need to do to keep this process running smoothly?

@gfyoung
Member

gfyoung commented Mar 30, 2018

Nope, you're good! Though feel free to check out #20558, which will fix your issue.

@minggli
Contributor

minggli commented Mar 30, 2018

@mattmotoki welcome. :}

@mattmotoki
Author

@gfyoung @minggli Great work guys! I checked out #20558 and I'd be happy to close this issue whenever it's okay to do so.

5 participants