Unhelpful error message when loading a single column with read_csv and usecols
#20529
Comments
The error message is improved on master:
I'm trying to figure out what's going on; I think this is just buggy.
Yeah, we should ensure that cc @gfyoung do you have thoughts here?
Agreed. This "array-like" should have been disallowed.
@TomAugspurger It looks like pandas is iterating through the characters in the string. In particular, if
>>> import pandas as pd
>>> df = pd.DataFrame({'x': [0, 1], '1': [2, 3], 'x1': [4, 5]})
>>> df.to_csv('tmp.csv', index=False)
>>> pd.read_csv('tmp.csv', usecols='x1')
   1  x
0  2  0
1  3  1
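To make the character-by-character iteration concrete, here is a minimal sketch in plain Python. It does not call pandas at all; the variable names are made up for illustration, and it only shows why a bare string looks like two column names to code that accepts "any iterable of names".

```python
# A str is itself an iterable of one-character strings, so code that
# simply iterates over usecols sees two names here, not one.
usecols = "x1"
names_seen_by_iteration = list(usecols)
print(names_seen_by_iteration)  # ['x', '1']

# Wrapping the name in a list keeps it intact as a single column name.
names_from_list = list(["x1"])
print(names_from_list)  # ['x1']
```

This matches the output above: with columns named x and 1 present in the file, the string 'x1' ends up selecting both of them instead of the column x1.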
Happy to work on this issue, if @mattmotoki is happy to let me :)
@minggli I'd be more than happy if you worked on this, but I don't think I have any authority on that.
@mattmotoki : Actually, you do, since it was your issue 😄 . @minggli go for it!
Nope, you're good! Though feel free to check out #20558, which will fix your issue.
@mattmotoki welcome. :}
Code
Problem description
When using usecols to load a single column, one needs to have either a single-character column name or provide an array-like object. In the example above, pd.read_csv('tmp.csv', usecols='x') and pd.read_csv('tmp.csv', usecols=['x1']) work as expected; however, things break down for pd.read_csv('tmp.csv', usecols='x1'). The corresponding error message, "ValueError: Usecols do not match names.", is not very helpful either.
Expected Output
It would be nice if there were some type checking done on usecols so that things don't break in the example above. At the least, the error message should be a bit more helpful; e.g., "ValueError: Usecols should be array-like."
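For what it's worth, the kind of type check being asked for could look roughly like the sketch below. This is not pandas code: validate_usecols is a hypothetical helper invented for illustration, and the error text is just the wording suggested in this issue. It assumes that None and callables should pass through untouched and that only bare strings (or non-iterables) should be rejected.

```python
from collections.abc import Iterable


def validate_usecols(usecols):
    """Hypothetical check: reject a bare string instead of iterating over it."""
    # None (load everything) and callables are passed through unchanged.
    if usecols is None or callable(usecols):
        return usecols
    # A str/bytes is iterable, but iterating it yields single characters,
    # which is almost never what the caller meant.
    if isinstance(usecols, (str, bytes)) or not isinstance(usecols, Iterable):
        raise ValueError("Usecols should be array-like.")
    return list(usecols)


print(validate_usecols(["x1"]))  # ['x1']
try:
    validate_usecols("x1")
except ValueError as exc:
    print(exc)  # Usecols should be array-like.
```

Failing early like this means the caller sees the problem with the argument itself, rather than a later complaint about names that don't match.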
Output of pd.show_versions()
INSTALLED VERSIONS
commit: None
python: 3.6.4.final.0
python-bits: 64
OS: Linux
OS-release: 4.13.0-37-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
pandas: 0.22.0
pytest: None
pip: 9.0.1
setuptools: 39.0.1
Cython: None
numpy: 1.14.2
scipy: 1.0.0
pyarrow: None
xarray: None
IPython: 6.2.1
sphinx: 1.7.1
patsy: None
dateutil: 2.7.1
pytz: 2018.3
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: 2.2.2
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: 0.9999999
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None