BUG: `dict_keys` cannot be used as `pd.read_csv`'s `names` parameter #36928

abmyii · 2020-10-06T21:59:28Z

I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of pandas.
(optional) I have confirmed this bug exists on the master branch of pandas.

Code Sample, a copy-pastable example

test.csv:

a,b
1,2

import pandas as pd
data = {'a': 10, 'b': 20}
pd.read_csv('test.csv', names=data.keys())

Error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.8/dist-packages/pandas/io/parsers.py", line 691, in read_csv
    return _read(filepath_or_buffer, kwds)
  File "/usr/local/lib/python3.8/dist-packages/pandas/io/parsers.py", line 451, in _read
    _validate_names(kwds.get("names", None))
  File "/usr/local/lib/python3.8/dist-packages/pandas/io/parsers.py", line 424, in _validate_names
    raise ValueError("Names should be an ordered collection.")
ValueError: Names should be an ordered collection.

Problem description

When a dict_keys object is passed, the keys aren't used as names. This is because dict_keys is a set-like view, so it fails the list-like check at line 423 of pandas/io/parsers.py (commit 4e55346), `if not is_list_like(names, allow_sets=False):`, and it is also not indexable. The error is therefore understandable, but it would be nice to be able to pass dict_keys without issue.
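
A minimal sketch of why the check rejects the view, using only the standard library (the variable names are illustrative, and this is not pandas' own code). It shows that a dict_keys object satisfies the `Set` interface and is not a `Sequence`, and that materializing it into a list gives an ordered, indexable collection:

```python
from collections import abc

data = {"a": 10, "b": 20}
keys = data.keys()

# dict_keys implements the abstract Set interface, so a check that
# excludes sets (as allow_sets=False does) also excludes it.
is_set = isinstance(keys, abc.Set)

# It is not a Sequence either: it cannot be indexed by position.
is_sequence = isinstance(keys, abc.Sequence)

# Possible workaround: materialize the keys into a list first.
names = list(keys)
```

Passing `names=list(data.keys())` (or simply `names=list(data)`) to `pd.read_csv` should avoid the error in the meantime, since a list passes the list-like check.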

Expected Output

Passing the dict keys to names shouldn't fail.

Output of `pd.show_versions()`

INSTALLED VERSIONS

commit : 4e55346
python : 3.8.3.final.0
python-bits : 64
OS : Linux
OS-release : 4.15.0-118-generic
Version : #119-Ubuntu SMP Tue Sep 8 12:30:01 UTC 2020
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : None
LOCALE : en_US.UTF-8

pandas : 1.2.0.dev0+632.g4e553464f
numpy : 1.18.2
pytz : 2019.3
dateutil : 2.8.1
pip : 20.1.1
setuptools : 46.1.3
Cython : None
pytest : None
hypothesis : None
sphinx : 1.6.7
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : 0.999999999
pymysql : None
psycopg2 : None
jinja2 : 2.10
IPython : None
pandas_datareader: None
bs4 : 4.6.0
bottleneck : None
fsspec : None
fastparquet : None
gcsfs : None
matplotlib : 3.2.1
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pytables : None
pyxlsb : None
s3fs : None
scipy : 1.4.1
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
xlwt : None
numba : None

dsaxton · 2020-10-07T00:42:51Z

Thanks @abmyii, probably we should be making an exception for ordered sets here. PR welcome.

ref #34956
cc @MJafarMashhadi

abmyii · 2020-10-07T08:15:47Z

> probably we should be making an exception for ordered sets here. PR welcome.

PR submitted, but I don't know of a type that would cover all ordered sets, so I simply used abc.KeysView, which fixes the immediate issue. Is that acceptable, or does more work need to be done?
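
For illustration, the idea could be sketched roughly as below. This is a standalone approximation, not the code from the PR: `is_ordered_collection` and `validate_names_sketch` are hypothetical names, and the list-like test is a simplified stand-in for pandas' `is_list_like(names, allow_sets=False)`.

```python
from collections import abc


def is_ordered_collection(names):
    """Hypothetical stand-in for the list-like check, with a KeysView exception."""
    # Exception for dict key views: they are set-like but keep insertion order.
    if isinstance(names, abc.KeysView):
        return True
    # Strings and bytes are iterable but are not collections of names.
    if isinstance(names, (str, bytes)):
        return False
    # Otherwise require an iterable that is not a set.
    return isinstance(names, abc.Iterable) and not isinstance(names, abc.Set)


def validate_names_sketch(names):
    """Raise the same error message as in the traceback for unordered input."""
    if names is not None and not is_ordered_collection(names):
        raise ValueError("Names should be an ordered collection.")


validate_names_sketch({"a": 10, "b": 20}.keys())  # accepted
validate_names_sketch(["a", "b"])                 # accepted
```

A plain `set` would still be rejected under this sketch, which keeps the original intent of the check.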

abmyii added the Bug and Needs Triage labels Oct 6, 2020

abmyii changed the title ~~BUG: dict.keys() cannot be used as pd.read_csv's names parameter~~ BUG: dict_keys cannot be used as pd.read_csv's names parameter Oct 6, 2020

dsaxton added the IO CSV and Regression labels and removed the Bug and Needs Triage labels Oct 7, 2020

dsaxton added this to the 1.1.4 milestone Oct 7, 2020

abmyii mentioned this issue Oct 7, 2020

BUG: GH36928 Allow dict_keys to be used as column names by read_csv #36937 (merged)

jreback closed this as completed in #36937 Oct 10, 2020
