read_csv: if parse_dates columns don't appear in usecols, we get a traceback #31251

teto · 2020-01-23T16:41:05Z

Code Sample, a copy-pastable example if possible

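import io

import pandas as pd

# Two-column CSV; only 'val' is selected through usecols below.
content = io.StringIO('''
time,val
212.23, 32
''')

# 'time' is requested for date parsing but is excluded by usecols.
date_cols = ['time']

df = pd.read_csv(
    content,
    sep=',',
    usecols=['val'],
    dtype={'val': int},
    parse_dates=date_cols,
)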

Problem description

Running the sample above triggers:

Traceback (most recent call last):
  File "test.py", line 16, in <module>
    parse_dates=date_cols,
  File "/nix/store/k4fd48jzsyafvcifa6wi6pk4vaprnw36-python3.7-pandas-0.25.3/lib/python3.7/site-packages/pandas/io/parsers.py", line 685, in parser_f
    return _read(filepath_or_buffer, kwds)
  File "/nix/store/k4fd48jzsyafvcifa6wi6pk4vaprnw36-python3.7-pandas-0.25.3/lib/python3.7/site-packages/pandas/io/parsers.py", line 463, in _read
    data = parser.read(nrows)
  File "/nix/store/k4fd48jzsyafvcifa6wi6pk4vaprnw36-python3.7-pandas-0.25.3/lib/python3.7/site-packages/pandas/io/parsers.py", line 1154, in read
    ret = self._engine.read(nrows)
  File "/nix/store/k4fd48jzsyafvcifa6wi6pk4vaprnw36-python3.7-pandas-0.25.3/lib/python3.7/site-packages/pandas/io/parsers.py", line 2134, in read
    names, data = self._do_date_conversions(names, data)
  File "/nix/store/k4fd48jzsyafvcifa6wi6pk4vaprnw36-python3.7-pandas-0.25.3/lib/python3.7/site-packages/pandas/io/parsers.py", line 1885, in _do_date_conversions
    keep_date_col=self.keep_date_col,
  File "/nix/store/k4fd48jzsyafvcifa6wi6pk4vaprnw36-python3.7-pandas-0.25.3/lib/python3.7/site-packages/pandas/io/parsers.py", line 3335, in _process_date_conversion
    data_dict[colspec] = converter(data_dict[colspec])
KeyError: 'time'

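That is, if parse_dates names a column that does not appear in usecols, read_csv fails with a bare KeyError that does not point at the cause.

Either the parse_dates columns should be added to usecols automatically, or the documentation should state that they must be listed there. An explicit check could also make the error more understandable.

Version: pandas 0.25.3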
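
The kind of explicit check suggested above can be sketched on the caller's side, before read_csv is called. This is only an illustration, not what pandas does internally: the helper name `missing_parse_dates` is hypothetical, and it assumes usecols is a list of column labels (not positions or a callable) and parse_dates is a list of names or a list of name groups.

```python
def missing_parse_dates(parse_dates, usecols):
    """Return the parse_dates column names that are absent from usecols.

    Hypothetical helper: assumes label-based usecols and a parse_dates
    value that is a bool, None, a list of names, or a list of name groups.
    """
    # Nothing to check when all columns are read or no names are given.
    if usecols is None or parse_dates is None or isinstance(parse_dates, bool):
        return []

    selected = set(usecols)
    missing = []
    for group in parse_dates:
        # A group such as ['date', 'time'] combines several columns.
        names = group if isinstance(group, (list, tuple)) else [group]
        for name in names:
            if name not in selected and name not in missing:
                missing.append(name)
    return missing


absent = missing_parse_dates(["time"], ["val"])
if absent:
    print(f"parse_dates columns not in usecols: {absent}")
```

A caller could raise a ValueError with this message instead of printing it, or extend usecols with the missing names before calling read_csv.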

murphybrendan · 2020-01-24T01:16:21Z

I think this might have been fixed. I'm not able to reproduce that error in 1.0.0rc.

mroeschke · 2020-01-24T07:00:10Z

Looks fixed on master as well. This could use a regression test, since the fix isn't explicitly mentioned in the whatsnew notes for 1.0.0rc.
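
A regression test could start from a probe like the one below. It is a rough sketch: it only records what the installed pandas does with the reported input, and the function name `read_with_missing_date_column` is made up for illustration. Which outcome a real test should pin (a clear exception or a successful parse) is an assumption left open here, since reports in this thread differ between versions.

```python
import io

import pandas as pd

# Same shape as the reported input: two columns, only 'val' is selected.
CSV_TEXT = "time,val\n212.23,32\n"


def read_with_missing_date_column():
    """Return 'parsed', or the name of the exception class that was raised."""
    try:
        pd.read_csv(
            io.StringIO(CSV_TEXT),
            usecols=["val"],
            dtype={"val": int},
            parse_dates=["time"],  # not listed in usecols
        )
    except (KeyError, ValueError) as exc:
        return type(exc).__name__
    return "parsed"


outcome = read_with_missing_date_column()
print(pd.__version__, outcome)
```

On pandas 0.25.3 the traceback in the report corresponds to the "KeyError" outcome. A test suite would replace the print with an assertion on the outcome it wants to guarantee.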

ShaharNaveh · 2020-01-24T19:26:27Z

I am getting the same error as well. (Python 3.8.1)

INSTALLED VERSIONS

commit : aad377d
python : 3.8.1.final.0
python-bits : 64
OS : Linux
OS-release : 5.4.13.a-1-hardened
machine : x86_64
processor :
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8

pandas : 1.0.0rc0+190.gaad377dd7
numpy : 1.18.1
pytz : 2019.3
dateutil : 2.8.1
pip : 20.0.1
setuptools : 45.1.0
Cython : 0.29.14
pytest : 5.3.4
hypothesis : 5.3.0
sphinx : 2.3.1
blosc : None
feather : None
xlsxwriter : 1.2.7
lxml.etree : 4.4.2
html5lib : 1.0.1
pymysql : None
psycopg2 : None
jinja2 : 2.10.3
IPython : 7.11.1
pandas_datareader: None
bs4 : 4.8.2
bottleneck : 1.3.1
fastparquet : 0.3.2
gcsfs : None
lxml.etree : 4.4.2
matplotlib : 3.1.2
numexpr : 2.7.1
odfpy : None
openpyxl : 3.0.1
pandas_gbq : None
pyarrow : None
pytables : None
pytest : 5.3.4
pyxlsb : None
s3fs : 0.4.0
scipy : 1.4.1
sqlalchemy : 1.3.13
tables : 3.6.1
tabulate : 0.8.6
xarray : 0.14.1
xlrd : 1.2.0
xlwt : 1.3.0
xlsxwriter : 1.2.7
numba : 0.47.0

teto · 2020-01-24T19:31:08Z

Thanks for looking into it.
One thing I don't get in the API is why dates are handled differently from converters. Is it because they can span multiple columns? We could have multicolumn converters and drop the separate date handling altogether (I mention this just in case 1.0 is allowed to break the API).

sathyz · 2020-03-08T02:10:55Z

take.

mroeschke added the "good first issue" and "Needs Tests" (unit test(s) needed to prevent regressions) labels Jan 24, 2020

github-actions bot assigned ShaharNaveh Jan 24, 2020

ShaharNaveh removed their assignment Jan 24, 2020

sathyz added a commit to sathyz/pandas that referenced this issue Feb 1, 2020

BUG: parase_dates column is in dataframe (pandas-dev#31251)

cc64192

sathyz mentioned this issue Feb 1, 2020

BUG: parase_dates column is in dataframe (#31251) #31550

Closed

5 tasks

jschendel added this to the 1.1 milestone Feb 1, 2020

sathyz added a commit to sathyz/pandas that referenced this issue Feb 8, 2020

BUG: add parse_dates check in cparser (pandas-dev#31251)

9cd1a30

sathyz added a commit to sathyz/pandas that referenced this issue Feb 8, 2020

BUG: parase_dates column is in dataframe (pandas-dev#31251)

8225c8b

sathyz added a commit to sathyz/pandas that referenced this issue Feb 8, 2020

BUG: add parse_dates check in cparser (pandas-dev#31251)

63dc85e

sathyz mentioned this issue Feb 9, 2020

check parser_dates names in columns #31815

Closed

5 tasks

jbrockmendel added the "IO CSV" (read_csv, to_csv) label Feb 25, 2020

sathyz mentioned this issue Feb 28, 2020

BUG: parse_dates may have columns not in dataframe #32320

Merged

5 tasks

github-actions bot assigned ishanema1 Mar 5, 2020

ishanema1 removed their assignment Mar 5, 2020

WillAyd closed this as completed in #32320 Mar 17, 2020
