BUG: to_datetime throws TypeError: unhashable type: 'list' even with errors='ignore' #39756

Closed
adamerose opened this issue Feb 11, 2021 · 6 comments · Fixed by #40414
Labels
Bug, Datetime

@adamerose
adamerose commented Feb 11, 2021

I'm getting this error when I try converting certain Series to datetime using pd.to_datetime, even though I pass errors='ignore'

_ = pd.to_datetime(pd.Series([['a']]*50), errors='ignore')  # works
_ = pd.to_datetime(pd.Series([['a']]*51), errors='ignore')  # error
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Python38\lib\site-packages\pandas\core\tools\datetimes.py", line 801, in to_datetime
    cache_array = _maybe_cache(arg, format, cache, convert_listlike)
  File "C:\Python38\lib\site-packages\pandas\core\tools\datetimes.py", line 173, in _maybe_cache
    if not should_cache(arg):
  File "C:\Python38\lib\site-packages\pandas\core\tools\datetimes.py", line 137, in should_cache
    unique_elements = set(islice(arg, check_count))
TypeError: unhashable type: 'list'
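
A minimal sketch of why the length matters, written without pandas. Only the `set(islice(arg, check_count))` line is taken from the traceback above; the function name `should_cache_sketch`, the threshold of 50 (inferred from the 50 vs. 51 reproducer), the sample size, and the 0.7 ratio are assumptions made for illustration, not the library's confirmed logic.

```python
from itertools import islice


def should_cache_sketch(arg, start_caching_at=50, unique_share=0.7):
    """Simplified stand-in for the uniqueness check seen in the traceback."""
    # Assumption: short inputs skip the check entirely, which would explain
    # why 50 elements work while 51 elements fail.
    if len(arg) <= start_caching_at:
        return False
    # Hypothetical sample size: inspect roughly the first tenth of the input.
    check_count = max(1, len(arg) // 10)
    # Same expression as in the traceback. Building a set hashes every
    # sampled element, so a list element raises TypeError at this point,
    # before the errors= argument is ever consulted.
    unique_elements = set(islice(arg, check_count))
    # Cache only when the sample contains enough repeated values.
    return len(unique_elements) <= check_count * unique_share


print(should_cache_sketch([["a"]] * 50))  # short input: check is skipped
try:
    should_cache_sketch([["a"]] * 51)
except TypeError as exc:
    print("TypeError:", exc)
```

If the real check behaves like this sketch, the error is raised while deciding whether to cache, which is why `errors='ignore'` has no effect on it.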
@adamerose added the Bug and Needs Triage labels Feb 11, 2021
@MarcoGorelli
Member

Thanks @adamerose - could you please upload your data in a format other than pickle?


https://docs.python.org/3/library/pickle.html :

Warning

The pickle module is not secure. Only unpickle data you trust.

It is possible to construct malicious pickle data which will execute arbitrary code during unpickling. Never unpickle data that could have come from an untrusted source, or that could have been tampered with.

Consider signing data with hmac if you need to ensure that it has not been tampered with.

Safer serialization formats such as json may be more appropriate if you are processing untrusted data. See Comparison with json.
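
As a rough illustration of the suggestion above, data made of lists and strings can be shared as JSON text and loaded back without executing any code. The sample values here are made up for the example and are not the reporter's data.

```python
import json

# Hypothetical sample resembling a column whose cells are lists.
values = [["a"], ["b", "c"], None]

# Serialize to plain text that is safe to post and inspect.
payload = json.dumps(values)

# Loading JSON only builds basic Python objects (lists, strings, None).
restored = json.loads(payload)
print(restored == values)
```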

@MarcoGorelli added the Needs Info label Feb 12, 2021
@bjornjorgensen

Hi, you can load it from my ArangoDB server:

import pandas as pd
import json
import re
from pandasgui import show
from arango import ArangoClient

client = ArangoClient(hosts='http://bjhjemme.duckdns.org:8529')

db = client.db('test_2015-2020', username='test', password='test')

aql = db.aql

cursor = db.aql.execute('FOR c IN FORM_3 RETURN c', batch_size=1)
result = [c for c in cursor]

df = pd.json_normalize(result, max_level=None)

df = df.drop(columns=['_key', '_id', '_rev'])
df = df.dropna(axis=1, how='all')
show(df)  # show() comes from pandasgui

You can browse the files at http://bjhjemme.duckdns.org:8529; log in with username test and password test.

@adamerose
Author

This reproduces it with no other packages needed:

import pandas as pd
import json
s_raw = pd.read_csv('https://github.com/adamerose/temp/raw/main/test.csv')['test']
s = s_raw.apply(lambda x: json.loads(x.replace('\'', '"')) if isinstance(x, str) else x)

# these work
x = pd.to_datetime(s[:35], errors='ignore')
x = pd.to_datetime(s[25:], errors='ignore')
# this doesn't
x = pd.to_datetime(s, errors='ignore')

@MarcoGorelli
Member

MarcoGorelli commented Feb 21, 2021

Thanks @adamerose

simpler reproducer:

>>> _ = pd.to_datetime(pd.Series([['a']]*50), errors='ignore')  # works
>>> _ = pd.to_datetime(pd.Series([['a']]*51), errors='ignore')  # error
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/marco/pandas-marco/pandas/core/tools/datetimes.py", line 814, in to_datetime
    cache_array = _maybe_cache(arg, format, cache, convert_listlike)
  File "/home/marco/pandas-marco/pandas/core/tools/datetimes.py", line 186, in _maybe_cache
    if not should_cache(arg):
  File "/home/marco/pandas-marco/pandas/core/tools/datetimes.py", line 150, in should_cache
    unique_elements = set(islice(arg, check_count))
TypeError: unhashable type: 'list'
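
One way such a check could be guarded is to treat unhashable elements as "do not cache" instead of letting the TypeError escape. The helper name `safe_unique_sample` is hypothetical, and this sketch is not necessarily how the linked fix (#40414) resolves the issue.

```python
from itertools import islice


def safe_unique_sample(arg, check_count):
    """Return the set of sampled elements, or None if any is unhashable."""
    try:
        return set(islice(arg, check_count))
    except TypeError:
        # Unhashable elements (lists, dicts) cannot be placed in a set;
        # a caller can interpret None as "skip the cache".
        return None


print(safe_unique_sample([["a"]] * 51, 5))
print(safe_unique_sample(["a", "b", "a"], 3))
```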

@MarcoGorelli removed the Needs Info label Feb 21, 2021
@rhshadrach
Member

@adamerose - I've removed the link to your pickle file in the OP. In general, one should not load pickles unless the source is trusted, as they can execute arbitrary code.

@adamerose
Author

@rhshadrach Understood, no problem. I only resorted to posting that because I initially couldn't figure out a way to reproduce it programmatically. Thanks for providing that @MarcoGorelli, I'll edit that into the OP.

@lithomas1 added the Datetime label and removed the Needs Triage label Mar 7, 2021
@lithomas1 added this to the Contributions Welcome milestone Mar 7, 2021
@jreback modified the milestones: Contributions Welcome, 1.3 Mar 14, 2021
6 participants