BUG: pd.util.hash_array fails on DatetimeIndex with tz specified #41817

Closed
3 tasks done
TheNeuralBit opened this issue Jun 4, 2021 · 1 comment
Labels
Bug Needs Triage Issue that has not been reviewed by a pandas team member

Comments

@TheNeuralBit
Contributor

TheNeuralBit commented Jun 4, 2021

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • (optional) I have confirmed this bug exists on the master branch of pandas.


Code Sample, a copy-pastable example

import pandas as pd
pd.util.hash_array(pd.DatetimeIndex(['2018-10-28 01:20:00'], tz='Europe/Berlin'))

Output:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-2-389898c5af02> in <module>
----> 1 pd.util.hash_array(pd.DatetimeIndex(['2018-10-28 01:20:00'], tz='Europe/Berlin'))

~/.pyenv/versions/3.8.6/envs/beam/lib/python3.8/site-packages/pandas/core/util/hashing.py in hash_array(vals, encoding, hash_key, categorize)
    255         return _hash_categorical(vals, encoding, hash_key)
    256     elif is_extension_array_dtype(dtype):
--> 257         vals, _ = vals._values_for_factorize()
    258         dtype = vals.dtype
    259 

AttributeError: 'DatetimeIndex' object has no attribute '_values_for_factorize'

Apparently datetime64[ns, Europe/Berlin] is an extension array dtype, but DatetimeIndex has no _values_for_factorize method. I've reproduced this on pandas 1.1.4, 1.2.4, and on master (503ce50).
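
For anyone following along, here is a minimal sketch of why the tz-aware index takes the branch shown in the traceback while the tz-naive one does not. It only uses the public pandas.api.types.is_extension_array_dtype check; the variable names (naive, aware, naive_is_ext, aware_is_ext) are mine, and the exact internals of hash_array may differ between pandas versions.

```python
import pandas as pd
from pandas.api.types import is_extension_array_dtype

# Same wall-clock timestamp, once without and once with a timezone.
naive = pd.DatetimeIndex(["2018-10-28 01:20:00"])
aware = pd.DatetimeIndex(["2018-10-28 01:20:00"], tz="Europe/Berlin")

# The tz-naive index has a plain NumPy datetime64[ns] dtype, while the
# tz-aware index has a DatetimeTZDtype, which is a pandas extension dtype.
naive_is_ext = is_extension_array_dtype(naive.dtype)
aware_is_ext = is_extension_array_dtype(aware.dtype)

print(naive.dtype, naive_is_ext)
print(aware.dtype, aware_is_ext)
```

So only the tz-aware case reaches the is_extension_array_dtype branch, where the code expects an extension array rather than an Index.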

Problem description

pd.util.hash_array works with other Indexes, including a timezone-naive DatetimeIndex, so it seems reasonable to expect it to work with a timezone-aware DatetimeIndex (or at least to raise a clearer error).

Expected Output

Output should be similar to that for a timezone-naive DatetimeIndex:

In [3]: pd.util.hash_array(pd.DatetimeIndex(['2018-10-28 01:20:00']))
Out[3]: array([3152239034440746192], dtype=uint64)

TheNeuralBit added the Bug and Needs Triage (Issue that has not been reviewed by a pandas team member) labels Jun 4, 2021
TheNeuralBit changed the title from "BUG:pd.util.hash_array fails on DatetimeIndex with tz specified" to "BUG: pd.util.hash_array fails on DatetimeIndex with tz specified" Jun 4, 2021
@TheNeuralBit
Contributor Author

TheNeuralBit commented Jun 14, 2021

Closing this as it duplicates #42003. In both cases I should be using pd.util.hash_pandas_object instead. Will track a possible error message improvement in the other bug.
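
For reference, a minimal sketch of the hash_pandas_object route on the same input. The variable names are mine. The second call, which passes the index's underlying extension array (idx.array) to hash_array, is an assumption on my part about an equivalent path, not something confirmed in this thread.

```python
import pandas as pd

idx = pd.DatetimeIndex(["2018-10-28 01:20:00"], tz="Europe/Berlin")

# hash_pandas_object accepts an Index, Series, or DataFrame and returns
# a Series of uint64 hashes, one per element.
hashed = pd.util.hash_pandas_object(idx)
print(hashed.dtype, len(hashed))

# Assumption: hash_array is meant for arrays, so handing it the index's
# underlying extension array instead of the Index itself should also work.
hashed_from_array = pd.util.hash_array(idx.array)
print(hashed_from_array.dtype, len(hashed_from_array))
```

hash_pandas_object returns a Series rather than a NumPy array, so call .to_numpy() on the result if an ndarray like the one from hash_array is needed.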
