Skip to content

BUG: EAs should not be hashable #30908

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Jan 20, 2020
Merged

Conversation

jbrockmendel
Copy link
Member

This turns out to be a blocker for some cleanup in index.pyx

@gfyoung gfyoung added Bug ExtensionArray Extending pandas with custom dtypes or arrays. labels Jan 11, 2020
@gfyoung
Copy link
Member

gfyoung commented Jan 11, 2020

LGTM - do we need a whatsnew ?

@WillAyd
Copy link
Member

WillAyd commented Jan 11, 2020

Do we actually want to commit to this? Just wonder if this would conflict with some of the work I think @TomAugspurger did before to try and create a non-mutable dataframe

@jreback
Copy link
Contributor

jreback commented Jan 11, 2020

yes mutable things are not hashable
we do allow hash_pandas_object to work though

@TomAugspurger
Copy link
Contributor

This change seems fine. Can you say more about the motivation though? As I can imagine an immutable array that is hasable, so we mange shouldn’t rely on unhashabilty.

@jbrockmendel
Copy link
Member Author

Can you say more about the motivation though? As I can imagine an immutable array that is hasable, so we mange shouldn’t rely on unhashabilty.

I'm working on indexing code and want to simplify a check

cdef inline bint is_definitely_invalid_key(object val):
    if isinstance(val, tuple):
        try:
            hash(val)
        except TypeError:
            return True

    # we have a _data, means we are a NDFrame
    return (isinstance(val, slice) or util.is_array(val)
            or isinstance(val, list) or hasattr(val, '_data'))

    cpdef get_loc(self, object val):
        cdef:
            Py_ssize_t loc

        if is_definitely_invalid_key(val):
            raise TypeError(f"'{val}' is an invalid key")
    [...]

to just check for hashability. Doing that breaks a couple of tests because EAs are currently hashable

@TomAugspurger
Copy link
Contributor

to just check for hashability. Doing that breaks a couple of tests because EAs are currently hashable

When an immutable, 3rd-party EA comes along that's hashable, will that break again?

@jbrockmendel
Copy link
Member Author

When an immutable, 3rd-party EA comes along that's hashable, will that break again?

If that EA gets passed as a key to index._engine.(get_loc|get_value|contains) then it may give (KeyError|KeyError|False) instead of raising a TypeError

@jreback jreback added this to the 1.1 milestone Jan 18, 2020
@@ -1073,6 +1073,9 @@ def _reduce(self, name, skipna=True, **kwargs):
"""
raise TypeError(f"cannot perform {name} with type {self.dtype}")

def __hash__(self):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

may want to add some text to the top of the EA that says EA by default are not hashable

@jreback jreback merged commit 998e742 into pandas-dev:master Jan 20, 2020
@jreback
Copy link
Contributor

jreback commented Jan 20, 2020

thanks @jbrockmendel

@jbrockmendel jbrockmendel deleted the bug-hashable branch January 20, 2020 18:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug ExtensionArray Extending pandas with custom dtypes or arrays.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants