-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
ExtensionDtype should be hashable #22476
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
I'll also use this for #22994. It looks like our default ExtensionDtype.eq is pretty dangerous. It's basically wrong for any kind of parametrized type. I'll do some reading about how to handle this. |
So, a safe and low-tech option is to have EA authors declare the meaningful attributes of their ExtensionDtype. For PeriodDtype, this is freq. For CategorcialDtype its the categories and order. class PeriodDtype(ExtensionDtype):
_attributes = ['freq']
def __eq__(self, other):
if isinstance(other, compat.string_types):
try:
other = self._construct_from_string(other)
except TypeError:
return False
if isinstance(other, type(self)):
return all(
getattr(self, attr) == getattr(other, attr)
for attr in self._attributes
)
return False
def __hash__(self):
return hash(tuple(getattr(self, attr) for attr in self._attributes)) (in reality, That requires a couple things from all the
CategoricalDtype.categories fails on both those counts, so the default implementation would work for it, but that's fine. I think that's a less common case. The downside is a tiny bit of duplication. We could do fancier things with metaclasses probably, but that seems like overkill. |
* API: ExtensionDtype Equality and Hashability Closes #22476
* API: ExtensionDtype Equality and Hashability Closes pandas-dev#22476
I guess I missed this somehow - my attempt at a SetArray (#22382) has been dormant for some time while other issues were more pressing. In any case, having hashability as a general requirement makes it impossible to implement an extension array for sets, unless there is a way to circumvent that requirement somehow. While implementation is not around the corner, I'd like for this to be a possibility further down the line. In the same spirit, @jreback commented on my first attempt at sets (without using an extension array) in #21547 (comment):
EDIT: If need be, I guess the backend could use an array of |
I haven't looked at the details, but what about a SetDtype isn't hashable?
Keep in mind that this is for dtypes, not arrays (which can't be hashable,
since they're mutable).
…On Tue, Nov 20, 2018 at 10:38 AM h-vetinari ***@***.***> wrote:
I guess I missed this somehow - my attempt at a SetArray (#22382
<#22382>) has been dormant for
some time while other issues were more pressing. In any case, having
hashability as a general requirement makes it impossible to implement an
extension array for sets, unless there is a way to circumvent that
requirement somehow.
—
You are receiving this because you modified the open/close state.
Reply to this email directly, view it on GitHub
<#22476 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/ABQHIqO22peQdrViEjLvQAgfp8sgw4bPks5uxC_ugaJpZM4WItuD>
.
|
@TomAugspurger |
Hmm yes that is ambiguous. I'll rephrase it when I go through the docs
before the release.
…On Tue, Nov 20, 2018 at 10:47 AM h-vetinari ***@***.***> wrote:
@TomAugspurger <https://github.com/TomAugspurger>
Thanks for the quick response. This was my misunderstanding - I read
"Pandas now requires that extension dtypes be hashable" in
https://pandas-docs.github.io/pandas-docs-travis/whatsnew/v0.24.0.html#extensiontype-changes,
and got the wrong idea.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#22476 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/ABQHInu-_BMJDaXG0N-GzizpjjM-izlPks5uxDI2gaJpZM4WItuD>
.
|
Using this for sparse in #22325, it'd be good to have generally.
The text was updated successfully, but these errors were encountered: