-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
REF: Create StorageExtensionDtype #46537
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@@ -24,6 +21,7 @@ | |||
|
|||
from pandas.core.dtypes.base import ( | |||
ExtensionDtype, | |||
StorageExtensionDtype, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is ExtensionType still neede?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's an isinstance
check that uses it.
return self.name | ||
|
||
def __eq__(self, other: Any) -> bool: | ||
if isinstance(other, self.type) and other == self.name: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you remember why you added this line here? (it's different as the line in __eq__
you moved here from StringDtype, and this only works "by chance" for StringDtype, as its .type
is str
)
So I think you can actually keep the isinstance(other, str)
as it was originally in StringDtype, as this is meant to check for strings (regardless of the type of the dtype).
(from #47034 (comment))
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm yeah I am not sure why this changed. This was just supposed to be a copy-paste essentially to a new subclass and maybe make it more generic maybe since StringDtype.type == str
, but clearly I misunderstood why this was str
in the first place.
@@ -67,7 +65,7 @@ class StringDtype(ExtensionDtype): | |||
parts of the API may change without warning. | |||
|
|||
In particular, StringDtype.na_value may change to no longer be | |||
``numpy.nan``. | |||
``pd.NA``. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is also related to the comment I made in #47034 (comment) about pd.NA vs pa.NA, but so AFAIK we don't have any concrete plans to stop using pd.NA for the string dtype? If that is the case, I wouldn't state it here like that, or if there are concrete ideas, that's something we should have a discussion about.
(the sentence was clearly outdated of course ;), since the np.nan wasn't even returned. But so I would rather remove the full sentence then?)
Working towards eventually having
int[pyarrow]
for example, generalizing some of the storage concepts ofStringDtype
intoStorageExtensionDtype