Skip to content

REF: Create StorageExtensionDtype #46537

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Mar 28, 2022
Merged

Conversation

mroeschke
Copy link
Member

@mroeschke mroeschke commented Mar 27, 2022

Working towards eventually having int[pyarrow] for example, generalizing some of the storage concepts of StringDtype into StorageExtensionDtype

@mroeschke mroeschke added the Refactor Internal refactoring of code label Mar 27, 2022
@jreback jreback added this to the 1.5 milestone Mar 28, 2022
@@ -24,6 +21,7 @@

from pandas.core.dtypes.base import (
ExtensionDtype,
StorageExtensionDtype,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is ExtensionType still neede?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's an isinstance check that uses it.

@jreback jreback merged commit 583c8da into pandas-dev:main Mar 28, 2022
@mroeschke mroeschke deleted the ref/storagedtype branch March 28, 2022 19:51
return self.name

def __eq__(self, other: Any) -> bool:
if isinstance(other, self.type) and other == self.name:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you remember why you added this line here? (it's different as the line in __eq__ you moved here from StringDtype, and this only works "by chance" for StringDtype, as its .type is str)

So I think you can actually keep the isinstance(other, str) as it was originally in StringDtype, as this is meant to check for strings (regardless of the type of the dtype).

(from #47034 (comment))

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm yeah I am not sure why this changed. This was just supposed to be a copy-paste essentially to a new subclass and maybe make it more generic maybe since StringDtype.type == str, but clearly I misunderstood why this was str in the first place.

@@ -67,7 +65,7 @@ class StringDtype(ExtensionDtype):
parts of the API may change without warning.

In particular, StringDtype.na_value may change to no longer be
``numpy.nan``.
``pd.NA``.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is also related to the comment I made in #47034 (comment) about pd.NA vs pa.NA, but so AFAIK we don't have any concrete plans to stop using pd.NA for the string dtype? If that is the case, I wouldn't state it here like that, or if there are concrete ideas, that's something we should have a discussion about.

(the sentence was clearly outdated of course ;), since the np.nan wasn't even returned. But so I would rather remove the full sentence then?)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Refactor Internal refactoring of code
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants