API: Standardize ExtensionDtype subtype name #22224
Comments
+1 for subtype |
IntegerArray actually has |
We also set We should first agree on what exactly this thing (call it
So something like "the type this would correspond to in NumPy, if NumPy supported this directly"? Given all that, I think I like |
But this is not what you listed for all dtypes, so maybe we should not have a single name. Eg for intervals, even if numpy supported interval dtype, this would not be the same dtype as the interval bounds itself (the same for period). So maybe we shouldn't standardize too much. |
Yes, you're correct... So if we can't pin down the concept we're expressing
here, standardizing won't be that useful.
On Mon, Oct 8, 2018 at 8:10 AM Joris Van den Bossche wrote:
So something like "the type this would correspond to in NumPy, if NumPy
supported this directly"?
But this is not what you listed for all dtypes, so maybe we should not
have a *single* name. Eg for intervals, even if numpy supported interval
dtype, this would not be the same dtype as the interval bounds itself (the
same for period).
And for categorical, you have the dtype of the codes, and the dtype of the
categories.
So maybe we shouldn't standardize too much. numpy_dtype could also be
kept for #22791
|
Let's do this for 0.24.0? Otherwise, if we still want to change it after the initial release, we would need to start deprecating things, which is a bit stupid for this. I don't think it is much work; it is mainly a matter of deciding what we want to do. |
Yeah, not much work, aside from the hard problem of "what do we want". I haven't really come to a decision on what's best here. What we can relatively easily do for 0.24 is decide if anything should be assigned a common name
I'm really not sure here... Maybe they're all OK as is. |
In the meantime, the
I think the main question might be what we want to do with #22791. Because a name there might clash with some of the names used here. |
yeah I ran into this when fixing up to use We ought to make these NotImplemented on the base dtype to catch errors. I think I would allow having one of [subtype|numpy_dtype] for dtypes. |
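As a rough illustration of the "NotImplemented on the base dtype" idea in the comment above, here is a minimal sketch. It assumes that idea means the base class defines both attributes and raises `NotImplementedError` until a subclass overrides one. The class names `BaseDtypeSketch` and `BoundsDtypeSketch` are hypothetical stand-ins, not pandas classes, and dtypes are represented as plain strings to keep the example dependency-free.

```python
class BaseDtypeSketch:
    """Hypothetical base dtype: both names exist but fail loudly by default."""

    @property
    def subtype(self):
        raise NotImplementedError(
            f"{type(self).__name__} does not define 'subtype'"
        )

    @property
    def numpy_dtype(self):
        raise NotImplementedError(
            f"{type(self).__name__} does not define 'numpy_dtype'"
        )


class BoundsDtypeSketch(BaseDtypeSketch):
    """Hypothetical dtype that chooses to implement only ``subtype``."""

    def __init__(self, subtype):
        self._subtype = subtype

    @property
    def subtype(self):
        # Override just one of the two names, as the comment suggests.
        return self._subtype
```

One consequence worth noting: with this design, `hasattr(dtype, "numpy_dtype")` no longer returns `False` for a dtype that lacks the attribute. `hasattr` only swallows `AttributeError`, so the `NotImplementedError` propagates to the caller, which is what surfaces the mistake.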
@TomAugspurger about your list above: I think SparseDtype actually uses subtype and not numpy_dtype? |
Yes, sorry. Updated. |
Does anyone have a concrete proposal for what to update, if anything? I've struggled to form an opinion here. |
Pushing to 0.24.2. |
To follow up on this old thread, is there a currently supported way to get numpy-compatible dtype inference for an It seems that one should be able to define a custom pandas/pandas/core/dtypes/cast.py Lines 1789 to 1792 in ce3bac9
This in turn causes the BlockManager to fall back to a numpy object dtype: pandas/pandas/core/internals/managers.py Lines 1462 to 1469 in ce3bac9
Are there any thoughts on modifying that
elif isinstance(dtype, ExtensionDtype):
    if hasattr(dtype, "numpy_dtype"):
        dtype = dtype.numpy_dtype
    elif hasattr(dtype, "subtype"):
        dtype = dtype.subtype
    else:
        dtype = np.dtype("object")
|
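The fallback order proposed in the comment above (`numpy_dtype`, then `subtype`, then object) can be tried in isolation. The sketch below is not pandas code: `underlying_dtype` and the three stand-in classes are hypothetical, and dtypes are represented as plain strings, where real code would presumably use `np.dtype` objects.

```python
class MaskedIntDtypeSketch:
    """Hypothetical dtype that exposes ``numpy_dtype``."""
    numpy_dtype = "int64"


class IntervalDtypeSketch:
    """Hypothetical dtype that exposes ``subtype`` only."""
    subtype = "float64"


class OpaqueDtypeSketch:
    """Hypothetical dtype that exposes neither attribute."""


def underlying_dtype(dtype, default="object"):
    """Return the first of ``numpy_dtype`` / ``subtype`` that is set.

    Mirrors the order in the proposal; falls back to ``default`` when
    neither attribute exists (or when the attribute is None).
    """
    for attr in ("numpy_dtype", "subtype"):
        value = getattr(dtype, attr, None)
        if value is not None:
            return value
    return default


print(underlying_dtype(MaskedIntDtypeSketch()))
print(underlying_dtype(IntervalDtypeSketch()))
print(underlying_dtype(OpaqueDtypeSketch()))
```

Unlike the `hasattr` chain in the proposal, this version also treats an attribute that is present but `None` as missing. That is a design choice of the sketch, not something the thread settles.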
The issue here is standardizing idiosyncratic naming across different internal dtypes, not defining an API we expect 3rd parties to implement? If so, I'm really not bothered by the existing names not matching. They mean different things for different dtypes right? The closest thing I have to an opinion is maybe they should be privatized? |
We have a few extension arrays that build off one or more ndarrays. You'll often want to get the underlying NumPy dtype for that array.
.type
.subtype
IntervalArray can't use .type, since that has to be Interval. So IntegerArray will need to alias subtype to type.
Are we happy with the name subtype?
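To make the aliasing idea in the issue description concrete, here is a minimal sketch. It assumes "alias subtype to type" means exposing `subtype` as a read-only view of `type` on the integer dtype. All class names are hypothetical stand-ins rather than the pandas classes, and the interval dtype's `subtype` is shown as a plain string.

```python
class IntervalScalarSketch:
    """Hypothetical stand-in for the Interval scalar class."""


class IntervalDtypeSketch:
    # ``type`` must be the scalar class, so it cannot carry the bounds' dtype.
    type = IntervalScalarSketch

    def __init__(self, subtype):
        # The dtype of the interval bounds lives under ``subtype`` instead.
        self.subtype = subtype


class IntegerDtypeSketch:
    def __init__(self, scalar_type):
        self.type = scalar_type

    @property
    def subtype(self):
        # Alias: callers can use ``subtype`` uniformly across both dtypes.
        return self.type


int_dtype = IntegerDtypeSketch(int)
interval_dtype = IntervalDtypeSketch("int64")
print(int_dtype.subtype is int_dtype.type)
print(interval_dtype.type, interval_dtype.subtype)
```

With the alias in place, code that only needs the underlying dtype can read `subtype` on either class, while `type` keeps its separate meaning on the interval dtype.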