-
-
Notifications
You must be signed in to change notification settings - Fork 18.5k
REF/API: Stricter extension checking. #22031
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
REF/API: Stricter extension checking. #22031
Conversation
Removes is_extension_array_dtype's handling of both arrays and dtypes. Now it handles just arrays, and we provide `is_extension_dtype` for checking whether a dtype is an extension dtype. It's the caller's responsibility to know whether they have an array or dtype. Closes pandas-dev#22021
Hello @TomAugspurger! Thanks for updating the PR. Cheers ! There are no PEP8 issues in this Pull Request. 🍻 Comment last updated on July 30, 2018 at 13:09 Hours UTC |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why are you changing this? we do this for all other is_*
To fix #22021
Yeah, and that's why I can make similar changes for the other methods, but we have backwards compatibility issues there. |
doc/source/whatsnew/v0.24.0.txt
Outdated
@@ -323,7 +323,8 @@ ExtensionType Changes | |||
- Bug in :meth:`Series.get` for ``Series`` using ``ExtensionArray`` and integer index (:issue:`21257`) | |||
- :meth:`Series.combine()` works correctly with :class:`~pandas.api.extensions.ExtensionArray` inside of :class:`Series` (:issue:`20825`) | |||
- :meth:`Series.combine()` with scalar argument now works for any function type (:issue:`21248`) | |||
- | |||
- Added :func:`pandas.api.types.is_extension_array_dtype` for testing whether an array is an ExtensionArray and :func:`pandas.api.types.is_extension_dtype` for testing whether a dtype is an ExtensionDtype (:issue:`22021`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wouldn't it be more logical to call this is_extension_array
instead of is_extension_array_dtype
? (I suppose the reason is backwards compatibility for at least one of the two use cases?)
Because I find the 'dtype' part of the name quite confusing.
Codecov Report
@@ Coverage Diff @@
## master #22031 +/- ##
==========================================
- Coverage 92.07% 92.07% -0.01%
==========================================
Files 170 170
Lines 50688 50684 -4
==========================================
- Hits 46671 46666 -5
- Misses 4017 4018 +1
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can't believe this didn't break anything. I am -1 on this though because this changes the definition of is_*_dtype. All of the routines work on an array correctly. Adding 2 different methods is very confusing as we on't do this for anything else and introduces a special case.
Yes. These weren't in the public namespace earlier, so we don't have to worry about deprecating.
How so? Let's step back to the original issue: In [28]: subarr = [('f1', 'f2')]
In [29]: pd.api.types.pandas_dtype(subarr)
Out[29]: dtype([('f1', '<f2')]) Is that what you would expect?
As I said,
|
Why? As Tom said, "all the call sites knew whether they had values or a dtype.", and there are not yet that many usages in the code base currently for this.
Note that we actually have two methods for some other types as well: we have |
I think this is a really bad idea to change. We have had this convention for a LONG time esp on the basic types. this is throwing out everything. |
Throwing out?
…________________________________
From: Jeff Reback <[email protected]>
Sent: Tuesday, July 24, 2018 5:05:31 PM
To: pandas-dev/pandas
Cc: Tom Augspurger; Mention
Subject: Re: [pandas-dev/pandas] REF/API: Stricter extension checking. (#22031)
I think this is a really bad idea to change. We have had this convention for a LONG time esp on the basic types. this is throwing out everything.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub<#22031 (comment)>, or mute the thread<https://github.com/notifications/unsubscribe-auth/ABQHIlZxP53Zujbodat9MFHLx1yidcOqks5uJ5orgaJpZM4VblLO>.
|
as I said I am -1 on this. This changes too much. This changes the meaning of the is_* for a single case, which is contrary to all of the existing code. I am not sure what the problem is, but fixing this way is not it. |
That's unfortunate. |
Sorry, I closed this in frustration with you @jreback. But I think it's a good change for the project, so I'll continue to state my case for it. Comments like
and
are frustrating, since they've already been addressed. I explain the original problem in the issue (#22021 (comment)) and here (#22031 (comment)). Joris explains in #22031 (comment), this follows the precedent laid out by categorical and datetimetz. |
But since they are not yet in the public namespace, we can choose whichever best name we can think of? Or maybe I didn't understand your answer :-) Anyway, I think we need to think a bit more on the naming scheme. |
pandas/core/dtypes/common.py
Outdated
|
||
Parameters | ||
---------- | ||
arr : object |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
array -> dtype
pandas/core/dtypes/common.py
Outdated
return isinstance(arr_or_dtype, (ExtensionDtype, ExtensionArray)) | ||
|
||
def is_extension_dtype(dtype): | ||
"""Check if a dtype object is a pandas extension dtype. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One question is whether we want to accept strings here as well?
Or keep it strictly for actual dtype objects?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a good point, I suspect we should accept strings that have been registered.
I'll see what I can do.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
meant for the comment here
this is what pandas_dtype does
Sorry for the confusion. I think you understood my answer, but I didn't push the change :) I think |
this is exactly what pandas_dtype does |
Yes, but pandas/pandas/core/dtypes/common.py Line 2013 in 2c7c797
|
Since you now splitted it in two functions, in the new |
pandas/core/dtypes/common.py
Outdated
def is_extension_array_dtype(arr_or_dtype): | ||
"""Check if an object is a pandas extension array type. | ||
def is_extension_array(arr): | ||
"""Check if an array object is a pandas extension array type. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you clarify (maybe in extended summary paragraph or in parameters section) that this is "array-like" (also Series and Index, in practice anything that has a dtype property)
Refactored.
|
|
||
@classmethod |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This wasn't actually a classmethod.
@TomAugspurger looks a lot nicer! |
I don't think so. pandas will still have the issue of In [28]: subarr = [('f1', 'f2')]
In [29]: pd.api.types.pandas_dtype(subarr)
Out[29]: dtype([('f1', '<f2')]) this is just a workaround that avoids the problem by not calling |
can you rebase |
thanks!!! |
Removes is_extension_array_dtype's handling of both arrays and dtypes.
Now it handles just arrays, and we provide
is_extension_dtype
for checkingwhether a dtype is an extension dtype. It's the caller's responsibility to know
whether they have an array or dtype.
Closes #22021