Skip to content

PERF/DISC: restrict inputs to is_foo_dtype #33368

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
jbrockmendel opened this issue Apr 7, 2020 · 3 comments
Closed

PERF/DISC: restrict inputs to is_foo_dtype #33368

jbrockmendel opened this issue Apr 7, 2020 · 3 comments
Labels
Needs Discussion Requires discussion from core team before further action Performance Memory or execution speed performance

Comments

@jbrockmendel
Copy link
Member

xref #33364

A lot of the overhead in is_foo_dtype (which we call a lot) comes from checking for dtype objects, arrays, and strings. We could tighten these to be e.g.

def is_extension_arrray_dtype(obj):
     return isinstance(obj, ExtensionDtype)

The trouble with this is some of these are user-facing, so this would require a deprecation cycle, and in the interim the warnings call would actually hurt performance.

@jbrockmendel jbrockmendel added Performance Memory or execution speed performance Needs Discussion Requires discussion from core team before further action labels Apr 7, 2020
@TomAugspurger
Copy link
Contributor

The trouble with this is some of these are user-facing, so this would require a deprecation cycle, and in the interim the warnings call would actually hurt performance.

It'll be a bit of work, but we can define

is_extension_array_dtype(obj):
    # flexible stuff...
    # deprecation checks...
    return _is_extension_array_dtype(dtype)

And internally we use the strict _is_extension_array_dtype. That's probably worth doing regardless of whether we deprecate the flexible stuff, and only accept dtype instances.

@jbrockmendel
Copy link
Member Author

And internally we use the strict _is_extension_array_dtype

Makes sense. Should keep the names public (no underscore, not in the public API), maybe some abbreviation is_ea_dtype, is_dt64_dtype, ...

@jorisvandenbossche
Copy link
Member

Big +1 to deprecate passing values instead of dtypes.

I am not sure it needs to hurt performance to include the deprecation? (as long as internally we change all occurrences to call this with a dtype of course).

Right now, we eg have have this function to get the dtype:

if arr_or_dtype is None:
raise TypeError("Cannot deduce dtype from null object")
# fastpath
elif isinstance(arr_or_dtype, np.dtype):
return arr_or_dtype
elif isinstance(arr_or_dtype, type):
return np.dtype(arr_or_dtype)
# if we have an array-like
elif hasattr(arr_or_dtype, "dtype"):
arr_or_dtype = arr_or_dtype.dtype
return pandas_dtype(arr_or_dtype)

So there is a check to see if the object has a "dtype" attribute. Adding a deprecation warning inside that if block shouldn't impact performance when passing dtypes to it?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Needs Discussion Requires discussion from core team before further action Performance Memory or execution speed performance
Projects
None yet
Development

No branches or pull requests

3 participants