PERF: dtype checks #27224

Closed
jbrockmendel opened this issue Jul 4, 2019 · 4 comments
Labels
Dtype Conversions Unexpected or buggy dtype conversions Performance Memory or execution speed performance

Comments

@jbrockmendel
Member

At the sprint there was some discussion of optimization and Python call stacks. One place where we make many tiny calls is the is_foo_dtype checks:

In [3]: arr = np.arange(10**5)

In [4]: %timeit is_float_dtype(arr)
1.23 µs ± 28.3 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

In [5]: %timeit is_float_dtype(arr.dtype)
678 ns ± 11 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

In [6]: %timeit arr.dtype.kind == 'f'
71.6 ns ± 1.87 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)

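That is a ~17x difference between is_float_dtype(arr) and the direct arr.dtype.kind check. Part of this is because is_foo_dtype accepts either arr or arr.dtype, so it has to handle both cases on every call. The potential savings add up in places where we do many of these dtype checks on the same arguments.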
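
The comparison above can be reproduced outside IPython with the standard-library timeit module. This is only a rough sketch: the lambda wrappers add a small constant overhead to every variant, and absolute numbers will differ by machine and by NumPy/pandas version. It also checks that the three spellings agree before timing them.

```python
import timeit

import numpy as np
from pandas.api.types import is_float_dtype

arr = np.arange(10**5)  # integer dtype, so every check below is False

# The three spellings from the timings above.
checks = {
    "is_float_dtype(arr)": lambda: is_float_dtype(arr),
    "is_float_dtype(arr.dtype)": lambda: is_float_dtype(arr.dtype),
    "arr.dtype.kind == 'f'": lambda: arr.dtype.kind == "f",
}

# The spellings must agree before the timings mean anything.
results = {name: bool(check()) for name, check in checks.items()}
assert len(set(results.values())) == 1

n = 100_000
for name, check in checks.items():
    # timeit returns total seconds for n calls; convert to ns per call.
    per_call_ns = timeit.timeit(check, number=n) / n * 1e9
    print(f"{name:28s} {per_call_ns:8.1f} ns per call")
```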

@TomAugspurger
Contributor

Tangentially related: I was looking at the overhead of Series.sum() vs. np.nansum(series.values).

[Image: "basic"]

10 calls to is_dtype_type, and 21 isinstance(thing, ABCPandas*)!
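
One way to get call counts like these is to run a single Series.sum() under the standard-library cProfile and tally the functions whose names look like dtype or isinstance checks. This is a sketch, not necessarily how the counts above were produced: the name filter is an assumption, and the counts depend heavily on the pandas version, so no expected output is shown.

```python
import cProfile
import pstats

import numpy as np
import pandas as pd

ser = pd.Series(np.arange(10**5, dtype="float64"))

# Profile exactly one call to Series.sum().
profiler = cProfile.Profile()
profiler.enable()
total = ser.sum()
profiler.disable()

stats = pstats.Stats(profiler)

# stats.stats maps (filename, lineno, function name) to a tuple whose
# second element is the total number of calls to that function.
call_counts = {}
for (_filename, _lineno, funcname), entry in stats.stats.items():
    total_calls = entry[1]
    # Assumed filter: anything named is_* plus the isinstance builtin.
    if funcname.startswith("is_") or "isinstance" in funcname:
        call_counts[funcname] = call_counts.get(funcname, 0) + total_calls

for funcname, count in sorted(call_counts.items(), key=lambda kv: -kv[1]):
    print(f"{count:4d}  {funcname}")
```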

@simonjayhawkins added the Dtype Conversions and Performance labels Jul 4, 2019
@jbrockmendel
Member Author

We could implement versions of these that are dtype-only. I don't think changing the existing ones is an option (at least not short-term), since they are exposed in the API.
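
As a minimal sketch of what dtype-only versions might look like, assuming the caller always passes a NumPy dtype: the function names below are hypothetical and not part of pandas, and extension dtypes are deliberately not handled. The point is that the caller looks up arr.dtype once and reuses it across several checks.

```python
import numpy as np


def is_float_np_dtype(dtype: np.dtype) -> bool:
    """Hypothetical dtype-only check; the caller must pass an np.dtype."""
    return dtype.kind == "f"


def is_datetime64_np_dtype(dtype: np.dtype) -> bool:
    """Hypothetical dtype-only check; 'M' is NumPy's kind code for datetime64."""
    return dtype.kind == "M"


def describe(arr: np.ndarray) -> str:
    # Look the dtype up once, then reuse it for every check.
    dtype = arr.dtype
    if is_float_np_dtype(dtype):
        return "float"
    if is_datetime64_np_dtype(dtype):
        return "datetime64"
    return "other"
```

Because these helpers do not accept arrays, passing one would raise or give a wrong answer rather than being silently unwrapped; that trade-off is what makes them cheaper than the public checks.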

@jbrockmendel
Member Author

We've recently eliminated many internal usages of these (#52682, #52649, #52607, #52582, #52564, #52527, #52506, #52387, #52288, #52279, #52213). We could deprecate e.g. is_datetime64_dtype.

@jbrockmendel
Member Author

Closing as complete.


3 participants