PERF: directly astype with numpy if series is already nansafe #8732
Comments
The time is actually not spent checking for nulls. If you did s.values.astype(str), what you get back is an object holding int. This is numpy doing the conversion, whereas pandas iterates over each item and calls str(item) on it: https://github.com/pydata/pandas/blob/master/pandas/lib.pyx#L866 So I don't think it can be fixed if we still want to return an object holding str. Potentially https://github.com/pydata/pandas/blob/master/pandas/lib.pyx#L843 I can sort out a pull request if there is interest.
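To make the two conversion paths in this comment concrete, here is a minimal sketch, assuming Python 3 and a recent NumPy. The variable names `fixed` and `boxed` are illustrative and do not come from pandas. In this setup, NumPy's own `astype(str)` produces a fixed-width string array, while calling `str(item)` on each element and storing the results produces an object array of Python strings.

```python
import numpy as np

arr = np.arange(1, 10)

# Path 1: let NumPy convert the whole buffer at once.
# Under the assumptions above this gives a fixed-width unicode dtype.
fixed = arr.astype(str)

# Path 2: call str() on each item and store Python str objects,
# roughly what a per-item loop does (illustrative, not pandas' actual code).
boxed = np.array([str(x) for x in arr], dtype=object)

print(fixed.dtype.kind)  # 'U' for a fixed-width unicode array
print(boxed.dtype)       # object
```

The element values agree for these small integers; only the container dtype differs, which is why the two approaches are not drop-in replacements for each other.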
from SO
So the null check is pretty cheap. If there are no nulls, we can just bypass nansafe and use the underlying numpy routine. Should be a nice speedup.
```
In [13]: arr = np.random.randint(1,10,size=1000000)

In [14]: s = Series(arr)

In [15]: s.notnull().all()
Out[15]: True

In [16]: %timeit s.notnull().all()
1000 loops, best of 3: 1.35 ms per loop

In [17]: %timeit s.astype(str)
1 loops, best of 3: 2.52 s per loop

In [18]: %timeit s.values.astype(str)
10 loops, best of 3: 37.7 ms per loop
```