PERF: directly astype with numpy if series is already nansafe #8732

jreback · 2014-11-04T22:07:56Z

from SO

So the null check is pretty cheap. If there are no nulls, we can just bypass nansafe and use the underlying numpy routine. Should be a nice speedup.

```
In [13]: arr = np.random.randint(1,10,size=1000000)

In [14]: s = Series(arr)

In [15]: s.notnull().all()
Out[15]: True

In [16]: %timeit s.notnull().all()
1000 loops, best of 3: 1.35 ms per loop

In [17]: %timeit s.astype(str)
1 loops, best of 3: 2.52 s per loop

In [18]: %timeit s.values.astype(str)
10 loops, best of 3: 37.7 ms per loop
```


vikram · 2014-11-29T18:32:25Z

The time is actually not spent checking for nulls, but in ensuring that every element returned is a string.

If you did s.values.astype(str), what you get back is an object holding int. This is numpy doing the conversion, whereas pandas iterates over each item and calls str(item) on it.
So if you do s.astype(str), you get an object holding str.

https://github.com/pydata/pandas/blob/master/pandas/lib.pyx#L866
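To check the difference between the two conversion styles described above, here is a small NumPy-only sketch. The per-item loop is an illustrative stand-in for the pandas code path linked above, not a copy of it, and the exact dtype NumPy returns may vary by version and platform.

```python
import numpy as np

arr = np.arange(1, 6)

# Whole-array conversion performed by NumPy in one call.
vectorized = arr.astype(str)

# Element-by-element conversion, mimicking a per-item str(item) loop;
# the result is an object array whose elements are Python str.
itemwise = np.array([str(x) for x in arr], dtype=object)

print(vectorized.dtype, type(vectorized[0]))
print(itemwise.dtype, type(itemwise[0]))
```

Comparing the two printed dtypes on your own install is the quickest way to confirm what each path actually hands back.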

So I don't think it can be fixed if we still want to return an object holding str.

Potentially https://github.com/pydata/pandas/blob/master/pandas/lib.pyx#L843
can be improved. If the array doesn't have nulls and is_datelike is not set,
then instead of iterating, we can just return arr.astype(new_dtype).

I can sort out a pull request if there is interest.
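A rough sketch of the fast path suggested above, assuming a 1-D NumPy array. The function name `astype_str_sketch` and the simplified null check are hypothetical and only for illustration; the real logic lives in the linked pandas/lib.pyx routine and handles more cases.

```python
import numpy as np


def astype_str_sketch(arr):
    """Hypothetical helper: skip the per-item loop when it is safe to do so."""
    arr = np.asarray(arr)
    # datetime64 / timedelta64 arrays are treated as "datelike" here.
    is_datelike = arr.dtype.kind in "mM"

    if arr.dtype.kind == "f":
        has_nulls = bool(np.isnan(arr).any())
    else:
        # Integer and boolean arrays cannot hold NaN; anything else is
        # conservatively treated as possibly containing nulls.
        has_nulls = arr.dtype.kind not in "iub"

    if not has_nulls and not is_datelike:
        # Fast path: let NumPy convert the whole array at once.
        return arr.astype(str)

    # Slow path: call str() on every element and collect into an object array.
    out = np.empty(len(arr), dtype=object)
    for i, item in enumerate(arr):
        out[i] = str(item)
    return out


fast = astype_str_sketch(np.array([1, 2, 3]))
slow = astype_str_sketch(np.array([1.0, np.nan]))
```

Note that the fast path returns NumPy's fixed-width string array rather than an object array of str, which is the trade-off raised earlier in this comment.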

jbrockmendel · 2021-12-21T03:19:57Z

s.values.astype(str) is now slightly slower than s.astype(str) (331ms vs 309ms locally). Closing.

jreback added the Performance, Good as first PR, and Strings labels Nov 4, 2014

vikram pushed a commit to vikram/pandas that referenced this issue Nov 29, 2014

astype checks notnull Fixes: pandas-dev#8732

50ef0e6

vikram mentioned this issue Nov 29, 2014

astype checks notnull Fixes: #8732 #8924

Closed

vikram pushed a commit to vikram/pandas that referenced this issue Nov 29, 2014

astype checks notnull Fixes: pandas-dev#8732 added benchmark,added null check

bea542a

jreback added this to the 0.16.0 milestone Nov 29, 2014

jorisvandenbossche mentioned this issue Dec 3, 2014

PERF: astype(str) on object dtypes GH8732 #8971

Closed

jreback modified the milestones: 0.16.0, Next Major Release Mar 6, 2015

jreback mentioned this issue Dec 17, 2016

BUG: Categorical Series breaks when re-serializing with msgpack. #14901

Closed

TomAugspurger added the good first issue label Oct 11, 2017

jreback added good first issue and removed good first issue Difficulty Novice labels Dec 15, 2017

simonjayhawkins removed the good first issue label Apr 25, 2020

jbrockmendel closed this as completed Dec 21, 2021
