Code Sample, a copy-pastable example if possible
```python
import pandas as pd
import numpy as np

np_df = np.random.randn(10000, 4000)
df = pd.DataFrame(np_df)

%timeit np.round(np_df, 2)
# 416 ms ± 10.2 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

%timeit df.round(2)
# 1.69 s ± 27.8 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

%timeit np.round(df, 2)
# 1.74 s ± 112 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
```
Problem description
Completely unexpectedly, `DataFrame.round()` showed up as a major hotspot during profiling.
Looking at the code, we see that even when rounding the complete data frame to a given number of decimals, it is split into Series objects which are then rounded individually.
I am wondering if there is a reason not to pass the underlying data to numpy and do the rounding there in this case.
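For illustration, the column-wise path amounts to something like the sketch below (an assumption about the splitting behaviour, not the actual internal code): each column is rounded as its own Series and the results are reassembled.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(1000, 50))

# Sketch of the column-wise path: round each column as a separate
# Series, then reassemble the frame column by column.
rounded = pd.concat([df[col].round(2) for col in df.columns], axis=1)

# The result matches rounding the whole frame at once.
assert rounded.equals(df.round(2))
```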
A quick test showed that something like this would give us the numpy performance:
```python
def faster_round(df, decimals):
    rounded = np.round(df.values, decimals)
    return pd.DataFrame(rounded, columns=df.columns, index=df.index)

%timeit faster_round(df, 2)
# 417 ms ± 14.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
```
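One caveat with this workaround (my observation, not stated in the issue): `df.values` only yields a clean float array for a homogeneous numeric frame. On a mixed-dtype frame it produces an upcast or object-dtype array that `np.round` cannot handle, so the shortcut is only safe for all-float data.

```python
import numpy as np
import pandas as pd

def faster_round(df, decimals):
    # Round the underlying ndarray in one shot, then rebuild the frame.
    rounded = np.round(df.values, decimals)
    return pd.DataFrame(rounded, columns=df.columns, index=df.index)

# Homogeneous float frame: matches DataFrame.round exactly.
df = pd.DataFrame({"a": [1.2345, 5.6789], "b": [2.3456, 6.7891]})
assert faster_round(df, 2).equals(df.round(2))

# Mixed dtypes: .values becomes an object-dtype array, on which
# np.round fails, so a general fix would need per-block handling.
mixed = pd.DataFrame({"x": [1.2345], "y": ["text"]})
assert mixed.values.dtype == object
```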
@aberres: Good question! Ultimately, we end up touching the `get_values` method for many pandas objects, which returns an `ndarray`. Perhaps we could re-implement this to avoid all of the indirection, though we would have to be careful to ensure nothing breaks.
jreback changed the title from "DataFrame.round() unnecessarily slow compared to np.round()" to "PERF: DataFrame.round() unnecessarily slow compared to np.round()" on Aug 15, 2017