PERF: DataFrame.round() unnecessarily slow compared to np.round() #17254

aberres · 2017-08-15T06:54:35Z

Code Sample, a copy-pastable example if possible

import pandas as pd
import numpy as np

np_df = np.random.randn(10000, 4000)
df = pd.DataFrame(np_df)

%timeit np.round(np_df, 2)
# 416 ms ± 10.2 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%timeit df.round(2)
# 1.69 s ± 27.8 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

%timeit np.round(df, 2)
# 1.74 s ± 112 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

Problem description

Completely unexpectedly, DataFrame.round() showed up as a major hotspot during profiling.
Looking at the code, we see that even when the complete data frame is rounded to a single number of decimals, it is split into Series objects, which are then rounded one by one.
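To make the column-by-column pattern concrete, here is a minimal sketch of it next to a single NumPy call. The helper `columnwise_round` is hypothetical and only imitates the behaviour described above; it is not the pandas source. It uses the standard `timeit` module rather than the IPython magic, and a smaller frame so it runs quickly.

```python
import timeit

import numpy as np
import pandas as pd


def columnwise_round(df, decimals):
    # Hypothetical helper (not the pandas source): round every column as a
    # separate Series, then glue the results back into one frame.
    rounded = [df[col].round(decimals) for col in df.columns]
    return pd.concat(rounded, axis=1)


# A smaller frame than in the report, so the comparison runs quickly.
values = np.random.default_rng(0).standard_normal((1000, 200))
frame = pd.DataFrame(values)

per_column = timeit.timeit(lambda: columnwise_round(frame, 2), number=3)
one_call = timeit.timeit(lambda: np.round(values, 2), number=3)
print(f"column by column: {per_column:.4f} s, single np.round: {one_call:.4f} s")
```

The absolute numbers depend on the machine and the pandas version, so none are quoted here.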

I am wondering if there is a reason not to pass the underlying data frame to numpy and do the rounding there in this case.

A quick test showed that something like this would give us the numpy performance:

def faster_round(df, decimals):
    rounded = np.round(df.values, decimals)
    return pd.DataFrame(rounded, columns=df.columns, index=df.index)

%timeit faster_round(df, 2)
# 417 ms ± 14.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
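One caveat with the shortcut above: `df.values` on a frame with mixed column types is an object array, so the fast path is only safe for homogeneous numeric data. A guarded variant might look like the sketch below. `guarded_round` is a hypothetical name, and the check assumes plain NumPy float dtypes; everything else falls back to `DataFrame.round`.

```python
import numpy as np
import pandas as pd


def guarded_round(df, decimals):
    # Hypothetical helper: take the single-array fast path only when every
    # column shares one NumPy float dtype; otherwise defer to pandas.
    dtypes = set(df.dtypes)
    if len(dtypes) == 1:
        (dtype,) = dtypes
        if isinstance(dtype, np.dtype) and dtype.kind == "f":
            rounded = np.round(df.to_numpy(), decimals)
            return pd.DataFrame(rounded, index=df.index, columns=df.columns)
    return df.round(decimals)


floats = pd.DataFrame(np.random.default_rng(1).standard_normal((20, 5)))
mixed = pd.DataFrame({"a": [1.234, 5.678], "b": ["x", "y"]})

fast = guarded_round(floats, 2)      # homogeneous floats: NumPy fast path
fallback = guarded_round(mixed, 1)   # mixed dtypes: falls back to df.round
```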

gfyoung · 2017-08-15T07:10:37Z

@aberres : Good question! Ultimately, we do end up touching the get_values method for many pandas objects, which returns an ndarray. Perhaps we could re-implement this to avoid all of the indirection, though we would need to be careful to ensure nothing breaks.

@jreback @jorisvandenbossche

jreback · 2017-08-15T09:54:57Z

this is done column by column, it could instead be per-dtype as a block. would require some amount of work to do this. pull-requests are welcome.
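As a rough user-level illustration of the per-dtype idea (not pandas internals, and not necessarily how the eventual fix works), the sketch below groups columns by dtype and rounds each float group with one NumPy call. `round_by_dtype` is a hypothetical helper and assumes unique column labels.

```python
import numpy as np
import pandas as pd


def round_by_dtype(df, decimals):
    # Hypothetical helper: collect column labels per dtype, preserving order.
    groups = {}
    for col, dtype in df.dtypes.items():
        groups.setdefault(dtype, []).append(col)

    pieces = []
    for dtype, cols in groups.items():
        chunk = df[cols]
        if isinstance(dtype, np.dtype) and dtype.kind == "f":
            # One 2-D np.round call for all float columns of this dtype.
            chunk = pd.DataFrame(
                np.round(chunk.to_numpy(), decimals), index=df.index, columns=cols
            )
        # Non-float groups are passed through unchanged.
        pieces.append(chunk)

    if not pieces:
        return df.copy()
    # Restore the original column order (assumes unique labels).
    return pd.concat(pieces, axis=1)[list(df.columns)]


frame = pd.DataFrame(
    {"a": [1.234, 2.346], "n": [1, 2], "b": [0.5551, 0.4449], "s": ["x", "y"]}
)
rounded = round_by_dtype(frame, 2)
```

Inside pandas, same-dtype columns are already stored together, so a block-level implementation would not need the grouping step; this sketch only shows why rounding one 2-D array per dtype avoids the per-column overhead.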

gfyoung added the Performance label Aug 15, 2017

jreback added Difficulty Intermediate labels Aug 15, 2017

jreback added this to the Next Major Release milestone Aug 15, 2017

jreback changed the title ~~DataFrame.round() unnecessarily slow compared to np.round()~~ PERF: DataFrame.round() unnecessarily slow compared to np.round() Aug 15, 2017

jbrockmendel removed Effort Medium labels Oct 21, 2019

mroeschke removed this from the Contributions Welcome milestone Oct 13, 2022

lithomas1 self-assigned this Feb 18, 2023

lithomas1 mentioned this issue Feb 20, 2023

PERF: Implement round on the block level #51498

Merged

5 tasks

lithomas1 closed this as completed in #51498 Mar 1, 2023
