Performance regression in 0.24+ on GroupBy.apply #25883

Closed
blu3r4y opened this issue Mar 26, 2019 · 12 comments
Labels
Apply, DataFrame, Groupby, Performance

Comments

@blu3r4y

blu3r4y commented Mar 26, 2019

Code Sample

import time

import numpy as np
import pandas as pd

nrows, ncols = 1000, 100

# data frame with random values and a key to be grouped by
df = pd.DataFrame(np.random.rand(nrows, ncols))
df["key"] = range(nrows)

numeric_columns = list(range(ncols))

grouping = df.groupby(by="key")

# performance regression in apply()
start = time.time()
grouping[numeric_columns].apply(lambda x: x - x.mean())
end = time.time()

print("[pandas=={}] execution time: {:.4f} seconds".format(pd.__version__, end - start))

# [pandas==0.23.4] execution time: 0.8700 seconds
# [pandas==0.24.0] execution time: 24.3790 seconds
# [pandas==0.24.2] execution time: 23.9600 seconds

Problem description

The function GroupBy.apply is a lot slower (~25 times) in version 0.24.0 than in the 0.23.4 release. The problem persists in the latest 0.24.2 release.

The code sample above demonstrates this performance regression.
The purpose of the sample is to subtract the group mean from all elements of each group.

The problem only occurs when the lambda passed to apply() returns a DataFrame.
There are no performance issues with scalar return values, e.g. lambda x: x.mean().
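A minimal sketch (hypothetical toy data) contrasting the two return types:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"key": [0, 0, 1, 1], "val": [1.0, 3.0, 2.0, 4.0]})
g = df.groupby("key")

# Scalar return value per group: not affected by the regression
means = g["val"].apply(lambda x: x.mean())

# DataFrame-shaped return value: the slow path in 0.24+
demeaned = g[["val"]].apply(lambda x: x - x.mean())
```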

Output of pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.6.6.final.0
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 158 Stepping 9, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None

pandas: 0.24.0
pytest: None
pip: 10.0.1
setuptools: 39.2.0
Cython: None
numpy: 1.16.2
scipy: 1.2.1
pyarrow: None
xarray: None
IPython: 6.4.0
sphinx: None
patsy: 0.5.0
dateutil: 2.7.3
pytz: 2018.5
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: 3.0.2
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml.etree: None
bs4: None
html5lib: 0.9999999
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None
gcsfs: None
@WillAyd
Member

WillAyd commented Mar 26, 2019

Can you isolate the frame operation from the groupby? Curious if the regression is noticeable in the former

@WillAyd WillAyd added the Performance and Needs Info labels Mar 26, 2019
@blu3r4y
Author

blu3r4y commented Mar 26, 2019

You mean like so? Well, there is a slight difference (~30 % slower), but I wouldn't trust my "benchmark" here too much: I ran the code a few times and the numbers varied between 0.5 and 1 second.

import time

import numpy as np
import pandas as pd

nrows, ncols = 1000, 1000

df = pd.DataFrame(np.random.rand(nrows, ncols))

start = time.time()
for _ in range(100):
    df.apply(lambda x: x - x.mean())
end = time.time()

print("[pandas=={}] execution time: {:.4f} seconds".format(pd.__version__, end - start))
# [pandas==0.23.4] execution time: 25.8880 seconds
# [pandas==0.24.0] execution time: 36.0216 seconds
# [pandas==0.24.2] execution time: 34.6180 seconds

Additionally, I tested the original code sample with only one group. It's still about 2 times slower in 0.24+ than in 0.23.4, but not as drastic as with multiple groups.

nrows, ncols = 10000, 10000

df["key"] = [1] * nrows

# [pandas==0.23.4] execution time: 5.5250 seconds
# [pandas==0.24.0] execution time: 12.1590 seconds
# [pandas==0.24.2] execution time: 12.1540 seconds

@WillAyd
Member

WillAyd commented Mar 26, 2019

Right, I'm just trying to isolate potential regressions in GroupBy versus DataFrame operations. Given that you don't see the same regression with scalars, I'm inclined to believe it's the latter that may be at fault here.

Can you try your last example on master? I think those results might be misleading, as the apply operation would still get called twice even with only one group in 0.24.2 (see #24748, which just changed this behavior), so it might not be a clean comparison to make.

@TomAugspurger we were never able to get the ASV site back up and running, were we?
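The double evaluation mentioned above can be made visible with a counting wrapper (a sketch; the exact count depends on the pandas version — 0.24.x evaluates the first group an extra time to infer the output shape, while versions including #24748 call the function once per group):

```python
import pandas as pd

calls = {"n": 0}

def demean(x):
    calls["n"] += 1
    return x - x.mean()

df = pd.DataFrame({"key": [1, 1], "val": [1.0, 3.0]})
df.groupby("key")[["val"]].apply(demean)

# 2 on pandas <= 0.24.x, 1 on versions that include #24748
print("apply() called the function", calls["n"], "times")
```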

@WillAyd WillAyd removed the Needs Info label Mar 26, 2019
@blu3r4y
Author

blu3r4y commented Mar 26, 2019

I think I've gotten closer to the problem: it really does seem to be related to DataFrame operations, i.e. subtracting a scalar from a DataFrame already exhibits the regression. And it's huge: on column-heavy shapes, this simple operation is 153 times slower ô.O

import timeit
import numpy as np  # 1.16.2
import pandas as pd

def benchmark():
    nrows, ncols = 100, 100
    df = pd.DataFrame(np.random.rand(nrows, ncols))
    _ = df - 1

time = timeit.timeit(benchmark, number=100)
print("# {:>8.4f} sec   pandas=={}".format(time, pd.__version__))
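For noisy environments, timeit.repeat with the minimum taken over rounds tends to give a more stable number than a single timeit run (a sketch along the same lines as the benchmark above):

```python
import timeit

import numpy as np
import pandas as pd

def benchmark():
    nrows, ncols = 100, 100
    df = pd.DataFrame(np.random.rand(nrows, ncols))
    _ = df - 1

# repeat() returns one total per round; the minimum is the most
# reproducible estimate, since larger values mostly reflect
# interference from the rest of the system.
times = timeit.repeat(benchmark, number=100, repeat=5)
print("# best of 5: {:>8.4f} sec   pandas=={}".format(min(times), pd.__version__))
```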

Here are my benchmarking results of the df - 1 operation with different shapes compared to master.

shape         0.23.4    3855a27    factor
100, 100      0.0343     2.3835       x69
10000, 100    1.5811     5.0893        x3
100, 10000    1.4902   229.1662      x153
1000, 1000    1.5120    26.9160       x18

Concerning GroupBy.apply, there seems to be only a slight performance regression of up to 30 %, as indicated in my earlier comment.

import timeit
import numpy as np  # 1.16.2
import pandas as pd

def benchmark():
    nrows, ncols = 1000, 100
    df = pd.DataFrame(np.random.rand(nrows, ncols))
    df["key"] = range(nrows)

    numeric_columns = list(range(ncols))
    
    grouping = df.groupby(by="key")
    grouping[numeric_columns].apply(lambda x: x.mean())

time = timeit.timeit(benchmark, number=10)
print("# {:>8.4f} sec   pandas=={}".format(time, pd.__version__))

shape         0.23.4    3855a27    factor
100, 100      0.2715     0.3211     x1.18
1000, 100    19.4875    26.0909     x1.34
100, 1000     0.6296     0.6103     x0.96
1000, 1000    5.0933     5.6909     x1.11

@chris-b1
Contributor

Note that if this is your actual function, you can (and should) instead do the following, which has always been faster:

df[numeric_columns] - grouping[numeric_columns].transform('mean')
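A quick equivalence check of the two approaches (a sketch with small, hypothetical shapes; group_keys=False is passed so apply keeps the original row index):

```python
import numpy as np
import pandas as pd

nrows, ncols = 100, 5
df = pd.DataFrame(np.random.rand(nrows, ncols))
df["key"] = np.arange(nrows) % 10
numeric_columns = list(range(ncols))

# group_keys=False keeps the original row index on the apply result
grouping = df.groupby("key", group_keys=False)

via_apply = grouping[numeric_columns].apply(lambda x: x - x.mean())
via_transform = df[numeric_columns] - grouping[numeric_columns].transform("mean")

# Both demean within each group; transform('mean') avoids building a
# new DataFrame per group inside apply, which is why it is faster.
assert np.allclose(via_apply.sort_index().to_numpy(),
                   via_transform.sort_index().to_numpy())
```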

@WillAyd WillAyd changed the title GroupBy.apply runtime performance regression in 0.24+ Performance regression in 0.24+ subtracting scalar from DataFrame Mar 27, 2019
@WillAyd WillAyd added the DataFrame label Mar 27, 2019
@TomAugspurger
Contributor

Can you check for duplicates? We have another issue for ops that were previously blockwise, but are now columnwise.
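The blockwise-versus-columnwise distinction can be illustrated with plain NumPy (an illustrative sketch, not pandas internals): a single vectorized operation over a 2-D block versus a Python-level loop over columns, which pays per-column call overhead and hurts most on column-heavy shapes.

```python
import numpy as np

arr = np.random.rand(100, 1000)  # column-heavy shape

# Blockwise: one vectorized op over the whole 2-D block
blockwise = arr - 1

# Columnwise: one op per column, re-assembled afterwards; each of the
# 1000 iterations pays Python and NumPy call overhead
columnwise = np.column_stack([arr[:, i] - 1 for i in range(arr.shape[1])])

assert np.array_equal(blockwise, columnwise)
```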

@WillAyd
Member

WillAyd commented Mar 27, 2019

Local ASV result comparing current head to the last commit on 0.23.4 confirms a regression for the frame ops:

       before           after         ratio
     [af7b0ba4]       [95c78d65]
                      <master>
+      3.30±0.2ms          249±3ms    75.37  binary_ops.Ops2.time_frame_float_div_by_zero
+      8.08±0.4ms         256±10ms    31.75  binary_ops.Ops2.time_frame_int_div_by_zero
+      12.6±0.2ms          261±7ms    20.78  binary_ops.Ops2.time_frame_float_floor_by_zero
+      30.2±0.4ms          106±1ms     3.52  binary_ops.Ops.time_frame_multi_and(False, 1)
+      29.9±0.7ms          102±4ms     3.43  binary_ops.Ops.time_frame_multi_and(False, 'default')
+      34.1±0.4ms          110±2ms     3.24  binary_ops.Ops.time_frame_multi_and(True, 1)
+      39.2±0.4ms        111±0.9ms     2.83  binary_ops.Ops.time_frame_multi_and(True, 'default')
+     3.85±0.06ms       5.18±0.1ms     1.35  binary_ops.Ops.time_frame_add(True, 1)
+      28.6±0.1μs       37.4±0.2μs     1.31  binary_ops.Ops2.time_series_dot
+     3.54±0.06ms       4.29±0.3ms     1.21  binary_ops.Ops.time_frame_add(False, 1)
+         512±1μs          600±3μs     1.17  binary_ops.Ops2.time_frame_series_dot
-         107±2ms         61.2±1ms     0.57  binary_ops.Ops.time_frame_comparison(False, 'default')
-         108±1ms       61.1±0.9ms     0.57  binary_ops.Ops.time_frame_comparison(False, 1)

SOME BENCHMARKS HAVE CHANGED SIGNIFICANTLY.

I've updated the title to reflect this, as I think that is the larger issue. @blu3r4y, if you can run ASVs for GroupBy.apply to confirm the regression there, that could be helpful as another issue.

@WillAyd WillAyd changed the title Performance regression in 0.24+ subtracting scalar from DataFrame Performance regression in 0.24+ DataFrame Binary Ops Mar 27, 2019
@WillAyd WillAyd changed the title Performance regression in 0.24+ DataFrame Binary Ops Performance regression in 0.24+ on DataFrame Binary Ops Mar 27, 2019
@blu3r4y
Author

blu3r4y commented Mar 27, 2019

Can you check for duplicates? We have another issue for ops that were previously blockwise, but are now columnwise.

Yes, the regression with DataFrame + scalar ops in 0.24+ has already been reported in #24990.

@WillAyd I will run ASV for GroupBy.apply soon, so that we keep this issue focused on GroupBy.apply, right? Or should I open a new issue then?

@WillAyd
Member

WillAyd commented Mar 27, 2019 via email

@blu3r4y blu3r4y changed the title Performance regression in 0.24+ on DataFrame Binary Ops Performance regression in 0.24+ on GroupBy.apply Jun 2, 2019
@blu3r4y
Author

blu3r4y commented Jun 2, 2019

As suggested by @WillAyd, I ran ASV for GroupBy.apply in order to observe only the impact on that.

  • I ran the asv call 5 times for warm-up and 5 times for reporting, on a clean Debian 9.7 CPU-optimized droplet from DigitalOcean. I observed that the results differ drastically between runs, which is why I did multiple runs in the first place.
  • I observed a regression between v0.23.4 and v0.24.0 in time_groupby_apply_dict_return (1.78x - 3.10x) and one in time_scalar_function_single_col (1.12x - 1.52x).
  • Comparing v0.23.4 and the current HEAD, these regressions are still observable: time_groupby_apply_dict_return (1.76x - 2.42x) and time_scalar_function_single_col (1.11x - 1.72x).

These findings must be taken with a grain of salt, since the benchmarks do not show stable results (maybe interesting for the discussion in #23412). Still, I believe the impact on time_groupby_apply_dict_return and time_scalar_function_single_col is significant in any case.

I renamed THIS issue to focus on the performance regression in GroupBy.apply, since I reckon #24990 already covers the other root cause, namely the performance regression in DataFrame binary ops, which, as older comments here showed, heavily affected my initial MWE on column-heavy shapes as well.

v0.23.4 0409521 vs. v0.24.0 83eb242 (5 warm-up runs + 5 reported runs)

$ for i in {1..10}; do asv continuous -f 1.1 v0.23.4 v0.24.0 -b groupby.Apply; done

       before           after         ratio
     [04095216]       [83eb2428]
+      85.6±0.2ms        152±0.8ms     1.78  groupby.ApplyDictReturn.time_groupby_apply_dict_return
+      48.1±0.2ms       54.4±0.2ms     1.13  groupby.Apply.time_scalar_function_multi_col
-         643±4ms          415±4ms     0.64  groupby.Apply.time_copy_overhead_single_col

       before           after         ratio
     [04095216]       [83eb2428]
+      49.4±0.1ms        153±0.1ms     3.10  groupby.ApplyDictReturn.time_groupby_apply_dict_return
+        9.25±3ms      14.1±0.07ms     1.52  groupby.Apply.time_scalar_function_single_col
+         646±2ms          752±1ms     1.16  groupby.Apply.time_copy_overhead_single_col

       before           after         ratio
     [04095216]       [83eb2428]
+      49.2±0.2ms         120±30ms     2.43  groupby.ApplyDictReturn.time_groupby_apply_dict_return
+     6.07±0.06ms      6.77±0.08ms     1.12  groupby.Apply.time_scalar_function_single_col

       before           after         ratio
     [04095216]       [83eb2428]
+       69.2±20ms        151±0.2ms     2.19  groupby.ApplyDictReturn.time_groupby_apply_dict_return
+       35.1±10ms       54.8±0.5ms     1.56  groupby.Apply.time_scalar_function_multi_col
+       1.30±0.4s       1.96±0.01s     1.51  groupby.Apply.time_copy_function_multi_col
+     12.4±0.06ms      13.9±0.04ms     1.12  groupby.Apply.time_scalar_function_single_col

       before           after         ratio
     [04095216]       [83eb2428]
+      49.4±0.4ms         121±40ms     2.44  groupby.ApplyDictReturn.time_groupby_apply_dict_return
+        914±20ms        1.49±0.5s     1.63  groupby.Apply.time_copy_function_multi_col
+     6.00±0.03ms      6.76±0.08ms     1.13  groupby.Apply.time_scalar_function_single_col

v0.23.4 0409521 vs. HEAD 437efa6 (5 warm-up runs + 5 reported runs)

$ for i in {1..10}; do asv continuous -f 1.1 v0.23.4 HEAD -b groupby.Apply; done

       before           after         ratio
     [04095216]       [437efa6e]
+      86.6±0.1ms          152±1ms     1.76  groupby.ApplyDictReturn.time_groupby_apply_dict_return
+     5.99±0.03ms         10.3±3ms     1.72  groupby.Apply.time_scalar_function_single_col

       before           after         ratio
     [04095216]       [437efa6e]
+      49.0±0.1ms         119±30ms     2.42  groupby.ApplyDictReturn.time_groupby_apply_dict_return
+         908±8ms       1.08±0.01s     1.19  groupby.Apply.time_copy_function_multi_col
+     6.02±0.07ms      6.67±0.05ms     1.11  groupby.Apply.time_scalar_function_single_col

       before           after         ratio
     [04095216]       [437efa6e]
+     5.99±0.03ms         10.2±4ms     1.70  groupby.Apply.time_scalar_function_single_col
+         353±3ms          411±1ms     1.16  groupby.Apply.time_copy_overhead_single_col
+      22.0±0.1ms       24.7±0.2ms     1.12  groupby.Apply.time_scalar_function_multi_col

       before           after         ratio
     [04095216]       [437efa6e]
+      48.4±0.3ms       54.1±0.2ms     1.12  groupby.Apply.time_scalar_function_multi_col

       before           after         ratio
     [04095216]       [437efa6e]
+      49.2±0.5ms         118±30ms     2.41  groupby.ApplyDictReturn.time_groupby_apply_dict_return
+     6.04±0.05ms         10.2±4ms     1.69  groupby.Apply.time_scalar_function_single_col
+         352±1ms        592±200ms     1.68  groupby.Apply.time_copy_overhead_single_col

@pv
Contributor

pv commented Jun 2, 2019

FWIW, on a desktop computer the benchmark numbers are fairly stable:

$ for i in {1..10}; do asv continuous -f 1 v0.23.4 v0.24.0 -b groupby.Apply --cpu-affinity 5; done
       before           after         ratio
     [04095216]       [83eb2428]
     <v0.23.4^0>       <v0.24.0^0>
+      57.8±0.2ms       94.1±0.3ms     1.63  groupby.ApplyDictReturn.time_groupby_apply_dict_return
+         465±5ms          522±5ms     1.12  groupby.Apply.time_copy_overhead_single_col
+      1.19±0.01s       1.34±0.02s     1.12  groupby.Apply.time_copy_function_multi_col

       before           after         ratio
     [04095216]       [83eb2428]
     <v0.23.4^0>       <v0.24.0^0>
+      56.2±0.1ms       93.5±0.2ms     1.66  groupby.ApplyDictReturn.time_groupby_apply_dict_return
+         451±2ms          526±4ms     1.17  groupby.Apply.time_copy_overhead_single_col
+         1.17±0s       1.35±0.01s     1.16  groupby.Apply.time_copy_function_multi_col
+     6.36±0.09ms       7.35±0.2ms     1.16  groupby.Apply.time_scalar_function_single_col
+      23.1±0.7ms       26.3±0.4ms     1.14  groupby.Apply.time_scalar_function_multi_col

       before           after         ratio
     [04095216]       [83eb2428]
     <v0.23.4^0>       <v0.24.0^0>
+      57.0±0.6ms       93.3±0.9ms     1.64  groupby.ApplyDictReturn.time_groupby_apply_dict_return
+         451±2ms          526±4ms     1.17  groupby.Apply.time_copy_overhead_single_col
+        23.0±1ms       26.7±0.1ms     1.16  groupby.Apply.time_scalar_function_multi_col
+      1.17±0.01s       1.35±0.01s     1.15  groupby.Apply.time_copy_function_multi_col
+      6.64±0.3ms      7.29±0.07ms     1.10  groupby.Apply.time_scalar_function_single_col

       before           after         ratio
     [04095216]       [83eb2428]
     <v0.23.4^0>       <v0.24.0^0>
+      56.0±0.2ms       93.4±0.8ms     1.67  groupby.ApplyDictReturn.time_groupby_apply_dict_return
+     6.38±0.08ms      7.47±0.03ms     1.17  groupby.Apply.time_scalar_function_single_col
+      23.4±0.2ms       27.0±0.5ms     1.16  groupby.Apply.time_scalar_function_multi_col
+         454±2ms          520±8ms     1.15  groupby.Apply.time_copy_overhead_single_col
+      1.18±0.01s       1.33±0.01s     1.13  groupby.Apply.time_copy_function_multi_col

If the accuracy is not sufficient, you can add -a processes=5 to run 5 rounds (instead of the default 2) to get a better sample of the fluctuations.

@jbrockmendel jbrockmendel added the Apply and Groupby labels Oct 16, 2019
@rhshadrach
Member

I'm seeing 0.2614 seconds on my machine on main, but that means relatively little in isolation. In any case, I think this issue is too old; comparisons with 0.23 performance are no longer useful.

7 participants