Performance regression in 0.24+ on GroupBy.apply #25883

Closed
blu3r4y opened this issue Mar 26, 2019 · 12 comments
Labels
Apply, DataFrame, Groupby, Performance

Comments

@blu3r4y

blu3r4y commented Mar 26, 2019

Code Sample

import time

import numpy as np
import pandas as pd

nrows, ncols = 1000, 100

# data frame with random values and a key to be grouped by
df = pd.DataFrame(np.random.rand(nrows, ncols))
df["key"] = range(nrows)

numeric_columns = list(range(ncols))

grouping = df.groupby(by="key")

# performance regression in apply()
start = time.time()
grouping[numeric_columns].apply(lambda x: x - x.mean())
end = time.time()

print("[pandas=={}] execution time: {:.4f} seconds".format(pd.__version__, end - start))

# [pandas==0.23.4] execution time: 0.8700 seconds
# [pandas==0.24.0] execution time: 24.3790 seconds
# [pandas==0.24.2] execution time: 23.9600 seconds

Problem description

The function GroupBy.apply is a lot slower (~25 times) in version 0.24.0 than in the 0.23.4 release. The problem persists in the latest 0.24.2 release.

The code sample above demonstrates this performance regression.
The purpose of the sample is to subtract the group mean from all elements of each group.

The problem only occurs when the lambda passed to apply() returns a DataFrame.
There are no performance issues with scalar return values, e.g. lambda x: x.mean().
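A minimal sketch (hypothetical toy data) contrasting the two return types:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"key": [0, 0, 1, 1], "val": [1.0, 3.0, 2.0, 4.0]})
g = df.groupby("key")

# Scalar return value per group: not affected by the regression
means = g["val"].apply(lambda x: x.mean())

# DataFrame-shaped return value: the slow path in 0.24+
demeaned = g[["val"]].apply(lambda x: x - x.mean())
```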

Output of pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.6.6.final.0
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 158 Stepping 9, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None

pandas: 0.24.0
pytest: None
pip: 10.0.1
setuptools: 39.2.0
Cython: None
numpy: 1.16.2
scipy: 1.2.1
pyarrow: None
xarray: None
IPython: 6.4.0
sphinx: None
patsy: 0.5.0
dateutil: 2.7.3
pytz: 2018.5
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: 3.0.2
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml.etree: None
bs4: None
html5lib: 0.9999999
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None
gcsfs: None
@WillAyd
Member

WillAyd commented Mar 26, 2019

Can you isolate the frame operation from the groupby? Curious if the regression is noticeable in the former

@WillAyd WillAyd added the Performance and Needs Info labels Mar 26, 2019
@blu3r4y
Author

blu3r4y commented Mar 26, 2019

You mean like so? Well, there is a slight difference (~30 % slower), but I wouldn't trust my "benchmark" here too much: I ran the code a few times and the numbers varied between 0.5 and 1 second.

import time

import numpy as np
import pandas as pd

nrows, ncols = 1000, 1000

df = pd.DataFrame(np.random.rand(nrows, ncols))

start = time.time()
for _ in range(100):
    df.apply(lambda x: x - x.mean())
end = time.time()

print("[pandas=={}] execution time: {:.4f} seconds".format(pd.__version__, end - start))
# [pandas==0.23.4] execution time: 25.8880 seconds
# [pandas==0.24.0] execution time: 36.0216 seconds
# [pandas==0.24.2] execution time: 34.6180 seconds

Additionally, I tested the original code sample with only one group. It's still about 2 times slower in 0.24+ than in 0.23.4, but not as drastic as with multiple groups.

nrows, ncols = 10000, 10000

df["key"] = [1] * nrows

# [pandas==0.23.4] execution time: 5.5250 seconds
# [pandas==0.24.0] execution time: 12.1590 seconds
# [pandas==0.24.2] execution time: 12.1540 seconds

@WillAyd
Member

WillAyd commented Mar 26, 2019

Right, I'm just trying to isolate potential regressions in GroupBy versus DataFrame operations. Given that you don't see the same regression with scalars, I'm inclined to believe it's the latter that may be at fault here.

Can you try your last example on master? I think those results might be misleading, as the apply operation would still get called twice even with only one group in 0.24.2 (see #24748, which just changed this behavior), so it might not be a clean comparison to make.

@TomAugspurger we were never able to get the ASV site back up and running, were we?
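The double evaluation mentioned above can be made visible with a counting wrapper (a sketch; the exact count depends on the pandas version — 0.24.x evaluates the first group an extra time to infer the output shape, while versions including #24748 call the function once per group):

```python
import pandas as pd

calls = {"n": 0}

def demean(x):
    calls["n"] += 1
    return x - x.mean()

df = pd.DataFrame({"key": [1, 1], "val": [1.0, 3.0]})
df.groupby("key")[["val"]].apply(demean)

# 2 on pandas <= 0.24.x, 1 on versions that include #24748
print("apply() called the function", calls["n"], "times")
```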

@WillAyd WillAyd removed the Needs Info label Mar 26, 2019
@blu3r4y
Author

blu3r4y commented Mar 26, 2019

I think I've gotten closer to the problem: it really does seem to be related to DataFrame operations, i.e. subtracting a scalar from a DataFrame already exhibits the regression. And it's huge: on column-heavy shapes, this simple operation is 153 times slower ô.O

import timeit
import numpy as np  # 1.16.2
import pandas as pd

def benchmark():
    nrows, ncols = 100, 100
    df = pd.DataFrame(np.random.rand(nrows, ncols))
    _ = df - 1

time = timeit.timeit(benchmark, number=100)
print("# {:>8.4f} sec   pandas=={}".format(time, pd.__version__))
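For noisy environments, timeit.repeat with the minimum taken over rounds tends to give a more stable number than a single timeit run (a sketch along the same lines as the benchmark above):

```python
import timeit

import numpy as np
import pandas as pd

def benchmark():
    nrows, ncols = 100, 100
    df = pd.DataFrame(np.random.rand(nrows, ncols))
    _ = df - 1

# repeat() returns one total per round; the minimum is the most
# reproducible estimate, since larger values mostly reflect
# interference from the rest of the system.
times = timeit.repeat(benchmark, number=100, repeat=5)
print("# best of 5: {:>8.4f} sec   pandas=={}".format(min(times), pd.__version__))
```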

Here are my benchmarking results of the df - 1 operation with different shapes compared to master.

shape         0.23.4    3855a27    factor
100, 100      0.0343     2.3835       x69
10000, 100    1.5811     5.0893        x3
100, 10000    1.4902   229.1662      x153
1000, 1000    1.5120    26.9160       x18

Concerning GroupBy.apply, there seems to be only a slight performance regression of up to 30 %, as indicated in my earlier comment.

import timeit
import numpy as np  # 1.16.2
import pandas as pd

def benchmark():
    nrows, ncols = 1000, 100
    df = pd.DataFrame(np.random.rand(nrows, ncols))
    df["key"] = range(nrows)

    numeric_columns = list(range(ncols))
    
    grouping = df.groupby(by="key")
    grouping[numeric_columns].apply(lambda x: x.mean())

time = timeit.timeit(benchmark, number=10)
print("# {:>8.4f} sec   pandas=={}".format(time, pd.__version__))

shape         0.23.4    3855a27    factor
100, 100      0.2715     0.3211     x1.18
1000, 100    19.4875    26.0909     x1.34
100, 1000     0.6296     0.6103     x0.96
1000, 1000    5.0933     5.6909     x1.11

@chris-b1
Contributor

Note that if this is your actual function, you can (and should) instead do the following, which has always been faster:

df[numeric_columns] - grouping[numeric_columns].transform('mean')
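A quick equivalence check of the two approaches (a sketch with small, hypothetical shapes; group_keys=False is passed so apply keeps the original row index):

```python
import numpy as np
import pandas as pd

nrows, ncols = 100, 5
df = pd.DataFrame(np.random.rand(nrows, ncols))
df["key"] = np.arange(nrows) % 10
numeric_columns = list(range(ncols))

# group_keys=False keeps the original row index on the apply result
grouping = df.groupby("key", group_keys=False)

via_apply = grouping[numeric_columns].apply(lambda x: x - x.mean())
via_transform = df[numeric_columns] - grouping[numeric_columns].transform("mean")

# Both demean within each group; transform('mean') avoids building a
# new DataFrame per group inside apply, which is why it is faster.
assert np.allclose(via_apply.sort_index().to_numpy(),
                   via_transform.sort_index().to_numpy())
```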

@WillAyd WillAyd changed the title GroupBy.apply runtime performance regression in 0.24+ Performance regression in 0.24+ subtracting scalar from DataFrame Mar 27, 2019
@WillAyd WillAyd added the DataFrame label Mar 27, 2019
@TomAugspurger
Contributor

Can you check for duplicates? We have another issue for ops that were previously blockwise, but are now columnwise.
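The blockwise-versus-columnwise distinction can be illustrated with plain NumPy (an illustrative sketch, not pandas internals): a single vectorized operation over a 2-D block versus a Python-level loop over columns, which pays per-column call overhead and hurts most on column-heavy shapes.

```python
import numpy as np

arr = np.random.rand(100, 1000)  # column-heavy shape

# Blockwise: one vectorized op over the whole 2-D block
blockwise = arr - 1

# Columnwise: one op per column, re-assembled afterwards; each of the
# 1000 iterations pays Python and NumPy call overhead
columnwise = np.column_stack([arr[:, i] - 1 for i in range(arr.shape[1])])

assert np.array_equal(blockwise, columnwise)
```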

@WillAyd
Member

WillAyd commented Mar 27, 2019

Local ASV result comparing current head to the last commit on 0.23.4 confirms a regression for the frame ops:

       before           after         ratio
     [af7b0ba4]       [95c78d65]
                      <master>
+      3.30±0.2ms          249±3ms    75.37  binary_ops.Ops2.time_frame_float_div_by_zero
+      8.08±0.4ms         256±10ms    31.75  binary_ops.Ops2.time_frame_int_div_by_zero
+      12.6±0.2ms          261±7ms    20.78  binary_ops.Ops2.time_frame_float_floor_by_zero
+      30.2±0.4ms          106±1ms     3.52  binary_ops.Ops.time_frame_multi_and(False, 1)
+      29.9±0.7ms          102±4ms     3.43  binary_ops.Ops.time_frame_multi_and(False, 'default')
+      34.1±0.4ms          110±2ms     3.24  binary_ops.Ops.time_frame_multi_and(True, 1)
+      39.2±0.4ms        111±0.9ms     2.83  binary_ops.Ops.time_frame_multi_and(True, 'default')
+     3.85±0.06ms       5.18±0.1ms     1.35  binary_ops.Ops.time_frame_add(True, 1)
+      28.6±0.1μs       37.4±0.2μs     1.31  binary_ops.Ops2.time_series_dot
+     3.54±0.06ms       4.29±0.3ms     1.21  binary_ops.Ops.time_frame_add(False, 1)
+         512±1μs          600±3μs     1.17  binary_ops.Ops2.time_frame_series_dot
-         107±2ms         61.2±1ms     0.57  binary_ops.Ops.time_frame_comparison(False, 'default')
-         108±1ms       61.1±0.9ms     0.57  binary_ops.Ops.time_frame_comparison(False, 1)

SOME BENCHMARKS HAVE CHANGED SIGNIFICANTLY.

I've updated the title to reflect this, as I think that is the larger issue. @blu3r4y, if you can run ASVs for GroupBy.apply to confirm the regression there, that could be helpful as another issue.

@WillAyd WillAyd changed the title Performance regression in 0.24+ subtracting scalar from DataFrame Performance regression in 0.24+ DataFrame Binary Ops Mar 27, 2019
@WillAyd WillAyd changed the title Performance regression in 0.24+ DataFrame Binary Ops Performance regression in 0.24+ on DataFrame Binary Ops Mar 27, 2019
@blu3r4y
Author

blu3r4y commented Mar 27, 2019

Can you check for duplicates? We have another issue for ops that were previously blockwise, but are now columnwise.

Yes, the regression with DataFrame + scalar ops in 0.24+ has already been reported in #24990.

@WillAyd I will run ASV for GroupBy.apply soon, so that we keep this issue focused on GroupBy.apply, right? Or should I open a new issue then?

@WillAyd
Member

WillAyd commented Mar 27, 2019 via email

@blu3r4y blu3r4y changed the title Performance regression in 0.24+ on DataFrame Binary Ops Performance regression in 0.24+ on GroupBy.apply Jun 2, 2019
@blu3r4y
Author

blu3r4y commented Jun 2, 2019

As suggested by @WillAyd, I ran ASV for GroupBy.apply in order to observe only the impact on that.

  • I ran the asv call 5 times for warm-up and 5 times for reporting, on a clean Debian 9.7 CPU-optimized droplet from DigitalOcean. I observed that the results differ drastically between runs, which is why I did multiple runs in the first place.
  • I observed a regression between v0.23.4 and v0.24.0 in time_groupby_apply_dict_return (1.78x - 3.10x) and one in time_scalar_function_single_col (1.12x - 1.52x).
  • Comparing v0.23.4 and the current HEAD, these regressions are still observable: time_groupby_apply_dict_return (1.76x - 2.42x) and time_scalar_function_single_col (1.11x - 1.72x).

These findings must be taken with a grain of salt, since the benchmarks do not show stable results (maybe interesting for the discussion in #23412). Still, I believe the impact on time_groupby_apply_dict_return and time_scalar_function_single_col is significant in any case.

I renamed THIS issue to focus on the performance regression in GroupBy.apply, since I reckon #24990 already covers the other root cause, namely the performance regression in DataFrame binary ops, which, as older comments here showed, heavily affected my initial MWE on column-heavy shapes as well.

v0.23.4 0409521 vs. v0.24.0 83eb242 (5 warm-up runs + 5 reported runs)

$ for i in {1..10}; do asv continuous -f 1.1 v0.23.4 v0.24.0 -b groupby.Apply; done

       before           after         ratio
     [04095216]       [83eb2428]
+      85.6±0.2ms        152±0.8ms     1.78  groupby.ApplyDictReturn.time_groupby_apply_dict_return
+      48.1±0.2ms       54.4±0.2ms     1.13  groupby.Apply.time_scalar_function_multi_col
-         643±4ms          415±4ms     0.64  groupby.Apply.time_copy_overhead_single_col

       before           after         ratio
     [04095216]       [83eb2428]
+      49.4±0.1ms        153±0.1ms     3.10  groupby.ApplyDictReturn.time_groupby_apply_dict_return
+        9.25±3ms      14.1±0.07ms     1.52  groupby.Apply.time_scalar_function_single_col
+         646±2ms          752±1ms     1.16  groupby.Apply.time_copy_overhead_single_col

       before           after         ratio
     [04095216]       [83eb2428]
+      49.2±0.2ms         120±30ms     2.43  groupby.ApplyDictReturn.time_groupby_apply_dict_return
+     6.07±0.06ms      6.77±0.08ms     1.12  groupby.Apply.time_scalar_function_single_col

       before           after         ratio
     [04095216]       [83eb2428]
+       69.2±20ms        151±0.2ms     2.19  groupby.ApplyDictReturn.time_groupby_apply_dict_return
+       35.1±10ms       54.8±0.5ms     1.56  groupby.Apply.time_scalar_function_multi_col
+       1.30±0.4s       1.96±0.01s     1.51  groupby.Apply.time_copy_function_multi_col
+     12.4±0.06ms      13.9±0.04ms     1.12  groupby.Apply.time_scalar_function_single_col

       before           after         ratio
     [04095216]       [83eb2428]
+      49.4±0.4ms         121±40ms     2.44  groupby.ApplyDictReturn.time_groupby_apply_dict_return
+        914±20ms        1.49±0.5s     1.63  groupby.Apply.time_copy_function_multi_col
+     6.00±0.03ms      6.76±0.08ms     1.13  groupby.Apply.time_scalar_function_single_col

v0.23.4 0409521 vs. HEAD 437efa6 (5 warm-up runs + 5 reported runs)

$ for i in {1..10}; do asv continuous -f 1.1 v0.23.4 HEAD -b groupby.Apply; done

       before           after         ratio
     [04095216]       [437efa6e]
+      86.6±0.1ms          152±1ms     1.76  groupby.ApplyDictReturn.time_groupby_apply_dict_return
+     5.99±0.03ms         10.3±3ms     1.72  groupby.Apply.time_scalar_function_single_col

       before           after         ratio
     [04095216]       [437efa6e]
+      49.0±0.1ms         119±30ms     2.42  groupby.ApplyDictReturn.time_groupby_apply_dict_return
+         908±8ms       1.08±0.01s     1.19  groupby.Apply.time_copy_function_multi_col
+     6.02±0.07ms      6.67±0.05ms     1.11  groupby.Apply.time_scalar_function_single_col

       before           after         ratio
     [04095216]       [437efa6e]
+     5.99±0.03ms         10.2±4ms     1.70  groupby.Apply.time_scalar_function_single_col
+         353±3ms          411±1ms     1.16  groupby.Apply.time_copy_overhead_single_col
+      22.0±0.1ms       24.7±0.2ms     1.12  groupby.Apply.time_scalar_function_multi_col

       before           after         ratio
     [04095216]       [437efa6e]
+      48.4±0.3ms       54.1±0.2ms     1.12  groupby.Apply.time_scalar_function_multi_col

       before           after         ratio
     [04095216]       [437efa6e]
+      49.2±0.5ms         118±30ms     2.41  groupby.ApplyDictReturn.time_groupby_apply_dict_return
+     6.04±0.05ms         10.2±4ms     1.69  groupby.Apply.time_scalar_function_single_col
+         352±1ms        592±200ms     1.68  groupby.Apply.time_copy_overhead_single_col

@pv
Contributor

pv commented Jun 2, 2019

FWIW, on a desktop computer the benchmark numbers are fairly stable:

$ for i in {1..10}; do asv continuous -f 1 v0.23.4 v0.24.0 -b groupby.Apply --cpu-affinity 5; done
       before           after         ratio
     [04095216]       [83eb2428]
     <v0.23.4^0>       <v0.24.0^0>
+      57.8±0.2ms       94.1±0.3ms     1.63  groupby.ApplyDictReturn.time_groupby_apply_dict_return
+         465±5ms          522±5ms     1.12  groupby.Apply.time_copy_overhead_single_col
+      1.19±0.01s       1.34±0.02s     1.12  groupby.Apply.time_copy_function_multi_col

       before           after         ratio
     [04095216]       [83eb2428]
     <v0.23.4^0>       <v0.24.0^0>
+      56.2±0.1ms       93.5±0.2ms     1.66  groupby.ApplyDictReturn.time_groupby_apply_dict_return
+         451±2ms          526±4ms     1.17  groupby.Apply.time_copy_overhead_single_col
+         1.17±0s       1.35±0.01s     1.16  groupby.Apply.time_copy_function_multi_col
+     6.36±0.09ms       7.35±0.2ms     1.16  groupby.Apply.time_scalar_function_single_col
+      23.1±0.7ms       26.3±0.4ms     1.14  groupby.Apply.time_scalar_function_multi_col

       before           after         ratio
     [04095216]       [83eb2428]
     <v0.23.4^0>       <v0.24.0^0>
+      57.0±0.6ms       93.3±0.9ms     1.64  groupby.ApplyDictReturn.time_groupby_apply_dict_return
+         451±2ms          526±4ms     1.17  groupby.Apply.time_copy_overhead_single_col
+        23.0±1ms       26.7±0.1ms     1.16  groupby.Apply.time_scalar_function_multi_col
+      1.17±0.01s       1.35±0.01s     1.15  groupby.Apply.time_copy_function_multi_col
+      6.64±0.3ms      7.29±0.07ms     1.10  groupby.Apply.time_scalar_function_single_col

       before           after         ratio
     [04095216]       [83eb2428]
     <v0.23.4^0>       <v0.24.0^0>
+      56.0±0.2ms       93.4±0.8ms     1.67  groupby.ApplyDictReturn.time_groupby_apply_dict_return
+     6.38±0.08ms      7.47±0.03ms     1.17  groupby.Apply.time_scalar_function_single_col
+      23.4±0.2ms       27.0±0.5ms     1.16  groupby.Apply.time_scalar_function_multi_col
+         454±2ms          520±8ms     1.15  groupby.Apply.time_copy_overhead_single_col
+      1.18±0.01s       1.33±0.01s     1.13  groupby.Apply.time_copy_function_multi_col

If the accuracy is not sufficient, you can add -a processes=5 to run 5 rounds (instead of the default 2) to get a better sample of the fluctuations.

@jbrockmendel jbrockmendel added the Apply and Groupby labels Oct 16, 2019
@rhshadrach
Member

I'm seeing 0.2614 seconds on my machine on main, but that means relatively little in isolation. In any case, I think this issue is too old; comparisons with 0.23 performance are no longer useful.

7 participants