
corrwith in 0.24 is much slower than 0.23 (especially if corr axis is smaller than other axis) #26368


Closed
yavitzour opened this issue May 13, 2019 · 9 comments
Labels
Numeric Operations, Performance

Comments

@yavitzour

Hi,

I've noticed that corrwith on pandas 0.24 is much slower than in 0.23, especially when trying to correlate dataframes where the length of the axis of correlation is much smaller than the length of the other axis.

Example:

import pandas as pd
import numpy as np

df1 = pd.DataFrame(np.random.rand(10000, 100))
df2 = pd.DataFrame(np.random.rand(10000, 100))
df1.corrwith(df2, axis=1)

With pandas 0.23.4 the snippet above finishes in about 0.1 seconds, whereas with pandas 0.24.1 it takes about 10 seconds (roughly 100 times slower).

If we increase the length of the correlation axis, 0.23.4 still performs much better, but the gap is less dramatic; for example, with 10000 on both axes:

df1 = pd.DataFrame(np.random.rand(10000, 10000))
df2 = pd.DataFrame(np.random.rand(10000, 10000))
df1.corrwith(df2, axis=1)

Pandas 0.23.4 finishes in about 10 seconds, whereas pandas 0.24.1 finishes in about 30 seconds ("only" 3 times slower).
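
For reference, here is a minimal standalone timing sketch of how these numbers can be reproduced (exact figures will of course vary by machine and pandas version):

import time

import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.random.rand(10000, 100))
df2 = pd.DataFrame(np.random.rand(10000, 100))

start = time.perf_counter()
df1.corrwith(df2, axis=1)
print(f"pandas {pd.__version__}: {time.perf_counter() - start:.2f} s")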

Thanks!

@TomAugspurger
Contributor

TomAugspurger commented May 13, 2019 via email

@yavitzour
Author

I can try, though I have no familiarity with the internals of pandas, so I doubt I could get much out of it. It's very easy to reproduce: just run the code above in two clean virtual environments, one with pandas 0.24.1 and one with 0.23.4 (or any other 0.23 release).

I just ran it now with cProfile (on a different computer, just for the fun of it). Here are the first few lines of the output. Hope you can make something out of it.

For 0.23 I get:

python -m cProfile -s cumtime scr.py
         235002 function calls (229040 primitive calls) in 0.974 seconds

   Ordered by: cumulative time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
    403/1    0.006    0.000    0.974    0.974 {built-in method builtins.exec}
        1    0.004    0.004    0.974    0.974 scr.py:1(<module>)
    615/2    0.006    0.000    0.751    0.375 <frozen importlib._bootstrap>:978(_find_and_load)
    615/2    0.003    0.000    0.751    0.375 <frozen importlib._bootstrap>:948(_find_and_load_unlocked)
    417/2    0.003    0.000    0.749    0.375 <frozen importlib._bootstrap>:663(_load_unlocked)
    341/2    0.002    0.000    0.749    0.374 <frozen importlib._bootstrap_external>:722(exec_module)
    650/2    0.001    0.000    0.748    0.374 <frozen importlib._bootstrap>:211(_call_with_frames_removed)
        3    0.000    0.000    0.747    0.249 __init__.py:5(<module>)
   442/41    0.001    0.000    0.598    0.015 {built-in method builtins.__import__}
2147/1194    0.003    0.000    0.264    0.000 <frozen importlib._bootstrap>:1009(_handle_fromlist)
        1    0.000    0.000    0.211    0.211 api.py:5(<module>)
        1    0.000    0.000    0.209    0.209 __init__.py:106(<module>)
       28    0.001    0.000    0.188    0.007 __init__.py:1(<module>)
        4    0.000    0.000    0.182    0.046 __init__.py:2(<module>)
        1    0.003    0.003    0.178    0.178 frame.py:6649(corrwith)
      341    0.006    0.000    0.165    0.000 <frozen importlib._bootstrap_external>:793(get_code)
        1    0.000    0.000    0.162    0.162 groupby.py:1(<module>)
      545    0.006    0.000    0.159    0.000 <frozen importlib._bootstrap>:882(_find_spec)
     2333    0.150    0.000    0.150    0.000 {built-in method nt.stat}
      525    0.001    0.000    0.150    0.000 <frozen importlib._bootstrap_external>:1272(find_spec)
      525    0.003    0.000    0.149    0.000 <frozen importlib._bootstrap_external>:1240(_get_spec)
      851    0.011    0.000    0.134    0.000 <frozen importlib._bootstrap_external>:1356(find_spec)
        6    0.000    0.000    0.131    0.022 frame.py:6845(_reduce)
        6    0.000    0.000    0.130    0.022 frame.py:6856(f)
      8/6    0.001    0.000    0.130    0.022 nanops.py:69(_f)
  416/390    0.001    0.000    0.126    0.000 <frozen importlib._bootstrap>:576(module_from_spec)
     1741    0.002    0.000    0.121    0.000 <frozen importlib._bootstrap_external>:74(_path_stat)
    55/39    0.000    0.000    0.109    0.003 <frozen importlib._bootstrap_external>:1040(create_module)
    55/39    0.064    0.001    0.109    0.003 {built-in method _imp.create_dynamic}
        2    0.000    0.000    0.108    0.054 __init__.py:9(<module>)
      250    0.001    0.000    0.105    0.000 {method 'extend' of 'list' objects}
        1    0.000    0.000    0.105    0.105 lazy.py:97(_lazy)
      593    0.000    0.000    0.104    0.000 __init__.py:1098(<genexpr>)
      592    0.002    0.000    0.104    0.000 __init__.py:111(resource_exists)

whereas for 0.24 I get:

python -m cProfile -s cumtime scr.py
         25185450 function calls (24818715 primitive calls) in 18.990 seconds

   Ordered by: cumulative time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
    422/1    0.007    0.000   18.990   18.990 {built-in method builtins.exec}
        1    0.004    0.004   18.990   18.990 scr.py:1(<module>)
        1    0.003    0.003   17.996   17.996 frame.py:7151(corrwith)
        7    0.000    0.000   17.853    2.550 ops.py:2015(f)
        4    0.000    0.000   17.796    4.449 ops.py:1102(dispatch_to_series)
 40011/11    0.067    0.000   15.818    1.438 expressions.py:192(evaluate)
 40011/11    0.130    0.000   15.815    1.438 expressions.py:63(_evaluate_standard)
    40004    0.343    0.000    9.213    0.000 ops.py:1536(wrapper)
        2    0.000    0.000    8.995    4.497 ops.py:1891(_combine_series_frame)
        2    0.018    0.009    8.995    4.497 frame.py:5111(_combine_match_columns)
        2    0.017    0.009    8.837    4.419 frame.py:5118(_combine_const)
        2    0.000    0.000    7.970    3.985 ops.py:1142(column_op)
        2    0.121    0.060    7.970    3.985 ops.py:1143(<dictcomp>)
        2    0.000    0.000    7.832    3.916 ops.py:1126(column_op)
        2    0.097    0.049    7.832    3.916 ops.py:1127(<dictcomp>)
    60012    0.197    0.000    6.324    0.000 indexing.py:1485(__getitem__)
    40000    0.047    0.000    5.297    0.000 indexing.py:2141(_getitem_tuple)
    80028    0.482    0.000    4.815    0.000 series.py:152(__init__)
40003/20003    0.166    0.000    4.795    0.000 {built-in method _operator.mul}
40001/20001    0.157    0.000    4.470    0.000 {built-in method _operator.sub}
    40004    0.098    0.000    4.436    0.000 ops.py:1468(_construct_result)
    40000    0.387    0.000    4.132    0.000 indexing.py:960(_getitem_lowerdim)
    60012    0.202    0.000    3.393    0.000 indexing.py:2205(_getitem_axis)
  4579482    1.595    0.000    2.873    0.000 {built-in method builtins.isinstance}
    80040    0.310    0.000    2.826    0.000 blocks.py:3034(get_block_type)
    60012    0.061    0.000    2.400    0.000 indexing.py:143(_get_loc)
    40000    0.200    0.000    2.235    0.000 frame.py:2829(_ixs)
    80028    0.294    0.000    2.140    0.000 managers.py:1443(__init__)
    80043    0.145    0.000    2.002    0.000 blocks.py:3080(make_block)
       25    0.001    0.000    1.992    0.080 frame.py:378(__init__)
        4    0.008    0.002    1.989    0.497 construction.py:170(init_dict)
        4    0.000    0.000    1.944    0.486 construction.py:43(arrays_to_mgr)
    40004    0.146    0.000    1.891    0.000 ops.py:1512(safe_na_op)

TomAugspurger added the Performance label on May 13, 2019
@TomAugspurger
Contributor

I personally don't plan to look into this. If you're not planning to work on it either, anything you can do to help another contributor identify the issue and propose a solution is helpful! Nothing in that cProfile output looks wrong at a glance. A line profile of DataFrame.corrwith would be the next place I look.
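
For anyone picking this up, one way to get such a line profile is with the line_profiler package in an IPython session; a rough sketch (line_profiler is a third-party tool, not part of pandas):

# pip install line_profiler, then in IPython:
%load_ext line_profiler

import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.random.rand(10000, 100))
df2 = pd.DataFrame(np.random.rand(10000, 100))

# profile DataFrame.corrwith line by line while running the slow call
%lprun -f pd.DataFrame.corrwith df1.corrwith(df2, axis=1)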

@yavitzour
Author

Confirmed that the problem persists in pandas 0.25.0

@TomAugspurger
Contributor

I don't believe that anyone has started working on this, if you're still interested.

@JamesXiao

I've encountered the same issue: corrwith is about 30x slower in versions later than 0.23.x when run on two DataFrames of shape (4000, 4) in my case, so I had to downgrade pandas in our production environment.
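
For anyone stuck on an affected version, a possible workaround instead of downgrading is to compute the row-wise Pearson correlation directly in NumPy. This is only a sketch: it assumes the two frames are already label-aligned, all-numeric, and NaN-free, which DataFrame.corrwith would otherwise handle for you.

import numpy as np
import pandas as pd

def rowwise_pearson(df1, df2):
    # assumes df1 and df2 share the same index/columns and contain no NaNs
    a = df1.to_numpy(dtype=float)
    b = df2.to_numpy(dtype=float)
    a = a - a.mean(axis=1, keepdims=True)
    b = b - b.mean(axis=1, keepdims=True)
    num = (a * b).sum(axis=1)
    denom = np.sqrt((a ** 2).sum(axis=1) * (b ** 2).sum(axis=1))
    return pd.Series(num / denom, index=df1.index)

df1 = pd.DataFrame(np.random.rand(10000, 100))
df2 = pd.DataFrame(np.random.rand(10000, 100))
result = rowwise_pearson(df1, df2)  # matches df1.corrwith(df2, axis=1) up to float error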

@yavitzour
Author

Happy to report that while the problem still persisted through 1.0.5, pandas 1.1.0 resolves it for me.

Thanks!

@simonjayhawkins
Member

Thanks @yavitzour

Can confirm the recent improvement:

%timeit df1.corrwith(df2, axis=1)
# 242 ms ± 7.89 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) # 1.2.0.dev0+17.ga0c8425a5
# 4.48 s ± 27.7 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) # 1.0.1

@arw2019
Member

arw2019 commented Nov 27, 2020

We also have benchmarks for this in asv_bench/benchmarks/stat_ops.py
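
For context, an asv benchmark for this pattern looks roughly like the sketch below (the class and method names here are illustrative, not necessarily the ones used in stat_ops.py):

import numpy as np
import pandas as pd

class CorrwithRows:
    # asv-style benchmark: setup() builds the data, time_* methods are timed
    def setup(self):
        self.df1 = pd.DataFrame(np.random.rand(10000, 100))
        self.df2 = pd.DataFrame(np.random.rand(10000, 100))

    def time_corrwith_axis1(self):
        self.df1.corrwith(self.df2, axis=1)

It can then be compared across two commits with something like asv continuous -f 1.1 upstream/master HEAD -b stat_ops, per the pandas contributing guide.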

arw2019 added the Closing Candidate label on Nov 27, 2020
rhshadrach added the Numeric Operations label and removed the Closing Candidate label on Feb 6, 2021