
corrwith in 0.24 is much slower than 0.23 (especially if corr axis is smaller than other axis) #26368


Closed
yavitzour opened this issue May 13, 2019 · 9 comments
Labels
Numeric Operations, Performance

Comments

@yavitzour

Hi,

I've noticed that corrwith on pandas 0.24 is much slower than in 0.23, especially when trying to correlate dataframes where the length of the axis of correlation is much smaller than the length of the other axis.

Example:

import pandas as pd
import numpy as np

df1 = pd.DataFrame(np.random.rand(10000, 100))
df2 = pd.DataFrame(np.random.rand(10000, 100))
df1.corrwith(df2, axis=1)

With pandas 0.23.4 the snippet above finishes in about 0.1 seconds, whereas with pandas 0.24.1 it takes about 10 seconds (roughly 100 times slower).

If we increase the length of the correlation axis, 0.23.4 still performs much better, but the gap is less dramatic; for example, with 10000 on both axes:

df1 = pd.DataFrame(np.random.rand(10000, 10000))
df2 = pd.DataFrame(np.random.rand(10000, 10000))
df1.corrwith(df2, axis=1)

Pandas 0.23.4 finishes in about 10 seconds, whereas pandas 0.24.1 finishes in about 30 seconds ("only" 3 times slower).
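
For reference, here is a minimal standalone timing sketch of how these numbers can be reproduced (exact figures will of course vary by machine and pandas version):

import time

import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.random.rand(10000, 100))
df2 = pd.DataFrame(np.random.rand(10000, 100))

start = time.perf_counter()
df1.corrwith(df2, axis=1)
print(f"pandas {pd.__version__}: {time.perf_counter() - start:.2f} s")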

Thanks!

@TomAugspurger
Contributor

TomAugspurger commented May 13, 2019 via email

@yavitzour
Author

I can try, though I have no familiarity with the internals of pandas, so I doubt I could get much out of it. It's very easy to reproduce: just run the code above in two clean virtual environments, one with pandas 0.24.1 and one with 0.23.4 (or any other 0.23 release).

I just ran it now with cProfile (on a different computer, just for the fun of it). Here are the first few lines of the output. Hope you can make something out of it.

For 0.23 I get:

python -m cProfile -s cumtime scr.py
         235002 function calls (229040 primitive calls) in 0.974 seconds

   Ordered by: cumulative time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
    403/1    0.006    0.000    0.974    0.974 {built-in method builtins.exec}
        1    0.004    0.004    0.974    0.974 scr.py:1(<module>)
    615/2    0.006    0.000    0.751    0.375 <frozen importlib._bootstrap>:978(_find_and_load)
    615/2    0.003    0.000    0.751    0.375 <frozen importlib._bootstrap>:948(_find_and_load_unlocked)
    417/2    0.003    0.000    0.749    0.375 <frozen importlib._bootstrap>:663(_load_unlocked)
    341/2    0.002    0.000    0.749    0.374 <frozen importlib._bootstrap_external>:722(exec_module)
    650/2    0.001    0.000    0.748    0.374 <frozen importlib._bootstrap>:211(_call_with_frames_removed)
        3    0.000    0.000    0.747    0.249 __init__.py:5(<module>)
   442/41    0.001    0.000    0.598    0.015 {built-in method builtins.__import__}
2147/1194    0.003    0.000    0.264    0.000 <frozen importlib._bootstrap>:1009(_handle_fromlist)
        1    0.000    0.000    0.211    0.211 api.py:5(<module>)
        1    0.000    0.000    0.209    0.209 __init__.py:106(<module>)
       28    0.001    0.000    0.188    0.007 __init__.py:1(<module>)
        4    0.000    0.000    0.182    0.046 __init__.py:2(<module>)
        1    0.003    0.003    0.178    0.178 frame.py:6649(corrwith)
      341    0.006    0.000    0.165    0.000 <frozen importlib._bootstrap_external>:793(get_code)
        1    0.000    0.000    0.162    0.162 groupby.py:1(<module>)
      545    0.006    0.000    0.159    0.000 <frozen importlib._bootstrap>:882(_find_spec)
     2333    0.150    0.000    0.150    0.000 {built-in method nt.stat}
      525    0.001    0.000    0.150    0.000 <frozen importlib._bootstrap_external>:1272(find_spec)
      525    0.003    0.000    0.149    0.000 <frozen importlib._bootstrap_external>:1240(_get_spec)
      851    0.011    0.000    0.134    0.000 <frozen importlib._bootstrap_external>:1356(find_spec)
        6    0.000    0.000    0.131    0.022 frame.py:6845(_reduce)
        6    0.000    0.000    0.130    0.022 frame.py:6856(f)
      8/6    0.001    0.000    0.130    0.022 nanops.py:69(_f)
  416/390    0.001    0.000    0.126    0.000 <frozen importlib._bootstrap>:576(module_from_spec)
     1741    0.002    0.000    0.121    0.000 <frozen importlib._bootstrap_external>:74(_path_stat)
    55/39    0.000    0.000    0.109    0.003 <frozen importlib._bootstrap_external>:1040(create_module)
    55/39    0.064    0.001    0.109    0.003 {built-in method _imp.create_dynamic}
        2    0.000    0.000    0.108    0.054 __init__.py:9(<module>)
      250    0.001    0.000    0.105    0.000 {method 'extend' of 'list' objects}
        1    0.000    0.000    0.105    0.105 lazy.py:97(_lazy)
      593    0.000    0.000    0.104    0.000 __init__.py:1098(<genexpr>)
      592    0.002    0.000    0.104    0.000 __init__.py:111(resource_exists)

whereas for 0.24 I get:

python -m cProfile -s cumtime scr.py
         25185450 function calls (24818715 primitive calls) in 18.990 seconds

   Ordered by: cumulative time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
    422/1    0.007    0.000   18.990   18.990 {built-in method builtins.exec}
        1    0.004    0.004   18.990   18.990 scr.py:1(<module>)
        1    0.003    0.003   17.996   17.996 frame.py:7151(corrwith)
        7    0.000    0.000   17.853    2.550 ops.py:2015(f)
        4    0.000    0.000   17.796    4.449 ops.py:1102(dispatch_to_series)
 40011/11    0.067    0.000   15.818    1.438 expressions.py:192(evaluate)
 40011/11    0.130    0.000   15.815    1.438 expressions.py:63(_evaluate_standard)
    40004    0.343    0.000    9.213    0.000 ops.py:1536(wrapper)
        2    0.000    0.000    8.995    4.497 ops.py:1891(_combine_series_frame)
        2    0.018    0.009    8.995    4.497 frame.py:5111(_combine_match_columns)
        2    0.017    0.009    8.837    4.419 frame.py:5118(_combine_const)
        2    0.000    0.000    7.970    3.985 ops.py:1142(column_op)
        2    0.121    0.060    7.970    3.985 ops.py:1143(<dictcomp>)
        2    0.000    0.000    7.832    3.916 ops.py:1126(column_op)
        2    0.097    0.049    7.832    3.916 ops.py:1127(<dictcomp>)
    60012    0.197    0.000    6.324    0.000 indexing.py:1485(__getitem__)
    40000    0.047    0.000    5.297    0.000 indexing.py:2141(_getitem_tuple)
    80028    0.482    0.000    4.815    0.000 series.py:152(__init__)
40003/20003    0.166    0.000    4.795    0.000 {built-in method _operator.mul}
40001/20001    0.157    0.000    4.470    0.000 {built-in method _operator.sub}
    40004    0.098    0.000    4.436    0.000 ops.py:1468(_construct_result)
    40000    0.387    0.000    4.132    0.000 indexing.py:960(_getitem_lowerdim)
    60012    0.202    0.000    3.393    0.000 indexing.py:2205(_getitem_axis)
  4579482    1.595    0.000    2.873    0.000 {built-in method builtins.isinstance}
    80040    0.310    0.000    2.826    0.000 blocks.py:3034(get_block_type)
    60012    0.061    0.000    2.400    0.000 indexing.py:143(_get_loc)
    40000    0.200    0.000    2.235    0.000 frame.py:2829(_ixs)
    80028    0.294    0.000    2.140    0.000 managers.py:1443(__init__)
    80043    0.145    0.000    2.002    0.000 blocks.py:3080(make_block)
       25    0.001    0.000    1.992    0.080 frame.py:378(__init__)
        4    0.008    0.002    1.989    0.497 construction.py:170(init_dict)
        4    0.000    0.000    1.944    0.486 construction.py:43(arrays_to_mgr)
    40004    0.146    0.000    1.891    0.000 ops.py:1512(safe_na_op)

TomAugspurger added the Performance label on May 13, 2019
@TomAugspurger
Contributor

I personally don't plan to look into this. If you're not planning to work on it either, anything you can do to help another contributor identify the issue and propose a solution is helpful! Nothing in that cProfile output looks wrong at a glance. A line profile of DataFrame.corrwith would be the next place I look.
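
For anyone picking this up, one way to get such a line profile is with the line_profiler package in an IPython session; a rough sketch (line_profiler is a third-party tool, not part of pandas):

# pip install line_profiler, then in IPython:
%load_ext line_profiler

import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.random.rand(10000, 100))
df2 = pd.DataFrame(np.random.rand(10000, 100))

# profile DataFrame.corrwith line by line while running the slow call
%lprun -f pd.DataFrame.corrwith df1.corrwith(df2, axis=1)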

@yavitzour
Author

Confirmed that the problem persists in pandas 0.25.0

@TomAugspurger
Contributor

I don't believe that anyone has started working on this, if you're still interested.

@JamesXiao

I've encountered the same issue: corrwith is about 30x slower in versions later than 0.23.x when run on two DataFrames of shape (4000, 4) in my case, so I had to downgrade pandas in our production environment.
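
For anyone stuck on an affected version, a possible workaround instead of downgrading is to compute the row-wise Pearson correlation directly in NumPy. This is only a sketch: it assumes the two frames are already label-aligned, all-numeric, and NaN-free, which DataFrame.corrwith would otherwise handle for you.

import numpy as np
import pandas as pd

def rowwise_pearson(df1, df2):
    # assumes df1 and df2 share the same index/columns and contain no NaNs
    a = df1.to_numpy(dtype=float)
    b = df2.to_numpy(dtype=float)
    a = a - a.mean(axis=1, keepdims=True)
    b = b - b.mean(axis=1, keepdims=True)
    num = (a * b).sum(axis=1)
    denom = np.sqrt((a ** 2).sum(axis=1) * (b ** 2).sum(axis=1))
    return pd.Series(num / denom, index=df1.index)

df1 = pd.DataFrame(np.random.rand(10000, 100))
df2 = pd.DataFrame(np.random.rand(10000, 100))
result = rowwise_pearson(df1, df2)  # matches df1.corrwith(df2, axis=1) up to float error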

@yavitzour
Author

Happy to report that while the problem still persisted through 1.0.5, pandas 1.1.0 resolves it for me.

Thanks!

@simonjayhawkins
Member

Thanks @yavitzour

Can confirm the recent improvement:

%timeit df1.corrwith(df2, axis=1)
# 242 ms ± 7.89 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) # 1.2.0.dev0+17.ga0c8425a5
# 4.48 s ± 27.7 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) # 1.0.1

@arw2019
Member

arw2019 commented Nov 27, 2020

We also have benchmarks for this in asv_bench/benchmarks/stat_ops.py
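
For context, an asv benchmark for this pattern looks roughly like the sketch below (the class and method names here are illustrative, not necessarily the ones used in stat_ops.py):

import numpy as np
import pandas as pd

class CorrwithRows:
    # asv-style benchmark: setup() builds the data, time_* methods are timed
    def setup(self):
        self.df1 = pd.DataFrame(np.random.rand(10000, 100))
        self.df2 = pd.DataFrame(np.random.rand(10000, 100))

    def time_corrwith_axis1(self):
        self.df1.corrwith(self.df2, axis=1)

It can then be compared across two commits with something like asv continuous -f 1.1 upstream/master HEAD -b stat_ops, per the pandas contributing guide.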

arw2019 added the Closing Candidate label on Nov 27, 2020
rhshadrach added the Numeric Operations label and removed the Closing Candidate label on Feb 6, 2021