PERF: Spearman correlation #14239
For what it's worth, scipy.stats.spearmanr appears to be much faster. |
Please show the actual details and the output of df.info() for a portion of the frame. |
Spearman corr re-ranks on every iteration. This is to accommodate the NaNs changing from one column pair to the next. If you don't have NaNs, then the scipy method is great. I am sure this could be sped up in lots of cases. Pull requests are welcome. |
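To make the re-ranking point concrete, here is a minimal sketch on synthetic data (the frame, its column names, and the NaN pattern are all made up for illustration). It compares ranking each column once and taking Pearson on the ranks against `df.corr(method="spearman")`, first without NaNs and then with NaNs in one column.

```python
import numpy as np
import pandas as pd

# Synthetic data, purely illustrative (not the frame from this issue).
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(200, 5)), columns=list("abcde"))

# Case 1: no NaNs. Rank every column once, then take Pearson on the ranks.
rank_once = df.rank().corr(method="pearson")
builtin = df.corr(method="spearman")
no_nan_match = np.allclose(rank_once.to_numpy(), builtin.to_numpy())

# Case 2: NaNs in column "a". Ranks computed once over all rows are no
# longer the ranks within the rows that a given column pair shares.
df_nan = df.copy()
df_nan.loc[df_nan.index[:50], "a"] = np.nan
rank_once_nan = df_nan.rank().corr(method="pearson")
builtin_nan = df_nan.corr(method="spearman")
nan_match = np.allclose(rank_once_nan.to_numpy(), builtin_nan.to_numpy())

# Re-ranking only the rows shared by one pair reproduces the pandas value.
pair = df_nan[["a", "b"]].dropna()
pairwise = pair.rank().corr(method="pearson").loc["a", "b"]
pairwise_match = bool(np.isclose(pairwise, builtin_nan.loc["a", "b"]))

print(no_nan_match, nan_match, pairwise_match)
```

On a NaN-free frame the rank-once shortcut should agree with pandas. Once NaNs appear, the shortcut generally will not agree, while re-ranking the rows shared by each pair will, which is the extra per-pair work described above.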
I've come across this issue today. I'll just add some timing info:
> df.shape
< (10000, 30)
> %%timeit df.corr('spearman')
< 949 ms ± 14.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
> %%timeit pd.DataFrame(scipy.stats.spearmanr(df)[0], columns=df.columns, index=df.columns)
< 28.8 ms ± 146 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
INSTALLED VERSIONS
------------------
commit: None
python: 3.6.8.final.0
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 142 Stepping 10, GenuineIntel
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: None.None
pandas: 0.24.1 |
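Since the scipy one-liner timed above only reproduces the pandas result when there are no NaNs, one possible workaround is a guarded wrapper. The sketch below is an assumption-laden illustration, not part of pandas or scipy: `spearman_fast` is a hypothetical helper name, and it assumes every column is numeric.

```python
import numpy as np
import pandas as pd
from scipy import stats


def spearman_fast(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: use scipy when it is safe, otherwise pandas.

    Assumes all columns are numeric.
    """
    has_nan = df.isna().to_numpy().any()
    # With only two columns, scipy.stats.spearmanr returns a scalar
    # rather than a matrix, so let pandas handle that case as well.
    if has_nan or df.shape[1] < 3:
        return df.corr(method="spearman")
    rho, _ = stats.spearmanr(df.to_numpy())
    return pd.DataFrame(rho, index=df.columns, columns=df.columns)


# Usage on synthetic data (illustrative only).
rng = np.random.default_rng(1)
demo = pd.DataFrame(rng.normal(size=(100, 4)), columns=list("wxyz"))
fast = spearman_fast(demo)
reference = demo.corr(method="spearman")
```

The fallback keeps the pairwise NaN handling of `df.corr('spearman')` intact, so the wrapper only takes the scipy path when both routes should produce the same matrix.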
Perhaps we can close this due to #28151? |
Thanks @Liam3851, I think you're right. |
Just retried the snippet I posted. Still not as fast as scipy, but it's at least a lot better than before 🎉
|
I'm trying to compute Spearman correlation for a relatively small DataFrame (500x8000), but the calculation appears to be orders of magnitude slower than Pearson correlation. Any help to make Spearman's speed comparable to Pearson's would be greatly appreciated.