ENH: parallelize DataFrame.corr #40956
Comments
@Vysybyl you're welcome to contribute an implementation.
So how would this be parallelized?
Was this fixed in #42761?
Looks like we have a min_periods keyword that scipy doesn't. Other than that, I don't see why we couldn't do e.g.
Hi, I am interested in contributing to this issue! In my own project, I modified the code slightly to remove the nested loop and use cython.parallel.prange. However, this requires OpenMP, which, if I understand correctly, pandas doesn't rely on at the moment. If this isn't an issue, I would be happy to submit a PR!
Is your feature request related to a problem?
DataFrame.corr(method="spearman") is extremely slow.
method="pearson" is quite slow too.
I can see from my machine's resource monitor that the implementation is single-threaded. Is that a design choice? If so, there should at least be an optional argument to parallelize it (at the C++ level, of course).
I did not check the actual code that implements this method.
Describe the solution you'd like
scipy.stats.spearmanr performs this computation on a numpy array in 1/20 of the time on my 6-core machine.
API breaking implications
None.
Describe alternatives you've considered
Add an optional boolean argument (e.g. parallelize=True/False) so that the user can choose.
Then the method should either be reimplemented from scratch at the C++ level, or it should call the existing scipy.stats function
on DataFrame.values and wrap the returned array in a new DataFrame.
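The second alternative can be sketched roughly as follows. This is only an illustration, not pandas code: the helper name spearman_corr_scipy is hypothetical, and the sketch assumes a DataFrame with numeric columns and no missing values, since spearmanr does not apply pairwise deletion or a min_periods rule the way DataFrame.corr does.

```python
import numpy as np
import pandas as pd
from scipy import stats


def spearman_corr_scipy(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: Spearman correlation matrix via scipy.stats.spearmanr.

    Assumes all columns are numeric and contain no NaN values.
    """
    values = df.to_numpy(dtype=float)
    # With axis=0 (the default), each column is treated as one variable.
    rho, _ = stats.spearmanr(values, axis=0)
    rho = np.asarray(rho)
    # For exactly two columns scipy returns a scalar, so rebuild the 2x2 matrix.
    if rho.ndim == 0:
        r = float(rho)
        rho = np.array([[1.0, r], [r, 1.0]])
    # Wrap the raw array so the result is labelled like DataFrame.corr output.
    return pd.DataFrame(rho, index=df.columns, columns=df.columns)
```

Calling spearman_corr_scipy(df) would then stand in for df.corr(method="spearman") on complete data. Whether it is actually faster depends on the data shape and the installed versions.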
Additional context
IMPORTANT: DataFrame.corr and spearmanr give slightly different results (some kind of small rounding error of about 10e-15).
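A minimal way to check this kind of discrepancy is to compare the two results with an absolute tolerance rather than exact equality. The random data, the seed, and the 1e-12 tolerance below are arbitrary choices for illustration, and the size of the difference may vary with the data and library versions.

```python
import numpy as np
import pandas as pd
from scipy import stats

rng = np.random.default_rng(0)  # fixed seed so the comparison is repeatable
df = pd.DataFrame(rng.normal(size=(200, 5)), columns=list("abcde"))

pandas_rho = df.corr(method="spearman").to_numpy()
scipy_rho, _ = stats.spearmanr(df.to_numpy(), axis=0)

# Exact equality may fail because of floating-point rounding,
# so compare with an absolute tolerance instead.
max_abs_diff = np.abs(pandas_rho - scipy_rho).max()
matches = np.allclose(pandas_rho, scipy_rho, rtol=0.0, atol=1e-12)
print(f"max abs difference: {max_abs_diff:.3e}, within tolerance: {matches}")
```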