ENH: implement pd.Series.corr(method="distance") #22402
Comments
Interesting suggestion, except I'm a little concerned by the phrase "under-used," as it makes me wonder how much benefit this addition might have for the overall user base vs. maintenance work. If you can write a relatively simple implementation in cc @jreback
@gfyoung I can understand that concern, as it could be argued that distance correlation is more interesting as a theoretical rather than applied measure. Is there a way to gauge interest in a feature within the
There is the pandas mailing list.
@TomAugspurger Sounds good, #22684 looks like a nice change. Can you point me to any documentation on how to create cookbook recipes?
It's nothing fancy, just a bunch of code snippets in https://github.com/pandas-dev/pandas/blob/master/doc/source/cookbook.rst (it would be nice to enforce some consistency, and pick a better presentation format; but that's another issue).
I could add this within the "Computation" section if that makes sense.
Why closed?
Distance correlation (https://en.wikipedia.org/wiki/Distance_correlation) is a powerful yet underused technique for comparing two distributions that I think would make a very nice addition to the existing correlation methods in `pandas`. For one, these measures have the unique property that two random variables $X$ and $Y$ are independent if and only if their distance correlation is zero, which cannot be said of Pearson, Spearman or Kendall.

The below code is an implementation in pure `numpy` (which could certainly be optimized / more elegantly written) that could be part of the `Series` class and then called within `corr`. Later it could be integrated seamlessly with `corrwith`, and if this feature were available I know personally it would be one of the first things I would look at when approaching a regression problem.

Here's an example that shows how distance correlation can detect relationships that the other common correlation methods miss: