-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
ENH: Allow pairwise calcuation when comparing the column with itself … #43569
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
ab9be5e
to
bb1675f
Compare
bb1675f
to
55da48f
Compare
@@ -9416,6 +9417,10 @@ def corr( | |||
Minimum number of observations required per pair of columns | |||
to have a valid result. Currently only available for Pearson | |||
and Spearman correlation. | |||
calculate_diagonal : bool, optional |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is descriptive, but is there precedent for this type of naming, e.g numpy?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is descriptive, but is there precedent for this type of naming, e.g numpy?
No. I came up with this myself. I'm not familiar with Numpy naming convention. Any suggestion ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you add a whatsnew note in other enhancements for 1.4
430afdd
to
e794c82
Compare
@pandas-dev/pandas-core if anyone has comments on this (naming in particular) |
This feels like a pretty exotic feature. Is there a clear use case for it becoming an option? As far as I can tell, the correlation will be 1 except in edge cases where it is strange things like NaN or Inf. Should this be left to users to adjust after the call to corr if they need this behavior? |
Why not just always do the calculation? What is the use case for overriding the correct value with 1? |
What is the correct correlation between a constant series and anything? The most common answer is 0. But this isn't what most software will return because they usually don't specialize the case where one or both values are constant. |
I agree with @bashtage here. If a situation is weird enough that a user wants to pass a kwarg for this, it is just as easy to patch the result directly; no need to bloat the API.
First thing that comes to mind is avoiding floating point error. |
What In the original issue, the user wanted to override the default value of 1. It seems to me there are a few ways of looking at this.
result = df.corr(lambda x,y: myfunc(x,y))
for i in range(len(df.columns)):
result.iloc[i,i] = myfunc(df.iloc[:,i], df.iloc[:,i]) The latter seems a bit awkward, so I would disagree with @jbrockmendel where he says " it is just as easy to patch the result directly" I think it is reasonable to ask @fabianrost84 (OP on the original issue) and @peterpanmj whether:
|
I think as @Dr-Irv mentions In the rare cases a user doesn't want a unitary diagonal and wants to compute something else, I would almost interpret this as a more generalist user defined distance function of their vector space. I.e. a |
@peterpanmj ok i guess should just close this and instead update the doc-string of .corr with an example of how to change the diagonal values. |
I am ok with that. @jreback Should I raise a new issue about updating the doc-string ? |
I want to point out that this is not a rarely met usecase. People are using df.corr(callable=) for purpose other than just correlation. In my case, I want to calculate pairwise distance matrix () for all rows in a dataframe (which is quite common) Now, I can almost achieve this by using df.T.corr(distance.jaccard) I know df.corr is not designed for this. Perhaps it is better to create a new method for this kind of jobs ? |
@peterpanmj It sounds like you are looking for an "N"-way apply. Basically something like |
Yes this is exactly the more generalist description I was trying to get at. You made it sound like this existed already, but think this is a great idea. +1 |
…(#25781)