ENH: Allow option to return lower triangular correlation matrix in DataFrame.corr #22840

Closed
dsaxton opened this issue Sep 26, 2018 · 2 comments · Fixed by #23032
Labels
Algos Non-arithmetic algos: value_counts, factorize, sorting, isin, clip, shift, diff Enhancement
Comments

@dsaxton
Member

dsaxton commented Sep 26, 2018

In many situations it's useful to remove the redundant or trivial (diagonal) entries from a correlation matrix. For example, if you want to query a correlation matrix to find highly or perfectly correlated columns, you have to find some way of manually ignoring the diagonal. Likewise, if you want to remove only one of two highly correlated columns, it's useful to work with only the lower triangular form.

Therefore I think it would be very useful to add a boolean argument to DataFrame.corr that specifies whether the lower triangular correlation matrix should be returned instead of the full matrix. If the upper triangular form is desired, the user can take the transpose.
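
For context, the manual workaround described above can be sketched with the existing API. This is only an illustration under assumptions: the example frame, the 0.95 threshold, and the variable names are made up here and are not part of the proposal.

```python
import numpy as np
import pandas as pd

# Illustrative data: "b" is an exact multiple of "a", so corr(a, b) == 1.
df = pd.DataFrame(
    {
        "a": [1.0, 2.0, 3.0, 4.0],
        "b": [2.0, 4.0, 6.0, 8.0],
        "c": [1.0, -1.0, 1.0, -1.0],
    }
)

corr = df.corr()

# Boolean mask that is True strictly below the diagonal (k=-1 excludes the diagonal).
mask = np.tril(np.ones(corr.shape, dtype=bool), k=-1)

# Keep only the strictly lower triangle; every other entry becomes NaN.
lower = corr.where(mask)

# Each column pair now appears at most once, so a threshold query
# no longer has to skip the diagonal or deduplicate mirrored pairs.
pairs = lower.stack()
high = pairs[pairs.abs() > 0.95]
```

Here `high` is indexed by (row, column) label pairs, so one column from each highly correlated pair can be picked for removal.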

@mroeschke
Member

Copying from #22842 (comment):

  1. If we were to include a tri argument, I would prefer tri=None (default)/'upper'/'lower' over tri=True/False for just the lower triangular, so users have the option of the upper triangular.

  2. In your implementation, it appears the fill value would be NaN, as opposed to numpy (triu/tril), where the fill value is 0. I am not sure what the preferred fill value should be, but in general we try to stay consistent with numpy conventions.
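
The two points above could be sketched roughly as follows. Note that `triangular_corr` and its `tri` and `fill_value` parameters are hypothetical names used for illustration only; they are not an existing pandas API, and the diagonal is kept here to match the default behaviour of `np.tril` and `np.triu`.

```python
import numpy as np
import pandas as pd


def triangular_corr(df, tri=None, fill_value=np.nan):
    """Hypothetical helper (not part of pandas): correlation matrix
    restricted to one triangle, with a configurable fill value."""
    corr = df.corr()
    if tri is None:
        return corr  # full matrix, the current behaviour
    if tri == "lower":
        mask = np.tril(np.ones(corr.shape, dtype=bool))
    elif tri == "upper":
        mask = np.triu(np.ones(corr.shape, dtype=bool))
    else:
        raise ValueError("tri must be None, 'upper', or 'lower'")
    # Entries outside the chosen triangle are replaced by fill_value.
    return corr.where(mask, fill_value)


df = pd.DataFrame(
    {"x": [1.0, 2.0, 3.0], "y": [1.0, 3.0, 2.0], "z": [3.0, 1.0, 2.0]}
)

# NaN fill, as in the implementation under discussion.
nan_filled = triangular_corr(df, tri="lower")

# Zero fill, matching what np.triu returns for the raw array.
zero_filled = triangular_corr(df, tri="upper", fill_value=0.0)
```

With a NaN fill, the masked entries drop out of reductions and of `stack()`-based queries, whereas a 0 fill is indistinguishable from a genuinely uncorrelated pair; that trade-off is the open question in point 2.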

@mroeschke mroeschke added Enhancement Algos Non-arithmetic algos: value_counts, factorize, sorting, isin, clip, shift, diff labels Sep 26, 2018
@aliobeid

aliobeid commented Oct 8, 2018

Pearson correlation is symmetric, i.e. corr(X, Y) = corr(Y, X). It makes no sense to calculate anything other than one of the triangles.
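
The symmetry claim is easy to check numerically. The following is a minimal sketch on randomly generated data (the seed, shape, and column names are arbitrary assumptions), showing that the strictly lower triangle plus a unit diagonal is enough to rebuild the full matrix.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(50, 4)), columns=list("abcd"))
corr = df.corr().to_numpy()

# Symmetry: the matrix equals its transpose.
symmetric = np.allclose(corr, corr.T)

# Rebuild the full matrix from the strictly lower triangle and a unit diagonal.
strict_lower = np.tril(corr, k=-1)
rebuilt = strict_lower + strict_lower.T + np.eye(corr.shape[0])
```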

@jreback jreback added this to the 0.24.0 milestone Nov 3, 2018
4 participants