Add cookbook entry for triangular correlation matrix (closes #22840) #23032

Merged
merged 1 commit into from
Nov 3, 2018
11 changes: 11 additions & 0 deletions doc/source/cookbook.rst
@@ -1226,6 +1226,17 @@ Computation
Correlation
***********

Often it's useful to obtain the lower (or upper) triangular form of a correlation matrix calculated from :func:`DataFrame.corr`. This can be achieved by passing a boolean mask to ``where`` as follows:

.. ipython:: python

   df = pd.DataFrame(np.random.random(size=(100, 5)))

Member

Could you make the size a lot smaller (like (4, 4) max)? corr_mat would be huge otherwise.

Member Author

I think corr_mat is only 5 x 5 here

Member

Ah yes, my mistake. Do you mind rendering cookbook.rst and seeing what this section looks like? 5 x 5 should be relatively compact but just want to make sure the rendering isn't truncating any columns.

Member Author

Sure, will do

Member Author

It looks like it's rendering well:

In [175]: df = pd.DataFrame(np.random.random(size=(100, 5)))

In [176]: corr_mat = df.corr()

In [177]: mask = np.tril(np.ones_like(corr_mat, dtype=np.bool), k=-1)

In [178]: corr_mat.where(mask)
Out[178]: 
          0         1         2         3   4
0       NaN       NaN       NaN       NaN NaN
1  0.100443       NaN       NaN       NaN NaN
2  0.012441 -0.068965       NaN       NaN NaN
3  0.009641  0.078722 -0.067531       NaN NaN
4 -0.065089 -0.156980 -0.004463  0.075126 NaN


   corr_mat = df.corr()
   mask = np.tril(np.ones_like(corr_mat, dtype=bool), k=-1)

   corr_mat.where(mask)
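
As a rough, self-contained sketch of the masking idea above, the snippet below builds both the lower and the upper triangle and then flattens the lower one into one value per column pair. The seeded generator, the variable names ``lower_mask``, ``upper_mask`` and ``pairs``, and the use of ``np.ones(corr_mat.shape, dtype=bool)`` in place of ``np.ones_like`` are assumptions made for illustration and are not part of this change.

```python
import numpy as np
import pandas as pd

# Seeded only so the sketch is repeatable; the cookbook entry uses np.random.random.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.random(size=(100, 5)))
corr_mat = df.corr()

# Boolean masks selecting the entries strictly below / strictly above the diagonal.
# The builtin bool is used as the dtype because the np.bool alias is not
# available in every NumPy release.
lower_mask = np.tril(np.ones(corr_mat.shape, dtype=bool), k=-1)
upper_mask = np.triu(np.ones(corr_mat.shape, dtype=bool), k=1)

# where() keeps values where the mask is True and fills the rest with NaN.
lower = corr_mat.where(lower_mask)
upper = corr_mat.where(upper_mask)

# Flatten to a Series indexed by (row, column), keeping only the unmasked pairs.
pairs = lower.stack().dropna()
print(pairs)
```

Passing ``k=0`` instead of ``k=-1`` (or ``k=1``) would keep the diagonal of ones as well, which may or may not be wanted depending on how the result is used.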

The `method` argument within `DataFrame.corr` can accept a callable in addition to the named correlation types. Here we compute the `distance correlation <https://en.wikipedia.org/wiki/Distance_correlation>`__ matrix for a `DataFrame` object.

.. code-block:: python