ENH: Allow pairwise calculation when comparing the column with itself … #43569

Closed
wants to merge 2 commits into from
1 change: 1 addition & 0 deletions doc/source/whatsnew/v1.4.0.rst
@@ -129,6 +129,7 @@ Other enhancements
- :meth:`DataFrame.__pos__`, :meth:`DataFrame.__neg__` now retain ``ExtensionDtype`` dtypes (:issue:`43883`)
- The error raised when an optional dependency can't be imported now includes the original exception, for easier investigation (:issue:`43882`)
- Added :meth:`.ExponentialMovingWindow.sum` (:issue:`13297`)
- :meth:`DataFrame.corr` now accepts the argument ``calculate_diagonal``, which allows the results returned by a callable ``method`` to be used as the diagonal elements of the correlation matrix instead of setting them to one (:issue:`25781`)

.. ---------------------------------------------------------------------------

9 changes: 8 additions & 1 deletion pandas/core/frame.py
@@ -9402,6 +9402,7 @@ def corr(
        self,
        method: str | Callable[[np.ndarray, np.ndarray], float] = "pearson",
        min_periods: int = 1,
        calculate_diagonal: bool = False,
    ) -> DataFrame:
        """
        Compute pairwise correlation of columns, excluding NA/null values.
@@ -9422,6 +9423,12 @@ def corr(
            Minimum number of observations required per pair of columns
            to have a valid result. Currently only available for Pearson
            and Spearman correlation.
        calculate_diagonal : bool, optional
Contributor

this is descriptive, but is there precedent for this type of naming, e.g numpy?

Contributor Author

@peterpanmj Sep 30, 2021

> this is descriptive, but is there precedent for this type of naming, e.g numpy?

No, I came up with this myself. I'm not familiar with NumPy naming conventions. Any suggestions?

            Whether to calculate the pairwise correlation of a column with
            itself using the supplied callable. Ignored when ``method`` is
            not callable. If False, the correlation between a column and
            itself defaults to 1.

            .. versionadded:: 1.4.0

        Returns
        -------
@@ -9471,7 +9478,7 @@ def corr(
 valid = mask[i] & mask[j]
 if valid.sum() < min_periods:
     c = np.nan
-elif i == j:
+elif i == j and not calculate_diagonal:
     c = 1.0
 elif not valid.all():
     c = corrf(ac[valid], bc[valid])
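
The effect of the one-line change in this hunk can be read in isolation with a plain-Python sketch of the pairwise loop. Everything here is an assumption made for illustration: `pairwise_corr` and `both_positive` are hypothetical names, lists stand in for NumPy arrays, and `calculate_diagonal` is the flag proposed in this PR rather than a confirmed pandas parameter.

```python
import math
from typing import Callable, Sequence


# Hypothetical helper, not part of pandas: a plain-Python sketch of the
# pairwise loop touched by this diff, using lists instead of NumPy arrays.
def pairwise_corr(
    columns: Sequence[Sequence[float]],
    corrf: Callable[[list, list], float],
    min_periods: int = 1,
    calculate_diagonal: bool = False,
) -> list:
    k = len(columns)
    # Mark the finite entries of each column; NaN and inf are excluded.
    mask = [[math.isfinite(v) for v in col] for col in columns]
    correl = [[math.nan] * k for _ in range(k)]
    for i, ac in enumerate(columns):
        for j, bc in enumerate(columns):
            if i > j:
                continue  # symmetric matrix: compute the upper triangle only
            valid = [a and b for a, b in zip(mask[i], mask[j])]
            if sum(valid) < min_periods:
                c = math.nan
            elif i == j and not calculate_diagonal:
                c = 1.0  # shortcut: the callable is never consulted
            else:
                xs = [a for a, ok in zip(ac, valid) if ok]
                ys = [b for b, ok in zip(bc, valid) if ok]
                c = corrf(xs, ys)
            correl[i][j] = c
            correl[j][i] = c
    return correl


# An illustrative callable whose value for identical inputs is not 1:
# the fraction of positions at which both inputs are strictly positive.
def both_positive(x, y):
    return sum(1 for a, b in zip(x, y) if a > 0 and b > 0) / len(x)


cols = [[1.0, -1.0, 2.0, -2.0], [1.0, 1.0, -1.0, 3.0]]
default_diag = pairwise_corr(cols, both_positive)
computed_diag = pairwise_corr(cols, both_positive, calculate_diagonal=True)
print(default_diag)   # diagonal forced to 1.0
print(computed_diag)  # diagonal taken from the callable
```

With the flag left at its default, the diagonal is filled with 1.0 without calling `both_positive`; with the flag set, the diagonal holds whatever the callable returns for a column paired with itself, which is the case the PR targets.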
15 changes: 15 additions & 0 deletions pandas/tests/frame/methods/test_cov_corr.py
@@ -236,6 +236,21 @@ def test_corr_min_periods_greater_than_length(self, method):
        )
        tm.assert_frame_equal(result, expected)

    @pytest.mark.filterwarnings("ignore: An input array is constant")
    @td.skip_if_no_scipy
    @pytest.mark.parametrize("array_creator", [np.ones, np.zeros, np.random.random])
    def test_corr_diagonal_not_ones(self, array_creator):
        from scipy.stats import pearsonr

        frame_size = 4
        df = DataFrame(array_creator((frame_size, frame_size)))
        cor_mat = df.corr(
            method=lambda x, y: pearsonr(x, y)[0], calculate_diagonal=True
        )
        result_diag = [cor_mat.loc[i, i] for i in range(frame_size)]
        expected_diag = [pearsonr(df[i], df[i])[0] for i in range(frame_size)]
        tm.assert_almost_equal(result_diag, expected_diag)


class TestDataFrameCorrWith:
    def test_corrwith(self, datetime_frame):
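
The parametrization over `np.ones` and `np.zeros` in the new test appears to target constant columns, where Pearson's coefficient is undefined because the variance is zero, so a self-correlation of 1 would be misleading. The sketch below illustrates that case in plain Python. The `pearson` function is an illustrative stand-in written for this example, not the SciPy implementation, and the sample data are made up.

```python
import math


# Plain-Python Pearson correlation, used here only for illustration as a
# stand-in for scipy.stats.pearsonr(x, y)[0].
def pearson(x, y):
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return math.nan  # undefined for constant input (zero variance)
    return cov / math.sqrt(var_x * var_y)


ones = [1.0, 1.0, 1.0, 1.0]    # constant column, like np.ones
varied = [0.1, 0.7, 0.3, 0.9]  # non-constant column

constant_self = pearson(ones, ones)      # NaN: correlation is undefined
varied_self = pearson(varied, varied)    # approximately 1.0
print(constant_self, varied_self)
```

Under these assumptions, only the non-constant column correlates with itself at 1; hard-coding the diagonal to 1.0 would hide the undefined result for the constant column.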