
Commit b545bf2

MarcoGorelli committed
Revert "PERF: faster corrwith method for pearson and spearman correlation when other is a Series and axis = 0 (column-wise) (#46174)"
This reverts commit 5efb570.
1 parent: cef9e9a
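For orientation, the public call whose column-wise (`axis=0`) Series path this commit changes is `DataFrame.corrwith`. The following is a minimal sketch, assuming a recent pandas and NumPy and using made-up data. It checks the restored behaviour against its definition in the diff below: applying `Series.corr` to each column, which ignores rows where either side is missing. Only `method="pearson"` is shown here.

```python
import numpy as np
import pandas as pd

# Made-up example data; column "b" has one missing value.
df = pd.DataFrame(
    {
        "a": [1.0, 2.0, 3.0, 4.0, 5.0],
        "b": [2.0, 1.0, 4.0, np.nan, 6.0],
    }
)
s = pd.Series([1.5, 2.5, 2.0, 4.0, 5.5])

# Public API: correlate every column of df with s (axis=0 is the default).
via_corrwith = df.corrwith(s, method="pearson")

# Restored code path: apply Series.corr column by column. Series.corr
# aligns on the index and ignores rows where either value is NaN.
via_apply = df.apply(lambda col: s.corr(col, method="pearson"), axis=0)

# Manual check for column "b": drop the row holding the NaN, then use NumPy.
valid = df["b"].notna()
manual_b = np.corrcoef(df.loc[valid, "b"], s[valid])[0, 1]

print(via_corrwith)
print(np.isclose(via_apply["b"], manual_b))
```

Both routes should give the same per-column coefficients; the reverted commit only changed how they were computed, not what they mean.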


2 files changed, 2 additions, 42 deletions


doc/source/whatsnew/v1.5.0.rst (+1, -1)
@@ -654,7 +654,7 @@ Deprecations
 In the next major version release, 2.0, several larger API changes are being considered without a formal deprecation such as
 making the standard library `zoneinfo <https://docs.python.org/3/library/zoneinfo.html>`_ the default timezone implementation instead of ``pytz``,
 having the :class:`Index` support all data types instead of having multiple subclasses (:class:`CategoricalIndex`, :class:`Int64Index`, etc.), and more.
-The changes under consideration are logged in `this Github issue <https://github.com/pandas-dev/pandas/issues/44823>`_, and any
+The changes under consideration are logged in `this GitHub issue <https://github.com/pandas-dev/pandas/issues/44823>`_, and any
 feedback or concerns are welcome.

 .. _whatsnew_150.deprecations.int_slicing_series:

pandas/core/frame.py (+1, -41)
@@ -162,7 +162,6 @@
 from pandas.core.array_algos.take import take_2d_multi
 from pandas.core.arraylike import OpsMixin
 from pandas.core.arrays import (
-    BaseMaskedArray,
     DatetimeArray,
     ExtensionArray,
     PeriodArray,
@@ -10581,47 +10580,8 @@ def corrwith(
         if numeric_only is lib.no_default and len(this.columns) < len(self.columns):
             com.deprecate_numeric_only_default(type(self), "corrwith")

-        # GH46174: when other is a Series object and axis=0, we achieve a speedup over
-        # passing .corr() to .apply() by taking the columns as ndarrays and iterating
-        # over the transposition row-wise. Then we delegate the correlation coefficient
-        # computation and null-masking to np.corrcoef and np.isnan respectively,
-        # which are much faster. We exploit the fact that the Spearman correlation
-        # of two vectors is equal to the Pearson correlation of their ranks to use
-        # substantially the same method for Pearson and Spearman,
-        # just with intermediate argsorts on the latter.
         if isinstance(other, Series):
-            if axis == 0 and method in ["pearson", "spearman"]:
-                corrs = {}
-                if numeric_only:
-                    cols = self.select_dtypes(include=np.number).columns
-                else:
-                    cols = self.columns
-                k = other.values
-                k_mask = ~other.isna()
-                if isinstance(k, BaseMaskedArray):
-                    k = k._data
-                if method == "pearson":
-                    for col in cols:
-                        val = self[col].values
-                        nonnull_mask = ~self[col].isna() & k_mask
-                        if isinstance(val, BaseMaskedArray):
-                            val = val._data
-                        corrs[col] = np.corrcoef(val[nonnull_mask], k[nonnull_mask])[
-                            0, 1
-                        ]
-                else:
-                    for col in cols:
-                        val = self[col].values
-                        nonnull_mask = ~self[col].isna() & k_mask
-                        if isinstance(val, BaseMaskedArray):
-                            val = val._data
-                        corrs[col] = np.corrcoef(
-                            libalgos.rank_1d(val[nonnull_mask]),
-                            libalgos.rank_1d(k[nonnull_mask]),
-                        )[0, 1]
-                return Series(corrs)
-            else:
-                return this.apply(lambda x: other.corr(x, method=method), axis=axis)
+            return this.apply(lambda x: other.corr(x, method=method), axis=axis)

         if numeric_only_bool:
             other = other._get_numeric_data()
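The comment removed in the hunk above describes the reverted fast path: mask out rows where either the column or `other` is null, hand the remaining values to `np.corrcoef`, and obtain Spearman by correlating ranks instead of raw values. Below is a standalone sketch of that idea, not the pandas implementation. It uses only public APIs, so `pd.Series.rank(method="average")` stands in for the internal `libalgos.rank_1d`. The helper name `fast_corrwith_series` is hypothetical, and the sketch assumes all columns are numeric and share `other`'s index (no alignment is done).

```python
import numpy as np
import pandas as pd


def fast_corrwith_series(
    df: pd.DataFrame, other: pd.Series, method: str = "pearson"
) -> pd.Series:
    """Hypothetical sketch of the reverted GH46174 fast path.

    Assumes every column of ``df`` is numeric and that ``df`` and ``other``
    share the same index; unlike pandas, no index alignment is performed.
    """
    if method not in ("pearson", "spearman"):
        raise ValueError("this sketch only covers 'pearson' and 'spearman'")

    # Convert once to a float ndarray; nullable dtypes map pd.NA to NaN.
    k = other.to_numpy(dtype="float64", na_value=np.nan)
    k_valid = ~np.isnan(k)

    corrs = {}
    for col in df.columns:
        val = df[col].to_numpy(dtype="float64", na_value=np.nan)
        # Keep only the rows where both the column and `other` are non-null.
        mask = ~np.isnan(val) & k_valid
        x, y = val[mask], k[mask]
        if method == "spearman":
            # Spearman correlation is the Pearson correlation of the ranks;
            # here ties receive their average rank (an assumption).
            x = pd.Series(x).rank(method="average").to_numpy()
            y = pd.Series(y).rank(method="average").to_numpy()
        corrs[col] = np.corrcoef(x, y)[0, 1]
    return pd.Series(corrs)


# Usage with made-up data; column "b" has one missing value.
df = pd.DataFrame({"a": [1.0, 2.0, 3.0, 4.0, 5.0], "b": [2.0, 1.0, 4.0, np.nan, 6.0]})
s = pd.Series([1.5, 2.5, 2.0, 4.0, 5.5])
print(fast_corrwith_series(df, s, method="spearman"))
```

As in the removed code, ranking happens after masking, so the ranks are computed over exactly the rows that are then correlated.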
