Commit 7eb8f3a

author
MarcoGorelli
committed
Revert "PERF: faster corrwith method for pearson and spearman correlation when other is a Series and axis = 0 (column-wise) (pandas-dev#46174)"
This reverts commit 5efb570.
1 parent 8608ac9 commit 7eb8f3a

2 files changed

+5
-33
lines changed

doc/source/whatsnew/v1.5.0.rst

+4
@@ -943,8 +943,12 @@ Other Deprecations
 
 Performance improvements
 ~~~~~~~~~~~~~~~~~~~~~~~~
+<<<<<<< HEAD
 - Performance improvement in :meth:`DataFrame.corrwith` for column-wise (axis=0) Pearson and Spearman correlation when other is a :class:`Series` (:issue:`46174`)
 - Performance improvement in :meth:`.DataFrameGroupBy.transform` and :meth:`.SeriesGroupBy.transform` for some user-defined DataFrame -> Series functions (:issue:`45387`)
+=======
+- Performance improvement in :meth:`.GroupBy.transform` for some user-defined DataFrame -> Series functions (:issue:`45387`)
+>>>>>>> parent of 5efb570ec3 (PERF: faster corrwith method for pearson and spearman correlation when other is a Series and axis = 0 (column-wise) (#46174))
 - Performance improvement in :meth:`DataFrame.duplicated` when subset consists of only one column (:issue:`45236`)
 - Performance improvement in :meth:`.DataFrameGroupBy.diff` and :meth:`.SeriesGroupBy.diff` (:issue:`16706`)
 - Performance improvement in :meth:`.DataFrameGroupBy.transform` and :meth:`.SeriesGroupBy.transform` when broadcasting values for user-defined functions (:issue:`45708`)

pandas/core/frame.py

+1-33
@@ -10577,40 +10577,8 @@ def corrwith(
         if numeric_only is lib.no_default and len(this.columns) < len(self.columns):
             com.deprecate_numeric_only_default(type(self), "corrwith")
 
-        # GH46174: when other is a Series object and axis=0, we achieve a speedup over
-        # passing .corr() to .apply() by taking the columns as ndarrays and iterating
-        # over the transposition row-wise. Then we delegate the correlation coefficient
-        # computation and null-masking to np.corrcoef and np.isnan respectively,
-        # which are much faster. We exploit the fact that the Spearman correlation
-        # of two vectors is equal to the Pearson correlation of their ranks to use
-        # substantially the same method for Pearson and Spearman,
-        # just with intermediate argsorts on the latter.
         if isinstance(other, Series):
-            if axis == 0 and method in ["pearson", "spearman"]:
-                corrs = {}
-                if numeric_only:
-                    cols = self.select_dtypes(include=np.number).columns
-                    ndf = self[cols].values.transpose()
-                else:
-                    cols = self.columns
-                    ndf = self.values.transpose()
-                k = other.values
-                if method == "pearson":
-                    for i, r in enumerate(ndf):
-                        nonnull_mask = ~np.isnan(r) & ~np.isnan(k)
-                        corrs[cols[i]] = np.corrcoef(r[nonnull_mask], k[nonnull_mask])[
-                            0, 1
-                        ]
-                else:
-                    for i, r in enumerate(ndf):
-                        nonnull_mask = ~np.isnan(r) & ~np.isnan(k)
-                        corrs[cols[i]] = np.corrcoef(
-                            r[nonnull_mask].argsort().argsort(),
-                            k[nonnull_mask].argsort().argsort(),
-                        )[0, 1]
-                return Series(corrs)
-            else:
-                return this.apply(lambda x: other.corr(x, method=method), axis=axis)
+            return this.apply(lambda x: other.corr(x, method=method), axis=axis)
 
         if numeric_only_bool:
             other = other._get_numeric_data()
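
The comment removed from `corrwith` relies on the identity that the Spearman correlation of two vectors equals the Pearson correlation of their ranks. Below is a minimal NumPy-only sketch of that idea. It is not the pandas implementation, and the names `pearson_pairwise`, `ordinal_ranks`, `average_ranks`, and `spearman_pairwise` are hypothetical. The sketch also illustrates one property of double-`argsort` ranking: tied values receive distinct ranks, whereas Spearman correlation is conventionally computed on average ranks, so the two approaches can disagree on tied data. The commit message does not say why the change was reverted, so this is only an illustration of the technique.

```python
import numpy as np


def pearson_pairwise(x, y):
    """Pearson correlation over positions where both inputs are non-NaN."""
    mask = ~np.isnan(x) & ~np.isnan(y)
    return np.corrcoef(x[mask], y[mask])[0, 1]


def ordinal_ranks(v):
    """Ranks via double argsort, as in the removed branch; ties get distinct ranks."""
    return v.argsort().argsort()


def average_ranks(v):
    """Hypothetical helper: tied values share the mean of their ordinal ranks."""
    ordinal = v.argsort().argsort().astype(float)
    _, inverse, counts = np.unique(v, return_inverse=True, return_counts=True)
    rank_sums = np.bincount(inverse, weights=ordinal)
    return rank_sums[inverse] / counts[inverse]


def spearman_pairwise(x, y, rank=ordinal_ranks):
    """Spearman correlation computed as the Pearson correlation of ranks."""
    mask = ~np.isnan(x) & ~np.isnan(y)
    return np.corrcoef(rank(x[mask]), rank(y[mask]))[0, 1]


# Without ties, both ranking schemes give the same coefficient.
x = np.array([1.0, 2.0, np.nan, 4.0, 5.0])
y = np.array([2.0, 1.0, 3.0, np.nan, 6.0])
no_ties_ordinal = spearman_pairwise(x, y)
no_ties_average = spearman_pairwise(x, y, rank=average_ranks)

# With ties, ordinal ranks and average ranks lead to different coefficients.
a = np.array([1.0, 1.0, 2.0, 3.0])
b = np.array([1.0, 2.0, 3.0, 4.0])
ties_ordinal = spearman_pairwise(a, b)
ties_average = spearman_pairwise(a, b, rank=average_ranks)
```

As in the removed branch, the mask is applied before ranking, so ranks are computed only over the pairwise-complete observations.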
