Commit fd4632d

Backport PR #38816: REGR: groupby.sem with nuisance columns (#38838)
Co-authored-by: Richard Shadrach <[email protected]>
1 parent 9791d78 commit fd4632d

3 files changed: +14 -6 lines changed
doc/source/whatsnew/v1.2.1.rst

+1
@@ -19,6 +19,7 @@ Fixed regressions
 - Bug in repr of float-like strings of an ``object`` dtype having trailing 0's truncated after the decimal (:issue:`38708`)
 - Fixed regression in :meth:`DataFrame.groupby()` with :class:`Categorical` grouping column not showing unused categories for ``grouped.indices`` (:issue:`38642`)
 - Fixed regression in :meth:`DataFrame.any` and :meth:`DataFrame.all` not returning a result for tz-aware ``datetime64`` columns (:issue:`38723`)
+- Fixed regression in :meth:`.GroupBy.sem` where the presence of non-numeric columns would cause an error instead of being dropped (:issue:`38774`)
 - :func:`read_excel` does not work for non-rawbyte file handles (issue:`38788`)
 - Bug in :meth:`read_csv` with ``float_precision="high"`` caused segfault or wrong parsing of long exponent strings (:issue:`38753`)
 -

pandas/core/groupby/groupby.py

+5 -6
@@ -1606,12 +1606,11 @@ def sem(self, ddof: int = 1):
         if result.ndim == 1:
             result /= np.sqrt(self.count())
         else:
-            cols = result.columns.get_indexer_for(
-                result.columns.difference(self.exclusions).unique()
-            )
-            result.iloc[:, cols] = result.iloc[:, cols] / np.sqrt(
-                self.count().iloc[:, cols]
-            )
+            cols = result.columns.difference(self.exclusions).unique()
+            counts = self.count()
+            result_ilocs = result.columns.get_indexer_for(cols)
+            count_ilocs = counts.columns.get_indexer_for(cols)
+            result.iloc[:, result_ilocs] /= np.sqrt(counts.iloc[:, count_ilocs])
         return result

     @final
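
The change above looks up column positions separately in `result` and in `counts`, rather than reusing positions computed from `result.columns` on both frames. A minimal sketch of why that matters, assuming (as the issue title suggests) that the count frame can still carry a nuisance column that the std frame has dropped. The two small frames below are hypothetical stand-ins for `self.std(ddof=ddof)` and `self.count()`, not output captured from pandas internals:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for the intermediates inside sem():
# `result` plays the role of the per-group std (numeric columns only),
# `counts` plays the role of the per-group count, which here still
# contains the non-numeric nuisance column "B".
index = pd.Index(["bar", "foo"], name="A")
result = pd.DataFrame({"C": [1.0, 2.0], "D": [3.0, 4.0]}, index=index)
counts = pd.DataFrame({"B": [4, 4], "C": [4, 4], "D": [4, 4]}, index=index)

# No exclusions in this toy case, so every column of `result` is kept.
cols = result.columns

# The same labels sit at different positions in the two frames.
result_ilocs = result.columns.get_indexer_for(cols)
count_ilocs = counts.columns.get_indexer_for(cols)
print(result_ilocs.tolist())  # [0, 1]
print(count_ilocs.tolist())   # [1, 2]

# Divide using each frame's own positions, as in the patched code.
sem = result.copy()
sem.iloc[:, result_ilocs] /= np.sqrt(counts.iloc[:, count_ilocs])
print(sem)
```

Reusing `result_ilocs` on `counts` in this toy case would select columns `B` and `C` instead of `C` and `D`, which is the kind of positional mismatch the separate lookups avoid.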

pandas/tests/groupby/test_groupby.py

+8
@@ -842,6 +842,14 @@ def test_omit_nuisance(df):
     grouped.agg(lambda x: x.sum(0, numeric_only=False))


+def test_omit_nuisance_sem(df):
+    # GH 38774 - sem should work with nuisance columns
+    grouped = df.groupby("A")
+    result = grouped.sem()
+    expected = df.loc[:, ["A", "C", "D"]].groupby("A").sem()
+    tm.assert_frame_equal(result, expected)
+
+
 def test_omit_nuisance_python_multiple(three_group):
     grouped = three_group.groupby(["A", "B"])
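
For readers without the test suite's `df` fixture, the quantity the new test compares can be sketched standalone. The small frame below is a hypothetical stand-in for the fixture, and the numeric column is selected explicitly because newer pandas releases may raise on non-numeric columns rather than dropping them silently as the 1.2.x behaviour restored here does:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the test fixture: "B" is a non-numeric
# nuisance column, "C" is numeric.
df = pd.DataFrame(
    {
        "A": ["foo", "foo", "bar", "bar"],
        "B": ["one", "two", "one", "two"],
        "C": [1.0, 3.0, 2.0, 6.0],
    }
)

# Select the numeric column explicitly so the sketch does not depend on
# version-specific handling of nuisance columns.
grouped = df.groupby("A")[["C"]]

# Standard error of the mean per group ...
sem = grouped.sem()

# ... equals the sample standard deviation divided by sqrt(group size).
manual = grouped.std(ddof=1) / np.sqrt(grouped.count())

pd.testing.assert_frame_equal(sem, manual)
print(sem)
```

The manual line mirrors what `sem` in the patched `groupby.py` does: take the per-group standard deviation, then divide by the square root of the per-group count.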
