Backport PR #38816: REGR: groupby.sem with nuisance columns (#38838)

meeseeksmachine · rhshadrach · web-flow · commit fd4632dbc7a7 · 2020-12-30T22:02:40.000-05:00
Co-authored-by: Richard Shadrach &lt;45562402+rhshadrach@users.noreply.github.com&gt;
diff --git a/doc/source/whatsnew/v1.2.1.rst b/doc/source/whatsnew/v1.2.1.rst
@@ -19,6 +19,7 @@ Fixed regressions
 - Bug in repr of float-like strings of an ``object`` dtype having trailing 0's truncated after the decimal (:issue:`38708`)
 - Fixed regression in :meth:`DataFrame.groupby()` with :class:`Categorical` grouping column not showing unused categories for ``grouped.indices`` (:issue:`38642`)
 - Fixed regression in :meth:`DataFrame.any` and :meth:`DataFrame.all` not returning a result for tz-aware ``datetime64`` columns (:issue:`38723`)
+- Fixed regression in :meth:`.GroupBy.sem` where the presence of non-numeric columns would cause an error instead of being dropped (:issue:`38774`)
 - :func:`read_excel` does not work for non-rawbyte file handles (issue:`38788`)
 - Bug in :meth:`read_csv` with ``float_precision="high"`` caused segfault or wrong parsing of long exponent strings (:issue:`38753`)
 -
diff --git a/pandas/core/groupby/groupby.py b/pandas/core/groupby/groupby.py
@@ -1606,12 +1606,11 @@ def sem(self, ddof: int = 1):
         if result.ndim == 1:
             result /= np.sqrt(self.count())
         else:
-            cols = result.columns.get_indexer_for(
-                result.columns.difference(self.exclusions).unique()
-            )
-            result.iloc[:, cols] = result.iloc[:, cols] / np.sqrt(
-                self.count().iloc[:, cols]
-            )
+            cols = result.columns.difference(self.exclusions).unique()
+            counts = self.count()
+            result_ilocs = result.columns.get_indexer_for(cols)
+            count_ilocs = counts.columns.get_indexer_for(cols)
+            result.iloc[:, result_ilocs] /= np.sqrt(counts.iloc[:, count_ilocs])
         return result
 
     @final
diff --git a/pandas/tests/groupby/test_groupby.py b/pandas/tests/groupby/test_groupby.py
@@ -842,6 +842,14 @@ def test_omit_nuisance(df):
         grouped.agg(lambda x: x.sum(0, numeric_only=False))
 
 
+def test_omit_nuisance_sem(df):
+    # GH 38774 - sem should work with nuisance columns
+    grouped = df.groupby("A")
+    result = grouped.sem()
+    expected = df.loc[:, ["A", "C", "D"]].groupby("A").sem()
+    tm.assert_frame_equal(result, expected)
+
+
 def test_omit_nuisance_python_multiple(three_group):
     grouped = three_group.groupby(["A", "B"])
 

Original file line number	Diff line number	Diff line change
`@@ -19,6 +19,7 @@ Fixed regressions`
`19`	`19`	- Bug in repr of float-like strings of an ``object`` dtype having trailing 0's truncated after the decimal (:issue:`38708`)
`20`	`20`	- Fixed regression in :meth:`DataFrame.groupby()` with :class:`Categorical` grouping column not showing unused categories for ``grouped.indices`` (:issue:`38642`)
`21`	`21`	- Fixed regression in :meth:`DataFrame.any` and :meth:`DataFrame.all` not returning a result for tz-aware ``datetime64`` columns (:issue:`38723`)
	`22`	+- Fixed regression in :meth:`.GroupBy.sem` where the presence of non-numeric columns would cause an error instead of being dropped (:issue:`38774`)
`22`	`23`	- :func:`read_excel` does not work for non-rawbyte file handles (issue:`38788`)
`23`	`24`	- Bug in :meth:`read_csv` with ``float_precision="high"`` caused segfault or wrong parsing of long exponent strings (:issue:`38753`)
`24`	`25`	`-`