Commit 4cd824d

jorisvandenbossche authored and meeseeksmachine committed
Backport PR pandas-dev#37433: REGR: fix groupby std() with nullable dtypes
1 parent 5498df4 commit 4cd824d

3 files changed: +37 -1 lines changed

doc/source/whatsnew/v1.1.4.rst

+1
@@ -21,6 +21,7 @@ Fixed regressions
 - Fixed regression in :meth:`Series.astype` converting ``None`` to ``"nan"`` when casting to string (:issue:`36904`)
 - Fixed regression in :class:`RollingGroupby` causing a segmentation fault with Index of dtype object (:issue:`36727`)
 - Fixed regression in :meth:`DataFrame.resample(...).apply(...)` raised ``AttributeError`` when input was a :class:`DataFrame` and only a :class:`Series` was evaluated (:issue:`36951`)
+- Fixed regression in ``DataFrame.groupby(..).std()`` with nullable integer dtype (:issue:`37415`)
 - Fixed regression in :class:`PeriodDtype` comparing both equal and unequal to its string representation (:issue:`37265`)
 - Fixed regression where slicing :class:`DatetimeIndex` raised :exc:`AssertionError` on irregular time series with ``pd.NaT`` or on unsorted indices (:issue:`36953` and :issue:`35509`)
 - Fixed regression in certain offsets (:meth:`pd.offsets.Day() <pandas.tseries.offsets.Day>` and below) no longer being hashable (:issue:`37267`)

pandas/core/groupby/groupby.py

+1 -1
@@ -2489,9 +2489,9 @@ def _get_cythonized_result(
                     except TypeError as e:
                         error_msg = str(e)
                         continue
+                vals = vals.astype(cython_dtype, copy=False)
                 if needs_2d:
                     vals = vals.reshape((-1, 1))
-                vals = vals.astype(cython_dtype, copy=False)
                 func = partial(func, vals)
 
             func = partial(func, labels)
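The reordering in this hunk can be illustrated outside of pandas internals. The sketch below is not the library's code path. It only assumes that `vals` may arrive as a nullable `Int64` extension array and that `cython_dtype` is `float64` for `std()`. The names `vals_2d` and `as_float` are illustrative. Casting first yields a plain NumPy array, with `pd.NA` converted to `NaN`, which can then be reshaped to 2D safely.

```python
import numpy as np
import pandas as pd

# A nullable integer array, similar to the column values in the regression report.
vals = pd.array([1, 2, 3, pd.NA], dtype="Int64")

# Step 1 (moved earlier by the fix): cast to the dtype the cython kernel expects.
# For a float target, pd.NA becomes NaN and the result is a NumPy ndarray.
as_float = vals.astype("float64", copy=False)

# Step 2: reshape to a single column, now operating on an ndarray.
vals_2d = as_float.reshape((-1, 1))

print(type(as_float).__name__, vals_2d.shape)
```

Doing the cast before the reshape means the reshape never has to be supported by the extension array itself, which appears to be the point of swapping the two lines.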

pandas/tests/groupby/aggregate/test_cython.py

+35
@@ -277,3 +277,38 @@ def test_read_only_buffer_source_agg(agg):
     expected = df.copy().groupby(["species"]).agg({"sepal_length": agg})
 
     tm.assert_equal(result, expected)
+
+
+@pytest.mark.parametrize(
+    "op_name",
+    [
+        "count",
+        "sum",
+        "std",
+        "var",
+        "sem",
+        "mean",
+        "median",
+        "prod",
+        "min",
+        "max",
+    ],
+)
+def test_cython_agg_nullable_int(op_name):
+    # ensure that the cython-based aggregations don't fail for nullable dtype
+    # (eg https://github.com/pandas-dev/pandas/issues/37415)
+    df = DataFrame(
+        {
+            "A": ["A", "B"] * 5,
+            "B": pd.array([1, 2, 3, 4, 5, 6, 7, 8, 9, pd.NA], dtype="Int64"),
+        }
+    )
+    result = getattr(df.groupby("A")["B"], op_name)()
+    df2 = df.assign(B=df["B"].astype("float64"))
+    expected = getattr(df2.groupby("A")["B"], op_name)()
+
+    if op_name != "count":
+        # the result is not yet consistently using Int64/Float64 dtype,
+        # so for now just checking the values by casting to float
+        result = result.astype("float64")
+    tm.assert_series_equal(result, expected)
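For readers who want to check the behaviour without the test suite, the following standalone sketch mirrors the `std` case of the test above. It assumes a pandas version that includes this fix. As in the test, the result is cast to `float64` before comparing, since the result dtype for nullable input may differ between versions.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "A": ["A", "B"] * 5,
        "B": pd.array([1, 2, 3, 4, 5, 6, 7, 8, 9, pd.NA], dtype="Int64"),
    }
)

# Aggregate the nullable column directly ...
result = df.groupby("A")["B"].std().astype("float64")

# ... and compare against the same data stored as plain float64 (pd.NA -> NaN).
expected = df.assign(B=df["B"].astype("float64")).groupby("A")["B"].std()

print(np.allclose(result.to_numpy(), expected.to_numpy()))
```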
