Commit 52f85da

Charlie-XIAO authored and mroeschke committed
BUG: groupby sum turning inf+inf and (-inf)+(-inf) into nan (pandas-dev#53623)
1 parent f18e434 commit 52f85da

3 files changed, +37 -2 lines changed

doc/source/whatsnew/v2.1.0.rst

+3 -2
@@ -462,8 +462,9 @@ Groupby/resample/rolling
 - Bug in :meth:`GroupBy.groups` with a datetime key in conjunction with another key produced incorrect number of group keys (:issue:`51158`)
 - Bug in :meth:`GroupBy.quantile` may implicitly sort the result index with ``sort=False`` (:issue:`53009`)
 - Bug in :meth:`GroupBy.var` failing to raise ``TypeError`` when called with datetime64, timedelta64 or :class:`PeriodDtype` values (:issue:`52128`, :issue:`53045`)
-- Bug in :meth:`SeriresGroupBy.nth` and :meth:`DataFrameGroupBy.nth` after performing column selection when using ``dropna="any"`` or ``dropna="all"`` would not subset columns (:issue:`53518`)
-- Bug in :meth:`SeriresGroupBy.nth` and :meth:`DataFrameGroupBy.nth` raised after performing column selection when using ``dropna="any"`` or ``dropna="all"`` resulted in rows being dropped (:issue:`53518`)
+- Bug in :meth:`SeriesGroupBy.nth` and :meth:`DataFrameGroupBy.nth` after performing column selection when using ``dropna="any"`` or ``dropna="all"`` would not subset columns (:issue:`53518`)
+- Bug in :meth:`SeriesGroupBy.nth` and :meth:`DataFrameGroupBy.nth` raised after performing column selection when using ``dropna="any"`` or ``dropna="all"`` resulted in rows being dropped (:issue:`53518`)
+- Bug in :meth:`SeriesGroupBy.sum` and :meth:`DataFrameGroupby.sum` summing ``np.inf + np.inf`` and ``(-np.inf) + (-np.inf)`` to ``np.nan`` (:issue:`53606`)
 
 Reshaping
 ^^^^^^^^^

pandas/_libs/groupby.pyx

+7
@@ -746,6 +746,13 @@ def group_sum(
                         y = val - compensation[lab, j]
                         t = sumx[lab, j] + y
                         compensation[lab, j] = t - sumx[lab, j] - y
+                        if compensation[lab, j] != compensation[lab, j]:
+                            # GH#53606
+                            # If val is +/- infinity compensation is NaN
+                            # which would lead to results being NaN instead
+                            # of +/- infinity. We cannot use util.is_nan
+                            # because of no gil
+                            compensation[lab, j] = 0
                         sumx[lab, j] = t
 
     _check_below_mincount(
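
The guard added in the hunk above can be illustrated outside Cython. The following is a minimal pure-Python sketch of compensated (Kahan) summation, not the pandas implementation: the function name `kahan_sum` and the flag `guard_nan_compensation` are hypothetical and exist only for this illustration. It assumes ordinary IEEE 754 float behaviour, where `inf - inf` is NaN and NaN is the only value not equal to itself.

```python
import math


def kahan_sum(values, guard_nan_compensation=True):
    """Compensated summation over an iterable of floats.

    `guard_nan_compensation` is a hypothetical switch for this sketch:
    when True, a NaN compensation term is reset to 0, mirroring the
    check introduced in the diff.
    """
    total = 0.0
    compensation = 0.0
    for val in values:
        y = val - compensation
        t = total + y
        # For val = +/-inf this evaluates inf - inf somewhere, giving NaN.
        compensation = t - total - y
        # NaN is the only float for which x != x holds.
        if guard_nan_compensation and compensation != compensation:
            compensation = 0.0
        total = t
    return total


inf = math.inf
unguarded = kahan_sum([inf, inf], guard_nan_compensation=False)  # NaN
guarded = kahan_sum([inf, inf])                                  # inf
mixed = kahan_sum([inf, -inf])                                   # NaN
```

Without the guard, the NaN compensation from the first infinity is subtracted from the next value and poisons the running total. With the guard, same-signed infinities accumulate to infinity, while `inf + (-inf)` still yields NaN because that NaN comes from the sum itself, not from the compensation term.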

pandas/tests/groupby/test_libgroupby.py

+27
@@ -6,6 +6,7 @@
     group_cumprod,
     group_cumsum,
     group_mean,
+    group_sum,
     group_var,
 )
 
@@ -302,3 +303,29 @@ def test_cython_group_mean_Inf_at_begining_and_end():
         actual,
         expected,
     )
+
+
+@pytest.mark.parametrize(
+    "values, out",
+    [
+        ([[np.inf], [np.inf], [np.inf]], [[np.inf], [np.inf]]),
+        ([[np.inf], [np.inf], [-np.inf]], [[np.inf], [np.nan]]),
+        ([[np.inf], [-np.inf], [np.inf]], [[np.inf], [np.nan]]),
+        ([[np.inf], [-np.inf], [-np.inf]], [[np.inf], [-np.inf]]),
+    ],
+)
+def test_cython_group_sum_Inf_at_begining_and_end(values, out):
+    # GH #53606
+    actual = np.array([[np.nan], [np.nan]], dtype="float64")
+    counts = np.array([0, 0], dtype="int64")
+    data = np.array(values, dtype="float64")
+    labels = np.array([0, 1, 1], dtype=np.intp)
+
+    group_sum(actual, counts, data, labels, None, is_datetimelike=False)
+
+    expected = np.array(out, dtype="float64")
+
+    tm.assert_numpy_array_equal(
+        actual,
+        expected,
+    )
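
As a cross-check on the parametrized expectations, the same four cases can be reproduced with plain float addition per group, which is the behaviour the compensated sum is meant to match for infinities. This is a rough sketch only: `group_sums` is a hypothetical helper written for this note, not pandas' `group_sum`, and it uses flat lists instead of 2D arrays.

```python
import math


def group_sums(values, labels, ngroups):
    """Sum `values` per group label using plain float addition."""
    sums = [0.0] * ngroups
    for val, lab in zip(values, labels):
        sums[lab] += val
    return sums


inf = math.inf
labels = [0, 1, 1]  # row 0 -> group 0, rows 1 and 2 -> group 1
cases = [
    [inf, inf, inf],
    [inf, inf, -inf],
    [inf, -inf, inf],
    [inf, -inf, -inf],
]
results = [group_sums(values, labels, 2) for values in cases]
```

Group 0 always contains a single `inf`. Group 1 is `inf` or `-inf` when both of its entries share a sign, and NaN when the signs differ, which matches the `out` column of the test.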
