Commit 6796bf4

BUG SeriesGroupBy.mean() overflowed on some integer arrays (pandas-dev#22487)
When integer arrays contained values that were outside the range of int64, the conversion would overflow. Instead, only allow safe casting, and if a safe cast cannot be done, cast to float64.
1 parent 0976e12 commit 6796bf4
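
The overflow described in the commit message can be reproduced with NumPy alone. The following is a minimal sketch, not code from the commit: it assumes the values end up in a uint64 array (the variable names are illustrative) and shows that the default `astype` casting rule wraps silently, while `casting='safe'` raises `TypeError`.

```python
import numpy as np

# Illustrative data: the last value exceeds the int64 maximum (2**63 - 1),
# so it is stored as uint64 here (an assumption made for this sketch).
values = np.array([4970, 4749, 4719, 4704, 18446744073699999744],
                  dtype="uint64")

# ndarray.astype defaults to casting='unsafe', so the out-of-range value
# silently wraps around to a negative int64.
wrapped = values.astype("int64", copy=False)
print(wrapped[-1])  # -9551872

# With casting='safe', NumPy refuses the uint64 -> int64 cast up front.
try:
    values.astype("int64", copy=False, casting="safe")
    safe_cast_raised = False
except TypeError:
    safe_cast_raised = True
print(safe_cast_raised)  # True
```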

3 files changed

+16
-1
lines changed

doc/source/whatsnew/v0.24.0.txt

+1
@@ -754,6 +754,7 @@ Groupby/Resample/Rolling
 - Bug in :meth:`Resampler.apply` when passing positional arguments to applied func (:issue:`14615`).
 - Bug in :meth:`Series.resample` when passing ``numpy.timedelta64`` to `loffset` kwarg (:issue:`7687`).
 - Bug in :meth:`Resampler.asfreq` when frequency of ``TimedeltaIndex`` is a subperiod of a new frequency (:issue:`13022`).
+- Bug in :meth:`SeriesGroupBy.mean` when values were integral but could not fit inside of int64, overflowing instead (:issue:`22487`).
 
 Sparse
 ^^^^^^

pandas/core/groupby/ops.py

+6-1
@@ -471,7 +471,12 @@ def _cython_operation(self, kind, values, how, axis, min_count=-1,
             if (values == iNaT).any():
                 values = ensure_float64(values)
             else:
-                values = values.astype('int64', copy=False)
+                try:
+                    values = values.astype('int64', copy=False, casting='safe')
+                except TypeError:
+                    # The dtype cannot be safely cast to int64 (values may be
+                    # outside the range of int64). Convert to float64 instead.
+                    values = values.astype('float64', copy=False)
         elif is_numeric and not is_complex_dtype(values):
             values = ensure_float64(values)
         else:
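
The try/except fallback in this hunk can be exercised in isolation. The sketch below is an illustration rather than pandas code: `to_int64_or_float64` is a hypothetical helper name, and the input arrays are made up for the example.

```python
import numpy as np


def to_int64_or_float64(values):
    """Hypothetical helper: cast to int64 if safe, else to float64."""
    try:
        # Raises TypeError when the source dtype cannot be safely cast.
        return values.astype("int64", copy=False, casting="safe")
    except TypeError:
        return values.astype("float64", copy=False)


small = to_int64_or_float64(np.array([1, 2, 3], dtype="int32"))
large = to_int64_or_float64(np.array([1, 2**64 - 1], dtype="uint64"))
fits = to_int64_or_float64(np.array([1, 2, 3], dtype="uint64"))
print(small.dtype, large.dtype, fits.dtype)  # int64 float64 float64
```

Note that NumPy decides whether a cast is safe from the dtypes, not from the stored values, so a uint64 array falls back to float64 even when every value would fit in int64.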

pandas/tests/arrays/test_integer.py

+9
@@ -603,6 +603,15 @@ def test_groupby_mean_included():
     tm.assert_frame_equal(result, expected)
 
 
+def test_groupby_mean_no_overflow():
+    # Regression test for (#22487)
+    df = pd.DataFrame({
+        "user": ["A", "A", "A", "A", "A"],
+        "connections": [4970, 4749, 4719, 4704, 18446744073699999744]
+    })
+    assert df.groupby('user')['connections'].mean()['A'] == 3689348814740003840
+
+
 def test_astype_nansafe():
     # https://github.com/pandas-dev/pandas/pull/22343
     arr = integer_array([np.nan, 1, 2], dtype="Int8")
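
The expected value in `test_groupby_mean_no_overflow` is the float64 result, not the exact integer mean. The pure-Python sketch below illustrates this under one assumption: the values are summed left to right in float64, which may not match the order pandas uses internally.

```python
from fractions import Fraction

values = [4970, 4749, 4719, 4704, 18446744073699999744]

# Exact mean, using arbitrary-precision integers.
exact_mean = Fraction(sum(values), len(values))

# Mean computed in float64 with a plain left-to-right sum (an assumption
# about summation order made for this sketch).
total = 0.0
for v in values:
    total += float(v)
float_mean = total / len(values)

print(float_mean == 3689348814740003840)  # True
print(exact_mean == float_mean)           # False: float64 rounding applies
```

Falling back to float64 avoids the wraparound, but means of very large integers are then subject to float64 rounding.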
