Commit 5340c0b

BUG SeriesGroupBy.mean() overflowed on some integer array (#22487)
When an integer array contained values outside the range of int64, the conversion would overflow. Instead, allow only safe casting, and if a safe cast cannot be done, cast to float64.
1 parent 0976e12 commit 5340c0b
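
The overflow described in the commit message can be reproduced with NumPy alone. The following is a minimal sketch, not code from the commit: it assumes the column was stored as `uint64` (the array below is built by hand for illustration) and shows that a default `astype('int64')` silently wraps the large value around to a negative number, which then corrupts the mean.

```python
import numpy as np

# Hand-built illustration: one value lies above the int64 maximum (2**63 - 1),
# so the array needs the uint64 dtype to hold it.
values = np.array([4970, 18446744073699999744], dtype=np.uint64)

# astype defaults to casting='unsafe', so the large value wraps modulo 2**64.
wrapped = values.astype('int64', copy=False)

wrapped_large = int(wrapped[1])        # 18446744073699999744 - 2**64
wrapped_mean = float(wrapped.mean())   # negative, although every input is positive

print(wrapped_large)
print(wrapped_mean)
```

Because no exception is raised by the unsafe cast, the wrong result propagates without any warning.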

3 files changed, +16 -1 lines changed

doc/source/whatsnew/v0.24.0.txt

+1
@@ -754,6 +754,7 @@ Groupby/Resample/Rolling
 - Bug in :meth:`Resampler.apply` when passing postiional arguments to applied func (:issue:`14615`).
 - Bug in :meth:`Series.resample` when passing ``numpy.timedelta64`` to `loffset` kwarg (:issue:`7687`).
 - Bug in :meth:`Resampler.asfreq` when frequency of ``TimedeltaIndex`` is a subperiod of a new frequency (:issue:`13022`).
+- Bug in :meth:`SeriesGroupBy.mean` when values were integral but could not fit inside of int64, overflowing instead. (:issue:`22487`)
 
 Sparse
 ^^^^^^

pandas/core/groupby/ops.py

+6-1
@@ -471,7 +471,12 @@ def _cython_operation(self, kind, values, how, axis, min_count=-1,
             if (values == iNaT).any():
                 values = ensure_float64(values)
             else:
-                values = values.astype('int64', copy=False)
+                try:
+                    values = values.astype('int64', copy=False, casting='safe')
+                except TypeError:
+                    # At least one of the integers were outside the range of
+                    # int64. Convert to float64 instead.
+                    values = values.astype('float64', copy=False)
         elif is_numeric and not is_complex_dtype(values):
             values = ensure_float64(values)
         else:
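
The try/except in the hunk above can be exercised outside pandas. This is a rough sketch under stated assumptions: `cast_int64_or_float64` is a hypothetical helper name, not a pandas function, and it only mirrors the cast logic shown in the diff. One point worth noting is that NumPy's `casting='safe'` rule is decided by dtype, not by the values present, so a `uint64` array falls back to float64 even when every element would fit in int64.

```python
import numpy as np


def cast_int64_or_float64(values):
    """Hypothetical helper mirroring the cast logic in the diff above."""
    try:
        # 'safe' only permits casts that can never lose information,
        # judged from the source and target dtypes alone.
        return values.astype('int64', copy=False, casting='safe')
    except TypeError:
        # NumPy refuses uint64 -> int64 under the 'safe' rule.
        return values.astype('float64', copy=False)


small_ints = cast_int64_or_float64(np.array([1, 2, 3], dtype=np.int32))
unsigned = cast_int64_or_float64(np.array([1, 2, 3], dtype=np.uint64))

print(small_ints.dtype)  # int32 widens safely to int64
print(unsigned.dtype)    # uint64 takes the float64 fallback
```

The float64 fallback trades exactness for range: integers above 2**53 are no longer all representable, but the result stays close to the true value instead of wrapping.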

pandas/tests/groupby/test_function.py

+9
@@ -1125,3 +1125,12 @@ def h(df, arg3):
     expected = pd.Series([4, 8, 12], index=pd.Int64Index([1, 2, 3]))
 
     tm.assert_series_equal(result, expected)
+
+
+def test_groupby_mean_no_overflow():
+    # Regression test for (#22487)
+    df = pd.DataFrame({
+        "user": ["A", "A", "A", "A", "A"],
+        "connections": [4970, 4749, 4719, 4704, 18446744073699999744]
+    })
+    assert df.groupby('user')['connections'].mean()['A'] == 3689348814740003840
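
The expected value in the regression test is a float64 result, not the exact rational mean. The sketch below reproduces it in plain Python, assuming a simple left-to-right float64 accumulation (pandas' internal kernel may sum differently; `float_mean` and `exact_mean` are names introduced here for illustration).

```python
from fractions import Fraction

connections = [4970, 4749, 4719, 4704, 18446744073699999744]

# Accumulate in float64, left to right, as a stand-in for a float64 mean.
total = 0.0
for value in connections:
    total += float(value)
float_mean = total / len(connections)

# Exact mean using arbitrary-precision integers, for comparison.
exact_mean = Fraction(sum(connections), len(connections))

print(int(float_mean))    # 3689348814740003840
print(float(exact_mean - int(float_mean)))  # small rounding gap, about -62.8
```

Near 3.7e18 adjacent float64 values are 512 apart, so the test's literal is the nearest representable value to the accumulated sum divided by 5, and the equality check in the test is exact.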
