
Commit 5007426

REGR: Expand ValueError catching in series aggregate
Closes pandas-dev#31802

This "fixes" pandas-dev#31802 by expanding the number of cases where we swallow an exception in libreduction. Currently, we're creating an invalid Series in SeriesBinGrouper where the `.mgr_locs` doesn't match the values. See pandas-dev#31802 (comment) for more.

For now, we simply catch more cases that fall back to Python. I've gone with a minimal change which addresses only issues hitting this exact exception. We might want to go broader, but that's not clear.
1 parent 787dc8a commit 5007426
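
The fallback described in the commit message can be sketched in isolation. This is a minimal illustration, not the pandas implementation: only the two message fragments come from the diff, while `agg_with_fallback`, `broken_fast_path`, `pure_python_path`, and the full error text are hypothetical stand-ins.

```python
# Message fragments that signal "the fast path could not handle this input".
# The two strings come from the diff; every name below is hypothetical.
FALLBACK_MESSAGES = ("Function does not reduce", "Wrong number of items")


def agg_with_fallback(groups, func, fast_path, slow_path):
    """Try fast_path first; fall back to slow_path on known ValueErrors."""
    try:
        return fast_path(groups, func)
    except ValueError as err:
        msg = str(err)
        if not any(fragment in msg for fragment in FALLBACK_MESSAGES):
            # Unknown failure: do not hide it from the caller.
            raise
    return slow_path(groups, func)


def broken_fast_path(groups, func):
    # Stand-in for a fast path that fails on this input (illustrative text).
    raise ValueError("Wrong number of items passed")


def pure_python_path(groups, func):
    # Stand-in for the slower path that applies func group by group.
    return {key: func(values) for key, values in groups.items()}


groups = {"a": [1, 1], "b": [1]}
result = agg_with_fallback(groups, sum, broken_fast_path, pure_python_path)
print(result)
```

Matching on message substrings keeps the change small, at the cost of depending on the exact wording of the error; any `ValueError` with a different message still propagates.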

3 files changed: +36 -1 lines changed

doc/source/whatsnew/v1.0.2.rst (+1)

@@ -20,6 +20,7 @@ Fixed regressions
 - Fixed regression in :meth:`pandas.core.groupby.RollingGroupby.apply` where the ``raw`` parameter was ignored (:issue:`31754`)
 - Fixed regression in :meth:`rolling(..).corr() <pandas.core.window.Rolling.corr>` when using a time offset (:issue:`31789`)
 - Fixed regression in :meth:`DataFrameGroupBy.nunique` which was modifying the original values if ``NaN`` values were present (:issue:`31950`)
+- Fixed regression in ``DataFrame.groupby`` raising a ``ValueError`` from an internal operation (:issue:`31802`)
 - Fixed regression where :func:`read_pickle` raised a ``UnicodeDecodeError`` when reading a py27 pickle with :class:`MultiIndex` column (:issue:`31988`).
 - Fixed regression in :class:`DataFrame` arithmetic operations with mis-matched columns (:issue:`31623`)
 - Fixed regression in :meth:`GroupBy.agg` calling a user-provided function an extra time on an empty input (:issue:`31760`)

pandas/core/groupby/ops.py (+8 -1)

@@ -639,9 +639,16 @@ def agg_series(self, obj: Series, func):
         try:
             return self._aggregate_series_fast(obj, func)
         except ValueError as err:
-            if "Function does not reduce" in str(err):
+            msg = str(err)
+            if "Function does not reduce" in msg:
                 # raised in libreduction
                 pass
+            elif "Wrong number of items" in msg:
+                # https://github.com/pandas-dev/pandas/issues/31802
+                # libreduction.SeriesGrouper can create invalid Series /
+                # Blocks, which might raise arbitrary exceptions when
+                # operated upon.
+                pass
             else:
                 raise
         return self._aggregate_series_pure_python(obj, func)

pandas/tests/groupby/test_bin_groupby.py (+27)

@@ -5,6 +5,7 @@
 
 from pandas.core.dtypes.common import ensure_int64
 
+import pandas as pd
 from pandas import Index, Series, isna
 import pandas._testing as tm
 
@@ -51,6 +52,32 @@ def test_series_bin_grouper():
     tm.assert_almost_equal(counts, exp_counts)
 
 
+def assert_block_lengths(x):
+    assert len(x) == len(x._data.blocks[0].mgr_locs)
+    return 0
+
+
+def cumsum_max(x):
+    x.cumsum().max()  # triggers the ValueError when creating a block
+    return 0
+
+
+@pytest.mark.parametrize(
+    "func",
+    [
+        cumsum_max,
+        pytest.param(assert_block_lengths, marks=pytest.mark.xfail(reason="debatable")),
+    ],
+)
+def test_operation_on_invalid_block_passes(func):
+    # https://github.com/pandas-dev/pandas/issues/31802
+    # SeriesBinGrouper creates an invalid block, which may
+    # raise arbitrary exceptions.
+    df = pd.DataFrame({"A": ["a", "a", "a"], "B": ["a", "b", "b"], "C": [1, 1, 1]})
+    result = df.groupby(["A", "B"]).agg(func)
+    assert isinstance(result, pd.DataFrame)
+
+
 @pytest.mark.parametrize(
     "binner,closed,expected",
     [
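
The behaviour the new test checks can also be reproduced as a standalone script. This is a sketch that assumes pandas is installed at a version containing this fix (1.0.2 or later, going by the whatsnew entry); the data and the `cumsum_max` helper mirror the test above, and only public API is used.

```python
import pandas as pd


def cumsum_max(x):
    # Same shape as the helper in the test: run an intermediate
    # operation on each group, then return a scalar.
    x.cumsum().max()
    return 0


df = pd.DataFrame({"A": ["a", "a", "a"], "B": ["a", "b", "b"], "C": [1, 1, 1]})

# With the fix in place, the aggregation completes and returns a DataFrame
# instead of raising a ValueError from an internal operation.
result = df.groupby(["A", "B"]).agg(cumsum_max)

print(type(result).__name__)
print(list(result.columns))
```

Only the `cumsum_max` case is reproduced here; the `assert_block_lengths` case touches the private `_data` attribute and is marked `xfail` in the test.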
