Commit f868874

REGR: Fixed AssertionError in groupby
Closes pandas-dev#31522
1 parent a9d2450 commit f868874

3 files changed, +35 -2 lines changed

doc/source/whatsnew/v1.0.1.rst

+1
@@ -19,6 +19,7 @@ Fixed regressions
 - Fixed regression when indexing a ``Series`` or ``DataFrame`` indexed by ``DatetimeIndex`` with a slice containg a :class:`datetime.date` (:issue:`31501`)
 - Fixed regression in :class:`Series` multiplication when multiplying a numeric :class:`Series` with >10000 elements with a timedelta-like scalar (:issue:`31457`)
 - Fixed regression in :meth:`GroupBy.apply` if called with a function which returned a non-pandas non-scalar object (e.g. a list or numpy array) (:issue:`31441`)
+- Fixed regression in ``.groupby().agg()`` raising an ``AssertionError`` for some reductions like ``min`` on object-dtype columns (:issue:`31522`)
 - Fixed regression in :meth:`to_datetime` when parsing non-nanosecond resolution datetimes (:issue:`31491`)
 - Fixed regression in :meth:`~DataFrame.to_csv` where specifying an ``na_rep`` might truncate the values written (:issue:`31447`)
 - Fixed regression where setting :attr:`pd.options.display.max_colwidth` was not accepting negative integer. In addition, this behavior has been deprecated in favor of using ``None`` (:issue:`31532`)
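
The scenario behind the new entry for issue 31522 can be exercised through public API alone. The following is a minimal sketch, assuming a pandas version that includes this fix; the frame contents and column names are illustrative and not taken from the issue report.

```python
import pandas as pd

# Illustrative data: "x" and "y" are object-dtype columns, "key" is the grouper.
df = pd.DataFrame(
    {
        "key": ["a", "a", "b"],
        "x": ["one", "two", "one"],
        "y": ["three", "six", "six"],
    },
    dtype=object,
)

# A reduction such as min over several object-dtype columns is the kind of
# call the whatsnew entry describes as raising AssertionError before the fix.
result = df.groupby("key").min()
print(result)
```

String minimums are lexicographic, so "six" sorts before "three".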

pandas/core/groupby/generic.py

+10 -2
@@ -1061,8 +1061,16 @@ def _cython_agg_blocks(
                 else:
                     result = cast(DataFrame, result)
                     # unwrap DataFrame to get array
-                    assert len(result._data.blocks) == 1
-                    result = result._data.blocks[0].values
+                    if len(result._data.blocks) != 1:
+                        # An input (P, N)-shape block was split into P
+                        # (1, n_groups) blocks. This is problematic since it breaks
+                        # the assumption that one input block is aggregated
+                        # to one output block. We should be OK as long as
+                        # the split output can be put back into a single block below
+                        assert len(result._data.blocks) == result.shape[1]
+                        result = np.asarray(result)
+                    else:
+                        result = result._data.blocks[0].values
                     if isinstance(result, np.ndarray) and result.ndim == 1:
                         result = result.reshape(1, -1)
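
To illustrate the `np.asarray(result)` call in the new branch, here is a small sketch using only public API. The frame `agg` is a hypothetical stand-in for an aggregated result and is not produced by `_cython_agg_blocks`; the point is only that `np.asarray` yields one 2-D array regardless of how the frame is stored internally.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for an aggregated result: one row per group,
# two object-dtype columns. Names and values are illustrative only.
agg = pd.DataFrame(
    {"key2": ["one", "one"], "key3": ["six", "six"]},
    index=pd.Index(["a", "b"], name="key1"),
    dtype=object,
)

# np.asarray collects all columns into a single 2-D ndarray of shape
# (n_rows, n_columns), independent of the internal block layout.
values = np.asarray(agg)
print(values.shape, values.dtype)
```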

pandas/tests/groupby/aggregate/test_aggregate.py

+24
@@ -377,6 +377,30 @@ def test_agg_index_has_complex_internals(index):
     tm.assert_frame_equal(result, expected)
 
 
+def test_agg_split_block():
+    # https://github.com/pandas-dev/pandas/issues/31522
+    df = pd.DataFrame(
+        {
+            "key1": ["a", "a", "b", "b", "a"],
+            "key2": ["one", "two", "one", "two", "one"],
+            "key3": ["three", "three", "three", "six", "six"],
+            "data1": [0.0, 1, 2, 3, 4],
+            "data2": [0.0, 1, 2, 3, 4],
+        }
+    )
+    result = df.groupby("key1").min()
+    expected = pd.DataFrame(
+        {
+            "key2": ["one", "one"],
+            "key3": ["six", "six"],
+            "data1": [0.0, 2.0],
+            "data2": [0.0, 2.0],
+        },
+        index=pd.Index(["a", "b"], name="key1"),
+    )
+    tm.assert_frame_equal(result, expected)
+
+
 class TestNamedAggregationSeries:
     def test_series_named_agg(self):
         df = pd.Series([1, 2, 3, 4])
