BUG: pd.Series.mode fails (sometimes) as aggregation method on groupby object #41368

Huite · 2021-05-07T14:25:33Z

I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of pandas.
(optional) I have confirmed this bug exists on the master branch of pandas.

Code Sample, a copy-pastable example

This popped up randomly for some of my dataframes, but only a few.
EDIT: Thanks to @mzeitlin11 below for a properly minimal example.

import pandas as pd

df = pd.DataFrame([1, 1, 2, 2])
df.groupby([0, 0, 0, 0]).agg([pd.Series.mode])

This results in a ValueError: no results:

ValueError                                Traceback (most recent call last)
c:\tmp\zonal-mode\debug.py in 
----> 7 selection.groupby("id").agg([pd.Series.mode])

C:\Miniconda3\envs\main\lib\site-packages\pandas\core\groupby\generic.py in aggregate(self, func, engine, engine_kwargs, *args, **kwargs)
    943         func = maybe_mangle_lambdas(func)
    944 
--> 945         result, how = aggregate(self, func, *args, **kwargs)
    946         if how is None:
    947             return result

C:\Miniconda3\envs\main\lib\site-packages\pandas\core\aggregation.py in aggregate(obj, arg, *args, **kwargs)
    584         # we require a list, but not an 'str'
    585         arg = cast(List[AggFuncTypeBase], arg)
--> 586         return agg_list_like(obj, arg, _axis=_axis), None
    587     else:
    588         result = None

C:\Miniconda3\envs\main\lib\site-packages\pandas\core\aggregation.py in agg_list_like(obj, arg, _axis)
    672         raise ValueError("no results")
    673 
--> 674     try:
    675         return concat(results, keys=keys, axis=1, sort=False)
    676     except TypeError as err:

ValueError: no results

Other methods seem to work fine:

selection.groupby("id").agg([pd.Series.sum])
selection.groupby("id").agg([pd.Series.mean])
selection.groupby("id").agg([pd.Series.std])
selection.groupby("id").agg(["count"])

Oddly enough, slightly changing the selection makes it run:

selection.iloc[100:].groupby("id").agg([pd.Series.mode])  # works
selection.iloc[:900].groupby("id").agg([pd.Series.mode])  # works
selection.iloc[:917].groupby("id").agg([pd.Series.mode])  # works
selection.iloc[:918].groupby("id").agg([pd.Series.mode])  # errors
selection.iloc[18:918].groupby("id").agg([pd.Series.mode])  # works
selection.iloc[17:918].groupby("id").agg([pd.Series.mode])  # errors

scipy.stats.mode also works fine:

import scipy.stats
selection.groupby("id").agg([scipy.stats.mode])

Problem description

I don't understand why it fails seemingly at random; there are no oddities in the data as far as I can tell.
Given the difference the selections make, I guess it's related to the specific number of occurrences, which would suggest something internal to the implementation of pd.Series.mode.
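
A minimal sketch of why the number of occurrences could matter, assuming the failure is tied to how many values `pd.Series.mode` returns: the method always returns a Series, and when several values tie for the highest count it returns all of them rather than a single scalar. The variable names `tied` and `untied` are only for illustration.

```python
import pandas as pd

# Two values share the highest count, so mode() returns both of them.
tied = pd.Series([1, 1, 2, 2]).mode()

# A single most frequent value still comes back as a length-1 Series.
untied = pd.Series([1, 1, 2]).mode()

print(len(tied), len(untied))
```

Under that assumption, a slice of the data that happens to break every tie would run, while a slice that leaves a tie in any one group would not.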

Expected Output

Output of `pd.show_versions()`

INSTALLED VERSIONS

commit : 2cb9652
python : 3.8.8.final.0
python-bits : 64
OS : Windows
OS-release : 10
Version : 10.0.18362
machine : AMD64
processor : Intel64 Family 6 Model 158 Stepping 9, GenuineIntel
byteorder : little
LC_ALL : None
LANG : None
LOCALE : Dutch_Netherlands.1252

pandas : 1.2.4
numpy : 1.20.2
pytz : 2021.1
dateutil : 2.8.1
pip : 21.1.1
setuptools : 49.6.0.post20210108
Cython : None
pytest : 6.2.4
hypothesis : None
sphinx : 3.5.4
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 2.11.3
IPython : 7.23.1
pandas_datareader: None
bs4 : None
bottleneck : 1.3.2
fsspec : 2021.04.0
fastparquet : None
gcsfs : None
matplotlib : 3.4.1
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pyxlsb : None
s3fs : None
scipy : 1.6.3
sqlalchemy : None
tables : None
tabulate : None
xarray : 0.18.0
xlrd : None
xlwt : None
numba : 0.52.0

MarcoGorelli · 2021-05-07T14:29:22Z

Thanks @Huite for your report - to expedite resolution, could you please make your example copy-and-pasteable (i.e. without requiring external data to be downloaded)? E.g. see here #41320 for such a bug report

Huite · 2021-05-07T14:51:01Z

Hi @MarcoGorelli, I've updated the example. I can sympathize with not downloading my random files, I wasn't sure given the 1000 lines. I've put it behind another one of those <details> sections, and put an abridged example in the main body of the post.

MarcoGorelli · 2021-05-07T15:04:55Z

Thanks @Huite - what I meant was, does the problem only appear if you have 1000 lines? If not, could you try narrowing it down so that there's a small reproducible example (such as the one in #41320) which could be used as a test?

mzeitlin11 · 2021-05-07T15:15:23Z

Skipping the fallback logic reveals what looks like the actual error is:

Traceback (most recent call last):
  File "/Users/matthewzeitlin/miniconda3/envs/pandas-dev/lib/python3.8/site-packages/IPython/core/interactiveshell.py", line 3437, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-3-af0500c19e92>", line 1, in <module>
    sgrouper.get_result()
  File "pandas/_libs/reduction.pyx", line 279, in pandas._libs.reduction.SeriesGrouper.get_result
    res, initialized = self._apply_to_group(cached_series, cached_index,
  File "pandas/_libs/reduction.pyx", line 95, in pandas._libs.reduction._BaseGrouper._apply_to_group
    check_result_array(res, cached_series.dtype)
  File "pandas/_libs/reduction.pyx", line 38, in pandas._libs.reduction.check_result_array
    raise ValueError("Must produce aggregated value")
ValueError: Must produce aggregated value

I think the root cause here is the same as #38534.

Regardless, this error message isn't very helpful. It would be nice to report in more detail why there are no results.

mzeitlin11 · 2021-05-07T15:18:38Z

An MRE would be

df = pd.DataFrame([1, 1, 2, 2])
df.groupby([0, 0, 0, 0]).agg([pd.Series.mode])

or

df = pd.DataFrame([1, 1, 2, 2])
df.groupby([0, 0, 0, 0]).agg(pd.Series.mode)

which gives a more informative, though still not great, user-facing error:

Traceback (most recent call last):
  File "/Users/matthewzeitlin/Code/contrib/pandas-mzeitlin11/try.py", line 15, in <module>
    df.groupby([0, 0, 0, 0]).agg(pd.Series.mode)
  File "/Users/matthewzeitlin/Code/contrib/pandas-mzeitlin11/pandas/core/groupby/generic.py", line 1008, in aggregate
    return self._python_agg_general(func, *args, **kwargs)
  File "/Users/matthewzeitlin/Code/contrib/pandas-mzeitlin11/pandas/core/groupby/groupby.py", line 1280, in _python_agg_general
    return self._python_apply_general(f, self._selected_obj)
  File "/Users/matthewzeitlin/Code/contrib/pandas-mzeitlin11/pandas/core/groupby/groupby.py", line 1249, in _python_apply_general
    keys, values, mutated = self.grouper.apply(f, data, self.axis)
  File "/Users/matthewzeitlin/Code/contrib/pandas-mzeitlin11/pandas/core/groupby/ops.py", line 777, in apply
    result_values, mutated = splitter.fast_apply(f, sdata, group_keys)
  File "/Users/matthewzeitlin/Code/contrib/pandas-mzeitlin11/pandas/core/groupby/ops.py", line 1293, in fast_apply
    return libreduction.apply_frame_axis0(sdata, f, names, starts, ends)
  File "pandas/_libs/reduction.pyx", line 381, in pandas._libs.reduction.apply_frame_axis0
    piece = f(chunk)
  File "/Users/matthewzeitlin/Code/contrib/pandas-mzeitlin11/pandas/core/groupby/groupby.py", line 1258, in <lambda>
    f = lambda x: func(x, *args, **kwargs)
  File "/Users/matthewzeitlin/Code/contrib/pandas-mzeitlin11/pandas/core/series.py", line 1949, in mode
    return algorithms.mode(self, dropna=dropna)
  File "/Users/matthewzeitlin/Code/contrib/pandas-mzeitlin11/pandas/core/algorithms.py", line 977, in mode
    result = f(values, dropna=dropna)
  File "pandas/_libs/hashtable_func_helper.pxi", line 1764, in pandas._libs.hashtable.mode_int64
    def mode_int64(const int64_t[:] values, bint dropna):
ValueError: Buffer has wrong number of dimensions (expected 1, got 2)

Huite · 2021-05-07T15:42:31Z

Thanks @mzeitlin11, I've included your MRE in the original post instead of my, ehm, rather non-minimal example.

This basically makes this a duplicate of #38534, although I agree it would also be worthwhile to replace ValueError: no results with something more informative -- but that should probably be a separate issue.
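
For anyone landing here in the meantime, one possible workaround (not proposed in this thread, only a sketch) is to wrap the mode in a function that always returns a single value per group, here the smallest modal value. The helper name `first_mode` and the column names `id` and `value` are made up for illustration.

```python
import pandas as pd


def first_mode(values: pd.Series):
    """Reduce a group to one scalar: the smallest of its modal values."""
    # Series.mode() returns its results sorted, so position 0 is the smallest.
    return values.mode().iloc[0]


# Hypothetical data: group 0 has a tie between 1 and 2, group 1 does not.
df = pd.DataFrame({"id": [0, 0, 0, 0, 1, 1, 1], "value": [1, 1, 2, 2, 3, 3, 4]})
result = df.groupby("id")["value"].agg(first_mode)
print(result)
```

Picking the smallest value on a tie is an arbitrary choice. Note also that this sketch would raise an IndexError for a group containing only missing values, since `mode()` is empty in that case.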

lithomas1 · 2021-05-07T19:59:59Z

@Huite OK closing as duplicate then. It's probably better to track all discussion there.

Huite added the Bug and Needs Triage labels May 7, 2021

lithomas1 added the Groupby and Duplicate Report labels and removed the Needs Triage label May 7, 2021

lithomas1 closed this as completed May 7, 2021
