BUG: RecursionError using agg on a resampled SeriesGroupBy #42905
Comments
I just ran your example and got:

---------------------------------------------------------------------------
RecursionError                            Traceback (most recent call last)
<ipython-input-1-7682602f0189> in <module>
     27
     28 # This will exit Python
---> 29 b = a\
     30     .set_index('date')\
     31     .groupby('class')\

~/anaconda3/lib/python3.8/site-packages/pandas/core/resample.py in aggregate(self, func, *args, **kwargs)
    332     def aggregate(self, func, *args, **kwargs):
    333
--> 334         result = ResamplerWindowApply(self, func, args=args, kwargs=kwargs).agg()
    335         if result is None:
    336             how = func

~/anaconda3/lib/python3.8/site-packages/pandas/core/apply.py in agg(self)
    162         elif is_list_like(arg):
    163             # we require a list, but not a 'str'
--> 164             return self.agg_list_like()
    165
    166         if callable(arg):

~/anaconda3/lib/python3.8/site-packages/pandas/core/apply.py in agg_list_like(self)
    353                 colg = obj._gotitem(col, ndim=1, subset=selected_obj.iloc[:, index])
    354                 try:
--> 355                     new_res = colg.aggregate(arg)
    356                 except (TypeError, DataError):
    357                     pass

... last 3 frames repeated, from the frame below ...

~/anaconda3/lib/python3.8/site-packages/pandas/core/resample.py in aggregate(self, func, *args, **kwargs)
    332     def aggregate(self, func, *args, **kwargs):
    333
--> 334         result = ResamplerWindowApply(self, func, args=args, kwargs=kwargs).agg()
    335         if result is None:
    336             how = func

RecursionError: maximum recursion depth exceeded while calling a Python object

The problem doesn't happen when I do

    c = a\
        .set_index('date')\
        .groupby('class')\
        .resample('M')\
        .sum()

or

    c = a\
        .set_index('date')\
        .groupby('class')\
        .resample('M')\
        .agg('count')
On pandas 1.2.5, the first example (b=...) gives the expected output. The second example (c=...) raises an error. I'll label this as a regression for now, pending further investigation.
first bad commit: [212323f] BUG: DataFrame.agg and apply with 'size' returns a scalar (#39935)
after this commit the error was
cc @rhshadrach
Thanks @manoelpqueiroz for the report. I'm seeing the same with a git bisect as @simonjayhawkins reported, but don't quite understand it yet. The method I ran a bisect to see where this method started to return a DataFrame with:
and found first bad commit is a222322, cc @jbrockmendel
This is pretty nasty. Best guess is that in GroupByMixin._gotitem when we do groupby = self._groupby[key] and catch IndexError, we are catching cases that we shouldn't be.
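
For readers following along, here is a toy sketch of how an overly broad except IndexError around a column lookup could turn into unbounded recursion. This is not pandas code and not the actual fix: ToyGroupBy, ToyResampler, the catch_broadly flag, and the trigger (a selection naming a column the groupby does not have) are all hypothetical, invented only to illustrate the guess above.

```python
class ToyGroupBy:
    """Hypothetical stand-in for a groupby object; NOT pandas code."""

    def __init__(self, columns):
        self.columns = list(columns)

    @property
    def ndim(self):
        # One selected column behaves like a Series, several like a DataFrame.
        return 1 if len(self.columns) == 1 else 2

    def __getitem__(self, key):
        if key not in self.columns:
            raise IndexError(f"column not found: {key!r}")
        return ToyGroupBy([key])


class ToyResampler:
    """Hypothetical stand-in for a resampler that wraps a groupby."""

    def __init__(self, groupby, selection, catch_broadly):
        self._groupby = groupby
        self._selection = list(selection)
        self._catch_broadly = catch_broadly

    def _gotitem(self, key):
        try:
            groupby = self._groupby[key]
        except IndexError:
            if not self._catch_broadly:
                raise
            # Assumed failure mode: silently fall back to the unsliced
            # object, so the caller gets 2-D data where it asked for 1-D.
            return ToyResampler(self._groupby, self._selection, True)
        return ToyResampler(groupby, [key], self._catch_broadly)

    def aggregate(self, funcs):
        if self._groupby.ndim == 1:
            col = self._groupby.columns[0]
            return {func: f"{func}({col})" for func in funcs}
        # 2-D case: dispatch column by column, expecting 1-D slices back.
        return {col: self._gotitem(col).aggregate(funcs) for col in self._selection}


def outcome(catch_broadly):
    """Return the name of the exception raised by the toy aggregation."""
    resampler = ToyResampler(ToyGroupBy(["a", "b"]), ["a", "x"], catch_broadly)
    try:
        resampler.aggregate(["sum", "count"])
    except (IndexError, RecursionError) as err:
        return type(err).__name__
    return "ok"


print(outcome(catch_broadly=True))   # RecursionError
print(outcome(catch_broadly=False))  # IndexError
```

In the toy, swallowing the IndexError hands aggregate the same 2-D object it started with, so the column-by-column dispatch never reaches the 1-D base case. Letting the exception propagate surfaces the real lookup failure instead.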
Thanks @jbrockmendel - that was pretty much it. I think I have a good resolution here, PR going up shortly.
I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of pandas.
When you mix resample with groupby and try to use the agg method to supply multiple functions to either a DataFrameGroupBy or SeriesGroupBy, Python suddenly exits without even raising an error.

I first thought I was running into this because I was supplying a single column expecting a DataFrame with multiple columns, but I can confirm this happens to me whether I provide a column (variable b) or apply the method to the entire GroupBy (variable c):

Code Sample
Problem description
I'm not sure if this method is supported for instances of DatetimeIndexResamplerGroupby objects, but calling it without arguments is valid, giving:

Also, while the problem arises with either a Series or a DataFrame, given that using agg with multiple functions on a SeriesGroupBy will correctly create a DataFrame, I would expect the same to happen when resampling with timestamps:

Expected Output
Output of pd.show_versions()
INSTALLED VERSIONS
commit : c7f7443
python : 3.9.2.final.0
python-bits : 64
OS : Windows
OS-release : 10
Version : 10.0.18362
machine : AMD64
processor : Intel64 Family 6 Model 142 Stepping 9, GenuineIntel
byteorder : little
LC_ALL : None
LANG : None
LOCALE : pt_BR.cp1252
pandas : 1.3.1
numpy : 1.20.2
pytz : 2021.1
dateutil : 2.8.1
pip : 21.2.1
setuptools : 49.2.1
Cython : None
pytest : None
hypothesis : None
sphinx : 3.5.1
blosc : None
feather : None
xlsxwriter : None
lxml.etree : 4.6.3
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 2.11.3
IPython : 7.24.1
pandas_datareader: None
bs4 : 4.9.3
bottleneck : None
fsspec : None
fastparquet : None
gcsfs : None
matplotlib : 3.4.1
numexpr : None
odfpy : None
openpyxl : 3.0.6
pandas_gbq : None
pyarrow : None
pyxlsb : 1.0.8
s3fs : None
scipy : 1.7.0
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
xlwt : None
numba : None