BUG: wrong df.groupby().groups when grouping with [Grouper(freq=), ...] #33132
Comments
Using the previous example code, we have:
result.grouper
# <pandas.core.groupby.ops.BaseGrouper at 0x7fbdc11255c0>
result.grouper.groupings
# [Grouping(alpha), Grouping(beta)]
result.grouper.groupings[0].grouper
# <pandas.core.groupby.ops.BinGrouper at 0x7fbdd05e0710>
result.grouper.groupings[0].grouper.groupings[0].grouper
# DatetimeIndex(['2020-03-29', '2020-03-30'], dtype='datetime64[ns]', name='alpha', freq=None)
result.grouper.groupings[1].grouper
# Index(['C', 'D', 'C', 'D'], dtype='object', name='beta')
As discussed in #26326, the issue is in pandas/pandas/core/groupby/ops.py Line 253 in 7673357
This zips the iteration over the BinGrouper ([Timestamp('2020-03-29 00:00:00', freq='D'), Timestamp('2020-03-30 00:00:00', freq='D')]) with result.grouper.groupings[1].grouper (Index(['C', 'D', 'C', 'D'], dtype='object', name='beta')). The first has one entry per bin and the second has one entry per row, and zip stops at the shorter input, so we end up with only [(Timestamp('2020-03-29 00:00:00', freq='D'), 'C'), (Timestamp('2020-03-30 00:00:00', freq='D'), 'D')].
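To make the length mismatch concrete, here is a minimal sketch in plain Python. It is not the pandas code path: datetime.date values stand in for the Timestamp bin labels, and the row-to-bin assignment in row_bins is a hypothetical one chosen only for illustration.

```python
from datetime import date

# Stand-in for iterating the BinGrouper: one label per daily bin (2 entries).
bin_labels = [date(2020, 3, 29), date(2020, 3, 30)]

# Stand-in for the second grouping: one label per row (4 entries).
beta = ["C", "D", "C", "D"]

# zip() stops at the shorter input, so only two keys come out,
# even though there are four rows.
zipped_keys = list(zip(bin_labels, beta))
print(zipped_keys)

# Hypothetical row-aligned bins (assumption: the first two rows fall in
# the first bin and the last two rows in the second bin).
row_bins = [date(2020, 3, 29), date(2020, 3, 29),
            date(2020, 3, 30), date(2020, 3, 30)]

# With both inputs aligned to the rows, every row gets a key.
aligned_keys = list(zip(row_bins, beta))
print(sorted(set(aligned_keys)))
```

The point is only that the two inputs must share the same length (one entry per row) before they are zipped. How to get row-aligned labels out of a BinGrouper is the part the fix has to solve.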
I've tried to fix this in the above PR, but it breaks too many things. The basic idea was to make pandas/pandas/core/groupby/ops.py Lines 839 to 843 in 7673357
Any idea on a better approach?
It's possible this is resolved on master, since resample (which is what this ultimately calls) has been updated a bit. Please re-test on master.
Moving off 1.1, but there's an open PR, so we can add it back if that PR progresses.
I can confirm that this bug exists on 1.1.3. It's a nasty one, because I was doing something like:
which doesn't break, but instead produces incorrect mean values.
I believe this is a duplicate of #51158. Confirmed OP is now fixed on main.
Code
Problem description
This issue is an extension of the bug reported in #26326. The PR #26374 resolved the bug for the case of a nested BaseGrouper. Nonetheless, having a nested BinGrouper still results in wrong behavior, as can be checked by the above code.
Note that len(result) is based on len(result.groups), and that result.groups should return the following: