[BUG]: Groupby and Resample miscalculated aggregation #36198
```diff
@@ -314,6 +314,7 @@ Groupby/resample/rolling
 - Bug in :meth:`DataFrameGroupby.tshift` failing to raise ``ValueError`` when a frequency cannot be inferred for the index of a group (:issue:`35937`)
 - Bug in :meth:`DataFrame.groupby` does not always maintain column index name for ``any``, ``all``, ``bfill``, ``ffill``, ``shift`` (:issue:`29764`)
 - Bug in :meth:`DataFrameGroupBy.apply` raising error with ``np.nan`` group(s) when ``dropna=False`` (:issue:`35889`)
+- Bug when combining methods :meth:`DataFrame.groupby` with :meth:`DataFrame.resample` and restricting to `Series` or using `agg` did miscalculate the aggregation (:issue:`27343`, :issue:`33548`, :issue:`35275`).
```

I would write out

I wrote out the two methods, but I would like to keep the rest that way, because

produces the same error.

```diff
 -
 
 Reshaping
```
```diff
@@ -311,7 +311,7 @@ def _get_grouper(self, obj, validate: bool = True):
         )
         return self.binner, self.grouper, self.obj
 
-    def _set_grouper(self, obj: FrameOrSeries, sort: bool = False):
+    def _set_grouper(self, obj: FrameOrSeries, sort: bool = False, group_indices: Dict = None):
```

We have to pass `group_indices` through; otherwise we cannot know which index positions our Series corresponds to, and that connection would be lost.

```diff
         """
         given an object and the specifications, setup the internal grouper
         for this particular specification
```
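A minimal pure-Python sketch of why passing positional `group_indices` is safer than reusing `obj.index`. The helper `take` is a hypothetical stand-in for `Index.take` (selection by integer position), and the row labels, dates, and `group_indices` mapping are made-up data. Treating string labels as positions fails, while the group's integer positions select the matching keys.

```python
# Hypothetical stand-in for Index.take: select elements by integer position.
def take(values, positions):
    return [values[p] for p in positions]


# Made-up frame: row labels are strings, one resample key (date) per row.
frame_index = ["a", "b", "c", "d"]
full_axis = ["2020-01-03", "2020-01-01", "2020-01-04", "2020-01-02"]

# Hypothetical group_indices: group name -> integer row positions.
group_indices = {"g1": [0, 2], "g2": [1, 3]}

# The Series selected for group "g2" carries labels, not positions.
series_index = take(frame_index, group_indices["g2"])  # ["b", "d"]

# Old approach: using the labels as positions fails for a string index.
label_lookup_failed = False
try:
    take(full_axis, series_index)
except TypeError:
    label_lookup_failed = True

# Patched approach: the group's positions select the matching keys.
ax = take(full_axis, group_indices["g2"])
```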
```diff
@@ -327,9 +327,10 @@ def _set_grouper(self, obj: FrameOrSeries, sort: bool = False):
         if self.key is not None and self.level is not None:
             raise ValueError("The Grouper cannot specify both a key and a level!")
 
-        # Keep self.grouper value before overriding
+        # Keep self.grouper and self.indexer value before overriding
```

We have to keep the indexer too, so that we still know the new order of the index.

```diff
         if self._grouper is None:
             self._grouper = self.grouper
+            self._indexer = self.indexer
 
         # the key must be a valid info item
         if self.key is not None:
```
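Keeping the indexer matters because the argsort of a sorting permutation is its inverse. The pure-Python sketch below assumes, as the change implies, that the indexer is the permutation used to sort the grouper's axis. `take` and `argsort` are hypothetical stand-ins for `Index.take` and a stable `argsort`, and the dates are made up.

```python
def take(values, positions):
    """Hypothetical stand-in for Index.take (select by integer position)."""
    return [values[p] for p in positions]


def argsort(values):
    """Hypothetical stand-in for a stable argsort (like kind="mergesort")."""
    return sorted(range(len(values)), key=lambda i: values[i])


# Resample keys in the frame's original row order (made-up data).
original = ["2020-01-03", "2020-01-01", "2020-01-04", "2020-01-02"]

# Sorting the axis produces the indexer and a re-ordered grouper.
indexer = argsort(original)            # [1, 3, 0, 2]
sorted_axis = take(original, indexer)  # dates in ascending order

# argsort of a permutation is its inverse, so this restores the row order.
restored = take(sorted_axis, argsort(indexer))
```

Without the stored indexer, only `sorted_axis` would be available, and there would be no way to map it back to the original row positions.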
```diff
@@ -338,7 +339,11 @@ def _set_grouper(self, obj: FrameOrSeries, sort: bool = False):
             if getattr(self.grouper, "name", None) == key and isinstance(
                 obj, ABCSeries
             ):
-                ax = self._grouper.take(obj.index)
+                indices = group_indices.get(obj.name)
+                if self._indexer is not None:
+                    ax = self._grouper.take(self._indexer.argsort()).take(indices)
+                else:
+                    ax = self._grouper.take(indices)
```

If the DataFrame was not re-sorted, `indexer` is None. We have to use `indices` because, for example, a string index would break `self._grouper.take(obj.index)`: `take` expects integer positions, not labels.

```diff
             else:
                 if key not in obj._info_axis:
                     raise KeyError(f"The grouper name {key} is not found")
```
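The two branches of the new code can be mirrored in a small pure-Python sketch. `group_axis`, `take`, and `argsort` are hypothetical names, not pandas API, and the sketch assumes the saved grouper is in sorted order whenever an indexer exists and that `indices` are positions in the original row order. For contrast, it also shows that selecting positions from the sorted keys without undoing the sort returns keys that belong to other rows.

```python
def take(values, positions):
    """Hypothetical stand-in for Index.take (select by integer position)."""
    return [values[p] for p in positions]


def argsort(values):
    """Hypothetical stand-in for a stable argsort."""
    return sorted(range(len(values)), key=lambda i: values[i])


def group_axis(grouper, indexer, indices):
    """Hypothetical mirror of the patched branch of _set_grouper.

    grouper: keys in sorted order if indexer is not None, else in row order.
    indexer: the permutation that sorted the keys, or None.
    indices: the group's integer positions in the original row order.
    """
    if indexer is not None:
        # Undo the sort first, then pick the group's rows.
        return take(take(grouper, argsort(indexer)), indices)
    # No re-sort happened, so positions already line up with the keys.
    return take(grouper, indices)


# Made-up data: rows 0 and 2 form group "g1", rows 1 and 3 form group "g2".
original = ["2020-01-03", "2020-01-01", "2020-01-04", "2020-01-02"]
indexer = argsort(original)
sorted_axis = take(original, indexer)

g1_keys = group_axis(sorted_axis, indexer, [0, 2])
# Skipping the un-sort step picks the wrong keys
# (2020-01-01 actually belongs to row 1, in group "g2").
g1_keys_without_unsort = take(sorted_axis, [0, 2])
```

Misaligned keys like `g1_keys_without_unsort` would bin values into the wrong resample periods, which is one way an aggregation can come out wrong.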
Can you elaborate on what "miscalculate" means here?

Two things can happen here right now: