Skip to content

BUG: RollingGroupby when groupby key is in the index #37661

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

mroeschke
Copy link
Member

@mroeschke mroeschke added Bug Groupby Window rolling, ewma, expanding labels Nov 6, 2020
@mroeschke mroeschke added this to the 1.2 milestone Nov 6, 2020
@rhshadrach
Copy link
Member

This will remove an index label in the case of partial overlap between the index labels and the groupby labels; I am wondering if a more natural behavior here would be "add any groupby labels to the index if they aren't already there".

@mroeschke
Copy link
Member Author

Not sure if I 100% follow, but this is the resulting index behavior I am trying to mimic here:

In [1]: from pandas import *

In [2]:         arrays = [["val1", "val1", "val2"], ["val1", "val1", "val2"]]
   ...:         index = MultiIndex.from_arrays(arrays, names=("idx1", "idx2"))
   ...:
   ...:         df = DataFrame({"A": [1, 1, 2], "B": range(3)}, index=index)

In [3]: df
Out[3]:
           A  B
idx1 idx2
val1 val1  1  0
     val1  1  1
val2 val2  2  2

# idx2 is dropped
In [4]: df.groupby(["idx1", "A"]).mean()
Out[4]:
          B
idx1 A
val1 1  0.5
val2 2  2.0

@rhshadrach
Copy link
Member

Shouldn't groupby(...).rolling behave more like a transform than a reduction? In any case, the behavior here does not mimic a reduction in the case when the group labels are entirely not in the index (please correct me if this is wrong!). This means we are mimicking the behavior of a reduction in one case but not another.

I almost never use rolling in my work so I don't have much of an opinion as to what the right behavior is. Just noticed what I think is an inconsistency.

@mroeschke
Copy link
Member Author

Yes groupby.rolling should essentially act as a transform, and I think in both test cases I added the resulting shape (number of rows) is maintained with consideration that not all columns should be maintained since one of the columns was a grouping key.

In [7]:         arrays = [["val1", "val1", "val2"], ["val1", "val1", "val2"]]
   ...:         index = MultiIndex.from_arrays(arrays, names=("idx1", "idx2"))
   ...:
   ...:         df = DataFrame({"A": [1, 1, 2], "B": range(3)}, index=index)

In [8]: result = df.groupby(["idx1", "A"]).rolling(1).mean()

In [9]: result
Out[9]:
          B
idx1 A
val1 1  0.0
     1  1.0
val2 2  2.0

In [10]: df
Out[10]:
           A  B
idx1 idx2
val1 val1  1  0
     val1  1  1
val2 val2  2  2

@@ -535,6 +535,7 @@ Groupby/resample/rolling
- Bug in :meth:`df.groupby(..).quantile() <pandas.core.groupby.DataFrameGroupBy.quantile>` and :meth:`df.resample(..).quantile() <pandas.core.resample.Resampler.quantile>` raised ``TypeError`` when values were of type ``Timedelta`` (:issue:`29485`)
- Bug in :meth:`Rolling.median` and :meth:`Rolling.quantile` returned wrong values for :class:`BaseIndexer` subclasses with non-monotonic starting or ending points for windows (:issue:`37153`)
- Bug in :meth:`DataFrame.groupby` dropped ``nan`` groups from result with ``dropna=False`` when grouping over a single column (:issue:`35646`, :issue:`35542`)
- Bug in :meth:`RollingGroupby` with the resulting :class:`MultiIndex` when grouping by a label that is in the index (:issue:`37641`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think this is ok to backport ton 1.1.5 if possible

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this fixes a regression from 1.0.5 -> 1.1.0, so backport is preferable.

@simonjayhawkins simonjayhawkins modified the milestones: 1.2, 1.1.5 Nov 9, 2020
@jreback jreback merged commit 87e554d into pandas-dev:master Nov 9, 2020
@jreback
Copy link
Contributor

jreback commented Nov 9, 2020

thanks @mroeschke

@simonjayhawkins
Copy link
Member

@meeseeksdev backport 1.1.x

@lumberbot-app

This comment has been minimized.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Groupby Window rolling, ewma, expanding
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BUG: RollingGroupby duplicates columns in index even with group_keys=False
4 participants