BUG: RollingGroupby ignored as_index=False #40789

Merged
merged 8 commits into from
Apr 9, 2021
1 change: 1 addition & 0 deletions doc/source/whatsnew/v1.3.0.rst
@@ -753,6 +753,7 @@ Groupby/resample/rolling
- Bug in :class:`core.window.ewm.ExponentialMovingWindow` when calling ``__getitem__`` would not retain ``com``, ``span``, ``alpha`` or ``halflife`` attributes (:issue:`40164`)
- :class:`core.window.ewm.ExponentialMovingWindow` now raises a ``NotImplementedError`` when specifying ``times`` with ``adjust=False`` due to an incorrect calculation (:issue:`40098`)
- Bug in :meth:`Series.asfreq` and :meth:`DataFrame.asfreq` dropping rows when the index is not sorted (:issue:`39805`)
- Bug in :class:`core.window.RollingGroupby` where ``as_index=False`` argument in ``groupby`` was ignored (:issue:`39433`)

Reshaping
^^^^^^^^^
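The changelog entry above can be illustrated with a minimal sketch. It assumes pandas 1.3 or later (where this fix is present); the frame and the variable names `indexed` and `flat` are made up for illustration and mirror the data used in the PR's test.

```python
import pandas as pd

# Small frame with a datetime index named "date" and a grouping column "id".
df = pd.DataFrame(
    {
        "id": ["A", "A", "B", "B"],
        "num": [100.0, 200.0, 150.0, 250.0],
    },
    index=pd.to_datetime(
        ["2018-01-01", "2018-01-02", "2018-01-01", "2018-01-02"]
    ).rename("date"),
)

# Default (as_index=True): the group key becomes the first index level.
indexed = df.groupby("id").rolling(window=2, min_periods=1).mean()

# as_index=False: with the fix, the key comes back as a regular column
# and only the original "date" index remains.
flat = df.groupby("id", as_index=False).rolling(window=2, min_periods=1).mean()

print(list(indexed.index.names))
print(list(flat.columns))
```

Before the fix, the second call would have returned the same MultiIndex-shaped result as the first, because `as_index` was never forwarded to `RollingGroupby`.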
1 change: 1 addition & 0 deletions pandas/core/groupby/groupby.py
@@ -1926,6 +1926,7 @@ def rolling(self, *args, **kwargs):
            self._selected_obj,
            *args,
            _grouper=self.grouper,
            _as_index=self.as_index,
            **kwargs,
        )

4 changes: 4 additions & 0 deletions pandas/core/window/rolling.py
@@ -553,11 +553,13 @@ def __init__(
        obj: FrameOrSeries,
        *args,
        _grouper=None,
        _as_index=True,
        **kwargs,
    ):
        if _grouper is None:
            raise ValueError("Must pass a Grouper object.")
        self._grouper = _grouper
        self._as_index = _as_index
        # GH 32262: It's convention to keep the grouping column in
        # groupby.<agg_func>, but unexpected to users in
        # groupby.rolling.<agg_func>
@@ -619,6 +621,8 @@ def _apply(
        )

        result.index = result_index
        if not self._as_index:
            result = result.reset_index(level=list(range(len(groupby_keys))))
Member

What happens here when the groupby is on an explicit list, e.g. in your test use groupby(["A", "A", "B", "B"]) instead. What is groupby_keys in this case?

Member Author

mroeschke, Apr 8, 2021

Here's the result

In [4]: df.groupby(["A", "A", "B", "B"], as_index=False).rolling(window=2, min_periods=1).mean()
> /Users/matthewroeschke/pandas-mroeschke/pandas/core/window/rolling.py(580)_apply()
-> result_index_names = groupby_keys + grouped_index_name
(Pdb) groupby_keys
[None]
(Pdb) c
Out[4]:
           level_0    num
date
2018-01-01       A  100.0
2018-01-02       A  150.0
2018-01-01       B  150.0
2018-01-02       B  200.0

In [5]: df.groupby(["A", "A", "B", "B"], as_index=False).mean()
Out[5]:
     num
0  150.0
1  200.0

Not sure whether the normal groupby result is the expected one, but it appears that groupby.rolling brings the list into the DataFrame as a column.
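The `level_0` column in the output above follows from how `reset_index` names an unnamed index level. A minimal sketch, assuming the intermediate result carries a two-level index whose first level has no name (as with `groupby_keys == [None]`); the frame is hand-built for illustration and is not taken from the pandas internals:

```python
import pandas as pd

# Hand-built stand-in for the rolling result before the reset:
# first index level is the unnamed group key, second is the original "date" index.
idx = pd.MultiIndex.from_arrays(
    [
        ["A", "A", "B", "B"],
        pd.to_datetime(["2018-01-01", "2018-01-02", "2018-01-01", "2018-01-02"]),
    ],
    names=[None, "date"],
)
result = pd.DataFrame({"num": [100.0, 150.0, 150.0, 200.0]}, index=idx)

groupby_keys = [None]  # what the debugger session above reported

# Same call as in the patch: move the leading len(groupby_keys) levels into columns.
flat = result.reset_index(level=list(range(len(groupby_keys))))

print(list(flat.columns))
print(flat.index.name)
```

Because the level has no name, `reset_index` falls back to a positional label, which is where `level_0` comes from.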

        return result

    def _apply_pairwise(
36 changes: 36 additions & 0 deletions pandas/tests/window/test_groupby.py
@@ -732,6 +732,42 @@ def test_groupby_level(self):
        )
        tm.assert_series_equal(result, expected)

    def test_as_index_false(self):
Contributor

can you add the multi-key groupby here as well (parameterize if you can)

df_res2 = df.groupby([df.id, df.index.weekday], as_index=False).rolling(window=2, min_periods=1).mean()
df_res3 = df.groupby([df.id]).rolling(window=2, min_periods=1).mean()
df_res4 = df.groupby([df.id], as_index=False).rolling(window=2, min_periods=1).mean()

e.g. 2 & 4 (we likely have 1 & 3 covered, but I wouldn't object to those being included as well)

Member Author

Hmm, all of these are tested (except the as_index=True cases, which are tested everywhere else). So you just want it parameterized?

Contributor

If they are tested elsewhere then don't add them (just the cases this PR covers are fine). Parameterize if you can.

        # GH 39433
        data = [
            ["A", "2018-01-01", 100.0],
            ["A", "2018-01-02", 200.0],
            ["B", "2018-01-01", 150.0],
            ["B", "2018-01-02", 250.0],
        ]
        df = DataFrame(data, columns=["id", "date", "num"])
        df["date"] = to_datetime(df["date"])
        df = df.set_index(["date"])

        result = (
            df.groupby([df.id], as_index=False).rolling(window=2, min_periods=1).mean()
        )
        expected = DataFrame(
            {"id": ["A", "A", "B", "B"], "num": [100.0, 150.0, 150.0, 200.0]},
            index=df.index,
        )
        tm.assert_frame_equal(result, expected)

        result = (
            df.groupby([df.id, df.index.weekday], as_index=False)
            .rolling(window=2, min_periods=1)
            .mean()
        )
        expected = DataFrame(
            {
                "id": ["A", "A", "B", "B"],
                "date": [0, 1, 0, 1],
                "num": [100.0, 200.0, 150.0, 250.0],
            },
            index=df.index,
        )
        tm.assert_frame_equal(result, expected)
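The reviewer's request to parameterize could look roughly like the sketch below. This is not the test that was merged; `make_frame`, `CASES` and `check_as_index_false` are hypothetical names, and the case table is looped over directly so the sketch needs nothing beyond pandas (the same tuples could be handed to `pytest.mark.parametrize`). It compares values rather than whole frames, since the integer dtype returned by `weekday` may differ between pandas versions.

```python
import pandas as pd


def make_frame():
    """Build the frame used by both cases (same data as the test above)."""
    data = [
        ["A", "2018-01-01", 100.0],
        ["A", "2018-01-02", 200.0],
        ["B", "2018-01-01", 150.0],
        ["B", "2018-01-02", 250.0],
    ]
    df = pd.DataFrame(data, columns=["id", "date", "num"])
    df["date"] = pd.to_datetime(df["date"])
    return df.set_index("date")


# Each case: (function building the groupby keys, expected columns, expected "num").
CASES = [
    (lambda df: [df.id], ["id", "num"], [100.0, 150.0, 150.0, 200.0]),
    (
        lambda df: [df.id, df.index.weekday],
        ["id", "date", "num"],
        [100.0, 200.0, 150.0, 250.0],
    ),
]


def check_as_index_false(make_keys, expected_columns, expected_num):
    df = make_frame()
    result = (
        df.groupby(make_keys(df), as_index=False)
        .rolling(window=2, min_periods=1)
        .mean()
    )
    # Group keys must appear as leading columns, not as index levels.
    assert list(result.columns) == expected_columns
    assert result["id"].tolist() == ["A", "A", "B", "B"]
    assert result["num"].tolist() == expected_num
    assert result.index.equals(df.index)


for case in CASES:
    check_as_index_false(*case)
```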


class TestExpanding:
    def setup_method(self):