BUG: Series.groupby.rolling duplicates index when grouping over index #36794

phofl · 2020-10-01T22:50:39Z

I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of pandas.
(optional) I have confirmed this bug exists on the master branch of pandas.

Code Sample, a copy-pastable example

import pandas as pd

data = {
    'groupby_col': ['A', 'A', 'A', 'A', 'A', 'B', 'B', 'B', 'B', 'B', ],
    'agg_col': [1, 1, 0, 1, 0, 0, 0, 0, 1, 0],
}
# Move the grouping column into the index, then group by that index level
df = pd.DataFrame(data).set_index("groupby_col")
grouped = df.groupby('groupby_col')
rolled = grouped.rolling(4)

result = rolled.mean()
print(result)

Output:

                         agg_col
groupby_col groupby_col         
A           A                NaN
            A                NaN
            A                NaN
            A               0.75
            A               0.50
B           B                NaN
            B                NaN
            B                NaN
            B               0.25
            B               0.25

Problem description

The duplicate groupby_col seems at least odd.

...

Expected Output

I would expect groupby_col to appear only once in the index, and the result to be a Series instead of a DataFrame.

Maybe related to #36507, but

data = {
    'groupby_col': ['A', 'A', 'A', 'A', 'A', 'B', 'B', 'B', 'B', 'B', ],
    'agg_col': [1, 1, 0, 1, 0, 0, 0, 0, 1, 0],
}
df = pd.DataFrame(data).set_index("groupby_col")
grouped = df.groupby('groupby_col')
result = grouped.apply(sum)
print(result)

works. So I am not sure.

Output of `pd.show_versions()`

master

mroeschke · 2020-10-02T22:35:58Z

Though there's redundancy, I am not sure that I'd totally classify it as a bug.

This behavior is consistent with how it worked before I changed groupby.rolling to be its own independent calculation (see version 1.0.0 below). Previously, this operation was equivalent under the hood to groupby.apply(lambda x: x.rolling(...))

In [1]: data = {
   ...:     'groupby_col': ['A', 'A', 'A', 'A', 'A', 'B', 'B', 'B', 'B', 'B', ],
   ...:     'agg_col': [1, 1, 0, 1, 0, 0, 0, 0, 1, 0],
   ...: }
   ...: df = pd.DataFrame(data).set_index("groupby_col")
   ...: grouped = df.groupby('groupby_col')
   ...: rolled = grouped.rolling(4)
   ...:
   ...: result = rolled.mean()
   ...: print(result)
                         agg_col
groupby_col groupby_col
A           A                NaN
            A                NaN
            A                NaN
            A               0.75
            A               0.50
B           B                NaN
            B                NaN
            B                NaN
            B               0.25
            B               0.25

In [2]: pd.__version__
Out[2]: '1.0.0'
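
For readers who want to compare the two code paths mentioned above, here is a minimal sketch that runs groupby.rolling directly and the groupby.apply(lambda x: x.rolling(...)) form on the same data. It only compares the computed values, since the index layout of the apply result may differ between pandas versions; the variable names (direct, via_apply, same_values) are chosen for illustration.

```python
import numpy as np
import pandas as pd

data = {
    "groupby_col": ["A", "A", "A", "A", "A", "B", "B", "B", "B", "B"],
    "agg_col": [1, 1, 0, 1, 0, 0, 0, 0, 1, 0],
}
df = pd.DataFrame(data).set_index("groupby_col")
grouped = df.groupby("groupby_col")

# Direct groupby.rolling calculation
direct = grouped.rolling(4).mean()

# Per-group rolling through apply (the older, roughly equivalent formulation)
via_apply = grouped.apply(lambda x: x.rolling(4).mean())

# Compare values only; NaNs at the start of each window count as equal
same_values = np.allclose(
    direct["agg_col"].to_numpy(),
    via_apply["agg_col"].to_numpy(),
    equal_nan=True,
)
print(same_values)
```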

I think the MultiIndex levels should always be composed of [groupby levels, rolling index], even if both levels are redundant. I think special-casing redundant levels is not worth it, and dropping the redundant level can be left to the user.
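
As a rough illustration of what "left to the user" could look like, the sketch below drops the leading group level when it merely repeats the next level, and then selects the single column to get a Series. The helper drop_redundant_group_level is hypothetical and not part of pandas; it relies only on Index.get_level_values, Index.equals, and DataFrame.droplevel, addressed by position because both levels share the same name.

```python
import pandas as pd

data = {
    "groupby_col": ["A", "A", "A", "A", "A", "B", "B", "B", "B", "B"],
    "agg_col": [1, 1, 0, 1, 0, 0, 0, 0, 1, 0],
}
df = pd.DataFrame(data).set_index("groupby_col")
result = df.groupby("groupby_col").rolling(4).mean()


def drop_redundant_group_level(obj):
    """Hypothetical helper: drop level 0 if it duplicates level 1."""
    idx = obj.index
    # Use positions, not names, because the duplicated levels share a name
    if idx.nlevels > 1 and idx.get_level_values(0).equals(idx.get_level_values(1)):
        return obj.droplevel(0)
    return obj


deduplicated = drop_redundant_group_level(result)

# Selecting the only column yields a Series with a single index level
series = deduplicated["agg_col"]
print(series)
```

Dropping by position keeps the helper independent of the level names, which matters here because a name-based lookup would be ambiguous.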

phofl · 2020-10-02T23:35:01Z

Thanks very much for the background information. Closing this issue.

jreback · 2020-10-02T23:48:00Z

yep this makes sense
thanks @mroeschke and @phofl for bringing it up

phofl added the Bug, Needs Triage, and Window labels Oct 1, 2020

phofl changed the title ~~BUG: Series.groupby.rolling duplicates index when grouping over index~~ BUG: Series.groupby.rolling duplicates index when grouping over index and returns DataFrame instead of Series Oct 1, 2020

phofl changed the title ~~BUG: Series.groupby.rolling duplicates index when grouping over index and returns DataFrame instead of Series~~ BUG: Series.groupby.rolling duplicates index when grouping over index Oct 2, 2020

phofl mentioned this issue Oct 2, 2020

Rolling on DataFrameGroupBy duplicated index column when part of the grouping cols is from index #36816

Closed

5 tasks

jreback added this to the 1.2 milestone Oct 2, 2020

phofl closed this as completed Oct 2, 2020
