
BUG: Series.groupby.rolling duplicates index when grouping over index #36794


Closed
3 tasks done
phofl opened this issue Oct 1, 2020 · 3 comments
Labels
Bug, Needs Triage (issue that has not been reviewed by a pandas team member), Window (rolling, ewma, expanding)

@phofl
Member

phofl commented Oct 1, 2020

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • (optional) I have confirmed this bug exists on the master branch of pandas.



Code Sample, a copy-pastable example

import pandas as pd

data = {
    'groupby_col': ['A', 'A', 'A', 'A', 'A', 'B', 'B', 'B', 'B', 'B'],
    'agg_col': [1, 1, 0, 1, 0, 0, 0, 0, 1, 0],
}
# Move the grouping column into the index, then group by that index level.
df = pd.DataFrame(data).set_index("groupby_col")
grouped = df.groupby('groupby_col')
rolled = grouped.rolling(4)

result = rolled.mean()
print(result)

Output:

                         agg_col
groupby_col groupby_col         
A           A                NaN
            A                NaN
            A                NaN
            A               0.75
            A               0.50
B           B                NaN
            B                NaN
            B                NaN
            B               0.25
            B               0.25

Problem description

The groupby_col level appears twice in the resulting MultiIndex, which seems odd at the least.

...

Expected Output

I would expect groupby_col to appear only once in the index, and the result to be a Series instead of a DataFrame.
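For reference, whether the result is a Series or a DataFrame depends on the object being grouped. The sketch below reuses the data from the code sample and selects agg_col before grouping; the names series_result and frame_result are illustrative only and do not appear in the original report.

```python
import pandas as pd

data = {
    "groupby_col": ["A", "A", "A", "A", "A", "B", "B", "B", "B", "B"],
    "agg_col": [1, 1, 0, 1, 0, 0, 0, 0, 1, 0],
}
df = pd.DataFrame(data).set_index("groupby_col")

# Selecting the column first makes the grouped object a Series,
# so the rolling aggregation returns a Series as well.
series_result = df["agg_col"].groupby(level="groupby_col").rolling(4).mean()

# Grouping the DataFrame itself returns a DataFrame.
frame_result = df.groupby("groupby_col").rolling(4).mean()

print(type(series_result).__name__)
print(type(frame_result).__name__)
```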

Maybe related to #36507, but

import pandas as pd

data = {
    'groupby_col': ['A', 'A', 'A', 'A', 'A', 'B', 'B', 'B', 'B', 'B'],
    'agg_col': [1, 1, 0, 1, 0, 0, 0, 0, 1, 0],
}
df = pd.DataFrame(data).set_index("groupby_col")
grouped = df.groupby('groupby_col')
# Apply the built-in sum to each group.
result = grouped.apply(sum)
print(result)

works, so I am not sure.

Output of pd.show_versions()

master

phofl added the Bug, Needs Triage (issue that has not been reviewed by a pandas team member), and Window (rolling, ewma, expanding) labels Oct 1, 2020
phofl changed the title from "BUG: Series.groupby.rolling duplicates index when grouping over index" to "BUG: Series.groupby.rolling duplicates index when grouping over index and returns DataFrame instead of Series" Oct 1, 2020
phofl changed the title from "BUG: Series.groupby.rolling duplicates index when grouping over index and returns DataFrame instead of Series" to "BUG: Series.groupby.rolling duplicates index when grouping over index" Oct 2, 2020
jreback added this to the 1.2 milestone Oct 2, 2020
@mroeschke
Member

Though there's redundancy, I am not sure that I'd classify it as a bug.

This behavior is consistent with what happened before I changed groupby.rolling into its own independent calculation (see the output from version 1.0.0 below). Previously, this operation was equivalent under the hood to groupby.apply(lambda x: x.rolling(...)).

In [1]: data = {
   ...:     'groupby_col': ['A', 'A', 'A', 'A', 'A', 'B', 'B', 'B', 'B', 'B', ],
   ...:     'agg_col': [1, 1, 0, 1, 0, 0, 0, 0, 1, 0],
   ...: }
   ...: df = pd.DataFrame(data).set_index("groupby_col")
   ...: grouped = df.groupby('groupby_col')
   ...: rolled = grouped.rolling(4)
   ...:
   ...: result = rolled.mean()
   ...: print(result)
                         agg_col
groupby_col groupby_col
A           A                NaN
            A                NaN
            A                NaN
            A               0.75
            A               0.50
B           B                NaN
            B                NaN
            B                NaN
            B               0.25
            B               0.25

In [2]: pd.__version__
Out[2]: '1.0.0'
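The equivalence mentioned above can be checked roughly as follows. This is a minimal sketch, not the internal pandas implementation: it selects agg_col so that both paths return a Series, and it compares values only, because the index that groupby.apply produces differs between pandas versions. The names direct, via_apply, and same_values are illustrative.

```python
import numpy as np
import pandas as pd

data = {
    "groupby_col": ["A", "A", "A", "A", "A", "B", "B", "B", "B", "B"],
    "agg_col": [1, 1, 0, 1, 0, 0, 0, 0, 1, 0],
}
df = pd.DataFrame(data).set_index("groupby_col")
grouped = df.groupby("groupby_col")["agg_col"]

# Dedicated groupby.rolling code path.
direct = grouped.rolling(4).mean()

# Per-group rolling through apply, which is how the operation
# used to be carried out according to the comment above.
via_apply = grouped.apply(lambda s: s.rolling(4).mean())

# Compare values only; NaN entries at the start of each window count as equal.
same_values = np.allclose(direct.to_numpy(), via_apply.to_numpy(), equal_nan=True)
print(same_values)
```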

I think the MultiIndex levels should always consist of [groupby levels, rolling index], even if the levels are redundant. Special-casing redundant levels is not worth it, and dropping the redundant level can be left to the user.
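For users who do want a single groupby_col level, one possible way to drop the redundant level is sketched below. It assumes the result has the two-level index shown in the output above; the name deduplicated is illustrative.

```python
import pandas as pd

data = {
    "groupby_col": ["A", "A", "A", "A", "A", "B", "B", "B", "B", "B"],
    "agg_col": [1, 1, 0, 1, 0, 0, 0, 0, 1, 0],
}
df = pd.DataFrame(data).set_index("groupby_col")
result = df.groupby("groupby_col").rolling(4).mean()

# Assumption: the index has two levels, both named groupby_col.
# Drop the outer level by position, since the duplicated name is ambiguous.
deduplicated = result.droplevel(0)
print(deduplicated)
```

Dropping by position rather than by name avoids ambiguity, since both levels share the same name.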

@phofl
Member Author

phofl commented Oct 2, 2020

Thanks very much for the background information. Closing this issue.

@phofl phofl closed this as completed Oct 2, 2020
@jreback
Contributor

jreback commented Oct 2, 2020

yep this makes sense
thanks @mroeschke and @phofl for bringing it up
