Rolling groupby should not maintain the by column in the resulting DataFrame #32262

Closed
andreas-vester opened this issue Feb 26, 2020 · 6 comments · Fixed by #40341
Labels
Bug Groupby Window rolling, ewma, expanding
Milestone

Comments

@andreas-vester
andreas-vester commented Feb 26, 2020

When performing a rolling operation on a groupby object, the group-by column is incorrectly kept as a column in the resulting DataFrame, even though it already appears as an index level.

import numpy as np
import pandas as pd

df = pd.DataFrame({'idx_L1': ['Level_1'] * 10 + ['Level_2'] * 10 + ['Level_3'] * 10,  
                   'idx_L2': ([1] * 5 + [2] * 5 + [3] * 5) * 2,  
                   'vals': np.arange(30)})

df.groupby(['idx_L1', 'idx_L2']).rolling(3).sum()

This issue has been discussed in #14013.

That issue was closed even though the bug still exists in v1.0.1.
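
For readers who want to check their own install, here is a minimal sketch (not part of the original report) that reuses the frame above and lists which group keys come back as value columns. The names `keys`, `frame_result`, `leaked` and `series_result` are illustrative. Selecting the value column before calling `rolling` is one possible way to sidestep the question, on the assumption that only `vals` is of interest.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "idx_L1": ["Level_1"] * 10 + ["Level_2"] * 10 + ["Level_3"] * 10,
    "idx_L2": ([1] * 5 + [2] * 5 + [3] * 5) * 2,
    "vals": np.arange(30),
})

keys = ["idx_L1", "idx_L2"]

# Frame-level call from the report. Which columns come back depends on
# the installed pandas version, so inspect them rather than assume.
frame_result = df.groupby(keys).rolling(3).sum()
leaked = [k for k in keys if k in frame_result.columns]
print("group keys present as value columns:", leaked)

# Possible workaround (an assumption, not the maintainers' fix):
# select the value column first, so the result is a Series that
# cannot carry the group keys as data.
series_result = df.groupby(keys)["vals"].rolling(3).sum()
print(series_result.head())
```

An empty `leaked` list means the installed version no longer shows the behaviour described here.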

@jreback
Contributor

jreback commented Feb 26, 2020

update the top with a minimal reproducible example

@andreas-vester
Author

When performing a groupby and then applying the rolling function, my understanding is that idx_L2 should not appear among the value columns of the result.

@MarcoGorelli
Member

Thanks @Kraxelhuber for the report!

To expedite resolution, could you please copy and paste your example into the issue you opened?

@MarcoGorelli
Member

This isn't confined to rolling; it also affects DataFrame.groupby.apply:

>>> import pandas as pd
>>> df = pd.DataFrame({'d': [1.0, 1.0, 1.0, 2.0, 2.0, 2.0], 'v': [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]})
>>> df.groupby('d').sum()
        v
d        
1.0   6.0
2.0  15.0

>>> df.groupby('d').apply(sum)
       d     v
d             
1.0  3.0   6.0
2.0  6.0  15.0
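
As a point of comparison, a small sketch (an illustration, not something proposed in this thread) of how the applied function can be kept away from the grouping column: select `v` before calling `apply`. The name `per_group` is illustrative, and a lambda is used instead of the builtin `sum` only to keep the call explicit.

```python
import pandas as pd

df = pd.DataFrame({"d": [1.0, 1.0, 1.0, 2.0, 2.0, 2.0],
                   "v": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]})

# Selecting the value column first means the function only ever sees
# `v`, so the grouping column `d` cannot be summed into the output.
per_group = df.groupby("d")["v"].apply(lambda s: s.sum())
print(per_group)
```

This yields one value per group, indexed by `d`, matching the `v` column of the `groupby('d').sum()` output above.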

@MarcoGorelli
Member

take

jreback added the Groupby and Window (rolling, ewma, expanding) labels Mar 3, 2020
mroeschke added the Bug label May 8, 2020
@benjamin-ny

Not sure if it's the same issue, but I see a similar thing with groupby + expanding:

(
    pd.DataFrame({
        'a': [0, 0, 0, 1, 1, 1, 1],
        'val': [10, 12, 11, 105, 103, 109, 109]
    })
    .set_index('a')
    .groupby('a')
    .expanding()
    .max()
)

The same does not happen when using, for instance, .transform(max).

(pandas version 1.0.1)
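
To make the contrast with `transform` concrete, here is a hedged sketch that runs both calls on the same data. It departs from the snippet above in two ways, both assumptions made for illustration: `a` stays a regular column instead of being set as the index, and `val` is selected before the window call. The names `running_max` and `group_max` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "a": [0, 0, 0, 1, 1, 1, 1],
    "val": [10, 12, 11, 105, 103, 109, 109],
})

# Expanding max per group: the result gains a group level in its index.
running_max = df.groupby("a")["val"].expanding().max()
print(running_max.index.nlevels)  # group key + original row label

# transform keeps the original index and broadcasts one value per group.
group_max = df.groupby("a")["val"].transform("max")
print(group_max.index.equals(df.index))
```

The two calls also answer different questions: `expanding().max()` gives the running maximum within each group, while `transform("max")` repeats the overall group maximum on every row.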
