Rolling groupby should not maintain the by column in the resulting DataFrame #32262

andreas-vester · 2020-02-26T08:35:11Z

When performing a rolling operation on a groupby object, the grouping column is incorrectly kept as a column in the resulting DataFrame, even though it already appears as an index level.

import numpy as np
import pandas as pd

df = pd.DataFrame({'idx_L1': ['Level_1'] * 10 + ['Level_2'] * 10 + ['Level_3'] * 10,  
                   'idx_L2': ([1] * 5 + [2] * 5 + [3] * 5) * 2,  
                   'vals': np.arange(30)})

df.groupby(['idx_L1', 'idx_L2']).rolling(3).sum()

This issue has been discussed in #14013.

It was closed even though the bug still exists in v1.0.1.


jreback · 2020-02-26T11:39:22Z

Please update the top post with a minimal reproducible example.

andreas-vester · 2020-02-27T21:56:57Z

When performing a groupby and then applying the rolling function, my understanding is that idx_L2 should be dropped from the values section of the result, since it is already part of the index.
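A minimal sketch of how one might check for the reported behaviour, using the frame from the top post. The names `keys`, `leaked`, and `selected` are illustrative and not part of the original report. Whether `leaked` is empty depends on the installed pandas version, so no output is asserted for it here. Selecting the value column before calling `rolling` is shown as one possible workaround, not as the fix.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'idx_L1': ['Level_1'] * 10 + ['Level_2'] * 10 + ['Level_3'] * 10,
                   'idx_L2': ([1] * 5 + [2] * 5 + [3] * 5) * 2,
                   'vals': np.arange(30)})

keys = ['idx_L1', 'idx_L2']  # illustrative name for the grouping columns

result = df.groupby(keys).rolling(3).sum()

# The grouping keys are expected to form the leading index levels.
print(result.index.names)

# Any grouping key that also shows up among the value columns is the
# behaviour reported in this issue.
leaked = [k for k in keys if k in result.columns]
print(leaked)

# Possible workaround: select the value column explicitly, so that only
# 'vals' is aggregated regardless of the pandas version.
selected = df.groupby(keys)['vals'].rolling(3).sum()
print(selected.head())
```

With an explicit column selection, the result is a Series named `vals` whose index carries the grouping keys followed by the original row index.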

MarcoGorelli · 2020-02-28T14:01:56Z

Thanks @Kraxelhuber for the report!

To expedite resolution, could you please copy and paste your example into the issue you opened?

MarcoGorelli · 2020-03-01T11:59:19Z

This isn't just confined to rolling, but also to DataFrame.groupby.apply:

>>> import pandas as pd
>>> df = pd.DataFrame({'d': [1.0, 1.0, 1.0, 2.0, 2.0, 2.0], 'v': [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]})
>>> df.groupby('d').sum()                                                                                   
        v
d        
1.0   6.0
2.0  15.0

>>> df.groupby('d').apply(sum)                                                                              
       d     v
d             
1.0  3.0   6.0
2.0  6.0  15.0
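For comparison, a small sketch of keeping the grouping column out of the `apply` result by selecting the value columns first. This assumes the same frame as above. The lambda is only an illustration, and `out` is a name introduced here.

```python
import pandas as pd

df = pd.DataFrame({'d': [1.0, 1.0, 1.0, 2.0, 2.0, 2.0],
                   'v': [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]})

# Selecting [['v']] before apply means each group passed to the function
# contains only the 'v' column, so 'd' cannot be summed into the values.
out = df.groupby('d')[['v']].apply(lambda g: g.sum())
print(out)
```

Here `d` appears only as the index of `out`, matching the shape of `df.groupby('d').sum()`.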

MarcoGorelli · 2020-03-01T12:02:11Z

take

benjamin-ny · 2020-06-10T08:38:29Z

Not sure if it's the same issue, but I see a similar thing with groupby + expanding:

(
    pd.DataFrame({
        'a': [0, 0, 0, 1, 1, 1, 1],
        'val': [10, 12, 11, 105, 103, 109, 109]
    })
    .set_index('a')
    .groupby('a')
    .expanding()
    .max()
)

The same does not happen when using, for instance, .transform(max).
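A rough sketch contrasting the index shapes, assuming `a` is kept as a regular column rather than set as the index as in the snippet above. `cummax` is swapped in here as an index-preserving equivalent of a per-group expanding maximum. It is not the `.transform(max)` call mentioned above, which broadcasts the overall group maximum instead.

```python
import pandas as pd

df = pd.DataFrame({
    'a': [0, 0, 0, 1, 1, 1, 1],
    'val': [10, 12, 11, 105, 103, 109, 109],
})

# groupby + expanding: the result index combines the group key with the
# original row index.
expanding_max = df.groupby('a')['val'].expanding().max()
print(expanding_max.index.names)

# groupby + cummax: same running maximum per group, but the result keeps
# the original index of df.
cum_max = df.groupby('a')['val'].cummax()
print(cum_max.index.equals(df.index))

# The values agree once the index difference is ignored.
print((expanding_max.to_numpy() == cum_max.to_numpy()).all())
```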

(pandas version 1.0.1)

MarcoGorelli mentioned this issue Feb 28, 2020

BUG: Rolling groupby should not maintain the by column in the resulting DataFrame #32332

Closed

5 tasks

github-actions bot assigned MarcoGorelli Mar 1, 2020

jreback added Groupby Window rolling, ewma, expanding labels Mar 3, 2020

mroeschke added the Bug label May 8, 2020

MarcoGorelli removed their assignment Jun 10, 2020

mroeschke mentioned this issue Oct 11, 2020

ENH: Implement sem for Rolling and Expanding #37043

Merged

5 tasks

mroeschke mentioned this issue Mar 10, 2021

BUG: RollingGroupby no longer keeps the groupby column in the result #40341

Merged

4 tasks

jreback added this to the 1.3 milestone Mar 23, 2021

jreback closed this as completed in #40341 Mar 29, 2021
