BUG: GroupByRolling duplicates MultiIndex levels, which prevents assignment on parent #58444

Closed
2 of 3 tasks
drmarshall opened this issue Apr 26, 2024 · 1 comment
Labels
Bug; Duplicate Report (Duplicate issue or pull request); Window (rolling, ewma, expanding)

Comments

@drmarshall
drmarshall commented Apr 26, 2024

Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import pandas as pd

df = pd.DataFrame({
    'x': {('a', i): i ** 2 for i in range(5)}
})

print(df.x.groupby(level=0).rolling(2).sum().index)

'''
MultiIndex([('a', 'a', 0),
            ('a', 'a', 1),
            ('a', 'a', 2),
            ('a', 'a', 3),
            ('a', 'a', 4)],
           )
'''

# raises: AssertionError: Length of new_levels (3) must be <= self.nlevels (2)

df['new_col'] = df.x.groupby(level=0).rolling(2).sum()

Issue Description

When performing a groupby.rolling with the level parameter, one or more additional levels are inserted into the MultiIndex. The resulting index is incompatible with the original frame (raises: AssertionError: Length of new_levels (3) must be <= self.nlevels (2)).
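As a possible workaround while the behavior stands, the extra leading level(s) can be dropped before assigning. This is only a minimal sketch: it builds the index explicitly with `pd.MultiIndex.from_product` instead of the dict-of-tuples in the example above, and the names `rolled`, `extra`, and `aligned` are illustrative. It assumes the group keys are prepended to the original index, as observed on pandas 2.2.2.

```python
import pandas as pd

# Same data as the reproducible example, with the index built explicitly.
idx = pd.MultiIndex.from_product([["a"], range(5)])
df = pd.DataFrame({"x": [i ** 2 for i in range(5)]}, index=idx)

rolled = df["x"].groupby(level=0).rolling(2).sum()

# Number of levels the result has beyond the parent's index
# (1 in the report above: the group key is prepended).
extra = rolled.index.nlevels - df.index.nlevels

# Drop the prepended level(s) so the index lines up with df again.
aligned = rolled.droplevel(list(range(extra))) if extra > 0 else rolled

df["new_col"] = aligned
print(df)
```

The `extra > 0` guard is there so the sketch does nothing if a given pandas version already returns an index with the parent's number of levels.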

Expected Behavior

Groupby(level=n).rolling should return a MultiIndex compatible with the DataFrame it was derived from, so the result can be assigned back, allowing ergonomics such as:

df['new_col'] = df.x.groupby(level=0).rolling(2).sum()

This behavior should be roughly the same as grouping by column(s) string names:

print(df.reset_index().groupby('level_0').x.rolling(2).sum().index)

'''
MultiIndex([('a', 0),
            ('a', 1),
            ('a', 2),
            ('a', 3),
            ('a', 4)],
           names=['level_0', None])
'''

# this works!
df['new_col'] = df.reset_index().groupby('level_0').x.rolling(2).sum()
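Another route that may avoid both the extra level and the round trip through `reset_index` is to run the rolling window inside `transform`, which returns a result indexed like its input. The sketch below uses a hypothetical two-group frame (not the data from this report) to show that each group is windowed separately; it is an alternative idiom, not a statement about how `groupby(...).rolling(...)` is meant to behave.

```python
import pandas as pd

# Hypothetical two-group frame, used only for illustration.
idx = pd.MultiIndex.from_product([["a", "b"], range(3)])
df = pd.DataFrame({"x": [0, 1, 4, 0, 1, 4]}, index=idx)

# transform applies the function per group and keeps the original index,
# so the result can be assigned straight back to the parent frame.
df["new_col"] = df["x"].groupby(level=0).transform(lambda s: s.rolling(2).sum())
print(df)
```

Because the window restarts in each group, the first row of every group is NaN.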

Totally possible there is a specific difference in grouping by named columns versus using level param that I do not understand; but I have always understood them to have been syntactic sugar for functionally equivalent operations. Happy to submit a documentation update PR if this is my misunderstanding. 🙏

Installed Versions

INSTALLED VERSIONS
------------------
commit                : d9cdd2ee5a58015ef6f4d15c7226110c9aab8140
python                : 3.11.8.final.0
python-bits           : 64
OS                    : Darwin
OS-release            : 23.1.0
Version               : Darwin Kernel Version 23.1.0: Mon Oct  9 21:27:24 PDT 2023; root:xnu-10002.41.9~6/RELEASE_ARM64_T6000
machine               : arm64
processor             : arm
byteorder             : little
LC_ALL                : None
LANG                  : None
LOCALE                : None.UTF-8

pandas                : 2.2.2
numpy                 : 1.26.3
pytz                  : 2023.3.post1
dateutil              : 2.8.2
setuptools            : 68.0.0
pip                   : 23.2.1
Cython                : None
pytest                : 7.4.0
hypothesis            : None
...
zstandard             : 0.19.0
tzdata                : 2023.3
qtpy                  : 2.4.1
pyqt5                 : None
drmarshall added the Bug and Needs Triage (Issue that has not been reviewed by a pandas team member) labels on Apr 26, 2024
drmarshall changed the title from "BUG: GroupByRolling creates additional MultiIndex level which prevents assignment on parent" to "BUG: GroupByRolling duplicates MultiIndex levels, which prevents assignment on parent" on Apr 26, 2024
@rhshadrach
Member

Thanks for the report! I think this is a duplicate of #51751, which, generally speaking, is a discussion on how to treat the index on a rolling operation. Can you take a look at that issue and provide any feedback there?

Closing as a duplicate for now.

rhshadrach added the Window (rolling, ewma, expanding) and Duplicate Report (Duplicate issue or pull request) labels and removed the Needs Triage (Issue that has not been reviewed by a pandas team member) label on Apr 26, 2024