BUG: groupby.nth reorders categories #49129

Closed
rhshadrach opened this issue Oct 16, 2022 · 2 comments
Labels
API - Consistency Internal Consistency of API/Behavior Bug Groupby Needs Discussion Requires discussion from core team before further action

Comments

@rhshadrach
Member

rhshadrach commented Oct 16, 2022

import pandas as pd

df = pd.DataFrame(
    {"x": pd.Categorical([4, 4, 2, 2, 4], categories=[1, 2, 3, 4]), "y": range(5)}
)

# Same grouping, differing only in as_index
gb_ait = df.groupby('x', observed=True, sort=False, as_index=True)
gb_aif = df.groupby('x', observed=True, sort=False, as_index=False)

result1 = gb_ait.nth(0).reset_index()
result2 = gb_aif.nth(0)

print(result1["x"])
print()
print(result2["x"])

# 0    4
# 1    2
# Name: x, dtype: category
# Categories (4, int64): [4, 2, 1, 3]
# 
# 0    4
# 2    2
# Name: x, dtype: category
# Categories (4, int64): [1, 2, 3, 4]

While the values are the same, the order of the categories is not. I'm not certain which one is correct: is the expectation that sort=False orders the categories by appearance?
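
To make the difference concrete without depending on a particular pandas version, here is a minimal sketch that builds the two categoricals by hand (mirroring the two printed outputs above rather than re-running groupby) and compares them. Using `reorder_categories` to align them is an assumption about a possible workaround, not something proposed in this issue.

```python
import pandas as pd

# Hand-built stand-ins for result1["x"] and result2["x"] as printed above
a = pd.Series(pd.Categorical([4, 2], categories=[4, 2, 1, 3]))
b = pd.Series(pd.Categorical([4, 2], categories=[1, 2, 3, 4]))

# The values agree, but the category order does not
same_values = a.tolist() == b.tolist()
same_category_order = a.cat.categories.tolist() == b.cat.categories.tolist()

# Possible workaround (assumption): reorder the categories to match a reference
a_fixed = a.cat.reorder_categories(b.cat.categories)

print(same_values, same_category_order)
print(a_fixed.cat.categories.tolist())
```

Comparing `cat.categories` directly is deliberate here: for unordered categoricals, checking the dtypes for equality may not reveal a difference in category order.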

@rhshadrach rhshadrach added Bug Groupby Needs Discussion Requires discussion from core team before further action API - Consistency Internal Consistency of API/Behavior labels Oct 16, 2022
@rhshadrach
Member Author

Reducers seem to order the categories by appearance, while transformers maintain the original category order. I've only tested a handful of reducers and transformers so far, though.

Reducers

import pandas as pd

df = pd.DataFrame(
    {"x": pd.Categorical([4, 4, 2, 2, 4], categories=[1, 2, 4, 3]), "y": range(5)}
)

gb_ait = df.groupby('x', observed=True, sort=False, as_index=True)
gb_aif = df.groupby('x', observed=True, sort=False, as_index=False)

result1 = gb_ait.sum().reset_index()
result2 = gb_aif.sum()

print(result1["x"])
print()
print(result2["x"])

# 0    4
# 1    2
# Name: x, dtype: category
# Categories (4, int64): [4, 2, 1, 3]
# 
# 0    4
# 1    2
# Name: x, dtype: category
# Categories (4, int64): [4, 2, 1, 3]

Transformers

import pandas as pd

df = pd.DataFrame(
    {"x": pd.Categorical([4, 4, 2, 2, 4], categories=[1, 2, 4, 3]), "y": range(5)}
).set_index('x')

gb_ait = df.groupby('x', observed=True, sort=False, as_index=True)
gb_aif = df.groupby('x', observed=True, sort=False, as_index=False)

result1 = gb_ait.cumsum().reset_index()
result2 = gb_aif.cumsum().reset_index()

print(result1["x"])
print()
print(result2["x"])

# 0    4
# 1    4
# 2    2
# 3    2
# 4    4
# Name: x, dtype: category
# Categories (4, int64): [1, 2, 4, 3]
# 
# 0    4
# 1    4
# 2    2
# 3    2
# 4    4
# Name: x, dtype: category
# Categories (4, int64): [1, 2, 4, 3]
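
The comparison above could be extended to more methods with a small helper. This is only a sketch: the name `category_order` and the choice of methods are hypothetical, it only covers as_index=True with "x" as the index, and the category order it reports will depend on the pandas version, so no expected output is shown.

```python
import pandas as pd

df = pd.DataFrame(
    {"x": pd.Categorical([4, 4, 2, 2, 4], categories=[1, 2, 4, 3]), "y": range(5)}
).set_index("x")


def category_order(frame, method):
    """Return the category order of "x" after calling a groupby method by name."""
    gb = frame.groupby("x", observed=True, sort=False)
    result = getattr(gb, method)()
    # Both reducers and transformers keep "x" in the index for this frame
    return result.reset_index()["x"].cat.categories.tolist()


# A handful of reducers and transformers; extend as needed
methods = ["sum", "max", "cumsum", "cummax"]
orders = {name: category_order(df, name) for name in methods}

for name, order in orders.items():
    print(f"{name}: {order}")
```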

@rhshadrach
Member Author

Duplicate of #48749

@rhshadrach rhshadrach marked this as a duplicate of #48749 Oct 16, 2022