ENH: Bring back the observed argument for groupby on Categorical columns #55237

n-splv · 2023-09-22T08:59:25Z

Feature Type

Adding new functionality to pandas
Changing existing functionality in pandas
Removing existing functionality in pandas

Problem Description

Hi!
The reasons for deprecation of this parameter are nowhere to be found, so I'm curious whether a discussion took place.

Feature Description

After upgrading to 2.1 I have to replace my old code

sample_distribution = df.sample(n).groupby(categorical_cols).size()

with this:

main_distribution = df.groupby(categorical_cols).size()
sample_distribution = df.sample(n).groupby(categorical_cols).size().reindex(main_distribution.index, fill_value=0)

Alternative Solutions

Changing the default value of observed to True is fine I guess, but the ability to use False was indeed convenient. Maybe we should bring it back?

Additional Context

No response


rhshadrach · 2023-09-22T09:19:27Z

There are no plans to remove the argument; only its default is changing.

Is there something in the message that made you think it was going to be removed?
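For reference, a minimal sketch of what passing observed explicitly does on a categorical key. The frame and variable names are made up for illustration; it assumes a pandas version where groupby accepts the observed keyword.

```python
import pandas as pd

# Illustrative frame; "col1" is converted to a categorical with categories a, b, c.
df = pd.DataFrame({"col1": ["a", "b", "c"], "col2": [1, 2, 3]})
df["col1"] = df["col1"].astype("category")

# Slicing drops the row holding "c", but the category itself is retained.
subset = df.iloc[:2]

# observed=False keeps unobserved categories in the result with a count of 0.
with_unobserved = subset.groupby("col1", observed=False).size()

# observed=True reports only the categories that actually occur.
only_observed = subset.groupby("col1", observed=True).size()

print(with_unobserved)
print(only_observed)
```

Passing observed explicitly also keeps the result independent of whichever default the installed version uses.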

n-splv · 2023-09-22T16:48:42Z

@rhshadrach Apologies, the argument is working as intended.
Do you know what made me think that it doesn't? Watch:

import pandas as pd

data = {
    'col1': ['a', 'b', 'c'],
    'col2': [1, 2, 3],
}
df_ = pd.DataFrame(data)

df_.loc[:, 'col1'] = df_['col1'].astype('category')
df_.loc[:, 'col2'] = df_['col2'].astype('category')

df_.iloc[:2].groupby(['col1', 'col2'], observed=False).size()

col1  col2
a     1       1
      2       0
      3       0
b     1       0
      2       1
      3       0

The category c is missing, and I blamed the observed argument for it. But the real source of the problem is:

df_.dtypes

col1      object
col2    category
dtype: object

For some reason the .astype('category') syntax doesn't convert the object columns, and the worst part is that no warning is raised. Should I open a separate issue? As far as I remember this worked just fine in 1.5.3.

rhshadrach · 2023-09-23T00:09:57Z

I see - this is then a duplicate of #52593. In the meantime, when changing an entire column, everything works if you don't use .loc.

import pandas as pd

data = {
    'col1': ['a', 'b', 'c'],
    'col2': [1, 2, 3],
}
df_ = pd.DataFrame(data)

df_['col1'] = df_['col1'].astype('category')
df_.loc[:, 'col2'] = df_['col2'].astype('category')

result = df_.iloc[:2].groupby(['col1', 'col2'], observed=False).size()
print(result)
# col1  col2
# a     1       1
#       2       0
# b     1       0
#       2       1
# c     1       0
#       2       0
# dtype: int64
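The difference between the two assignment forms can be checked directly by looking at the resulting dtypes. This is a sketch, not a statement about every pandas version: the frame is illustrative, the object dtype is set explicitly so the starting point matches the thread, and the outcome of the .loc form is printed rather than asserted because it has varied between releases (the thread reports object on 2.1).

```python
import pandas as pd

# Explicit object dtype so the starting column matches the one in the thread,
# whatever the default string dtype of the installed pandas is.
df = pd.DataFrame({"col1": pd.Series(["a", "b", "c"], dtype=object)})

# Plain column assignment replaces the column, so the new dtype is kept.
via_setitem = df.copy()
via_setitem["col1"] = via_setitem["col1"].astype("category")

# .loc[:, col] writes values into the existing column; on the versions
# discussed in this thread the original dtype is reported to survive.
via_loc = df.copy()
via_loc.loc[:, "col1"] = via_loc["col1"].astype("category")

print(via_setitem["col1"].dtype)
print(via_loc["col1"].dtype)  # version-dependent, so no expected value is given
```

Checking df.dtypes right after a conversion like this is a cheap way to catch a column that silently kept its old dtype.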

n-splv added the Enhancement and Needs Triage labels Sep 22, 2023

rhshadrach closed this as completed Sep 23, 2023
