BUG: DataFrameGroupBy.nunique fails with categorical, multiple groupings, and as_index=False #52848

rhshadrach · 2023-04-22T15:08:14Z

from pandas import DataFrame

df = DataFrame({"a1": [0, 0, 1], "a2": [2, 3, 3], "b": [4, 5, 6]})
df = df.astype({"a1": "category", "a2": "category"})
gb = df.groupby(by=["a1", "a2"], as_index=False, observed=False)
gb.nunique()

raises ValueError: Length of values (3) does not match length of index (4)
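The two lengths in the error message seem to line up with the number of observed groups versus the number of groups in the full product of categories. The sketch below only counts those two quantities; it is an illustration of where the 3 and the 4 plausibly come from, not a trace through the pandas internals.

```python
import pandas as pd

df = pd.DataFrame({"a1": [0, 0, 1], "a2": [2, 3, 3], "b": [4, 5, 6]})
df = df.astype({"a1": "category", "a2": "category"})

# Combinations of (a1, a2) that actually occur in the data.
n_observed = len(df[["a1", "a2"]].drop_duplicates())

# With observed=False, every combination of categories is a group,
# including (1, 2), which never occurs in the data.
n_full = len(df["a1"].cat.categories) * len(df["a2"].cat.categories)

# The as_index=True path builds the full index without error.
sizes = df.groupby(["a1", "a2"], observed=False).size()

print(n_observed, n_full, len(sizes))
```

If this reading is right, the values being inserted cover only the observed groups while the result index covers all of them.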


szczekulskij · 2023-04-23T01:17:43Z

This looks a bit hard, but I'll give it a try. If I can't solve it within a week, I'll unassign myself.

szczekulskij · 2023-04-23T01:17:46Z

take

szczekulskij · 2023-04-24T10:00:57Z

Commenting out lines 1911-1913 of pandas/core/groupby/generic.py makes the error go away:

        # if not self.as_index:
        #     res_df.index = default_index(len(res_df))
        #     res_df = self._insert_inaxis_grouper(res_df)

I'm going to work on a proper fix now.
My initial idea, temporarily overriding as_index, didn't work :/

szczekulskij · 2023-04-24T10:37:23Z

Weirdly, it's the same function causing the issue as in #52397, which I'm also working on.

The function is self._insert_inaxis_grouper. In both issues we need to either skip the call to this function or fix the way it works.

szczekulskij · 2023-04-24T11:22:50Z

Is it okay if I force as_index=True by wrapping the call in with com.temp_setattr(self, "as_index", True): ?

With as_index=True forced, the output would always look like this:

I assume we want the behaviour to differ depending on as_index, but wanted to double check

@rhshadrach
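For readers unfamiliar with the pattern, here is a minimal standalone sketch of what a temp_setattr-style context manager does. It is written from scratch for illustration and is not the pandas implementation; FakeGroupBy is a hypothetical stand-in for the groupby object.

```python
from contextlib import contextmanager


@contextmanager
def temp_setattr(obj, attr, value):
    """Temporarily set obj.attr to value, restoring the old value on exit."""
    old = getattr(obj, attr)
    setattr(obj, attr, value)
    try:
        yield obj
    finally:
        # Restore the original value even if the body raised.
        setattr(obj, attr, old)


class FakeGroupBy:
    """Hypothetical stand-in; only the as_index attribute matters here."""

    as_index = False


gb = FakeGroupBy()
with temp_setattr(gb, "as_index", True):
    inside = gb.as_index  # True while inside the block
after = gb.as_index  # back to False afterwards
```

Forcing the attribute this way changes the code path taken inside the block, which is why the result would no longer depend on the user's as_index setting.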

rhshadrach · 2023-04-25T01:52:11Z

I assume we want the behaviour to differ depending on as_index, but wanted to double check

Correct. Running the OP example without categorical data produces:

   a1  a2  b
0   0   2  1
1   0   3  1
2   1   3  1
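The table above can be reproduced by running the OP example with the astype call dropped, so the grouping columns stay as plain integers:

```python
import pandas as pd

df = pd.DataFrame({"a1": [0, 0, 1], "a2": [2, 3, 3], "b": [4, 5, 6]})

# No categorical conversion: only the observed (a1, a2) pairs form groups.
gb = df.groupby(by=["a1", "a2"], as_index=False)
result = gb.nunique()

print(result)
```

With as_index=False the grouping keys come back as ordinary columns a1 and a2 next to the per-group unique count for b.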

szczekulskij · 2023-05-08T12:47:37Z

Hey, I'm sorry, but I'll need to drop this issue. Pandas is a bit too much for me right now; I'll come back to this in the future.

rhshadrach added the Bug, Groupby, Categorical (Categorical Data Type), and Reduction Operations (sum, mean, min, max, etc.) labels Apr 22, 2023

github-actions bot assigned szczekulskij Apr 23, 2023

rhshadrach unassigned szczekulskij May 9, 2023

rhshadrach mentioned this issue Nov 12, 2023: REF: Compute complete result_index upfront in groupby #55738 (Merged)

mroeschke closed this as completed in #55738 Feb 7, 2024
