BUG: SeriesGroupBy.value_counts - index name missing in categorical columns #45625

NumberPiOso · 2022-01-25T22:29:57Z

closes BUG: SeriesGroupBy.value_counts - index name missing when applied on categorical column #44324
tests added / passed
Ensure all linting tests pass, see here for how to run them
whatsnew entry

Value counts tend to preserve index names (pandas-dev#45625). Change test test_sorting_with_different_categoricals to comply with this change.

mroeschke · 2022-01-30T18:37:31Z

pandas/core/groupby/generic.py

        def apply_series_value_counts():
-            return self.apply(


Could you take this opportunity to refactor and inline this logic like

    if (bins is not None and np.iterable(bins)) or is_categorical_dtype(val.dtype):
        # comments
        s = self.apply(...)
        s.index.names = ...

Also will s always have a MultiIndex here?
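A minimal sketch of what the suggested inlined branch does, written against the public pandas API rather than the internals of generic.py. The sample frame and the hard-coded `names` list are illustrative assumptions; inside pandas the names would come from the grouper and the grouped Series.

```python
import pandas as pd

# Illustrative data (an assumption, not taken from the pandas test suite).
df = pd.DataFrame({"gender": ["female"], "country": ["US"]})
df["gender"] = df["gender"].astype("category")

grouped = df.groupby("country")["gender"]

# Per-group value_counts via apply, mirroring the categorical branch under review.
ser = grouped.apply(pd.Series.value_counts)

# Name every index level explicitly: group keys first, then the counted column.
names = ["country", "gender"]
ser.index.names = names

print(type(ser.index).__name__, list(ser.index.names))
```

Setting the names after the apply call keeps the result independent of whatever name each per-group value_counts happens to put on its index.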

If I run the test code with the following change

    if is_categorical_dtype(val.dtype) or (bins and np.iterable(bins)):
        # scalar bins cannot be done at top level
        # in a backward compatible way
        # GH38672
        s = self.apply(
            Series.value_counts,
            normalize=normalize,
            sort=sort,
            ascending=ascending,
            bins=bins,
        )
        print(s.index)
        s.index.names = names

I get these results

    df = pd.DataFrame(
        {
            "gender": ["female"],
            "country": ["US"],
        }
    )
    df["gender"] = df["gender"].astype("category")
    result2 = df.groupby("country")["gender"].value_counts()

    MultiIndex([('US', 'female')], names=['country', None])

I think the fact that we are inside a groupby should ensure that the result always has a MultiIndex.

Running the following code calls a different method. Specifically, pandas.core.algorithms.value_counts.
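Returning to the index-name question, here is a small sketch that compares the index names produced for an object-dtype column with those for the same column cast to category. The data and variable names are illustrative assumptions. On a pandas release that includes this fix the two lists are expected to match; on earlier releases the categorical one should end in None, as in the output quoted above.

```python
import pandas as pd

# Illustrative data (an assumption chosen for this sketch).
df = pd.DataFrame({"gender": ["female", "male"], "country": ["US", "US"]})

# Object-dtype column: the fast path that already names both index levels.
plain = df.groupby("country")["gender"].value_counts()

# Same data with a categorical column: the path this PR touches.
cat_df = df.assign(gender=df["gender"].astype("category"))
categorical = cat_df.groupby("country")["gender"].value_counts()

plain_names = list(plain.index.names)
categorical_names = list(categorical.index.names)
print(plain_names, categorical_names)
```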


jbrockmendel · 2022-02-01T04:54:47Z

pandas/core/groupby/generic.py

+        if is_categorical_dtype(val.dtype) or (bins and not np.iterable(bins)):
+            # scalar bins cannot be done at top level
+            # in a backward compatible way
+            # GH38672


Can you clarify that GH#38672 refers to the categorical case?

jbrockmendel · 2022-02-01T04:55:01Z

pandas/core/groupby/generic.py

-            # GH38672
-            return apply_series_value_counts()
+            s.index.names = names
+            return s


nitpick: s -> ser

NumberPiOso · 2022-01-31T14:24:09Z

pandas/core/groupby/generic.py

-            return self.apply(
+        names = self.grouper.names + [self.obj.name]
+
+        if is_categorical_dtype(val.dtype) or (bins and np.iterable(bins)):


I found this way of refactoring the conditionals very Pythonic. Let me know what you think, @mroeschke.

This looks okay, but it looks like tests are still failing.

Indeed, apparently I introduced the error there:

> if is_categorical_dtype(val.dtype) or (bins and not np.iterable(bins)):
> ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
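The error comes from evaluating `bins` in a boolean context when it is an array of bin edges. Below is a minimal sketch of the failing check next to the `is None` form named in the last commit title; both helper functions are hypothetical names introduced only for this illustration and are not part of pandas.

```python
import numpy as np


def scalar_bins_check_truthy(bins):
    # Hypothetical helper: relies on bool(bins), which NumPy refuses to
    # evaluate for arrays holding more than one element.
    return bool(bins and not np.iterable(bins))


def scalar_bins_check_is_none(bins):
    # Hypothetical helper: compares against None, so bins is never coerced to bool.
    return bins is not None and not np.iterable(bins)


edges = np.array([0, 1, 2])

try:
    scalar_bins_check_truthy(edges)
    raised = False
except ValueError:
    raised = True

print(raised, scalar_bins_check_is_none(edges), scalar_bins_check_is_none(3))
```

The `is None` form treats scalar bins, array-like bins, and a missing `bins` argument as three distinct cases without ever asking for the truth value of an array.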

rhshadrach

lgtm

mroeschke · 2022-02-05T23:19:35Z

Thanks @NumberPiOso

…olumns (pandas-dev#45625)

* BUG: SeriesGroupBy.value_counts index name missing
  Issue pandas-dev#44324
* TST: Change test to correct categorical naming
  Value counts tend to preserve index names pandas-dev#45625
  Change test test_sorting_with_different_categoricals to comply to this change
* REF: Refactor conditionals in value_counts()
* RFT: correct mistake introduced via RFT
  In line with 44324
* RFT: Change variable names and comment pandas-dev#38672
* BUG: Update conditional to is None to consider series

NumberPiOso added 4 commits January 25, 2022 17:25

BUG: SeriesGroupBy.value_counts index name missing

32cf991

Issue pandas-dev#44324

Merge branch 'main' into index-value-count-cat-col

4ae936e

TST: Change test to correct categorical naming

8e8b37c

Value counts tend to preserve index names pandas-dev#45625 Change test test_sorting_with_different_categoricals to comply to this change

Merge branch 'main' into index-value-count-cat-col

337d220

mroeschke reviewed Jan 30, 2022

mroeschke added the Algos (Non-arithmetic algos: value_counts, factorize, sorting, isin, clip, shift, diff) and Apply (Apply, Aggregate, Transform, Map) labels Jan 30, 2022

NumberPiOso added 2 commits January 31, 2022 09:08

REF: Refactor conditionals in value_counts()

7692b36

Merge branch 'main' into index-value-count-cat-col

b26d0d6

rhshadrach reviewed Jan 31, 2022

NumberPiOso added 2 commits January 31, 2022 19:19

RFT: correct mistake introduced via RFT

d81d7b9

In line with 44324

Merge remote-tracking branch 'upstream/main' into index-value-count-c…

9cdc5d8

…at-col

jbrockmendel reviewed Feb 1, 2022

NumberPiOso added 2 commits February 1, 2022 08:14

RFT: Change variable names and comment pandas-dev#38672

9d561b9

Merge remote-tracking branch 'upstream/main' into index-value-count-c…

55622f3

…at-col

NumberPiOso commented Feb 1, 2022

NumberPiOso added 2 commits February 2, 2022 18:26

Merge remote-tracking branch 'upstream/main' into index-value-count-c…

4bedca5

…at-col

BUG: Update conditional to is None to consider series

49a375d

NumberPiOso requested a review from mroeschke February 3, 2022 15:12

mroeschke approved these changes Feb 3, 2022

NumberPiOso requested a review from rhshadrach February 5, 2022 13:53

rhshadrach added the Bug, Groupby, Series (Series data structure), and Categorical (Categorical Data Type) labels and removed the Apply (Apply, Aggregate, Transform, Map) label Feb 5, 2022

rhshadrach added this to the 1.5 milestone Feb 5, 2022

rhshadrach approved these changes Feb 5, 2022

mroeschke merged commit c80b145 into pandas-dev:main Feb 5, 2022

NumberPiOso deleted the index-value-count-cat-col branch February 6, 2022 22:02

NumberPiOso commented Jan 25, 2022

mroeschke Jan 30, 2022

NumberPiOso Jan 31, 2022

jbrockmendel Feb 1, 2022

jbrockmendel Feb 1, 2022

NumberPiOso Jan 31, 2022

mroeschke Feb 2, 2022

NumberPiOso Feb 3, 2022

rhshadrach left a comment

mroeschke commented Feb 5, 2022
