-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
BUG: SeriesGroupBy.value_counts - index name missing in categorical columns #45625
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from 6 commits
32cf991
4ae936e
8e8b37c
337d220
7692b36
b26d0d6
d81d7b9
9cdc5d8
9d561b9
55622f3
4bedca5
49a375d
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -602,23 +602,21 @@ def value_counts( | |
ids, _, _ = self.grouper.group_info | ||
val = self.obj._values | ||
|
||
def apply_series_value_counts(): | ||
return self.apply( | ||
names = self.grouper.names + [self.obj.name] | ||
|
||
if is_categorical_dtype(val.dtype) or (bins and np.iterable(bins)): | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I found this way of refactoring the conditionals very Pythonic, let me know what you think. @mroeschke There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This looks okay but looks like tests are still failing There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Indeed, apparently I introduced the error there > if is_categorical_dtype(val.dtype) or (bins and not np.iterable(bins)):
> ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
rhshadrach marked this conversation as resolved.
Show resolved
Hide resolved
|
||
# scalar bins cannot be done at top level | ||
# in a backward compatible way | ||
# GH38672 | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. can you clarify that GH#38672 refers to the categorical case |
||
s = self.apply( | ||
Series.value_counts, | ||
normalize=normalize, | ||
sort=sort, | ||
ascending=ascending, | ||
bins=bins, | ||
) | ||
|
||
if bins is not None: | ||
if not np.iterable(bins): | ||
# scalar bins cannot be done at top level | ||
# in a backward compatible way | ||
return apply_series_value_counts() | ||
elif is_categorical_dtype(val.dtype): | ||
# GH38672 | ||
return apply_series_value_counts() | ||
s.index.names = names | ||
return s | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. nitpick: s -> ser |
||
|
||
# groupby removes null keys from groupings | ||
mask = ids != -1 | ||
|
@@ -683,7 +681,6 @@ def apply_series_value_counts(): | |
levels = [ping.group_index for ping in self.grouper.groupings] + [ | ||
lev # type: ignore[list-item] | ||
] | ||
names = self.grouper.names + [self.obj.name] | ||
rhshadrach marked this conversation as resolved.
Show resolved
Hide resolved
|
||
|
||
if dropna: | ||
mask = codes[-1] != -1 | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you take this opportunity to refactor and inline this logic like
Also will s always have a MultiIndex here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I run the test code with the following change
I get these results
And I think the fact that we are inside a group by should always ensure that we are always using MultiIndex.
Running the following code calls a different method. Specifically,
pandas.core.algorithms.value_counts
.