-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
BUG: DataFrameGroupBy.value_counts
includes non-observed categories of non-grouping columns
#46798
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
95c141c
to
9b9ffc4
Compare
DataFrameGroupBy.value_counts
includes non-observed categories of non-grouping columnsDataFrameGroupBy.value_counts
includes non-observed categories of non-grouping columns
9b9ffc4
to
db359a2
Compare
This PR has not been reviewed for 1 month. @rhshadrach as you were involved in the issue description, would you like to have a look, or should I wait/ping for the review from a specific maintainer ? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the ping @LucasG0; looks good. Some requests/questions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minor comment.
db359a2
to
ddf50d4
Compare
ddf50d4
to
b535f77
Compare
04c1654
to
0683e17
Compare
@jreback @rhshadrach do you have any other comments to add on this PR or should we consider merging it (after rebase) ? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall looks good, I'm not sure why the one test is still changing though.
doc/source/whatsnew/v1.5.0.rst
Outdated
@@ -634,7 +634,8 @@ Categorical | |||
^^^^^^^^^^^ | |||
- Bug in :meth:`Categorical.view` not accepting integer dtypes (:issue:`25464`) | |||
- Bug in :meth:`CategoricalIndex.union` when the index's categories are integer-dtype and the index contains ``NaN`` values incorrectly raising instead of casting to ``float64`` (:issue:`45362`) | |||
- | |||
- Bug in :meth:`.DataFrameGroupBy.value_counts` includes non-observed categories of non-grouping columns regardless of ``observed`` (:issue:`46357`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok since this is changing non-trivially, can you make a sub-section (ok here is fine, where you show the prior and the fixed behavior)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added a sub-section in notable fixes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sub-section looks good; need to remove this line.
55cde73
to
fc8ea62
Compare
Unsuccessful job seems to be related to the flakyness of #45651 |
@jreback @rhshadrach Firstly. do we need the extensive release note. These are normally reserved for bug fixes that have a significant change in behavior that might be considered a breaking change by users. Also because |
no backport in this |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Needs two fixes from the whatsnew (some from me, one from Simon); otherwise looks good.
doc/source/whatsnew/v1.5.0.rst
Outdated
@@ -634,7 +634,8 @@ Categorical | |||
^^^^^^^^^^^ | |||
- Bug in :meth:`Categorical.view` not accepting integer dtypes (:issue:`25464`) | |||
- Bug in :meth:`CategoricalIndex.union` when the index's categories are integer-dtype and the index contains ``NaN`` values incorrectly raising instead of casting to ``float64`` (:issue:`45362`) | |||
- | |||
- Bug in :meth:`.DataFrameGroupBy.value_counts` includes non-observed categories of non-grouping columns regardless of ``observed`` (:issue:`46357`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sub-section looks good; need to remove this line.
@rhshadrach I fixed the whatsnew |
Job failure is due to the unavailability of Github Actions |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
This pull request is stale because it has been open for thirty days with no activity. Please update and respond to this comment if you're still interested in working on this. |
Apologies @LucasG0 - this fell off my radar. Can you resolve conflicts. @simonjayhawkins - are you good here? |
…-grouping columns
4cbbcea
to
fa3869c
Compare
No worries @rhshadrach, conflicts resolved. |
@simonjayhawkins - friendly ping |
My comments were related to whether this should have been a targeted backport or not. But I'm not going to block. |
@jreback - friendly ping |
thanks @LucasG0 this is very nice! and thanks @rhshadrach for all the work on this as well. |
… of non-grouping columns (pandas-dev#46798)
doc/source/whatsnew/vX.X.X.rst
file if fixing a bug or adding a new feature.