BUG: NaN should have pct rank of NaN #22600
Hello @gfyoung! Thanks for submitting the PR.
pandas/_libs/groupby_helper.pxi.in
Outdated
@@ -584,7 +584,10 @@ def group_rank_{{name}}(ndarray[float64_t, ndim=2] out,

    if pct:
        for i in range(N):
            out[i, 0] = out[i, 0] / grp_sizes[i, 0]
            if out[i, 0] != out[i, 0] or out[i, 0] == NAN:
Can this all be simplified to just `if out[i, 0] != NAN:`?
I don't think those are semantically equivalent. Did you mean `out[i, 0] == NAN`?
What I meant was: only do the division if `out[i, 0] != NAN`; otherwise leave it as is.
Ah, gotcha! 🙂 Let's try it and see what happens.
@WillAyd : Good idea, but unfortunately, the tests I added disagree with it. You need both conditionals when checking. Thus, this code needs to stay as is.
Hmm OK I got you. Is it a particular type that's failing?
For some reason I thought some of the work @realead was doing was supposed to remove the need for comparisons like `out[i, 0] != out[i, 0]` to figure out if a value was NA, though it's entirely possible I have misunderstood that.
The "right" handling of NA would only apply to algorithms using a hash map, which is not the case here, if I see it correctly.
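For readers following the comparisons discussed in this thread, here is a minimal sketch in plain Python (not the Cython template itself) of how an IEEE 754 NaN behaves under `!=` and `==`. It assumes the template's `NAN` is an ordinary floating-point NaN; the helper names are hypothetical and exist only for this illustration.

```python
import math

# Assumption: stand-in for the NAN constant used in the template.
NAN = math.nan


def differs_from_nan(x: float) -> bool:
    """The guard as literally written in the suggestion: x != NAN."""
    return x != NAN


def equals_nan(x: float) -> bool:
    """The alternative spelling raised in the reply: x == NAN."""
    return x == NAN


def is_nan_by_self_comparison(x: float) -> bool:
    """The self-comparison idiom from the diff: x != x."""
    return x != x


for value in (1.0, 0.0, NAN):
    print(
        value,
        differs_from_nan(value),
        equals_nan(value),
        is_nan_by_self_comparison(value),
    )
```

Under these semantics `x != NAN` holds for every float, including NaN itself, and `x == NAN` never holds, so only the self-comparison tells NaN apart from ordinary values.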
can you add a comment on what is going on here
Yep, done.
0ac5ae9 to 3f3f30b
minor comment on whatsnew but otherwise lgtm
doc/source/whatsnew/v0.23.5.txt
Outdated
@@ -23,6 +23,9 @@ Fixed Regressions
- Constructing a DataFrame with an index argument that wasn't already an
  instance of :class:`~pandas.core.Index` was broken in `4efb39f
  <https://github.com/pandas-dev/pandas/commit/4efb39f01f5880122fa38d91e12d217ef70fad9e>`_ (:issue:`22227`).
- Calling :meth:`DataFrameGroupBy.rank` and :meth:`SeriesGroupBy.rank` with empty groups
  and ``pct=True`` was broken in `c1068d9
Do we typically reference commits where regressions occurred in whatsnew notes? Would think it better to just call out the `ZeroDivisionError` instead of the commit.
Just following what was done above. This is a relatively new addition to whatsnew, but I don't see any reason to buck this trend.
Gotcha. Ultimately indifferent on the commit reference, though I think calling out the `ZeroDivisionError` is a much more useful indicator when either googling or looking at the whatsnew to see what has actually changed (rather than clicking through to the issue or commit).
That's fair. I'll let the CI run its course and then make the addition.
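To make the `ZeroDivisionError` mentioned above concrete, the following is a rough pure-Python sketch of a percentage-rank normalisation step that guards against both NaN ranks and zero-sized groups. The function `pct_ranks` and its arguments are hypothetical stand-ins, not the pandas implementation, and leaving the value untouched when the group size is zero is an assumption of this sketch.

```python
import math


def pct_ranks(ranks, group_sizes):
    """Hypothetical stand-in for the pct normalisation loop."""
    result = []
    for rank, size in zip(ranks, group_sizes):
        if rank != rank:
            # NaN rank: report NaN instead of dividing.
            result.append(math.nan)
        elif size != 0:
            result.append(rank / size)
        else:
            # Assumption: nothing to normalise against, keep the value.
            result.append(rank)
    return result


# Unguarded division by a zero group size fails in plain Python.
try:
    math.nan / 0
except ZeroDivisionError as exc:
    print("unguarded:", exc)

# The guarded version returns NaN for the NaN entry instead.
print(pct_ranks([1.0, 2.0, math.nan], [2, 2, 0]))
```

Checking for NaN before touching the group size means an entry with no valid rank never reaches the division at all.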
3f3f30b to ad4166f
Codecov Report
@@           Coverage Diff           @@
##           master   #22600   +/-   ##
=======================================
  Coverage   92.05%   92.05%
=======================================
  Files         169      169
  Lines       50783    50783
=======================================
  Hits        46749    46749
  Misses       4034     4034
Continue to review full report at Codecov.
ad4166f to 5cffe39
Circle failure is related to Hypothesis timeout, which is not related to my PR. cc @jreback
5cffe39 to b904ec2
@jreback : Made the requested change and all is green. PTAL.
thanks @gfyoung I think this will backport cleanly.
Owee, I'm MrMeeseeks, Look at me. There seems to be a conflict, please backport manually. Here are approximate instructions:
And apply the correct labels and milestones. Congratulations, you did some good work! Hopefully your backport PR will be tested by the continuous integration and merged soon! If these instructions are inaccurate, feel free to suggest an improvement.
@jreback : The
Closes #22519.