BUG: NaN should have pct rank of NaN #22600

gfyoung · 2018-09-05T06:35:26Z

pep8speaks · 2018-09-05T06:35:28Z

Hello @gfyoung! Thanks for submitting the PR.

There are no PEP8 issues in the file pandas/tests/groupby/test_rank.py !

doc/source/whatsnew/v0.23.5.txt

WillAyd · 2018-09-05T15:42:06Z

pandas/_libs/groupby_helper.pxi.in

@@ -584,7 +584,10 @@ def group_rank_{{name}}(ndarray[float64_t, ndim=2] out,

        if pct:
            for i in range(N):
-                out[i, 0] = out[i, 0] / grp_sizes[i, 0]
+                if out[i, 0] != out[i, 0] or out[i, 0] == NAN:


Can this all be simplified to just if out[i, 0] != NAN:?

I don't think those are semantically equivalent. Did you mean out[i, 0] == NAN ?

What I meant was only do the division if out[i, 0] != NAN otherwise leave as is

Ah, gotcha! 🙂 Let's try it and see what happens.

@WillAyd : Good idea, but unfortunately, the tests I added disagree with it. You need both conditionals when checking. Thus, this code needs to stay as is.

Hmm OK I got you. Is it a particular type that's failing?

For some reason I thought some of the work @realead was doing was supposed to remove the need for comparisons like out[i, 0] != out[i, 0] to figure out if a value was NA, though it's entirely possible I have misunderstood that.

The "right" handling of NA would only apply to algorithms using a hash map, which is not the case here, if I see it correctly.

can you add a comment on what is going on here
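
For context on the comparison debated in this thread, here is a minimal Python sketch (not the Cython code from the PR) of how an IEEE 754 NaN behaves under `==` and `!=`. The helper `is_nan` is hypothetical, and `NAN` is just `float("nan")`, assumed to behave like the `NAN` constant in the diff.

```python
NAN = float("nan")  # stand-in for the NAN constant in the diff (assumption)


def is_nan(x: float) -> bool:
    """Hypothetical helper: NaN is the only float value not equal to itself."""
    return x != x


# Equality against NaN is always False, even for NaN itself.
nan_equals_nan = NAN == NAN

# Inequality against NaN is always True, for NaN and ordinary numbers alike,
# so a guard written as `x != NAN` cannot separate NaN from non-NaN values.
nan_differs_from_nan = NAN != NAN
one_differs_from_nan = 1.0 != NAN

print(is_nan(NAN), is_nan(1.0))
print(nan_equals_nan, nan_differs_from_nan, one_differs_from_nan)
```

Under these semantics the self-comparison is what identifies NaN. That is consistent with the added tests rejecting the simplified `!= NAN` guard, though the thread does not record which test failed.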

WillAyd

minor comment on whatsnew but otherwise lgtm

WillAyd · 2018-09-05T21:18:52Z

doc/source/whatsnew/v0.23.5.txt

@@ -23,6 +23,9 @@ Fixed Regressions
 - Constructing a DataFrame with an index argument that wasn't already an
  instance of :class:`~pandas.core.Index` was broken in `4efb39f
  <https://github.com/pandas-dev/pandas/commit/4efb39f01f5880122fa38d91e12d217ef70fad9e>`_ (:issue:`22227`).
+- Calling :meth:`DataFrameGroupBy.rank` and :meth:`SeriesGroupBy.rank` with empty groups
+  and ``pct=True`` was broken in `c1068d9


Do we typically reference commits where regressions occurred in whatsnew notes? Would think it better to just call out the ZeroDivisionError instead of the commit

Just following what was done above. This is a relatively new addition to whatsnew, but I don't see any reason to buck this trend.

Gotcha. I'm ultimately indifferent on the commit reference, though I think calling out the ZeroDivisionError is a much more useful indicator when either googling or looking at the whatsnew to see what has actually changed (rather than clicking through to the issue or commit).

That's fair. I'll let the CI run its course and then make the addition.
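
To illustrate the ZeroDivisionError mentioned above, here is a pure-Python sketch of a percentile normalization loop with and without a NaN guard. It is not the pandas implementation: `pct_unguarded`, `pct_guarded`, and the sample ranks and group sizes are hypothetical, and the premise that an all-NaN group has a recorded size of 0 is an assumption made for illustration.

```python
import math


def pct_unguarded(ranks, group_sizes):
    """Divide every rank by its group size, with no NaN check."""
    return [rank / size for rank, size in zip(ranks, group_sizes)]


def pct_guarded(ranks, group_sizes):
    """Leave NaN ranks untouched; divide only the real ranks."""
    out = []
    for rank, size in zip(ranks, group_sizes):
        if rank != rank:  # NaN is the only value not equal to itself
            out.append(rank)
        else:
            out.append(rank / size)
    return out


nan = float("nan")
ranks = [1.0, 2.0, nan]  # hypothetical ranks; the last row is missing
group_sizes = [2, 2, 0]  # assumption: the all-NaN group has size 0

try:
    pct_unguarded(ranks, group_sizes)
    unguarded_failed = False
except ZeroDivisionError:
    unguarded_failed = True

result = pct_guarded(ranks, group_sizes)
print(unguarded_failed, result[:2], math.isnan(result[2]))
```

The guarded loop keeps the missing row as NaN and never reaches the division for it, which matches the behavior described in the PR title.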

codecov · 2018-09-06T19:40:25Z

Codecov Report

Merging #22600 into master will not change coverage.
The diff coverage is n/a.

@@           Coverage Diff           @@
##           master   #22600   +/-   ##
=======================================
  Coverage   92.05%   92.05%           
=======================================
  Files         169      169           
  Lines       50783    50783           
=======================================
  Hits        46749    46749           
  Misses       4034     4034

| Flag | Coverage Δ | |
|---|---|---|
| #multiple | `90.46% <ø> (ø)` | ⬆️ |
| #single | `42.3% <ø> (ø)` | ⬆️ |

Continue to review full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 70c9003...b904ec2. Read the comment docs.

gfyoung · 2018-09-07T00:22:18Z

The Circle failure is caused by a Hypothesis timeout, which is unrelated to my PR.

cc @jreback

jreback · 2018-09-07T12:16:26Z

pandas/_libs/groupby_helper.pxi.in

@@ -584,7 +584,10 @@ def group_rank_{{name}}(ndarray[float64_t, ndim=2] out,

        if pct:
            for i in range(N):
-                out[i, 0] = out[i, 0] / grp_sizes[i, 0]
+                if out[i, 0] != out[i, 0] or out[i, 0] == NAN:


can you add a comment on what is going on here

Closes gh-22519.

gfyoung · 2018-09-07T21:34:47Z

@jreback : Made the requested change and all is green. PTAL.

jreback · 2018-09-08T02:27:34Z

thanks @gfyoung I think this will backport cleanly.

lumberbot-app · 2018-09-08T02:28:06Z

Owee, I'm MrMeeseeks, Look at me.

There seems to be a conflict, please backport manually. Here are approximate instructions:

Checkout backport branch and update it.

$ git checkout 0.23.x
$ git pull

Cherry-pick the first parent branch of this PR on top of the older branch:

$ git cherry-pick -m1 e6843c4b9754ae149cc6ff5cd58db05138327b74

You will likely have some merge/cherry-pick conflict here, fix them and commit:

$ git commit -am 'Backport PR #22600: BUG: NaN should have pct rank of NaN'

Push to a named branch :

$ git push YOURFORK 0.23.x:auto-backport-of-pr-22600-on-0.23.x

Create a PR against branch 0.23.x, I would have named this PR:

"Backport PR #22600 on branch 0.23.x"

And apply the correct labels and milestones.

Congratulations, you did some good work! Hopefully your backport PR will be tested by the continuous integration and merged soon!

If these instructions are inaccurate, feel free to suggest an improvement.

gfyoung · 2018-09-08T04:37:29Z

@jreback : The cherry-pick is not cooperating. I'll go backport it myself.

Backport of gh-22600.

Closes gh-22519.

gfyoung added the Groupby, Missing-data, and Regression labels Sep 5, 2018

gfyoung added this to the 0.23.5 milestone Sep 5, 2018

WillAyd requested changes Sep 5, 2018

gfyoung force-pushed the group-rank-nan branch 2 times, most recently from 0ac5ae9 to 3f3f30b Compare September 5, 2018 19:18

WillAyd approved these changes Sep 5, 2018

gfyoung force-pushed the group-rank-nan branch from 3f3f30b to ad4166f Compare September 6, 2018 19:40

gfyoung force-pushed the group-rank-nan branch from ad4166f to 5cffe39 Compare September 6, 2018 19:56

jreback requested changes Sep 7, 2018

BUG: NaN should have pct rank of NaN

b904ec2

Closes gh-22519.

gfyoung force-pushed the group-rank-nan branch from 5cffe39 to b904ec2 Compare September 7, 2018 16:41

jreback approved these changes Sep 8, 2018

jreback merged commit e6843c4 into pandas-dev:master Sep 8, 2018

lumberbot-app bot added the Still Needs Manual Backport label Sep 8, 2018

gfyoung deleted the group-rank-nan branch September 8, 2018 04:37

gfyoung removed the Still Needs Manual Backport label Sep 8, 2018

gfyoung added a commit to forking-repos/pandas that referenced this pull request Sep 8, 2018

BUG: NaN should have pct rank of NaN

d12f1d0

Backport of gh-22600.

gfyoung mentioned this pull request Sep 8, 2018

BUG: NaN should have pct rank of NaN #22634

Merged

gfyoung added a commit to forking-repos/pandas that referenced this pull request Sep 11, 2018

BUG: NaN should have pct rank of NaN

349e939

Backport of gh-22600.

gfyoung added a commit that referenced this pull request Sep 11, 2018

BUG: NaN should have pct rank of NaN (#22634)

af7b0ba

Backport of gh-22600.

aeltanawy pushed a commit to aeltanawy/pandas that referenced this pull request Sep 20, 2018

BUG: NaN should have pct rank of NaN (pandas-dev#22600)

ec1f7eb

Closes gh-22519.

Sup3rGeo pushed a commit to Sup3rGeo/pandas that referenced this pull request Oct 1, 2018

BUG: NaN should have pct rank of NaN (pandas-dev#22600)

65fb037

Closes gh-22519.
