-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
BUG: Remove null values before sorting during groupby nunique calculation #27951
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
BUG: Remove null values before sorting during groupby nunique calculation #27951
Conversation
Does this completely close #27904? Or should it remain open for the case of |
In the case of |
Just one NA value I think. |
And to be clear, we can leave handling |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good for the dropna=True case. Will leave the issue open for dropna=False, which seems broken still.
Actually... I'm going to push this off the 0.25.1 milestone (sorry @MarcoGorelli) Can you check a few things:
|
Sorry for the delay - yes, I agree with your decision, Tom :) I'll pick this up again next week |
7018a8a
to
4445c02
Compare
Hello @MarcoGorelli! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found: There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻 Comment last updated at 2019-09-06 15:50:47 UTC |
As noted as a comment in #27904, this works fine with numpy's own |
Haven't looked deeply yet but does this also impact Series and DataFrame methods? The location of the change would imply so hence my reason for asking |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See above comment. Can you also add a whatsnew for v1.0.0
Thanks for your review! Sure, whatsnew entry added. If I've understood your question, then the |
Ah misread the file location but makes sense so thanks for confirming. Should be able to review more deeply over the next few days |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sorry for delay in review. Looks pretty good some general comments
@@ -1143,6 +1143,9 @@ def nunique(self, dropna=True): | |||
|
|||
val = self.obj._internal_get_values() | |||
|
|||
# GH 27951 | |||
val[isna(val)] = np.datetime64("NaT") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the actual root of the issue is a bug in NumPy as described by @TomAugspurger where NaT values are not sorted as you'd expected
So I think this works for now but maybe add a comment about NumPy bug 12629 for reference
@@ -1015,6 +1025,81 @@ def test_nunique_with_timegrouper(): | |||
tm.assert_series_equal(result, expected) | |||
|
|||
|
|||
@pytest.mark.parametrize( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The parametrization here is pretty repetitive, though I realize that you have three items at a time being sent through to keep the expectation different across each.
Is there a way to more succinctly parametrize though? It's rather difficult to read this and find what's expected
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed - have modified it so the DataFrame is constructed within the test
6538f5e
to
9b09add
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm - @TomAugspurger mind another quick look?
Thanks @MarcoGorelli! Can you confirm that this fixed all of #27904, or are there outstanding tasks? |
That's right - thanks for having taken the time to review my work! |
black pandas
git diff upstream/master -u -- "*.py" | flake8 --diff