BUG: Avoid ambiguous condition in GroupBy.first / last #32124

dsaxton · 2020-02-20T01:37:52Z

closes BUG: GroupBy.first fails with pd.NA on Series with object dtype #32123
tests added / passed
passes black pandas
passes git diff upstream/master -u -- "*.py" | flake8 --diff
whatsnew entry

This comparison val == val happens in a lot of these groupby operations but only seems to raise here in the presence of NA. Are we just always / mostly converting to numpy beforehand in the other cases?
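Here is a minimal pure-Python illustration of why the check raises. For floats, val == val is the usual self-equality NaN test, because NaN compares unequal to itself. However, pd.NA == pd.NA returns pd.NA, and taking its truth value raises TypeError. The if statement in the Cython loop takes that truth value implicitly. The helper name passes_self_equality is hypothetical.

```python
import pandas as pd


def passes_self_equality(val):
    """Hypothetical helper mimicking the `if val == val:` test in the Cython loops.

    NaN != NaN, so NaN fails the test; pd.NA == pd.NA evaluates to pd.NA,
    and bool(pd.NA) raises TypeError instead of returning True or False.
    """
    return bool(val == val)


print(passes_self_equality(1.0))           # True
print(passes_self_equality(float("nan")))  # False
try:
    passes_self_equality(pd.NA)
except TypeError as err:
    print(f"TypeError: {err}")
```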

WillAyd

Cool, thanks for the PR

WillAyd · 2020-02-20T01:54:32Z

pandas/_libs/groupby.pyx

@@ -887,7 +889,7 @@ def group_last(rank_t[:, :] out,
            for j in range(K):
                val = values[i, j]

-                if val == val:
+                if (val is not NA) and (val == val):


I think you can use the checknull function from missing instead

Maybe just use not checknull(val) for the whole condition? I think there was an issue with checknull not catching decimal.Decimal("nan"), but I could try to add that here.
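Below is a rough pure-Python sketch of the group_last loop with the condition reduced to a single null check, as suggested. It uses pd.isna as a public stand-in for the internal checknull, since both treat None, NaN and pd.NA as missing for scalars. The function group_last_sketch and its list-based signature are hypothetical and much simpler than the Cython original, which works on 2-D typed memoryviews and also tracks per-group counts.

```python
import pandas as pd


def group_last_sketch(values, labels, ngroups):
    """Hypothetical analogue of group_last: keep the last non-null value per group."""
    out = [None] * ngroups  # groups with no non-null value stay None in this sketch
    for val, lab in zip(values, labels):
        if lab < 0:
            continue  # negative labels mark rows that belong to no group
        # A single null check replaces `val == val`, so pd.NA is never coerced to bool
        if not pd.isna(val):
            out[lab] = val
    return out


print(group_last_sketch([1, 2, 3, pd.NA], [0, 0, 1, 1], 2))  # [2, 3]
```

In this sketch, groups whose values are all missing are left as None. The real routine fills them with its own missing-value marker instead.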

WillAyd · 2020-02-20T01:55:22Z

pandas/tests/groupby/test_nth.py

+@pytest.mark.parametrize("method", ["first", "last"])
+def test_first_last_with_na_object(method):
+    # https://github.com/pandas-dev/pandas/issues/32123
+    groups = pd.DataFrame({"a": [1, 1, 2, 2], "b": [1, 2, 3, pd.NA]}).groupby("a")


Should also use nulls_fixture here after #31799

Can you merge master? This should be available now

jbrockmendel · 2020-02-20T02:39:14Z

#31227 turned up a bunch of other places where having pd.NA in columns/index is going to break the world. The is_matching_na implemented there might be useful here.
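To make "matching NA" concrete, here is a hypothetical pure-Python predicate. It treats two values as matching only when they are the same kind of null: None with None, pd.NA with pd.NA, pd.NaT with pd.NaT, and float NaN with float NaN. This is an illustration under those assumptions, not the is_matching_na from #31227, whose exact rules may differ.

```python
import math

import pandas as pd


def matching_na_sketch(left, right):
    """Hypothetical: True only if left and right are the same kind of missing value."""
    if left is None:
        return right is None
    if left is pd.NA:
        return right is pd.NA  # pd.NA is a singleton, so identity is enough
    if left is pd.NaT:
        return right is pd.NaT
    if isinstance(left, float) and math.isnan(left):
        return isinstance(right, float) and math.isnan(right)
    return False  # non-null values never "match" as NA


print(matching_na_sketch(pd.NA, pd.NA))  # True
print(matching_na_sketch(pd.NA, None))   # False
```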

jprafael · 2020-02-20T11:13:21Z

pandas/tests/groupby/test_nth.py

+    if method == "first":
+        values = {"b": [1, 3]}
+    else:
+        values = {"b": [2, 3]}


This raises a question (one that perhaps should be handled in another issue):
GroupBy.last() returns [2, 3], whereas GroupBy.tail(1) returns [2, pd.NA].
Is this intended behaviour? It is consistent with pandas 0.25, but undocumented.

It seems that last gives the last non-null value, but as you say, that doesn't seem to be well documented (it probably does warrant a separate issue).
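The difference jprafael describes can be reproduced with the frame from the test. This assumes a pandas version that includes this fix and relies only on the public SeriesGroupBy.last and SeriesGroupBy.tail methods.

```python
import pandas as pd

# Column "b" mixes ints and pd.NA, so it is stored with object dtype
df = pd.DataFrame({"a": [1, 1, 2, 2], "b": [1, 2, 3, pd.NA]})
g = df.groupby("a")

last_vals = g["b"].last().tolist()   # last(): last non-null value per group
tail_vals = g["b"].tail(1).tolist()  # tail(1): final row of each group, null or not
print(last_vals)
print(tail_vals)
```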

simonjayhawkins · 2020-02-21T16:39:56Z

@dsaxton can you merge master to resolve conflicts

jreback · 2020-02-22T15:31:18Z

pandas/tests/groupby/test_nth.py

@@ -530,3 +530,23 @@ def test_nth_nan_in_grouper(dropna):
    )

    tm.assert_frame_equal(result, expected)
+
+
+@pytest.mark.parametrize("method", ["first", "last"])


can you also test .nth(0) and .nth(-1), which give the same results (except that .nth(-1) will have the null as the result)?

Added a test for nth with nulls; I gave it its own test rather than putting it with first / last to avoid too much awkward if / else branching inside the test
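Here is a sketch of the kind of nth check requested above, written as a plain function rather than a pytest test. The helper nth_first_and_last is hypothetical. The loop covers only a few null values (None, np.nan, pd.NA), not everything nulls_fixture provides. None and np.nan produce a float64 column, while pd.NA keeps it object.

```python
import numpy as np
import pandas as pd


def nth_first_and_last(null):
    """Hypothetical helper: values picked by nth(0) and nth(-1) when the
    last row of group 2 holds the given null value."""
    df = pd.DataFrame({"a": [1, 1, 2, 2], "b": [1, 2, 3, null]})
    g = df.groupby("a")
    # Unlike first()/last(), nth() does not skip nulls by default
    return g.nth(0)["b"].tolist(), g.nth(-1)["b"].tolist()


for null in (None, np.nan, pd.NA):
    first_vals, last_vals = nth_first_and_last(null)
    print(repr(null), first_vals, last_vals)
```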

jreback · 2020-02-22T15:32:26Z

pandas/tests/groupby/test_nth.py

+@pytest.mark.parametrize("method", ["first", "last"])
+def test_first_last_with_na_object(method, nulls_fixture):
+    # https://github.com/pandas-dev/pandas/issues/32123
+    groups = pd.DataFrame({"a": [1, 1, 2, 2], "b": [1, 2, 3, nulls_fixture]}).groupby(


also move this below test_first_last_nth (it may actually be simpler to modify that test, though)

jreback · 2020-02-23T14:59:04Z

thanks @dsaxton

jreback · 2020-02-23T14:59:43Z

@meeseeksdev backport to 1.0.x


dsaxton added 3 commits February 19, 2020 19:28

Add test

627410d

Check for NA

bb4ad11

Update whatsnew

3213718

WillAyd requested changes Feb 20, 2020


dsaxton added 3 commits February 19, 2020 20:45

Use nulls_fixture

58bbb57

Use checknull

4f2d126

Merge branch 'master' into grpby-nth

c4a6518

jprafael reviewed Feb 20, 2020


Merge remote-tracking branch 'upstream/master' into grpby-nth

e09bbe6

simonjayhawkins added the Bug and "NA - MaskedArrays" (Related to pd.NA and nullable extension arrays) labels Feb 21, 2020

simonjayhawkins added this to the 1.0.2 milestone Feb 21, 2020

Merge remote-tracking branch 'upstream/master' into grpby-nth

c7c7a95

jreback requested changes Feb 22, 2020


dsaxton added 2 commits February 22, 2020 21:50

Merge remote-tracking branch 'upstream/master' into grpby-nth

4329021

Move and add test

87456af

jreback approved these changes Feb 23, 2020


jreback merged commit 20a84a5 into pandas-dev:master Feb 23, 2020

jreback added the Still Needs Manual Backport label Feb 23, 2020

meeseeksmachine pushed a commit to meeseeksmachine/pandas that referenced this pull request Feb 23, 2020

Backport PR pandas-dev#32124: BUG: Avoid ambiguous condition in GroupBy.first / last

d13dbbf

meeseeksmachine mentioned this pull request Feb 23, 2020

Backport PR #32124 on branch 1.0.x (BUG: Avoid ambiguous condition in GroupBy.first / last) #32199

Merged

jreback removed the Still Needs Manual Backport label Feb 23, 2020

dsaxton deleted the grpby-nth branch February 23, 2020 15:07

jreback pushed a commit that referenced this pull request Feb 23, 2020

Backport PR #32124: BUG: Avoid ambiguous condition in GroupBy.first / last (#32199)

Co-authored-by: Daniel Saxton

4444870

roberthdevries pushed a commit to roberthdevries/pandas that referenced this pull request Mar 2, 2020

BUG: Avoid ambiguous condition in GroupBy.first / last (pandas-dev#32124)

c4db4a9

JDkuba mentioned this pull request Apr 10, 2020

BUG: None converted to NaN after groupby first and last #33462

Merged


