-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
BUG: Avoid ambiguous condition in GroupBy.first / last #32124
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cool thanks for PR
pandas/_libs/groupby.pyx
Outdated
@@ -887,7 +889,7 @@ def group_last(rank_t[:, :] out, | |||
for j in range(K): | |||
val = values[i, j] | |||
|
|||
if val == val: | |||
if (val is not NA) and (val == val): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think can use the checknull
function from missing instead
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe just not checknull(val)
for the whole condition? I think there was an issue with checknull
not catching decimal.Decimal("nan")
but I could try to add that here
pandas/tests/groupby/test_nth.py
Outdated
@pytest.mark.parametrize("method", ["first", "last"]) | ||
def test_first_last_with_na_object(method): | ||
# https://github.com/pandas-dev/pandas/issues/32123 | ||
groups = pd.DataFrame({"a": [1, 1, 2, 2], "b": [1, 2, 3, pd.NA]}).groupby("a") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should also use nulls_fixture here after #31799
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you merge master? This should be available now
#31227 turned up a bunch of other places where having pd.NA in columns/index is going to break the world. the |
pandas/tests/groupby/test_nth.py
Outdated
if method == "first": | ||
values = {"b": [1, 3]} | ||
else: | ||
values = {"b": [2, 3]} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This raises a question (that perhaps should be handled in another issue).
GroupBy.last()
returns [2, 3]
however GroupBy.tail(1)
returns [2, pd.NA]
.
Is this intended behaviour? it is consistent with pandas 0.25, but undocumented.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems that last
gives the last non-null value but that doesn't seem to be well-documented like you say (probably does warrant a separate issue)
@dsaxton can you merge master to resolve conflicts |
pandas/tests/groupby/test_nth.py
Outdated
@@ -530,3 +530,23 @@ def test_nth_nan_in_grouper(dropna): | |||
) | |||
|
|||
tm.assert_frame_equal(result, expected) | |||
|
|||
|
|||
@pytest.mark.parametrize("method", ["first", "last"]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you also test .nth(0)
and .nth(-1)
which are the same results (except the nth -1 will have the null as the result
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added a test for nth with nulls; I gave it its own test rather than putting it with first / last to avoid too much awkward if / else branching inside the test
pandas/tests/groupby/test_nth.py
Outdated
@pytest.mark.parametrize("method", ["first", "last"]) | ||
def test_first_last_with_na_object(method, nulls_fixture): | ||
# https://github.com/pandas-dev/pandas/issues/32123 | ||
groups = pd.DataFrame({"a": [1, 1, 2, 2], "b": [1, 2, 3, nulls_fixture]}).groupby( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also move this belowt est_first_last_nth (it may actually be simpler to modify that test though)
thanks @dsaxton |
@meeseeksdev backport to 1.0.x |
…By.first / last
… last (#32199) Co-authored-by: Daniel Saxton <[email protected]>
black pandas
git diff upstream/master -u -- "*.py" | flake8 --diff
This comparison
val == val
happens in a lot of these groupby operations but only seems to raise here in the presence ofNA
. Are we just always / mostly converting to numpy beforehand in the other cases?