-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
Fixes grouped rank bug with nullable types. #54460
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fixes grouped rank bug with nullable types. #54460
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR! Can you also put a note in the whatsnew for 2.1 under the groupby subsection of bugfixes.
@rhshadrach sir, I have made all the suggested changes, can you once quickly review. |
@rhshadrach sir, I have made all the suggested changes, and all the tests have passed. Please check now, Thank you. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the update! Sorry - a few things I missed.
doc/source/whatsnew/v2.1.0.rst
Outdated
@@ -783,6 +783,7 @@ Plotting | |||
Groupby/resample/rolling | |||
^^^^^^^^^^^^^^^^^^^^^^^^ | |||
- Bug in :meth:`.DataFrameGroupBy.idxmin`, :meth:`.SeriesGroupBy.idxmin`, :meth:`.DataFrameGroupBy.idxmax`, :meth:`.SeriesGroupBy.idxmax` returns wrong dtype when used on an empty DataFrameGroupBy or SeriesGroupBy (:issue:`51423`) | |||
- Bug in :meth:`DataFrame.groupby` when using the rank function on nullable datatypes when passing ``na_option="bottom"`` or ``na_option="top"`` (:issue:`54206`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you do :meth:`DataFrameGroupBy.rank`
here instead. Then no need to say "when using the rank function". This gives the reader a direct link to the method.
pandas/core/arrays/masked.py
Outdated
@@ -1492,6 +1492,9 @@ def _groupby_op( | |||
else: | |||
result_mask = np.zeros(ngroups, dtype=bool) | |||
|
|||
if op.how == "rank" and kwargs.get("na_option") in ["top", "bottom"]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: can just use how
here instead of op.how
. It avoids an attribute access, and relies less on the state of WrappedCythonOp.
pandas/tests/groupby/test_rank.py
Outdated
result_ext = df_ext.groupby("x", dropna=False)["x"].rank( | ||
method="min", na_option=na_option | ||
) | ||
expected_result_ext = Series([1.0], dtype="float64", name=result_ext.name) | ||
tm.assert_series_equal(result_ext, expected_result_ext, check_dtype=False) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why the suffix _ext
here? We typically just call these df
, result
, and expected
. I think it's best to stick to that (unless there is a reason not to).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also - why check_dtype=False
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rhshadrach sir, I made all the suggested changes. The reason for adding check_dtype=False
was that initially the test was failing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reason for adding
check_dtype=False
was that initially the test was failing.
If the issue with the dtype is result
(and not caused by this PR), then we should report a bug. If the issue is with expected
, then we should change the dtype of expected
to the correct one and remove check_dtype=False
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Sai-Suraj-27: Friendly ping here; I think this is the last bit and then we're all set to go.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rhshadrach sir, Sorry i missed this notification somehow, just saw it again today! I have removed the check_dtype=False
condition. let's see if all the tests pass, Please give your review, Thank you.
@rhshadrach sir, I made the required changes and all the tests are passing, Can you please check now, Thank you. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
Thanks @Sai-Suraj-27 |
doc/source/whatsnew/vX.X.X.rst
file if fixing a bug or adding a new feature.