Fixes grouped rank bug with nullable types. #54460

Sai-Suraj-27 · 2023-08-08T13:13:16Z

- closes BUG: Grouped rank incorrect behaviour with nullable types #54206
- Tests added and passed if fixing a bug or adding a new feature
- All code checks passed.
- Added type annotations to new arguments/methods/functions.
- Added an entry in the latest doc/source/whatsnew/vX.X.X.rst file if fixing a bug or adding a new feature.


rhshadrach

Thanks for the PR! Can you also put a note in the whatsnew for 2.1 under the groupby subsection of bugfixes.

pandas/core/arrays/masked.py

pandas/tests/groupby/test_rank.py


Sai-Suraj-27 · 2023-08-15T11:54:07Z

@rhshadrach sir, I have made all the suggested changes. Could you please take a quick look?

doc/source/whatsnew/v2.1.0.rst

pandas/tests/groupby/test_rank.py

Sai-Suraj-27 · 2023-08-24T15:07:11Z

@rhshadrach sir, I have made all the suggested changes, and all the tests have passed. Please check now. Thank you.

rhshadrach

Thanks for the update! Sorry - a few things I missed.

rhshadrach · 2023-08-29T09:49:28Z

doc/source/whatsnew/v2.1.0.rst

@@ -783,6 +783,7 @@ Plotting
 Groupby/resample/rolling
 ^^^^^^^^^^^^^^^^^^^^^^^^
 - Bug in :meth:`.DataFrameGroupBy.idxmin`, :meth:`.SeriesGroupBy.idxmin`, :meth:`.DataFrameGroupBy.idxmax`, :meth:`.SeriesGroupBy.idxmax` returns wrong dtype when used on an empty DataFrameGroupBy or SeriesGroupBy (:issue:`51423`)
+- Bug in :meth:`DataFrame.groupby` when using the rank function on nullable datatypes when passing ``na_option="bottom"`` or ``na_option="top"`` (:issue:`54206`)


Can you do :meth:`DataFrameGroupBy.rank` here instead. Then no need to say "when using the rank function". This gives the reader a direct link to the method.

rhshadrach · 2023-08-29T09:52:23Z

pandas/core/arrays/masked.py

@@ -1492,6 +1492,9 @@ def _groupby_op(
        else:
            result_mask = np.zeros(ngroups, dtype=bool)

+        if op.how == "rank" and kwargs.get("na_option") in ["top", "bottom"]:


nit: can just use how here instead of op.how. It avoids an attribute access, and relies less on the state of WrappedCythonOp.
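The condition under review can be sketched in isolation. The following is a hypothetical helper, not pandas' `_groupby_op`. The diff only shows the aggregation `else` branch, so this sketch makes two assumptions. First, for a transform such as rank, the result mask starts as a copy of the input mask. Second, with `na_option="top"` or `"bottom"`, every missing entry receives a real rank, so nothing should remain masked. Following the review nit, it branches on the plain `how` string rather than `op.how`.

```python
import numpy as np


def rank_result_mask(how: str, mask: np.ndarray, **kwargs) -> np.ndarray:
    """Hypothetical helper: which output positions stay <NA> after a grouped transform."""
    # Assumption: transforms start from a copy of the input mask,
    # so a missing input maps to a missing output by default.
    result_mask = mask.copy()
    # With na_option="top"/"bottom", missing entries are ranked too,
    # so no output position should remain masked.
    if how == "rank" and kwargs.get("na_option") in ["top", "bottom"]:
        result_mask[:] = False
    return result_mask


mask = np.array([False, True, False])
print(rank_result_mask("rank", mask, na_option="keep"))    # the NA position stays masked
print(rank_result_mask("rank", mask, na_option="bottom"))  # nothing is masked
```

Copying the mask before clearing it keeps the input array's own mask untouched. The review comment asks for that guarantee from the libgroupby side as well.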

rhshadrach · 2023-08-29T09:53:06Z

pandas/tests/groupby/test_rank.py

+    result_ext = df_ext.groupby("x", dropna=False)["x"].rank(
+        method="min", na_option=na_option
+    )
+    expected_result_ext = Series([1.0], dtype="float64", name=result_ext.name)
+    tm.assert_series_equal(result_ext, expected_result_ext, check_dtype=False)


Why the suffix _ext here? We typically just call these df, result, and expected. I think it's best to stick to that (unless there is a reason not to).

Also - why check_dtype=False?

Sai-Suraj-27 · 2023-08-30

@rhshadrach sir, I made all the suggested changes. I added check_dtype=False because the test was failing initially.

rhshadrach · 2023-09-01

> The reason for adding check_dtype=False was that initially the test was failing.

If the dtype issue is in result (and not caused by this PR), then we should report a bug. If the issue is with expected, then we should change the dtype of expected to the correct one and remove check_dtype=False.
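`check_dtype=False` is risky because it hides exactly this kind of difference. Here is a small illustration with made-up values. A nullable `Float64` result compared against a NumPy `float64` expected fails the default check. It passes once the dtype check is switched off. It also passes the strict check once expected is given the correct dtype.

```python
import pandas as pd

result = pd.Series([1.0], dtype="Float64", name="x")          # nullable float result
wrong_expected = pd.Series([1.0], dtype="float64", name="x")  # NumPy float expected

# With the default check_dtype=True, the dtype mismatch is reported.
try:
    pd.testing.assert_series_equal(result, wrong_expected)
    mismatch_reported = False
except AssertionError:
    mismatch_reported = True

# check_dtype=False silently accepts the mismatch.
pd.testing.assert_series_equal(result, wrong_expected, check_dtype=False)

# Giving expected the correct dtype lets the strict comparison pass.
pd.testing.assert_series_equal(result, wrong_expected.astype("Float64"))
print(mismatch_reported)
```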

rhshadrach · 2023-09-06

@Sai-Suraj-27: Friendly ping here; I think this is the last bit and then we're all set to go.

Sai-Suraj-27 · 2023-09-19

@rhshadrach sir, sorry, I somehow missed this notification and only saw it again today! I have removed the check_dtype=False option. Let's see if all the tests pass. Please review. Thank you.


Sai-Suraj-27 · 2023-09-21T05:31:19Z

@rhshadrach sir, I made the required changes and all the tests are passing. Can you please check now? Thank you.

rhshadrach

lgtm

rhshadrach · 2023-09-22T02:31:01Z

Thanks @Sai-Suraj-27

Sai-Suraj-27 added 4 commits August 8, 2023 18:38

Fix grouped rank bug with nullable types, and added a test.

41d4419

Updated the test function to remove errors.

ce3b5d8

updated files with pre-commit run.

fc3ac9f

Merge branch 'main' of https://github.com/pandas-dev/pandas into fix-…

6a425f1


mroeschke requested a review from rhshadrach August 8, 2023 20:31

mroeschke added the Groupby and NA - MaskedArrays labels Aug 8, 2023

rhshadrach requested changes Aug 14, 2023


pandas/core/arrays/masked.py (outdated, resolved)

pandas/tests/groupby/test_rank.py (outdated, resolved)

rhshadrach added the Bug and Transformations labels Aug 14, 2023

Sai-Suraj-27 added 2 commits August 14, 2023 22:29

Merge branch 'main' of https://github.com/pandas-dev/pandas into fix-…

63faab6


Removed unnecessary comments, and updated about bug fix in what's new.

41f4529

Sai-Suraj-27 requested review from MarcoGorelli, WillAyd, datapythonista and mroeschke as code owners August 18, 2023 17:11

Sai-Suraj-27 requested a review from rhshadrach August 18, 2023 17:52

Sai-Suraj-27 added 3 commits August 18, 2023 23:33

Revert last two commits

566cf2d

Merge branch 'main' into fix-grouped-rank-bug

1551301

Made all the requested changes.

313219c

rhshadrach requested changes Aug 18, 2023


doc/source/whatsnew/v2.1.0.rst (outdated, resolved)

pandas/tests/groupby/test_rank.py (outdated, resolved)

Sai-Suraj-27 added 8 commits August 19, 2023 15:13

Merge branch 'main' into fix-grouped-rank-bug

0a2d569

Edited the note in whatsnew 2.1 and also added test for 'top'.

476a9b6

Fixed small pre-commit errors.

676d975

fixed the error in tests.

9a4215b

Merge branch 'main' into fix-grouped-rank-bug

2b7ae59

updated the test to add both 'top' and 'bottom'.

091ce9f

Merge branch 'main' into fix-grouped-rank-bug

18efb62

Merge branch 'main' into fix-grouped-rank-bug

2251f7b

Sai-Suraj-27 requested a review from rhshadrach August 22, 2023 06:35

Merge branch 'main' into fix-grouped-rank-bug

a7b8f79

Sai-Suraj-27 added 2 commits August 25, 2023 00:13

Merge branch 'main' into fix-grouped-rank-bug

5cf2b59

Merge branch 'main' into fix-grouped-rank-bug

254d8d0

rhshadrach requested changes Aug 29, 2023


Sai-Suraj-27 added 8 commits August 29, 2023 22:32

Merge branch 'main' of https://github.com/pandas-dev/pandas into fix-…

8961c61


Merge branch 'fix-grouped-rank-bug' of https://github.com/Sai-Suraj-2…

31832f1


Renamed few variables and updated what's new.

59d8b25

Merge branch 'main' of https://github.com/pandas-dev/pandas into fix-…

22e9239


Merge branch 'main' of https://github.com/pandas-dev/pandas into fix-…

65e1c45


Removed the check_dtype=False option in tests.

1e22b7e

Changed the dtype of expected to Float64.

85627e6

Merge branch 'main' of https://github.com/pandas-dev/pandas into fix-…

e285efa


Merge branch 'main' into fix-grouped-rank-bug

5a63a11

rhshadrach approved these changes Sep 22, 2023


rhshadrach merged commit b43d79d into pandas-dev:main Sep 22, 2023
