Skip to content

Fixes grouped rank bug with nullable types. #54460

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 29 commits into from
Sep 22, 2023
Merged
Show file tree
Hide file tree
Changes from 20 commits
Commits
Show all changes
29 commits
Select commit Hold shift + click to select a range
41d4419
Fix grouped rank bug with nullable types, and added a test.
Sai-Suraj-27 Aug 8, 2023
ce3b5d8
Updated the test function to remove errors.
Sai-Suraj-27 Aug 8, 2023
fc3ac9f
updated files with pre-commit run.
Sai-Suraj-27 Aug 8, 2023
6a425f1
Merge branch 'main' of https://github.com/pandas-dev/pandas into fix-…
Sai-Suraj-27 Aug 8, 2023
63faab6
Merge branch 'main' of https://github.com/pandas-dev/pandas into fix-…
Sai-Suraj-27 Aug 14, 2023
41f4529
Removed unnecessary comments, and updated about bug fix in what's new.
Sai-Suraj-27 Aug 14, 2023
566cf2d
Revert last two commits
Sai-Suraj-27 Aug 18, 2023
1551301
Merge branch 'main' into fix-grouped-rank-bug
Sai-Suraj-27 Aug 18, 2023
313219c
Made all the requested changes.
Sai-Suraj-27 Aug 18, 2023
0a2d569
Merge branch 'main' into fix-grouped-rank-bug
Sai-Suraj-27 Aug 19, 2023
476a9b6
Edited the note in whatsnew 2.1 and also added test for 'top'.
Sai-Suraj-27 Aug 19, 2023
676d975
Fixed small pre-commit errors.
Sai-Suraj-27 Aug 19, 2023
9a4215b
fixed the error in tests.
Sai-Suraj-27 Aug 19, 2023
2b7ae59
Merge branch 'main' into fix-grouped-rank-bug
Sai-Suraj-27 Aug 19, 2023
091ce9f
updated the test to add both 'top' and 'bottom'.
Sai-Suraj-27 Aug 19, 2023
18efb62
Merge branch 'main' into fix-grouped-rank-bug
Sai-Suraj-27 Aug 21, 2023
2251f7b
Merge branch 'main' into fix-grouped-rank-bug
Sai-Suraj-27 Aug 22, 2023
a7b8f79
Merge branch 'main' into fix-grouped-rank-bug
Sai-Suraj-27 Aug 24, 2023
5cf2b59
Merge branch 'main' into fix-grouped-rank-bug
Sai-Suraj-27 Aug 24, 2023
254d8d0
Merge branch 'main' into fix-grouped-rank-bug
Sai-Suraj-27 Aug 25, 2023
8961c61
Merge branch 'main' of https://github.com/pandas-dev/pandas into fix-…
Sai-Suraj-27 Aug 29, 2023
31832f1
Merge branch 'fix-grouped-rank-bug' of https://github.com/Sai-Suraj-2…
Sai-Suraj-27 Aug 29, 2023
59d8b25
Renamed few variables and updated what's new.
Sai-Suraj-27 Aug 30, 2023
22e9239
Merge branch 'main' of https://github.com/pandas-dev/pandas into fix-…
Sai-Suraj-27 Aug 30, 2023
65e1c45
Merge branch 'main' of https://github.com/pandas-dev/pandas into fix-…
Sai-Suraj-27 Sep 19, 2023
1e22b7e
Removed the check_dtype=False option in tests.
Sai-Suraj-27 Sep 19, 2023
85627e6
Changed the dtype of expected to Float64.
Sai-Suraj-27 Sep 20, 2023
e285efa
Merge branch 'main' of https://github.com/pandas-dev/pandas into fix-…
Sai-Suraj-27 Sep 20, 2023
5a63a11
Merge branch 'main' into fix-grouped-rank-bug
Sai-Suraj-27 Sep 21, 2023
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions doc/source/whatsnew/v2.1.0.rst
Original file line number Diff line number Diff line change
Expand Up @@ -783,6 +783,7 @@ Plotting
Groupby/resample/rolling
^^^^^^^^^^^^^^^^^^^^^^^^
- Bug in :meth:`.DataFrameGroupBy.idxmin`, :meth:`.SeriesGroupBy.idxmin`, :meth:`.DataFrameGroupBy.idxmax`, :meth:`.SeriesGroupBy.idxmax` returns wrong dtype when used on an empty DataFrameGroupBy or SeriesGroupBy (:issue:`51423`)
- Bug in :meth:`DataFrame.groupby` when using the rank function on nullable datatypes when passing ``na_option="bottom"`` or ``na_option="top"`` (:issue:`54206`)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you do :meth:`DataFrameGroupBy.rank` here instead. Then no need to say "when using the rank function". This gives the reader a direct link to the method.

- Bug in :meth:`DataFrame.resample` and :meth:`Series.resample` in incorrectly allowing non-fixed ``freq`` when resampling on a :class:`TimedeltaIndex` (:issue:`51896`)
- Bug in :meth:`DataFrame.resample` and :meth:`Series.resample` losing time zone when resampling empty data (:issue:`53664`)
- Bug in :meth:`DataFrame.resample` and :meth:`Series.resample` where ``origin`` has no effect in resample when values are outside of axis (:issue:`53662`)
Expand Down
3 changes: 3 additions & 0 deletions pandas/core/arrays/masked.py
Original file line number Diff line number Diff line change
Expand Up @@ -1492,6 +1492,9 @@ def _groupby_op(
else:
result_mask = np.zeros(ngroups, dtype=bool)

if op.how == "rank" and kwargs.get("na_option") in ["top", "bottom"]:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: can just use how here instead of op.how. It avoids an attribute access, and relies less on the state of WrappedCythonOp.

result_mask[:] = False

res_values = op._cython_op_ndim_compat(
self._data,
min_count=min_count,
Expand Down
11 changes: 11 additions & 0 deletions pandas/tests/groupby/test_rank.py
Original file line number Diff line number Diff line change
Expand Up @@ -710,3 +710,14 @@ def test_rank_categorical():

expected = df.astype(object).groupby("col1").rank()
tm.assert_frame_equal(res, expected)


@pytest.mark.parametrize("na_option", ["top", "bottom"])
def test_groupby_op_with_nullables(na_option):
# GH 54206
df_ext = DataFrame({"x": [None]}, dtype="Float64")
result_ext = df_ext.groupby("x", dropna=False)["x"].rank(
method="min", na_option=na_option
)
expected_result_ext = Series([1.0], dtype="float64", name=result_ext.name)
tm.assert_series_equal(result_ext, expected_result_ext, check_dtype=False)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why the suffix _ext here? We typically just call these df, result, and expected. I think it's best to stick to that (unless there is a reason not to).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also - why check_dtype=False?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rhshadrach sir, I made all the suggested changes. The reason for adding check_dtype=False was that initially the test was failing.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason for adding check_dtype=False was that initially the test was failing.

If the issue with the dtype is result (and not caused by this PR), then we should report a bug. If the issue is with expected, then we should change the dtype of expected to the correct one and remove check_dtype=False.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Sai-Suraj-27: Friendly ping here; I think this is the last bit and then we're all set to go.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rhshadrach sir, Sorry i missed this notification somehow, just saw it again today! I have removed the check_dtype=False condition. let's see if all the tests pass, Please give your review, Thank you.