Skip to content

Fixes grouped rank bug with nullable types. #54460

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 29 commits into from
Sep 22, 2023
Merged
Show file tree
Hide file tree
Changes from 9 commits
Commits
Show all changes
29 commits
Select commit Hold shift + click to select a range
41d4419
Fix grouped rank bug with nullable types, and added a test.
Sai-Suraj-27 Aug 8, 2023
ce3b5d8
Updated the test function to remove errors.
Sai-Suraj-27 Aug 8, 2023
fc3ac9f
updated files with pre-commit run.
Sai-Suraj-27 Aug 8, 2023
6a425f1
Merge branch 'main' of https://github.com/pandas-dev/pandas into fix-…
Sai-Suraj-27 Aug 8, 2023
63faab6
Merge branch 'main' of https://github.com/pandas-dev/pandas into fix-…
Sai-Suraj-27 Aug 14, 2023
41f4529
Removed unnecessary comments, and updated about bug fix in what's new.
Sai-Suraj-27 Aug 14, 2023
566cf2d
Revert last two commits
Sai-Suraj-27 Aug 18, 2023
1551301
Merge branch 'main' into fix-grouped-rank-bug
Sai-Suraj-27 Aug 18, 2023
313219c
Made all the requested changes.
Sai-Suraj-27 Aug 18, 2023
0a2d569
Merge branch 'main' into fix-grouped-rank-bug
Sai-Suraj-27 Aug 19, 2023
476a9b6
Edited the note in whatsnew 2.1 and also added test for 'top'.
Sai-Suraj-27 Aug 19, 2023
676d975
Fixed small pre-commit errors.
Sai-Suraj-27 Aug 19, 2023
9a4215b
fixed the error in tests.
Sai-Suraj-27 Aug 19, 2023
2b7ae59
Merge branch 'main' into fix-grouped-rank-bug
Sai-Suraj-27 Aug 19, 2023
091ce9f
updated the test to add both 'top' and 'bottom'.
Sai-Suraj-27 Aug 19, 2023
18efb62
Merge branch 'main' into fix-grouped-rank-bug
Sai-Suraj-27 Aug 21, 2023
2251f7b
Merge branch 'main' into fix-grouped-rank-bug
Sai-Suraj-27 Aug 22, 2023
a7b8f79
Merge branch 'main' into fix-grouped-rank-bug
Sai-Suraj-27 Aug 24, 2023
5cf2b59
Merge branch 'main' into fix-grouped-rank-bug
Sai-Suraj-27 Aug 24, 2023
254d8d0
Merge branch 'main' into fix-grouped-rank-bug
Sai-Suraj-27 Aug 25, 2023
8961c61
Merge branch 'main' of https://github.com/pandas-dev/pandas into fix-…
Sai-Suraj-27 Aug 29, 2023
31832f1
Merge branch 'fix-grouped-rank-bug' of https://github.com/Sai-Suraj-2…
Sai-Suraj-27 Aug 29, 2023
59d8b25
Renamed few variables and updated what's new.
Sai-Suraj-27 Aug 30, 2023
22e9239
Merge branch 'main' of https://github.com/pandas-dev/pandas into fix-…
Sai-Suraj-27 Aug 30, 2023
65e1c45
Merge branch 'main' of https://github.com/pandas-dev/pandas into fix-…
Sai-Suraj-27 Sep 19, 2023
1e22b7e
Removed the check_dtype=False option in tests.
Sai-Suraj-27 Sep 19, 2023
85627e6
Changed the dtype of expected to Float64.
Sai-Suraj-27 Sep 20, 2023
e285efa
Merge branch 'main' of https://github.com/pandas-dev/pandas into fix-…
Sai-Suraj-27 Sep 20, 2023
5a63a11
Merge branch 'main' into fix-grouped-rank-bug
Sai-Suraj-27 Sep 21, 2023
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 2 additions & 0 deletions doc/source/whatsnew/v2.1.0.rst
Original file line number Diff line number Diff line change
Expand Up @@ -775,6 +775,8 @@ Plotting

Groupby/resample/rolling
^^^^^^^^^^^^^^^^^^^^^^^^
- Bug in :meth:`DataFrame.groupby` When using the rank function on grouped data frames, the result is different when using nullable datatypes compared to numpy datatypes.
This is only an issue when passing ``na_option="bottom"`` or ``na_option="top"`` (:issue:`54206`)
- Bug in :meth:`.DataFrameGroupBy.idxmin`, :meth:`.SeriesGroupBy.idxmin`, :meth:`.DataFrameGroupBy.idxmax`, :meth:`.SeriesGroupBy.idxmax` returns wrong dtype when used on an empty DataFrameGroupBy or SeriesGroupBy (:issue:`51423`)
- Bug in :meth:`DataFrame.resample` and :meth:`Series.resample` in incorrectly allowing non-fixed ``freq`` when resampling on a :class:`TimedeltaIndex` (:issue:`51896`)
- Bug in :meth:`DataFrame.resample` and :meth:`Series.resample` losing time zone when resampling empty data (:issue:`53664`)
Expand Down
3 changes: 3 additions & 0 deletions pandas/core/arrays/masked.py
Original file line number Diff line number Diff line change
Expand Up @@ -1492,6 +1492,9 @@ def _groupby_op(
else:
result_mask = np.zeros(ngroups, dtype=bool)

if op.how == "rank" and kwargs.get("na_option") in ["top", "bottom"]:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: can just use how here instead of op.how. It avoids an attribute access, and relies less on the state of WrappedCythonOp.

result_mask[:] = False

res_values = op._cython_op_ndim_compat(
self._data,
min_count=min_count,
Expand Down
10 changes: 10 additions & 0 deletions pandas/tests/groupby/test_rank.py
Original file line number Diff line number Diff line change
Expand Up @@ -710,3 +710,13 @@ def test_rank_categorical():

expected = df.astype(object).groupby("col1").rank()
tm.assert_frame_equal(res, expected)


def test_groupby_op_with_nullables():
# GH 54206
df_ext = DataFrame({"x": [None]}, dtype="Float64")
result_ext = df_ext.groupby("x", dropna=False)["x"].rank(
method="min", na_option="bottom"
)
expected_result_ext = Series([1.0], dtype="float64", name=result_ext.name)
tm.assert_series_equal(result_ext, expected_result_ext, check_dtype=False)