BUG: rank_2d raising with mixed dtypes #38932

Merged
merged 8 commits into from
Jan 5, 2021
Changes from 3 commits
2 changes: 2 additions & 0 deletions doc/source/whatsnew/v1.3.0.rst
@@ -217,6 +217,8 @@ Numeric
- Bug in :meth:`DataFrame.select_dtypes` with ``include=np.number`` now retains numeric ``ExtensionDtype`` columns (:issue:`35340`)
- Bug in :meth:`DataFrame.mode` and :meth:`Series.mode` not keeping consistent integer :class:`Index` for empty input (:issue:`33321`)
- Bug in :meth:`DataFrame.rank` with ``np.inf`` and mixture of ``np.nan`` and ``np.inf`` (:issue:`32593`)
- Bug in :meth:`DataFrame.rank` with ``axis=0`` and columns holding incomparable types raising ``IndexError``
Contributor

If there is no issue (that's fine, don't create one), add this PR number as the issue number.

Member Author

Thanks for committing that. Yep, no issue.

-

Conversion
^^^^^^^^^^
33 changes: 6 additions & 27 deletions pandas/_libs/algos.pyx
@@ -1112,7 +1112,6 @@ def rank_2d(
        int tiebreak = 0
        int64_t idx
        bint check_mask, condition, keep_na
        const int64_t[:] labels

    tiebreak = tiebreakers[ties_method]

@@ -1158,34 +1157,14 @@ def rank_2d(

    n, k = (<object>values).shape
    ranks = np.empty((n, k), dtype='f8')
    # For compatibility when calling rank_1d
    labels = np.zeros(k, dtype=np.int64)

    if rank_t is object:
        try:
            _as = values.argsort(1)
        except TypeError:
            values = in_arr
Member Author

This is the source of the bug: when axis=0, values is supposed to be transposed (as is done earlier in the function). Setting it equal to in_arr gives values the wrong shape, because in_arr was not transposed.
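
The shape mismatch described above can be illustrated with plain NumPy. This is only a sketch under assumptions: the input array is hypothetical, and the snippet mirrors just the transpose, allocate, and rebind steps, not the actual Cython code path or the exact line where the error surfaces.

```python
import numpy as np

# Hypothetical input: 3 rows, 2 columns; the first column mixes ints and a str.
in_arr = np.array([[1, 4], [2, 5], ["a", 6]], dtype=object)

axis = 0
# For axis=0 the function works on the transposed array ...
values = in_arr.T if axis == 0 else in_arr
n, k = values.shape
ranks = np.empty((n, k), dtype="f8")  # allocated for the transposed shape

# ... but the removed fallback rebinds values to the untransposed input.
values = in_arr

rows_to_fill = len(values)       # rows the fallback loop would iterate over
rows_available = ranks.shape[0]  # rows that were actually allocated
print(ranks.shape, values.shape, rows_to_fill, rows_available)
```

After the rebind, the loop bound and the row length both come from the untransposed array, so they no longer agree with the shape that ranks was allocated for.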

            for i in range(len(values)):
                ranks[i] = rank_1d(
                    in_arr[i],
                    labels=labels,
                    ties_method=ties_method,
                    ascending=ascending,
                    pct=pct
                )
            if axis == 0:
                return ranks.T
Member Author

This whole try/except is unnecessary, because the except clause will always raise a TypeError anyway: if values.argsort(1) raises a TypeError, at least one column in values must not be sortable, and that same column will cause another TypeError when falling back to rank_1d.
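
The argument above can be checked with a small NumPy sketch. It uses a plain 1-D argsort as a stand-in for rank_1d (an assumption; rank_1d itself is not called here), and the data and the helper raises_type_error are hypothetical.

```python
import numpy as np

# Hypothetical transposed values: each row is one DataFrame column.
values = np.array([[1, 2, "a"], [4, 5, 6]], dtype=object)


def raises_type_error(func):
    """Return True if calling func() raises TypeError, else False."""
    try:
        func()
    except TypeError:
        return True
    return False


# Sorting the whole 2-D array along axis 1 fails ...
whole = raises_type_error(lambda: values.argsort(1))

# ... and so does sorting the offending row on its own.
per_row = [raises_type_error(lambda row=row: row.argsort()) for row in values]
print(whole, per_row)
```

Only the row holding incomparable values fails by itself, which is the same row that makes the 2-D sort fail, so a row-by-row fallback cannot succeed where the 2-D sort did not.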

            else:
                return ranks
    if tiebreak == TIEBREAK_FIRST:
        # need to use a stable sort here
        _as = values.argsort(axis=1, kind='mergesort')
        if not ascending:
            tiebreak = TIEBREAK_FIRST_DESCENDING
    else:
        if tiebreak == TIEBREAK_FIRST:
            # need to use a stable sort here
            _as = values.argsort(axis=1, kind='mergesort')
            if not ascending:
                tiebreak = TIEBREAK_FIRST_DESCENDING
        else:
            _as = values.argsort(1)
        _as = values.argsort(1)

    if not ascending:
        _as = _as[:, ::-1]
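
The "need to use a stable sort here" comment in the hunk above can be illustrated in isolation. This is a minimal NumPy sketch with hypothetical data, not the rank_2d implementation: a stable argsort keeps tied values in their original order, which is what a "first" tie-breaking rule relies on.

```python
import numpy as np

# Hypothetical row with two pairs of tied values.
row = np.array([2, 1, 2, 1])

# kind="mergesort" requests a stable sort, so ties keep their original order.
order = row.argsort(kind="mergesort")

# Assign ranks 1..n in sorted order; ties are ranked by first appearance.
ranks = np.empty(len(row), dtype="f8")
ranks[order] = np.arange(1, len(row) + 1)
print(order.tolist(), ranks.tolist())
```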
12 changes: 12 additions & 0 deletions pandas/tests/frame/methods/test_rank.py
@@ -445,3 +445,15 @@ def test_rank_both_inf(self):
        expected = DataFrame({"a": [1.0, 2.0, 3.0]})
        result = df.rank()
        tm.assert_frame_equal(result, expected)

    @pytest.mark.parametrize(
        "data,expected",
        [
            ({"a": [1, 2, "a"], "b": [4, 5, 6]}, DataFrame({"b": [1.0, 2.0, 3.0]})),
            ({"a": [1, 2, "a"]}, DataFrame(index=range(3))),
        ],
    )
    def test_rank_mixed_axis_zero(self, data, expected):
        df = DataFrame(data)
        result = df.rank()
        tm.assert_frame_equal(result, expected)
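
The first parametrized case above expects only the sortable column to be ranked. The sketch below reproduces that outcome outside the test suite. It passes numeric_only=True explicitly, which is an assumption and not what the test does: the test relies on the default handling of non-numeric columns at the time of this PR, and that default may differ in other pandas versions.

```python
import pandas as pd

# Same data as the first parametrized case: column "a" mixes ints and a str.
df = pd.DataFrame({"a": [1, 2, "a"], "b": [4, 5, 6]})

# Assumption: restrict ranking to numeric columns explicitly instead of
# relying on the version-dependent default.
result = df.rank(numeric_only=True)
print(result)
```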