BUG: Series.rank modifies inplace with NaT #18576

Merged
merged 6 commits into from
Dec 14, 2017
Changes from 4 commits
2 changes: 1 addition & 1 deletion doc/source/whatsnew/v0.22.0.txt
@@ -336,4 +336,4 @@ Other
^^^^^

- Improved error message when attempting to use a Python keyword as an identifier in a ``numexpr`` backed query (:issue:`18221`)
-
Contributor

can you move this to the reshaping section (in bug fixes)

- Bug in :func:`Series.rank` where ``Series`` containing ``NaT`` modifies the ``Series`` inplace (:issue:`18521`)
3 changes: 3 additions & 0 deletions pandas/_libs/algos_rank_helper.pxi.in
@@ -84,6 +84,9 @@ def rank_1d_{{dtype}}(object in_arr, ties_method='average', ascending=True,
mask = np.isnan(values)
{{elif dtype == 'int64'}}
mask = values == iNaT
# create copy in case of iNaT
Contributor

add that we are mutating the values in-place here

if mask.any():
values = values.copy()
{{endif}}

# double sort first by mask and then by values to ensure nan values are
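The copy-before-mutate pattern in this hunk can be sketched outside the Cython template. This is only an illustration, not the pandas implementation: `fill_missing_for_sort` is a hypothetical helper, the fill step is an assumed stand-in for whatever in-place write follows the mask, and `INAT` is defined here as the minimum `int64`, which is the sentinel used for `NaT`.

```python
import numpy as np

# Sentinel for NaT in int64-backed datetime data (minimum int64 value).
INAT = np.iinfo(np.int64).min


def fill_missing_for_sort(values, fill_value):
    """Hypothetical helper: overwrite iNaT entries without touching the caller's array."""
    mask = values == INAT
    if mask.any():
        # Copy first; the assignment below would otherwise write into
        # the buffer that the caller still holds a reference to.
        values = values.copy()
    values[mask] = fill_value
    return values, mask


# An arbitrary nanosecond timestamp followed by the NaT sentinel.
original = np.array([1483611627569000000, INAT], dtype=np.int64)
snapshot = original.copy()

filled, mask = fill_missing_for_sort(original, np.iinfo(np.int64).max)

# The input still holds the sentinel; only the returned array was changed.
assert np.array_equal(original, snapshot)
```

The copy is only made when the mask has at least one hit, so inputs without `NaT` avoid the extra allocation.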
8 changes: 8 additions & 0 deletions pandas/tests/frame/test_analytics.py
@@ -2214,3 +2214,11 @@ def test_series_broadcasting(self):
df_nan.clip_lower(s, axis=0)
for op in ['lt', 'le', 'gt', 'ge', 'eq', 'ne']:
getattr(df, op)(s_nan, axis=0)

def test_series_nat_conversion(self):
# GH 18521
Contributor

add a 1-liner explaining this is testing non-mutation of the input data

df = DataFrame(np.random.randn(10, 3), dtype='float64')
expected = df.copy()
df.rank()
result = df
tm.assert_frame_equal(result, expected)
11 changes: 10 additions & 1 deletion pandas/tests/series/test_rank.py
@@ -1,5 +1,5 @@
# -*- coding: utf-8 -*-
from pandas import compat
from pandas import compat, Timestamp

import pytest

@@ -368,3 +368,12 @@ def test_rank_object_bug(self):
# smoke tests
Series([np.nan] * 32).astype(object).rank(ascending=True)
Series([np.nan] * 32).astype(object).rank(ascending=False)

def test_rank_modify_inplace(self):
# GH 18521
Contributor

same as above

s = Series([Timestamp('2017-01-05 10:20:27.569000'), NaT])
expected = s.copy()

Contributor

add a test for an all-float DataFrame in pandas/tests/frame/test_analytics

s.rank()
result = s
assert_series_equal(result, expected)
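
For trying the check outside the test suite, a standalone version of the same non-mutation test might look like the sketch below. It assumes only the public `pandas` namespace (`pd.Timestamp`, `pd.NaT`, `pd.testing.assert_series_equal`) rather than the test module's own imports.

```python
import pandas as pd

# Same data as the test above: one timestamp and one missing value.
s = pd.Series([pd.Timestamp("2017-01-05 10:20:27.569000"), pd.NaT])
expected = s.copy()

# rank() should return a new Series and leave its input untouched.
ranks = s.rank()

# Raises AssertionError if rank() altered `s` in place.
pd.testing.assert_series_equal(s, expected)
```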