-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
BUG: Series.rank modifies inplace with NaT #18576
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Codecov Report
@@ Coverage Diff @@
## master #18576 +/- ##
==========================================
+ Coverage 91.59% 91.6% +0.01%
==========================================
Files 153 153
Lines 51364 51367 +3
==========================================
+ Hits 47046 47054 +8
+ Misses 4318 4313 -5
Continue to review full report at Codecov.
|
pandas/core/generic.py
Outdated
@@ -5620,7 +5620,7 @@ def rank(self, axis=0, method='average', numeric_only=None, | |||
raise NotImplementedError(msg) | |||
|
|||
def ranker(data): | |||
ranks = algos.rank(data.values, axis=axis, method=method, | |||
ranks = algos.rank(data.values.copy(), axis=axis, method=method, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do this in pandas/core/algorithms.py
instead (even better would be to change the actual cython code to avoid the need for a copy here)
pandas/tests/series/test_rank.py
Outdated
|
||
def test_rank_modify_inplace(self): | ||
# GH 18521 | ||
df = Series([Timestamp('2017-01-05 10:20:27.569000'), NaT]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
call this s
use result= and expected=
pandas/tests/series/test_rank.py
Outdated
# GH 18521 | ||
df = Series([Timestamp('2017-01-05 10:20:27.569000'), NaT]) | ||
pre_rank_df = df.copy() | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add a test for an all-float DataFrame); in pandas/tests/frame/test_analytics
Pushed to show still thinking about it. Currently looking at other methods that handle |
a6ce383
to
6e24fe9
Compare
doc/source/whatsnew/v0.22.0.txt
Outdated
- Improved error message when attempting to use a Python keyword as an identifier in a ``numexpr`` backed query (:issue:`18221`) | ||
- | ||
- Improved error message when attempting to use a Python keyword as an identifier in a numexpr query (:issue:`18221`) | ||
- Fixed a bug where creating a Series from an array that contains both tz-naive and tz-aware values will result in a Series whose dtype is tz-aware instead of object (:issue:`16406`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks like some rebase issues
pandas/core/algorithms.py
Outdated
@@ -219,6 +219,10 @@ def _get_data_algo(values, func_map): | |||
if is_categorical_dtype(values): | |||
values = values._values_for_rank() | |||
|
|||
# Create copy in case NaT converts to asi8 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wouldn't do this here at all, this will affect virtually all algos. in pandas/_libs/algos_rank_helper.pxi.in
, you can just do
if mask.any():
values = values.copy()
np.putmask(....)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, it makes a lot more sense now and exercises a gentle introduction to working with cython for me!
6e24fe9
to
ca5d28c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm. doc-comments. ping when green.
@@ -336,4 +336,4 @@ Other | |||
^^^^^ | |||
|
|||
- Improved error message when attempting to use a Python keyword as an identifier in a ``numexpr`` backed query (:issue:`18221`) | |||
- |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you move to the reshaping section (n bug fix)
@@ -84,6 +84,9 @@ def rank_1d_{{dtype}}(object in_arr, ties_method='average', ascending=True, | |||
mask = np.isnan(values) | |||
{{elif dtype == 'int64'}} | |||
mask = values == iNaT | |||
# create copy in case of iNaT |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add that we are mutating the values in-place here
@@ -2214,3 +2214,11 @@ def test_series_broadcasting(self): | |||
df_nan.clip_lower(s, axis=0) | |||
for op in ['lt', 'le', 'gt', 'ge', 'eq', 'ne']: | |||
getattr(df, op)(s_nan, axis=0) | |||
|
|||
def test_series_nat_conversion(self): | |||
# GH 18521 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add a 1-liner explaining this is testing non-mutataion of the input data
@@ -368,3 +368,12 @@ def test_rank_object_bug(self): | |||
# smoke tests | |||
Series([np.nan] * 32).astype(object).rank(ascending=True) | |||
Series([np.nan] * 32).astype(object).rank(ascending=False) | |||
|
|||
def test_rank_modify_inplace(self): | |||
# GH 18521 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same as above
@jreback green 🏁 |
thanks @GuessWhoSamFoo |
git diff upstream/master -u -- "*.py" | flake8 --diff
Looks like
_ensure_data
converts NaT to -9223372036854775808 then that gets converted back to datetime64.