BUG: Series.rank modifies inplace with NaT #18576

GuessWhoSamFoo · 2017-11-30T13:20:00Z

closes rank() makes unexpected inplace changes #18521
tests added / passed
passes git diff upstream/master -u -- "*.py" | flake8 --diff
whatsnew entry

It looks like _ensure_data converts NaT to -9223372036854775808 (the int64 sentinel for NaT), and that value then gets converted back to datetime64.
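To make the suspected mechanism concrete, here is a minimal NumPy-only sketch. It is not the pandas code path, and the variable names are illustrative assumptions. It shows that NaT is stored as the smallest int64 value, and that writing through an int64 view of a datetime64 array changes the original array.

```python
import numpy as np

# NaT is represented by the minimum int64 value in datetime64 storage.
INT64_MIN = np.iinfo(np.int64).min  # -9223372036854775808

dates = np.array(["2017-01-05T10:20:27.569", "NaT"], dtype="datetime64[ns]")

# A view reinterprets the same memory as int64; no data is copied.
as_int = dates.view("i8")
nat_is_sentinel = bool(as_int[1] == INT64_MIN)

# Overwriting the sentinel through the view also changes `dates`,
# because both names refer to the same buffer.
mask = as_int == INT64_MIN
np.putmask(as_int, mask, np.iinfo(np.int64).max)
still_nat = bool(np.isnat(dates[1]))  # False: the NaT in `dates` is gone
```

If the ranking code fills missing values in place on such a view, the caller's data would change in the same way.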

codecov · 2017-11-30T14:42:11Z

Codecov Report

Merging #18576 into master will increase coverage by 0.01%.
The diff coverage is n/a.

@@            Coverage Diff             @@
##           master   #18576      +/-   ##
==========================================
+ Coverage   91.59%    91.6%   +0.01%     
==========================================
  Files         153      153              
  Lines       51364    51367       +3     
==========================================
+ Hits        47046    47054       +8     
+ Misses       4318     4313       -5

Flag	Coverage Δ
#multiple	`89.46% <ø> (+0.02%)`	⬆️
#single	`40.76% <ø> (-0.1%)`	⬇️

Impacted Files	Coverage Δ
pandas/io/gbq.py	`25% <0%> (-58.34%)`	⬇️
pandas/util/_test_decorators.py	`93.33% <0%> (-0.79%)`	⬇️
pandas/core/frame.py	`97.81% <0%> (-0.1%)`	⬇️
pandas/core/internals.py	`94.42% <0%> (-0.02%)`	⬇️
pandas/util/testing.py	`82.52% <0%> (-0.01%)`	⬇️
pandas/core/indexes/base.py	`96.44% <0%> (ø)`	⬆️
pandas/core/indexes/datetimes.py	`95.71% <0%> (+0.01%)`	⬆️
pandas/core/categorical.py	`95.79% <0%> (+0.14%)`	⬆️
pandas/plotting/_converter.py	`66.52% <0%> (+1.73%)`	⬆️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update d2fd22e...caa3fa9.

jreback · 2017-11-30T15:07:58Z

pandas/core/generic.py

@@ -5620,7 +5620,7 @@ def rank(self, axis=0, method='average', numeric_only=None,
            raise NotImplementedError(msg)

        def ranker(data):
-            ranks = algos.rank(data.values, axis=axis, method=method,
+            ranks = algos.rank(data.values.copy(), axis=axis, method=method,


do this in pandas/core/algorithms.py instead (even better would be to change the actual cython code to avoid the need for a copy here)

jreback · 2017-11-30T15:08:37Z

pandas/tests/series/test_rank.py

+
+    def test_rank_modify_inplace(self):
+        # GH 18521
+        df = Series([Timestamp('2017-01-05 10:20:27.569000'), NaT])


call this s

use result= and expected=

jreback · 2017-11-30T15:08:59Z

pandas/tests/series/test_rank.py

+        # GH 18521
+        df = Series([Timestamp('2017-01-05 10:20:27.569000'), NaT])
+        pre_rank_df = df.copy()
+


add a test for an all-float DataFrame in pandas/tests/frame/test_analytics
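A rough sketch of what such non-mutation checks could look like, using only public pandas APIs. The helper name `check_rank_does_not_mutate` is hypothetical and is not taken from the PR; the tests that were actually merged may be structured differently.

```python
import pandas as pd
import pandas.testing as tm


def check_rank_does_not_mutate(obj):
    """Rank `obj` and confirm the input still equals a snapshot taken before."""
    snapshot = obj.copy()
    result = obj.rank()
    if isinstance(obj, pd.DataFrame):
        tm.assert_frame_equal(obj, snapshot)
    else:
        tm.assert_series_equal(obj, snapshot)
    return result


# Series case from the issue: a timestamp followed by NaT.
s = pd.Series([pd.Timestamp("2017-01-05 10:20:27.569000"), pd.NaT])
series_ranks = check_rank_does_not_mutate(s)

# All-float DataFrame case requested in review.
df = pd.DataFrame({"a": [1.0, 2.0, 3.0], "b": [3.0, 2.0, 1.0]})
frame_ranks = check_rank_does_not_mutate(df)
```

Comparing against a copy taken before the call keeps the check independent of the ranking result itself.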

GuessWhoSamFoo · 2017-12-05T00:32:49Z

Pushed to show I'm still thinking about it. Currently looking at other methods that handle values.asi8.

jreback · 2017-12-11T11:10:14Z

doc/source/whatsnew/v0.22.0.txt

- Improved error message when attempting to use a Python keyword as an identifier in a ``numexpr`` backed query (:issue:`18221`)
-
+- Improved error message when attempting to use a Python keyword as an identifier in a numexpr query (:issue:`18221`)
+- Fixed a bug where creating a Series from an array that contains both tz-naive and tz-aware values will result in a Series whose dtype is tz-aware instead of object (:issue:`16406`)


looks like some rebase issues

jreback · 2017-12-11T11:15:12Z

pandas/core/algorithms.py

@@ -219,6 +219,10 @@ def _get_data_algo(values, func_map):
    if is_categorical_dtype(values):
        values = values._values_for_rank()

+    # Create copy in case NaT converts to asi8


I wouldn't do this here at all, this will affect virtually all algos. in pandas/_libs/algos_rank_helper.pxi.in, you can just do

    if mask.any():
        values = values.copy()
    np.putmask(....)
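The copy-only-when-needed idea suggested here can be sketched in plain NumPy. This is an illustration under stated assumptions, not the Cython template from the PR: the helper name `fill_nat_without_mutating` and the fill values are hypothetical.

```python
import numpy as np

INT64_MIN = np.iinfo(np.int64).min  # int64 sentinel that stands for NaT


def fill_nat_without_mutating(values, fill_value):
    """Hypothetical helper: replace the NaT sentinel, copying only if needed."""
    mask = values == INT64_MIN
    if mask.any():
        # Copy before the in-place write so the caller's array is untouched.
        values = values.copy()
    # With an all-False mask this writes nothing, so no copy is required.
    np.putmask(values, mask, fill_value)
    return values


original = np.array([1483611627569000000, INT64_MIN], dtype="i8")
filled = fill_nat_without_mutating(original, np.iinfo(np.int64).max)

clean = np.array([3, 1, 2], dtype="i8")
untouched = fill_nat_without_mutating(clean, 0)
```

Inputs without any sentinel values skip the copy entirely, so the extra allocation is only paid when a write would otherwise leak back to the caller.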

GuessWhoSamFoo · 2017-12-13

Thanks, it makes a lot more sense now, and it serves as a gentle introduction to working with Cython for me!

jreback

lgtm. doc-comments. ping when green.

jreback · 2017-12-13T14:38:46Z

doc/source/whatsnew/v0.22.0.txt

@@ -336,4 +336,4 @@ Other
 ^^^^^

 - Improved error message when attempting to use a Python keyword as an identifier in a ``numexpr`` backed query (:issue:`18221`)
-


can you move to the reshaping section (in bug fixes)

jreback · 2017-12-13T14:39:07Z

pandas/_libs/algos_rank_helper.pxi.in

@@ -84,6 +84,9 @@ def rank_1d_{{dtype}}(object in_arr, ties_method='average', ascending=True,
    mask = np.isnan(values)
    {{elif dtype == 'int64'}}
    mask = values == iNaT
+    # create copy in case of iNaT


add that we are mutating the values in-place here

jreback · 2017-12-13T14:39:26Z

pandas/tests/frame/test_analytics.py

@@ -2214,3 +2214,11 @@ def test_series_broadcasting(self):
            df_nan.clip_lower(s, axis=0)
            for op in ['lt', 'le', 'gt', 'ge', 'eq', 'ne']:
                getattr(df, op)(s_nan, axis=0)
+
+    def test_series_nat_conversion(self):
+        # GH 18521


add a 1-liner explaining this is testing non-mutation of the input data

jreback · 2017-12-13T14:39:32Z

pandas/tests/series/test_rank.py

@@ -368,3 +368,12 @@ def test_rank_object_bug(self):
        # smoke tests
        Series([np.nan] * 32).astype(object).rank(ascending=True)
        Series([np.nan] * 32).astype(object).rank(ascending=False)
+
+    def test_rank_modify_inplace(self):
+        # GH 18521


same as above

GuessWhoSamFoo · 2017-12-14T05:52:05Z

@jreback green 🏁

jreback · 2017-12-14T11:30:38Z

thanks @GuessWhoSamFoo

GuessWhoSamFoo changed the title ~~Copy data.values when ranking~~ BUG: Series.rank modifies inplace with NaT Nov 30, 2017

jreback requested changes Nov 30, 2017

jreback added the Bug, Numeric Operations (Arithmetic, Comparison, and Logical operations), and Reshaping (Concat, Merge/Join, Stack/Unstack, Explode) labels Nov 30, 2017

GuessWhoSamFoo force-pushed the rank_inplace_bug branch 2 times, most recently from a6ce383 to 6e24fe9 Compare December 11, 2017 06:15

jreback requested changes Dec 11, 2017

GuessWhoSamFoo added 4 commits December 12, 2017 19:09

Copy data.values when ranking

e82b128

WIP: Thinking of better way than a copy

d1cc880

Rebased; copy if datetime

9c07153

Fixed whatsnew; copy value in rank_1d

ca5d28c

GuessWhoSamFoo force-pushed the rank_inplace_bug branch from 6e24fe9 to ca5d28c Compare December 13, 2017 00:26

jreback added this to the 0.22.0 milestone Dec 13, 2017

jreback requested changes Dec 13, 2017

Added comments about mutating inplace per review

aad8fb8

doc updates

caa3fa9

jreback approved these changes Dec 14, 2017

jreback merged commit 34ef9eb into pandas-dev:master Dec 14, 2017
