Skip to content

BUG: Series.rank modifies inplace with NaT #18576

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 6 commits into from
Dec 14, 2017

Conversation

GuessWhoSamFoo
Copy link
Contributor

Looks like _ensure_data converts NaT to -9223372036854775808 then that gets converted back to datetime64.

@GuessWhoSamFoo GuessWhoSamFoo changed the title Copy data.values when ranking BUG: Series.rank modifies inplace with NaT Nov 30, 2017
@codecov
Copy link

codecov bot commented Nov 30, 2017

Codecov Report

Merging #18576 into master will increase coverage by 0.01%.
The diff coverage is n/a.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #18576      +/-   ##
==========================================
+ Coverage   91.59%    91.6%   +0.01%     
==========================================
  Files         153      153              
  Lines       51364    51367       +3     
==========================================
+ Hits        47046    47054       +8     
+ Misses       4318     4313       -5
Flag Coverage Δ
#multiple 89.46% <ø> (+0.02%) ⬆️
#single 40.76% <ø> (-0.1%) ⬇️
Impacted Files Coverage Δ
pandas/io/gbq.py 25% <0%> (-58.34%) ⬇️
pandas/util/_test_decorators.py 93.33% <0%> (-0.79%) ⬇️
pandas/core/frame.py 97.81% <0%> (-0.1%) ⬇️
pandas/core/internals.py 94.42% <0%> (-0.02%) ⬇️
pandas/util/testing.py 82.52% <0%> (-0.01%) ⬇️
pandas/core/indexes/base.py 96.44% <0%> (ø) ⬆️
pandas/core/indexes/datetimes.py 95.71% <0%> (+0.01%) ⬆️
pandas/core/categorical.py 95.79% <0%> (+0.14%) ⬆️
pandas/plotting/_converter.py 66.52% <0%> (+1.73%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update d2fd22e...caa3fa9. Read the comment docs.

@@ -5620,7 +5620,7 @@ def rank(self, axis=0, method='average', numeric_only=None,
raise NotImplementedError(msg)

def ranker(data):
ranks = algos.rank(data.values, axis=axis, method=method,
ranks = algos.rank(data.values.copy(), axis=axis, method=method,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do this in pandas/core/algorithms.py instead (even better would be to change the actual cython code to avoid the need for a copy here)


def test_rank_modify_inplace(self):
# GH 18521
df = Series([Timestamp('2017-01-05 10:20:27.569000'), NaT])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

call this s

use result= and expected=

# GH 18521
df = Series([Timestamp('2017-01-05 10:20:27.569000'), NaT])
pre_rank_df = df.copy()

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add a test for an all-float DataFrame); in pandas/tests/frame/test_analytics

@jreback jreback added Bug Numeric Operations Arithmetic, Comparison, and Logical operations Reshaping Concat, Merge/Join, Stack/Unstack, Explode labels Nov 30, 2017
@GuessWhoSamFoo
Copy link
Contributor Author

Pushed to show still thinking about it. Currently looking at other methods that handle values.asi8

@GuessWhoSamFoo GuessWhoSamFoo force-pushed the rank_inplace_bug branch 2 times, most recently from a6ce383 to 6e24fe9 Compare December 11, 2017 06:15
- Improved error message when attempting to use a Python keyword as an identifier in a ``numexpr`` backed query (:issue:`18221`)
-
- Improved error message when attempting to use a Python keyword as an identifier in a numexpr query (:issue:`18221`)
- Fixed a bug where creating a Series from an array that contains both tz-naive and tz-aware values will result in a Series whose dtype is tz-aware instead of object (:issue:`16406`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks like some rebase issues

@@ -219,6 +219,10 @@ def _get_data_algo(values, func_map):
if is_categorical_dtype(values):
values = values._values_for_rank()

# Create copy in case NaT converts to asi8
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wouldn't do this here at all, this will affect virtually all algos. in pandas/_libs/algos_rank_helper.pxi.in, you can just do

if mask.any():
    values = values.copy()
    np.putmask(....)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, it makes a lot more sense now and exercises a gentle introduction to working with cython for me!

@jreback jreback added this to the 0.22.0 milestone Dec 13, 2017
Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm. doc-comments. ping when green.

@@ -336,4 +336,4 @@ Other
^^^^^

- Improved error message when attempting to use a Python keyword as an identifier in a ``numexpr`` backed query (:issue:`18221`)
-
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you move to the reshaping section (n bug fix)

@@ -84,6 +84,9 @@ def rank_1d_{{dtype}}(object in_arr, ties_method='average', ascending=True,
mask = np.isnan(values)
{{elif dtype == 'int64'}}
mask = values == iNaT
# create copy in case of iNaT
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add that we are mutating the values in-place here

@@ -2214,3 +2214,11 @@ def test_series_broadcasting(self):
df_nan.clip_lower(s, axis=0)
for op in ['lt', 'le', 'gt', 'ge', 'eq', 'ne']:
getattr(df, op)(s_nan, axis=0)

def test_series_nat_conversion(self):
# GH 18521
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add a 1-liner explaining this is testing non-mutataion of the input data

@@ -368,3 +368,12 @@ def test_rank_object_bug(self):
# smoke tests
Series([np.nan] * 32).astype(object).rank(ascending=True)
Series([np.nan] * 32).astype(object).rank(ascending=False)

def test_rank_modify_inplace(self):
# GH 18521
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same as above

@GuessWhoSamFoo
Copy link
Contributor Author

@jreback green 🏁

@jreback jreback merged commit 34ef9eb into pandas-dev:master Dec 14, 2017
@jreback
Copy link
Contributor

jreback commented Dec 14, 2017

thanks @GuessWhoSamFoo

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Numeric Operations Arithmetic, Comparison, and Logical operations Reshaping Concat, Merge/Join, Stack/Unstack, Explode
Projects
None yet
Development

Successfully merging this pull request may close these issues.

rank() makes unexpected inplace changes
2 participants