-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
BUG: sorting with large float and multiple columns incorrect #14944
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
columns=("int", "float")) | ||
|
||
df2 = df.sort_values(["int", "float"]) | ||
assert is_sorted(df2.loc[:, "int"]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use assert_frame_equal and construct the expected result.
having the int equivalent of NaT in an int64 column caused wrong sorting because this special value was considered as "missing value".
06f2bd2
to
1afdbb8
Compare
having the int equivalent of NaT in an int64 column caused wrong sorting because this special value was considered as "missing value".
Current coverage is 84.65% (diff: 100%)@@ master #14944 diff @@
==========================================
Files 144 144
Lines 51021 51031 +10
Methods 0 0
Messages 0 0
Branches 0 0
==========================================
+ Hits 43188 43199 +11
+ Misses 7833 7832 -1
Partials 0 0
|
|
||
from pandas.util.testing import (assert_series_equal, | ||
assert_frame_equal, | ||
assertRaisesRegexp) | ||
assertRaisesRegexp, | ||
is_sorted) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you don't need to import the is_sorted
can you add a similar tests which uses datetimes and a NaT as well (just to verify that it doesn't break). further pls check both ascending and descending. |
def test_sort_nat_values_in_int_column(self): | ||
|
||
# GH 14922, sorting with large float and multiple columns incorrect | ||
int_values = (2, int(NaT)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use np.iinfo(np.int64).min instead (and NaT) is already an int
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thought NaT
is implicitly indicating what went wrong...
can you add a whatsnew note (0.20.0, bug fixes) |
|
||
# and now check if NaT is still considered as "na" for datetime64 | ||
# columns: | ||
df = DataFrame(dict(int=int_values, float=float_values), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is a not-used line
df = DataFrame(dict(datetime=[Timestamp("2016-01-01"), NaT], | ||
float=float_values), columns=["datetime", "float"]) | ||
|
||
# check if the dtype is datetime64[ns]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove this line
df = DataFrame(dict(datetime=[Timestamp("2016-01-01"), NaT], | ||
float=float_values), columns=["datetime", "float"]) | ||
|
||
assert df["datetime"].dtypes == np.dtype("datetime64[ns]"),\ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this line here is not needed (the assert of datetime)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done.
Ok now ? |
thanks! |
closes pandas-dev#14922 Having the `int` equivalent of `NaT` in an `int64` column caused wrong sorting because this special value was considered as "missing value". Author: Uwe <[email protected]> Closes pandas-dev#14944 from uweschmitt/fix-gh-14922 and squashes the following commits: c244438 [Uwe] further cleanup tests 4f28026 [Uwe] fixed typo in whatsnew/v0.20.0.txt 60cca5d [Uwe] add fix of GH14922 to release notes for 0.20.0 04dcbe8 [Uwe] further test cleanup 21e610c [Uwe] extended tests + minor cleanup 358a31e [Uwe] Merge branch 'fix-pandas-devgh-14922' of github.com:uweschmitt/pandas into fix-pandas-devgh-14922 03699c6 [Uwe] Fix GH 14922 1afdbb8 [Uwe] Fix GH 14922
Fixes #14922
Having the
int
equivalent ofNaT
in anint64
column caused wrong sorting because this special value was considered as "missing value".git diff upstream/master | flake8 --diff