BUG: sorting with large float and multiple columns incorrect #14944
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -6,7 +6,7 @@ | |
|
||
from pandas.compat import lrange | ||
from pandas import (DataFrame, Series, MultiIndex, Timestamp, | ||
date_range) | ||
date_range, NaT) | ||
|
||
from pandas.util.testing import (assert_series_equal, | ||
assert_frame_equal, | ||
|
@@ -491,3 +491,56 @@ def test_frame_column_inplace_sort_exception(self): | |
|
||
cp = s.copy() | ||
cp.sort_values() # it works! | ||
|
||
def test_sort_nat_values_in_int_column(self): | ||
|
||
# GH 14922: "sorting with large float and multiple columns incorrect" | ||
|
||
# cause was that the int64 value NaT was considered as "na", which is | ||
# only correct for datetime64 columns. | ||
|
||
int_values = (2, int(NaT)) | ||
float_values = (2.0, -1.797693e308) | ||
|
||
df = DataFrame(dict(int=int_values, float=float_values), | ||
columns=["int", "float"]) | ||
|
||
df_reversed = DataFrame(dict(int=int_values[::-1], | ||
float=float_values[::-1]), | ||
columns=["int", "float"], | ||
index=[1, 0]) | ||
|
||
# NaT is not a "na" for int64 columns, so na_position must not | ||
# influence the result: | ||
df_sorted = df.sort_values(["int", "float"], na_position="last") | ||
assert_frame_equal(df_sorted, df_reversed) | ||
|
||
df_sorted = df.sort_values(["int", "float"], na_position="first") | ||
assert_frame_equal(df_sorted, df_reversed) | ||
|
||
# reverse sorting order | ||
df_sorted = df.sort_values(["int", "float"], ascending=False) | ||
assert_frame_equal(df_sorted, df) | ||
|
||
# and now check if NaT is still considered as "na" for datetime64 | ||
# columns: | ||
df = DataFrame(dict(int=int_values, float=float_values), | ||
Review comment: this line is unused. |
||
columns=["int", "float"]) | ||
|
||
df = DataFrame(dict(datetime=[Timestamp("2016-01-01"), NaT], | ||
float=float_values), columns=["datetime", "float"]) | ||
|
||
# check if the dtype is datetime64[ns]: | ||
Review comment: remove this line. |
||
assert df["datetime"].dtypes == np.dtype("datetime64[ns]"),\ | ||
Review comment: this line is not needed (the datetime assert). Reply: done. |
||
"this test function is not reliable anymore" | ||
|
||
df_reversed = DataFrame(dict(datetime=[NaT, Timestamp("2016-01-01")], | ||
float=float_values[::-1]), | ||
columns=["datetime", "float"], | ||
index=[1, 0]) | ||
|
||
df_sorted = df.sort_values(["datetime", "float"], na_position="first") | ||
assert_frame_equal(df_sorted, df_reversed) | ||
|
||
df_sorted = df.sort_values(["datetime", "float"], na_position="last") | ||
assert_frame_equal(df_sorted, df) |
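
For illustration only (not part of the PR): a minimal sketch of the datetime64 half of the test, assuming a pandas version that contains this fix. It uses the public `pd.` namespace rather than the test module's imports, and the variable names `first` and `last` are made up for this example. With a real datetime64 column, `NaT` is a missing value, so `na_position` decides where that row lands.

```python
import pandas as pd

# One valid timestamp and one NaT; the float column mirrors the PR's test data.
df = pd.DataFrame(
    {
        "datetime": [pd.Timestamp("2016-01-01"), pd.NaT],
        "float": [2.0, -1.797693e308],
    },
    columns=["datetime", "float"],
)

# NaT counts as missing in a datetime64 column, so na_position applies.
first = df.sort_values(["datetime", "float"], na_position="first")
last = df.sort_values(["datetime", "float"], na_position="last")

print(list(first.index))  # row with NaT comes first
print(list(last.index))   # row with NaT comes last
```

This is why the `na_position="last"` case should compare against the original frame order rather than the reversed one.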
use np.iinfo(np.int64).min instead (and NaT is already an int)
Thought NaT is implicitly indicating what went wrong...
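
To make the two positions in this thread concrete, here is a hedged sketch (not taken from the PR) showing that `np.iinfo(np.int64).min` is the same integer that serves as the `NaT` sentinel, and that in a plain int64 column it sorts as an ordinary smallest value regardless of `na_position`. It assumes a pandas version that includes this fix; `INT64_MIN`, `first`, and `last` are names invented for the example.

```python
import numpy as np
import pandas as pd

# The reviewer's suggested spelling of the value.
INT64_MIN = np.iinfo(np.int64).min

# The same integer is the NaT sentinel in NumPy and in pandas.
numpy_nat_as_int = int(np.datetime64("NaT").astype(np.int64))
pandas_nat_as_int = pd.NaT.value

df = pd.DataFrame(
    {"int": [2, INT64_MIN], "float": [2.0, -1.797693e308]},
    columns=["int", "float"],
)

# In an int64 column the value is just the smallest integer, not a missing
# value, so na_position should make no difference to the result.
first = df.sort_values(["int", "float"], na_position="first")
last = df.sort_values(["int", "float"], na_position="last")

print(list(first.index), list(last.index))
```

Either spelling yields the same test data; the choice is between naming the numeric limit explicitly and hinting at the cause of the original bug.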