DataFrame sort_values and multiple "by" columns fails to order NaT correctly #16995
@@ -269,6 +269,11 @@ def test_sort_datetimes(self):
         df2 = df.sort_values(by=['B'])
         assert_frame_equal(df1, df2)

Review comment: put a comment as to the issue number
Reply: This particular change does not have anything to do with the fix. I added it to fix the test since I saw that its goal was broken.
+        df1 = df.sort_values(by='B')
Review comment: use ...
+
+        df2 = df.sort_values(by=['C', 'B'])
+        assert_frame_equal(df1, df2)
+
     def test_frame_column_inplace_sort_exception(self):
         s = self.frame['A']
         with tm.assert_raises_regex(ValueError, "This Series is a view"):
@@ -321,7 +326,27 @@ def test_sort_nat_values_in_int_column(self):
         assert_frame_equal(df_sorted, df_reversed)

         df_sorted = df.sort_values(["datetime", "float"], na_position="last")
-        assert_frame_equal(df_sorted, df_reversed)
+        assert_frame_equal(df_sorted, df)
Review comment: Why did this assertion statement change? (good to know for future reference)
Reply: As far as I understand, this was a bug in the test. Previously, the two assertions implied that na_position does not have any effect on the sort, which is incorrect.
Reply: Okay, makes sense.
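The point made in the reply above can be checked with a small standalone sketch. The frame below is hypothetical (it is not the fixture used in the test, and it uses a real datetime column rather than the integer column the test name refers to); it only illustrates that `na_position` changes where the missing row lands when sorting by several columns.

```python
import pandas as pd

# Hypothetical three-row frame; the middle "datetime" entry is missing (NaT).
df = pd.DataFrame({
    "datetime": pd.to_datetime(["2016-01-01", None, "2015-01-01"]),
    "float": [1.0, 2.0, 3.0],
})

# Same sort keys, different placement of the missing value.
first = df.sort_values(["datetime", "float"], na_position="first")
last = df.sort_values(["datetime", "float"], na_position="last")

print(first.index.tolist())  # NaT row (label 1) comes first
print(last.index.tolist())   # NaT row (label 1) comes last
```

On a pandas release that includes this fix, the two results differ, which is why asserting the same expected frame for both `na_position` values could not have been right.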
+
Review comment: ok for a new issue, generally like a separate test. can you fix here. otherwise lgtm.
Reply: The latter test starting on line 337 is actually redundant with this test. Do you want me to remove it, or move it to a separate test?
+        # Ascending should not affect the results.
+        df_sorted = df.sort_values(["datetime", "float"], ascending=False)
+        assert_frame_equal(df_sorted, df)
+
+        # GH 16836
+
+        d1 = [Timestamp(x) for x in ['2016-01-01', '2015-01-01',
+                                     np.nan, '2016-01-01']]
+        d2 = [Timestamp(x) for x in ['2017-01-01', '2014-01-01',
+                                     '2016-01-01', '2015-01-01']]
+        df = pd.DataFrame({'a': d1, 'b': d2}, index=[0, 1, 2, 3])
+
+        d3 = [Timestamp(x) for x in ['2015-01-01', '2016-01-01',
+                                     '2016-01-01', np.nan]]
+        d4 = [Timestamp(x) for x in ['2014-01-01', '2015-01-01',
+                                     '2017-01-01', '2016-01-01']]
+        expected = pd.DataFrame({'a': d3, 'b': d4}, index=[1, 3, 0, 2])
+        sorted_df = df.sort_values(by=['a', 'b'], )
Review comment: use ...
+        tm.assert_frame_equal(sorted_df, expected)


 class TestDataFrameSortIndexKinds(TestData):
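For readers who want to try the GH 16836 scenario outside the test suite, here is a minimal standalone sketch using the same values as the new test. It builds the columns from `pd.Timestamp` and `pd.NaT` directly instead of `Timestamp(np.nan)`, which is a choice made for this sketch, and it relies on the default `na_position="last"`.

```python
import pandas as pd

# Same data as the new test: column "a" has one missing value (NaT).
a = [pd.Timestamp("2016-01-01"), pd.Timestamp("2015-01-01"),
     pd.NaT, pd.Timestamp("2016-01-01")]
b = [pd.Timestamp("2017-01-01"), pd.Timestamp("2014-01-01"),
     pd.Timestamp("2016-01-01"), pd.Timestamp("2015-01-01")]
df = pd.DataFrame({"a": a, "b": b}, index=[0, 1, 2, 3])

# Sort by two columns; ties in "a" (rows 0 and 3) are broken by "b".
sorted_df = df.sort_values(by=["a", "b"])
order = sorted_df.index.tolist()

print(order)
print(sorted_df)
```

With the fix in place, the row whose `a` is NaT should end up last and the two `2016-01-01` rows should be ordered by `b`, matching the `index=[1, 3, 0, 2]` expected in the test.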
Review comment: datetime64[ns] dtype
Review comment: say bug in DataFrame.sort_values()