-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
DataFrame sort_values and multiple "by" columns fails to order NaT correctly #16995
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@@ -321,7 +342,11 @@ def test_sort_nat_values_in_int_column(self): | |||
assert_frame_equal(df_sorted, df_reversed) | |||
|
|||
df_sorted = df.sort_values(["datetime", "float"], na_position="last") | |||
assert_frame_equal(df_sorted, df_reversed) | |||
assert_frame_equal(df_sorted, df) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why did this assertion statement change? (good to know for future reference)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As far as I understand, this was a bug in the test. Previously, the two assertions implied that na_position does not have any effect on the sort, which is incorrect.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, makes sense.
Codecov Report
@@ Coverage Diff @@
## master #16995 +/- ##
==========================================
- Coverage 90.99% 90.97% -0.02%
==========================================
Files 161 161
Lines 49290 49286 -4
==========================================
- Hits 44851 44838 -13
- Misses 4439 4448 +9
Continue to review full report at Codecov.
|
Codecov Report
@@ Coverage Diff @@
## master #16995 +/- ##
==========================================
- Coverage 90.99% 90.97% -0.02%
==========================================
Files 161 161
Lines 49294 49290 -4
==========================================
- Hits 44857 44844 -13
- Misses 4437 4446 +9
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
couple of comments
pandas/tests/frame/test_sorting.py
Outdated
@@ -89,6 +89,22 @@ def test_sort_values(self): | |||
with tm.assert_raises_regex(ValueError, msg): | |||
frame.sort_values(by=['A', 'B'], axis=0, ascending=[True] * 5) | |||
|
|||
# GH 16836 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
make this a separately named tests
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The test is actually superfluous. The fix to test_sort_nat_values_in_int_columns
is sufficient to cover the issue. Do you still want to keep it?
@@ -269,6 +285,11 @@ def test_sort_datetimes(self): | |||
df2 = df.sort_values(by=['B']) | |||
assert_frame_equal(df1, df2) | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
put a comment as to the issue number
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This particular change does not have anything to do with the fix. I added it to fix the test since I saw that its goal was broken
doc/source/whatsnew/v0.21.0.txt
Outdated
@@ -144,6 +144,7 @@ Bug Fixes | |||
~~~~~~~~~ | |||
|
|||
- Fixes regression in 0.20, :func:`Series.aggregate` and :func:`DataFrame.aggregate` allow dictionaries as return values again (:issue:`16741`) | |||
- Fixes regression when sorting by multiple columns on a datetime array with NaT values (:issue:`16836`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
double back-ticks around NaT
; say datetime64
dtype (its not an array, rather a Series)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
move to reshaping section
doc/source/whatsnew/v0.21.0.txt
Outdated
@@ -144,6 +144,7 @@ Bug Fixes | |||
~~~~~~~~~ | |||
|
|||
- Fixes regression in 0.20, :func:`Series.aggregate` and :func:`DataFrame.aggregate` allow dictionaries as return values again (:issue:`16741`) | |||
- Fixes regression when sorting by multiple columns on a datetime array with NaT values (:issue:`16836`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
move to reshaping section
* Removed unnecessary conversion to i8 * Fixed failed test (`test_frame_column_inplace_sort_exception`) * Added check to ensure that the test is performing its intended goal(`test_sort_nan`)
4dd7e3c
to
257e10a
Compare
@@ -321,7 +326,27 @@ def test_sort_nat_values_in_int_column(self): | |||
assert_frame_equal(df_sorted, df_reversed) | |||
|
|||
df_sorted = df.sort_values(["datetime", "float"], na_position="last") | |||
assert_frame_equal(df_sorted, df_reversed) | |||
assert_frame_equal(df_sorted, df) | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok for a new issue, generally like a separate test. can you fix here. otherwise lgtm.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The latter test starting on line 337 is actually redundant with this test. Do you want me to remove it, or move it to a separate test?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
some doc comments
@@ -222,6 +222,7 @@ Sparse | |||
Reshaping | |||
^^^^^^^^^ | |||
- Joining/Merging with a non unique ``PeriodIndex`` raised a TypeError (:issue:`16871`) | |||
- Fixes regression when sorting by multiple columns on a ``datetime64`` dtype ``Series`` with ``NaT`` values (:issue:`16836`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
datetime64[ns]
dtype
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
say bug in DataFrame.sort_values()
@@ -269,6 +269,11 @@ def test_sort_datetimes(self): | |||
df2 = df.sort_values(by=['B']) | |||
assert_frame_equal(df1, df2) | |||
|
|||
df1 = df.sort_values(by='B') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use
expected =
result =
d4 = [Timestamp(x) for x in ['2014-01-01', '2015-01-01', | ||
'2017-01-01', '2016-01-01']] | ||
expected = pd.DataFrame({'a': d3, 'b': d4}, index=[1, 3, 0, 2]) | ||
sorted_df = df.sort_values(by=['a', 'b'], ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use result=
can you rebase and update |
can you rebase / update |
looks fine, can you rebase. ping on green. |
thanks! |
…aT correctly closes pandas-dev#16836 Author: Jean-Mathieu Deschenes <[email protected]> This patch had conflicts when merged, resolved by Committer: Jeff Reback <[email protected]> Closes pandas-dev#16995 from jdeschenes/datetime_sort_issues and squashes the following commits: 257e10a [Jean-Mathieu Deschenes] Changes requested by @jreback c6d55e2 [Jean-Mathieu Deschenes] Fix for pandas-dev#16836
…aT correctly closes pandas-dev#16836 Author: Jean-Mathieu Deschenes <[email protected]> This patch had conflicts when merged, resolved by Committer: Jeff Reback <[email protected]> Closes pandas-dev#16995 from jdeschenes/datetime_sort_issues and squashes the following commits: 257e10a [Jean-Mathieu Deschenes] Changes requested by @jreback c6d55e2 [Jean-Mathieu Deschenes] Fix for pandas-dev#16836
test_frame_column_inplace_sort_exception
)test_sort_nan
)git diff upstream/master -u -- "*.py" | flake8 --diff