Skip to content

DataFrame sort_values and multiple "by" columns fails to order NaT correctly #16995

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 2 commits into from

Conversation

jdeschenes
Copy link
Contributor

@jdeschenes jdeschenes commented Jul 17, 2017

  • Removed unnecessary conversion to i8
  • Fixed failed test (test_frame_column_inplace_sort_exception)
  • Added check to ensure that the test is performing its intended goal(test_sort_nan)

@gfyoung gfyoung added Regression Functionality that used to work in a prior pandas version Reshaping Concat, Merge/Join, Stack/Unstack, Explode Datetime Datetime data dtype labels Jul 17, 2017
@@ -321,7 +342,11 @@ def test_sort_nat_values_in_int_column(self):
assert_frame_equal(df_sorted, df_reversed)

df_sorted = df.sort_values(["datetime", "float"], na_position="last")
assert_frame_equal(df_sorted, df_reversed)
assert_frame_equal(df_sorted, df)
Copy link
Member

@gfyoung gfyoung Jul 17, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why did this assertion statement change? (good to know for future reference)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As far as I understand, this was a bug in the test. Previously, the two assertions implied that na_position does not have any effect on the sort, which is incorrect.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, makes sense.

@codecov
Copy link

codecov bot commented Jul 17, 2017

Codecov Report

Merging #16995 into master will decrease coverage by 0.01%.
The diff coverage is n/a.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #16995      +/-   ##
==========================================
- Coverage   90.99%   90.97%   -0.02%     
==========================================
  Files         161      161              
  Lines       49290    49286       -4     
==========================================
- Hits        44851    44838      -13     
- Misses       4439     4448       +9
Flag Coverage Δ
#multiple 88.74% <ø> (-0.01%) ⬇️
#single 40.19% <ø> (-0.06%) ⬇️
Impacted Files Coverage Δ
pandas/core/frame.py 97.76% <ø> (-0.11%) ⬇️
pandas/io/gbq.py 25% <0%> (-58.34%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 0bd871f...eaebb7b. Read the comment docs.

@codecov
Copy link

codecov bot commented Jul 17, 2017

Codecov Report

Merging #16995 into master will decrease coverage by 0.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #16995      +/-   ##
==========================================
- Coverage   90.99%   90.97%   -0.02%     
==========================================
  Files         161      161              
  Lines       49294    49290       -4     
==========================================
- Hits        44857    44844      -13     
- Misses       4437     4446       +9
Flag Coverage Δ
#multiple 88.75% <100%> (-0.01%) ⬇️
#single 40.19% <0%> (-0.06%) ⬇️
Impacted Files Coverage Δ
pandas/core/frame.py 97.76% <100%> (-0.11%) ⬇️
pandas/io/gbq.py 25% <0%> (-58.34%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 34210ac...257e10a. Read the comment docs.

@gfyoung gfyoung added this to the 0.21.0 milestone Jul 17, 2017
@jreback jreback changed the title Fix for #16836 DataFrame sort_values and multiple "by" columns fails to order NaT correctly Jul 18, 2017
Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

couple of comments

@@ -89,6 +89,22 @@ def test_sort_values(self):
with tm.assert_raises_regex(ValueError, msg):
frame.sort_values(by=['A', 'B'], axis=0, ascending=[True] * 5)

# GH 16836
Copy link
Contributor

@jreback jreback Jul 18, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

make this a separately named tests

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test is actually superfluous. The fix to test_sort_nat_values_in_int_columns is sufficient to cover the issue. Do you still want to keep it?

@@ -269,6 +285,11 @@ def test_sort_datetimes(self):
df2 = df.sort_values(by=['B'])
assert_frame_equal(df1, df2)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

put a comment as to the issue number

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This particular change does not have anything to do with the fix. I added it to fix the test since I saw that its goal was broken

@@ -144,6 +144,7 @@ Bug Fixes
~~~~~~~~~

- Fixes regression in 0.20, :func:`Series.aggregate` and :func:`DataFrame.aggregate` allow dictionaries as return values again (:issue:`16741`)
- Fixes regression when sorting by multiple columns on a datetime array with NaT values (:issue:`16836`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

double back-ticks around NaT; say datetime64 dtype (its not an array, rather a Series)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

move to reshaping section

@@ -144,6 +144,7 @@ Bug Fixes
~~~~~~~~~

- Fixes regression in 0.20, :func:`Series.aggregate` and :func:`DataFrame.aggregate` allow dictionaries as return values again (:issue:`16741`)
- Fixes regression when sorting by multiple columns on a datetime array with NaT values (:issue:`16836`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

move to reshaping section

Jean-Mathieu Deschenes added 2 commits July 18, 2017 14:38
* Removed unnecessary conversion to i8
* Fixed failed test (`test_frame_column_inplace_sort_exception`)
* Added check to ensure that the test is performing its intended goal(`test_sort_nan`)
@jdeschenes jdeschenes force-pushed the datetime_sort_issues branch from 4dd7e3c to 257e10a Compare July 18, 2017 18:42
@@ -321,7 +326,27 @@ def test_sort_nat_values_in_int_column(self):
assert_frame_equal(df_sorted, df_reversed)

df_sorted = df.sort_values(["datetime", "float"], na_position="last")
assert_frame_equal(df_sorted, df_reversed)
assert_frame_equal(df_sorted, df)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok for a new issue, generally like a separate test. can you fix here. otherwise lgtm.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The latter test starting on line 337 is actually redundant with this test. Do you want me to remove it, or move it to a separate test?

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

some doc comments

@@ -222,6 +222,7 @@ Sparse
Reshaping
^^^^^^^^^
- Joining/Merging with a non unique ``PeriodIndex`` raised a TypeError (:issue:`16871`)
- Fixes regression when sorting by multiple columns on a ``datetime64`` dtype ``Series`` with ``NaT`` values (:issue:`16836`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

datetime64[ns] dtype

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

say bug in DataFrame.sort_values()

@@ -269,6 +269,11 @@ def test_sort_datetimes(self):
df2 = df.sort_values(by=['B'])
assert_frame_equal(df1, df2)

df1 = df.sort_values(by='B')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use

expected = 

result = 

d4 = [Timestamp(x) for x in ['2014-01-01', '2015-01-01',
'2017-01-01', '2016-01-01']]
expected = pd.DataFrame({'a': d3, 'b': d4}, index=[1, 3, 0, 2])
sorted_df = df.sort_values(by=['a', 'b'], )
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use result=

@jreback
Copy link
Contributor

jreback commented Aug 9, 2017

can you rebase and update

@jreback
Copy link
Contributor

jreback commented Sep 23, 2017

can you rebase / update

@jreback
Copy link
Contributor

jreback commented Sep 28, 2017

looks fine, can you rebase. ping on green.

@jreback
Copy link
Contributor

jreback commented Sep 29, 2017

thanks!

@jreback jreback closed this in ad7d051 Sep 29, 2017
alanbato pushed a commit to alanbato/pandas that referenced this pull request Nov 10, 2017
…aT correctly

closes pandas-dev#16836

Author: Jean-Mathieu Deschenes <[email protected]>

This patch had conflicts when merged, resolved by
Committer: Jeff Reback <[email protected]>

Closes pandas-dev#16995 from jdeschenes/datetime_sort_issues and squashes the following commits:

257e10a [Jean-Mathieu Deschenes] Changes requested by @jreback
c6d55e2 [Jean-Mathieu Deschenes] Fix for pandas-dev#16836
No-Stream pushed a commit to No-Stream/pandas that referenced this pull request Nov 28, 2017
…aT correctly

closes pandas-dev#16836

Author: Jean-Mathieu Deschenes <[email protected]>

This patch had conflicts when merged, resolved by
Committer: Jeff Reback <[email protected]>

Closes pandas-dev#16995 from jdeschenes/datetime_sort_issues and squashes the following commits:

257e10a [Jean-Mathieu Deschenes] Changes requested by @jreback
c6d55e2 [Jean-Mathieu Deschenes] Fix for pandas-dev#16836
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Datetime Datetime data dtype Regression Functionality that used to work in a prior pandas version Reshaping Concat, Merge/Join, Stack/Unstack, Explode
Projects
None yet
Development

Successfully merging this pull request may close these issues.

DataFrame sort_values and multiple "by" columns fails to order NaT correctly (since v0.19)
3 participants