DataFrame sort_values and multiple "by" columns fails to order NaT correctly #16995

jdeschenes · 2017-07-17T15:40:40Z

* Removed unnecessary conversion to i8
* Fixed failed test (`test_frame_column_inplace_sort_exception`)
* Added check to ensure that the test is performing its intended goal (`test_sort_nan`)

* closes #16836: DataFrame sort_values and multiple "by" columns fails to order NaT correctly (since v0.19)
* tests added / passed
* passes `git diff upstream/master -u -- "*.py" | flake8 --diff`
* whatsnew entry
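For context, here is a minimal illustration of why an i8 conversion can break `NaT` ordering. It assumes (this is not shown in the PR text) that the removed step reinterpreted `datetime64[ns]` sort keys as int64 via `.view("i8")`. Once viewed as integers, `NaT` is just the smallest int64 and is no longer recognised as missing, so it sorts first regardless of `na_position`.

```python
import numpy as np
import pandas as pd

# One real timestamp and one missing value (NaT), stored as datetime64[ns].
values = np.array(["2016-01-01", "NaT"], dtype="datetime64[ns]")

# Assumed shape of the removed conversion: reinterpret the datetime64[ns]
# buffer as plain int64 ("i8"). NaT becomes the smallest int64 value.
as_i8 = values.view("i8")
assert as_i8[1] == np.iinfo(np.int64).min

# As integers, NaT is an ordinary (very small) number, so it sorts first
# even though na_position="last" was requested.
i8_order = pd.Series(as_i8).sort_values(na_position="last").index.tolist()

# Sorting the datetime64 values directly keeps NaT recognisable as missing,
# so na_position="last" is honoured.
dt_order = pd.Series(values).sort_values(na_position="last").index.tolist()

print(i8_order)  # [1, 0]
print(dt_order)  # [0, 1]
```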

gfyoung · 2017-07-17T16:30:41Z

pandas/tests/frame/test_sorting.py

@@ -321,7 +342,11 @@ def test_sort_nat_values_in_int_column(self):
        assert_frame_equal(df_sorted, df_reversed)

        df_sorted = df.sort_values(["datetime", "float"], na_position="last")
-        assert_frame_equal(df_sorted, df_reversed)
+        assert_frame_equal(df_sorted, df)


Why did this assertion statement change? (good to know for future reference)

As far as I understand, this was a bug in the test. Previously, the two assertions implied that `na_position` has no effect on the sort, which is incorrect for a `datetime64` column containing `NaT`.

Okay, makes sense.
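The corrected expectation can be checked directly. The following is a minimal sketch that assumes a current pandas install. The frame mirrors the `["datetime", "float"]` columns from the hunk above, but the values are illustrative. With `NaT` in the leading sort key, `na_position` decides whether that row comes first or last, so the two calls must not give the same order.

```python
import pandas as pd

# Illustrative data: the NaT row also carries the smaller float, so ordering
# by the float alone would put it first.
df = pd.DataFrame(
    {
        "datetime": [pd.Timestamp("2016-01-01"), pd.NaT],
        "float": [2.0, -1.0],
    }
)

# Multi-column sort: NaT in the first key must follow na_position.
last = df.sort_values(["datetime", "float"], na_position="last")
first = df.sort_values(["datetime", "float"], na_position="first")

print(last.index.tolist())   # [0, 1]  (unchanged: NaT row stays at the end)
print(first.index.tolist())  # [1, 0]  (reversed: NaT row moves to the front)
```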

codecov · 2017-07-17T17:59:58Z

Codecov Report

Merging #16995 into master will decrease coverage by 0.01%.
The diff coverage is n/a.

@@            Coverage Diff             @@
##           master   #16995      +/-   ##
==========================================
- Coverage   90.99%   90.97%   -0.02%     
==========================================
  Files         161      161              
  Lines       49290    49286       -4     
==========================================
- Hits        44851    44838      -13     
- Misses       4439     4448       +9

Flag	Coverage Δ
#multiple	`88.74% <ø> (-0.01%)`	⬇️
#single	`40.19% <ø> (-0.06%)`	⬇️

Impacted Files	Coverage Δ
pandas/core/frame.py	`97.76% <ø> (-0.11%)`	⬇️
pandas/io/gbq.py	`25% <0%> (-58.34%)`	⬇️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 0bd871f...eaebb7b.

codecov · 2017-07-17T18:00:13Z

Codecov Report

Merging #16995 into master will decrease coverage by 0.01%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master   #16995      +/-   ##
==========================================
- Coverage   90.99%   90.97%   -0.02%     
==========================================
  Files         161      161              
  Lines       49294    49290       -4     
==========================================
- Hits        44857    44844      -13     
- Misses       4437     4446       +9

Flag	Coverage Δ
#multiple	`88.75% <100%> (-0.01%)`	⬇️
#single	`40.19% <0%> (-0.06%)`	⬇️

Impacted Files	Coverage Δ
pandas/core/frame.py	`97.76% <100%> (-0.11%)`	⬇️
pandas/io/gbq.py	`25% <0%> (-58.34%)`	⬇️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 34210ac...257e10a.

jreback

couple of comments

jreback · 2017-07-18T01:23:33Z

pandas/tests/frame/test_sorting.py

@@ -89,6 +89,22 @@ def test_sort_values(self):
        with tm.assert_raises_regex(ValueError, msg):
            frame.sort_values(by=['A', 'B'], axis=0, ascending=[True] * 5)

+        # GH 16836


make this a separately named test

The test is actually superfluous: the fix to `test_sort_nat_values_in_int_column` is sufficient to cover the issue. Do you still want to keep it?

jreback · 2017-07-18T01:23:57Z

pandas/tests/frame/test_sorting.py

@@ -269,6 +285,11 @@ def test_sort_datetimes(self):
        df2 = df.sort_values(by=['B'])
        assert_frame_equal(df1, df2)



put a comment as to the issue number

This particular change has nothing to do with the fix. I added it because I noticed the test was no longer achieving its intended goal.

jreback · 2017-07-18T01:24:39Z

doc/source/whatsnew/v0.21.0.txt

@@ -144,6 +144,7 @@ Bug Fixes
 ~~~~~~~~~

 - Fixes regression in 0.20, :func:`Series.aggregate` and :func:`DataFrame.aggregate` allow dictionaries as return values again (:issue:`16741`)
+- Fixes regression when sorting by multiple columns on a datetime array with NaT values (:issue:`16836`)


double back-ticks around NaT; say datetime64 dtype (it's not an array, rather a Series)

move to reshaping section

jreback · 2017-07-18T01:25:00Z

doc/source/whatsnew/v0.21.0.txt

@@ -144,6 +144,7 @@ Bug Fixes
 ~~~~~~~~~

 - Fixes regression in 0.20, :func:`Series.aggregate` and :func:`DataFrame.aggregate` allow dictionaries as return values again (:issue:`16741`)
+- Fixes regression when sorting by multiple columns on a datetime array with NaT values (:issue:`16836`)


move to reshaping section

jreback · 2017-07-18T23:37:27Z

pandas/tests/frame/test_sorting.py

@@ -321,7 +326,27 @@ def test_sort_nat_values_in_int_column(self):
        assert_frame_equal(df_sorted, df_reversed)

        df_sorted = df.sort_values(["datetime", "float"], na_position="last")
-        assert_frame_equal(df_sorted, df_reversed)
+        assert_frame_equal(df_sorted, df)
+


OK, for a new issue we generally like a separate test. Can you fix that here? Otherwise LGTM.

The latter test starting on line 337 is actually redundant with this test. Do you want me to remove it, or move it to a separate test?

jreback

some doc comments

jreback · 2017-07-21T10:40:18Z

doc/source/whatsnew/v0.21.0.txt

@@ -222,6 +222,7 @@ Sparse
 Reshaping
 ^^^^^^^^^
 - Joining/Merging with a non unique ``PeriodIndex`` raised a TypeError (:issue:`16871`)
+- Fixes regression when sorting by multiple columns on a ``datetime64`` dtype ``Series`` with ``NaT`` values (:issue:`16836`)


datetime64[ns] dtype

say bug in DataFrame.sort_values()

jreback · 2017-07-21T10:41:34Z

pandas/tests/frame/test_sorting.py

@@ -269,6 +269,11 @@ def test_sort_datetimes(self):
        df2 = df.sort_values(by=['B'])
        assert_frame_equal(df1, df2)

+        df1 = df.sort_values(by='B')


use `expected =` / `result =` instead of `df1` / `df2`
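Here is a minimal sketch of the kind of check discussed here, written with the suggested `expected` / `result` names. The frame is hypothetical (not the test's actual fixture). Column `C` is constant, so adding it as a leading key should not change the order, while the multi-key call typically goes through pandas' multi-column sorting path rather than the single-key one. The dates are unique so tie-breaking cannot affect the comparison.

```python
import pandas as pd
from pandas.testing import assert_frame_equal

# Hypothetical frame: unique datetimes in B and a constant float column C.
df = pd.DataFrame(
    {
        "B": [pd.Timestamp(d) for d in
              ["2004-02-11", "2005-09-20", "2004-01-21", "2010-10-04"]],
        "C": 2.0,
    }
)

# Sorting by B alone is the reference ordering.
expected = df.sort_values(by="B")

# C is constant, so prepending it as a key must leave the order unchanged.
result = df.sort_values(by=["C", "B"])

assert_frame_equal(result, expected)
print(result.index.tolist())  # [2, 0, 1, 3]
```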

jreback · 2017-07-21T10:41:55Z

pandas/tests/frame/test_sorting.py

+        d4 = [Timestamp(x) for x in ['2014-01-01', '2015-01-01',
+                                     '2017-01-01', '2016-01-01']]
+        expected = pd.DataFrame({'a': d3, 'b': d4}, index=[1, 3, 0, 2])
+        sorted_df = df.sort_values(by=['a', 'b'], )


use result=

jreback · 2017-08-09T00:00:27Z

can you rebase and update

jreback · 2017-09-23T17:04:26Z

can you rebase / update

jreback · 2017-09-28T14:18:46Z

looks fine, can you rebase. ping on green.

jreback · 2017-09-29T10:43:18Z

thanks!

@jreback

…aT correctly

closes pandas-dev#16836

Author: Jean-Mathieu Deschenes

This patch had conflicts when merged, resolved by
Committer: Jeff Reback

Closes pandas-dev#16995 from jdeschenes/datetime_sort_issues and squashes the following commits:

257e10a [Jean-Mathieu Deschenes] Changes requested by @jreback
c6d55e2 [Jean-Mathieu Deschenes] Fix for pandas-dev#16836

gfyoung added the Regression, Reshaping, and Datetime labels Jul 17, 2017

gfyoung reviewed Jul 17, 2017

gfyoung approved these changes Jul 17, 2017

gfyoung added this to the 0.21.0 milestone Jul 17, 2017

jreback changed the title ~~Fix for #16836~~ DataFrame sort_values and multiple "by" columns fails to order NaT correctly Jul 18, 2017

jreback requested changes Jul 18, 2017

Jean-Mathieu Deschenes added 2 commits July 18, 2017 14:38

Fix for pandas-dev#16836 (c6d55e2)

* Removed unnecessary conversion to i8
* Fixed failed test (`test_frame_column_inplace_sort_exception`)
* Added check to ensure that the test is performing its intended goal (`test_sort_nan`)

Changes requested by @jreback (257e10a)

jdeschenes force-pushed the datetime_sort_issues branch from 4dd7e3c to 257e10a July 18, 2017 18:42

jreback reviewed Jul 18, 2017

jreback requested changes Jul 21, 2017

mficek mentioned this pull request Aug 1, 2017

BUG: NaT in Timestamp ignored by sort_values with na_position='last' #17138

Closed

jreback approved these changes Sep 28, 2017

jreback closed this in ad7d051 Sep 29, 2017
