NumPy 1.18 behavior for sorting NaT #29884

TomAugspurger · 2019-11-27T14:28:59Z

Followup to #29877. cc @jbrockmendel if you could fill out the details.

Adopt NaT at the end for Index? (we already do for Series)
Report timedelta NaT-sorting issue to NumPy?

jorisvandenbossche · 2019-12-10T09:47:09Z

I reported the timedelta sorting issue to NumPy (see numpy/numpy#15063). I think the plan is to fix this as well for NumPy 1.18.

I would propose we follow this change for Index.

jbrockmendel · 2019-12-26T02:10:58Z

xref #30460.

TomAugspurger · 2019-12-30T17:37:02Z

So the remaining item here is sorting NaT at the end for DatetimeIndex and TimedeltaIndex? So this would change from

In [4]: pd.DatetimeIndex(['2000', None, '2001']).sort_values()
Out[4]: DatetimeIndex(['NaT', '2000-01-01', '2001-01-01'], dtype='datetime64[ns]', freq=None)

to

DatetimeIndex(['2000-01-01', '2001-01-01', 'NaT'], dtype='datetime64[ns]', freq=None)

Is this a blocker for 1.0? If so, do we need a deprecation cycle?

jbrockmendel · 2020-01-02T22:48:23Z

Need to double check if this affects searchsorted. IIRC we use m8 values in some places and i8 in others.


jorisvandenbossche · 2020-01-06T09:03:27Z

Since other index types sort NaNs last, I would lean toward considering this a bug and just changing it (that's also what NumPy did, IIUC). For example:

In [22]: pd.Index([1, np.nan, 2]).sort_values() 
Out[22]: Float64Index([1.0, 2.0, nan], dtype='float64')

If we want to do a deprecation cycle, we probably need to add a na_position keyword to Index.sort_values? (Series.sort_values has one, but Index.sort_values does not.)

jbrockmendel · 2020-01-08T20:53:15Z

> If we want to do a deprecation cycle, we probably need to add a na_position keyword to Index.sort_values?

if we're calling it a bugfix we can probably do without a deprecation cycle

Do we have consensus that long-term we want NaT to be sorted last for each of DTA/TDA/PA/DTI/TDI/PI? If so, we should be able to update the DTA/TDA/PA versions without a deprecation cycle.

TomAugspurger · 2020-01-08T21:42:30Z

Following NumPy seems fine to me.

jorisvandenbossche · 2020-01-09T20:23:25Z

I think this is a blocker for the RC ...

jorisvandenbossche · 2020-01-09T21:16:50Z

I quickly looked at this. Fixing it for the latest numpy is easy (as we then just need to sort with the M8 values instead of i8 values), but we probably want to have it consistent for all numpy versions? (ensuring that might involve a bit more trickery)

The change to enable it for latest numpy would be something like:

--- a/pandas/core/indexes/datetimelike.py
+++ b/pandas/core/indexes/datetimelike.py
@@ -191,10 +191,7 @@ class DatetimeIndexOpsMixin(ExtensionIndex):
             sorted_index = self.take(_as)
             return sorted_index, _as
         else:
-            # NB: using asi8 instead of _ndarray_values matters in numpy 1.18
-            #  because the treatment of NaT has been changed to put NaT last
-            #  instead of first.
-            sorted_values = np.sort(self.asi8)
+            sorted_values = np.sort(self.values)
             attribs = self._get_attributes_dict()
             freq = attribs["freq"]
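
For reference, one way to get NaT-last ordering that does not depend on the installed NumPy version is to sort only the non-NaT values and append the NaT entries afterwards. This is a minimal sketch, not what pandas does: `sort_nat_last` is a hypothetical helper name, and it assumes a contiguous datetime64 or timedelta64 array.

```python
import numpy as np


def sort_nat_last(values):
    """Sort a datetime64/timedelta64 array with NaT at the end.

    Hypothetical helper for illustration only, not a pandas or NumPy API.
    """
    mask = np.isnat(values)          # True where the entry is NaT
    i8 = values.view("i8")           # underlying int64 representation
    sorted_valid = np.sort(i8[~mask])
    # Append the NaT entries after the sorted non-NaT values.
    out = np.concatenate([sorted_valid, i8[mask]])
    return out.view(values.dtype)


values = np.array(["2001-01-01", "NaT", "2000-01-01"], dtype="datetime64[ns]")
result = sort_nat_last(values)
```

Because the sort itself only ever sees non-NaT integers, the result is the same whether or not `np.sort` on the M8 values places NaT first or last.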

TomAugspurger · 2020-01-09T21:21:41Z

I'm OK with this missing the RC (or we can include it in a second RC if we're doing that).

Otherwise, we'll likely need a keyword to control the na position, which can be done anytime.

jorisvandenbossche · 2020-01-17T09:06:33Z

I am thinking we should maybe just change it, still for 1.0.0. Not ideal that it was not in the RC, but it's a clear inconsistency within pandas, and it would also be good to be consistent with latest numpy.

jorisvandenbossche · 2020-01-20T14:58:18Z

@TomAugspurger @jbrockmendel thoughts on this?

TomAugspurger · 2020-01-20T15:03:17Z

Matching numpy seems OK


jorisvandenbossche · 2020-01-22T14:21:16Z

@jbrockmendel do you have time to look at this?

TomAugspurger · 2020-01-22T14:24:55Z

I'm looking into it today.


TomAugspurger · 2020-01-24T15:53:30Z

Just FYI, I started #31210, but am unsure how to proceed. The fact that DatetimeIndex.sort_values().asi8 will no longer be sorted makes this pretty complex to get right.
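
To illustrate the complication: NaT is stored as the minimum int64 value, so once NaT is placed last, the integer view of the "sorted" data is no longer monotonic. A small NumPy-only sketch (the array here is constructed by hand in NaT-last order rather than produced by any pandas method):

```python
import numpy as np

# Datetimes already arranged the way a NaT-last sort would return them.
nat_last = np.array(["2000-01-01", "2001-01-01", "NaT"], dtype="datetime64[ns]")

# The int64 view is what pandas calls asi8.
i8 = nat_last.view("i8")

# NaT maps to the smallest int64, so the last element is the smallest integer.
nat_as_int = np.iinfo(np.int64).min
is_monotonic = bool(np.all(i8[:-1] <= i8[1:]))
```

Here `i8[-1]` equals `nat_as_int` and `is_monotonic` is `False`, so any code path that assumes sorted i8 values (for example a binary search on them) would need separate handling for NaT.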

TomAugspurger · 2020-01-27T14:58:53Z

Pushing this from 1.0 unfortunately. We'll need to re-evaluate how to proceed, but I think an option to control the na_position in sort_values makes the most sense.
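
As a rough sketch of what such an option could look like at the array level, the helper below returns an indexer and takes a `na_position` argument modeled on the existing `Series.sort_values` keyword. The function name and signature are assumptions for illustration, not a pandas API.

```python
import numpy as np


def argsort_na_position(values, na_position="last"):
    """Return an indexer that sorts `values`, with NaT first or last.

    Hypothetical helper for illustration only, not a pandas API.
    """
    if na_position not in ("first", "last"):
        raise ValueError("na_position must be 'first' or 'last'")
    mask = np.isnat(values)
    nat_idx = np.flatnonzero(mask)
    valid_idx = np.flatnonzero(~mask)
    # Stable sort of the non-NaT entries via their int64 representation.
    order = valid_idx[np.argsort(values.view("i8")[valid_idx], kind="stable")]
    if na_position == "last":
        return np.concatenate([order, nat_idx])
    return np.concatenate([nat_idx, order])


values = np.array(["2001-01-01", "NaT", "2000-01-01"], dtype="datetime64[ns]")
last = argsort_na_position(values, na_position="last")
first = argsort_na_position(values, na_position="first")
```

Taking `values[last]` gives the NaT-last ordering and `values[first]` gives the previous NaT-first ordering, so a keyword like this would let callers keep the old behavior explicitly.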

jbrockmendel added the Compat pandas objects compatability with Numpy or Python functions label Nov 28, 2019

jorisvandenbossche added this to the 1.0 milestone Nov 29, 2019

TomAugspurger added Datetime Datetime data dtype Missing-data np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate labels Dec 30, 2019

TomAugspurger mentioned this issue Jan 9, 2020

RLS: 1.0.0 #27492

Closed

TomAugspurger added a commit to TomAugspurger/pandas that referenced this issue Jan 22, 2020

COMPAT: Argsort position matches NumPy

a9921ba

Closes pandas-dev#29884

TomAugspurger mentioned this issue Jan 22, 2020

COMPAT: Argsort position matches NumPy #31210

Closed

TomAugspurger modified the milestones: 1.0.0, Contributions Welcome Jan 27, 2020

mroeschke added the Bug label Apr 2, 2020

jbrockmendel mentioned this issue Sep 6, 2020

COMPAT: match numpy behavior for searchsorted on dt64/td64 #36176

Merged

5 tasks

jreback modified the milestones: Contributions Welcome, 1.2 Sep 7, 2020

jreback closed this as completed in #36176 Sep 8, 2020
