-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
NumPy 1.18 behavior for sorting NaT #29884
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
I reported the timedelta sorting to numpy, see numpy/numpy#15063, I think the plan is to fix this as well for numpy 1.8 I would propose we follow this change for Index. |
xref #30460. |
So the remaining item here is sorting NaT at the end for DatetimeIndex and TimedeltaIndex? So this would change from In [4]: pd.DatetimeIndex(['2000', None, '2001']).sort_values()
Out[4]: DatetimeIndex(['NaT', '2000-01-01', '2001-01-01'], dtype='datetime64[ns]', freq=None) to DatetimeIndex(['2000-01-01', '2001-01-01', 'NaT'], dtype='datetime64[ns]', freq=None) Is this a blocker for 1.0? If so, do we need a deprecation cycle? |
Need to double check if this affects searchsorted. IIRC we use m8 values
in some places and i8 in others.
…On Mon, Dec 30, 2019 at 9:37 AM Tom Augspurger ***@***.***> wrote:
So the remaining item here is sorting NaT at the end for DatetimeIndex and
TimedeltaIndex? So this would change from
In [4]: pd.DatetimeIndex(['2000', None, '2001']).sort_values()
Out[4]: DatetimeIndex(['NaT', '2000-01-01', '2001-01-01'], dtype='datetime64[ns]', freq=None)
to
DatetimeIndex(['2000-01-01', '2001-01-01', 'NaT'], dtype='datetime64[ns]', freq=None)
Is this a blocker for 1.0? If so, do we need a deprecation cycle?
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#29884?email_source=notifications&email_token=AB5UM6ERPUUC2TR3CSLDOWTQ3IWUBA5CNFSM4JSHTYS2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEH2ZRXI#issuecomment-569743581>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AB5UM6BE7WMDECR4QAM2JDDQ3IWUBANCNFSM4JSHTYSQ>
.
|
Since other index types sort the NaNs last, I would maybe consider this a bug and just change it (that's also what numpy did, IIUC). Eg:
If we want to do a deprecation cycle, we probably need to add a |
if we're calling it a bugfix we can probably do without a deprecation cycle Do we have consensus that long-term we want NaT to be sorted last for each of DTA/TDA/PA/DTI/TDI/PI? If so, we should be able to update the DTA/TDA/PA versions without a deprecation cycle. |
Following NumPy seems fine to me. |
I think this is a blocker for the RC ... |
I quickly looked at this. Fixing it for the latest numpy is easy (as we then just need to sort with the M8 values instead of i8 values), but we probably want to have it consistent for all numpy versions? (ensuring that might involve a bit more trickery) The change to enable it for latest numpy would be something like: --- a/pandas/core/indexes/datetimelike.py
+++ b/pandas/core/indexes/datetimelike.py
@@ -191,10 +191,7 @@ class DatetimeIndexOpsMixin(ExtensionIndex):
sorted_index = self.take(_as)
return sorted_index, _as
else:
- # NB: using asi8 instead of _ndarray_values matters in numpy 1.18
- # because the treatment of NaT has been changed to put NaT last
- # instead of first.
- sorted_values = np.sort(self.asi8)
+ sorted_values = np.sort(self.values)
attribs = self._get_attributes_dict()
freq = attribs["freq"] |
I'm OK with this missing the RC (or we can include it in a second RC if we're doing that). Otherwise, we'll likely need a keyword to control the na position, which can be done anytime. |
I am thinking we should maybe just change it, still for 1.0.0. Not ideal that it was not in the RC, but it's a clear inconsistency within pandas, and it would also be good to be consistent with latest numpy. |
@TomAugspurger @jbrockmendel thoughts on this? |
Matching numpy seems OK
… On Jan 20, 2020, at 08:58, Joris Van den Bossche ***@***.***> wrote:
@TomAugspurger @jbrockmendel thoughts on this?
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub, or unsubscribe.
|
@jbrockmendel do you have time to look at this? |
I'm looking into it today. |
Just FYI, I started #31210, but am unsure how to proceed. The fact that |
Pushing this from 1.0 unfortunately. We'll need to re-evaluate how to proceed, but I think an option to control the |
Followup to #29877. cc @jbrockmendel if you could fill out the details.
The text was updated successfully, but these errors were encountered: