COMPAT: Argsort position matches NumPy #31210

Closed
wants to merge 5 commits into from

38 changes: 38 additions & 0 deletions doc/source/whatsnew/v1.0.0.rst
@@ -246,6 +246,44 @@ source, you should no longer need to install Cython into your build environment
 Backwards incompatible API changes
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
+.. _whatsnew_100.api_breaking.nat_sort:
+
+Changed sort position for ``NaT``
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+:attr:`NaT` will now sort at the *end* rather than the beginning in sorting functions (:issue:`29884`).
+This matches the behavior in NumPy 1.18 and newer, which makes the ``NaT`` behavior consistent
+with other missing values like :attr:`numpy.nan`.
+
+.. ipython:: python
+
+   values = pd.Index(['2001', 'NaT', '2000'], dtype='datetime64[ns]')
+
+*pandas 0.25.x*
+
+.. code-block:: python
+
+   >>> values.sort_values()
+   DatetimeIndex(['NaT', '2000-01-01', '2001-01-01'], dtype='datetime64[ns]', freq=None)
+
+   >>> values.argsort()
+   array([1, 2, 0])
+
+
+*pandas 1.0.0*
+
+.. ipython:: python
+
+   values.sort_values()
+   values.argsort()
+
+This affects all sorting functions on indexes, Series, DataFrames, and arrays.
+
+
+.. note::
+
+   This change was made between pandas 1.0.0rc0 and pandas 1.0.0.
+
 .. _whatsnew_100.api_breaking.MultiIndex._names:
 
 Avoid using names from ``MultiIndex.levels``
2 changes: 1 addition & 1 deletion pandas/core/arrays/datetimelike.py
@@ -710,7 +710,7 @@ def _from_factorized(cls, values, original):
         return cls(values, dtype=original.dtype)
 
     def _values_for_argsort(self):
-        return self._data
+        return self._data.view("M8[ns]")
Contributor Author

@jbrockmendel moved this down to the arrays. Basic idea is to just sort as datetime64[ns], which should be correct for datetime, timedelta, and period.

Member

sounds good, thanks
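
To illustrate the idea in this thread, here is a minimal NumPy-only sketch of why viewing the underlying int64 data as `datetime64[ns]` changes where missing values land. It assumes NumPy 1.18 or newer; the names `iNaT`, `ints`, `order_as_int`, and `order_as_datetime` are illustrative and not taken from the PR.

```python
import numpy as np

# Assumption: NumPy >= 1.18, where NaT sorts after all valid datetimes.
# iNaT is the int64 sentinel that datetime-like data uses for missing values.
iNaT = np.iinfo(np.int64).min
ints = np.array([2001, iNaT, 2000], dtype="i8")

# Sorted as plain int64, the sentinel is the smallest value and comes first.
order_as_int = np.argsort(ints)

# Viewed as datetime64[ns], the same bits are NaT, which NumPy places last.
order_as_datetime = np.argsort(ints.view("M8[ns]"))
```

Under that assumption, `order_as_int` is `[1, 2, 0]` (the pandas 0.25.x result quoted in the whatsnew entry) and `order_as_datetime` is `[2, 0, 1]`.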


     # ------------------------------------------------------------------
     # Additional array methods
9 changes: 5 additions & 4 deletions pandas/core/indexes/datetimelike.py
@@ -191,10 +191,8 @@ def sort_values(self, return_indexer=False, ascending=True):
             sorted_index = self.take(_as)
             return sorted_index, _as
         else:
-            # NB: using asi8 instead of _ndarray_values matters in numpy 1.18
-            # because the treatment of NaT has been changed to put NaT last
-            # instead of first.
-            sorted_values = np.sort(self.asi8)
+            values = self._data
Member

maybe use self._data.values_for_argsort?

+            sorted_values = np.sort(values.view("M8[ns]")).view("i8")
 
             freq = self.freq
             if freq is not None and not is_period_dtype(self):
@@ -224,6 +222,9 @@ def take(self, indices, axis=0, allow_fill=True, fill_value=None, **kwargs):
             self, indices, axis, allow_fill, fill_value, **kwargs
         )
 
+    def argsort(self, *args, **kwargs) -> np.ndarray:
+        return np.argsort(self._data, *args, **kwargs)
+
     @Appender(_shared_docs["searchsorted"])
     def searchsorted(self, value, side="left", sorter=None):
         if isinstance(value, str):
8 changes: 8 additions & 0 deletions pandas/tests/arrays/test_datetimelike.py
@@ -795,3 +795,11 @@ def test_to_numpy_extra(array):
     assert result[0] == result[1]
 
     tm.assert_equal(array, original)
+
+
+@pytest.mark.parametrize("dtype", ["datetime64[ns]", "timedelta64[ns]", "Period[D]"])
+def test_argsort(dtype):
+    a = pd.array([2001, pd.NaT, 2000], dtype=dtype)
+    result = a.argsort()
+    expected = np.array([2, 0, 1])
Member

I'm a bit confused by this. How is this prevented from varying based on the NumPy version?
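
For context on the version question, one hypothetical way to get a NaT-last ordering that does not depend on how a given NumPy release compares `NaT` is to sort only the valid entries and append the missing positions explicitly. This is a sketch under that assumption, not what the PR implements; `argsort_nat_last` is an invented helper name.

```python
import numpy as np


def argsort_nat_last(values: np.ndarray) -> np.ndarray:
    """Hypothetical helper: argsort datetime64/timedelta64 data with NaT last."""
    mask = np.isnat(values)
    positions = np.arange(len(values))
    valid_positions = positions[~mask]
    # Sort only the valid entries, as int64, so NaT comparison rules never apply.
    valid_order = np.argsort(values[~mask].view("i8"), kind="stable")
    # Valid entries in sorted order first, then NaT positions in original order.
    return np.concatenate([valid_positions[valid_order], positions[mask]])


values = np.array(["2001", "NaT", "2000"], dtype="datetime64[ns]")
result = argsort_nat_last(values)
```

For the three-element input used in `test_argsort`, this yields `[2, 0, 1]`, the same indexer the test expects.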

+    tm.assert_numpy_array_equal(result, expected)
9 changes: 6 additions & 3 deletions pandas/tests/indexes/datetimes/test_ops.py
@@ -247,7 +247,7 @@ def test_order_with_freq(self, idx):
             ),
             (
                 [pd.NaT, "2011-01-03", "2011-01-05", "2011-01-02", pd.NaT],
-                [pd.NaT, pd.NaT, "2011-01-02", "2011-01-03", "2011-01-05"],
+                ["2011-01-02", "2011-01-03", "2011-01-05", pd.NaT, pd.NaT],
             ),
         ],
     )
@@ -269,14 +269,17 @@ def test_order_without_freq(self, index_dates, expected_dates, tz_naive_fixture)
         ordered, indexer = index.sort_values(return_indexer=True)
         tm.assert_index_equal(ordered, expected)
 
-        exp = np.array([0, 4, 3, 1, 2])
+        if index.isna().any():
+            exp = np.array([3, 1, 2, 0, 4])
+        else:
+            exp = np.array([0, 4, 3, 1, 2])
         tm.assert_numpy_array_equal(indexer, exp, check_dtype=False)
         assert ordered.freq is None
 
         ordered, indexer = index.sort_values(return_indexer=True, ascending=False)
         tm.assert_index_equal(ordered, expected[::-1])
 
-        exp = np.array([2, 1, 3, 4, 0])
+        exp = exp[::-1]
         tm.assert_numpy_array_equal(indexer, exp, check_dtype=False)
         assert ordered.freq is None
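
The new expected indexers for the `NaT` case can be reproduced with plain NumPy. The sketch below assumes NumPy 1.18 or newer and uses a stable sort so that the two `NaT` entries keep their relative order; the variable names are illustrative only.

```python
import numpy as np

# Assumption: NumPy >= 1.18, where NaT sorts after all valid datetimes.
dates = np.array(
    ["NaT", "2011-01-03", "2011-01-05", "2011-01-02", "NaT"],
    dtype="datetime64[ns]",
)

# A stable sort keeps the two NaT entries in their original relative order.
ascending = np.argsort(dates, kind="stable")

# Reversing the ascending indexer gives the descending one, with NaT first.
descending = ascending[::-1]
```

Here `ascending` matches the `[3, 1, 2, 0, 4]` used in the test, and `descending` is its reverse, `[4, 0, 2, 1, 3]`, which is what `exp = exp[::-1]` produces.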
