
Commit 13d2d71

ENH/DEPR: add .sorted() method for API consistency, pandas-dev#9816, pandas-dev#8239
DEPR: remove of na_last from Series.order/Series.sort, xref pandas-dev#5231
1 parent 13cb1a7 commit 13d2d71

23 files changed: +794 -488 lines changed

doc/source/api.rst

+3-6
@@ -434,9 +434,8 @@ Reshaping, sorting
434434
:toctree: generated/
435435

436436
Series.argsort
437-
Series.order
438437
Series.reorder_levels
439-
Series.sort
438+
Series.sort_values
440439
Series.sort_index
441440
Series.sortlevel
442441
Series.swaplevel
@@ -908,7 +907,7 @@ Reshaping, sorting, transposing
908907

909908
DataFrame.pivot
910909
DataFrame.reorder_levels
911-
DataFrame.sort
910+
DataFrame.sort_values
912911
DataFrame.sort_index
913912
DataFrame.sortlevel
914913
DataFrame.nlargest
@@ -1293,7 +1292,6 @@ Modifying and Computations
12931292
Index.insert
12941293
Index.min
12951294
Index.max
1296-
Index.order
12971295
Index.reindex
12981296
Index.repeat
12991297
Index.take
@@ -1319,8 +1317,7 @@ Sorting
13191317
:toctree: generated/
13201318

13211319
Index.argsort
1322-
Index.order
1323-
Index.sort
1320+
Index.sort_values
13241321

13251322
Time-specific operations
13261323
~~~~~~~~~~~~~~~~~~~~~~~~

doc/source/basics.rst

+31-14
@@ -1418,39 +1418,56 @@ description.
14181418

14191419
.. _basics.sorting:
14201420

1421-
Sorting by index and value
1422-
--------------------------
1421+
Sorting
1422+
-------
1423+
1424+
.. warning::
1425+
1426+
The sorting API changed substantially in 0.17.0; see :ref:`here <whatsnew_0170.api_breaking.sorting>` for details.
1427+
In particular, all sorting methods now return a new object by default, and **DO NOT** operate in-place (except by passing ``inplace=True``).
14231428

14241429
There are two obvious kinds of sorting that you may be interested in: sorting
1425-
by label and sorting by actual values. The primary method for sorting axis
1426-
labels (indexes) across data structures is the :meth:`~DataFrame.sort_index` method.
1430+
by label and sorting by actual values.
1431+
1432+
By Index
1433+
~~~~~~~~
1434+
1435+
The primary methods for sorting axis
1436+
labels (indexes) are the ``Series.sort_index()`` and the ``DataFrame.sort_index()`` methods.
14271437

14281438
.. ipython:: python
14291439
14301440
unsorted_df = df.reindex(index=['a', 'd', 'c', 'b'],
14311441
columns=['three', 'two', 'one'])
1442+
1443+
# DataFrame
14321444
unsorted_df.sort_index()
14331445
unsorted_df.sort_index(ascending=False)
14341446
unsorted_df.sort_index(axis=1)
14351447
1436-
:meth:`DataFrame.sort_index` can accept an optional ``by`` argument for ``axis=0``
1448+
# Series
1449+
unsorted_df['three'].sort_index()
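As a rough pure-Python analogy (not pandas' implementation), sorting by index amounts to ordering (label, value) pairs by their labels while the values stay attached. The helper ``sort_by_label`` is hypothetical; its ``ascending`` flag is modelled on the ``sort_index(ascending=False)`` call above.

```python
def sort_by_label(pairs, ascending=True):
    """Return a new list of (label, value) pairs ordered by label.

    Hypothetical helper loosely modelling Series.sort_index(); the input
    list is left unchanged.
    """
    return sorted(pairs, key=lambda pair: pair[0], reverse=not ascending)


unsorted = [("a", 1), ("d", 4), ("c", 3), ("b", 2)]
print(sort_by_label(unsorted))                   # labels in order a, b, c, d
print(sort_by_label(unsorted, ascending=False))  # labels in order d, c, b, a
```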
1450+
1451+
By Values
1452+
~~~~~~~~~
1453+
1454+
The :meth:`Series.sort_values` and :meth:`DataFrame.sort_values` methods are the entry points for **value** sorting (that is, sorting by the values in a column or row).
1455+
:meth:`DataFrame.sort_values` can accept an optional ``by`` argument for ``axis=0``
14371456
which will use an arbitrary vector or a column name of the DataFrame to
14381457
determine the sort order:
14391458

14401459
.. ipython:: python
14411460
14421461
df1 = pd.DataFrame({'one':[2,1,1,1],'two':[1,3,2,4],'three':[5,4,3,2]})
1443-
df1.sort_index(by='two')
1462+
df1.sort_values(by='two')
14441463
14451464
The ``by`` argument can take a list of column names, e.g.:
14461465

14471466
.. ipython:: python
14481467
14491468
df1[['one', 'two', 'three']].sort_index(by=['one','two'])
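A minimal sketch of what sorting by one column name or a list of names does, assuming rows are stored as plain dicts keyed by their original index: the first name decides the order and later names only break ties. ``order_by`` is a hypothetical helper, not a pandas API; the data mirrors ``df1`` from the example above.

```python
from operator import itemgetter

# Rows mirroring df1, keyed by their original index label.
rows = {
    0: {"one": 2, "two": 1, "three": 5},
    1: {"one": 1, "two": 3, "three": 4},
    2: {"one": 1, "two": 2, "three": 3},
    3: {"one": 1, "two": 4, "three": 2},
}


def order_by(rows, by):
    """Return row labels ordered by one column name or a list of names.

    Hypothetical helper: ties on the first column are broken by the next one.
    """
    columns = [by] if isinstance(by, str) else list(by)
    key = itemgetter(*columns)  # scalar for one column, tuple for several
    return sorted(rows, key=lambda label: key(rows[label]))


print(order_by(rows, "two"))           # [0, 2, 1, 3]
print(order_by(rows, ["one", "two"]))  # [2, 1, 3, 0]
```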
14501469
1451-
Series has the method :meth:`~Series.order` (analogous to `R's order function
1452-
<http://stat.ethz.ch/R-manual/R-patched/library/base/html/order.html>`__) which
1453-
sorts by value, with special treatment of NA values via the ``na_position``
1470+
These methods have special treatment of NA values via the ``na_position``
14541471
argument:
14551472

14561473
.. ipython:: python
@@ -1459,11 +1476,11 @@ argument:
14591476
s.order()
14601477
s.order(na_position='first')
14611478
1462-
.. note::
14631479
1464-
:meth:`Series.sort` sorts a Series by value in-place. This is to provide
1465-
compatibility with NumPy methods which expect the ``ndarray.sort``
1466-
behavior. :meth:`Series.order` returns a copy of the sorted data.
1480+
.. _basics.searchsorted:
1481+
1482+
searchsorted
1483+
~~~~~~~~~~~~
14671484

14681485
Series has the :meth:`~Series.searchsorted` method, which works similar to
14691486
:meth:`numpy.ndarray.searchsorted`.
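For a single value, the behaviour of ``numpy.ndarray.searchsorted`` can be approximated with the standard-library ``bisect`` module: ``side='left'`` corresponds to ``bisect_left`` and ``side='right'`` to ``bisect_right``. The wrapper is a hypothetical sketch and, unlike NumPy, handles only one value at a time.

```python
from bisect import bisect_left, bisect_right


def searchsorted(sorted_values, value, side="left"):
    """Index at which `value` would be inserted to keep `sorted_values` sorted.

    side="left" gives the first suitable position, side="right" the last.
    """
    if side == "left":
        return bisect_left(sorted_values, value)
    if side == "right":
        return bisect_right(sorted_values, value)
    raise ValueError("side must be 'left' or 'right'")


values = [1, 2, 3]
print(searchsorted(values, 3))                # 2
print(searchsorted(values, 3, side="right"))  # 3
print(searchsorted(values, 0))                # 0
```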
@@ -1493,7 +1510,7 @@ faster than sorting the entire Series and calling ``head(n)`` on the result.
14931510
14941511
s = pd.Series(np.random.permutation(10))
14951512
s
1496-
s.order()
1513+
s.sort_values()
14971514
s.nsmallest(3)
14981515
s.nlargest(3)
14991516
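The advantage claimed for ``nsmallest``/``nlargest`` over a full sort followed by ``head(n)`` has a standard-library parallel: ``heapq.nsmallest`` and ``heapq.nlargest`` keep only ``n`` candidates while scanning instead of ordering everything. This sketch illustrates the idea in plain Python; it does not show how pandas implements these methods.

```python
import heapq
import random

# A shuffled 0..9, playing the role of np.random.permutation(10).
values = random.sample(range(10), 10)

# Full sort, then slice: orders all N items, O(N log N).
smallest_by_sort = sorted(values)[:3]
largest_by_sort = sorted(values, reverse=True)[:3]

# heapq tracks only the n best candidates while scanning, roughly O(N log n).
smallest = heapq.nsmallest(3, values)
largest = heapq.nlargest(3, values)

print(smallest, largest)  # [0, 1, 2] [9, 8, 7]
```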

doc/source/whatsnew/v0.17.0.txt

+61-1
@@ -14,6 +14,7 @@ users upgrade to this version.
1414
Highlights include:
1515

1616
- Release the Global Interpreter Lock (GIL) on some cython operations, see :ref:`here <whatsnew_0170.gil>`
17+
- The sorting API has been revamped to remove some long-time inconsistencies, see :ref:`here <whatsnew_0170.api_breaking.sorting>`
1718
- The default for ``to_datetime`` will now be to ``raise`` when presented with unparseable formats,
1819
previously this would return the original input, see :ref:`here <whatsnew_0170.api_breaking.to_datetime>`
1920
- The default for ``dropna`` in ``HDFStore`` has changed to ``False``, to store by default all rows even
@@ -187,6 +188,65 @@ Other enhancements
187188
Backwards incompatible API changes
188189
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
189190

191+
.. _whatsnew_0170.api_breaking.sorting:
192+
193+
Changes to sorting API
194+
^^^^^^^^^^^^^^^^^^^^^^
195+
196+
The sorting API has had some long-standing inconsistencies (:issue:`9816`, :issue:`8239`).
197+
198+
Here is a summary of the API **prior** to 0.17.0:
199+
200+
- ``Series.sort`` was **INPLACE** while ``DataFrame.sort`` returned a new object.
201+
- ``Series.order`` returned a new object.
202+
- It was possible to use ``Series/DataFrame.sort_index`` to sort by **values** by passing the ``by`` keyword.
203+
- ``Series/DataFrame.sortlevel`` worked only on a ``MultiIndex`` for sorting by index.
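Python's built-ins show the same split the prior API had between ``Series.sort`` (in place) and ``Series.order`` (returns a copy): ``list.sort()`` mutates the list and returns ``None``, while ``sorted()`` returns a new list. The hypothetical ``sort_values`` function below models the 0.17.0 convention of a single method that returns a new object by default and mutates only with ``inplace=True``; it works on lists, not pandas objects.

```python
def sort_values(data, ascending=True, inplace=False):
    """Hypothetical list-based model of the 0.17.0 sorting convention.

    Returns a new sorted list by default; with inplace=True it sorts `data`
    itself and returns None, mirroring list.sort().
    """
    if inplace:
        data.sort(reverse=not ascending)
        return None
    return sorted(data, reverse=not ascending)


s = [3, 1, 2]
result = sort_values(s)                 # new object; s is unchanged
print(result, s)                        # [1, 2, 3] [3, 1, 2]
print(sort_values(s, inplace=True), s)  # None [1, 2, 3]
```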
204+
205+
To address these issues, we have revamped the API:
206+
207+
- We have introduced a new method, ``sort_values()``, available as :meth:`Series.sort_values` and :meth:`DataFrame.sort_values`, which merges ``DataFrame.sort()``, ``Series.sort()``,
208+
and ``Series.order()`` to handle sorting of **values**.
209+
- The existing method ``Series.sort()`` has been deprecated and will be removed in a
210+
future version of pandas.
211+
- The ``by`` argument of ``DataFrame.sort_index()`` has been deprecated and will be removed in a future version of pandas.
212+
- The methods ``DataFrame.sort()`` and ``Series.order()`` are no longer recommended and carry a deprecation warning
213+
in their docstrings.
214+
- The existing method ``.sort_index()`` gains the ``level`` keyword to enable level sorting.
215+
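Level sorting on a ``MultiIndex`` can be pictured as ordering tuple labels by one position first. The sketch is a hypothetical pure-Python model of ``sort_index(level=...)``; it assumes the remaining levels act as tie-breakers, which is an assumption of this sketch rather than something the diff shows.

```python
def sort_by_level(labels, level=0):
    """Order tuple labels by one level, using the other levels as tie-breakers.

    Hypothetical helper; a loose pure-Python model of sort_index(level=...).
    """
    def key(label):
        # Put the chosen level first, followed by the remaining levels in order.
        return (label[level],) + label[:level] + label[level + 1:]

    return sorted(labels, key=key)


labels = [("b", 2), ("a", 3), ("b", 1), ("a", 1)]
print(sort_by_level(labels, level=0))  # [('a', 1), ('a', 3), ('b', 1), ('b', 2)]
print(sort_by_level(labels, level=1))  # [('a', 1), ('b', 1), ('b', 2), ('a', 3)]
```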
216+
We now have two distinct and non-overlapping methods of sorting. A ``*`` marks items that
217+
will show a ``FutureWarning``.
218+
219+
To sort by the **values**:
220+
221+
================================== ====================================
222+
Previous                           Replacement
223+
================================== ====================================
224+
\*``Series.order()``               ``Series.sort_values()``
225+
\*``Series.sort()``                ``Series.sort_values(inplace=True)``
226+
\*``DataFrame.sort(columns=...)``  ``DataFrame.sort_values(by=...)``
227+
================================== ====================================
228+
229+
To sort by the **index**:
230+
231+
================================== ====================================
232+
Previous                           Equivalent
233+
================================== ====================================
234+
``Series.sort_index()``            ``Series.sort_index()``
235+
``Series.sortlevel(level=...)``    ``Series.sort_index(level=...)``
236+
``DataFrame.sort_index()``         ``DataFrame.sort_index()``
237+
``DataFrame.sortlevel(level=...)`` ``DataFrame.sort_index(level=...)``
238+
\*``DataFrame.sort()``             ``DataFrame.sort_index()``
239+
================================== ====================================
240+
241+
We have also deprecated and changed similar methods in two Series-like classes, ``Index`` and ``Categorical``.
242+
243+
================================== ====================================
244+
Previous                           Replacement
245+
================================== ====================================
246+
\*``Index.order()``                ``Index.sort_values()``
247+
\*``Categorical.order()``          ``Categorical.sort_values()``
248+
================================== ====================================
249+
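When migrating code, a crude text search can help locate the calls marked with ``*`` in the tables above. This is a hypothetical, name-based heuristic: it cannot tell a Series from a plain Python list (whose ``.sort()`` is perfectly valid), so every hit needs a manual check against the replacement column.

```python
import re

# Matches ".order(" and ".sort(" but not ".sort_values(" or ".sort_index(".
DEPRECATED_CALL = re.compile(r"\.(order|sort)\(")


def flag_deprecated_sorts(source):
    """Return (line_number, line) pairs that call .order( or .sort(."""
    return [
        (number, line.strip())
        for number, line in enumerate(source.splitlines(), start=1)
        if DEPRECATED_CALL.search(line)
    ]


code = """s2 = s.order(ascending=False)
df2 = df.sort(columns='two')
df3 = df.sort_values(by='two')
"""
for number, line in flag_deprecated_sorts(code):
    print(number, line)  # flags lines 1 and 2 only
```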
190250
.. _whatsnew_0170.api_breaking.to_datetime:
191251

192252
Changes to to_datetime and to_timedelta
@@ -570,7 +630,7 @@ Removal of prior version deprecations/changes
570630
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
571631

572632
- Remove use of some deprecated numpy comparison operations, mainly in tests. (:issue:`10569`)
573-
633+
- Removal of ``na_last`` parameters from ``Series.order()`` and ``Series.sort()``, in favor of ``na_position``, xref (:issue:`5231`)
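The ``na_position`` argument decides where missing values land. A minimal pure-Python model, assuming ``None`` and float NaN are the only missing markers; ``sort_with_na`` is hypothetical, not pandas code, and in this sketch the placement of missing values does not depend on the sort direction.

```python
import math


def is_missing(value):
    """Treat None and float NaN as missing (an assumption of this sketch)."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def sort_with_na(values, ascending=True, na_position="last"):
    """Sort non-missing values and put missing ones first or last.

    Missing values are never compared with real numbers.
    """
    if na_position not in ("first", "last"):
        raise ValueError("na_position must be 'first' or 'last'")
    present = sorted((v for v in values if not is_missing(v)), reverse=not ascending)
    missing = [v for v in values if is_missing(v)]
    return missing + present if na_position == "first" else present + missing


data = [3, None, 1, 2]
print(sort_with_na(data))                       # [1, 2, 3, None]
print(sort_with_na(data, na_position="first"))  # [None, 1, 2, 3]
print(sort_with_na(data, ascending=False))      # [3, 2, 1, None]
```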
574634

575635
.. _whatsnew_0170.performance:
576636

pandas/core/algorithms.py

+2-4
@@ -262,9 +262,7 @@ def value_counts(values, sort=True, ascending=False, normalize=False,
262262
result.index = bins[:-1]
263263

264264
if sort:
265-
result.sort()
266-
if not ascending:
267-
result = result[::-1]
265+
result = result.sort_values(ascending=ascending)
268266

269267
if normalize:
270268
result = result / float(values.size)
@@ -497,7 +495,7 @@ def select_n_slow(dropped, n, take_last, method):
497495
reverse_it = take_last or method == 'nlargest'
498496
ascending = method == 'nsmallest'
499497
slc = np.s_[::-1] if reverse_it else np.s_[:]
500-
return dropped[slc].order(ascending=ascending).head(n)
498+
return dropped[slc].sort_values(ascending=ascending).head(n)
501499

502500

503501
_select_methods = {'nsmallest': nsmallest, 'nlargest': nlargest}
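The ``value_counts`` hunk replaces "sort ascending, then reverse" with a single descending sort. In plain Python the two patterns yield the same sequence of counts, but not necessarily the same order for tied items: ``sorted(..., reverse=True)`` keeps equal items in their original order, whereas slicing with ``[::-1]`` reverses them. Whether the pandas change affects tie order depends on the sort algorithm pandas uses, which this diff does not show.

```python
from collections import Counter

counts = Counter("abracadabra")  # a: 5, b: 2, r: 2, c: 1, d: 1

# Old pattern: sort ascending, then reverse when descending order is wanted.
old_style = sorted(counts.items(), key=lambda kv: kv[1])[::-1]

# New pattern: one sort call that takes the direction as a parameter.
new_style = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)

print([n for _, n in old_style])       # [5, 2, 2, 1, 1]
print([n for _, n in new_style])       # [5, 2, 2, 1, 1]
print(old_style[1:3], new_style[1:3])  # the tied b/r items come out in opposite orders
```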

pandas/core/categorical.py

+37-6
@@ -1083,7 +1083,7 @@ def argsort(self, ascending=True, **kwargs):
10831083
result = result[::-1]
10841084
return result
10851085

1086-
def order(self, inplace=False, ascending=True, na_position='last'):
1086+
def sort_values(self, inplace=False, ascending=True, na_position='last'):
10871087
""" Sorts the Category by category value returning a new Categorical by default.
10881088
10891089
Only ordered Categoricals can be sorted!
@@ -1092,10 +1092,10 @@ def order(self, inplace=False, ascending=True, na_position='last'):
10921092
10931093
Parameters
10941094
----------
1095-
ascending : boolean, default True
1096-
Sort ascending. Passing False sorts descending
10971095
inplace : boolean, default False
10981096
Do operation in place.
1097+
ascending : boolean, default True
1098+
Sort ascending. Passing False sorts descending
10991099
na_position : {'first', 'last'} (optional, default='last')
11001100
'first' puts NaNs at the beginning
11011101
'last' puts NaNs at the end
@@ -1139,6 +1139,37 @@ def order(self, inplace=False, ascending=True, na_position='last'):
11391139
return Categorical(values=codes,categories=self.categories, ordered=self.ordered,
11401140
fastpath=True)
11411141

1142+
def order(self, inplace=False, ascending=True, na_position='last'):
1143+
"""
1144+
DEPRECATED: use :meth:`Categorical.sort_values`
1145+
1146+
Sorts the Category by category value returning a new Categorical by default.
1147+
1148+
Only ordered Categoricals can be sorted!
1149+
1150+
Categorical.sort is the equivalent but sorts the Categorical inplace.
1151+
1152+
Parameters
1153+
----------
1154+
inplace : boolean, default False
1155+
Do operation in place.
1156+
ascending : boolean, default True
1157+
Sort ascending. Passing False sorts descending
1158+
na_position : {'first', 'last'} (optional, default='last')
1159+
'first' puts NaNs at the beginning
1160+
'last' puts NaNs at the end
1161+
1162+
Returns
1163+
-------
1164+
y : Categorical or None
1165+
1166+
See Also
1167+
--------
1168+
Categorical.sort
1169+
"""
1170+
warn("order is deprecated, use sort_values(...)",
1171+
FutureWarning, stacklevel=2)
1172+
return self.sort_values(inplace=inplace, ascending=ascending, na_position=na_position)
11421173

11431174
def sort(self, inplace=True, ascending=True, na_position='last'):
11441175
""" Sorts the Category inplace by category value.
@@ -1163,10 +1194,10 @@ def sort(self, inplace=True, ascending=True, na_position='last'):
11631194
11641195
See Also
11651196
--------
1166-
Category.order
1197+
Categorical.sort_values
11671198
"""
1168-
return self.order(inplace=inplace, ascending=ascending,
1169-
na_position=na_position)
1199+
return self.sort_values(inplace=inplace, ascending=ascending,
1200+
na_position=na_position)
11701201

11711202
def ravel(self, order='C'):
11721203
""" Return a flattened (numpy) array.

pandas/core/common.py

+3
@@ -2155,6 +2155,9 @@ def _mut_exclusive(**kwargs):
21552155
return val2
21562156

21572157

2158+
def _not_none(*args):
2159+
return (arg for arg in args if arg is not None)
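The diff adds ``_not_none`` without showing a caller. One plausible use, sketched here as an assumption, is rejecting calls that pass two aliased keywords at once. ``resolve_sort_key`` and its ``columns``/``by`` parameters are hypothetical. Note that ``_not_none`` returns a generator, so the result is materialised with ``list()`` before it is counted.

```python
def _not_none(*args):
    # Same body as the helper added to pandas/core/common.py in this commit.
    return (arg for arg in args if arg is not None)


def resolve_sort_key(columns=None, by=None):
    """Hypothetical caller: accept either keyword, but not both."""
    given = list(_not_none(columns, by))  # a generator is single-use, so materialise it
    if len(given) > 1:
        raise TypeError("pass only one of 'columns' or 'by'")
    return given[0] if given else None


print(resolve_sort_key(by="two"))  # two
print(resolve_sort_key())          # None
```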
2160+
21582161
def _any_none(*args):
21592162
for arg in args:
21602163
if arg is None:
