Commit 7988029

DOC: Extended the documentation for DataFrame.sort_values() (#57678)
* DOC: extended the documentation for `pandas.DataFrame.sort_values`; further explain the single-column vs. multi-column sorting; added further explanation and simplification for customized sorting, e.g., using the `natsort` package
* shortened the added docstrings to 80 columns; fixed a typo
* added another `shell` line to avoid `micromamba` test failure
* fixed a typo in a `DataFrame.sort_values()` example
* added a warning to raise awareness about a potential issue with `natsort`
* simplified the examples
* added a single example about `natsort`
1 parent 83112d7 commit 7988029

1 file changed: +33 -8 lines changed
pandas/core/frame.py

@@ -6824,7 +6824,9 @@ def sort_values(
 4 D 7 2 e
 5 C 4 3 F
 
-Sort by col1
+**Sort by a single column**
+
+In this case, we are sorting the rows according to values in ``col1``:
 
 >>> df.sort_values(by=["col1"])
 col1 col2 col3 col4
@@ -6835,7 +6837,12 @@ def sort_values(
 4 D 7 2 e
 3 NaN 8 4 D
 
-Sort by multiple columns
+**Sort by multiple columns**
+
+You can also provide multiple columns to ``by`` argument, as shown below.
+In this example, the rows are first sorted according to ``col1``, and then
+the rows that have an identical value in ``col1`` are sorted according
+to ``col2``.
 
 >>> df.sort_values(by=["col1", "col2"])
 col1 col2 col3 col4
@@ -6846,7 +6853,9 @@ def sort_values(
 4 D 7 2 e
 3 NaN 8 4 D
 
-Sort Descending
+**Sort in a descending order**
+
+The sort order can be reversed using ``ascending`` argument, as shown below:
 
 >>> df.sort_values(by="col1", ascending=False)
 col1 col2 col3 col4
@@ -6857,7 +6866,11 @@ def sort_values(
 1 A 1 1 B
 3 NaN 8 4 D
 
-Putting NAs first
+**Placing any** ``NA`` **first**
+
+Note that in the above example, the rows that contain an ``NA`` value in their
+``col1`` are placed at the end of the dataframe. This behavior can be modified
+via ``na_position`` argument, as shown below:
 
 >>> df.sort_values(by="col1", ascending=False, na_position="first")
 col1 col2 col3 col4
@@ -6868,7 +6881,12 @@ def sort_values(
 0 A 2 0 a
 1 A 1 1 B
 
-Sorting with a key function
+**Customized sort order**
+
+The ``key`` argument allows for a further customization of sorting behaviour.
+For example, you may want
+to ignore the `letter's case <https://en.wikipedia.org/wiki/Letter_case>`__
+when sorting strings:
 
 >>> df.sort_values(by="col4", key=lambda col: col.str.lower())
 col1 col2 col3 col4
@@ -6879,8 +6897,12 @@ def sort_values(
 4 D 7 2 e
 5 C 4 3 F
 
-Natural sort with the key argument,
-using the `natsort <https://github.com/SethMMorton/natsort>` package.
+Another typical example is
+`natural sorting <https://en.wikipedia.org/wiki/Natural_sort_order>`__.
+This can be done using
+``natsort`` `package <https://github.com/SethMMorton/natsort>`__,
+which provides sorted indices according
+to their natural order, as shown below:
 
 >>> df = pd.DataFrame(
 ... {
@@ -6896,8 +6918,11 @@ def sort_values(
 3 48hr 40
 4 96hr 50
 >>> from natsort import index_natsorted
+>>> index_natsorted(df["time"])
+[0, 3, 2, 4, 1]
 >>> df.sort_values(
-... by="time", key=lambda x: np.argsort(index_natsorted(df["time"]))
+... by="time",
+... key=lambda x: np.argsort(index_natsorted(x)),
 ... )
 time value
 0 0hr 10
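
Note on the last hunk: the new key calls `index_natsorted(x)` on the column passed to `key` instead of closing over `df["time"]`. The sketch below illustrates why wrapping the result in `np.argsort` is needed and why using `x` keeps the key reusable. It makes several assumptions: `natural_order_index` is a hypothetical stand-in for `natsort.index_natsorted`, used only so the snippet runs without `natsort`, and it is not that library's implementation. The `128hr` and `72hr` rows and their values are also assumed, since the diff hides them, but they are consistent with the `[0, 3, 2, 4, 1]` output shown above.

```python
import re

import numpy as np
import pandas as pd


def natural_order_index(values):
    """Return the positions that would put `values` in natural order.

    Hypothetical helper standing in for natsort.index_natsorted; it only
    handles simple strings such as "48hr".
    """
    items = list(values)

    def split_key(text):
        # Split into digit and non-digit runs so "128hr" compares as 128, "hr".
        parts = re.findall(r"\d+|\D+", str(text))
        return [(0, int(p), "") if p.isdigit() else (1, 0, p) for p in parts]

    return sorted(range(len(items)), key=lambda i: split_key(items[i]))


# Assumed sample frame; only some of its rows are visible in the diff.
df = pd.DataFrame(
    {
        "time": ["0hr", "128hr", "72hr", "48hr", "96hr"],
        "value": [10, 20, 30, 40, 50],
    }
)

# Positions that sort the column: row 0 first, then row 3, and so on.
order = natural_order_index(df["time"])  # [0, 3, 2, 4, 1]

# `key` must return one comparable value per row, not a list of positions.
# argsort of a permutation is its inverse, i.e. the rank of each row.
ranks = np.argsort(order)  # array([0, 4, 2, 1, 3])

# Using the column `x` handed to `key` means the same callable also works
# on a different frame, because it never refers to `df` directly.
result = df.sort_values(
    by="time",
    key=lambda x: np.argsort(natural_order_index(x)),
)
print(result)
```

The sorted frame lists `0hr`, `48hr`, `72hr`, `96hr`, `128hr` in that order, whereas a plain `df.sort_values(by="time")` compares the strings lexicographically and places `128hr` before `48hr`.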
