Commit fc25f7d

Moisan authored and TomAugspurger committed
DOC: Fix Series nsmallest and nlargest docstring/doctests (#22731)
1 parent 4e0b636 commit fc25f7d

2 files changed, 144 additions and 45 deletions

ci/doctests.sh (+1 -1)
@@ -28,7 +28,7 @@ if [ "$DOCTEST" ]; then
 fi

 pytest --doctest-modules -v pandas/core/series.py \
-    -k"-nlargest -nonzero -nsmallest -reindex -searchsorted -to_dict"
+    -k"-nonzero -reindex -searchsorted -to_dict"

 if [ $? -ne "0" ]; then
     RET=1

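Removing `-nlargest` and `-nsmallest` from the `-k` exclusions means `pytest --doctest-modules` now executes the examples in both docstrings. pytest's doctest collection is built on the standard-library `doctest` module, so a rough sketch of what that check involves can be written with `doctest` alone. The function `top_two` and the helper `run_doctests` are hypothetical, and the `NORMALIZE_WHITESPACE` flag is chosen here for illustration; this commit does not show which option flags pandas' doctest run uses.

```python
import doctest


def top_two(values):
    """Return the two largest values in decreasing order.

    Hypothetical function, used only to show doctest collection.

    >>> top_two([3, 9, 1, 7])
    [9, 7]
    """
    return sorted(values, reverse=True)[:2]


def run_doctests(obj):
    """Collect the ``>>>`` examples in ``obj``'s docstring and run them.

    Returns a ``(failed, attempted)`` tuple of example counts.
    """
    finder = doctest.DocTestFinder()
    runner = doctest.DocTestRunner(
        verbose=False,
        # Assumed flag: treat runs of whitespace in the output as equal.
        optionflags=doctest.NORMALIZE_WHITESPACE,
    )
    failed = attempted = 0
    # Pass explicit globals so the examples can see ``obj`` by name.
    for test in finder.find(obj, globs={obj.__name__: obj}):
        result = runner.run(test)
        failed += result.failed
        attempted += result.attempted
    return failed, attempted


print(run_doctests(top_two))  # (0, 1): one example ran and it passed
```

A failing example would be reported to standard output and counted in `failed`, which is the kind of signal the CI script turns into `RET=1`.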
pandas/core/series.py (+143 -44)
@@ -2741,17 +2741,20 @@ def nlargest(self, n=5, keep='first'):

 Parameters
 ----------
-n : int
-    Return this many descending sorted values
-keep : {'first', 'last'}, default 'first'
-    Where there are duplicate values:
-    - ``first`` : take the first occurrence.
-    - ``last`` : take the last occurrence.
+n : int, default 5
+    Return this many descending sorted values.
+keep : {'first', 'last', 'all'}, default 'first'
+    When there are duplicate values that cannot all fit in a
+    Series of `n` elements:
+    - ``first`` : take the first occurrences based on the index order
+    - ``last`` : take the last occurrences based on the index order
+    - ``all`` : keep all occurrences. This can result in a Series of
+      size larger than `n`.

 Returns
 -------
-top_n : Series
-    The n largest values in the Series, in sorted order
+Series
+    The `n` largest values in the Series, sorted in decreasing order.

 Notes
 -----
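The `keep` options described above can be illustrated without pandas. The sketch below assumes the Series is represented as a list of `(label, value)` pairs in index order; `nlargest_sketch` is a hypothetical helper that follows the documented behaviour, not the `algorithms.SelectNSeries` code that `Series.nlargest` actually calls.

```python
from typing import List, Sequence, Tuple

Pair = Tuple[str, int]  # (index label, value)


def nlargest_sketch(pairs: Sequence[Pair], n: int = 5,
                    keep: str = "first") -> List[Pair]:
    """Illustrate the documented ``keep`` options of ``Series.nlargest``.

    ``pairs`` are (label, value) tuples in index order. This mirrors the
    docstring's description only; it is not pandas' implementation.
    """
    if keep not in ("first", "last", "all"):
        raise ValueError("keep must be 'first', 'last' or 'all'")
    if n <= 0:
        return []
    # keep='last': walk the index backwards so later labels win ties.
    ordered = list(reversed(pairs)) if keep == "last" else list(pairs)
    # sorted() is stable even with reverse=True: ties keep their order.
    ranked = sorted(ordered, key=lambda p: p[1], reverse=True)
    if keep == "all" and n < len(ranked):
        cutoff = ranked[n - 1][1]  # value of the n-th largest element
        # Keep every element tied with (or above) the n-th largest value.
        return [p for p in ranked if p[1] >= cutoff]
    return ranked[:n]


# Country populations from the nlargest example, in index order.
populations = [
    ("Italy", 59000000), ("France", 65000000), ("Malta", 434000),
    ("Maldives", 434000), ("Brunei", 434000), ("Iceland", 337000),
    ("Nauru", 11300), ("Tuvalu", 11300), ("Anguilla", 11300),
    ("Montserrat", 5200),
]

for keep in ("first", "last", "all"):
    print(keep, [label for label, _ in nlargest_sketch(populations, 3, keep)])
# first ['France', 'Italy', 'Malta']
# last ['France', 'Italy', 'Brunei']
# all ['France', 'Italy', 'Malta', 'Maldives', 'Brunei']
```

Reversing the input for `'last'` relies on the sort being stable, so no extra tie-breaking key is needed.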
@@ -2760,23 +2763,70 @@ def nlargest(self, n=5, keep='first'):

 See Also
 --------
-Series.nsmallest
+Series.nsmallest: Get the `n` smallest elements.
+Series.sort_values: Sort Series by values.
+Series.head: Return the first `n` rows.

 Examples
 --------
->>> s = pd.Series(np.random.randn(10**6))
->>> s.nlargest(10) # only sorts up to the N requested
-219921 4.644710
-82124 4.608745
-421689 4.564644
-425277 4.447014
-718691 4.414137
-43154 4.403520
-283187 4.313922
-595519 4.273635
-503969 4.250236
-121637 4.240952
-dtype: float64
+>>> countries_population = {"Italy": 59000000, "France": 65000000,
+...                         "Malta": 434000, "Maldives": 434000,
+...                         "Brunei": 434000, "Iceland": 337000,
+...                         "Nauru": 11300, "Tuvalu": 11300,
+...                         "Anguilla": 11300, "Montserrat": 5200}
+>>> s = pd.Series(countries_population)
+>>> s
+Italy 59000000
+France 65000000
+Malta 434000
+Maldives 434000
+Brunei 434000
+Iceland 337000
+Nauru 11300
+Tuvalu 11300
+Anguilla 11300
+Montserrat 5200
+dtype: int64
+
+The `n` largest elements where ``n=5`` by default.
+
+>>> s.nlargest()
+France 65000000
+Italy 59000000
+Malta 434000
+Maldives 434000
+Brunei 434000
+dtype: int64
+
+The `n` largest elements where ``n=3``. Default `keep` value is 'first'
+so Malta will be kept.
+
+>>> s.nlargest(3)
+France 65000000
+Italy 59000000
+Malta 434000
+dtype: int64
+
+The `n` largest elements where ``n=3`` and keeping the last duplicates.
+Brunei will be kept since it is the last with value 434000 based on
+the index order.
+
+>>> s.nlargest(3, keep='last')
+France 65000000
+Italy 59000000
+Brunei 434000
+dtype: int64
+
+The `n` largest elements where ``n=3`` with all duplicates kept. Note
+that the returned Series has five elements due to the three duplicates.
+
+>>> s.nlargest(3, keep='all')
+France 65000000
+Italy 59000000
+Malta 434000
+Maldives 434000
+Brunei 434000
+dtype: int64
 """
 return algorithms.SelectNSeries(self, n=n, keep=keep).nlargest()

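The `keep='first'` results in these examples can be cross-checked with the standard library: `heapq.nlargest` is documented as equivalent to `sorted(iterable, key=key, reverse=True)[:n]`, and since that sort is stable, earlier entries win ties just as the docstring describes. The sketch assumes a Python 3.7+ `dict`, whose insertion order stands in for the Series index order; `top_labels` is a hypothetical helper, and this is a cross-check rather than how pandas selects values.

```python
import heapq

# Country populations from the nlargest example, in index order.
countries_population = {
    "Italy": 59000000, "France": 65000000,
    "Malta": 434000, "Maldives": 434000,
    "Brunei": 434000, "Iceland": 337000,
    "Nauru": 11300, "Tuvalu": 11300,
    "Anguilla": 11300, "Montserrat": 5200,
}


def top_labels(data, n=5):
    """Labels of the n largest values; ties resolved by insertion order."""
    items = heapq.nlargest(n, data.items(), key=lambda kv: kv[1])
    return [label for label, _ in items]


print(top_labels(countries_population))
# ['France', 'Italy', 'Malta', 'Maldives', 'Brunei']
print(top_labels(countries_population, 3))
# ['France', 'Italy', 'Malta']
```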
@@ -2786,17 +2836,20 @@ def nsmallest(self, n=5, keep='first'):

 Parameters
 ----------
-n : int
-    Return this many ascending sorted values
-keep : {'first', 'last'}, default 'first'
-    Where there are duplicate values:
-    - ``first`` : take the first occurrence.
-    - ``last`` : take the last occurrence.
+n : int, default 5
+    Return this many ascending sorted values.
+keep : {'first', 'last', 'all'}, default 'first'
+    When there are duplicate values that cannot all fit in a
+    Series of `n` elements:
+    - ``first`` : take the first occurrences based on the index order
+    - ``last`` : take the last occurrences based on the index order
+    - ``all`` : keep all occurrences. This can result in a Series of
+      size larger than `n`.

 Returns
 -------
-bottom_n : Series
-    The n smallest values in the Series, in sorted order
+Series
+    The `n` smallest values in the Series, sorted in increasing order.

 Notes
 -----
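For `keep='all'` in `nsmallest`, the result can be read as: take the `n`-th smallest value as a cut-off and keep every element that does not exceed it. A minimal sketch of that rule, assuming the data is a list of `(label, value)` pairs in index order, uses `bisect.bisect_right` on the sorted values to find where the run of ties at the boundary ends. `nsmallest_keep_all` is a hypothetical name, not a pandas function.

```python
import bisect

# Country populations from the nsmallest example, in index order.
populations = [
    ("Italy", 59000000), ("France", 65000000), ("Brunei", 434000),
    ("Malta", 434000), ("Maldives", 434000), ("Iceland", 337000),
    ("Nauru", 11300), ("Tuvalu", 11300), ("Anguilla", 11300),
    ("Montserrat", 5200),
]


def nsmallest_keep_all(pairs, n):
    """Labels of the n smallest values, keeping every tie at the cut-off.

    Hypothetical helper sketching keep='all'; ``pairs`` holds
    (label, value) tuples in index order.
    """
    if n <= 0:
        return []
    # Stable ascending sort: equal values stay in index order.
    ranked = sorted(pairs, key=lambda p: p[1])
    if n >= len(ranked):
        return [label for label, _ in ranked]
    values = [value for _, value in ranked]
    # bisect_right counts how many sorted values are <= the n-th smallest,
    # which extends the cut to the end of its run of ties.
    end = bisect.bisect_right(values, values[n - 1])
    return [label for label, _ in ranked[:end]]


print(nsmallest_keep_all(populations, 3))
# ['Montserrat', 'Nauru', 'Tuvalu', 'Anguilla']
```

With `n=3` the cut-off is 11300, so all three countries sharing that value are kept and four labels come back, which is why `keep='all'` can return more than `n` elements.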
@@ -2805,23 +2858,69 @@ def nsmallest(self, n=5, keep='first'):

 See Also
 --------
-Series.nlargest
+Series.nlargest: Get the `n` largest elements.
+Series.sort_values: Sort Series by values.
+Series.head: Return the first `n` rows.

 Examples
 --------
->>> s = pd.Series(np.random.randn(10**6))
->>> s.nsmallest(10) # only sorts up to the N requested
-288532 -4.954580
-732345 -4.835960
-64803 -4.812550
-446457 -4.609998
-501225 -4.483945
-669476 -4.472935
-973615 -4.401699
-621279 -4.355126
-773916 -4.347355
-359919 -4.331927
-dtype: float64
+>>> countries_population = {"Italy": 59000000, "France": 65000000,
+...                         "Brunei": 434000, "Malta": 434000,
+...                         "Maldives": 434000, "Iceland": 337000,
+...                         "Nauru": 11300, "Tuvalu": 11300,
+...                         "Anguilla": 11300, "Montserrat": 5200}
+>>> s = pd.Series(countries_population)
+>>> s
+Italy 59000000
+France 65000000
+Brunei 434000
+Malta 434000
+Maldives 434000
+Iceland 337000
+Nauru 11300
+Tuvalu 11300
+Anguilla 11300
+Montserrat 5200
+dtype: int64
+
+The `n` smallest elements where ``n=5`` by default.
+
+>>> s.nsmallest()
+Montserrat 5200
+Nauru 11300
+Tuvalu 11300
+Anguilla 11300
+Iceland 337000
+dtype: int64
+
+The `n` smallest elements where ``n=3``. Default `keep` value is
+'first' so Nauru and Tuvalu will be kept.
+
+>>> s.nsmallest(3)
+Montserrat 5200
+Nauru 11300
+Tuvalu 11300
+dtype: int64
+
+The `n` smallest elements where ``n=3`` and keeping the last
+duplicates. Anguilla and Tuvalu will be kept since they are the last
+with value 11300 based on the index order.
+
+>>> s.nsmallest(3, keep='last')
+Montserrat 5200
+Anguilla 11300
+Tuvalu 11300
+dtype: int64
+
+The `n` smallest elements where ``n=3`` with all duplicates kept. Note
+that the returned Series has four elements due to the three duplicates.
+
+>>> s.nsmallest(3, keep='all')
+Montserrat 5200
+Nauru 11300
+Tuvalu 11300
+Anguilla 11300
+dtype: int64
 """
 return algorithms.SelectNSeries(self, n=n, keep=keep).nsmallest()

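The `keep='first'` and `keep='last'` outputs in these examples can be reproduced with `heapq.nsmallest`, which Python documents as equivalent to `sorted(iterable, key=key)[:n]`. Reversing the input before selecting makes later labels win ties, which matches the documented `keep='last'` result where Anguilla appears before Tuvalu. `bottom_labels` is a hypothetical helper, and a Python 3.7+ `dict` is assumed to preserve the index order; this is a cross-check, not how pandas implements the method.

```python
import heapq

# Country populations from the nsmallest example; a Python 3.7+ dict
# keeps insertion order, which stands in for the Series index order.
countries_population = {
    "Italy": 59000000, "France": 65000000,
    "Brunei": 434000, "Malta": 434000,
    "Maldives": 434000, "Iceland": 337000,
    "Nauru": 11300, "Tuvalu": 11300,
    "Anguilla": 11300, "Montserrat": 5200,
}


def bottom_labels(data, n=5, keep="first"):
    """Labels of the n smallest values for keep='first' or keep='last'."""
    items = list(data.items())
    if keep == "last":
        # Walk the index backwards so later labels win ties.
        items.reverse()
    elif keep != "first":
        raise ValueError("this sketch only handles 'first' and 'last'")
    # heapq.nsmallest matches sorted(items, key=...)[:n], a stable sort.
    chosen = heapq.nsmallest(n, items, key=lambda kv: kv[1])
    return [label for label, _ in chosen]


print(bottom_labels(countries_population, 3))
# ['Montserrat', 'Nauru', 'Tuvalu']
print(bottom_labels(countries_population, 3, keep="last"))
# ['Montserrat', 'Anguilla', 'Tuvalu']
```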