DOC: fix DataFrame.sample doctests and reformat the docstring #22937

Merged
merged 2 commits into from
Oct 8, 2018
2 changes: 1 addition & 1 deletion ci/doctests.sh
@@ -35,7 +35,7 @@ if [ "$DOCTEST" ]; then
fi

pytest --doctest-modules -v pandas/core/generic.py \
-k"-_set_axis_name -_xs -describe -droplevel -groupby -interpolate -pct_change -pipe -reindex -reindex_axis -resample -sample -to_json -transpose -values -xs"
-k"-_set_axis_name -_xs -describe -droplevel -groupby -interpolate -pct_change -pipe -reindex -reindex_axis -resample -to_json -transpose -values -xs"

if [ $? -ne "0" ]; then
RET=1
97 changes: 48 additions & 49 deletions pandas/core/generic.py
@@ -4326,8 +4326,8 @@ def sample(self, n=None, frac=None, replace=False, weights=None,
Default = 1 if `frac` = None.
frac : float, optional
Fraction of axis items to return. Cannot be used with `n`.
replace : boolean, optional
Sample with or without replacement. Default = False.
replace : bool, default False
Sample with or without replacement.
weights : str or ndarray-like, optional
Default 'None' results in equal probability weighting.
If passed a Series, will align with target object on index. Index
@@ -4340,7 +4340,7 @@ def sample(self, n=None, frac=None, replace=False, weights=None,
being sampled.
If weights do not sum to 1, they will be normalized to sum to 1.
Missing values in the weights column will be treated as zero.
inf and -inf values not allowed.
`inf` and `-inf` values not allowed.
Contributor commented:

If these are code literals, they should be in double backticks and quoted:

``'inf'``

which is a bit noisy. How about "Infinite values are not allowed."?
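
For context, the weight handling this part of the docstring describes can be checked with a small sketch. The frame and its `value` column are hypothetical and exist only for illustration; the behavior shown (missing weights treated as zero, unnormalized weights accepted, infinite weights rejected) is what the docstring text states.

```python
import numpy as np
import pandas as pd

# Hypothetical three-row frame, used only to illustrate weight handling.
df = pd.DataFrame({"value": [10, 20, 30]})

# A missing weight is treated as zero, so row 0 can never be drawn.
# The remaining weights (1 and 3) do not sum to 1 and are normalized.
weights_with_nan = [np.nan, 1.0, 3.0]
picked = df.sample(n=2, weights=weights_with_nan, random_state=1)

# Infinite weights are rejected with a ValueError.
try:
    df.sample(n=1, weights=[np.inf, 1.0, 1.0])
    inf_rejected = False
except ValueError:
    inf_rejected = True
```

Since only two rows have non-zero weight and sampling is without replacement, `picked` always contains rows 1 and 2, in some order.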

random_state : int or numpy.random.RandomState, optional
Seed for the random number generator (if int), or numpy RandomState
object.
@@ -4350,58 +4350,57 @@ def sample(self, n=None, frac=None, replace=False, weights=None,

Returns
-------
A new object of same type as caller.
Series or DataFrame:
Member commented:

The colon at the end is not needed in this case.

A new object of same type as caller.
Member commented:

It'd be nice to briefly describe what is returned, not only the type.
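
As a rough illustration of what a fuller "Returns" description would need to convey, the sketch below checks that `sample` returns an object of the caller's type holding the requested number of items. The data is hypothetical and is not taken from the PR.

```python
import pandas as pd

# Hypothetical data for illustration only.
df = pd.DataFrame({"A": [0, 10, 20, 30], "B": [0, 2, 4, 6]})

# DataFrame in, DataFrame out: two rows drawn from the caller.
frame_sample = df.sample(n=2, random_state=1)

# Series in, Series out: two elements drawn from the caller.
series_sample = df["A"].sample(n=2, random_state=1)
```

The sampled rows keep their original index labels, so `df.loc[frame_sample.index]` gives back the same rows.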


See Also
--------
numpy.choice: Generates a random sample from a given 1-D numpy array.
Contributor commented:

numpy.random.choice?
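
The point behind this suggestion can be verified directly: NumPy provides `choice` in the `numpy.random` namespace, not at the top level. A minimal sketch, where the array and seed are arbitrary illustrative values:

```python
import numpy as np

# `choice` lives under `numpy.random`; there is no top-level `numpy.choice`.
has_top_level_choice = hasattr(np, "choice")
has_random_choice = hasattr(np.random, "choice")

# Draw 3 distinct items from a 1-D array with a seeded generator.
rng = np.random.RandomState(1)
drawn = rng.choice(np.arange(10), size=3, replace=False)
```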


Examples
--------
Generate an example ``Series`` and ``DataFrame``:

>>> s = pd.Series(np.random.randn(50))
>>> s.head()
0 -0.038497
1 1.820773
2 -0.972766
3 -1.598270
4 -1.095526
dtype: float64
>>> df = pd.DataFrame(np.random.randn(50, 4), columns=list('ABCD'))
>>> df = pd.DataFrame({'A': range(0, 500, 10),
... 'B': range(0, 100, 2),
... 'C': range(0, 250, 5)})
Member commented:

I don't think we need this much data, and the values are arbitrary. Something like the animal datasets we've been using, with 6 or 8 rows, would better illustrate the `sample` method.
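
One possible shape for such an example is sketched below. The animal names, column names, and values are assumptions made up for illustration; they are not the dataset the reviewer refers to and not what the PR ultimately used. Expected output is deliberately omitted, since the drawn rows depend on the random state.

```python
import pandas as pd

# Hypothetical small animal dataset (names and values are illustrative).
df = pd.DataFrame(
    {
        "num_legs": [2, 4, 8, 0],
        "num_wings": [2, 0, 0, 0],
        "num_specimen_seen": [10, 2, 1, 8],
    },
    index=["falcon", "dog", "spider", "fish"],
)

# Two random rows; a fixed random_state makes the draw reproducible.
sampled = df.sample(n=2, random_state=1)

# Half of the rows, drawn with replacement.
half_with_replacement = df.sample(frac=0.5, replace=True, random_state=1)

# Rows with larger values in the weights column are more likely to be drawn.
weighted = df.sample(n=2, weights="num_specimen_seen", random_state=1)
```

With only four labelled rows, a reader can see at a glance which animals were drawn, which is harder with 50 rows of generated numbers.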

>>> df.head()
A B C D
0 0.016443 -2.318952 -0.566372 -1.028078
1 -1.051921 0.438836 0.658280 -0.175797
2 -1.243569 -0.364626 -0.215065 0.057736
3 1.768216 0.404512 -0.385604 -1.457834
4 1.072446 -1.137172 0.314194 -0.046661

Next extract a random sample from both of these objects...

3 random elements from the ``Series``:

>>> s.sample(n=3)
27 -0.994689
55 -1.049016
67 -0.224565
dtype: float64

And a random 10% of the ``DataFrame`` with replacement:

>>> df.sample(frac=0.1, replace=True)
A B C D
35 1.981780 0.142106 1.817165 -0.290805
49 -1.336199 -0.448634 -0.789640 0.217116
40 0.823173 -0.078816 1.009536 1.015108
15 1.421154 -0.055301 -1.922594 -0.019696
6 -0.148339 0.832938 1.787600 -1.383767

You can use `random state` for reproducibility:

>>> df.sample(random_state=1)
A B C D
37 -2.027662 0.103611 0.237496 -0.165867
43 -0.259323 -0.583426 1.516140 -0.479118
12 -1.686325 -0.579510 0.985195 -0.460286
8 1.167946 0.429082 1.215742 -1.636041
9 1.197475 -0.864188 1.554031 -1.505264
A B C
0 0 0 0
1 10 2 5
2 20 4 10
3 30 6 15
4 40 8 20

Next extract a random sample from both of these objects. Note that
we use `random_state` to ensure the reproducibility of the examples.

3 random elements from the ``Series`` ``df['A']``:

>>> df['A'].sample(n=3, random_state=1)
27 270
35 350
40 400
Name: A, dtype: int64

A random 10% sample of the ``DataFrame`` with replacement:

>>> df.sample(frac=0.1, replace=True, random_state=1)
A B C
37 370 74 185
43 430 86 215
12 120 24 60
8 80 16 40
9 90 18 45

Using a DataFrame column as weights. Column 'A' is increasing so last
elements of the DataFrame are more likely to be sampled.

>>> df.sample(n=3, weights='A', random_state=1)
A B C
32 320 64 160
42 420 84 210
1 10 2 5
"""

if axis is None: