.. _indexing.basics.partial_setting:

Selecting Random Samples
------------------------

.. versionadded:: 0.16.1

A random selection of rows or columns from a Series, DataFrame, or Panel can be drawn with the :meth:`~DataFrame.sample` method. The method samples rows by default and accepts either a specific number of rows/columns to return or a fraction of rows.

.. ipython:: python

   s = Series([0, 1, 2, 3, 4, 5])

   # When no arguments are passed, returns 1 row.
   s.sample()

   # One may specify either a number of rows:
   s.sample(n=3)

   # Or a fraction of the rows:
   s.sample(frac=0.5)
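
``n`` and ``frac`` are two ways of asking for the same thing: an absolute count or a share of the axis length. The sketch below assumes a recent pandas imported as ``pd`` (variable names are illustrative) and checks the resulting lengths. It also shows that ``n`` and ``frac`` cannot be combined: passing both raises a ``ValueError``.

.. code-block:: python

   import pandas as pd

   s = pd.Series([0, 1, 2, 3, 4, 5])

   # No arguments: a single row is returned.
   one_row = s.sample()

   # n is an absolute count; frac is a share of the axis length.
   # For 6 rows, frac=0.5 yields round(0.5 * 6) == 3 rows.
   by_count = s.sample(n=3)
   by_fraction = s.sample(frac=0.5)

   # Passing both n and frac is rejected.
   try:
       s.sample(n=3, frac=0.5)
       both_rejected = False
   except ValueError:
       both_rejected = True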

By default, ``sample`` will return each row at most once, but one can also sample with replacement
using the ``replace`` option:

.. ipython:: python

   s = Series([0, 1, 2, 3, 4, 5])

   # Without replacement (default):
   s.sample(n=6, replace=False)

   # With replacement:
   s.sample(n=6, replace=True)
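
Without replacement, ``n`` cannot exceed the number of rows, and drawing every row simply reorders the Series. With replacement, ``n`` may exceed the length of the Series and rows can repeat. Here is a small sketch of both behaviors, assuming a recent pandas imported as ``pd``:

.. code-block:: python

   import pandas as pd

   s = pd.Series([0, 1, 2, 3, 4, 5])

   # Drawing all 6 rows without replacement is a permutation of s.
   shuffled = s.sample(n=6, replace=False)

   # With replacement, asking for more rows than exist is allowed.
   oversampled = s.sample(n=10, replace=True)

   # Without replacement, the same request raises ValueError.
   try:
       s.sample(n=10, replace=False)
       oversample_rejected = False
   except ValueError:
       oversample_rejected = True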

By default, each row has an equal probability of being selected, but if you want rows
to have different probabilities, you can pass sampling weights to ``sample`` through the
``weights`` argument. These weights can be a list, a NumPy array, or a Series, but they must be of the same length as the object you are sampling. Missing values will be treated as a weight of zero, and inf values are not allowed. If weights do not sum to 1, they will be re-normalized by dividing all weights by the sum of the weights. For example:

.. ipython:: python

   s = Series([0, 1, 2, 3, 4, 5])
   example_weights = [0, 0, 0.2, 0.2, 0.2, 0.4]
   s.sample(n=3, weights=example_weights)

   # Weights will be re-normalized automatically
   example_weights2 = [0.5, 0, 0, 0, 0, 0]
   s.sample(n=1, weights=example_weights2)
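
The re-normalization is plain arithmetic. ``example_weights2`` sums to 0.5, so dividing by that sum gives the first row a weight of 1 and every other row a weight of 0. Only index ``0`` can therefore be drawn. The sketch below assumes a recent pandas and NumPy. It computes the normalized probabilities by hand, checks that zero-weight and missing-weight rows are never selected, and shows that an ``inf`` weight is rejected with a ``ValueError``.

.. code-block:: python

   import numpy as np
   import pandas as pd

   s = pd.Series([0, 1, 2, 3, 4, 5])

   # Manual re-normalization: divide each weight by the total (0.5).
   example_weights2 = np.array([0.5, 0, 0, 0, 0, 0])
   probabilities = example_weights2 / example_weights2.sum()

   # Only the row with non-zero probability can be drawn.
   only_first = s.sample(n=1, weights=example_weights2)

   # Zero-weight rows are never selected; with three non-zero weights
   # and n=3 (no replacement), exactly those three rows appear.
   three_rows = s.sample(n=3, weights=[0, 0, 0, 1, 1, 1])

   # A NaN weight counts as zero, so index 1 is the only candidate.
   nan_weighted = s.sample(n=1, weights=[np.nan, 1, 0, 0, 0, 0])

   # inf weights are not allowed.
   try:
       s.sample(n=1, weights=[np.inf, 1, 1, 1, 1, 1])
       inf_rejected = False
   except ValueError:
       inf_rejected = True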

When applied to a DataFrame, you can use a column of the DataFrame as sampling weights
(provided you are sampling rows and not columns) by simply passing the name of the column
as a string.

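
Here is a minimal sketch, assuming a recent pandas imported as ``pd`` and a hypothetical DataFrame ``df2`` whose hypothetical ``weight_column`` holds the sampling weights. The column name is passed directly as ``weights``. Because the last row has weight 0, it can never be drawn.

.. code-block:: python

   import pandas as pd

   # Hypothetical data: the last row has weight 0 and is never drawn.
   df2 = pd.DataFrame({'col1': [9, 8, 7, 6],
                       'weight_column': [0.5, 0.4, 0.1, 0]})

   # Passing the column name uses that column as row weights.
   weighted = df2.sample(n=3, weights='weight_column')
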

``sample`` also allows users to sample columns instead of rows using the ``axis`` argument.

.. ipython:: python

   df3 = DataFrame({'col1': [1, 2, 3], 'col2': [2, 3, 4]})
   df3.sample(n=1, axis=1)
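
With ``axis=1``, ``n`` counts columns rather than rows, and every row is kept. The following short check assumes a recent pandas imported as ``pd``. It uses the string alias ``axis='columns'`` as an equivalent spelling of ``axis=1``.

.. code-block:: python

   import pandas as pd

   df3 = pd.DataFrame({'col1': [1, 2, 3], 'col2': [2, 3, 4]})

   # One column is drawn; all three rows are retained.
   one_column = df3.sample(n=1, axis=1)

   # frac=1 along the columns axis returns every column in random order.
   all_columns = df3.sample(frac=1, axis='columns')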

Finally, one can also set a seed for ``sample``'s random number generator using the ``random_state`` argument, which will accept either an integer (as a seed) or a NumPy ``RandomState`` object.

.. ipython:: python

   df4 = DataFrame({'col1': [1, 2, 3], 'col2': [2, 3, 4]})

   # With a given seed, the sample will always draw the same rows.
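
This sketch of the reproducibility guarantee assumes a recent pandas and NumPy. Calling ``sample`` twice with the same integer seed yields the same rows. Two freshly created ``RandomState`` objects with the same seed also yield the same rows.

.. code-block:: python

   import numpy as np
   import pandas as pd

   df4 = pd.DataFrame({'col1': [1, 2, 3], 'col2': [2, 3, 4]})

   # An integer seed makes the draw reproducible.
   first = df4.sample(n=2, random_state=2)
   second = df4.sample(n=2, random_state=2)

   # Two RandomState objects created with the same seed agree as well.
   from_state_a = df4.sample(n=2, random_state=np.random.RandomState(42))
   from_state_b = df4.sample(n=2, random_state=np.random.RandomState(42))

Reusing a single ``RandomState`` object across calls advances its internal state, so successive draws generally differ. Pass an integer or a freshly seeded object when identical draws are needed.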