@@ -7,9 +7,9 @@ pandas docstring guide
 About docstrings and standards
 ------------------------------
 
-A Python docstring is a string used to document a Python function or method,
-so programmers can understand what it does without having to read the details
-of the implementation.
+A Python docstring is a string used to document a Python module, class,
+function or method, so programmers can understand what it does without having
+to read the details of the implementation.
 
 Also, it is a common practice to generate online (html) documentation
 automatically from docstrings. `Sphinx <http://www.sphinx-doc.org>`_ serves
@@ -95,19 +95,29 @@ left before or after the docstring. The text starts in the next line after the
 opening quotes. The closing quotes have their own line
 (meaning that they are not at the end of the last sentence).
 
+On rare occasions reST styles like bold text or italics will be used in
+docstrings, but it is common to have inline code, which is presented between
+backticks. The following are considered inline code:
+
+- The name of a parameter
+- Python code, a module, function, built-in, type, literal... (e.g. `os`, `list`, `numpy.abs`, `datetime.date`, `True`)
+- A pandas class (in the form ``:class:`~pandas.Series```)
+- A pandas method (in the form ``:meth:`pandas.Series.sum```)
+- A pandas function (in the form ``:func:`pandas.to_datetime```)
+
 **Good:**
 
 .. code-block:: python
 
-    def func():
+    def add_values(arr):
         """
-        Some function.
+        Add the values in `arr`.
 
-        With a good docstring.
+        This is equivalent to Python `sum` or :meth:`pandas.Series.sum`.
+
+        Some sections are omitted here for simplicity.
         """
-        foo = 1
-        bar = 2
-        return foo + bar
+        return sum(arr)
 
 **Bad:**
 
@@ -121,7 +131,7 @@ opening quotes. The closing quotes have their own line
 
 It has a blank line after the signature `def func():`.
 
-The text 'Some function' should go in the next line then the
+The text 'Some function' should go in the line after the
 opening quotes of the docstring, not in the same line.
 
 There is a blank line between the docstring and the first line
@@ -141,9 +151,10 @@ Section 1: Short summary
 The short summary is a single sentence that expresses what the function does in a
 concise way.
 
-The short summary must start with a verb infinitive, end with a dot, and fit in
-a single line. It needs to express what the function does without providing
-details.
+The short summary must start with a capital letter, end with a dot, and fit in
+a single line. It needs to express what the object does without providing
+details. For functions and methods, the short summary must start with an
+infinitive verb.
 
 **Good:**
 
@@ -247,20 +258,20 @@ required to have a line with the parameter description, which is indented, and
 can have multiple lines. The description must start with a capital letter, and
 finish with a dot.
 
-Keyword arguments with a default value, the default will be listed in brackets
-at the end of the type. The exact form of the type in this case would be for
-example "int (default is 0)". In some cases it may be useful to explain what
-the default argument means, which can be added after a comma "int (default is
--1, which means all cpus)".
+For keyword arguments with a default value, the default will be listed after a
+comma at the end of the type. The exact form of the type in this case will be
+"int, default 0". In some cases it may be useful to explain what the default
+argument means, which can be added after a comma "int, default -1, meaning all
+cpus".
 
 In cases where the default value is `None`, meaning that the value will not be
-used, instead of "str (default is None)" it is preferred to use "str, optional".
-When `None` is a value being used, we will keep the form "str (default None).
-For example consider `.fillna(value=None)`, in which `None` is the value being
-used to replace missing values. This is different from
-`.to_csv(compression=None)`, where `None` is not a value being used, but means
-that compression is optional, and will not be used, unless a compression type
-is provided. In this case we will use `str, optional`.
+used. Instead of "str, default None", it is preferred to write "str, optional".
+When `None` is a value being used, we will keep the form "str, default None".
+For example, in `df.to_csv(compression=None)`, `None` is not a value being used,
+but means that compression is optional, and no compression is being used if not
+provided. In this case we will use `str, optional`. Only in cases like
+`func(value=None)`, where `None` is being used in the same way as `0` or `foo`
+would be used, will we specify "str, int or None, default None".
 
 **Good:**
 
@@ -278,7 +289,7 @@ is provided. In this case we will use `str, optional`.
 ----------
 kind : str
     Kind of matplotlib plot.
-color : str (default 'blue')
+color : str, default 'blue'
     Color name or rgb code.
 **kwargs
     These parameters will be passed to the matplotlib plotting
@@ -470,9 +481,9 @@ If the method yields its value:
 Section 5: See Also
 ~~~~~~~~~~~~~~~~~~~
 
-This is an optional section, used to let users know about pandas functionality
-related to the one being documented. While optional, this section should exist
-in most cases, unless no related methods or functions can be found at all.
+This section is used to let users know about pandas functionality
+related to the one being documented. In rare cases, if no related methods
+or functions can be found at all, this section can be skipped.
 
 An obvious example would be the `head()` and `tail()` methods. As `tail()` does
 the equivalent as `head()` but at the end of the `Series` or `DataFrame`
@@ -586,13 +597,6 @@ The way to present examples is as follows:
 4. Add examples with explanations that illustrate how the parameters can be
    used for extended functionality
 
-.. note::
-    Which data should be used in examples is a topic still under discussion.
-    We'll likely be importing a standard dataset from `pandas.io.samples`, but
-    this still needs confirmation. You can work with the data from this pull
-    request: https://github.com/pandas-dev/pandas/pull/19933/files but
-    consider this could still change.
-
 A simple example could be:
 
 .. code-block:: python
@@ -640,6 +644,10 @@ A simple example could be:
             """
             return self.iloc[:n]
 
+The examples should be as concise as possible. In cases where the complexity of
+the function requires long examples, it is recommended to use blocks with headers
+in bold. Use double star \*\* to make a text bold, like in \*\*this example\*\*.
+
 .. _docstring.example_conventions:
 
 Conventions for the examples
@@ -661,11 +669,11 @@ the standard library go first, followed by third-party libraries (like
 matplotlib).
 
 When illustrating examples with a single `Series` use the name `s`, and if
-illustrating with a single `DataFrame` use the name `df`. If a set of
-homogeneous `Series` or `DataFrame` is used, name them `s1`, `s2`, `s3`...
-or `df1`, `df2`, `df3`... If the data is not homogeneous, and more than
-one structure is needed, name them with something meaningful, for example
-`df_main` and `df_to_join`.
+illustrating with a single `DataFrame` use the name `df`. For indices, `idx`
+is the preferred name. If a set of homogeneous `Series` or `DataFrame` is used,
+name them `s1`, `s2`, `s3`... or `df1`, `df2`, `df3`... If the data is not
+homogeneous, and more than one structure is needed, name them with something
+meaningful, for example `df_main` and `df_to_join`.
 
 Data used in the example should be as compact as possible. The number of rows
 is recommended to be around 4, but make it a number that makes sense for the
@@ -708,12 +716,12 @@ positional arguments `head(3)`.
             Examples
             --------
             >>> s = pd.Series([1, np.nan, 3])
-            >>> s.fillna(9)
-            [1, 9, 3]
+            >>> s.fillna(0)
+            [1, 0, 3]
             """
             pass
 
-        def groupby_mean(df):
+        def groupby_mean(self):
             """
             Group by index and return mean.
 
@@ -723,28 +731,113 @@ positional arguments `head(3)`.
             ...               name='max_speed',
             ...               index=['falcon', 'falcon', 'parrot', 'parrot'])
             >>> s.groupby_mean()
-            falcon    375.
-            parrot     25.
+            index
+            falcon    375.0
+            parrot     25.0
+            Name: max_speed, dtype: float64
             """
             pass
 
+        def contains(self, pattern, case_sensitive=True, na=numpy.nan):
+            """
+            Return whether each value contains `pattern`.
+
+            In this case, we are illustrating how to use sections, even
+            if the example is simple enough and does not require them.
+
+            Examples
+            --------
+            >>> s = pd.Series(['Antelope', 'Lion', 'Zebra', np.nan])
+            >>> s.contains(pattern='a')
+            0    False
+            1    False
+            2     True
+            3      NaN
+            dtype: object
+
+            **Case sensitivity**
+
+            With `case_sensitive` set to `False` we can match `a` with both
+            `a` and `A`:
+
+            >>> s.contains(pattern='a', case_sensitive=False)
+            0     True
+            1    False
+            2     True
+            3      NaN
+            dtype: object
+
+            **Missing values**
+
+            We can fill missing values in the output using the `na` parameter:
+
+            >>> s.contains(pattern='a', na=False)
+            0    False
+            1    False
+            2     True
+            3    False
+            dtype: bool
+
 **Bad:**
 
 .. code-block:: python
 
-    def method():
+    def method(foo=None, bar=None):
         """
         A sample DataFrame method.
 
         Do not import numpy and pandas.
 
-        Try to use meaningful data, when it adds value.
+        Try to use meaningful data, when it makes the example easier
+        to understand.
+
+        Try to avoid positional arguments like in `df.method(1)`. They
+        can be all right if previously defined with a meaningful name,
+        like in `present_value(interest_rate)`, but avoid them otherwise.
+
+        When presenting the behavior with different parameters, do not place
+        all the calls one next to the other. Instead, add a short sentence
+        explaining what the example shows.
 
         Examples
         --------
         >>> import numpy as np
         >>> import pandas as pd
         >>> df = pd.DataFrame(numpy.random.randn(3, 3),
         ...                   columns=('a', 'b', 'c'))
+        >>> df.method(1)
+        21
+        >>> df.method(bar=14)
+        123
         """
         pass
+
+
+.. _docstring.example_plots:
+
+Plots in examples
+^^^^^^^^^^^^^^^^^
+
+There are some methods in pandas returning plots. To render the plots generated
+by the examples in the documentation, the `.. plot::` directive exists.
+
+To use it, place the next code after the "Examples" header as shown below. The
+plot will be generated automatically when building the documentation.
+
+.. code-block:: python
+
+    class Series:
+        def plot(self):
+            """
+            Generate a plot with the `Series` data.
+
+            Examples
+            --------
+
+            .. plot::
+                :context: close-figs
+
+                >>> s = pd.Series([1, 2, 3])
+                >>> s.plot()
+            """
+            pass
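
Several of the conventions revised in this diff can be seen together in one small docstring: an infinitive-verb short summary, the "str, default 'blue'" form for defaults, "str, optional" for a `None` that only means "not provided", keyword arguments in the examples, and a bold header for a second example block. The sketch below is only an illustration under those assumptions. The function `describe_plot` and its parameters are hypothetical and are not part of pandas.

```python
def describe_plot(kind, color='blue', title=None):
    """
    Describe a plot as a short string.

    Hypothetical helper used only to illustrate the docstring
    conventions; it is not part of pandas.

    Parameters
    ----------
    kind : str
        Kind of plot, for example 'line' or 'bar'.
    color : str, default 'blue'
        Color name or rgb code.
    title : str, optional
        Title of the plot. If not provided, no title is included.

    Returns
    -------
    str
        Description of the plot.

    Examples
    --------
    >>> describe_plot(kind='line')
    'blue line plot'

    **Title**

    With `title` we can add a title to the description:

    >>> describe_plot(kind='bar', color='red', title='Sales')
    'red bar plot titled Sales'
    """
    # Build the base description from the two always-present values.
    description = f"{color} {kind} plot"
    # `None` here only means "no title given", hence "str, optional" above.
    if title is not None:
        description += f" titled {title}"
    return description
```

Because the examples are written as doctests, saving the sketch to a file and running `python -m doctest` on it should check that the documented output matches what the function returns.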