1
- .. _pandas_docstring:
1
+ .. _docstring:
2
2
3
- ====================================
4
- How to write a good pandas docstring
5
- ====================================
3
+ ======================
4
+ pandas docstring guide
5
+ ======================
6
6
7
7
About docstrings and standards
8
8
------------------------------
@@ -38,6 +38,10 @@ Next example gives an idea on how a docstring looks like:
38
38
int
39
39
The sum of `num1` and `num2`.
40
40
41
+ See Also
42
+ --------
43
+ subtract : Subtract one integer from another.
44
+
41
45
Examples
42
46
--------
43
47
>>> add(2, 2)
@@ -56,11 +60,12 @@ The first conventions every Python docstring should follow are defined in
56
60
`PEP-257 <https://www.python.org/dev/peps/pep-0257/>`_.
57
61
58
62
As PEP-257 is quite open, some other standards exist on top of it. In the
59
- case of pandas, the numpy docstring convention is followed. There are two main
60
- documents that explain this convention:
63
+ case of pandas, the numpy docstring convention is followed. The convention is
64
+ explained in this document:
61
65
62
- - `Guide to NumPy/SciPy documentation <https://github.com/numpy/numpy/blob/master/doc/HOWTO_DOCUMENT.rst.txt>`_
63
66
- `numpydoc docstring guide <http://numpydoc.readthedocs.io/en/latest/format.html>`_
67
+ (which is based on the original `Guide to NumPy/SciPy documentation
+ <https://github.com/numpy/numpy/blob/master/doc/HOWTO_DOCUMENT.rst.txt>`_)
64
69
65
70
numpydoc is a Sphinx extension to support the numpy docstring convention.
66
71
@@ -75,9 +80,13 @@ about reStructuredText can be found in:
75
80
The rest of this document will summarize all the above guides, and will
76
81
provide additional conventions specific to the pandas project.
77
82
83
+ .. _docstring.tutorial:
84
+
78
85
Writing a docstring
79
86
-------------------
80
87
88
+ .. _docstring.general:
89
+
81
90
General rules
82
91
~~~~~~~~~~~~~
83
92
@@ -124,6 +133,8 @@ opening quotes (not in the next line). The closing quotes have their own line
124
133
bar = 2
125
134
return foo + bar
126
135
136
+ .. _docstring.short_summary:
137
+
127
138
Section 1: Short summary
128
139
~~~~~~~~~~~~~~~~~~~~~~~~
129
140
@@ -178,6 +189,8 @@ details.
178
189
"""
179
190
pass
180
191
192
+ .. _docstring.extended_summary:
193
+
181
194
Section 2: Extended summary
182
195
~~~~~~~~~~~~~~~~~~~~~~~~~~~
183
196
@@ -203,6 +216,8 @@ every paragraph in the extended summary is finished by a dot.
203
216
"""
204
217
pass
205
218
219
+ .. _docstring.parameters:
220
+
206
221
Section 3: Parameters
207
222
~~~~~~~~~~~~~~~~~~~~~
208
223
@@ -223,12 +238,19 @@ required to have a line with the parameter description, which is indented, and
223
238
can have multiple lines. The description must start with a capital letter, and
224
239
finish with a dot.
225
240
241
+ For keyword arguments with a default value, the default will be listed in
+ parentheses at the end of the description (before the dot). The exact form of
+ the description in this case would be "Description of the arg (default is X).".
+ In some cases it may be useful to explain what the default argument means,
+ which can be added after a comma: "Description of the arg (default is -1,
+ which means all cpus).".
247
+
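As a minimal sketch of the default-value convention described above, the following docstring uses the longer form that also explains what the default means. The function `count_rows` and its `n_jobs` parameter are hypothetical and exist only for illustration.

```python
def count_rows(rows, n_jobs=-1):
    """Count the rows in a collection.

    Parameters
    ----------
    rows : list
        Rows to count.
    n_jobs : int
        Number of cpus to use (default is -1, which means all cpus).

    Returns
    -------
    int
        Number of rows.
    """
    # Hypothetical function: n_jobs is accepted only so that the docstring
    # can document a keyword argument with a default; it is not used.
    return len(rows)
```

The default appears inside the parentheses and before the closing dot, so the description still ends with a dot.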
226
248
**Good:**
227
249
228
250
.. code-block:: python
229
251
230
252
class Series:
231
- def plot(self, kind, **kwargs):
253
+ def plot(self, kind, color='blue', **kwargs):
"""Generate a plot.
233
255
234
256
Render the data in the Series as a matplotlib plot of the
@@ -238,6 +260,8 @@ finish with a dot.
238
260
----------
239
261
kind : str
240
262
Kind of matplotlib plot.
263
+ color : str
264
+ Color name or rgb code (default is 'blue').
241
265
**kwargs
242
266
These parameters will be passed to the matplotlib plotting
243
267
function.
@@ -272,6 +296,8 @@ finish with a dot.
272
296
"""
273
297
pass
274
298
299
+ .. _docstring.parameter_types:
300
+
275
301
Parameter types
276
302
^^^^^^^^^^^^^^^
277
303
@@ -281,6 +307,7 @@ directly:
281
307
- int
282
308
- float
283
309
- str
310
+ - bool
284
311
285
312
For complex types, define the subtypes:
286
313
@@ -290,7 +317,8 @@ For complex types, define the subtypes:
290
317
- set of {str}
291
318
292
319
If only a set of values is allowed, list them in curly brackets
293
- and separated by commas (followed by a space):
320
+ and separated by commas (followed by a space). If one of them is the default
321
+ value of a keyword argument, it should be listed first:
294
322
295
323
- {0, 10, 25}
296
324
- {'simple', 'advanced'}
@@ -306,10 +334,21 @@ If the type is in a package, the module must be also specified:
306
334
- numpy.ndarray
307
335
- scipy.sparse.coo_matrix
308
336
309
- If the type is a pandas type, also specify pandas:
337
+ If the type is a pandas type, also specify pandas except for Series and
338
+ DataFrame:
339
+
340
+ - Series
341
+ - DataFrame
342
+ - pandas.Index
343
+ - pandas.Categorical
344
+ - pandas.SparseArray
310
345
311
- - pandas.Series
312
- - pandas.DataFrame
346
+ If the exact type is not relevant, but must be compatible with a numpy
347
+ array, array-like can be specified. If any type that can be iterated is
348
+ accepted, iterable can be used:
349
+
350
+ - array-like
351
+ - iterable
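The two generic type names above could be used as in the following sketch. The function `label_values` is hypothetical and written in plain Python so that it runs without pandas or numpy installed.

```python
def label_values(data, labels):
    """Pair each value with a label.

    Parameters
    ----------
    data : array-like
        Values to label.
    labels : iterable
        Labels, one per value.

    Returns
    -------
    list of tuple
        Pairs of (label, value), in the order of `data`.
    """
    # Hypothetical helper: any iterable works for `labels`, including
    # a generator or an iterator over a string.
    return list(zip(labels, data))
```

For instance, `label_values([1, 2], iter("ab"))` accepts an iterator for `labels`, which is exactly what the `iterable` type promises.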
313
352
314
353
If more than one type is accepted, separate them by commas, except the
315
354
last two types, which need to be separated by the word 'or':
@@ -321,6 +360,8 @@ last two types, that need to be separated by the word 'or':
321
360
If None is one of the accepted values, it always needs to be the last in
322
361
the list.
323
362
363
+ .. _docstring.returns:
364
+
324
365
Section 4: Returns or Yields
325
366
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
326
367
@@ -395,12 +436,14 @@ If the method yields its value:
395
436
while True :
396
437
yield random.random()
397
438
439
+ .. _docstring.see_also:
398
440
399
- Section 5: See also
441
+ Section 5: See Also
400
442
~~~~~~~~~~~~~~~~~~~
401
443
402
444
This is an optional section, used to let users know about pandas functionality
403
- related to the one being documented.
445
+ related to the one being documented. While optional, this section should exist
446
+ in most cases, unless no related methods or functions can be found at all.
404
447
405
448
An obvious example would be the `head()` and `tail()` methods. As `tail()` does
the equivalent of `head()` but at the end of the `Series` or `DataFrame`
@@ -421,22 +464,30 @@ examples:
421
464
* `astype` and `pandas.to_datetime`, as users may be reading the documentation
  of `astype` to know how to cast as a date, and the way to do it is with
  `pandas.to_datetime`
467
+ * `where` is related to `numpy.where`, as its functionality is based on it
424
468
425
469
When deciding what is related, you should mainly use your common sense and
426
470
think about what can be useful for the users reading the documentation,
427
471
especially the less experienced ones.
428
472
473
+ When relating to other libraries (mainly `numpy`), use the name of the module
+ first (not an alias like `np`). If the function is in a module which is not
+ the main one, like `scipy.sparse`, list the full module (e.g.
+ `scipy.sparse.coo_matrix`).
477
+
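A minimal sketch of how references to other libraries might look in a "See Also" section. The `where` function here is a hypothetical plain-Python stand-in, and the `scipy.sparse.coo_matrix` entry is included only to show the full-module naming, not because it is closely related.

```python
def where(values, cond, other):
    """Replace the values for which the condition is False.

    Parameters
    ----------
    values : list
        Original values.
    cond : list of bool
        Keep the value where True, replace it where False.
    other : int
        Replacement value.

    Returns
    -------
    list
        Values after the replacement.

    See Also
    --------
    numpy.where : Return elements chosen from two inputs depending on a
        condition.
    scipy.sparse.coo_matrix : Sparse matrix in COOrdinate format.
    """
    # Hypothetical stand-in implementation, only so the sketch is runnable.
    return [value if keep else other for value, keep in zip(values, cond)]
```

The entries use `numpy.where` rather than an alias, and the first description continues on an indented second line because it does not fit on one.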
429
478
This section, like the previous one, also has a header, "See Also" (note the
capital S and A). It is also followed by the line with hyphens, and preceded
by a blank line.
431
480
432
481
After the header, we will add a line for each related method or function,
433
482
followed by a space, a colon, another space, and a short description that
434
- illustrated what this method or function does, and why is it relevant in
435
- this context. The description must also finish with a dot.
483
+ illustrates what this method or function does, why it is relevant in this
+ context, and what the key differences are between the documented function and
+ the one being referenced. The description must also finish with a dot.
436
486
437
487
Note that in "Returns" and "Yields", the description is located on the line
following the type. But in this section it is located on the same
439
- line, with a colon in between.
489
+ line, with a colon in between. If the description does not fit on the same
+ line, it can continue on the following ones, but these must be indented.
440
491
441
492
For example:
442
493
@@ -449,9 +500,9 @@ For example:
449
500
This function is mainly useful to preview the values of the
450
501
Series without displaying the whole of it.
451
502
452
- Return
453
- ------
454
- pandas.Series
503
+ Returns
504
+ -------
505
+ Series
455
506
Subset of the original series with the 5 first values.
456
507
457
508
See Also
@@ -460,6 +511,8 @@ For example:
460
511
"""
461
512
return self.iloc[:5]
462
513
514
+ .. _docstring.notes:
515
+
463
516
Section 6: Notes
464
517
~~~~~~~~~~~~~~~~
465
518
@@ -472,16 +525,18 @@ examples for the function.
472
525
473
526
This section follows the same format as the extended summary section.
474
527
528
+ .. _docstring.examples:
529
+
475
530
Section 7: Examples
476
531
~~~~~~~~~~~~~~~~~~~
477
532
478
533
This is one of the most important sections of a docstring, even if it is
placed in the last position, as people often understand concepts better
with examples than with accurate explanations.
481
536
482
- Examples in docstrings are also unit tests, and besides illustrating the
483
- usage of the function or method, they need to be valid Python code, that in a
484
- deterministic way returns the presented output.
537
+ Examples in docstrings, besides illustrating the usage of the function or
+ method, must be valid Python code that returns the presented output in a
+ deterministic way, and that can be copied and run by users.
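One way to check that an example returns the presented output is to run it with the standard library `doctest` module. The sketch below is only an illustration, not part of the pandas tooling: the helper `count_doctest_failures` is hypothetical, and `add` mirrors the small function used earlier in this guide.

```python
import doctest


def add(num1, num2):
    """Add up two integer numbers.

    Examples
    --------
    >>> add(2, 2)
    4
    """
    return num1 + num2


def count_doctest_failures(func):
    """Run the examples in the docstring of `func` and count the failures."""
    parser = doctest.DocTestParser()
    # Make the function available by name inside its own examples.
    globs = {func.__name__: func}
    test = parser.get_doctest(func.__doc__, globs, func.__name__, None, 0)
    runner = doctest.DocTestRunner(verbose=False)
    return runner.run(test).failed
```

Calling `count_doctest_failures(add)` returns the number of examples whose actual output differs from the output written in the docstring.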
485
540
486
541
They are presented as a session in the Python terminal. `>>>` is used to
present code. `...` is used for code continuing from the previous line.
@@ -491,14 +546,21 @@ be added with blank lines before and after them.
491
546
492
547
The way to present examples is as follows:
493
548
494
- 1. Import required libraries
549
+ 1. Import required libraries (except `numpy` and `pandas`)
495
550
496
551
2. Create the data required for the example
497
552
498
553
3. Show a very basic example that gives an idea of the most common use case
499
554
500
- 4. Add commented examples that illustrate how the parameters can be used for
501
- extended functionality
555
+ 4. Add examples with explanations that illustrate how the parameters can be
556
+ used for extended functionality
557
+
558
+ .. note::
559
+ Which data should be used in examples is a topic still under discussion.
560
+ We'll likely be importing a standard dataset from `pandas.io.samples`, but
561
+ this still needs confirmation. You can work with the data from this pull
562
+ request: https://github.com/pandas-dev/pandas/pull/19933/files but
563
+ consider this could still change.
502
564
503
565
A simple example could be:
504
566
@@ -527,9 +589,8 @@ A simple example could be:
527
589
528
590
Examples
529
591
--------
530
- >>> import pandas
531
- >>> s = pandas.Series(['Ant', 'Bear', 'Cow', 'Dog', 'Falcon',
532
- ... 'Lion', 'Monkey', 'Rabbit', 'Zebra'])
592
+ >>> s = pd.Series(['Ant', 'Bear', 'Cow', 'Dog', 'Falcon',
593
+ ... 'Lion', 'Monkey', 'Rabbit', 'Zebra'])
533
594
>>> s.head()
534
595
0 Ant
535
596
1 Bear
@@ -548,32 +609,25 @@ A simple example could be:
548
609
"""
549
610
return self.iloc[:n]
550
611
612
+ .. _docstring.example_conventions:
613
+
551
614
Conventions for the examples
552
615
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
553
616
554
- .. note::
555
- numpydoc recommends avoiding "obvious" imports and importing them with
556
- aliases, so for example `import numpy as np`. While this is now an standard
557
- in the data ecosystem of Python, it doesn't seem a good practise, for the
558
- next reasons:
559
-
560
- * The code is not executable anymore (as doctests for example)
617
+ Code in examples is assumed to always start with these two lines which are not
618
+ shown:
561
619
562
- * New users not familiar with the convention can't simply copy and run it
620
+ .. code-block:: python
563
621
564
- * Users may use aliases (even if it is a bad Python practise except
565
- in rare cases), but if maintainers want to use `pd` instead of `pandas`,
- why do not name the module `pd` directly?
622
+ import numpy as np
623
+ import pandas as pd
567
624
568
- * As this is becoming more standard, there are an increasing number of
569
- aliases in scientific Python code, including `np`, `pd`, `plt`, `sp`,
- `pm`... which makes reading code harder
571
625
572
- All examples must start with the required imports, one per line (as
626
+ Any other module used in the examples must be explicitly imported, one per line (as
573
627
recommended in `PEP-8 <https://www.python.org/dev/peps/pep-0008/#imports>`_)
574
628
and avoiding aliases. Avoid excessive imports, but if needed, imports from
575
629
the standard library go first, followed by third-party libraries (like
576
- numpy) and importing pandas in the last place.
630
+ matplotlib).
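A sketch of the import convention above: the example in the docstring uses `pd` without importing it, but imports `math` explicitly because it is neither numpy nor pandas. The function `apply_to_values` is hypothetical and only wraps `Series.apply` for illustration.

```python
def apply_to_values(s, func):
    """Apply a function to each value of a Series.

    Parameters
    ----------
    s : Series
        Values to transform.
    func : function
        Function applied to each value.

    Returns
    -------
    Series
        Transformed values.

    Examples
    --------
    >>> import math
    >>> s = pd.Series([1.0, 4.0, 9.0])
    >>> apply_to_values(s, math.sqrt)
    0    1.0
    1    2.0
    2    3.0
    dtype: float64
    """
    # Hypothetical thin wrapper around Series.apply, for illustration only.
    return s.apply(func)
```

Running the example as a doctest would require `pd` to be injected into the namespace beforehand, which is what the two hidden import lines stand for.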
577
631
578
632
When illustrating examples with a single `Series` use the name `s`, and if
illustrating with a single `DataFrame` use the name `df`. If a set of
@@ -605,11 +659,9 @@ positional arguments `head(3)`.
605
659
606
660
Examples
607
661
--------
608
- >>> import numpy
609
- >>> import pandas
610
- >>> df = pandas.DataFrame([389., 24., 80.5, numpy.nan]
611
- ... columns=('max_speed'),
612
- ... index=['falcon', 'parrot', 'lion', 'monkey'])
662
+ >>> df = pd.DataFrame([389., 24., 80.5, np.nan],
663
+ ... columns=['max_speed'],
664
+ ... index=['falcon', 'parrot', 'lion', 'monkey'])
613
665
"""
614
666
pass
615
667
@@ -622,15 +674,9 @@ positional arguments `head(3)`.
622
674
623
675
Examples
624
676
--------
625
- >>> import numpy
626
- >>> import pandas
627
- >>> df = pandas.DataFrame(numpy.random.randn(3, 3),
628
- ... columns=('a', 'b', 'c'))
677
+ >>> import numpy as np
678
+ >>> import pandas as pd
679
+ >>> df = pd.DataFrame(np.random.randn(3, 3),
680
+ ... columns=('a', 'b', 'c'))
629
681
"""
630
682
pass
631
-
632
- Once you finished the docstring
633
- -------------------------------
634
-
635
- When you finished the changes to the docstring, go to the
636
- :ref:`instructions to submit your changes <pandas_pr>` to continue.