Commit 5dab5af

Merge remote-tracking branch 'origin/master'
2 parents 8ce10e9 + a64f46f commit 5dab5af

20 files changed

+544
-425
lines changed

pandas/guide/_sources/pandas_docstring.rst.txt

+106-60
@@ -1,8 +1,8 @@
1-
.. _pandas_docstring:
1+
.. _docstring:
22

3-
====================================
4-
How to write a good pandas docstring
5-
====================================
3+
======================
4+
pandas docstring guide
5+
======================
66

77
About docstrings and standards
88
------------------------------
@@ -38,6 +38,10 @@ Next example gives an idea on how a docstring looks like:
3838
int
3939
The sum of `num1` and `num2`
4040
41+
See Also
42+
--------
43+
subtract : Subtract one integer from another.
44+
4145
Examples
4246
--------
4347
>>> add(2, 2)
@@ -56,11 +60,12 @@ The first conventions every Python docstring should follow are defined in
5660
`PEP-257 <https://www.python.org/dev/peps/pep-0257/>`_.
5761

5862
As PEP-257 is quite open, some other standards exist on top of it. In the
59-
case of pandas, the numpy docstring convention is followed. There are two main
60-
documents that explain this convention:
63+
case of pandas, the numpy docstring convention is followed. The convention is
64+
explained in this document:
6165

62-
- `Guide to NumPy/SciPy documentation <https://github.com/numpy/numpy/blob/master/doc/HOWTO_DOCUMENT.rst.txt>`_
6366
- `numpydoc docstring guide <http://numpydoc.readthedocs.io/en/latest/format.html>`_
67+
(which is based on the original `Guide to NumPy/SciPy documentation
68+
<https://github.com/numpy/numpy/blob/master/doc/HOWTO_DOCUMENT.rst.txt>`_)
6469

6570
numpydoc is a Sphinx extension to support the numpy docstring convention.
6671

@@ -75,9 +80,13 @@ about reStructuredText can be found in:
7580
The rest of this document will summarize all the above guides, and will
7681
provide additional convention specific to the pandas project.
7782

83+
.. _docstring.tutorial:
84+
7885
Writing a docstring
7986
-------------------
8087

88+
.. _docstring.general:
89+
8190
General rules
8291
~~~~~~~~~~~~~
8392

@@ -124,6 +133,8 @@ opening quotes (not in the next line). The closing quotes have their own line
124133
bar = 2
125134
return foo + bar
126135
136+
.. _docstring.short_summary:
137+
127138
Section 1: Short summary
128139
~~~~~~~~~~~~~~~~~~~~~~~~
129140

@@ -178,6 +189,8 @@ details.
178189
"""
179190
pass
180191
192+
.. _docstring.extended_summary:
193+
181194
Section 2: Extended summary
182195
~~~~~~~~~~~~~~~~~~~~~~~~~~~
183196

@@ -203,6 +216,8 @@ every paragraph in the extended summary is finished by a dot.
203216
"""
204217
pass
205218
219+
.. _docstring.parameters:
220+
206221
Section 3: Parameters
207222
~~~~~~~~~~~~~~~~~~~~~
208223

@@ -223,12 +238,19 @@ required to have a line with the parameter description, which is indented, and
223238
can have multiple lines. The description must start with a capital letter, and
224239
finish with a dot.
225240

241+
For keyword arguments with a default value, the default will be listed in brackets
242+
at the end of the description (before the dot). The exact form of the
243+
description in this case would be "Description of the arg (default is X).". In
244+
some cases it may be useful to explain what the default argument means, which
245+
can be added after a comma "Description of the arg (default is -1, which means
246+
all cpus).".
247+
226248
**Good:**
227249

228250
.. code-block:: python
229251
230252
class Series:
231-
def plot(self, kind, **kwargs):
253+
def plot(self, kind, color='blue', **kwargs):
232254
"""Generate a plot.
233255
234256
Render the data in the Series as a matplotlib plot of the
@@ -238,6 +260,8 @@ finish with a dot.
238260
----------
239261
kind : str
240262
Kind of matplotlib plot.
263+
color : str
264+
Color name or rgb code (default is 'blue').
241265
**kwargs
242266
These parameters will be passed to the matplotlib plotting
243267
function.
@@ -272,6 +296,8 @@ finish with a dot.
272296
"""
273297
pass
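As a minimal sketch of the default-value wording described in this section, the function below documents a keyword argument whose default needs a short explanation. The function `run_jobs` and its parameters are hypothetical and exist only for illustration; they are not part of pandas.

```python
def run_jobs(tasks, n_jobs=-1):
    """
    Run a list of callables and collect their results.

    Parameters
    ----------
    tasks : list of callable
        Functions to call, each taking no arguments.
    n_jobs : int
        Number of workers to use (default is -1, which means all cpus).

    Returns
    -------
    list
        Results in the same order as `tasks`.
    """
    # Illustration only: the worker count is documented but the sketch
    # simply runs every task sequentially.
    return [task() for task in tasks]
```

The default appears in brackets at the end of the description, before the final dot, and its meaning follows after a comma.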
274298
299+
.. _docstring.parameter_types:
300+
275301
Parameter types
276302
^^^^^^^^^^^^^^^
277303

@@ -281,6 +307,7 @@ directly:
281307
- int
282308
- float
283309
- str
310+
- bool
284311

285312
For complex types, define the subtypes:
286313

@@ -290,7 +317,8 @@ For complex types, define the subtypes:
290317
- set of {str}
291318

292319
In case there are just a set of values allowed, list them in curly brackets
293-
and separated by commas (followed by a space):
320+
and separated by commas (followed by a space). If one of them is the default
321+
value of a keyword argument, it should be listed first:
294322

295323
- {0, 10, 25}
296324
- {'simple', 'advanced'}
@@ -306,10 +334,21 @@ If the type is in a package, the module must be also specified:
306334
- numpy.ndarray
307335
- scipy.sparse.coo_matrix
308336

309-
If the type is a pandas type, also specify pandas:
337+
If the type is a pandas type, also specify pandas except for Series and
338+
DataFrame:
339+
340+
- Series
341+
- DataFrame
342+
- pandas.Index
343+
- pandas.Categorical
344+
- pandas.SparseArray
310345

311-
- pandas.Series
312-
- pandas.DataFrame
346+
If the exact type is not relevant, but must be compatible with a numpy
347+
array, array-like can be specified. If any type that can be iterated is
348+
accepted, iterable can be used:
349+
350+
- array-like
351+
- iterable
313352

314353
If more than one type is accepted, separate them by commas, except the
315354
last two types, that need to be separated by the word 'or':
@@ -321,6 +360,8 @@ last two types, that need to be separated by the word 'or':
321360
If None is one of the accepted values, it always needs to be the last in
322361
the list.
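The type conventions above can be sketched in a single docstring. The function `summarize` is hypothetical and uses only the standard library; it shows `iterable`, a set of allowed values with the default listed first, and a list of types ending in None. It does not cover the pandas-specific names such as `Series` or `pandas.Index`.

```python
def summarize(values, mode='simple', weights=None):
    """
    Summarize a collection of numbers.

    Parameters
    ----------
    values : iterable
        Numbers to summarize.
    mode : {'simple', 'advanced'}
        Level of detail of the summary (default is 'simple').
    weights : list of float or None
        Weight of each value (default is None, which means equal weights).

    Returns
    -------
    dict of {str : float}
        Summary statistics keyed by name.
    """
    values = list(values)
    if weights is None:
        weights = [1.0] * len(values)
    # Weighted mean; with equal weights this is the plain mean.
    mean = sum(v * w for v, w in zip(values, weights)) / sum(weights)
    summary = {'mean': mean}
    if mode == 'advanced':
        summary['min'] = min(values)
        summary['max'] = max(values)
    return summary
```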
323362

363+
.. _docstring.returns:
364+
324365
Section 4: Returns or Yields
325366
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
326367

@@ -395,12 +436,14 @@ If the method yields its value:
395436
while True:
396437
yield random.random()
397438
439+
.. _docstring.see_also:
398440

399-
Section 5: See also
441+
Section 5: See Also
400442
~~~~~~~~~~~~~~~~~~~
401443

402444
This is an optional section, used to let users know about pandas functionality
403-
related to the one being documented.
445+
related to the one being documented. While optional, this section should exist
446+
in most cases, unless no related methods or functions can be found at all.
404447

405448
An obvious example would be the `head()` and `tail()` methods. As `tail()` does
406449
the equivalent as `head()` but at the end of the `Series` or `DataFrame`
@@ -421,22 +464,30 @@ examples:
421464
* `astype` and `pandas.to_datetime`, as users may be reading the documentation
422465
of `astype` to know how to cast as a date, and the way to do it is with
423466
`pandas.to_datetime`
467+
* `where` is related to `numpy.where`, as its functionality is based on it
424468

425469
When deciding what is related, you should mainly use your common sense and
426470
think about what can be useful for the users reading the documentation,
427471
especially the less experienced ones.
428472

473+
When relating to other libraries (mainly `numpy`), use the name of the module
474+
first (not an alias like `np`). If the function is in a module which is not
475+
the main one, like `scipy.sparse`, list the full module (e.g.
476+
`scipy.sparse.coo_matrix`).
477+
429478
This section, as the previous, also has a header, "See Also" (note the capital
430479
S and A). Also followed by the line with hyphens, and preceded by a blank line.
431480

432481
After the header, we will add a line for each related method or function,
433482
followed by a space, a colon, another space, and a short description that
434-
illustrated what this method or function does, and why is it relevant in
435-
this context. The description must also finish with a dot.
483+
illustrates what this method or function does, why it is relevant in this
484+
context, and what the key differences are between the documented function and
485+
the one being referenced. The description must also finish with a dot.
436486

437487
Note that in "Returns" and "Yields", the description is located on the line
438488
following the type. But in this section it is located on the same
439-
line, with a colon in between.
489+
line, with a colon in between. If the description does not fit in the same
490+
line, it can continue in the next ones, but it has to be indented in them.
440491

441492
For example:
442493

@@ -449,9 +500,9 @@ For example:
449500
This function is mainly useful to preview the values of the
450501
Series without displaying the whole of it.
451502
452-
Return
453-
------
454-
pandas.Series
503+
Returns
504+
-------
505+
Series
455506
Subset of the original series with the 5 first values.
456507
457508
See Also
@@ -460,6 +511,8 @@ For example:
460511
"""
461512
return self.iloc[:5]
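A further sketch, assuming a hypothetical list-based `where` function that is not the pandas method: the "See Also" entry refers to another library by its full module name, states the key difference, and indents the continuation line of a description that does not fit on one line.

```python
def where(values, cond, other=None):
    """
    Replace the values for which the condition is False.

    Parameters
    ----------
    values : list
        Values to filter.
    cond : list of bool
        Values are kept where True and replaced where False.
    other : object or None
        Replacement for the values where `cond` is False.

    Returns
    -------
    list
        List of the same length as `values`.

    See Also
    --------
    numpy.where : Return elements chosen from two arrays depending on a
        condition. Unlike this function, it takes the condition first.
    """
    return [v if keep else other for v, keep in zip(values, cond)]
```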
462513
514+
.. _docstring.notes:
515+
463516
Section 6: Notes
464517
~~~~~~~~~~~~~~~~
465518

@@ -472,16 +525,18 @@ examples for the function.
472525

473526
This section follows the same format as the extended summary section.
474527

528+
.. _docstring.examples:
529+
475530
Section 7: Examples
476531
~~~~~~~~~~~~~~~~~~~
477532

478533
This is one of the most important sections of a docstring, even if it is
479534
placed in the last position. People often understand concepts better
480535
with examples than with accurate explanations.
481536

482-
Examples in docstrings are also unit tests, and besides illustrating the
483-
usage of the function or method, they need to be valid Python code, that in a
484-
deterministic way returns the presented output.
537+
Examples in docstrings, besides illustrating the usage of the function or
538+
method, must be valid Python code that returns the presented output in a
539+
deterministic way, and that can be copied and run by users.
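One way to check that an example returns the presented output is to run it with the standard library's `doctest` module. The sketch below reuses the `add` function from the start of this guide; the `check_examples` helper is hypothetical and is not part of pandas or numpydoc.

```python
import doctest


def add(num1, num2):
    """
    Add up two integer numbers.

    Parameters
    ----------
    num1 : int
        First number to add.
    num2 : int
        Second number to add.

    Returns
    -------
    int
        The sum of `num1` and `num2`.

    Examples
    --------
    >>> add(2, 2)
    4
    >>> add(25, 0)
    25
    """
    return num1 + num2


def check_examples(func):
    """Run the doctest examples of `func` and return (failed, attempted)."""
    finder = doctest.DocTestFinder()
    runner = doctest.DocTestRunner(verbose=False)
    failed = attempted = 0
    # The examples only see the names passed in `globs`.
    for test in finder.find(func, globs={func.__name__: func}):
        result = runner.run(test)
        failed += result.failed
        attempted += result.attempted
    return failed, attempted
```

Calling `check_examples(add)` returns `(0, 2)`: no failures out of two examples. For examples that rely on the implicit `np` and `pd` names, the same `globs` dictionary could carry those modules, assuming they are installed.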
485540

486541
They are presented as a session in the Python terminal. `>>>` is used to
487542
present code. `...` is used for code continuing from the previous line.
@@ -491,14 +546,21 @@ be added with blank lines before and after them.
491546

492547
The way to present examples is as follows:
493548

494-
1. Import required libraries
549+
1. Import required libraries (except `numpy` and `pandas`)
495550

496551
2. Create the data required for the example
497552

498553
3. Show a very basic example that gives an idea of the most common use case
499554

500-
4. Add commented examples that illustrate how the parameters can be used for
501-
extended functionality
555+
4. Add examples with explanations that illustrate how the parameters can be
556+
used for extended functionality
557+
558+
.. note::
559+
Which data should be used in examples is a topic still under discussion.
560+
We'll likely be importing a standard dataset from `pandas.io.samples`, but
561+
this still needs confirmation. You can work with the data from this pull
562+
request: https://github.com/pandas-dev/pandas/pull/19933/files but
563+
consider this could still change.
502564

503565
A simple example could be:
504566

@@ -527,9 +589,8 @@ A simple example could be:
527589
528590
Examples
529591
--------
530-
>>> import pandas
531-
>>> s = pandas.Series(['Ant', 'Bear', 'Cow', 'Dog', 'Falcon',
532-
... 'Lion', 'Monkey', 'Rabbit', 'Zebra'])
592+
>>> s = pd.Series(['Ant', 'Bear', 'Cow', 'Dog', 'Falcon',
593+
... 'Lion', 'Monkey', 'Rabbit', 'Zebra'])
533594
>>> s.head()
534595
0 Ant
535596
1 Bear
@@ -548,32 +609,25 @@ A simple example could be:
548609
"""
549610
return self.iloc[:n]
550611
612+
.. _docstring.example_conventions:
613+
551614
Conventions for the examples
552615
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
553616

554-
.. note::
555-
numpydoc recommends avoiding "obvious" imports and importing them with
556-
aliases, so for example `import numpy as np`. While this is now an standard
557-
in the data ecosystem of Python, it doesn't seem a good practise, for the
558-
next reasons:
559-
560-
* The code is not executable anymore (as doctests for example)
617+
Code in examples is assumed to always start with these two lines which are not
618+
shown:
561619

562-
* New users not familiar with the convention can't simply copy and run it
620+
.. code-block:: python
563621
564-
* Users may use aliases (even if it is a bad Python practise except
565-
in rare cases), but if maintainers want to use `pd` instead of `pandas`,
566-
why do not name the module `pd` directly?
622+
import numpy as np
623+
import pandas as pd
567624
568-
* As this is becoming more standard, there are an increasing number of
569-
aliases in scientific Python code, including `np`, `pd`, `plt`, `sp`,
570-
`pm`... which makes reading code harder
571625
572-
All examples must start with the required imports, one per line (as
626+
Any other module used in the examples must be explicitly imported, one per line (as
573627
recommended in `PEP-8 <https://www.python.org/dev/peps/pep-0008/#imports>`_)
574628
and avoiding aliases. Avoid excessive imports, but if needed, imports from
575629
the standard library go first, followed by third-party libraries (like
576-
numpy) and importing pandas in the last place.
630+
matplotlib).
577631

578632
When illustrating examples with a single `Series` use the name `s`, and if
579633
illustrating with a single `DataFrame` use the name `df`. If a set of
@@ -605,11 +659,9 @@ positional arguments `head(3)`.
605659
606660
Examples
607661
--------
608-
>>> import numpy
609-
>>> import pandas
610-
>>> df = pandas.DataFrame([389., 24., 80.5, numpy.nan]
611-
... columns=('max_speed'),
612-
... index=['falcon', 'parrot', 'lion', 'monkey'])
662+
>>> df = pd.DataFrame([389., 24., 80.5, np.nan],
663+
... columns=['max_speed'],
664+
... index=['falcon', 'parrot', 'lion', 'monkey'])
613665
"""
614666
pass
615667
@@ -622,15 +674,9 @@ positional arguments `head(3)`.
622674
623675
Examples
624676
--------
625-
>>> import numpy
626-
>>> import pandas
627-
>>> df = pandas.DataFrame(numpy.random.randn(3, 3),
628-
... columns=('a', 'b', 'c'))
677+
>>> import numpy as np
678+
>>> import pandas as pd
679+
>>> df = pd.DataFrame(np.random.randn(3, 3),
680+
... columns=('a', 'b', 'c'))
629681
"""
630682
pass
631-
632-
Once you finished the docstring
633-
-------------------------------
634-
635-
When you finished the changes to the docstring, go to the
636-
:ref:`instructions to submit your changes <pandas_pr>` to continue.
