Skip to content

DOC: general discussion of our docstring sharing system #19932

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
jorisvandenbossche opened this issue Feb 27, 2018 · 7 comments
Open

DOC: general discussion of our docstring sharing system #19932

jorisvandenbossche opened this issue Feb 27, 2018 · 7 comments
Labels
Docs Needs Discussion Requires discussion from core team before further action

Comments

@jorisvandenbossche
Copy link
Member

Currently in many places we write docstrings as string values inside a _shared_docs dictionary, and then add this docstring to the method with a decorator like @Appender(_shared_docs['where']), or possibly with substitution of variables in the template docstring with @Appender(_shared_docs['where'] % _shared_doc_kwargs).
Apart from that, we have some other custom ways to generate them, typically for a specific set of methods (eg all statistical methods like mean, sum, median, ..).

This system is important to not duplicate content too much, but also has some drawbacks:

  • I think it is a rather complex system for contributors to understand. It can be hard to know where to find the method's docstring, or how to edit it appropriately, and how it will look like eventually.
  • It gives rise to some errors (which can of course be solved). For example:
    • left-overs from shared_kwargs of parent docstrings that have not been substituted correctly. Eg NDFrame should never occur in the docs, but does quite a lot (eg DataFrame.where docstring)
    • left-overs of the template where we forgot to substitute at all (eg DataFrame.rename docstring)
  • We have examples that are not specific for the function (eg DataFrame.mask docstring with DataFrame.where examples) or for the object (eg Index.searchsorted docstring with Series examples).
    And in general in can be hard/complicated to add good examples, as those are typically specific to one method, so you need to write them in separate strings and then add to the appropriate docstring.

So therefore opening this issue, to gather ideas on whether we can improve this workflow, without sacrificing the deduplication and flexibility of the current system?

@jorisvandenbossche jorisvandenbossche added this to the doc milestone Feb 27, 2018
@jorisvandenbossche jorisvandenbossche removed this from the doc milestone Feb 27, 2018
@jorisvandenbossche
Copy link
Member Author

Some first thoughts:

  • Better document the workflow for contributors (this is actually issue DOC: Explain how our shared_docs system works #16446)
  • Remove % _shared_doc_kwargs when there is nothing to substitute, and only put values inside _shared_doc_kwargs if they are used multiple times: if it is only for one method, write the string there where it is substituted.
  • I was wondering, would @Appender(NDFrame.sort_values) be more clear for contributors compared to @Appender(generic._shared_docs['sort_values']) ? That would also mean that in generic.py the docstring can be written as a normal docstring (after the def line instead of before)
  • We use both @Substitution(...) and % _shared_doc_kwargs (within an Appender decorator). Might be good to harmonize this.

@jorisvandenbossche
Copy link
Member Author

A typical occurrence of a problem is a function that only lives in generic.py, and has a template with substituted values. But since it only lives in generic.py, it only gets substituted with 'NDFrame' et al.
Examples of this are pipe (see eg https://github.com/pandas-dev/pandas/blob/master/pandas/core/generic.py#L3929), pct_change, mask, where, set_axis.

One possibility is to actually add them to frame.py and series.py, just to call the super method but to fill in the template. Like eg is done for isna and related methods:

pandas/pandas/core/frame.py

Lines 3482 to 3484 in 52559f5

@Appender(_shared_docs['isna'] % _shared_doc_kwargs)
def isna(self):
return super(DataFrame, self).isna()

Other possibility is to update the docstring to just be generic (replace 'klass' with Series/DataFrame). Although then we would also need to add examples of both.

Last possibility would also be to do this template filling without overriding the actual method, but only filling the docstring with eg

DataFrame.isna.__doc__ = DataFrame.isna.__doc__ % _shared_doc_kwargs

(or go with some fancy metaclassing to do this automatically ...)

cc @jreback @TomAugspurger

@jorisvandenbossche
Copy link
Member Author

Another aspect to consider: to what extent do we necessarily want separate docstrings for Series/DataFrame?
For certain methods it might be fine to just have a common docstring (without substituting things), with examples that show both Series and DataFrame?

Anyway, some feedback on all the above thoughts is welcome :-)

@TomAugspurger
Copy link
Contributor

TomAugspurger commented Mar 4, 2018 via email

@jorisvandenbossche
Copy link
Member Author

Any opinions on this:

I was wondering, would @Appender(NDFrame.sort_values) be more clear for contributors compared to @Appender(generic._shared_docs['sort_values']) ? That would also mean that in generic.py the docstring can be written as a normal docstring (after the def line instead of before)

@TomAugspurger
Copy link
Contributor

Yes, I support that. I started the shared doc guide this morning, and noticed what you meant by that.

@max-sixty
Copy link
Contributor

FYI at https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.set_axis.html there's:

Whether to return a new %(klass)s instance.

@mroeschke mroeschke added Closing Candidate May be closeable, needs more eyeballs Needs Discussion Requires discussion from core team before further action and removed Closing Candidate May be closeable, needs more eyeballs labels Jun 19, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Docs Needs Discussion Requires discussion from core team before further action
Projects
None yet
Development

No branches or pull requests

4 participants