Skip to content

CLN: Replace Appender and Substitution with simpler doc decorator #31060

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 15 commits into from
Feb 12, 2020
Merged
Show file tree
Hide file tree
Changes from 2 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
63 changes: 29 additions & 34 deletions pandas/core/algorithms.py
Original file line number Diff line number Diff line change
Expand Up @@ -10,7 +10,7 @@

from pandas._libs import Timestamp, algos, hashtable as htable, lib
from pandas._libs.tslib import iNaT
from pandas.util._decorators import Appender, Substitution
from pandas.util._decorators import doc

from pandas.core.dtypes.cast import (
construct_1d_object_array_from_listlike,
Expand Down Expand Up @@ -480,9 +480,32 @@ def _factorize_array(
return codes, uniques


_shared_docs[
"factorize"
] = """
@doc(
values=dedent(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be possible to get rid of the need for those dedent calls ? (thinking about how to make this even easier)

I suppose one option is writing those dedented. Something like

@doc(
    values=(
        """\
values : sequence
    A 1-D sequence. Sequences that aren't pandas objects are
    coerced to ndarrays before factorization.
        """), 
...
)
def factorize(...

but we might consider this ugly? (and does Black allow it?)

And in another this dedent could be handled by the decorator?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 on adding indentation "magic" to the decorator. Not a big fan of automatic things, but I think making the docstring injection as simple as possible, so the important code is more readable, makes it worth.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is a good idea to add this "magic" to decorator so we can keep doc simple.

I am considering change from

wrapper.__doc__ = wrapper._docstr_template.format(**kwargs)  # type: ignore

to add dedentto all kwargs like this

wrapper.__doc__ = wrapper._docstr_template.format(**{k: dedent(v) for k, v in kwargs.items()})  # type: ignore

Any thoughts? Comments? Please.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That sounds good. I am only wondering if there are some cases where the to-be-inserted string starts with a space that we want to keep (to let it fit in the docstring)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are some cases where we don't do a dedent call, like in https://github.com/pandas-dev/pandas/blob/master/pandas/core/groupby/groupby.py#L66. So you might need to ensure something automatic works there as well.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That sounds good. I am only wondering if there are some cases where the to-be-inserted string starts with a space that we want to keep (to let it fit in the docstring)

Good point. The changes we made here should not change the actual docstring. Since there are mixed cases of usage, it seems hard for me to get a good solution for all of them. At least nothing comes out in my mind now.

I like the "magic", but for now, maybe it is better to keep it without a dedent call. Then, we can manually control what shows there. Also, if we do not add dedent we can change the use of Appender and Substitution to doc easily and straightforward, because the doc's behave is very close to simply combine Appender and Substitution together.

How about we just keep this unchanged now, and we can come back later when we have converted most decorator usages to doc? At that time, we might have more information to decided what could be the best way of solving that.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good, I also think it's better to start simple, see how things work, and add more features later.

I think adding the automatic indentation will be worth, since it'll make things much simpler when documenting, and it's not so complex to implement in the decorator. But let's forget about it for now, this is already a nice improvement.

Thanks!

"""\
values : sequence
A 1-D sequence. Sequences that aren't pandas objects are
coerced to ndarrays before factorization.
"""
),
sort=dedent(
"""\
sort : bool, default False
Sort `uniques` and shuffle `codes` to maintain the
relationship.
"""
),
size_hint=dedent(
"""\
size_hint : int, optional
Hint to the hashtable sizer.
"""
),
)
def factorize(
values, sort: bool = False, na_sentinel: int = -1, size_hint: Optional[int] = None
) -> Tuple[np.ndarray, Union[np.ndarray, ABCIndex]]:
"""
Encode the object as an enumerated type or categorical variable.

This method is useful for obtaining a numeric representation of an
Expand All @@ -492,10 +515,10 @@ def _factorize_array(

Parameters
----------
%(values)s%(sort)s
{values}{sort}
na_sentinel : int, default -1
Value to mark "not found".
%(size_hint)s\
{size_hint}\

Returns
-------
Expand Down Expand Up @@ -573,34 +596,6 @@ def _factorize_array(
>>> uniques
Index(['a', 'c'], dtype='object')
"""


@Substitution(
values=dedent(
"""\
values : sequence
A 1-D sequence. Sequences that aren't pandas objects are
coerced to ndarrays before factorization.
"""
),
sort=dedent(
"""\
sort : bool, default False
Sort `uniques` and shuffle `codes` to maintain the
relationship.
"""
),
size_hint=dedent(
"""\
size_hint : int, optional
Hint to the hashtable sizer.
"""
),
)
@Appender(_shared_docs["factorize"])
def factorize(
values, sort: bool = False, na_sentinel: int = -1, size_hint: Optional[int] = None
) -> Tuple[np.ndarray, Union[np.ndarray, ABCIndex]]:
# Implementation notes: This method is responsible for 3 things
# 1.) coercing data to array-like (ndarray, Index, extension array)
# 2.) factorizing codes and uniques
Expand Down
6 changes: 3 additions & 3 deletions pandas/core/base.py
Original file line number Diff line number Diff line change
Expand Up @@ -12,7 +12,7 @@
from pandas.compat import PYPY
from pandas.compat.numpy import function as nv
from pandas.errors import AbstractMethodError
from pandas.util._decorators import Appender, Substitution, cache_readonly
from pandas.util._decorators import Appender, Substitution, cache_readonly, doc
from pandas.util._validators import validate_bool_kwarg

from pandas.core.dtypes.cast import is_nested_object
Expand Down Expand Up @@ -1389,7 +1389,8 @@ def memory_usage(self, deep=False):
v += lib.memory_usage_of_objects(self.array)
return v

@Substitution(
@doc(
algorithms.factorize,
values="",
order="",
size_hint="",
Expand All @@ -1401,7 +1402,6 @@ def memory_usage(self, deep=False):
"""
),
)
@Appender(algorithms._shared_docs["factorize"])
def factorize(self, sort=False, na_sentinel=-1):
return algorithms.factorize(self, sort=sort, na_sentinel=na_sentinel)

Expand Down
30 changes: 30 additions & 0 deletions pandas/util/_decorators.py
Original file line number Diff line number Diff line change
Expand Up @@ -247,6 +247,36 @@ def wrapper(*args, **kwargs) -> Callable[..., Any]:
return decorate


def doc(*args: Union[str, Callable], **kwargs: str) -> Callable:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
def doc(*args: Union[str, Callable], **kwargs: str) -> Callable:
def doc(*args: Union[str, Callable], **kwargs: str) -> Callable[[F], F]:

"""
A decorator take docstring templates, concatenate them and perform string
substitution on it.

This decorator should be robust even if func.__doc__ is None.
"""

def decorator(func: Callable) -> Callable:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should use a TypeVar here - I think @simonjayhawkins might have done something similar with a Callable in another module

@wraps(func)
def wrapper(*args, **kwargs) -> Callable:
return func(*args, **kwargs)

templates = [func.__doc__ if func.__doc__ else ""]
for arg in args:
if isinstance(arg, str):
templates.append(arg)
elif hasattr(arg, "_docstr_template"):
templates.append(arg._docstr_template) # type: ignore
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What are all of the type: ignore comments for?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems mypy didn't let us declare and use attributes of function objects. They have a issue for this, but seems there are no official solutions.

We might be able to use some trick to handle this, but I am not sure if it worth. It will introducing more "unrelated" code. Also, this call of _docstr_template has been checked by hasattr, so I think it is safe here.

Just to be clear, I am open to make the adjustment to avoid this ignore comment.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think just adding https://github.com/python/mypy/issues/2087 as a comment on a line preceding the ignores would be fine.

elif arg.__doc__:
templates.append(arg.__doc__)

wrapper._docstr_template = "".join(dedent(t) for t in templates) # type: ignore
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you explain / add a comment why the _docstr_template is needed?

Copy link
Contributor Author

@HH-MWB HH-MWB Jan 20, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you explain / add a comment why the _docstr_template is needed?

The doc decorator will format docstring template and use the formatted string to replace __doc__. Using the decorator will cause us to lose the original template.

Take this fake code as an example:

@doc(method='cummax', operation='maximum')
def cummax(whatever):
    """
    This is the {method} method.

    It computes the cumulative {operation}.
    """

@doc(cummax, method='cummin', operation='minimum')
def cummin(whatever):
    pass

In cummin, we would like to re-use the docstring template from cummax. However, when calling @doc() for cummax, the __doc__ will no longer be the template. So, we might want to save the original template somewhere and use it later.

I think it is a good idea to commenting this code because, at least for me, this is not intuitive.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you explain / add a comment why the _docstr_template is needed?

@jorisvandenbossche I have add more detail doc under the decorator. May I have more feedback please?

wrapper.__doc__ = wrapper._docstr_template.format(**kwargs) # type: ignore
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only do this if there actually are kwargs? (otherwise you need to use double { also when you are not doing any formatting)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, for current approach, we have to use {{ and }} for { and }. I have a slight preference to keep it that way because I believe it can help keep behavior consistent and make it easier to use.

There are two kinds of docstring, in my opinion. We might name one "doc decorator friendly" and the other "doc decorator indeterminate":

  1. The "doc decorator friendly" means the docstring has been modified to use format, such as using {{ and }} for { and }. We can safely put the function into a doc decorator without checking formatting details.
  2. For "doc decorator indeterminate", it may have { or } there. If we want to use it in doc decorator, we might need to convert it to "doc decorator friendly" first.

For the current approach, all template was guaranteed to be formatted before putting it in __doc__. Therefore, if there's a doc decorator, we can easily tell the function is "doc decorator friendly". In addition, the user won't have to worry about using {{ and }} or { and } in the docstring, because of the consistent behavior.


return wrapper

return decorator


# Substitution and Appender are derived from matplotlib.docstring (1.1.0)
# module http://matplotlib.org/users/license.html

Expand Down