
CLN: @doc - base.py & indexing.py #31970

Merged: 21 commits, Mar 17, 2020
Changes from 13 commits
6 changes: 3 additions & 3 deletions pandas/core/arrays/categorical.py
@@ -15,6 +15,7 @@
Substitution,
cache_readonly,
deprecate_kwarg,
doc,
)
from pandas.util._validators import validate_bool_kwarg, validate_fillna_kwargs

@@ -51,7 +52,7 @@
_extension_array_shared_docs,
try_cast_to_ea,
)
from pandas.core.base import NoNewAttributesMixin, PandasObject, _shared_docs
from pandas.core.base import IndexOpsMixin, NoNewAttributesMixin, PandasObject
Member:

I believe I am in the minority on this view and don't want to be overly difficult, but can you refactor the IndexOpsMixin as a precursor to this, or leave this module separate from the rest of the changes (which look good, by the way)?

Contributor Author:

Yes, that is a good idea. I will convert this back to using _shared_docs. This PR is already too large, and I should really handle that change separately.

import pandas.core.common as com
from pandas.core.construction import array, extract_array, sanitize_array
from pandas.core.indexers import check_array_indexer, deprecate_ndim_indexing
@@ -1352,8 +1353,7 @@ def memory_usage(self, deep=False):
"""
return self._codes.nbytes + self.dtype.categories.memory_usage(deep=deep)

@Substitution(klass="Categorical")
@Appender(_shared_docs["searchsorted"])
@doc(IndexOpsMixin.searchsorted, klass="Categorical")
Member:

Should this be inheriting from core.arrays.base.ExtensionArray instead? Also, I can't see this docstring in the published docs.

Member:

I agree with this comment; it seems strange to use IndexOpsMixin since it isn't part of this class's hierarchy. Any reason not to address this?

def searchsorted(self, value, side="left", sorter=None):
# searchsorted is very performance sensitive. By converting codes
# to same dtype as self.codes, we get much faster performance.
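To make the new pattern concrete: below is a minimal, self-contained sketch of how a format-style shared docstring gets reused. The doc defined in the sketch is a simplified stand-in written for illustration (not the pandas implementation), and the class names are hypothetical.

from textwrap import dedent


def doc(source, **kwargs):
    # Simplified stand-in: reuse source's docstring as a str.format template.
    def decorator(func):
        template = dedent(source.__doc__ or "")
        func.__doc__ = template.format(**kwargs)
        return func
    return decorator


class IndexOps:
    def searchsorted(self, value, side="left", sorter=None):
        """Find indices where elements should be inserted to keep {klass} sorted."""


class Categorical(IndexOps):
    @doc(IndexOps.searchsorted, klass="Categorical")
    def searchsorted(self, value, side="left", sorter=None):
        ...


print(Categorical.searchsorted.__doc__)
# Find indices where elements should be inserted to keep Categorical sorted.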
21 changes: 8 additions & 13 deletions pandas/core/base.py
@@ -4,7 +4,7 @@

import builtins
import textwrap
from typing import Dict, FrozenSet, List, Optional, Union
from typing import FrozenSet, List, Optional, Union

import numpy as np

@@ -13,7 +13,7 @@
from pandas.compat import PYPY
from pandas.compat.numpy import function as nv
from pandas.errors import AbstractMethodError
from pandas.util._decorators import Appender, Substitution, cache_readonly, doc
from pandas.util._decorators import cache_readonly, doc
from pandas.util._validators import validate_bool_kwarg

from pandas.core.dtypes.cast import is_nested_object
@@ -36,7 +36,6 @@
from pandas.core.construction import create_series_with_explicit_dtype
import pandas.core.nanops as nanops

_shared_docs: Dict[str, str] = dict()
_indexops_doc_kwargs = dict(
klass="IndexOpsMixin",
inplace="",
@@ -1402,26 +1401,26 @@ def memory_usage(self, deep=False):
def factorize(self, sort=False, na_sentinel=-1):
return algorithms.factorize(self, sort=sort, na_sentinel=na_sentinel)

_shared_docs[
"searchsorted"
] = """
@doc(klass="Index")
def searchsorted(self, value, side="left", sorter=None) -> np.ndarray:
"""
Find indices where elements should be inserted to maintain order.

Find the indices into a sorted %(klass)s `self` such that, if the
Find the indices into a sorted {klass} `self` such that, if the
corresponding elements in `value` were inserted before the indices,
the order of `self` would be preserved.

.. note::

The %(klass)s *must* be monotonically sorted, otherwise
The {klass} *must* be monotonically sorted, otherwise
wrong locations will likely be returned. Pandas does *not*
check this for you.

Parameters
----------
value : array_like
Values to insert into `self`.
side : {'left', 'right'}, optional
side : {{'left', 'right'}}, optional
If 'left', the index of the first suitable location found is given.
If 'right', return the last such index. If there is no suitable
index, return either 0 or N (where N is the length of `self`).
@@ -1488,10 +1487,6 @@ def factorize(self, sort=False, na_sentinel=-1):
>>> x.searchsorted(1)
0 # wrong result, correct would be 1
"""

@Substitution(klass="Index")
@Appender(_shared_docs["searchsorted"])
def searchsorted(self, value, side="left", sorter=None) -> np.ndarray:
return algorithms.searchsorted(self._values, value, side=side, sorter=sorter)

def drop_duplicates(self, keep="first", inplace=False):
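For readers less familiar with the two templating styles being swapped in this hunk, the underlying string mechanics are simply percent-substitution (the old Substitution/Appender pair) versus str.format (the new doc decorator). A quick illustration, not pandas code:

# Old style: @Substitution fills %(klass)s placeholders in the shared template.
old_template = "Find the indices into a sorted %(klass)s `self`."
print(old_template % {"klass": "Index"})
# Find the indices into a sorted Index `self`.

# New style: @doc fills str.format-style {klass} placeholders.
new_template = "Find the indices into a sorted {klass} `self`."
print(new_template.format(klass="Index"))
# Find the indices into a sorted Index `self`.

# Literal braces, such as the {'left', 'right'} choices in the Parameters
# section above, must be doubled so str.format leaves them untouched.
print("side : {{'left', 'right'}}, optional".format(klass="Index"))
# side : {'left', 'right'}, optional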
6 changes: 3 additions & 3 deletions pandas/core/indexes/datetimelike.py
@@ -10,7 +10,7 @@
from pandas._libs.tslibs import timezones
from pandas.compat.numpy import function as nv
from pandas.errors import AbstractMethodError
from pandas.util._decorators import Appender, cache_readonly
from pandas.util._decorators import Appender, cache_readonly, doc

from pandas.core.dtypes.common import (
ensure_int64,
@@ -31,7 +31,7 @@
from pandas.core import algorithms
from pandas.core.arrays import DatetimeArray, PeriodArray, TimedeltaArray
from pandas.core.arrays.datetimelike import DatetimeLikeArrayMixin
from pandas.core.base import _shared_docs
from pandas.core.base import IndexOpsMixin
import pandas.core.indexes.base as ibase
from pandas.core.indexes.base import Index, _index_shared_docs
from pandas.core.indexes.extension import (
@@ -206,7 +206,7 @@ def take(self, indices, axis=0, allow_fill=True, fill_value=None, **kwargs):
self, indices, axis, allow_fill, fill_value, **kwargs
)

@Appender(_shared_docs["searchsorted"])
@doc(IndexOpsMixin.searchsorted, klass="Datetime-like Index")
Member:

Same comment - this is very confusing

Contributor Author:

Hi @WillAyd, I agree with you and @simonjayhawkins here. It might be confusing. However, the original docstring was not inherited from the base class either; the original code just obscures the problem because it does not explicitly indicate the source of the docstring. I tried to keep it as it was and only switch to @doc.

It looks like we all agree this docstring template will confuse other developers, but do you feel it needs to be fixed in this PR? If so, what would you suggest? One option that comes to mind is using the docstring from the base class and modifying it to fit this case. I was trying to avoid that because it would change the original docstring relationships, and I am not sure whether those were set up on purpose.

Just to be clear, I am very willing to make an additional change to resolve this confusion; I just don't know what the best way of doing that would be. One more thing: I have a related comment that you might also be interested in.

Member:

Is there a more suitable location for the docstring, then? Importing IndexOpsMixin here is strange.

Member:

Personally, I think this should be addressed in a follow-up. The docstring lives in IndexOpsMixin because that's where _shared_docs was. This PR is already too complex to make that change here, and if we change _shared_docs before merging this, the resulting conflicts will be quite annoying to fix. Does that make sense?

Contributor Author (@HH-MWB, Mar 4, 2020):

> Is there a more suitable location for the docstring, then? Importing IndexOpsMixin here is strange.

I feel this situation arises because these classes share the same interface but don't have a common ancestor.

If we had an interface declaration, putting the docstring there would be a good choice. However, Python, being a dynamic language, doesn't require declaring an interface.

I don't have a good solution in mind right now, but I would like to look for a place that makes more sense.

def searchsorted(self, value, side="left", sorter=None):
if isinstance(value, str):
raise TypeError(
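To illustrate the "common ancestor" idea from the thread above, one hypothetical shape would be a dedicated host object that exists only to carry the shared docstring, which every implementation then references explicitly. The names here are invented for illustration and this is not part of the PR; it only assumes the doc decorator introduced in this changeset.

from pandas.util._decorators import doc


class _SearchSortedDocs:
    # Hypothetical host class used only to hold the shared template.
    def searchsorted(self, value, side="left", sorter=None):
        """
        Find indices where elements should be inserted to maintain order
        in a sorted {klass}.
        """


class SomeDatetimeIndex:
    @doc(_SearchSortedDocs.searchsorted, klass="Datetime-like Index")
    def searchsorted(self, value, side="left", sorter=None):
        ...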
12 changes: 6 additions & 6 deletions pandas/core/indexing.py
@@ -5,7 +5,7 @@
from pandas._libs.indexing import _NDFrameIndexerBase
from pandas._libs.lib import item_from_zerodim
from pandas.errors import AbstractMethodError
from pandas.util._decorators import Appender
from pandas.util._decorators import doc

from pandas.core.dtypes.common import (
is_float,
@@ -872,7 +872,7 @@ def _getbool_axis(self, key, axis: int):
return self.obj._take_with_is_copy(inds, axis=axis)


@Appender(IndexingMixin.loc.__doc__)
@doc(IndexingMixin.loc)
class _LocIndexer(_LocationIndexer):
_takeable: bool = False
_valid_types = (
@@ -884,7 +884,7 @@ class _LocIndexer(_LocationIndexer):
# -------------------------------------------------------------------
# Key Checks

@Appender(_LocationIndexer._validate_key.__doc__)
@doc(_LocationIndexer._validate_key)
def _validate_key(self, key, axis: int):

# valid for a collection of labels (we check their presence later)
@@ -1343,7 +1343,7 @@ def _validate_read_indexer(
)


@Appender(IndexingMixin.iloc.__doc__)
@doc(IndexingMixin.iloc)
class _iLocIndexer(_LocationIndexer):
_valid_types = (
"integer, integer slice (START point is INCLUDED, END "
@@ -2079,7 +2079,7 @@ def __setitem__(self, key, value):
self.obj._set_value(*key, value=value, takeable=self._takeable)


@Appender(IndexingMixin.at.__doc__)
@doc(IndexingMixin.at)
class _AtIndexer(_ScalarAccessIndexer):
_takeable = False

@@ -2099,7 +2099,7 @@ def _convert_key(self, key, is_setter: bool = False):
return tuple(lkey)


@Appender(IndexingMixin.iat.__doc__)
@doc(IndexingMixin.iat)
class _iAtIndexer(_ScalarAccessIndexer):
_takeable = True

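The indexing.py hunks exercise the other side of the decorator: called with a single object and no keyword arguments, @doc effectively just copies that object's docstring onto the decorated class, much as @Appender(obj.__doc__) did. A simplified stand-in (illustration only, not pandas code):

def doc(source):
    # Simplified: no templates or kwargs, just reuse the source's docstring.
    def decorator(obj):
        obj.__doc__ = source.__doc__
        return obj
    return decorator


class IndexingMixin:
    @property
    def loc(self):
        """Access a group of rows and columns by label(s)."""


@doc(IndexingMixin.loc)
class _LocIndexer:
    pass


print(_LocIndexer.__doc__)
# Access a group of rows and columns by label(s).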
3 changes: 1 addition & 2 deletions pandas/core/series.py
@@ -2470,8 +2470,7 @@ def __rmatmul__(self, other):
"""
return self.dot(np.transpose(other))

@Substitution(klass="Series")
@Appender(base._shared_docs["searchsorted"])
@doc(base.IndexOpsMixin.searchsorted, klass="Series")
def searchsorted(self, value, side="left", sorter=None):
return algorithms.searchsorted(self._values, value, side=side, sorter=sorter)

14 changes: 8 additions & 6 deletions pandas/tests/util/test_doc.py
@@ -14,13 +14,15 @@ def cumsum(whatever):

@doc(
cumsum,
"""
Examples
--------
dedent(
"""
Examples
--------

>>> cumavg([1, 2, 3])
2
""",
>>> cumavg([1, 2, 3])
2
"""
),
method="cumavg",
operation="average",
)
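The dedent() wrapper matters here because a triple-quoted template written inside an indented decorator call carries the indentation of the surrounding source; dedent strips the common leading whitespace so the appended section lines up with the dedented base docstring. A quick illustration:

from textwrap import dedent

template = """
    Examples
    --------

    >>> cumavg([1, 2, 3])
    2
    """

print(repr(template.splitlines()[1]))          # '    Examples' -- still indented
print(repr(dedent(template).splitlines()[1]))  # 'Examples' -- indentation stripped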
36 changes: 23 additions & 13 deletions pandas/util/_decorators.py
@@ -250,9 +250,10 @@ def doc(*args: Union[str, Callable], **kwargs: str) -> Callable[[F], F]:
A decorator take docstring templates, concatenate them and perform string
substitution on it.

This decorator is robust even if func.__doc__ is None. This decorator will
add a variable "_docstr_template" to the wrapped function to save original
docstring template for potential usage.
This decorator will add a variable "_doc_args" to the wrapped function to
keep track of the original docstring templates for potential later use. If an
argument should be treated as a template, it is saved as a string. Otherwise,
it is saved as a callable, and its __doc__ is later dedented to build the
docstring.

Parameters
----------
@@ -268,17 +269,26 @@ def decorator(func: F) -> F:
def wrapper(*args, **kwargs) -> Callable:
return func(*args, **kwargs)

templates = [func.__doc__ if func.__doc__ else ""]
# collecting docstring templates
wrapper._doc_args: List[Union[str, Callable]] = [] # type: ignore
Member:

I think the name _doc_args is misleading. In the context of this function it makes sense, since it holds the value of the args parameter of the doc decorator, but if I see a _doc_args attribute on, for example, pandas.Series.head, it doesn't really tell me what's in it. Something like _docstring_components could be clearer?

Also, when can it contain a callable? Even when we have something like @doc(pandas.Series.head), we're extending it with the docstring templates of pandas.Series.head, not with the function itself.

Last thing: what about using a local docstring_components variable to avoid all the type: ignore comments, and just setting wrapper._docstring_components = _docstring_components at the end? Probably cleaner.

Contributor Author:

@datapythonista Thanks for this suggestion, especially for the new variable name. I strongly agree with you here.
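For reference, a rough sketch of the shape being suggested: collect the pieces in a local list and assign the attribute once at the end, so the type checker has only a single dynamic-attribute assignment to complain about. This is an illustrative sketch of the suggestion, not the exact code that was merged:

from textwrap import dedent
from typing import Callable, List, Union


def doc(*args: Union[str, Callable], **kwargs: str) -> Callable:
    def decorator(func: Callable) -> Callable:
        # Simplified: mutate func in place instead of creating a wrapper.
        docstring_components: List[Union[str, Callable]] = []
        if func.__doc__:
            docstring_components.append(dedent(func.__doc__))

        for arg in args:
            if hasattr(arg, "_docstring_components"):
                docstring_components.extend(arg._docstring_components)
            elif isinstance(arg, str) or arg.__doc__:
                docstring_components.append(arg)

        # Format string components with the keyword arguments; use callables'
        # docstrings verbatim, mirroring the PR logic.
        func.__doc__ = "".join(
            component.format(**kwargs)
            if isinstance(component, str)
            else dedent(component.__doc__ or "")
            for component in docstring_components
        )
        # Single assignment at the end, as suggested.
        func._docstring_components = docstring_components
        return func

    return decorator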

if func.__doc__:
wrapper._doc_args.append(dedent(func.__doc__)) # type: ignore

for arg in args:
if isinstance(arg, str):
templates.append(arg)
elif hasattr(arg, "_docstr_template"):
templates.append(arg._docstr_template) # type: ignore
elif arg.__doc__:
templates.append(arg.__doc__)

wrapper._docstr_template = "".join(dedent(t) for t in templates) # type: ignore
wrapper.__doc__ = wrapper._docstr_template.format(**kwargs) # type: ignore
if hasattr(arg, "_doc_args"):
wrapper._doc_args.extend(arg._doc_args) # type: ignore
elif isinstance(arg, str) or arg.__doc__:
wrapper._doc_args.append(arg) # type: ignore

# formatting templates and concatenating docstring
wrapper.__doc__ = "".join(
[
arg.format(**kwargs)
if isinstance(arg, str)
else dedent(arg.__doc__) # type: ignore
for arg in wrapper._doc_args # type: ignore
Member:

Can you confirm that these # type: ignore comments stem from the fact that we are adding attributes to the wrapper function?

I guess this was discussed originally, but would using a class for the doc decorator help?

Contributor Author:

Good point! Most of the # type: ignore comments are there because we are adding attributes to the function.

The only one not there for that reason is else dedent(arg.__doc__) # type: ignore (line 288). That one exists because, in theory, arg.__doc__ can be str or None. However, we check that arg.__doc__ is not None before appending it, so we are safe here.

Back to the majority of the # type: ignore cases: I strongly agree that we should try to solve this without # type: ignore, since it looks like an abuse. I would like to try turning the decorator into a class. Thanks for the advice!

Member:

> The only one not there for that reason is else dedent(arg.__doc__) # type: ignore (line 288). That one exists because, in theory, arg.__doc__ can be str or None. However, we check that arg.__doc__ is not None before appending it, so we are safe here.

I think it's OK to remove:

$ mypy pandas --warn-unused-ignores
pandas\util\_decorators.py:288: error: unused 'type: ignore' comment

Contributor Author (@HH-MWB, Mar 4, 2020):

@simonjayhawkins Good catch, thanks! The fix is here.

]
)

return cast(F, wrapper)

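As a closing aside on the class-based decorator idea from the review thread, a rough sketch of what that could look like; purely illustrative, with simplified semantics, and not what was merged:

from textwrap import dedent
from typing import Callable, List, Union


class DocDecorator:
    # Attributes are declared on the class up front, so no per-line
    # "type: ignore" is needed for dynamically added fields.
    def __init__(self, *args: Union[str, Callable], **kwargs: str) -> None:
        self.args = args
        self.kwargs = kwargs

    def __call__(self, func: Callable) -> Callable:
        parts: List[str] = []
        if func.__doc__:
            parts.append(dedent(func.__doc__).format(**self.kwargs))
        for arg in self.args:
            if isinstance(arg, str):
                # Plain string templates are formatted with the keyword arguments.
                parts.append(dedent(arg).format(**self.kwargs))
            elif arg.__doc__:
                # Docstrings taken from other callables are used verbatim.
                parts.append(dedent(arg.__doc__))
        func.__doc__ = "".join(parts)
        return func


# Usage would mirror the function-based decorator:
# @DocDecorator(IndexOpsMixin.searchsorted, klass="Series")
# def searchsorted(self, value, side="left", sorter=None): ...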