CLN: @doc - base.py & indexing.py #31970

Merged: 21 commits, merged on Mar 17, 2020 (the diff below shows changes from 8 commits).

pandas/core/arrays/categorical.py (6 changes: 3 additions & 3 deletions)
@@ -15,6 +15,7 @@
Substitution,
cache_readonly,
deprecate_kwarg,
+doc,
)
from pandas.util._validators import validate_bool_kwarg, validate_fillna_kwargs

@@ -51,7 +52,7 @@
_extension_array_shared_docs,
try_cast_to_ea,
)
-from pandas.core.base import NoNewAttributesMixin, PandasObject, _shared_docs
+from pandas.core.base import IndexOpsMixin, NoNewAttributesMixin, PandasObject
Member:

I believe I am in the minority on this view and don't want to be overly difficult, but can you refactor IndexOpsMixin as a precursor to this, or leave this module out of the rest of the changes (which look good, btw)?

Contributor Author:

Yes, that is a good idea. I will convert this back to using _shared_docs. This PR is already too large, and I should really split these changes into separate PRs.

import pandas.core.common as com
from pandas.core.construction import array, extract_array, sanitize_array
from pandas.core.indexers import check_array_indexer, deprecate_ndim_indexing
@@ -1352,8 +1353,7 @@ def memory_usage(self, deep=False):
"""
return self._codes.nbytes + self.dtype.categories.memory_usage(deep=deep)

-@Substitution(klass="Categorical")
-@Appender(_shared_docs["searchsorted"])
+@doc(IndexOpsMixin.searchsorted, klass="Categorical")
Member:

Should this inherit from core.arrays.base.ExtensionArray instead? Also, I can't see this docstring in the published docs.

Member:

I agree with this comment: it seems strange to use IndexOpsMixin since it isn't part of this class's hierarchy. Any reason not to address this?

def searchsorted(self, value, side="left", sorter=None):
# searchsorted is very performance sensitive. By converting codes
# to same dtype as self.codes, we get much faster performance.

pandas/core/base.py (21 changes: 8 additions & 13 deletions)
@@ -4,7 +4,7 @@

import builtins
import textwrap
-from typing import Dict, FrozenSet, List, Optional, Union
+from typing import FrozenSet, List, Optional, Union

import numpy as np

@@ -13,7 +13,7 @@
from pandas.compat import PYPY
from pandas.compat.numpy import function as nv
from pandas.errors import AbstractMethodError
-from pandas.util._decorators import Appender, Substitution, cache_readonly, doc
+from pandas.util._decorators import cache_readonly, doc
from pandas.util._validators import validate_bool_kwarg

from pandas.core.dtypes.cast import is_nested_object
@@ -36,7 +36,6 @@
from pandas.core.construction import create_series_with_explicit_dtype
import pandas.core.nanops as nanops

-_shared_docs: Dict[str, str] = dict()
_indexops_doc_kwargs = dict(
klass="IndexOpsMixin",
inplace="",
@@ -1402,26 +1401,26 @@ def memory_usage(self, deep=False):
def factorize(self, sort=False, na_sentinel=-1):
return algorithms.factorize(self, sort=sort, na_sentinel=na_sentinel)

-_shared_docs[
-"searchsorted"
-] = """
+@doc(klass="Index")
+def searchsorted(self, value, side="left", sorter=None) -> np.ndarray:
+"""
Find indices where elements should be inserted to maintain order.

-Find the indices into a sorted %(klass)s `self` such that, if the
+Find the indices into a sorted {klass} `self` such that, if the
corresponding elements in `value` were inserted before the indices,
the order of `self` would be preserved.

.. note::

-The %(klass)s *must* be monotonically sorted, otherwise
+The {klass} *must* be monotonically sorted, otherwise
wrong locations will likely be returned. Pandas does *not*
check this for you.

Parameters
----------
value : array_like
Values to insert into `self`.
-side : {'left', 'right'}, optional
+side : {{'left', 'right'}}, optional
If 'left', the index of the first suitable location found is given.
If 'right', return the last such index. If there is no suitable
index, return either 0 or N (where N is the length of `self`).
@@ -1488,10 +1487,6 @@ def factorize(self, sort=False, na_sentinel=-1):
>>> x.searchsorted(1)
0 # wrong result, correct would be 1
"""
-
-@Substitution(klass="Index")
-@Appender(_shared_docs["searchsorted"])
-def searchsorted(self, value, side="left", sorter=None) -> np.ndarray:
return algorithms.searchsorted(self._values, value, side=side, sorter=sorter)

def drop_duplicates(self, keep="first", inplace=False):

pandas/core/indexes/datetimelike.py (6 changes: 3 additions & 3 deletions)
@@ -10,7 +10,7 @@
from pandas._libs.tslibs import timezones
from pandas.compat.numpy import function as nv
from pandas.errors import AbstractMethodError
-from pandas.util._decorators import Appender, cache_readonly
+from pandas.util._decorators import Appender, cache_readonly, doc

from pandas.core.dtypes.common import (
ensure_int64,
@@ -31,7 +31,7 @@
from pandas.core import algorithms
from pandas.core.arrays import DatetimeArray, PeriodArray, TimedeltaArray
from pandas.core.arrays.datetimelike import DatetimeLikeArrayMixin
-from pandas.core.base import _shared_docs
+from pandas.core.base import IndexOpsMixin
import pandas.core.indexes.base as ibase
from pandas.core.indexes.base import Index, _index_shared_docs
from pandas.core.indexes.extension import (
@@ -206,7 +206,7 @@ def take(self, indices, axis=0, allow_fill=True, fill_value=None, **kwargs):
self, indices, axis, allow_fill, fill_value, **kwargs
)

-@Appender(_shared_docs["searchsorted"])
+@doc(IndexOpsMixin.searchsorted, klass="Datetime-like Index")
Member:

Same comment: this is very confusing.

Contributor Author:

Hi @WillAyd, I agree with you and @simonjayhawkins here; it might be confusing. However, the original docstring is not extended from a base class either. The original code obscures the problem because it does not explicitly indicate where the docstring comes from. I tried to keep the relationship as it was, just switching to @doc.

It looks like we all agree this docstring template will confuse other developers, but do you feel this needs to be fixed in this PR? If so, what would you suggest? One option that comes to mind is to use the docstring from the base class and modify it to fit this case. I was trying to avoid that because it would change the original docstring relationships, and I am not sure whether they were set up that way on purpose.

Just to clarify, I am very willing to make the additional change to resolve this confusion; I just don't know the best way to do it. One more thing: I have a related comment that you might also be interested in.

Member:

Is there a more suitable location for the docstring then? Importing the IndexOpsMixin here is strange

Member:

Personally, I think this should be addressed in a follow-up. It's here in IndexOpsMixin because that's where the _shared_docs entry was. This PR is already too complex to make the change here, and if we change _shared_docs before merging this, the resulting conflicts will be quite annoying to fix. Does that make sense?

Contributor Author (@HH-MWB, Mar 4, 2020):

> Is there a more suitable location for the docstring then? Importing the IndexOpsMixin here is strange

I feel this situation arises because these classes share the same interface but don't have a common ancestor.

If we had an interface declaration, putting the docstring there would be a good choice. However, Python, as a dynamic language, doesn't require interfaces to be declared.

Right now I don't have a good solution in mind, but I would like to look for one and see if we can find a location that makes more sense.
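As an illustration of the interface idea above: if the classes shared a common ancestor that declares `searchsorted`, the docstring could live there. A minimal sketch using the standard `abc` module; `SupportsSearchSorted` and `ToyIndex` are hypothetical names, not pandas classes.

```python
import abc
import inspect


class SupportsSearchSorted(abc.ABC):
    """Hypothetical interface declaring `searchsorted` (and its docs) once."""

    @abc.abstractmethod
    def searchsorted(self, value, side="left", sorter=None):
        """Find indices where elements should be inserted to maintain order."""


class ToyIndex(SupportsSearchSorted):
    # No docstring here: the interface is the single source of truth.
    def searchsorted(self, value, side="left", sorter=None):
        return 0


idx = ToyIndex()
# __doc__ itself is not inherited by the override...
print(ToyIndex.searchsorted.__doc__)  # None
# ...but inspect.getdoc walks the MRO and finds the interface's docstring.
print(inspect.getdoc(idx.searchsorted))
```

Plain inheritance reuses the text verbatim, so per-class wording such as `{klass}` would still need a templating step like `@doc`.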

def searchsorted(self, value, side="left", sorter=None):
if isinstance(value, str):
raise TypeError(

pandas/core/indexing.py (12 changes: 6 additions & 6 deletions)
@@ -5,7 +5,7 @@
from pandas._libs.indexing import _NDFrameIndexerBase
from pandas._libs.lib import item_from_zerodim
from pandas.errors import AbstractMethodError
-from pandas.util._decorators import Appender
+from pandas.util._decorators import doc

from pandas.core.dtypes.common import (
is_float,
@@ -871,7 +871,7 @@ def _getbool_axis(self, key, axis: int):
return self.obj._take_with_is_copy(inds, axis=axis)


-@Appender(IndexingMixin.loc.__doc__)
+@doc(IndexingMixin.loc)
class _LocIndexer(_LocationIndexer):
_takeable: bool = False
_valid_types = (
@@ -883,7 +883,7 @@ class _LocIndexer(_LocationIndexer):
# -------------------------------------------------------------------
# Key Checks

-@Appender(_LocationIndexer._validate_key.__doc__)
+@doc(_LocationIndexer._validate_key)
def _validate_key(self, key, axis: int):

# valid for a collection of labels (we check their presence later)
@@ -1342,7 +1342,7 @@ def _validate_read_indexer(
)


-@Appender(IndexingMixin.iloc.__doc__)
+@doc(IndexingMixin.iloc)
class _iLocIndexer(_LocationIndexer):
_valid_types = (
"integer, integer slice (START point is INCLUDED, END "
@@ -2078,7 +2078,7 @@ def __setitem__(self, key, value):
self.obj._set_value(*key, value=value, takeable=self._takeable)


-@Appender(IndexingMixin.at.__doc__)
+@doc(IndexingMixin.at)
class _AtIndexer(_ScalarAccessIndexer):
_takeable = False

@@ -2098,7 +2098,7 @@ def _convert_key(self, key, is_setter: bool = False):
return tuple(lkey)


-@Appender(IndexingMixin.iat.__doc__)
+@doc(IndexingMixin.iat)
class _iAtIndexer(_ScalarAccessIndexer):
_takeable = True


pandas/core/series.py (3 changes: 1 addition & 2 deletions)
@@ -2470,8 +2470,7 @@ def __rmatmul__(self, other):
"""
return self.dot(np.transpose(other))

-@Substitution(klass="Series")
-@Appender(base._shared_docs["searchsorted"])
+@doc(base.IndexOpsMixin.searchsorted, klass="Series")
def searchsorted(self, value, side="left", sorter=None):
return algorithms.searchsorted(self._values, value, side=side, sorter=sorter)


pandas/util/_decorators.py (3 changes: 2 additions & 1 deletion)
@@ -275,7 +275,8 @@ def wrapper(*args, **kwargs) -> Callable:
elif hasattr(arg, "_docstr_template"):
templates.append(arg._docstr_template) # type: ignore
elif arg.__doc__:
-templates.append(arg.__doc__)
+doc_tmp = arg.__doc__.replace("{", "{{").replace("}", "}}")
Member:

I don't understand this; do you mind explaining why this is needed?

Contributor Author:

Hi, the short answer:
This change fixes the error raised when we reference the docstring of a function that does not use the doc decorator.

You probably already know this, but I want to restate it so that others can catch up:
When we format a string in Python, { and } mark a placeholder, and the arguments replace the placeholder. For example, 'Hello {name}!'.format(name='Tom') becomes 'Hello Tom!'. However, to show a literal { or } in a formatted string, we have to write {{ and }}. For example, 'Hello {{name}}!'.format(name='Tom') becomes 'Hello {name}!'.
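The formatting rules above can be checked directly with plain `str.format` (nothing pandas-specific). `formats_cleanly` is a hypothetical helper added only for this illustration.

```python
# Plain str.format rules, independent of pandas.
assert "Hello {name}!".format(name="Tom") == "Hello Tom!"
# Doubled braces are emitted literally and are NOT substituted.
assert "Hello {{name}}!".format(name="Tom") == "Hello {name}!"
# To wrap a substituted value in literal braces, triple them.
assert "Hello {{{name}}}!".format(name="Tom") == "Hello {Tom}!"


def formats_cleanly(template: str, **kwargs) -> bool:
    """Return True if `template.format(**kwargs)` succeeds."""
    try:
        template.format(**kwargs)
    except (KeyError, IndexError, ValueError):
        return False
    return True


docstring = "side : {'left', 'right'}, optional"
# Unescaped, the set literal is parsed as a replacement field and fails.
print(formats_cleanly(docstring, klass="Index"))  # False
# Escaped, it formats back to the original text.
escaped = docstring.replace("{", "{{").replace("}", "}}")
print(escaped.format(klass="Index") == docstring)  # True
```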

Back to our case:
In docstrings, { and } are commonly used to represent a set of allowed values, such as {'left', 'right'}. The old implementation takes the original docstring as-is, so these { and } are treated as placeholders, which causes an error. We can fix that by converting { and } to {{ and }} in the doc decorator; they are then rendered as { and }.
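The escaping step described above can be sketched as a self-contained decorator. `doc_escaping` is a hypothetical, simplified stand-in written for illustration, not pandas' actual `doc`; it keeps only the three branches visible in this diff (string template, object with `_docstr_template`, plain docstring) and then applies `dedent` and `.format(**kwargs)`.

```python
from textwrap import dedent
from typing import Callable


def doc_escaping(*args, **kwargs) -> Callable:
    """Hypothetical, simplified sketch of a doc-style decorator."""
    def decorator(func):
        templates = []
        if func.__doc__:
            # The decorated object's own docstring is already a template.
            templates.append(func.__doc__)
        for arg in args:
            if isinstance(arg, str):
                templates.append(arg)
            elif hasattr(arg, "_docstr_template"):
                # Decorated before: literal braces are already doubled.
                templates.append(arg._docstr_template)
            elif arg.__doc__:
                # Plain docstring: escape braces so .format() leaves them alone.
                templates.append(arg.__doc__.replace("{", "{{").replace("}", "}}"))
        func._docstr_template = "".join(dedent(t) for t in templates)
        func.__doc__ = func._docstr_template.format(**kwargs)
        return func
    return decorator


def plain_searchsorted():
    """side : {'left', 'right'}, optional"""


@doc_escaping(plain_searchsorted)
def searchsorted():
    pass


@doc_escaping(klass="Index")
def index_searchsorted():
    """Find indices into a sorted {klass}. side : {{'left', 'right'}}"""


@doc_escaping(index_searchsorted, klass="Series")
def series_searchsorted():
    pass


print(searchsorted.__doc__)         # side : {'left', 'right'}, optional
print(series_searchsorted.__doc__)  # Find indices into a sorted Series. side : {'left', 'right'}
```

The plain docstring keeps its single braces in the output, while the template written for the decorator uses `{klass}` for substitution and `{{...}}` for literal braces.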

Take the changes to indexing.py in this PR as an example:
The docstring of IndexingMixin.iloc contains { and }, and these changes allow us to write:

@doc(IndexingMixin.iloc)
class _iLocIndexer(_LocationIndexer):
    ...

I am not sure I have explained the reason clearly enough, so please feel free to follow up with questions.

Member:

Sorry, I still don't get it. I see these cases:

Not using @doc:

  • Enumeration: {'a', 'b'}
  • Variable: Doesn't apply

Using @doc:

  • Enumeration: {{'a', 'b'}}
  • Variable: {name}

I guess what this solves is taking a docstring from a non-decorated function, and use it with the decorator, converting it from the first case to the second. But this feels a bit hacky, and I'm not sure if it would make more sense to apply .format() only when it makes sense instead. I'm a bit worried on making this too complex and tricky, and having unexpected side-cases.

And if we need to keep this implementation, we need to add comments so a reader can understand what's going on and what these if branches mean.

Contributor Author:

> I guess what this solves is taking a docstring from a non-decorated function, and use it with the decorator, converting it from the first case to the second.

Yes, exactly! Thanks for helping me explain this. That is what I was trying to say.

> ... I'm not sure if it would make more sense to apply .format() only when it makes sense instead.

That makes sense to me. I would like to give it a try.

> I'm a bit worried on making this too complex and tricky, and having unexpected side-cases.

I agree with you here. I would like to try the idea above first; a backup solution might be to use a string template to avoid this case.

Contributor Author (@HH-MWB, Feb 29, 2020):

> ... I'm not sure if it would make more sense to apply .format() only when it makes sense instead.

Hi @datapythonista, I have modified @doc. It no longer applies .format() to docstrings taken from non-decorated functions.
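One way to "apply .format() only when it makes sense" is to keep plain docstrings out of the formatting step entirely: store template strings and referenced objects side by side, and format only the strings. The sketch below assumes that design; `doc_components` and `_doc_components` are hypothetical names, and this is not necessarily how the modified @doc in this PR was written.

```python
from textwrap import dedent
from typing import Callable, List


def doc_components(*args, **kwargs) -> Callable:
    """Hypothetical doc variant: only template strings go through .format()."""
    def decorator(func):
        # Strings are templates; any other object contributes its docstring as-is.
        components: List[object] = []
        if func.__doc__:
            components.append(dedent(func.__doc__))
        for arg in args:
            if hasattr(arg, "_doc_components"):
                # Reuse the components of a previously decorated object.
                components.extend(arg._doc_components)
            elif isinstance(arg, str) or arg.__doc__:
                components.append(arg)
        func._doc_components = components
        func.__doc__ = "".join(
            c.format(**kwargs) if isinstance(c, str) else dedent(c.__doc__)
            for c in components
        )
        return func
    return decorator


def plain_searchsorted():
    """side : {'left', 'right'}, optional"""


@doc_components(plain_searchsorted, klass="Series")
def reuses_plain():
    pass


@doc_components(klass="Index")
def index_template():
    """Find indices into a sorted {klass}."""


@doc_components(index_template, klass="Series")
def series_template():
    pass


print(reuses_plain.__doc__)     # side : {'left', 'right'}, optional
print(series_template.__doc__)  # Find indices into a sorted Series.
```

Compared with escaping, a plain docstring never reaches `.format()`, so its braces need no rewriting; the trade-off is that a plain docstring cannot use `{klass}` placeholders.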

+templates.append(doc_tmp)

wrapper._docstr_template = "".join(dedent(t) for t in templates) # type: ignore
wrapper.__doc__ = wrapper._docstr_template.format(**kwargs) # type: ignore