Skip to content

DOC: Improve docstring for pandas.Index.repeat #19985

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 8 commits into from
Mar 9, 2018
26 changes: 22 additions & 4 deletions pandas/core/indexes/base.py
Original file line number Diff line number Diff line change
Expand Up @@ -695,13 +695,31 @@ def memory_usage(self, deep=False):
# ops compat
@deprecate_kwarg(old_arg_name='n', new_arg_name='repeats')
def repeat(self, repeats, *args, **kwargs):
"""
Repeat elements of an Index. Refer to `numpy.ndarray.repeat`
for more information about the `repeats` argument.
"""Repeat elements of an Index.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it should be useful to have an extended summary explaining why this method is useful. I mean, in which real cases this method could be used. I'm not sure myself, I don't think I've ever used it, but I think it should be very valuable to let users quickly understand whether this was the method they were looking for or not.

Also, the extended summary could explain how they are repeated. In the examples it's shown that they are repeated like A, A, B, B and not A, B, A, B. And I think it'd be nice to also have a bit more information about it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@datapythonista how does this look? Definitely contrived, but I don't really see this function being that useful. Series.repeat on the other hand...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, as you say yourself it is rather contrived.
If we don't find a good example, I would rather leave it out than adding in one that is a bit strange.

The main reason this exists is because of numpy compatibility (Index was previously a numpy array subclass).
I think the method can be useful, but in very specific cases where you need something like that, too specific for the docstring. I think showing a generic example from which it is clear what it does (but without a why) is good enough in this case (like your first example).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good. Will change and send another commit.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jorisvandenbossche, It's just a personal opinion, but if I ever end up in the doc of this method, I'd love to find something like exactly what you said, The main reason this exists is because of numpy compatibility (Index was previously a numpy array subclass)..

May be it's just me, but if after reading the documentation I still don't get what a function is for, and in which cases it should be useful for me, I think the documentation is not doing a good job. If a function still exists for legacy/compatibility reasons, I personally find it very useful to let the users know.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

May be it's just me, but if after reading the documentation I still don't get what a function is for, and in which cases it should be useful for me, I think the documentation is not doing a good job.

I think with the current example, it should be clear what it does, but I agree the why I would use it is less clear.

If a function still exists for legacy/compatibility reasons, I personally find it very useful to let the users know.

That's the orginal reason it is there, but nonetheless it has probably a use case (in contrast with some other methods/attributes like Series.imag which are really only there for historical reasons, but those are also not listed in api.rst).
I know I already used it (or numpy's couterpart, not sure), but it was probably a little part of a complex analysis flow and can't remember exactly why I used it. And anyhow without the full context, it might not make much sense or be difficult to understand.

I understand your point, and it would indeed be good to include a good example illustrating a real-world use case, but then we need to come up with one ..

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How does this docstring look? Anything I need to change?


Parameters
----------
repeats : int
The number of repetitions for each element.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is still some discussion on whether we want to document args and kwargs, but I think we should. If they are ignored, I think it'd be useful for users (and also developers) to know it, and why they are present. If they are being sent to numpy, that's also something worth mentioning.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these are for numpy compat (only a small set of methods are like this). see the validate_repeat below.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this case the *kwargs are not ignored, but you get an informative error message if you try to pass any additional keyword (also in case of the numpy ones like axis). So therefore I would personally not document them.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would agree that mentioned **kwargs are for numpy compat is not a bad idea

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought it was only for the nice error message, but indeed it is also accepting (not raising) the argument when it is the default value (in this case axis=None).
In that case, it's fine for me as well to actually document this


Returns
-------
pandas.Index
Newly created Index with repeated elements.

See also
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I'm not wrong, "See Also" should have capital "A"

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that is correct (although that numpydoc will still recognize it correctly, as it tries to capitalize the words if they are not)

--------
numpy.ndarray.repeat
numpy.repeat
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some information on why numpy.repeat is in the See Also, and differences among them if they exist would be useful.

Also, that's more an open question, but could make sense to list here something like Index.is_unique for example? Probably if we describe better real use-cases, we'll find other related methods.


Examples
--------
>>> idx = pd.Index([1, 2, 3])
>>> idx
Int64Index([1, 2, 3], dtype='int64')
>>> idx.repeat(2)
Int64Index([1, 1, 2, 2, 3, 3], dtype='int64')
>>> idx.repeat(3)
Int64Index([1, 1, 1, 2, 2, 2, 3, 3, 3], dtype='int64')
"""
nv.validate_repeat(args, kwargs)
return self._shallow_copy(self._values.repeat(repeats))
Expand Down