-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
DOC: Improve the docstring of pd.Index.contains and closes PR #20211 #23100
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Hello @Tanya-Jain! Thanks for submitting the PR.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good work, I added some comments about minor things.
pandas/core/indexes/base.py
Outdated
|
||
See Also | ||
-------- | ||
CategoricalIndex : Returns a bool if the 1-dimensional, Categorical key |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
CategoricalIndex
is the class, do you mean CategoricalIndex.contains
here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I mean this. Apparently, there isn't any docstring written for the same. Should I write something like
CategoricalIndex : Its :method: contains returns a bool if the key is in index. The key can be of the type same as this class, like the Index.contains
or should I still refer to CategoricalIndex.contains : Returns a bool if the key is in index . . .
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The Index
class is a parent of CategoricalIndex
as well as all the other index classes like Float64Index
... It could make sense to list CategoricalIndex.contains
, Float64Index.contains
... but I don't think it adds that much value in this case, and it adds noise to the docstring.
I think the best is just leave Index.isin
, but it's up to you.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree as CategoricalIndex.contains
behaves similarly to Index.contains
. Let's just keep Index.isin
.
pandas/core/indexes/base.py
Outdated
|
||
Examples | ||
-------- | ||
>>> l1 = pd.Index([1, 2, (3, 4), 5]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By convention we use idx
for variables soring an index in the examples (df
for dataframes, and s
for series). Can you use that?
Also, can you show the content of the index after you store it (i.e. >>> idx
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I can change the variable naming and show the content, it would be better.
pandas/core/indexes/base.py
Outdated
>>> l1 = pd.Index([1, 2, (3, 4), 5]) | ||
>>> t = (3, 4) | ||
>>> num1 = 1 | ||
>>> num2 = 6 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd save a bit of space by using the values directly in the calls, instead of saving variables first (e.g. idx.contains(1)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will improve this.
pandas/core/indexes/base.py
Outdated
|
||
Parameters | ||
---------- | ||
key : object | ||
key : label | ||
The key requested. Immutable-like, 1-dimensional. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While true, I think users will find this a bit confusing. I think it probably makes more sense to avoid saying that is 1-dimensional, as it gives the impression that more than one key can be provided.
Personally, I think the users should look for what a label can be (immutable, 1-dimensional if a tuple...) in some other part of the documentation, and here focus more on what contains
exactly expects (the key that will be searched in the index). But if you prefer to be specific, I'd start this by something like The key can be of the same type as the labels in index, so it needs to be immutable...
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is exactly what I tried to explain, thank you!
Any reason to close this? I guess it's been by mistake? |
Codecov Report
@@ Coverage Diff @@
## master #23100 +/- ##
==========================================
- Coverage 92.2% 92.2% -0.01%
==========================================
Files 162 162
Lines 51701 51700 -1
==========================================
- Hits 47671 47670 -1
Misses 4030 4030
Continue to review full report at Codecov.
|
1d83db4
to
12a0dc4
Compare
I tried to get the problem fixed but ended up having 0 commits being shown as well as having this PR automatically closed. |
- Description for key has more clarity on its behaviour. - Removed :class: `CategoricalIndex` from the See Also section as the methods: CategoricalIndex.contains, Float64Index.contains and like, behave similarly to `Index.contains`. Hence, keeping only `Index.isin` - Use `idx` for naming variables for objects of :class: `Index` instead of `l` for passing list-like key, to encourage pandas docstring standards and clarity. - Directly call the values in :method: `Index.contains` to prevent additional memory usage.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks very good. Just couple of small things, and it's ready to be merged.
pandas/core/indexes/base.py
Outdated
@@ -2005,15 +2005,35 @@ def __contains__(self, key): | |||
return False | |||
|
|||
_index_shared_docs['contains'] = """ | |||
return a boolean if this key is IN the index | |||
Return a boolean if this key is in the index. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor thing, and I know this is the original description and not yours, but this sounds like it a boolean is returned only if the key is in the index. Not that a boolean is always returned, on whether the key is in the index. Do you mind rephrasing it to be more correct please?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, I'll rephrase it in the next commit as a boolean indicator. Apparently, there are few more docstrings with similar phrasing like Series.is_unique
, Series.is_monotonic
, Series.is_monotonic_increasing
, Series.is_monotonic_decreasing
. These too can be corrected in future if seems appropriate.
pandas/core/indexes/base.py
Outdated
|
||
Parameters | ||
---------- | ||
key : object | ||
key : label | ||
The key can be of the same type as the label of :class: `Index`, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the space after :class:
is not needed. Did you check if this is rendered correctly in the html version? You should be able to generate this page with ./doc/make.py html --single=pandas.Index.contains
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, it renders correctly after removing the space after :class:
pandas/core/indexes/base.py
Outdated
>>> idx | ||
Index([1, 2, (3, 4), 5], dtype='object') | ||
>>> idx.contains((3,4)) | ||
True |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Small thing, but do you mind moving this example after the other two? I think it helps if we show the simple/easy cases first, and the more complex later.
Also, you there is a missing space after the comma.
When you're done with the changes, it may be helpful to run ./scripts/validate_docstrings.py pandas.Index.contains
to make sure there are not errors (I don't think there are).
And also flake8 --doctests pandas/core/indexes/base.py
. Note that this will report all doctests problems in all the docstrings of the file. Just look that there is no problem in the lines of this docstring 2008-2036.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have made the said changes.
Validation tests passed with a warning of absence of an extended description which has been suggested to avoid in the review of the original PR. Tested via command ./scripts/validate_docstrings.py pandas.Index.contains
No doctests problem were reported for lines 2008-2036 by testing via command flake8 --doctests pandas/core/indexes/base.py
- Rephrased description as a boolean indicator - Removed a space after :class: for correct rendering checked by the command `./doc/make.py html --single=pandas.Index.contains` - Moved the easier example cases first. - Added the missing space after comma following PEP8 - Validation tests passed with a warning of absence of an extended description. Tested via command `./scripts/validate_docstrings.py pandas.Index.contains` - No doctests problem were reported for lines 2008-2036 by testing via command `flake8 --doctests pandas/core/indexes/base.py`
Validation Script:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm. Thanks @Tanya-Jain
@jreback if you don't mind reviewing and merging this one too |
pandas/core/indexes/base.py
Outdated
|
||
Examples | ||
-------- | ||
>>> idx = pd.Index([1, 2, (3, 4), 5]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would prefer an example with a simple values like pd.Index(list('abcd'))
. This is a rather esoteric case.
@Tanya-Jain do you have time to address the comments of the last review and fix the conflicts? |
@jreback addressed your comments, and made some other fixes to this docstring, if you can take a look. |
thanks @Tanya-Jain and @datapythonista |
git diff upstream/master -u -- "*.py" | flake8 --diff
This PR is an update to PR #20211, improving the
Index.contains
docstring.