Skip to content

DOC: Improve the docstring of pd.Index.contains and closes PR #20211 #23100

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 8 commits into from
Dec 7, 2018

Conversation

TanyaaCJain
Copy link
Contributor

This PR is an update to PR #20211, improving the Index.contains docstring.

@pep8speaks
Copy link

Hello @Tanya-Jain! Thanks for submitting the PR.

Copy link
Member

@datapythonista datapythonista left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good work, I added some comments about minor things.


See Also
--------
CategoricalIndex : Returns a bool if the 1-dimensional, Categorical key
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CategoricalIndex is the class, do you mean CategoricalIndex.contains here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I mean this. Apparently, there isn't any docstring written for the same. Should I write something like
CategoricalIndex : Its :method: contains returns a bool if the key is in index. The key can be of the type same as this class, like the Index.contains
or should I still refer to CategoricalIndex.contains : Returns a bool if the key is in index . . .

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Index class is a parent of CategoricalIndex as well as all the other index classes like Float64Index... It could make sense to list CategoricalIndex.contains, Float64Index.contains... but I don't think it adds that much value in this case, and it adds noise to the docstring.

I think the best is just leave Index.isin, but it's up to you.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree as CategoricalIndex.contains behaves similarly to Index.contains. Let's just keep Index.isin.


Examples
--------
>>> l1 = pd.Index([1, 2, (3, 4), 5])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By convention we use idx for variables soring an index in the examples (df for dataframes, and s for series). Can you use that?

Also, can you show the content of the index after you store it (i.e. >>> idx)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I can change the variable naming and show the content, it would be better.

>>> l1 = pd.Index([1, 2, (3, 4), 5])
>>> t = (3, 4)
>>> num1 = 1
>>> num2 = 6
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd save a bit of space by using the values directly in the calls, instead of saving variables first (e.g. idx.contains(1)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will improve this.


Parameters
----------
key : object
key : label
The key requested. Immutable-like, 1-dimensional.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While true, I think users will find this a bit confusing. I think it probably makes more sense to avoid saying that is 1-dimensional, as it gives the impression that more than one key can be provided.

Personally, I think the users should look for what a label can be (immutable, 1-dimensional if a tuple...) in some other part of the documentation, and here focus more on what contains exactly expects (the key that will be searched in the index). But if you prefer to be specific, I'd start this by something like The key can be of the same type as the labels in index, so it needs to be immutable....

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is exactly what I tried to explain, thank you!

@datapythonista datapythonista added Docs Indexing Related to indexing on series/frames, not to indexes themselves labels Oct 12, 2018
@datapythonista
Copy link
Member

Any reason to close this? I guess it's been by mistake?

@codecov
Copy link

codecov bot commented Oct 12, 2018

Codecov Report

Merging #23100 into master will decrease coverage by <.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #23100      +/-   ##
==========================================
- Coverage    92.2%    92.2%   -0.01%     
==========================================
  Files         162      162              
  Lines       51701    51700       -1     
==========================================
- Hits        47671    47670       -1     
  Misses       4030     4030
Flag Coverage Δ
#multiple 90.6% <100%> (-0.01%) ⬇️
#single 43.02% <100%> (-0.01%) ⬇️
Impacted Files Coverage Δ
pandas/core/indexes/multi.py 95.55% <100%> (ø) ⬆️
pandas/core/indexes/category.py 97.9% <100%> (ø) ⬆️
pandas/core/indexes/period.py 93.06% <100%> (ø) ⬆️
pandas/core/indexes/datetimelike.py 97.29% <100%> (ø) ⬆️
pandas/core/indexes/base.py 96.27% <100%> (-0.01%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 03134cb...93dcd9d. Read the comment docs.

@TanyaaCJain
Copy link
Contributor Author

#23100 contained an additional commit for merging commits from the pandas-dev/pandas master.
To have only the commit for the file changes made in base.py, I closed #23100 and opened #23107

@TanyaaCJain
Copy link
Contributor Author

I tried to get the problem fixed but ended up having 0 commits being shown as well as having this PR automatically closed.

- Description for key has more clarity on its behaviour.
- Removed :class: `CategoricalIndex` from the See Also section as
  the methods: CategoricalIndex.contains, Float64Index.contains and
  like, behave similarly to `Index.contains`. Hence, keeping only
  `Index.isin`
- Use `idx` for naming variables for objects of :class: `Index`
  instead of `l` for passing list-like key, to encourage pandas
  docstring standards and clarity.
- Directly call the values in :method: `Index.contains` to prevent
  additional memory usage.
Copy link
Member

@datapythonista datapythonista left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks very good. Just couple of small things, and it's ready to be merged.

@@ -2005,15 +2005,35 @@ def __contains__(self, key):
return False

_index_shared_docs['contains'] = """
return a boolean if this key is IN the index
Return a boolean if this key is in the index.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor thing, and I know this is the original description and not yours, but this sounds like it a boolean is returned only if the key is in the index. Not that a boolean is always returned, on whether the key is in the index. Do you mind rephrasing it to be more correct please?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, I'll rephrase it in the next commit as a boolean indicator. Apparently, there are few more docstrings with similar phrasing like Series.is_unique, Series.is_monotonic, Series.is_monotonic_increasing, Series.is_monotonic_decreasing. These too can be corrected in future if seems appropriate.


Parameters
----------
key : object
key : label
The key can be of the same type as the label of :class: `Index`,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the space after :class: is not needed. Did you check if this is rendered correctly in the html version? You should be able to generate this page with ./doc/make.py html --single=pandas.Index.contains

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, it renders correctly after removing the space after :class:

>>> idx
Index([1, 2, (3, 4), 5], dtype='object')
>>> idx.contains((3,4))
True
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Small thing, but do you mind moving this example after the other two? I think it helps if we show the simple/easy cases first, and the more complex later.

Also, you there is a missing space after the comma.

When you're done with the changes, it may be helpful to run ./scripts/validate_docstrings.py pandas.Index.contains to make sure there are not errors (I don't think there are).

And also flake8 --doctests pandas/core/indexes/base.py. Note that this will report all doctests problems in all the docstrings of the file. Just look that there is no problem in the lines of this docstring 2008-2036.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have made the said changes.

Validation tests passed with a warning of absence of an extended description which has been suggested to avoid in the review of the original PR. Tested via command ./scripts/validate_docstrings.py pandas.Index.contains

No doctests problem were reported for lines 2008-2036 by testing via command flake8 --doctests pandas/core/indexes/base.py

- Rephrased description as a boolean indicator
- Removed a space after :class: for correct rendering checked by
  the command `./doc/make.py html --single=pandas.Index.contains`
- Moved the easier example cases first.
- Added the missing space after comma following PEP8
- Validation tests passed with a warning of absence of an extended
  description. Tested via command `./scripts/validate_docstrings.py
  pandas.Index.contains`
- No doctests problem were reported for lines 2008-2036 by testing
  via command `flake8 --doctests pandas/core/indexes/base.py`
@TanyaaCJain
Copy link
Contributor Author

  • The html page generates correctly with ./doc/make.py html --single=pandas.Index.contains
  • Validation tests passed with a warning of absence of an extended description as asked in the DOC: update the pd.Index.contains docstring #20211 PR review. Tested via command ./scripts/validate_docstrings.py pandas.Index.contains
  • No doctests problem were reported for lines 2008-2036 by testing via command flake8 --doctests pandas/core/indexes/base.py

Validation Script:

################################################################################
###################### Docstring (pandas.Index.contains)  ######################
################################################################################

Return a boolean indicating whether this key is in the index.

Parameters
----------
key : label
    The key can be of the same type as the label of :class:`Index`,
    hence immutable-like and 1-dimensional if it is a tuple.

Returns
-------
bool
    Result indicating whether the key search is in the index.

See Also
--------
Index.isin : Returns an ndarray of boolean dtype indicating whether the
    list-like key is in the index.

Examples
--------
>>> idx = pd.Index([1, 2, (3, 4), 5])
>>> idx
Index([1, 2, (3, 4), 5], dtype='object')
>>> idx.contains(1)
True
>>> idx.contains(6)
False
>>> idx.contains((3, 4))
True

################################################################################
################################## Validation ##################################
################################################################################

Warnings found:
	No extended summary found
Docstring for "pandas.Index.contains" correct. :)

Copy link
Member

@datapythonista datapythonista left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm. Thanks @Tanya-Jain

@datapythonista
Copy link
Member

@jreback if you don't mind reviewing and merging this one too

@jreback jreback added this to the 0.24.0 milestone Nov 20, 2018

Examples
--------
>>> idx = pd.Index([1, 2, (3, 4), 5])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer an example with a simple values like pd.Index(list('abcd')). This is a rather esoteric case.

@jreback jreback removed this from the 0.24.0 milestone Nov 27, 2018
@datapythonista
Copy link
Member

@Tanya-Jain do you have time to address the comments of the last review and fix the conflicts?

@datapythonista datapythonista self-assigned this Dec 7, 2018
@datapythonista
Copy link
Member

@jreback addressed your comments, and made some other fixes to this docstring, if you can take a look.

@jreback jreback added this to the 0.24.0 milestone Dec 7, 2018
@jreback jreback merged commit f492be6 into pandas-dev:master Dec 7, 2018
@jreback
Copy link
Contributor

jreback commented Dec 7, 2018

thanks @Tanya-Jain and @datapythonista

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Docs Indexing Related to indexing on series/frames, not to indexes themselves
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants