
DOC: Add missing docstrings #31047


Merged
Changes from 2 commits
210 changes: 210 additions & 0 deletions pandas/core/indexes/base.py
@@ -1163,6 +1163,9 @@ def to_frame(self, index=True, name=None):

@property
def name(self):
"""
Return Index or MultiIndex name.
"""
return self._name

@name.setter
@@ -1644,21 +1647,184 @@ def is_unique(self) -> bool:

@property
def has_duplicates(self) -> bool:
"""
Check if the Index has duplicate values.

Returns
-------
boolean
Whether or not the Index has duplicate values.
Member

For a case like this, this might be a bit duplicative with the first line.

Do we (or the validation script) always require an explanation of the return type?

Contributor Author

Yes, the validation script requires a description for return values (error code 'RT03': 'Return value has no description'). I confirmed this by removing one of the explanations, leaving only the return type; the error appears when I run python3 scripts/validate_docstrings.py --errors=RT03

Member

(this discussion is certainly not a blocker for this PR, to be clear)

@datapythonista what's your view on this? It's of course easiest to be consistent / have a clear rule in the validation. But personally, I find that it doesn't add any value in this specific case.

Member

I agree it's probably a bit repetitive. I think it may still add value: even if most people can infer the output (what True and False mean) from the short summary and the name of the function, beginners may appreciate having it explicit. It's sometimes difficult to know whether what is obvious to us is obvious to other people.

In any case, even assuming it literally doesn't add any value, with all the work we've got on docstrings I would simply move forward, since there are many other things that I think are more important and more worth our time. This looks fine to me, even if the repetition is not ideal.

Member

Duplicate content can also add noise, as you might need to read both to ensure you don't miss something.

Anyway, not a discussion to continue on this PR


Examples
--------
>>> idx = pd.Index([1, 5, 7, 7])
>>> idx.has_duplicates
True

>>> idx = pd.Index([1, 5, 7])
>>> idx.has_duplicates
False

>>> idx = pd.Index(["Watermelon", "Orange", "Apple",
... "Watermelon"]).astype("category")
>>> idx.has_duplicates
True

>>> idx = pd.Index(["Orange", "Apple",
... "Watermelon"]).astype("category")
>>> idx.has_duplicates
False
"""
return not self.is_unique
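The property is simply the negation of `is_unique`. A minimal pure-Python sketch of the same relationship, assuming hashable values in a plain list instead of an `Index` (the helper names are hypothetical, not pandas API, and the sketch ignores pandas' special handling of missing values):

```python
def is_unique(values):
    # All values are distinct exactly when a set of them keeps every element.
    return len(set(values)) == len(values)


def has_duplicates(values):
    # Mirrors `return not self.is_unique` from the diff.
    return not is_unique(values)


print(has_duplicates([1, 5, 7, 7]))  # True
print(has_duplicates([1, 5, 7]))  # False
print(has_duplicates(["Watermelon", "Orange", "Apple", "Watermelon"]))  # True
```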

def is_boolean(self) -> bool:
"""
Check if the Index only consists of booleans.

Returns
-------
boolean
Member

Suggested change
boolean
bool

Whether or not the Index only consists of booleans.

Examples
--------
>>> idx = pd.Index([True, False, True])
>>> idx.is_boolean()
True

>>> idx = pd.Index(["True", "False", "True"])
>>> idx.is_boolean()
False

>>> idx = pd.Index([True, False, "True"])
>>> idx.is_boolean()
False
"""
return self.inferred_type in ["boolean"]
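`is_boolean` only compares a label produced by type inference. A pure-Python sketch of that idea, assuming a hypothetical `inferred_kind` helper that recognises far fewer labels than pandas does, shows why the string and mixed examples return False:

```python
def inferred_kind(values):
    """Hypothetical, heavily simplified stand-in for ``Index.inferred_type``."""
    if not values:
        return "empty"
    if all(isinstance(v, bool) for v in values):
        return "boolean"
    if all(isinstance(v, str) for v in values):
        return "string"
    return "mixed"


def is_boolean(values):
    # Same membership test as the diff: inferred_type in ["boolean"].
    return inferred_kind(values) in ["boolean"]


print(is_boolean([True, False, True]))  # True
print(is_boolean(["True", "False", "True"]))  # False: strings, not booleans
print(is_boolean([True, False, "True"]))  # False: one string makes it mixed
```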

def is_integer(self) -> bool:
"""
Check if the Index only consists of integers.

Returns
Member

Nit: "Returns" section above the "See also"

-------
boolean
Member

Suggested change
boolean
bool

Whether or not the Index only consists of integers.

Examples
--------
>>> idx = pd.Index([1, 2, 3, 4])
>>> idx.is_integer()
True

>>> idx = pd.Index([1.0, 2.0, 3.0, 4.0])
>>> idx.is_integer()
False

>>> idx = pd.Index([1, 2, 3, 4.0])
Member

In practice, this is the same as above, as this will be parsed into a FloatIndex. So if we want a third example, I would rather show strings (or just leave it out).

>>> idx.is_integer()
False
"""
return self.inferred_type in ["integer"]
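The review comment on the third example turns on construction-time upcasting: `pd.Index([1, 2, 3, 4.0])` holds floats, so it is the same case as `[1.0, 2.0, 3.0, 4.0]`. A pure-Python sketch of that upcasting rule, assuming the input contains only ints and floats (the helper name is hypothetical and this is not pandas' constructor logic):

```python
def coerce_like_constructor(values):
    """Hypothetical sketch: a single float upcasts every number to float."""
    if any(isinstance(v, float) for v in values):
        return [float(v) for v in values]
    return list(values)


def is_integer(values):
    coerced = coerce_like_constructor(values)
    # bool is a subclass of int in Python, so exclude it explicitly.
    return bool(coerced) and all(
        isinstance(v, int) and not isinstance(v, bool) for v in coerced
    )


print(coerce_like_constructor([1, 2, 3, 4.0]))  # [1.0, 2.0, 3.0, 4.0]
print(is_integer([1, 2, 3, 4]))  # True
print(is_integer([1, 2, 3, 4.0]))  # False, same as [1.0, 2.0, 3.0, 4.0]
```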

def is_floating(self) -> bool:
"""
Check if the Index only consists of floats, NaNs, or
a mix of floats, integers, or NaNs.
Member

The short summary should be a single line. More information can be added later, in other paragraphs. This is so that the index pages display correctly.

https://pandas.io/docs/development/contributing_docstring.html#section-1-short-summary


Returns
-------
boolean
Member

Suggested change
boolean
bool

Whether or not the Index only consists of floats, NaNs, or
a mix of floats, integers, or NaNs.

Examples
--------
>>> idx = pd.Index([1.0, 2.0, 3.0, 4.0])
>>> idx.is_floating()
True

>>> idx = pd.Index([1, 2, 3, 4.0])
>>> idx.is_floating()
True

>>> idx = pd.Index([1, 2, 3, 4.0, np.nan])
Member

Same comment here as above.

I understand that, looking at the lists used to create the index, they seem like different cases, but all of them are Float64Index objects. So in this case I find it actually makes things more confusing (it would rather be an example for the main Index docstring, to illustrate the constructor).

Thoughts?

Showing that it can contain NaN is of course useful.

Contributor Author

I see; I was thinking that by showing it, people who are not familiar with Float64Index objects would know what to expect, but I agree that it's probably better as an example in the main Index docstring. I will modify it and leave the NaNs.

>>> idx.is_floating()
True

>>> idx = pd.Index([1, 2, 3, 4, np.nan])
>>> idx.is_floating()
True

>>> idx = pd.Index([1, 2, 3, 4])
>>> idx.is_integer()
Member

Suggested change
>>> idx.is_integer()
>>> idx.is_floating()

I think this is a typo

False
"""
return self.inferred_type in ["floating", "mixed-integer-float", "integer-na"]
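The `return` statement accepts three labels because data that was not coerced to float64 (for example, an object-dtype Index) is labelled differently depending on its mix of ints, floats and NaNs. A sketch with a hypothetical `numeric_kind` helper that approximates those labels for plain Python lists; it is not `pandas.api.types.infer_dtype` and skips many of its cases:

```python
import math


def _is_nan(v):
    return isinstance(v, float) and math.isnan(v)


def _is_int(v):
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(v, int) and not isinstance(v, bool)


def numeric_kind(values):
    """Hypothetical approximation of a few ``inferred_type`` labels.

    Missing values (NaN) are kept, not skipped, when choosing the label.
    """
    if not values:
        return "empty"
    if all(isinstance(v, float) for v in values):
        return "floating"  # NaN is itself a float
    non_na = [v for v in values if not _is_nan(v)]
    if all(_is_int(v) for v in non_na):
        return "integer-na" if len(non_na) < len(values) else "integer"
    if all(_is_int(v) or isinstance(v, float) for v in values):
        return "mixed-integer-float"
    return "mixed"


def is_floating(values):
    # Same membership test as the diff above.
    return numeric_kind(values) in ["floating", "mixed-integer-float", "integer-na"]


for sample in ([1.0, 2.0], [1, 2.0], [1, float("nan")], [1, 2, 3]):
    print(sample, numeric_kind(sample), is_floating(sample))
```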

def is_numeric(self) -> bool:
"""
Check if the Index only consists of numeric
data.
Member

This can fit in one line.


Returns
-------
boolean
Member

Suggested change
boolean
bool

Whether or not the Index only consists of numeric
data.
Member

Suggested change
data.
data.


Examples
--------
>>> idx = pd.Index([1.0, 2.0, 3.0, 4.0])
>>> idx.is_numeric()
True

>>> idx = pd.Index([1, 2, 3, 4.0])
>>> idx.is_numeric()
True

>>> idx = pd.Index([1, 2, 3, 4])
>>> idx.is_numeric()
True

>>> idx = pd.Index([1, 2, 3, 4.0, np.nan])
>>> idx.is_numeric()
True

>>> idx = pd.Index([1, 2, 3, 4.0, np.nan, "Apple"])
>>> idx.is_numeric()
False
"""
return self.inferred_type in ["integer", "floating"]
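For the inputs in these examples the constructor yields an int64 or float64 Index (or an object Index once "Apple" is added), so the label check reduces to "every element is a real number". A pure-Python sketch of that reduced check, assuming plain lists; unlike the method in the diff, it would also accept an object-dtype mix of ints and floats, which pandas labels "mixed-integer-float":

```python
import math
from numbers import Real


def is_numeric(values):
    """Hypothetical sketch: every element is a real number but not a bool.

    NaN counts as numeric because it is a float.
    """
    return bool(values) and all(
        isinstance(v, Real) and not isinstance(v, bool) for v in values
    )


print(is_numeric([1, 2, 3, 4]))  # True
print(is_numeric([1, 2, 3, 4.0, math.nan]))  # True
print(is_numeric([1, 2, 3, 4.0, math.nan, "Apple"]))  # False
```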

def is_object(self) -> bool:
"""
Check if the Index is of the object dtype.

Returns
-------
boolean
Member

Suggested change
boolean
bool

Whether or not the Index is of the object dtype.

Examples
--------
>>> idx = pd.Index(["Apple", "Mango", "Watermelon"])
>>> idx.is_object()
True

>>> idx = pd.Index(["Apple", "Mango", 2.0])
>>> idx.is_object()
True

>>> idx = pd.Index(["Watermelon", "Orange", "Apple",
... "Watermelon"]).astype("category")
>>> idx.is_object()
False

>>> idx = pd.Index([1.0, 2.0, 3.0, 4.0])
>>> idx.is_object()
False
"""
return is_object_dtype(self.dtype)
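Unlike the other predicates, `is_object` inspects the dtype rather than inferring from the values, which is why the categorical example returns False even though it holds the same strings. A toy sketch of that distinction; `ToyIndex` is hypothetical and models the dtype as a plain string:

```python
from dataclasses import dataclass


@dataclass
class ToyIndex:
    """Hypothetical stand-in: values plus a declared dtype name."""

    values: list
    dtype: str

    def is_object(self):
        # Like is_object_dtype(self.dtype): only the declared dtype is
        # inspected, not what the stored values look like.
        return self.dtype == "object"


fruits = ["Watermelon", "Orange", "Apple", "Watermelon"]
print(ToyIndex(fruits, "object").is_object())  # True
print(ToyIndex(fruits, "category").is_object())  # False: same values
```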

def is_categorical(self) -> bool:
@@ -1698,9 +1864,50 @@ def is_categorical(self) -> bool:
return self.inferred_type in ["categorical"]

def is_interval(self) -> bool:
"""
Check if the Index holds Interval objects.

Returns
-------
boolean
Member

Suggested change
boolean
bool

Whether or not the Index holds Interval objects.

See Also
--------
IntervalIndex : Index for Interval objects.

Examples
--------
>>> idx = pd.Index([pd.Interval(left=0, right=5),
... pd.Interval(left=5, right=10)])
>>> idx.is_interval()
True

>>> idx = pd.Index([1, 3, 5, 7])
>>> idx.is_interval()
False
"""
return self.inferred_type in ["interval"]
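A pure-Python sketch of the check, with a hypothetical `Interval` dataclass standing in for `pd.Interval` (whose `closed` parameter also defaults to "right"); the method in the diff relies on `inferred_type` rather than an `isinstance` loop:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """Hypothetical minimal interval, closed on the right by default."""

    left: float
    right: float
    closed: str = "right"


def is_interval(values):
    # True only for a non-empty sequence made up entirely of Interval objects.
    return bool(values) and all(isinstance(v, Interval) for v in values)


print(is_interval([Interval(0, 5), Interval(5, 10)]))  # True
print(is_interval([1, 3, 5, 7]))  # False
```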

def is_mixed(self) -> bool:
"""
Check if the Index holds data with mixed data types.

Returns
-------
boolean
Member

Suggested change
boolean
bool

Whether or not the Index holds data with mixed data types.

Examples
--------
>>> idx = pd.Index(['a', np.nan, 'b'])
>>> idx.is_mixed()
True

>>> idx = pd.Index([1.0, 2.0, 3.0, 5.0])
>>> idx.is_mixed()
False
"""
return self.inferred_type in ["mixed"]
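The first example counts as mixed because the NaN is taken into account when the type is inferred rather than skipped. A sketch with a hypothetical `inferred_kind` helper; it lumps every other combination into "mixed", whereas pandas distinguishes further labels such as "mixed-integer", for which this method returns False:

```python
def inferred_kind(values):
    """Hypothetical simplified inference that does not skip missing values."""
    if not values:
        return "empty"
    if all(isinstance(v, str) for v in values):
        return "string"
    if all(isinstance(v, float) for v in values):
        return "floating"  # NaN is a float, so NaNs alone stay "floating"
    return "mixed"


def is_mixed(values):
    # Same membership test as the diff: inferred_type in ["mixed"].
    return inferred_kind(values) in ["mixed"]


print(is_mixed(["a", float("nan"), "b"]))  # True: NaN is not a string
print(is_mixed([1.0, 2.0, 3.0, 5.0]))  # False
```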

def holds_integer(self):
@@ -1718,6 +1925,9 @@ def inferred_type(self):

@cache_readonly
def is_all_dates(self) -> bool:
"""
Whether or not the index values only consist of dates.
"""
return is_datetime_array(ensure_object(self.values))
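The property delegates to `is_datetime_array` on the values converted to object dtype. A rough pure-Python analogue, assuming a plain list and ignoring how pandas treats missing values (NaT) and NumPy datetimes; `is_all_dates` here is a hypothetical free function:

```python
from datetime import date, datetime


def is_all_dates(values):
    """Hypothetical sketch: every value is a ``datetime.datetime``.

    A plain ``datetime.date`` is not a ``datetime`` instance, so it fails.
    """
    return bool(values) and all(isinstance(v, datetime) for v in values)


print(is_all_dates([datetime(2020, 1, 1), datetime(2020, 1, 2)]))  # True
print(is_all_dates([date(2020, 1, 1)]))  # False
print(is_all_dates(["2020-01-01"]))  # False: strings are not parsed
```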

# --------------------------------------------------------------------