DOC: update the Series.str.ismethods docstring #20913

Merged
130 changes: 128 additions & 2 deletions pandas/core/strings.py
@@ -2401,11 +2401,137 @@ def rindex(self, sub, start=0, end=None):

_shared_docs['ismethods'] = ("""
Check whether all characters in each string in the Series/Index
are %(type)s. Equivalent to :meth:`str.%(method)s`.
are %(type)s.
Member

The short summary should fit on the first line. I'd get rid of "in the Series/Index".


This is equivalent to running the Python string method
:meth:`str.%(method)s` for each element of the Series/Index. If a string
has zero characters, `False` is returned for that check.

Returns
-------
is : Series/array of boolean values
Series
Member

I think in most cases we use Series or Index of bool as the return type. I'd keep the same for consistency.

Series of boolean values with the same length as the original
Series/Index.

See Also
--------
Series.str.isalpha : Check whether all characters are alphabetic.
Series.str.isnumeric : Check whether all characters are numeric.
Series.str.isalnum : Check whether all characters are alphanumeric.
Series.str.isdigit : Check whether all characters are digits.
Series.str.isdecimal : Check whether all characters are decimal.
Series.str.isspace : Check whether all characters are whitespace.
Series.str.islower : Check whether all characters are lowercase.
Series.str.isupper : Check whether all characters are uppercase.
Series.str.istitle : Check whether all characters are titlecase.

Examples
--------
**Checks for Alphabetic and Numeric Characters**

>>> s1 = pd.Series(['AB', 'C12', '42', ''])
Member

We haven't done it anywhere yet, but I think it could make sense to add an index with the same values. It would make the examples very easy to follow.

As a personal opinion, I find overly arbitrary examples a bit distracting. I'd prefer a real-world example, but as that seems difficult in this case, I'd prefer something like ['one', 'one1', '1', ''], which makes it obvious what is being shown and doesn't leave users guessing why 42 and not 43.

Contributor Author

Just checking I understand, do you mean having the series values and index be the same so the values are displayed side by side like below?

>>> s1 = pd.Series(data=['one', 'one1', '1', ''], 
                   index=['one', 'one1', '1', ''])

>>> s1.str.isalpha()
one      True
one1    False
1       False
        False
dtype: bool

Member

It's just an idea, but yes, that's what I meant. We haven't done it in any docstring yet, afaik. But I think it makes it very easy to see which value is true for each method.

Contributor Author

It does make comparisons easier, though I find that having a blank in the last index position spoils it a bit. Could we explicitly label it with 'empty string' as below?

>>> s1.str.isalpha()
one              True
one1            False
1               False
empty string    False
dtype: bool
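
For reference, the labelled-index idea from this thread can be sketched as a standalone script. This is only an illustration of the suggestion, not the docstring text that was merged; the variable names `values` and `labels` are hypothetical, and the 'empty string' label is purely a display stand-in for the `''` value.

```python
import pandas as pd

# Values suggested in the review thread; the labels mirror the values,
# except that the empty string gets a readable, display-only label.
values = ['one', 'one1', '1', '']
labels = ['one', 'one1', '1', 'empty string']

s1 = pd.Series(data=values, index=labels)

# Each check returns a boolean Series that keeps the same index labels,
# so every tested value is printed next to its result.
is_alpha = s1.str.isalpha()
is_numeric = s1.str.isnumeric()
is_alnum = s1.str.isalnum()

print(is_alpha)
```

The trade-off is that the index no longer matches the data exactly for the last element, which is why the label is only a readability aid.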


>>> # All are alphabetic characters
Member

I find these comments before each function unnecessary.

Contributor Author

Sure, I can strip them out where the example is self-explanatory, though I'm thinking of keeping the ones in the section on More Detailed Checks for Numeric Characters, as I found this the most confusing part, i.e. why there are so many checks for numeric values and what the difference between them is. Or I could put these as text instead of comments.

Member

Sounds good to me. Where you think the example is not clear by itself, I think it's better to have the explanations as text and not as code comments.
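
As a side note on why the numeric checks differ, the distinction can be reproduced with the plain Python string methods that the docstring says these accessors are equivalent to. This is a minimal sketch that reuses the sample strings from the diff; `samples` and `results` are hypothetical names introduced for illustration.

```python
# Compare the three numeric checks on the same strings using built-in
# str methods: a two-digit number, a superscript digit, a fraction
# character, and the empty string.
samples = ['23', '³', '⅕', '']

results = {
    text: (text.isdecimal(), text.isdigit(), text.isnumeric())
    for text in samples
}

for text, (is_decimal, is_digit, is_numeric) in results.items():
    print(repr(text), is_decimal, is_digit, is_numeric)
```

Each check accepts everything the previous one does: `isdecimal` covers base-10 digits, `isdigit` adds characters such as superscript digits, and `isnumeric` adds characters such as Unicode fractions. The empty string fails all three.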

>>> s1.str.isalpha()
0     True
1    False
2    False
3    False
dtype: bool

>>> # All are numeric characters
>>> s1.str.isnumeric()
0    False
1    False
2     True
3    False
dtype: bool

>>> # All are either alphabetic characters or numeric characters
>>> s1.str.isalnum()
0     True
1     True
2     True
3    False
dtype: bool

Note that checks against characters mixed with any additional punctuation
or whitespace will evaluate to false for an alphanumeric check.

>>> s2 = pd.Series(['A B', '1.5', '3,000'])
>>> s2.str.isalnum()
0    False
1    False
2    False
dtype: bool

**More Detailed Checks for Numeric Characters**

>>> s3 = pd.Series(['23', '³', '⅕', ''])

>>> # All are characters used to form numbers in base 10
>>> s3.str.isdecimal()
0     True
1    False
2    False
3    False
dtype: bool

>>> # Same as s.str.isdecimal, but also includes special
>>> # digits, like superscripted/subscripted digits
>>> s3.str.isdigit()
0     True
1     True
2    False
3    False
dtype: bool

>>> # Same as s.str.isdigit, but also includes other characters
>>> # that can represent quantities such as unicode fractions
>>> s3.str.isnumeric()
0     True
1     True
2     True
3    False
dtype: bool

**Checks for Whitespace**

>>> # All characters represent whitespace
>>> s4 = pd.Series([' ', '\\t\\r\\n ', ''])
Member

I'd find it a bit clearer to use r'\t\r\n' than '\\t\\r\\n'.

Contributor Author

@MrKriss May 8, 2018

So using r'\t\r\n' seems to interfere with the validation script, causing an error, likely because the newline and whitespace characters are still rendered as the docstring is parsed. I can correct for this if I make the whole docstring a raw string, but that seems a bit drastic, and I'm not sure whether it would have any other consequences. Would sticking with \\t\\r\\n be preferable over this?
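
The escaping issue described here can be reproduced without pandas. The sketch below is an assumption about the mechanism, not the pandas validation script itself: it uses the standard-library `doctest` parser, and the function names and the `examples_compile` helper are hypothetical. In a non-raw docstring, `\t\r\n` is turned into real control characters before doctest sees the text, so the example line is split in the middle of the string literal.

```python
import doctest


def escaped():
    """
    >>> [s.isspace() for s in [' ', '\\t\\r\\n ', '']]
    [True, True, False]
    """


def raw_in_plain_docstring():
    """
    >>> [s.isspace() for s in [' ', r'\t\r\n ', '']]
    [True, False, False]
    """


def raw_docstring():
    r"""
    >>> [s.isspace() for s in [' ', '\t\r\n ', '']]
    [True, True, False]
    """


def examples_compile(func):
    """Return True if every doctest example in func's docstring parses and compiles."""
    try:
        for example in doctest.DocTestParser().get_examples(func.__doc__):
            compile(example.source, "<doctest>", "single")
    except (ValueError, SyntaxError):
        # ValueError: doctest rejects the broken layout of the docstring.
        # SyntaxError: the extracted example is not valid Python.
        return False
    return True


for func in (escaped, raw_in_plain_docstring, raw_docstring):
    print(func.__name__, examples_compile(func))
```

Doubling the backslashes keeps the rest of the docstring unchanged, while the raw-docstring variant affects every escape sequence in the docstring, which is the trade-off raised in the comment above.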

>>> s4.str.isspace()
0     True
1     True
2    False
dtype: bool

**Checks for Character Case**

>>> s5 = pd.Series(['leopard', 'Golden Eagle', 'SNAKE', ''])

>>> # All characters are lowercase
>>> s5.str.islower()
0     True
1    False
2    False
3    False
dtype: bool

>>> # All characters are uppercase
>>> s5.str.isupper()
0    False
1    False
2     True
3    False
dtype: bool

>>> # All words are in title case (first letter of each word capitalized)
>>> s5.str.istitle()
0    False
1     True
2    False
3    False
dtype: bool
""")
_shared_docs['isalnum'] = dict(type='alphanumeric', method='isalnum')
_shared_docs['isalpha'] = dict(type='alphabetic', method='isalpha')