
DOC: update the pandas.Series.str.startswith docstring #20458

Merged
merged 5 commits into from
Mar 25, 2018
Changes from 2 commits
38 changes: 31 additions & 7 deletions pandas/core/strings.py
@@ -328,19 +328,43 @@ def str_contains(arr, pat, case=True, flags=0, na=np.nan, regex=True):

 def str_startswith(arr, pat, na=np.nan):
     """
-    Return boolean Series/``array`` indicating whether each string in the
-    Series/Index starts with passed pattern. Equivalent to
-    :meth:`str.startswith`.
+    Test if the start of each string element matches a pattern.
+
+    Equivalent to :meth:`str.startswith`.
 
     Parameters
     ----------
-    pat : string
-        Character sequence
-    na : bool, default NaN
+    pat : str
+        Character sequence. Regular expressions are not accepted.
+    na : object, default NaN
+        Object shown if element tested is not a string.
 
     Returns
     -------
-    startswith : Series/array of boolean values
+    startswith : Series or array-like of bool
Member

I was checking the other methods of the .str accessor, and it seems we're a bit inconsistent with the type. In some cases we use "Series/array" or "Series or array-like", and in others "Series or Index of <type>". I think the latter is more specific, as I don't think this method can currently return a NumPy array or a Python list.

We can change it later, as some unification would be nice. But if you want to change it to "Series or Index of bool" now, this one would already be right.

Also, the "startswith :" prefix here is something we no longer use unless the function/method returns more than one value, so you can leave just the type.
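The return-type point can be checked directly. This is a minimal sketch, assuming a recent pandas and NumPy; the Series is built with dtype=object (an assumption, not part of the diff) so that the object-dtype behaviour shown in this docstring applies. It shows that a Series comes back either way, but with the default na the missing element makes the result object-dtype rather than bool, which matters when wording the Returns type.

```python
import numpy as np
import pandas as pd

# dtype=object is passed explicitly so the object-dtype string behaviour
# shown in the docstring applies; newer pandas versions may otherwise
# infer a dedicated string dtype with different missing-value handling.
s = pd.Series(['bat', 'Bear', 'cat', np.nan], dtype=object)

default = s.str.startswith('b')           # NaN is propagated for the missing element
filled = s.str.startswith('b', na=False)  # the missing element becomes False

# A Series comes back in both cases, but only the na=False result is bool.
print(type(default).__name__, default.dtype)  # Series object
print(type(filled).__name__, filled.dtype)    # Series bool
```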

+        A Series of booleans indicating whether the given pattern matches
+        the start of each string element.
+
+    See Also
+    --------
+    str_endswith : Same as startswith, but tests the end of string.
+    str.startswith : Python standard library string method.
Member

I think there is a bug preventing the links from being generated when you build the HTML with the --single option. If you build the whole documentation (doc/make.py html), it takes around 5 minutes to complete, but the links should work then.

Replacing str_endswith with Series.str.endswith is the right way.

Also, I'd personally add Series.str.contains, which also looks for the pattern, but in any position. I think it could be useful for users visiting the page to know about it too.

Contributor Author

Yep, thanks.
The full build shows the links properly.
Added contains.
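The See Also suggestion is easiest to appreciate side by side. A minimal sketch, assuming pandas is installed and reusing the docstring's animals without the NaN: startswith and endswith anchor the match at one end of each string, while contains looks anywhere and, unlike the other two, treats pat as a regular expression by default (regex=True, as in the str_contains signature in the hunk header).

```python
import pandas as pd

s = pd.Series(['bat', 'Bear', 'cat'], dtype=object)

# Anchored at the start of each string (literal, case-sensitive).
starts = s.str.startswith('b')
# Anchored at the end of each string.
ends = s.str.endswith('t')
# Pattern found anywhere; contains() interprets pat as a regex by default.
anywhere = s.str.contains('at')

print(starts.tolist())    # [True, False, False]
print(ends.tolist())      # [True, False, True]
print(anywhere.tolist())  # [True, False, True]
```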


+
+    Examples
+    --------
+    >>> s = pd.Series(['bat', 'Bear', 'cat', np.nan])
+    >>> s.str.startswith('b')
Member

Sorry, I think I wasn't clear in part of my last comment. The explanation you added looks great; that part is perfect.

But what I meant by adding >>> s is:

>>> s = pd.Series(['bat', 'Bear', 'cat', np.nan])
>>> s
(the user can see the Series here)

So you'd have a first block (ending with a blank line to make it a separate box), where the user can see the data you'll be using in both examples.

Then you show the basic usage (currently line 357), then the explanation of the second example you added. For the second example you don't need to create the Series again, as you had before.

So it would simply be adding >>> s, its output, and a blank line after L356, and removing L366. That's the standard way used in most examples, and IMO the clearest.

Sorry, my previous comment was confusing.

+    0     True
+    1    False
+    2    False
+    3      NaN
+    dtype: object
Member

I would reuse the same Series with the NaN for both examples: one with the default na, and one with a value. I think the example would be a bit more realistic with na=False.

Also, I think it could help some users that one of the animals starts with a capital B, which is not matched, so they see that the test is case-sensitive.
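A minimal sketch of the case-sensitivity point, assuming pandas and NumPy and the same object-dtype Series. The lowercase-first workaround in the second half is only an illustration; it is not something the docstring under review proposes.

```python
import numpy as np
import pandas as pd

s = pd.Series(['bat', 'Bear', 'cat', np.nan], dtype=object)

# Matching is case-sensitive: 'Bear' does not start with lowercase 'b'.
exact = s.str.startswith('b', na=False)
print(exact.tolist())  # [True, False, False, False]

# One way to match case-insensitively: lowercase first, then test.
# str.lower() leaves NaN as NaN, so na=False still handles it.
insensitive = s.str.lower().str.startswith('b', na=False)
print(insensitive.tolist())  # [True, True, False, False]
```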

+    >>> s.str.startswith('b', na=False)
+    0     True
+    1    False
+    2    False
+    3    False
+    dtype: bool
Member

In most cases we show >>> s after defining it, and we leave a blank line between the different examples so they end up in separate boxes in the HTML. Also, for the last example a short explanation of what you are doing could be useful for users.

You can see an example of what I mean here: https://github.com/dcreekp/pandas/blob/39f76413374109d8c34021a6b61d121d3d05c9a0/pandas/core/strings.py#L1347

We probably don't need many explanations in this case, as the example is quite obvious, but a short sentence for the last case could help users see what's going on faster.

     """
     f = lambda x: x.startswith(pat)
     return _na_map(f, arr, na, dtype=bool)
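The two-line implementation pairs a plain Python predicate with pandas' private _na_map helper. The pure-Python sketch below mimics that structure so the documented behaviour of pat and na is easy to see. na_map_sketch and startswith_sketch are hypothetical names, and the rule "apply the predicate to strings, return na for everything else" is a simplification of what pandas does internally. Because the predicate is str.startswith, a pattern such as 'b.' is matched literally, which is what "Regular expressions are not accepted" means in practice.

```python
import math


def na_map_sketch(f, values, na=float('nan')):
    """Hypothetical stand-in for pandas' private _na_map helper.

    Applies f to every str element and returns `na` for anything else
    (None, NaN, numbers, ...). A simplification, not pandas' own code.
    """
    return [f(x) if isinstance(x, str) else na for x in values]


def startswith_sketch(values, pat, na=float('nan')):
    # Same predicate as in the diff: plain str.startswith, so `pat` is
    # compared literally and '.' or '*' have no regex meaning.
    f = lambda x: x.startswith(pat)
    return na_map_sketch(f, values, na)


data = ['bat', 'Bear', 'b.t', None, 3]

print(startswith_sketch(data, 'b.', na=False))  # [False, False, True, False, False]

default = startswith_sketch(data, 'b.')
print(default[:3], all(math.isnan(v) for v in default[3:]))  # [False, False, True] True
```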