DOC: update the pandas.Series.str.startswith docstring #20458

Merged
merged 5 commits into from
Mar 25, 2018
33 changes: 28 additions & 5 deletions pandas/core/strings.py
@@ -328,19 +328,42 @@ def str_contains(arr, pat, case=True, flags=0, na=np.nan, regex=True):

def str_startswith(arr, pat, na=np.nan):
"""
Return boolean Series/``array`` indicating whether each string in the
Series/Index starts with passed pattern. Equivalent to
:meth:`str.startswith`.
Test if the start of each string element matches a pattern.

Return a Series of booleans indicating whether the given pattern matches
the start of each string element.
Member

I think you can use the description of the Return section for this comment.

Equivalent to :meth: `str.startswith`.

Parameters
----------
pat : string
Member

We use str instead of string. I know it was like this, but can you please update it?

Character sequence
na : bool, default NaN
Character sequence.
Contributor

Can pat be a regex? I don't think it can. Best to state that explicitly.
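
One way to check the point above is a short experiment. This is only a sketch, assuming a reasonably recent pandas; the sample strings are made up for illustration. It compares `Series.str.startswith`, which treats the pattern as literal text, with `Series.str.match`, which does interpret the pattern as a regular expression anchored at the start of each string.

```python
import pandas as pd

# Illustrative data (not from the PR): one entry really starts with "^b".
s = pd.Series(["bat", "^bat", "cat"])

# startswith compares the pattern literally, so "^b" only matches strings
# that begin with the two characters "^" and "b".
literal = s.str.startswith("^b")

# str.match treats the pattern as a regex, so "^b" means "starts with b".
regex = s.str.match("^b")

print(literal.tolist())  # [False, True, False]
print(regex.tolist())    # [True, False, False]
```

If the two results differ as shown, the docstring can safely say that `pat` does not accept regular expressions.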

na : object, default NaN
Character sequence shown if element tested is not a string.

Returns
-------
startswith : Series/array of boolean values
Member

I think it would better follow the convention used in other docstrings: Series or array-like of bool.


Examples
--------
>>> s = pd.Series(['bat', 'bear', 'cat'])
>>> s.str.startswith('b')
Member

Sorry, I think I wasn't clear in part of my last comment. The explanation you added looks great, that part is perfect.

But what I meant with adding >>> s is:

>>> s = pd.Series(['bat', 'Bear', 'cat', np.nan])
>>> s
(the user can see the series here)

So, you'd have a first block (ended with a blank line to make it a different box), where the user can see the data you'll be using in both examples.

Then you show the basic usage (line 357 currently), then the explanation of the second example you added. And for the second example you don't need to create the series again, as you had before.

So it'd be simply adding >>> s, its output and a blank line after L356, and removing L366. That would be the standard way used in most examples, and IMO the clearest.

Sorry my previous comment was confusing.

0 True
1 True
2 False
dtype: bool
Member

I think in most cases we show >>> s after defining it, and we leave a blank line between the different examples, so they are in different boxes in the html. Also, for the last example a short explanation of what you are doing could be useful for users.

You can see an example of what I mean here: https://github.com/dcreekp/pandas/blob/39f76413374109d8c34021a6b61d121d3d05c9a0/pandas/core/strings.py#L1347. We probably don't need many explanations in this case, as the example is quite obvious, but a short sentence for the last case could help users see what's going on faster.

>>> s = pd.Series(['bat', 'bear', 'cat', np.nan])
>>> s.str.startswith('b', na='not_a_string')
0 True
1 True
2 False
3 not_a_string
dtype: object
Member

I would reuse the same Series with the NaN for both examples, with the na by default, and with a value. I think the example would be a bit more realistic with na=False.

Also, I think it could help some users if one of the animals starts with a capital B, which is not matched, so they see that this is case-sensitive.
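
A minimal sketch of what this suggestion amounts to, assuming the Series proposed elsewhere in this review (`'bat'`, `'Bear'`, `'cat'`, `np.nan`): the same Series is used for both calls, once with the default `na` and once with `na=False`, and the capitalised `'Bear'` shows that matching is case-sensitive. The variable names are only illustrative.

```python
import numpy as np
import pandas as pd

# One Series reused for both examples; 'Bear' has a capital B on purpose.
s = pd.Series(["bat", "Bear", "cat", np.nan])

# Default `na`: how the missing entry is reported depends on the pandas
# version and dtype, so only the string entries are discussed here.
default_result = s.str.startswith("b")

# Explicit na=False: the missing entry is reported as False, giving a
# plain boolean result that can be used directly as a mask.
filled_result = s.str.startswith("b", na=False)

print(default_result)
print(filled_result.tolist())  # [True, False, False, False]
```

With `na=False` the result can be used for boolean indexing, for example `s[filled_result]`, which is the more realistic use case mentioned above.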


See Also
Contributor

Can you move the See Also before the Examples.

--------
endswith : same as startswith, but tests the end of string
Contributor

Add str.startswith.

Member

If I'm not wrong, I think your endswith is linked by Sphinx to pandas.endswith which doesn't exist. I think the right way would be Series.str.endswith. Similar examples link to the Python standard library version, str.startswith (the one @TomAugspurger says).

If I'm not wrong, the descriptions in the See Also section should start with a capital letter and finish with a period.

Contributor Author

I can't seem to get a link; I have tried:
str_endswith
Series.str_endswith
Series.str.endswith
Series.string.endswith
endswith
Series.str.str_endswith

Contributor

Oh sorry, we should document this better. Two things are going on:

  1. Sphinx / numpydoc has some weird rules around name resolution. I can't find the docs right now, but I think the lookup order is same class, then same module, then elsewhere.
  2. The items will only become links if both API pages are built.

So I think either endswith or Series.str.endswith will work, but Series.str.endswith may be a bit clearer. To validate that, you can build endswith first, then startswith.
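
Putting the See Also feedback in this thread together, the section might look roughly like the sketch below. The function name `str_startswith_sketch`, its placeholder body, and the exact description wording are hypothetical and only illustrate the layout: fully qualified names (`str.startswith`, `Series.str.endswith`), with each description starting with a capital letter and ending with a period.

```python
def str_startswith_sketch(arr, pat, na=None):
    """
    Test if the start of each string element matches a pattern.

    See Also
    --------
    str.startswith : Python standard library string method.
    Series.str.endswith : Same as startswith, but tests the end of string.
    """
    # Hypothetical placeholder: this sketch only demonstrates docstring layout.
    raise NotImplementedError("docstring layout sketch only")


# Collect the "name : description" entries from the See Also section.
doc_lines = [line.strip() for line in str_startswith_sketch.__doc__.splitlines()]
see_also = [line for line in doc_lines if " : " in line]
print(see_also)
```

As noted above, whether these names render as links still depends on both API pages being built.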

"""
f = lambda x: x.startswith(pat)
return _na_map(f, arr, na, dtype=bool)