-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
DOC: update the pandas.Series.str.startswith docstring #20458
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DOC: update the pandas.Series.str.startswith docstring #20458
Conversation
pandas/core/strings.py
Outdated
3 not_a_string | ||
dtype: object | ||
|
||
See Also |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you move the See Also before the Examples.
pandas/core/strings.py
Outdated
|
||
See Also | ||
-------- | ||
endswith : same as startswith, but tests the end of string |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add str.startswith
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I'm not wrong, I think your endswith
is linked by Sphinx to pandas.endswith
which doesn't exist. I think the right way would be Series.str.endswith
. Similar examples link to the Python standard library version, str.startswith
(the one @TomAugspurger says).
If I'm not wrong, the descriptions in the See Also
section should start with a capital letter and finish with a period.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can't seem to get a link, have tried:
str_endswith
Series.str_endswith
Series.str.endswith
Series.string.endswith
endswith
Series.str.str_endswith
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh sorry we should document this better. Two things are going on
- Sphinx / numpydoc has some weird rules around name resolution. I can't find the docs right now, but I think it's same class, same module, elsewhere.
- The items will only become links if both API pages are built.
So I think either endswith
or Series.str.endswith
will work, but Series.str.endswith
may be a bit clearer. To validate that, you can build endswith first, then startswith.
pandas/core/strings.py
Outdated
|
||
Parameters | ||
---------- | ||
pat : string | ||
Character sequence | ||
na : bool, default NaN | ||
Character sequence. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can pat
be a regex? I don't think it can. Best to state that explicitly.
Codecov Report
@@ Coverage Diff @@
## master #20458 +/- ##
==========================================
+ Coverage 91.8% 91.85% +0.04%
==========================================
Files 152 152
Lines 49223 49231 +8
==========================================
+ Hits 45191 45220 +29
+ Misses 4032 4011 -21
Continue to review full report at Codecov.
|
pandas/core/strings.py
Outdated
Test if the start of each string element matches a pattern. | ||
|
||
Return a Series of booleans indicating whether the given pattern matches | ||
the start of each string element. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you can use the description of the Return
section for this comment.
pandas/core/strings.py
Outdated
|
||
Return a Series of booleans indicating whether the given pattern matches | ||
the start of each string element. | ||
Equivalent to :meth: `str.startswith`. | ||
|
||
Parameters | ||
---------- | ||
pat : string |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We use str
instead of string
. I know it was like this, but can you please update it?
pandas/core/strings.py
Outdated
na : bool, default NaN | ||
Character sequence. | ||
na : object, default NaN | ||
Character sequence shown if element tested is not a string. | ||
|
||
Returns | ||
------- | ||
startswith : Series/array of boolean values |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would better follow the convention used in other docstrings Series or array-like of bool
pandas/core/strings.py
Outdated
1 True | ||
2 False | ||
3 not_a_string | ||
dtype: object |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would reuse the same Series
with the NaN
for both examples, with the na
by default, and with a value. I think the example would be a bit more realistic with na=False
.
Also, I think it could help some users with one of the animals starts with a capital B
, which is not matched, so they see that this is case-senstitive.
pandas/core/strings.py
Outdated
|
||
See Also | ||
-------- | ||
endswith : same as startswith, but tests the end of string |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I'm not wrong, I think your endswith
is linked by Sphinx to pandas.endswith
which doesn't exist. I think the right way would be Series.str.endswith
. Similar examples link to the Python standard library version, str.startswith
(the one @TomAugspurger says).
If I'm not wrong, the descriptions in the See Also
section should start with a capital letter and finish with a period.
@TomAugspurger @datapythonista |
pandas/core/strings.py
Outdated
See Also | ||
-------- | ||
str_endswith : Same as startswith, but tests the end of string. | ||
str.startswith : Python standard library string method. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think there is some bug preventing the links to be generated when you build the html with the --single
option. If you build the whole documenation doc/make.py html
, it will take something like 5 minutes to complete, but I think you should have the links working.
Replacing str_endswith
with Series.str.endswith
is the right way.
Also, I'd personally add Series.str.contains
, which also looks for the pattern, but in any position. So, I think it could be useful for some users visiting the page to know about it too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yep thanks.
The entire build one shows the links properly.
Added contains
pandas/core/strings.py
Outdated
|
||
Returns | ||
------- | ||
startswith : Series/array of boolean values | ||
startswith : Series or array-like of bool |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was checking the other methods of the .str.
accessor, and seems like we're a bit inconsistent with the type. In some cases we use Series/array
or Series or array-like
, and in some others Series or Index of whatever
. I think the latter is more specific, as I don't think currently this method can return a numpy array or a Python list.
We can change it at a later point, as some unifying would be nice. But if you want to change it now to Series or Index of bool
we'd have this right already.
Also, the startswith :
here is something that we don't want to use anymore, unless the function/method returns more than one value. So you can leave just the type.
1 False | ||
2 False | ||
3 False | ||
dtype: bool |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think in more cases we show >>> s
after defining it, and we leave a blank line between the different examples, so they are in different boxes in the html. Also, for the last example a short explanation of what you are doing could be useful for users.
You can see an example of what I mean here: https://github.com/dcreekp/pandas/blob/39f76413374109d8c34021a6b61d121d3d05c9a0/pandas/core/strings.py#L1347 probably we don't need many explanations in this case, as the example is quite obvious, but a short sentence for the last case could help users see what's going on faster.
@datapythonista changed as per suggestions:) |
pandas/core/strings.py
Outdated
Examples | ||
-------- | ||
>>> s = pd.Series(['bat', 'Bear', 'cat', np.nan]) | ||
>>> s.str.startswith('b') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I think I wasn't clear in part of my last comment. The explanation you added looks great, that part is perfect.
But what I meant with adding >>> s
is:
>>> s = pd.Series(['bat', 'Bear', 'cat', np.nan])
>>> s
(the user can see the series here)
So, you'd have a first block (ended with a blank line to make it a different box), where the user can see the data you'll be using in both examples.
Then you show the basic usage (line 357 currently), then the explanation of the second example you added. And for the second example you don't need to create the series again, as you had before.
So it'd be simply adding >>> s
, its output and a blank linne after L356. And removing L366. That would be the standard way use in most examples, and IMO the clearest.
Sorry my previous comment was confusing.
@datapythonista ok please check |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks perfect to me, really good job. Thanks!
Thanks! |
Cool. Thanks for all the feedback! |
Checklist for the pandas documentation sprint (ignore this if you are doing
an unrelated PR):
scripts/validate_docstrings.py <your-function-or-method>
git diff upstream/master -u -- "*.py" | flake8 --diff
python doc/make.py --single <your-function-or-method>
Please include the output of the validation script below between the "```" ticks:
If the validation script still gives errors, but you think there is a good reason
to deviate in this case (and there are certainly such cases), please state this
explicitly.