DOC: update the pandas.Series.str.startswith docstring #20458

Merged
merged 5 commits into from
Mar 25, 2018

Contributor

@dcreekp dcreekp commented Mar 22, 2018

Checklist for the pandas documentation sprint (ignore this if you are doing
an unrelated PR):

  • PR title is "DOC: update the docstring"
  • The validation script passes: scripts/validate_docstrings.py <your-function-or-method>
  • The PEP8 style check passes: git diff upstream/master -u -- "*.py" | flake8 --diff
  • The html version looks good: python doc/make.py --single <your-function-or-method>
  • It has been proofread on language by another sprint participant

Please include the output of the validation script below between the "```" ticks:

################################################################################
################### Docstring (pandas.Series.str.startswith) ###################
################################################################################

Test if the start of each string element matches a pattern.

Return a Series of booleans indicating whether the given pattern matches
the start of each string element.
Equivalent to :meth: `str.startswith`.

Parameters
----------
pat : string
    Character sequence.
na : object, default NaN
    Character sequence shown if element tested is not a string.

Returns
-------
startswith : Series/array of boolean values

Examples
--------
>>> s = pd.Series(['bat', 'bear', 'cat'])
>>> s.str.startswith('b')
0     True
1     True
2    False
dtype: bool
>>> s = pd.Series(['bat', 'bear', 'cat', np.nan])
>>> s.str.startswith('b', na='not_a_string')
0            True
1            True
2           False
3    not_a_string
dtype: object

See Also
--------
endswith : same as startswith, but tests the end of string

################################################################################
################################## Validation ##################################
################################################################################

If the validation script still gives errors, but you think there is a good reason
to deviate in this case (and there are certainly such cases), please state this
explicitly.

3 not_a_string
dtype: object

See Also
Contributor

Can you move the See Also section before the Examples?


See Also
--------
endswith : same as startswith, but tests the end of string
Contributor

Add str.startswith.

Member

If I'm not wrong, I think your endswith is linked by Sphinx to pandas.endswith which doesn't exist. I think the right way would be Series.str.endswith. Similar examples link to the Python standard library version, str.startswith (the one @TomAugspurger says).

If I'm not wrong, the descriptions in the See Also section should start with a capital letter and finish with a period.

Contributor Author

I can't seem to get a link, have tried:
str_endswith
Series.str_endswith
Series.str.endswith
Series.string.endswith
endswith
Series.str.str_endswith

Contributor

Oh sorry, we should document this better. Two things are going on:

  1. Sphinx / numpydoc has some weird rules around name resolution. I can't find the docs right now, but I think the lookup order is: same class, then same module, then elsewhere.
  2. The items will only become links if both API pages are built.

So I think either endswith or Series.str.endswith will work, but Series.str.endswith may be a bit clearer. To validate that, you can build endswith first, then startswith.


Parameters
----------
pat : string
Character sequence
na : bool, default NaN
Character sequence.
Contributor

Can pat be a regex? I don't think it can. Best to state that explicitly.
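
A minimal sketch of the point above, assuming a recent pandas release: `startswith` compares `pat` as a literal character sequence, while `Series.str.match` treats its argument as a regular expression anchored at the start. The sample data is hypothetical and not taken from the PR.

```python
import pandas as pd

# Hypothetical sample data, chosen so that '.' appears as a literal character.
s = pd.Series(['.bat', 'bat', 'cat'])

# startswith compares the pattern literally: only the element that really
# begins with a dot matches.
literal = s.str.startswith('.')

# str.match anchors a regular expression at the start of each element,
# so '.' matches any first character.
as_regex = s.str.match(r'.')

print(literal.tolist())
print(as_regex.tolist())
```

If regex behaviour is wanted, `Series.str.match` is one option; `startswith` itself stays literal.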

codecov bot commented Mar 23, 2018

Codecov Report

Merging #20458 into master will increase coverage by 0.04%.
The diff coverage is n/a.

@@            Coverage Diff             @@
##           master   #20458      +/-   ##
==========================================
+ Coverage    91.8%   91.85%   +0.04%     
==========================================
  Files         152      152              
  Lines       49223    49231       +8     
==========================================
+ Hits        45191    45220      +29     
+ Misses       4032     4011      -21
Flag         Coverage Δ
#multiple    90.23% <ø> (+0.04%) ⬆️
#single      41.83% <ø> (-0.01%) ⬇️

Impacted Files                       Coverage Δ
pandas/core/strings.py               98.32% <ø> (ø) ⬆️
pandas/core/arrays/categorical.py    96.2% <0%> (-0.02%) ⬇️
pandas/core/generic.py               95.85% <0%> (ø) ⬆️
pandas/core/frame.py                 97.18% <0%> (ø) ⬆️
pandas/plotting/_core.py             82.5% <0%> (ø) ⬆️
pandas/io/formats/csvs.py            98.13% <0%> (+0.08%) ⬆️
pandas/io/parsers.py                 95.45% <0%> (+0.12%) ⬆️
pandas/core/groupby.py               92.55% <0%> (+0.41%) ⬆️
pandas/util/testing.py               84.73% <0%> (+0.61%) ⬆️
pandas/io/common.py                  70.04% <0%> (+1.26%) ⬆️
... and 2 more

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 02477da...9a6d018.

Test if the start of each string element matches a pattern.

Return a Series of booleans indicating whether the given pattern matches
the start of each string element.
Member

I think you can use the description of the Return section for this comment.


Return a Series of booleans indicating whether the given pattern matches
the start of each string element.
Equivalent to :meth: `str.startswith`.

Parameters
----------
pat : string
Member

We use str instead of string. I know it was like this, but can you please update it?

- na : bool, default NaN
-     Character sequence.
+ na : object, default NaN
+     Character sequence shown if element tested is not a string.

Returns
-------
startswith : Series/array of boolean values
Member

I think it would better follow the convention used in other docstrings: Series or array-like of bool.

1 True
2 False
3 not_a_string
dtype: object
Member

I would reuse the same Series with the NaN for both examples: first with the default na, then with an explicit value. I think the example would be a bit more realistic with na=False.

Also, I think it could help some users if one of the animals starts with a capital B, which is not matched, so they see that this is case-sensitive.
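
A rough sketch of what this suggestion amounts to, assuming a recent pandas release. The Series is the one the docstring ends up using; the variable names are illustrative only.

```python
import numpy as np
import pandas as pd

# One Series reused for both calls; 'Bear' starts with a capital B.
s = pd.Series(['bat', 'Bear', 'cat', np.nan])

# Default na: the missing element is not reported as a match, and 'Bear'
# does not match because the comparison is case-sensitive.
default_na = s.str.startswith('b')

# na=False: the missing element is explicitly reported as False,
# which gives a plain boolean result.
filled = s.str.startswith('b', na=False)

print(default_na)
print(filled)
```

How the missing element is displayed with the default `na` can depend on the pandas version and the string dtype in use, so the sketch does not pin that output down.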


Contributor Author

dcreekp commented Mar 24, 2018

@TomAugspurger @datapythonista
incorporated the feedback, I left a comment on the one I couldn't work out

See Also
--------
str_endswith : Same as startswith, but tests the end of string.
str.startswith : Python standard library string method.
Member

I think there is some bug preventing the links from being generated when you build the html with the --single option. If you build the whole documentation with doc/make.py html, it will take something like 5 minutes to complete, but I think you should have the links working.

Replacing str_endswith with Series.str.endswith is the right way.

Also, I'd personally add Series.str.contains, which also looks for the pattern, but in any position. So, I think it could be useful for some users visiting the page to know about it too.
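
To illustrate why the three methods belong together in See Also, here is a small sketch assuming a recent pandas release. The sample strings are hypothetical, and `regex=False` is passed to `contains` so that all three calls do literal matching.

```python
import pandas as pd

# Hypothetical sample data with 'b' at the start, at the end, and in the middle.
s = pd.Series(['bat', 'tab', 'abc'])

starts = s.str.startswith('b')             # pattern at the start only
ends = s.str.endswith('b')                 # pattern at the end only
anywhere = s.str.contains('b', regex=False)  # pattern in any position

print(starts.tolist())
print(ends.tolist())
print(anywhere.tolist())
```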

Contributor Author

yep thanks.
The entire build one shows the links properly.
Added contains


Returns
-------
- startswith : Series/array of boolean values
+ startswith : Series or array-like of bool
Member

I was checking the other methods of the .str accessor, and it seems like we're a bit inconsistent with the type. In some cases we use Series/array or Series or array-like, and in some others Series or Index of whatever. I think the latter is more specific, as I don't think currently this method can return a numpy array or a Python list.

We can change it at a later point, as some unifying would be nice. But if you want to change it now to Series or Index of bool we'd have this right already.

Also, the startswith : here is something that we don't want to use anymore, unless the function/method returns more than one value. So you can leave just the type.

1 False
2 False
3 False
dtype: bool
Member

I think in more cases we show >>> s after defining it, and we leave a blank line between the different examples, so they are in different boxes in the html. Also, for the last example a short explanation of what you are doing could be useful for users.

You can see an example of what I mean here: https://github.com/dcreekp/pandas/blob/39f76413374109d8c34021a6b61d121d3d05c9a0/pandas/core/strings.py#L1347 probably we don't need many explanations in this case, as the example is quite obvious, but a short sentence for the last case could help users see what's going on faster.

Contributor Author

dcreekp commented Mar 25, 2018

@datapythonista changed as per suggestions:)

Examples
--------
>>> s = pd.Series(['bat', 'Bear', 'cat', np.nan])
>>> s.str.startswith('b')
Member

Sorry, I think I wasn't clear in part of my last comment. The explanation you added looks great, that part is perfect.

But what I meant with adding >>> s is:

>>> s = pd.Series(['bat', 'Bear', 'cat', np.nan])
>>> s
(the user can see the series here)

So, you'd have a first block (ended with a blank line to make it a different box), where the user can see the data you'll be using in both examples.

Then you show the basic usage (line 357 currently), then the explanation of the second example you added. And for the second example you don't need to create the series again, as you had before.

So it'd be simply adding >>> s, its output and a blank line after L356. And removing L366. That would be the standard way used in most examples, and IMO the clearest.

Sorry my previous comment was confusing.

Contributor Author

dcreekp commented Mar 25, 2018

@datapythonista ok please check

Member

@datapythonista datapythonista left a comment

Looks perfect to me, really good job. Thanks!

@TomAugspurger TomAugspurger merged commit 0fb2eaa into pandas-dev:master Mar 25, 2018
@TomAugspurger
Contributor

Thanks!

Contributor Author

dcreekp commented Mar 26, 2018

Cool. Thanks for all the feedback!

javadnoorb pushed a commit to javadnoorb/pandas that referenced this pull request Mar 29, 2018
)

* DOC: update the pandas.Series.str.startswith docstring

* DOC: update the pandas.Series.str.startswith docstring

* DOC: update the pandas.Series.str.startswith docstring 2

* DOC: update the pandas.Series.str.startswith docstring 3

* DOC: update the pandas.Series.str.startswith docstring 4
dworvos pushed a commit to dworvos/pandas that referenced this pull request Apr 2, 2018
kornilova203 pushed a commit to kornilova203/pandas that referenced this pull request Apr 23, 2018