Skip to content

DOC: update the pandas.Series.str.strip docstring #20863

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 9 commits into from
Jul 7, 2018

Conversation

Cheukting
Copy link
Contributor

@Cheukting Cheukting commented Apr 29, 2018

  • closes #xxxx
  • tests added / passed
  • passes git diff upstream/master -u -- "*.py" | flake8 --diff
  • whatsnew entry

Copy link
Member

@WillAyd WillAyd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR. Can you post the results of the doc string validator?


Parameters
----------
to_strip : str
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to_strip : str, optional

Parameters
----------
to_strip : str
Specifying the set of characters to be removed.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's just simplify this and say "The set of characters to be removed". It matches the standard Python string method so no need to be too wordy


>>> s = pd.Series([' ant', 'bee ', ' cat '])

>>> s
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is simple enough that you don't need to print the Series as is


Examples
--------
Striping whitespaces for Series
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Striping" is a typo here and below, though these examples are simple enough that I don't think you need to explain them


Striping a set of characters for Index

>>> df = pd.DataFrame(index=['1.ant ','2._bee__','3. cat_'])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just create the index directly instead of creating a DataFrame

index = pd.Index(['1.ant ']) ...


>>> df = pd.DataFrame(index=['1.ant ','2._bee__','3. cat_'])

>>> df.index
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again simple enough you don't need to print

2 cat
dtype: object

Striping a set of characters for Index
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo here, but as mentioned above these are basic enough that I personally don't think you need to explain in detail

@pep8speaks
Copy link

pep8speaks commented Apr 30, 2018

Hello @Cheukting! Thanks for updating the PR.

Cheers ! There are no PEP8 issues in this Pull Request. 🍻

Comment last updated on July 07, 2018 at 20:05 Hours UTC

@codecov
Copy link

codecov bot commented Apr 30, 2018

Codecov Report

❗ No coverage uploaded for pull request base (master@dcbf8b5). Click here to learn what that means.
The diff coverage is n/a.

Impacted file tree graph

@@            Coverage Diff            @@
##             master   #20863   +/-   ##
=========================================
  Coverage          ?    91.8%           
=========================================
  Files             ?      153           
  Lines             ?    49411           
  Branches          ?        0           
=========================================
  Hits              ?    45361           
  Misses            ?     4050           
  Partials          ?        0
Flag Coverage Δ
#multiple 90.19% <ø> (?)
#single 41.91% <ø> (?)
Impacted Files Coverage Δ
pandas/core/strings.py 98.34% <ø> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update dcbf8b5...0d6a4cc. Read the comment docs.

@Cheukting
Copy link
Contributor Author

###############################################################################

################################## Validation #################################

###############################################################################

Docstring for "pandas.Series.str.strip" correct. :)

Copy link
Member

@datapythonista datapythonista left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added some comments.

Also, #20628 was working on this too. I think your changes look better, but you can take a look at that PR and see if there is any idea that you can copy here.


See Also
--------
str.slice : Slice substrings from each element in the Series/Index
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

slice doesn't seem very related to strip.

Not sure if there is anything else related, but we should have here Series.str.strip, Series.str.lstrip and Series.str.rstrip, so they link each other.

to_strip : str, optional
Specifying the set of characters to be removed.
All combinations of this set of characters will be stripped.
Default value is None, which means whaitspaces will be removed.

Returns
-------
stripped : Series/Index of objects
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you get rid of the name stripped, just leave the type in that line, and in the next add a description of what is being returned?


Parameters
----------
to_strip : str, optional
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is not optional, the default is Ǹone which means whitespaces are stripped You can add to_strip : str or None, default None (whitespaces are removed). And take it out from the description.

Stripping whitespaces for Series

>>> s = pd.Series([' ant', 'bee ', ' cat '])

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this blank line is unnecessary.

Index(['1.ant ', '2._bee__', '3. cat_'], dtype='object')

>>> idx.str.strip('123._ ')
Index(['ant', 'bee', 'cat'], dtype='object')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we use a single example, with a single Series with multiple values that illustrate:

  • The method without parameters s.str.strip() and how it strips not only spaces but new lines and tabs
  • The characters to strip in the left, the right and both sides
  • The use of the 3 methods being documented lstrip, rstrip, rstrip
  • The use of another character to strip (something that looks like a real-world example)
  • Stripping more than a single character (also something a bit more real-world if possible would be great.

Stripping a set of characters for Index

>>> idx = pd.Index(['1.ant ','2._bee__','3. cat_'])
>>> s.str.strip('123.!? \n\t')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Personally I find this example overcomplicated. I think something more realistic and simple could be:

  • Illustrate the default to_strip parameter (whitespaces) with rstrip in a value like ` text \n'
  • lstrip(to_strip='0') for a value like 0010010)
  • And then you can show strip(to_strip=' \n0') with both values

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still wants to keep the 'Animal' theme in the example but I will simplify the lstrip and rstrip example

@alsobay
Copy link

alsobay commented May 1, 2018

Hi @Cheukting, great work! As @datapythonista pointed out in #20897, it seems that our wires crossed during the sprint and we ended up working on the same thing.

Seeing as this PR got in earlier, please feel free to grab anything from our PR that might help address Marc's notes, and we'll close ours. If you don't plan to refine this PR, please let us know and we'll continue on #20897.

@Cheukting
Copy link
Contributor Author

Cheukting commented May 1, 2018

Thanks @alsobay and @jalammar! I really like your PR and I think I may borrow some ideas and put it here. Feel free to keep commenting and reviewing this PR as it is our collaboration effort :-)

@jreback jreback added Docs Strings String extension data type and string data labels May 4, 2018

Parameters
----------
to_strip : str or None, default None (whitespaces are removed)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the coment about whitespaces are removed should be on the next line with a case, e.g.
If None then whitespaces are removed


Examples
--------
>>> s = pd.Series(['1. Ant. ', '2. Bee!\n', '3. Cat?\t'])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add a np.nan in the example (in place of an entry or in addition is ok)

@mroeschke mroeschke added this to the 0.24.0 milestone Jul 7, 2018
@mroeschke
Copy link
Member

Looks great! Will merge on green.

@jreback
Copy link
Contributor

jreback commented Jul 7, 2018

@mroeschke merge away

@mroeschke
Copy link
Member

Thanks @Cheukting!

@mroeschke mroeschke merged commit 0add019 into pandas-dev:master Jul 7, 2018
Sup3rGeo pushed a commit to Sup3rGeo/pandas that referenced this pull request Oct 1, 2018
* DOC: update the pandas.Series.str.strip docstring

* DOC: update the pandas.Series.str.strip docstring

* typo fixing and improve Index example

* Fix PEP8

* Improve parameter explaination

* Improve documentation, adding lstrip and rstrip example

* simplify lstrip and rstrip

* added np.nan example
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Docs Strings String extension data type and string data
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants