DOC: update the pandas.Series.str.split docstring #20307

mananpal1997 · 2018-03-12T16:34:14Z

Following up from discussion on #20282

PR title is "DOC: update the pandas.Series.str.split docstring"
The validation script passes: scripts/validate_docstrings.py pandas.Series.str.split docstring
The PEP8 style check passes: git diff upstream/master -u -- "*.py" | flake8 --diff
The html version looks good: python doc/make.py --single pandas.Series.str.split docstring

################################################################################
##################### Docstring (pandas.Series.str.split)  #####################
################################################################################

Split strings around given separator/delimiter.

Split each string in the caller's values by given
pattern, propagating NaN values. Equivalent to :meth:`str.split`.

Parameters
----------
pat : str, optional
    String or regular expression to split on.
    If not specified, split on whitespace.
n : int, default -1 (all)
    Limit number of splits in output.
    ``None``, 0 and -1 will be interpreted as return all splits.
expand : bool, default False
    Expand the split strings into separate columns.

    * If ``True``, return DataFrame/MultiIndex expanding dimensionality.
    * If ``False``, return Series/Index, containing lists of strings.

Returns
-------
split : Series/Index or DataFrame/MultiIndex of objects
    Type matches caller unless ``expand=True`` (return type is DataFrame or
MultiIndex)

Notes
-----
The handling of the `n` keyword depends on the number of found splits:

- If found splits > `n`,  make first `n` splits only
- If found splits <= `n`, make all splits
- If for a certain row the number of found splits < `n`,
  append `None` for padding up to `n` if ``expand=True``

Examples
--------
>>> s = pd.Series(["this is good text", "but this is even better"])

By default, split will return an object of the same size
having lists containing the split elements.

>>> s.str.split()
0           [this, is, good, text]
1    [but, this, is, even, better]
dtype: object
>>> s.str.split("random")
0          [this is good text]
1    [but this is even better]
dtype: object

When using ``expand=True``, the split elements will expand out into
separate columns.

For Series object, output return type is DataFrame.

>>> s.str.split(expand=True)
      0     1     2     3       4
0  this    is  good  text    None
1   but  this    is  even  better
>>> s.str.split(" is ", expand=True)
          0            1
0      this    good text
1  but this  even better

For Index object, output return type is MultiIndex.

>>> i = pd.Index(["ba 100 001", "ba 101 002", "ba 102 003"])
>>> i.str.split(expand=True)
MultiIndex(levels=[['ba'], ['100', '101', '102'], ['001', '002', '003']],
       labels=[[0, 0, 0], [0, 1, 2], [0, 1, 2]])

Parameter `n` can be used to limit the number of splits in the output.

>>> s.str.split("is", n=1)
0          [th,  is good text]
1    [but th,  is even better]
dtype: object
>>> s.str.split("is", n=1, expand=True)
        0                1
0      th     is good text
1  but th   is even better

If NaN is present, it is propagated throughout the columns
during the split.

>>> s = pd.Series(["this is good text", "but this is even better", np.nan])
>>> s.str.split(n=3, expand=True)
      0     1     2            3
0  this    is  good         text
1   but  this    is  even better
2   NaN   NaN   NaN          NaN

################################################################################
################################## Validation ##################################
################################################################################

Errors found:
	See Also section not found

@jorisvandenbossche @WillAyd
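
For readers following the Notes on `n`, here is a minimal plain-Python sketch of the documented behaviour. It relies only on the docstring's statement that the method is equivalent to `str.split`; the helper names `split_limited` and `expand_rows` are hypothetical, `pat` is treated as a literal string rather than a regular expression, and padding to the widest row is an assumption made to mirror the `expand=True` examples. It is not the pandas implementation.

```python
def split_limited(text, pat=None, n=-1):
    """Hypothetical helper: model the documented `n` handling with str.split."""
    # The docstring says None, 0 and -1 all mean "return all splits";
    # Python's str.split expresses that as maxsplit=-1.
    maxsplit = -1 if n is None or n <= 0 else n
    return text.split(pat, maxsplit)


def expand_rows(rows):
    """Hypothetical helper: pad each row with None so all rows share one width."""
    width = max(len(row) for row in rows)
    return [row + [None] * (width - len(row)) for row in rows]


texts = ["this is good text", "but this is even better"]

# At most three splits per string, so at most four pieces per row.
print([split_limited(t, n=3) for t in texts])

# All splits, then padded the way the expand=True example shows.
print(expand_rows([split_limited(t) for t in texts]))
```

With `n=3` the second string keeps `"even better"` as a single piece, which matches the last example in the docstring above.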

jorisvandenbossche · 2018-03-12T16:49:56Z

@mananpal1997 Thanks!

Additional comment: on line 1121, the indentation should be the same as the line above, can you fix that as well? (cannot comment on that line :-))

@WillAyd did you have a proposal to write the Returns section more clearly?

WillAyd · 2018-03-12T17:46:11Z

Nice job @mananpal1997. I suggest the Return type line just say Series, DataFrame, Index or MultiIndex and the description for the return should say Type matches caller unless ``expand=True`` (see Notes). Then in the notes say something like if using ``expand=True``, Series and Index callers will return DataFrame and MultiIndex objects, respectively.
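
A short sketch of the return types being described, assuming a reasonably recent pandas is installed. The sample strings are made up for illustration; each row splits into the same number of pieces so the `Index` case expands cleanly.

```python
import pandas as pd

# Illustrative data only.
s = pd.Series(["a b", "c d"])
idx = pd.Index(["x 1", "y 2"])

# Without expand, the result type matches the caller.
print(type(s.str.split()).__name__)               # Series
print(type(idx.str.split()).__name__)             # Index

# With expand=True, dimensionality is expanded.
print(type(s.str.split(expand=True)).__name__)    # DataFrame
print(type(idx.str.split(expand=True)).__name__)  # MultiIndex
```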

WillAyd · 2018-03-12T18:01:24Z

pandas/core/strings.py

-    split : Series/Index or DataFrame/MultiIndex of objects
-        Type matches caller unless ``expand=True`` (return type is DataFrame or
-        MultiIndex)
+    split : Series, Index, DataFrame or MultiIndex of objects


Since you are only returning one item you don't need to list the variable name, so get rid of "split : ". Also chop " of objects" off the end
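
One way the two suggestions could look when combined, shown on a hypothetical stub rather than the real method in pandas/core/strings.py. The wording is taken from the review comments in this thread; the stub, its signature, and the exact layout are assumptions for illustration.

```python
def split(pat=None, n=-1, expand=False):
    """
    Hypothetical stub used only to show the Returns layout.

    Returns
    -------
    Series, Index, DataFrame or MultiIndex
        Type matches caller unless ``expand=True`` (see Notes).

    Notes
    -----
    If using ``expand=True``, Series and Index callers return DataFrame and
    MultiIndex objects, respectively.
    """


# The Returns entry lists only the types: no "split : " name, no " of objects".
print(split.__doc__)
```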

mananpal1997 · 2018-03-12

@WillAyd I was trying to resolve the coverage issue and I think I messed up the PR. Does everything look good to you? 😅

codecov · 2018-03-12T18:02:04Z

Codecov Report

Merging #20307 into master will not change coverage.
The diff coverage is n/a.

@@           Coverage Diff           @@
##           master   #20307   +/-   ##
=======================================
  Coverage    91.7%    91.7%           
=======================================
  Files         150      150           
  Lines       49165    49165           
=======================================
  Hits        45087    45087           
  Misses       4078     4078

Flag	Coverage Δ
#multiple	`90.09% <ø> (ø)`	⬆️
#single	`41.86% <ø> (ø)`	⬆️

Impacted Files	Coverage Δ
pandas/core/strings.py	`98.32% <ø> (ø)`	⬆️
pandas/core/series.py	`93.84% <0%> (ø)`	⬆️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 7169830...27548e8.

TomAugspurger · 2018-03-13T11:57:57Z

Added a see also. Thanks @mananpal1997.

jorisvandenbossche added the Docs label Mar 12, 2018

mananpal1997 force-pushed the docstring_pandas.Series.str.split branch from 9184934 to 52fe248 Compare March 12, 2018 16:52

WillAyd reviewed Mar 12, 2018

mananpal1997 force-pushed the docstring_pandas.Series.str.split branch from b57f570 to c9bf3e8 Compare March 12, 2018 18:01

mananpal1997 force-pushed the docstring_pandas.Series.str.split branch from c9bf3e8 to 1a2efb0 Compare March 12, 2018 18:04

mananpal1997 closed this Mar 12, 2018

mananpal1997 force-pushed the docstring_pandas.Series.str.split branch from 1a2efb0 to 7169830 Compare March 12, 2018 18:18

docstring update: added more examples for pandas.Series.str.split() m…

2af1954

…ethod

mananpal1997 reopened this Mar 12, 2018

Added See Also

27548e8

TomAugspurger merged commit edaa112 into pandas-dev:master Mar 13, 2018
