DOC: Improved the docstring of str.extract() (Delhi) #20141

arjunsharma97 · 2018-03-10T14:02:51Z

Checklist for the pandas documentation sprint (ignore this if you are doing
an unrelated PR):

PR title is "DOC: update the docstring"
The validation script passes: scripts/validate_docstrings.py <your-function-or-method>
The PEP8 style check passes: git diff upstream/master -u -- "*.py" | flake8 --diff
The html version looks good: python doc/make.py --single <your-function-or-method>
It has been proofread on language by another sprint participant

Please include the output of the validation script below between the "```" ticks:

################################################################################
#################### Docstring (pandas.Series.str.extract)  ####################
################################################################################

Return the match object corresponding to regex `pat`.

For each subject string in the Series, extract groups from the
first match of regular expression `pat`.

Parameters
----------
pat : string
    Regular expression pattern with capturing groups.
flags : int, default 0 (no flags)
    Re module flags, e.g. re.IGNORECASE.
expand : bool, default True
    If True, return DataFrame, else return Series/Index/DataFrame.

    .. versionadded:: 0.18.0.

Returns
-------
DataFrame with one row for each subject string, and one column for
each group. Any capture group names in regular expression pat will
be used for column names; otherwise capture group numbers will be
used. The dtype of each result column is always object, even when
no match is found. If expand=False and pat has only one capture group,
then return a Series (if subject is a Series) or Index (if subject
is an Index).

See Also
--------
extractall : returns all matches (not just the first match)

Examples
--------
A pattern with two groups will return a DataFrame with two columns.
Non-matches will be NaN.

>>> s = pd.Series(['a1', 'b2', 'c3'])
>>> s.str.extract(r'([ab])(\d)')
     0    1
0    a    1
1    b    2
2  NaN  NaN

A pattern may contain optional groups.

>>> s.str.extract(r'([ab])?(\d)')
     0  1
0    a  1
1    b  2
2  NaN  3

Named groups will become column names in the result.

>>> s.str.extract(r'(?P<letter>[ab])(?P<digit>\d)')
  letter digit
0      a     1
1      b     2
2    NaN   NaN

A pattern with one group will return a DataFrame with one column
if expand=True.

>>> s.str.extract(r'[ab](\d)', expand=True)
     0
0    1
1    2
2  NaN

A pattern with one group will return a Series if expand=False.

>>> s.str.extract(r'[ab](\d)', expand=False)
0      1
1      2
2    NaN
dtype: object

################################################################################
################################## Validation ##################################
################################################################################

Docstring for "pandas.Series.str.extract" correct. :)

If the validation script still gives errors, but you think there is a good reason
to deviate in this case (and there are certainly such cases), please state this
explicitly.

rafarui · 2018-03-10T14:27:42Z

pandas/core/strings.py

@@ -633,21 +633,21 @@ def _str_extract_frame(arr, pat, flags=0):

 def str_extract(arr, pat, flags=0, expand=True):
    r"""
+    Return the match object corresponding to regex `pat`.


Is this r before the doc string right?


jorisvandenbossche

Thanks for the PR! Added some comments

jorisvandenbossche · 2018-03-12T22:50:08Z

pandas/core/strings.py

@@ -633,21 +633,21 @@ def _str_extract_frame(arr, pat, flags=0):

 def str_extract(arr, pat, flags=0, expand=True):
    r"""
+    Return the match object corresponding to regex `pat`.


Yes, it ensures that the backslashes (\) used in the examples render correctly.
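A minimal sketch of why the `r` prefix matters, using two hypothetical functions (`raw_doc` and `plain_doc` are illustrative names, not pandas code). It uses `\b` because that escape is silently turned into a backspace character in a non-raw string.

```python
def raw_doc():
    r"""Match a whole word, e.g. re.search(r'\bcat\b', text)."""


def plain_doc():
    """Match a whole word, e.g. re.search(r'\bcat\b', text)."""


# In the raw docstring the backslash survives as two characters: '\' and 'b'.
raw_has_backslash = "\\b" in raw_doc.__doc__

# In the plain docstring Python has already replaced '\b' with a backspace (0x08),
# so the rendered documentation would no longer show the regex as written.
plain_has_backspace = "\x08" in plain_doc.__doc__

print(raw_has_backslash, plain_has_backspace)
```

Doctest examples that contain regex escapes such as `\d` are affected in the same way, which is presumably why the docstring here is raw.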

jorisvandenbossche · 2018-03-12T22:51:14Z

pandas/core/strings.py

    flags : int, default 0 (no flags)
-        re module flags, e.g. re.IGNORECASE
-
+        Re module flags, e.g. re.IGNORECASE.


In this case it is fine not to capitalize the first word, as it refers to a Python module. Could you instead quote it like ``re``? (And the same for ``re.IGNORECASE``.)

jorisvandenbossche · 2018-03-12T22:52:16Z

pandas/core/strings.py

-    Equivalent to applying :func:`re.findall` to all the elements in the
-    Series/Index.
+    Find all occurrences of pattern or regular expression in the
+    Series/Index. Equivalent to :func:`re.findall`.


The summary line should only be one line. Can you try to condense it or split it up?

jorisvandenbossche · 2018-03-12T22:52:27Z

pandas/core/strings.py

-        no flags).
+        Pattern or regular expression
+    flags : int, default 0 (no flags)
+        re module flags, e.g. re.IGNORECASE


similar comment here as above

jorisvandenbossche · 2018-03-12T22:52:59Z

pandas/core/strings.py

-
-    The search for the pattern 'Monkey' returns one match:
-
-    >>> s.str.findall('Monkey')


Is there a reason you removed those examples?

codecov · 2018-03-15T16:45:10Z

Codecov Report

❗ No coverage uploaded for pull request base (master@32ee973).
The diff coverage is n/a.

@@            Coverage Diff            @@
##             master   #20141   +/-   ##
=========================================
  Coverage          ?   91.72%           
=========================================
  Files             ?      150           
  Lines             ?    49152           
  Branches          ?        0           
=========================================
  Hits              ?    45086           
  Misses            ?     4066           
  Partials          ?        0

Flag	Coverage Δ
#multiple	`90.11% <ø> (?)`
#single	`41.84% <ø> (?)`

Impacted Files	Coverage Δ
pandas/core/strings.py	`98.32% <ø> (ø)`

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 32ee973...ec8bd44.

TomAugspurger · 2018-03-16T13:00:29Z

pandas/core/strings.py

@@ -633,21 +633,21 @@ def _str_extract_frame(arr, pat, flags=0):

 def str_extract(arr, pat, flags=0, expand=True):
    r"""
+    Return the match object corresponding to regex `pat`.


This description isn't quite right, is it? We aren't returning a Match object.

Maybe

Extract capture groups in the regex `pat` as columns in a DataFrame.
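To illustrate the distinction raised here, the following standard-library sketch contrasts a `re.Match` object with the capture groups taken from it. The helper `first_match_groups` is hypothetical and only approximates, as an assumption, what each row of the `str.extract` result corresponds to; pandas shows missing groups as NaN rather than None.

```python
import re

pat = re.compile(r"([ab])(\d)")
subjects = ["a1", "b2", "c3"]


def first_match_groups(regex, text):
    """Return the capture groups of the first match, or one None per group."""
    match = regex.search(text)
    if match is None:
        # No match at all: every group is missing.
        return (None,) * regex.groups
    return match.groups()


# re.search returns a Match object (or None), not the extracted text.
match_obj = pat.search("a1")
print(type(match_obj).__name__)

# One tuple of groups per subject string, analogous to one row per string.
rows = [first_match_groups(pat, s) for s in subjects]
print(rows)
```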

TomAugspurger · 2018-03-16T13:01:11Z

pandas/core/strings.py

    flags : int, default 0 (no flags)
-        re module flags, e.g. re.IGNORECASE
-
+        ``re`` module flags, e.g. ``re.IGNORECASE``.


Add a link to https://docs.python.org/3/library/re.html#contents-of-module-re ?

I think this can be done by using :mod:`re`
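For context on what the `flags` parameter does, here is a small sketch that passes a flag from the `re` module to `str.extract`. The sample Series is made up for illustration.

```python
import re

import pandas as pd

s = pd.Series(["A1", "b2"])

# Without flags the uppercase 'A' does not match the character class [ab].
default = s.str.extract(r"([ab])(\d)")

# re.IGNORECASE makes the same pattern match 'A1' as well.
ignore_case = s.str.extract(r"([ab])(\d)", flags=re.IGNORECASE)

print(default)
print(ignore_case)
```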

TomAugspurger · 2018-03-16T13:01:59Z

pandas/core/strings.py

    expand : bool, default True
-        * If True, return DataFrame.
-        * If False, return Series/Index/DataFrame.
+        If True, return DataFrame, else return Series/Index/DataFrame.


When is a DataFrame returned if expand=False?

It still returns a DataFrame when the pattern has multiple groups; it gives a Series only when there is exactly one group. It would probably be good to explain this a bit more.

(I find this behaviour a bit strange, but that is how it currently works. In other cases we return a Series of lists, I think.)
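The behaviour discussed in this thread can be checked with a short sketch that reuses the Series from the docstring examples. The variable names are illustrative only.

```python
import pandas as pd

s = pd.Series(["a1", "b2", "c3"])

# One capture group, expand=True: a DataFrame with a single column.
one_group_frame = s.str.extract(r"[ab](\d)", expand=True)

# One capture group, expand=False: a Series.
one_group_series = s.str.extract(r"[ab](\d)", expand=False)

# Two capture groups, expand=False: still a DataFrame, one column per group.
two_groups = s.str.extract(r"([ab])(\d)", expand=False)

for result in (one_group_frame, one_group_series, two_groups):
    print(type(result).__name__, result.shape)
```

So `expand=False` only changes the return type when the pattern has exactly one capture group.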

jreback · 2018-07-07T16:26:22Z

thanks @arjunsharma97 and @mroeschke for fixups!

arjunsharma97 added 3 commits March 10, 2018 19:06

Modified docstring of str.extract

effaede

Updated docstring of str.extract

c9f4ac8

Updated docstring of str.extract

3f7aadf

rafarui reviewed Mar 10, 2018

jorisvandenbossche added the Docs label Mar 12, 2018

jorisvandenbossche reviewed Mar 12, 2018

arjunsharma97 added 2 commits March 15, 2018 21:09

Merge remote-tracking branch 'upstream/master' into strextract

b0ead62

Minor changes to str.extract

3900c1a

TomAugspurger reviewed Mar 16, 2018

Matt Roeschke added 2 commits July 7, 2018 10:07

Merge remote-tracking branch 'upstream/master' into strextract

2977c41

small cleanup

ec8bd44

jreback added this to the 0.24.0 milestone Jul 7, 2018

jreback merged commit b12b7ae into pandas-dev:master Jul 7, 2018

Sup3rGeo pushed a commit to Sup3rGeo/pandas that referenced this pull request Oct 1, 2018

DOC: Improved the docstring of str.extract() (Delhi) (pandas-dev#20141)

3fb695f
