DOC: Improved the docstring of str.extract() (Delhi) #20141

Merged 7 commits on Jul 7, 2018
18 changes: 10 additions & 8 deletions pandas/core/strings.py
@@ -840,19 +840,22 @@ def _str_extract_frame(arr, pat, flags=0):

def str_extract(arr, pat, flags=0, expand=True):
r"""
+ Extract capture groups in the regex `pat` as columns in a DataFrame.
+
For each subject string in the Series, extract groups from the
- first match of regular expression pat.
+ first match of regular expression `pat`.

Parameters
----------
pat : string
- Regular expression pattern with capturing groups
+ Regular expression pattern with capturing groups.
flags : int, default 0 (no flags)
- re module flags, e.g. re.IGNORECASE
-
+ ``re`` module flags, e.g. ``re.IGNORECASE``.
[Review comment, Member] I think this can be done by using :mod:`re`

+ See :mod:`re`
expand : bool, default True
- * If True, return DataFrame.
- * If False, return Series/Index/DataFrame.
+ If True, return DataFrame with one column per capture group.
+ If False, return a Series/Index if there is one capture group
+ or DataFrame if there are multiple capture groups.

.. versionadded:: 0.18.0

@@ -875,7 +878,7 @@ def str_extract(arr, pat, flags=0, expand=True):
A pattern with two groups will return a DataFrame with two columns.
Non-matches will be NaN.

- >>> s = Series(['a1', 'b2', 'c3'])
+ >>> s = pd.Series(['a1', 'b2', 'c3'])
>>> s.str.extract(r'([ab])(\d)')
0 1
0 a 1
@@ -914,7 +917,6 @@ def str_extract(arr, pat, flags=0, expand=True):
1 2
2 NaN
dtype: object
-
"""
if not isinstance(expand, bool):
raise ValueError("expand must be True or False")
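
Not part of the diff: a minimal sketch of the `expand` and `flags` behaviour that the revised docstring describes, reusing the sample Series from the docstring example. The one-group pattern `r'[ab](\d)'` and the variable names are illustrative assumptions, not taken from the PR.

```python
import re

import pandas as pd

s = pd.Series(['a1', 'b2', 'c3'])

# Two capture groups: the result is a DataFrame with one column per group.
# 'c3' does not match, so its row holds missing values.
two_groups = s.str.extract(r'([ab])(\d)')

# One capture group with expand=True: still a DataFrame, with a single column.
one_group_frame = s.str.extract(r'[ab](\d)', expand=True)

# One capture group with expand=False: a Series instead of a DataFrame.
one_group_series = s.str.extract(r'[ab](\d)', expand=False)

# flags is handed to the re module; IGNORECASE lets '[AB]' match 'a' and 'b'.
ignore_case = s.str.extract(r'([AB])(\d)', flags=re.IGNORECASE)

print(type(two_groups).__name__, two_groups.shape)
print(type(one_group_frame).__name__, one_group_frame.shape)
print(type(one_group_series).__name__, one_group_series.shape)
```

With a single capture group, `expand` is the only thing that decides between a one-column DataFrame and a Series; with several groups the result is a DataFrame either way.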