DOC: Improved the docstring of str.extract() (Delhi) #20141
```diff
@@ -633,21 +633,21 @@ def _str_extract_frame(arr, pat, flags=0):
 
 def str_extract(arr, pat, flags=0, expand=True):
     r"""
     Return the match object corresponding to regex `pat`.
 
     For each subject string in the Series, extract groups from the
-    first match of regular expression pat.
+    first match of regular expression `pat`.
 
     Parameters
     ----------
     pat : string
-        Regular expression pattern with capturing groups
+        Regular expression pattern with capturing groups.
     flags : int, default 0 (no flags)
-        re module flags, e.g. re.IGNORECASE
-
+        Re module flags, e.g. re.IGNORECASE.
```

Review comment: In this case it is fine to not capitalize the first word, as it refers to a Python module. Therefore, would you actually quote it like

```diff
     expand : bool, default True
-        * If True, return DataFrame.
-        * If False, return Series/Index/DataFrame.
+        If True, return DataFrame, else return Series/Index/DataFrame.
```

Review comment: When is a DataFrame returned if

Reply: It still returns a DataFrame when you have multiple groups (only if there is only one group does it give a Series). Probably good to explain this a bit more. (I find this a bit strange behaviour, but that's what it currently is. In the other case we would then return a Series of lists, I think.)

```diff
 
-        .. versionadded:: 0.18.0
+        .. versionadded:: 0.18.0.
 
     Returns
     -------
```
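The reply above describes how the return type depends on `expand` and on the number of capture groups. A minimal stdlib-only sketch of that rule, assuming the behaviour is exactly as described in the thread; `extract_result_type` is a hypothetical helper, not part of pandas:

```python
import re


def extract_result_type(pat, expand=True, flags=0):
    """Hypothetical helper: name the container str.extract would return.

    Encodes the rule from the review thread: expand=True always gives a
    DataFrame; expand=False gives a Series/Index for one capture group and
    still a DataFrame for two or more groups.
    """
    # Same type check as the str_extract source shown in this diff.
    if not isinstance(expand, bool):
        raise ValueError("expand must be True or False")
    n_groups = re.compile(pat, flags=flags).groups
    if expand or n_groups > 1:
        return "DataFrame"
    return "Series/Index"


print(extract_result_type(r"([ab])(\d)", expand=False))  # DataFrame
print(extract_result_type(r"[ab](\d)", expand=False))    # Series/Index
print(extract_result_type(r"[ab](\d)", expand=True))     # DataFrame
```

Note that `expand=1` is rejected by this check, because `isinstance(1, bool)` is False.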
```diff
@@ -668,7 +668,7 @@ def str_extract(arr, pat, flags=0, expand=True):
     A pattern with two groups will return a DataFrame with two columns.
     Non-matches will be NaN.
 
-    >>> s = Series(['a1', 'b2', 'c3'])
+    >>> s = pd.Series(['a1', 'b2', 'c3'])
     >>> s.str.extract(r'([ab])(\d)')
          0    1
     0    a    1
```
```diff
@@ -707,7 +707,6 @@ def str_extract(arr, pat, flags=0, expand=True):
     1      2
     2    NaN
     dtype: object
-
     """
     if not isinstance(expand, bool):
         raise ValueError("expand must be True or False")
```
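The docstring says groups are taken from the first match of `pat` and that non-matches become NaN. A stdlib sketch of that per-element logic, assuming `re.search` semantics (first match anywhere in the string); `extract_first` is hypothetical and returns plain tuples rather than a DataFrame:

```python
import math
import re


def extract_first(strings, pat, flags=0):
    """Hypothetical stdlib sketch of str.extract's per-element logic.

    Each string contributes the groups of the first match of `pat`
    (found with re.search); strings without a match, and groups that did
    not take part in the match, become NaN.
    """
    regex = re.compile(pat, flags=flags)
    empty_row = tuple(math.nan for _ in range(regex.groups))
    rows = []
    for s in strings:
        m = regex.search(s)
        if m is None:
            rows.append(empty_row)
        else:
            # Optional groups that did not match come back as None.
            rows.append(tuple(math.nan if g is None else g for g in m.groups()))
    return rows


print(extract_first(['a1', 'b2', 'c3'], r'([ab])(\d)'))
# [('a', '1'), ('b', '2'), (nan, nan)]
```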
```diff
@@ -898,94 +897,23 @@ def str_join(arr, sep):
 
 def str_findall(arr, pat, flags=0):
     """
-    Find all occurrences of pattern or regular expression in the Series/Index.
-
-    Equivalent to applying :func:`re.findall` to all the elements in the
-    Series/Index.
+    Find all occurrences of pattern or regular expression in the
+    Series/Index. Equivalent to :func:`re.findall`.
```

Review comment: The summary line should only be one line. Can you try to condense it or split it up?

```diff
 
     Parameters
     ----------
     pat : string
-        Pattern or regular expression.
-    flags : int, default 0
-        ``re`` module flags, e.g. `re.IGNORECASE` (default is 0, which means
-        no flags).
+        Pattern or regular expression
+    flags : int, default 0 (no flags)
+        re module flags, e.g. re.IGNORECASE
```

Review comment: Similar comment here as above.

```diff
 
     Returns
     -------
-    Series/Index of lists of strings
-        All non-overlapping matches of pattern or regular expression in each
-        string of this Series/Index.
+    matches : Series/Index of lists
 
     See Also
     --------
-    count : Count occurrences of pattern or regular expression in each string
-        of the Series/Index.
-    extractall : For each string in the Series, extract groups from all matches
-        of regular expression and return a DataFrame with one row for each
-        match and one column for each group.
-    re.findall : The equivalent ``re`` function to all non-overlapping matches
-        of pattern or regular expression in string, as a list of strings.
-
-    Examples
-    --------
-
-    >>> s = pd.Series(['Lion', 'Monkey', 'Rabbit'])
-
-    The search for the pattern 'Monkey' returns one match:
-
-    >>> s.str.findall('Monkey')
```

Review comment: Is there a reason you removed those examples?

```diff
-    0          []
-    1    [Monkey]
-    2          []
-    dtype: object
-
-    On the other hand, the search for the pattern 'MONKEY' doesn't return any
-    match:
-
-    >>> s.str.findall('MONKEY')
-    0    []
-    1    []
-    2    []
-    dtype: object
-
-    Flags can be added to the pattern or regular expression. For instance,
-    to find the pattern 'MONKEY' ignoring the case:
-
-    >>> import re
-    >>> s.str.findall('MONKEY', flags=re.IGNORECASE)
-    0          []
-    1    [Monkey]
-    2          []
-    dtype: object
-
-    When the pattern matches more than one string in the Series, all matches
-    are returned:
-
-    >>> s.str.findall('on')
-    0    [on]
-    1    [on]
-    2      []
-    dtype: object
-
-    Regular expressions are supported too. For instance, the search for all the
-    strings ending with the word 'on' is shown next:
-
-    >>> s.str.findall('on$')
-    0    [on]
-    1      []
-    2      []
-    dtype: object
-
-    If the pattern is found more than once in the same string, then a list of
-    multiple strings is returned:
-
-    >>> s.str.findall('b')
-    0        []
-    1        []
-    2    [b, b]
-    dtype: object
-
+    extractall : returns DataFrame with one column per capture group
     """
     regex = re.compile(pat, flags=flags)
     return _na_map(regex.findall, arr)
```
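The removed examples describe `findall` as `re.findall` applied to every element. A stdlib-only sketch that reproduces those examples on a plain list, assuming `None` stands in for a missing value (pandas uses NaN) and that missing values are passed through unchanged, roughly as `_na_map(regex.findall, arr)` does; `findall_each` is hypothetical:

```python
import re


def findall_each(strings, pat, flags=0):
    """Hypothetical stdlib sketch of str.findall: re.findall per element.

    Missing values (None here) are passed through instead of being searched.
    """
    regex = re.compile(pat, flags=flags)
    return [None if s is None else regex.findall(s) for s in strings]


animals = ['Lion', 'Monkey', 'Rabbit']
print(findall_each(animals, 'Monkey'))                       # [[], ['Monkey'], []]
print(findall_each(animals, 'MONKEY'))                       # [[], [], []]
print(findall_each(animals, 'MONKEY', flags=re.IGNORECASE))  # [[], ['Monkey'], []]
print(findall_each(animals, 'on'))                           # [['on'], ['on'], []]
print(findall_each(animals, 'on$'))                          # [['on'], [], []]
print(findall_each(animals, 'b'))                            # [[], [], ['b', 'b']]
```

Each result is the list `re.findall` returns for that string, which is why a string with two non-overlapping hits ('Rabbit' for 'b') yields a two-element list.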
Review comment: Is this `r` before the doc string right?

Reply: Yes, it is to ensure correct rendering of the `\` used in the examples.
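Docstrings are ordinary string literals, so backslashes in doctest examples are processed as escape sequences unless the literal is raw. A small stdlib illustration of why the `r` prefix matters (the patterns are only examples):

```python
import re

# A pattern written as a raw string and the same pattern with the backslash
# escaped by hand are identical strings.
raw = r"([ab])(\d)"
escaped = "([ab])(\\d)"
print(raw == escaped)  # True

# Where the backslash forms a valid escape, dropping the r prefix changes the
# text: "\b" is a single backspace character, r"\b" is backslash + "b".
print(len("\b"), len(r"\b"))  # 1 2

# So the regex word boundary only works in the raw form.
print(re.findall(r"\bon", "on Lion"))  # ['on']
print(re.findall("\bon", "on Lion"))   # []
```

For invalid escapes such as `\d` in a non-raw literal, Python keeps the backslash but warns (a DeprecationWarning since 3.6, a SyntaxWarning since 3.12), which is another reason to keep the docstring raw.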
Review comment: This description isn't quite right, is it? We aren't returning a `Match` object. Maybe
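To make the reviewer's point concrete: `re.search` returns a `Match` object, whereas `str.extract`, per the rest of its docstring, returns the captured groups as strings in a Series/Index/DataFrame. A stdlib illustration of the difference between the whole match and its capture groups:

```python
import re

m = re.search(r"([ab])(\d)", "xa1y")
print(m is not None)  # True: a Match object was found
print(m.group(0))     # a1  (the whole match)
print(m.groups())     # ('a', '1')  (the capture groups, which extract keeps)
```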