-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
DOC: Improved the docstring of str.extract() (Delhi) #20141
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
pandas/core/strings.py
Outdated
@@ -633,21 +633,21 @@ def _str_extract_frame(arr, pat, flags=0): | |||
|
|||
def str_extract(arr, pat, flags=0, expand=True): | |||
r""" | |||
Return the match object corresponding to regex `pat`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this r
before the doc string right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it is to ensure correct rendering for used \
in the examples
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This description isn't quite right, is it? We aren't returning a Match
object.
Maybe
Extract capture groups in the regex `pat` as columns in a DataFrame.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR! Added some comments
pandas/core/strings.py
Outdated
@@ -633,21 +633,21 @@ def _str_extract_frame(arr, pat, flags=0): | |||
|
|||
def str_extract(arr, pat, flags=0, expand=True): | |||
r""" | |||
Return the match object corresponding to regex `pat`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it is to ensure correct rendering for used \
in the examples
pandas/core/strings.py
Outdated
flags : int, default 0 (no flags) | ||
re module flags, e.g. re.IGNORECASE | ||
|
||
Re module flags, e.g. re.IGNORECASE. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this case it is fine to not capitalize the first word, as it refers to a python module. Therefore, would you actually quote it like ``re``
? (and the same for re.IGNORECASE
)
pandas/core/strings.py
Outdated
Equivalent to applying :func:`re.findall` to all the elements in the | ||
Series/Index. | ||
Find all occurrences of pattern or regular expression in the | ||
Series/Index. Equivalent to :func:`re.findall`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The summary line should only be one line. Can you try to condense it or split it up?
pandas/core/strings.py
Outdated
no flags). | ||
Pattern or regular expression | ||
flags : int, default 0 (no flags) | ||
re module flags, e.g. re.IGNORECASE |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
similar comment here as above
pandas/core/strings.py
Outdated
|
||
The search for the pattern 'Monkey' returns one match: | ||
|
||
>>> s.str.findall('Monkey') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reason you removed those examples ?
Codecov Report
@@ Coverage Diff @@
## master #20141 +/- ##
=========================================
Coverage ? 91.72%
=========================================
Files ? 150
Lines ? 49152
Branches ? 0
=========================================
Hits ? 45086
Misses ? 4066
Partials ? 0
Continue to review full report at Codecov.
|
pandas/core/strings.py
Outdated
@@ -633,21 +633,21 @@ def _str_extract_frame(arr, pat, flags=0): | |||
|
|||
def str_extract(arr, pat, flags=0, expand=True): | |||
r""" | |||
Return the match object corresponding to regex `pat`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This description isn't quite right, is it? We aren't returning a Match
object.
Maybe
Extract capture groups in the regex `pat` as columns in a DataFrame.
flags : int, default 0 (no flags) | ||
re module flags, e.g. re.IGNORECASE | ||
|
||
``re`` module flags, e.g. ``re.IGNORECASE``. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add a llink to https://docs.python.org/3/library/re.html#contents-of-module-re ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this can be done by using :mod:`re`
pandas/core/strings.py
Outdated
expand : bool, default True | ||
* If True, return DataFrame. | ||
* If False, return Series/Index/DataFrame. | ||
If True, return DataFrame, else return Series/Index/DataFrame. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When is a DataFrame returned if expand=False
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It still returns a DataFrame when you have multiple groups (only if there is only one group, it gives a Series). Probably good to explain this a bit more.
(I find this a bit strange behaviour, but that's what it currently is. In other case we then return a Series with lists I think)
thanks @arjunsharma97 and @mroeschke for fixups! |
Checklist for the pandas documentation sprint (ignore this if you are doing
an unrelated PR):
scripts/validate_docstrings.py <your-function-or-method>
git diff upstream/master -u -- "*.py" | flake8 --diff
python doc/make.py --single <your-function-or-method>
Please include the output of the validation script below between the "```" ticks:
If the validation script still gives errors, but you think there is a good reason
to deviate in this case (and there are certainly such cases), please state this
explicitly.