-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
BUG: Fix for .str.replace with repl function #15056
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
BUG: Fix for .str.replace with repl function #15056
Conversation
.str.replace now accepts a callable (function) as replacement string. It now raises a TypeError when repl is not string like nor a callable. Docstring updated accordingly.
Looks good, can you add tests and a whatsnew entry? |
repl : string | ||
Replacement sequence | ||
repl : string or function | ||
Replacement string or a function, which passed the match object and |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
function -> callable
Thanks, @chris-b1 and @jreback. Where should I put the whatsnew entry? Can I just pick an empty spot in doc/source/whatsnew/v0.20.0.txt?
This maybe a new feature or improvement rather than a bug, right? |
Yeah, I think new features is fine - I had marked the issue a bug because this would have worked in older versions, although basically undocumented. |
Dull mistake
# GH 15055, callable repl | ||
repl = lambda m: m.group(0).swapcase() | ||
result = values.str.replace('[a-z][A-Z]{2}', repl, n=2) | ||
exp = Series([u('foObaD__baRbaD'), NA]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
test where the callable has different than 1 arg (should be error)
test using named field in re (and callable works correctly)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, @jreback . Named field/group is a good idea.
If the callable raises a TypeError
because of a wrong number of arguments, this is caught in pandas.core.strings._map
, returning a Series of NaN
s.
We could overcome this by either inspecting the callable signature: inspect.signature
or trying with a dummy Regex Match object. But this might have the side effect that the callable is called once before doing the replace on the array, confusing users with some kind of counter within the callable.
Sorry about the checks going wrong. It seems I am too careless with checking my commits. 😕
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With inspect
I could do
n_empty_args = sum(1 for p in inspect.signature(repl).parameters.values()
if p.kind in (p.POSITIONAL_ONLY, p.POSITIONAL_OR_KEYWORD, p.KEYWORD_ONLY)
and p.default is p.empty)
# or better readable syntax
signature = inspect.signature(repl)
empty_args = [p for p in signature.parameters.values()
if p.kind in (p.POSITIONAL_ONLY, p.POSITIONAL_OR_KEYWORD, p.KEYWORD_ONLY)
and p.default is p.empty]
n_empty_args = len(empty_args)
# followed by
if n_empty_args > 1:
raise TypeError("callable should have at most one positional without a default")
n_empty_args != 1
might work, unless the first arg has a default for some reason.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is too complicated, instead maybe you can introspect the actual error message in _map
and if its a certain class of errors, just re-raise the TypeError
I copied a previous test as a draft, assumed that I could reuse `values` but did not mention it was redefined in the unicode and GH 13438 tests. Now corrected. Just redefine `values`. I could have put my test before the unicode tests but this keeps the order.
Current coverage is 85.98% (diff: 100%)@@ master #15056 diff @@
==========================================
Files 145 140 -5
Lines 51146 51261 +115
Methods 0 0
Messages 0 0
Branches 0 0
==========================================
+ Hits 43351 44075 +724
+ Misses 7795 7186 -609
Partials 0 0
|
This influences all StringMethods were a used passes a callable to a StringMethod, e.g. `.str.replace`, that expects a wrong number of arguments. See pandas-dev#15056
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm. just some added documenations / minor comments. ping on green.
@@ -435,6 +435,31 @@ def test_replace(self): | |||
for data in (['a', 'b', None], ['a', 'b', 'c', 'ad']): | |||
values = klass(data) | |||
self.assertRaises(TypeError, values.str.replace, 'a', repl) | |||
|
|||
## GH 15055 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you make a separate test
@@ -167,7 +167,14 @@ def _map(f, arr, na_mask=False, na_value=np.nan, dtype=object): | |||
try: | |||
convert = not all(mask) | |||
result = lib.map_infer_mask(arr, f, mask.view(np.uint8), convert) | |||
except (TypeError, AttributeError): | |||
except (TypeError, AttributeError) as e: | |||
re_missing = (r'missing \d+ required (positional|keyword-only) ' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you add a 1-line comment on what types of exceptions you are re-raising here (imagine you are a future reader and are like what?)
@@ -23,6 +23,7 @@ New features | |||
~~~~~~~~~~~~ | |||
|
|||
- Integration with the ``feather-format``, including a new top-level ``pd.read_feather()`` and ``DataFrame.to_feather()`` method, see :ref:`here <io.feather>`. | |||
- ``.str.replace`` now accepts a callable replacement which is passed to ``re.sub`` (:issue:`15055`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
callable, as a replacement, which ....
raise TypeError("repl must be a string") | ||
use_re = not case or len(pat) > 1 or flags | ||
if not (is_string_like(repl) or callable(repl)): | ||
raise TypeError("repl must be a string or function") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
function -> callable
Replacement sequence | ||
repl : string or callable | ||
Replacement string or a callable, it's passed the match object and | ||
must return a replacement string to be used. See :func:`re.sub`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add the version (0.20.0) after callable
Replacement sequence | ||
repl : string or callable | ||
Replacement string or a callable, it's passed the match object and | ||
must return a replacement string to be used. See :func:`re.sub`. | ||
n : int, default -1 (all) | ||
Number of replacements to make from start |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you add some examples to the doc-string?
also would be nice to have an example in text.rst
- separate test function - fix missing assertRaisesRegex(p)
- add inline comments - add examples with string and callable in __doc__ and text.rst doc - extend method summary Series.str.replace in text.rst doc
Thanks. I added changes based on your review. Only building with python=2.7 fails. I have little to no experience with Python 2x. I didn't caught the TypeError in Python 2: |
so
Generally python 3 does the right thing with exceptions, while py2 may not be re-raising exceptions correctly (so your catching may not work here). So you may need to have a slightly different strategy in py2 and py3 for catching the invalid args. If you get stuck, ping. |
Currently I'm designing new regexes for both py27 and py3k. I'll look into the linting. |
what do you mean? you can prob do something very straightforward, you don't even need regexes, you are just trying to find out if the error message if related to the calling of the function (and not the dtype of the data). |
The regex filtering TypeErrors with argument number problems is now unified and suitable for Python 2.7
@jreback earlier you reviewed a part of my code:
In tl;dr the tests should pass, as this now raises an error. |
@hzpc-joostk lgtm. ping on green. |
|
- trailing whitespace - whitespace empty line - too many '#' in comment
@jreback Thanks for your input and help. All changes made. Waiting for CI... Besides that: all green! |
https://travis-ci.org/pandas-dev/pandas/jobs/194493716 so you can use
and as I said you can do a minimal matching, because sometimes these exception messages change in versions (e.g. 3.5 -> 3.6 for example, maybe not in this case, but the wording can change). You are just generically interested in a specific type of error message (and not its specific wording per se) |
thanks @hzpc-joostk nicely done! |
Thanks @jreback Awesome. My first (accepted) PR on GitHub. 🍾 Cheers |
xref #15056 Author: Joris Van den Bossche <[email protected]> Closes #15210 from jorisvandenbossche/doc-replace-callable and squashes the following commits: 811cc34 [Joris Van den Bossche] DOC: clean-up docstring str.replace (GH15056)
.str.replace now accepts a callable (function) as replacement string. It now raises a TypeError when repl is not string like nor a callable. Docstring updated accordingly. closes pandas-dev#15055 Author: Joost Kranendonk <[email protected]> Author: Joost Kranendonk <[email protected]> Closes pandas-dev#15056 from hzpc-joostk/pandas-GH15055-patch-1 and squashes the following commits: 826730c [Joost Kranendonk] simplify .str.replace TypeError reraising and test 90779ce [Joost Kranendonk] fix linting issues c2cc13a [Joost Kranendonk] Update v0.20.0.txt e15dcdf [Joost Kranendonk] fix bug catch TypeError with wrong number of args 40c0d97 [Joost Kranendonk] improve .str.replace with callable f15ee2a [Joost Kranendonk] improve test .str.replace with callable 14beb21 [Joost Kranendonk] Add test for .str.replace with regex named groups 27065a2 [Joost Kranendonk] Reraise TypeError only with wrong number of args ae04a3e [Joost Kranendonk] Add whatsnew for .str.replace with callable repl 067a7a8 [Joost Kranendonk] Fix testing bug for .str.replace 30d4727 [Joost Kranendonk] Bug fix in .str.replace type checking done wrong 4baa0a7 [Joost Kranendonk] add tests for .str.replace with callable repl 91c883d [Joost Kranendonk] Update .str.replace docstring 6ecc43d [Joost Kranendonk] BUG: Fix for .str.replace with repl function
xref pandas-dev#15056 Author: Joris Van den Bossche <[email protected]> Closes pandas-dev#15210 from jorisvandenbossche/doc-replace-callable and squashes the following commits: 811cc34 [Joris Van den Bossche] DOC: clean-up docstring str.replace (GH15056)
.str.replace now accepts a callable (function) as replacement string. It now raises a TypeError when repl is not string like nor a callable. Docstring updated accordingly.
git diff upstream/master | flake8 --diff