BUG: Fix for .str.replace with repl function #15056

Closed
11 changes: 6 additions & 5 deletions pandas/core/strings.py
@@ -303,8 +303,9 @@ def str_replace(arr, pat, repl, n=-1, case=True, flags=0):
----------
pat : string
Character sequence or regular expression
- repl : string
- Replacement sequence
+ repl : string or callable
+ Replacement string or a callable, it's passed the match object and
+ must return a replacement string to be used. See :func:`re.sub`.
Contributor

add the version (0.20.0) after callable

n : int, default -1 (all)
Number of replacements to make from start
Contributor

can you add some examples to the doc-string?

also would be nice to have an example in text.rst
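
A docstring example along these lines might work. This is only a sketch using `re.sub`, which the docstring already points to; it assumes `.str.replace` applies the same call to each non-null element, and `reverse_word` and `words` are illustrative names, not part of the PR.

```python
import re

# Callable replacement: receives the match object, returns the new text.
def reverse_word(match):
    return match.group(0)[::-1]

words = ["foo 123", "bar baz", None]

# Element-wise stand-in for Series.str.replace(r"[a-z]+", reverse_word);
# None plays the role of a missing value and is passed through untouched.
replaced = [
    re.sub(r"[a-z]+", reverse_word, w) if w is not None else None
    for w in words
]
print(replaced)  # ['oof 123', 'rab zab', None]
```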

case : boolean, default True
@@ -318,9 +319,9 @@ def str_replace(arr, pat, repl, n=-1, case=True, flags=0):
"""

# Check whether repl is valid (GH 13438)
- if not is_string_like(repl):
- raise TypeError("repl must be a string")
- use_re = not case or len(pat) > 1 or flags
+ if not (is_string_like(repl) or callable(repl)):
+ raise TypeError("repl must be a string or function")
Contributor

function -> callable

+ use_re = not case or len(pat) > 1 or flags or callable(repl)

if use_re:
if not case:
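
To make the effect of this hunk easier to follow, here is a minimal single-string sketch of the patched logic. `replace_one` is a hypothetical helper, and `isinstance(repl, str)` stands in for pandas' `is_string_like`; the real function maps over an array rather than one string.

```python
import re

def replace_one(text, pat, repl, n=-1, case=True, flags=0):
    """Hypothetical single-string version of the patched str_replace logic."""
    # Stand-in for is_string_like(repl); an assumption for this sketch.
    if not (isinstance(repl, str) or callable(repl)):
        raise TypeError("repl must be a string or callable")
    # A callable repl only works with re.sub, so it forces the regex path
    # even for a one-character pattern.
    use_re = not case or len(pat) > 1 or flags or callable(repl)
    if use_re:
        if not case:
            flags |= re.IGNORECASE
        count = n if n >= 0 else 0  # re.sub uses 0 for "replace all"
        return re.compile(pat, flags=flags).sub(repl, text, count=count)
    # Literal path: plain str.replace, where -1 means "replace all".
    return text.replace(pat, repl, n)
```

With this sketch, `replace_one("abc", "b", lambda m: m.group(0).upper())` goes through `re.sub` although the pattern is a single character, while `replace_one("a.c", ".", "-")` still takes the literal path.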
6 changes: 6 additions & 0 deletions pandas/tests/test_strings.py
@@ -435,6 +435,12 @@ def test_replace(self):
for data in (['a', 'b', None], ['a', 'b', 'c', 'ad']):
values = klass(data)
self.assertRaises(TypeError, values.str.replace, 'a', repl)

+ # GH 15055, callable repl
+ repl = lambda m: m.group(0).swapcase()
+ result = values.str.replace('[a-z][A-Z]{2}', repl, n=2)
+ exp = Series([u('foObaD__baRbaD'), NA])
Contributor

test where the callable has different than 1 arg (should be error)

test using named field in re (and callable works correctly)
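
Both cases can be sketched with plain `re.sub`, which is what the regex path ends up calling. The pattern, `swap`, and `no_args` below are illustrative names only, and the pandas tests would go through `.str.replace` rather than `re.sub` directly.

```python
import re

pat = r"(?P<first>\w+) (?P<second>\w+)"

def swap(match):
    # Named groups are available on the match object passed to repl.
    return "{} {}".format(match.group("second"), match.group("first"))

swapped = re.sub(pat, swap, "hello world")
print(swapped)  # world hello

def no_args():
    # Wrong arity on purpose: re.sub calls repl with exactly one argument.
    return "x"

try:
    re.sub(pat, no_args, "hello world")
    arity_error_raised = False
except TypeError:
    arity_error_raised = True
print(arity_error_raised)  # True
```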

Author

@hzpc-joostk Jan 5, 2017

Thanks, @jreback. Testing with a named field/group is a good idea.

If the callable raises a TypeError because it takes the wrong number of arguments, the error is caught in pandas.core.strings._map, which then returns a Series of NaNs.

We could work around this either by inspecting the callable's signature with inspect.signature or by calling it once with a dummy regex match object. The second option has a side effect: the callable would run once before the actual replacement on the array, which could confuse users whose callable keeps state, such as a counter.

Sorry about the checks going wrong. It seems I am too careless with checking my commits. 😕
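
The masking can be reproduced outside pandas. The helper below is a simplified, hypothetical stand-in for the error handling described for `_map` (it is not the pandas implementation); it only shows how a wrong-arity callable ends up as NaNs instead of an error.

```python
import math
import re

def map_with_fallback(func, values):
    """Simplified, hypothetical stand-in: any TypeError or AttributeError
    raised for an element is swallowed and replaced by NaN."""
    out = []
    for v in values:
        try:
            out.append(func(v))
        except (TypeError, AttributeError):
            out.append(float("nan"))
    return out

bad_repl = lambda: "x"  # wrong arity: re.sub will call it with one argument

# Both strings contain "a", so re.sub calls bad_repl and raises TypeError,
# which the fallback silently turns into NaN.
result = map_with_fallback(lambda s: re.sub("a", bad_repl, s), ["banana", "apple"])
all_nan = all(isinstance(r, float) and math.isnan(r) for r in result)
print(all_nan)  # True
```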

Author

With inspect I could do

import inspect

n_empty_args = sum(1 for p in inspect.signature(repl).parameters.values()
                   if p.kind in (p.POSITIONAL_ONLY, p.POSITIONAL_OR_KEYWORD, p.KEYWORD_ONLY)
                   and p.default is p.empty)

# or, more readably

signature = inspect.signature(repl)
empty_args = [p for p in signature.parameters.values()
              if p.kind in (p.POSITIONAL_ONLY, p.POSITIONAL_OR_KEYWORD, p.KEYWORD_ONLY)
              and p.default is p.empty]
n_empty_args = len(empty_args)

# followed by

if n_empty_args > 1:
    raise TypeError("callable should have at most one argument without a default")

Checking n_empty_args != 1 might also work, unless the first argument has a default for some reason.

Contributor

This is too complicated. Instead, maybe you can introspect the actual error message in _map and, if it's a certain class of errors, just re-raise the TypeError.
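
One way this suggestion might look, as a rough sketch only: `ARITY_ERROR` and `map_reraising_arity_errors` are hypothetical names, and the pattern assumes the wording of CPython 3 arity error messages, which is an implementation detail and not a stable API.

```python
import re

# Assumed wording of CPython 3 messages for calls with the wrong number
# of positional arguments; not guaranteed across interpreters or versions.
ARITY_ERROR = re.compile(
    r"takes \d+ positional arguments? but \d+ (was|were) given"
    r"|missing \d+ required positional arguments?"
)

def map_reraising_arity_errors(func, values, na_value=float("nan")):
    """Hypothetical _map-like helper: keep the NaN fallback for ordinary
    failures, but let signature errors from the user's callable through."""
    out = []
    for v in values:
        try:
            out.append(func(v))
        except TypeError as err:
            if ARITY_ERROR.search(str(err)):
                raise  # the user's callable has the wrong signature
            out.append(na_value)  # e.g. a non-string element such as None
    return out
```

In this sketch a missing value still maps to NaN, while `lambda: "x"` passed as the replacement surfaces as a TypeError instead of a column of NaNs.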

+ tm.assert_series_equal(result, exp)

def test_repeat(self):
values = Series(['a', 'b', NA, 'c', NA, 'd'])