Commit be3f2ae

Joost Kranendonk authored and jreback committed
BUG: Fix for .str.replace with repl function
.str.replace now accepts a callable (function) as replacement string. It now raises a TypeError when repl is not string like nor a callable. Docstring updated accordingly.

closes pandas-dev#15055

Author: Joost Kranendonk <[email protected]>
Author: Joost Kranendonk <[email protected]>

Closes pandas-dev#15056 from hzpc-joostk/pandas-GH15055-patch-1 and squashes the following commits:

826730c [Joost Kranendonk] simplify .str.replace TypeError reraising and test
90779ce [Joost Kranendonk] fix linting issues
c2cc13a [Joost Kranendonk] Update v0.20.0.txt
e15dcdf [Joost Kranendonk] fix bug catch TypeError with wrong number of args
40c0d97 [Joost Kranendonk] improve .str.replace with callable
f15ee2a [Joost Kranendonk] improve test .str.replace with callable
14beb21 [Joost Kranendonk] Add test for .str.replace with regex named groups
27065a2 [Joost Kranendonk] Reraise TypeError only with wrong number of args
ae04a3e [Joost Kranendonk] Add whatsnew for .str.replace with callable repl
067a7a8 [Joost Kranendonk] Fix testing bug for .str.replace
30d4727 [Joost Kranendonk] Bug fix in .str.replace type checking done wrong
4baa0a7 [Joost Kranendonk] add tests for .str.replace with callable repl
91c883d [Joost Kranendonk] Update .str.replace docstring
6ecc43d [Joost Kranendonk] BUG: Fix for .str.replace with repl function
1 parent a1b6587 commit be3f2ae

4 files changed: +121 -8 lines changed

doc/source/text.rst

+20-1
@@ -146,6 +146,25 @@ following code will cause trouble because of the regular expression meaning of
    # We need to escape the special character (for >1 len patterns)
    dollars.str.replace(r'-\$', '-')

+The ``replace`` method can also take a callable as replacement. It is called
+on every ``pat`` using :func:`re.sub`. The callable should expect one
+positional argument (a regex object) and return a string.
+
+.. versionadded:: 0.20.0
+
+.. ipython:: python
+
+   # Reverse every lowercase alphabetic word
+   pat = r'[a-z]+'
+   repl = lambda m: m.group(0)[::-1]
+   pd.Series(['foo 123', 'bar baz', np.nan]).str.replace(pat, repl)
+
+   # Using regex groups
+   pat = r"(?P<one>\w+) (?P<two>\w+) (?P<three>\w+)"
+   repl = lambda m: m.group('two').swapcase()
+   pd.Series(['Foo Bar Baz', np.nan]).str.replace(pat, repl)
+
+
 Indexing with ``.str``
 ----------------------
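The documentation hunk above says the callable is handed to `re.sub`. As a minimal sketch of that behaviour using only the standard library (no pandas involved), the two documented examples can be reproduced on plain strings. The helper names `reverse_match` and `swap_second` are hypothetical and exist only for this illustration.

```python
import re


def reverse_match(m):
    """Return the matched text reversed; `m` is a regex match object."""
    return m.group(0)[::-1]


def swap_second(m):
    """Return the named group 'two' with its letter case swapped."""
    return m.group('two').swapcase()


# Reverse every lowercase alphabetic word; digits are left untouched.
reversed_words = re.sub(r'[a-z]+', reverse_match, 'foo 123')

# The whole match is replaced by whatever the callable returns.
pat = r"(?P<one>\w+) (?P<two>\w+) (?P<three>\w+)"
swapped = re.sub(pat, swap_second, 'Foo Bar Baz')

print(reversed_words)  # oof 123
print(swapped)         # bAR
```

Note that the callable receives a match object, not the matched string, so it has to call `m.group(...)` to get at the text.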

@@ -406,7 +425,7 @@ Method Summary
     :meth:`~Series.str.join`;Join strings in each element of the Series with passed separator
     :meth:`~Series.str.get_dummies`;Split strings on the delimiter returning DataFrame of dummy variables
     :meth:`~Series.str.contains`;Return boolean array if each string contains pattern/regex
-    :meth:`~Series.str.replace`;Replace occurrences of pattern/regex with some other string
+    :meth:`~Series.str.replace`;Replace occurrences of pattern/regex with some other string or the return value of a callable given the occurrence
     :meth:`~Series.str.repeat`;Duplicate values (``s.str.repeat(3)`` equivalent to ``x * 3``)
     :meth:`~Series.str.pad`;"Add whitespace to left, right, or both sides of strings"
     :meth:`~Series.str.center`;Equivalent to ``str.center``

doc/source/whatsnew/v0.20.0.txt

+1
@@ -24,6 +24,7 @@ New features
 ~~~~~~~~~~~~

 - Integration with the ``feather-format``, including a new top-level ``pd.read_feather()`` and ``DataFrame.to_feather()`` method, see :ref:`here <io.feather>`.
+- ``.str.replace`` now accepts a callable, as replacement, which is passed to ``re.sub`` (:issue:`15055`)



pandas/core/strings.py

+63-7
@@ -167,7 +167,17 @@ def _map(f, arr, na_mask=False, na_value=np.nan, dtype=object):
         try:
             convert = not all(mask)
             result = lib.map_infer_mask(arr, f, mask.view(np.uint8), convert)
-        except (TypeError, AttributeError):
+        except (TypeError, AttributeError) as e:
+            # Reraise the exception if callable `f` got wrong number of args.
+            # The user may want to be warned by this, instead of getting NaN
+            if compat.PY2:
+                p_err = r'takes (no|(exactly|at (least|most)) ?\d+) arguments?'
+            else:
+                p_err = (r'((takes)|(missing)) (?(2)from \d+ to )?\d+ '
+                         r'(?(3)required )positional arguments?')
+
+            if len(e.args) >= 1 and re.search(p_err, e.args[0]):
+                raise e

             def g(x):
                 try:
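To see what the Python 3 branch of `p_err` in the hunk above is meant to catch, the sketch below runs that pattern against the messages CPython produces when a callable is invoked with the wrong number of positional arguments. It is an illustration only: `is_arity_error` and `call_and_classify` are hypothetical helpers, not part of pandas, and the exact message wording is an assumption about current CPython releases.

```python
import re

# Python 3 pattern copied from the hunk above.
P_ERR = (r'((takes)|(missing)) (?(2)from \d+ to )?\d+ '
         r'(?(3)required )positional arguments?')


def is_arity_error(exc):
    """True if the exception message looks like a wrong-argument-count error."""
    return (len(exc.args) >= 1
            and isinstance(exc.args[0], str)
            and re.search(P_ERR, exc.args[0]) is not None)


def call_and_classify(func, *args):
    """Call func(*args); return None on success, else whether the TypeError is an arity error."""
    try:
        func(*args)
    except TypeError as exc:
        return is_arity_error(exc)
    return None


too_few_params = call_and_classify(lambda: None, 'x')          # "takes 0 positional arguments ..."
missing_arg = call_and_classify(lambda m, x: None, 'x')        # "missing 1 required positional argument ..."
range_form = call_and_classify(lambda m, y=None: None, 1, 2, 3)  # "takes from 1 to 2 positional arguments ..."
other_type_error = call_and_classify(lambda m: m + 1, 'x')     # unrelated TypeError
```

Under these assumptions, the first three calls are classified as arity errors, which the patched `_map` re-raises, while the unrelated `TypeError` is not and would still fall through to the NaN-producing fallback.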
@@ -303,8 +313,13 @@ def str_replace(arr, pat, repl, n=-1, case=True, flags=0):
     ----------
     pat : string
         Character sequence or regular expression
-    repl : string
-        Replacement sequence
+    repl : string or callable
+        Replacement string or a callable. The callable is passed the regex
+        match object and must return a replacement string to be used.
+        See :func:`re.sub`.
+
+        .. versionadded:: 0.20.0
+
     n : int, default -1 (all)
         Number of replacements to make from start
     case : boolean, default True
@@ -315,12 +330,53 @@ def str_replace(arr, pat, repl, n=-1, case=True, flags=0):
     Returns
     -------
     replaced : Series/Index of objects
+
+    Examples
+    --------
+    When ``repl`` is a string, every ``pat`` is replaced as with
+    :meth:`str.replace`. NaN value(s) in the Series are left as is.
+
+    >>> Series(['foo', 'fuz', np.nan]).str.replace('f', 'b')
+    0    boo
+    1    buz
+    2    NaN
+    dtype: object
+
+    When ``repl`` is a callable, it is called on every ``pat`` using
+    :func:`re.sub`. The callable should expect one positional argument
+    (a regex object) and return a string.
+
+    To get the idea:
+
+    >>> Series(['foo', 'fuz', np.nan]).str.replace('f', repr)
+    0    <_sre.SRE_Match object; span=(0, 1), match='f'>oo
+    1    <_sre.SRE_Match object; span=(0, 1), match='f'>uz
+    2                                                  NaN
+    dtype: object
+
+    Reverse every lowercase alphabetic word:
+
+    >>> repl = lambda m: m.group(0)[::-1]
+    >>> Series(['foo 123', 'bar baz', np.nan]).str.replace(r'[a-z]+', repl)
+    0    oof 123
+    1    rab zab
+    2        NaN
+    dtype: object
+
+    Using regex groups:
+
+    >>> pat = r"(?P<one>\w+) (?P<two>\w+) (?P<three>\w+)"
+    >>> repl = lambda m: m.group('two').swapcase()
+    >>> Series(['Foo Bar Baz', np.nan]).str.replace(pat, repl)
+    0    bAR
+    1    NaN
+    dtype: object
     """

-    # Check whether repl is valid (GH 13438)
-    if not is_string_like(repl):
-        raise TypeError("repl must be a string")
-    use_re = not case or len(pat) > 1 or flags
+    # Check whether repl is valid (GH 13438, GH 15055)
+    if not (is_string_like(repl) or callable(repl)):
+        raise TypeError("repl must be a string or callable")
+    use_re = not case or len(pat) > 1 or flags or callable(repl)

     if use_re:
         if not case:
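The validation and dispatch logic in this hunk can be sketched for a single string, outside pandas. This is a hedged approximation, not the library's implementation: `replace_one` is a hypothetical function, `isinstance(repl, str)` stands in for pandas' `is_string_like`, and the handling of `n` on the regex path (mapping negative values to `count=0`, which `re.sub` treats as "replace all") is an assumption, since that part of `str_replace` is not shown in the diff.

```python
import re


def replace_one(text, pat, repl, n=-1, case=True, flags=0):
    """Hypothetical scalar analogue of the checks added to str_replace."""
    # Accept a replacement string or a callable; reject everything else.
    if not (isinstance(repl, str) or callable(repl)):
        raise TypeError("repl must be a string or callable")

    # A callable replacement only makes sense with re.sub, so it forces
    # the regex path even for a single-character pattern.
    use_re = not case or len(pat) > 1 or bool(flags) or callable(repl)

    if use_re:
        if not case:
            flags |= re.IGNORECASE
        count = n if n >= 0 else 0  # assumption: re.sub uses 0 for "all"
        return re.sub(pat, repl, text, count=count, flags=flags)

    # Single-character pattern with a string replacement: literal replace.
    return text.replace(pat, repl, n)


literal = replace_one('-$5', '$', '')                          # literal path, '$' is not a regex here
with_callable = replace_one('foo', 'f', lambda m: m.group(0).upper())
```

The point of the extra `callable(repl)` term is visible in the last call: the pattern `'f'` is one character long, so without it the code would take the literal `str.replace` path, which cannot accept a function.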

pandas/tests/test_strings.py

+37
@@ -436,6 +436,43 @@ def test_replace(self):
                     values = klass(data)
                     self.assertRaises(TypeError, values.str.replace, 'a', repl)

+    def test_replace_callable(self):
+        # GH 15055
+        values = Series(['fooBAD__barBAD', NA])
+
+        # test with callable
+        repl = lambda m: m.group(0).swapcase()
+        result = values.str.replace('[a-z][A-Z]{2}', repl, n=2)
+        exp = Series(['foObaD__baRbaD', NA])
+        tm.assert_series_equal(result, exp)
+
+        # test with wrong number of arguments, raising an error
+        if compat.PY2:
+            p_err = r'takes (no|(exactly|at (least|most)) ?\d+) arguments?'
+        else:
+            p_err = (r'((takes)|(missing)) (?(2)from \d+ to )?\d+ '
+                     r'(?(3)required )positional arguments?')
+
+        repl = lambda: None
+        with tm.assertRaisesRegexp(TypeError, p_err):
+            values.str.replace('a', repl)
+
+        repl = lambda m, x: None
+        with tm.assertRaisesRegexp(TypeError, p_err):
+            values.str.replace('a', repl)
+
+        repl = lambda m, x, y=None: None
+        with tm.assertRaisesRegexp(TypeError, p_err):
+            values.str.replace('a', repl)
+
+        # test regex named groups
+        values = Series(['Foo Bar Baz', NA])
+        pat = r"(?P<first>\w+) (?P<middle>\w+) (?P<last>\w+)"
+        repl = lambda m: m.group('middle').swapcase()
+        result = values.str.replace(pat, repl)
+        exp = Series(['bAR', NA])
+        tm.assert_series_equal(result, exp)
+
     def test_repeat(self):
         values = Series(['a', 'b', NA, 'c', NA, 'd'])
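The expected value `'foObaD__baRbaD'` in the first assertion of `test_replace_callable` can be checked by hand with `re.sub`, assuming `n` corresponds to its `count` argument. The sketch below is standard-library only, and `swap_match` is a hypothetical name for the test's lambda.

```python
import re


def swap_match(m):
    """Swap the case of the whole match, as the test's lambda does."""
    return m.group(0).swapcase()


text = 'fooBAD__barBAD'
pat = '[a-z][A-Z]{2}'  # one lowercase letter followed by two uppercase letters

# The pattern matches 'oBA' and then 'rBA'; each becomes 'Oba' / 'Rba'.
first_only = re.sub(pat, swap_match, text, count=1)
both = re.sub(pat, swap_match, text, count=2)

print(first_only)  # foObaD__barBAD
print(both)        # foObaD__baRbaD
```

Since the input contains exactly two matches, `count=2` here gives the same result as replacing all occurrences.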
