Commit 40c0d97

Author: Joost Kranendonk

improve .str.replace with callable

- add inline comments
- add examples with string and callable in __doc__ and text.rst doc
- extend method summary Series.str.replace in text.rst doc
1 parent f15ee2a commit 40c0d97

2 files changed: +71 -5 lines changed


doc/source/text.rst

+20 -1
@@ -146,6 +146,25 @@ following code will cause trouble because of the regular expression meaning of
    # We need to escape the special character (for >1 len patterns)
    dollars.str.replace(r'-\$', '-')
 
+The ``replace`` method can also take a callable as replacement. It is called
+on every ``pat`` using :func:`re.sub`. The callable should expect one
+positional argument (a regex object) and return a string.
+
+.. versionadded:: 0.20.0
+
+.. ipython:: python
+
+   # Reverse every lowercase alphabetic word
+   pat = r'[a-z]+'
+   repl = lambda m: m.group(0)[::-1]
+   pd.Series(['foo 123', 'bar baz', np.nan]).str.replace(pat, repl)
+
+   # Using regex groups
+   pat = r"(?P<one>\w+) (?P<two>\w+) (?P<three>\w+)"
+   repl = lambda m: m.group('two').swapcase()
+   pd.Series(['Foo Bar Baz', np.nan]).str.replace(pat, repl)
+
+
 Indexing with ``.str``
 ----------------------
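
Note on the added text.rst paragraph: the "regex object" handed to the callable is a match object, the same one that :func:`re.sub` passes to a function replacement. The sketch below reproduces the two documented examples with the standard library only, so it does not depend on pandas; the helper names `reverse_word` and `swap_middle` are illustrative and are not part of the commit.

```python
import re


def reverse_word(match):
    """Return the full matched text reversed."""
    return match.group(0)[::-1]


def swap_middle(match):
    """Return the named group 'two' with its case swapped."""
    return match.group('two').swapcase()


# Same patterns as in the documentation examples above.
reversed_words = [re.sub(r'[a-z]+', reverse_word, s)
                  for s in ['foo 123', 'bar baz']]
swapped = re.sub(r"(?P<one>\w+) (?P<two>\w+) (?P<three>\w+)",
                 swap_middle, 'Foo Bar Baz')

print(reversed_words)  # ['oof 123', 'rab zab']
print(swapped)         # bAR
```

In the second case the pattern matches the whole string, so the entire match is replaced by the callable's return value.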

@@ -406,7 +425,7 @@ Method Summary
     :meth:`~Series.str.join`;Join strings in each element of the Series with passed separator
     :meth:`~Series.str.get_dummies`;Split strings on the delimiter returning DataFrame of dummy variables
     :meth:`~Series.str.contains`;Return boolean array if each string contains pattern/regex
-    :meth:`~Series.str.replace`;Replace occurrences of pattern/regex with some other string
+    :meth:`~Series.str.replace`;Replace occurrences of pattern/regex with some other string or the return value of a callable given the occurrence
     :meth:`~Series.str.repeat`;Duplicate values (``s.str.repeat(3)`` equivalent to ``x * 3``)
     :meth:`~Series.str.pad`;"Add whitespace to left, right, or both sides of strings"
     :meth:`~Series.str.center`;Equivalent to ``str.center``

pandas/core/strings.py

+51 -4
@@ -168,6 +168,8 @@ def _map(f, arr, na_mask=False, na_value=np.nan, dtype=object):
             convert = not all(mask)
             result = lib.map_infer_mask(arr, f, mask.view(np.uint8), convert)
         except (TypeError, AttributeError) as e:
+            # Reraise the exception if callable `f` got wrong number of args.
+            # The user may want to be warned by this, instead of getting NaN
             re_missing = (r'missing \d+ required (positional|keyword-only) '
                           'arguments?')
             re_takes = (r'takes (from)?\d+ (to \d+)?positional arguments? '
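
Note on the new inline comments: the idea is that a `TypeError` caused by a callable with the wrong signature should surface to the user rather than be swallowed. A minimal sketch of how such an error message can be recognised follows. It assumes CPython 3 message wording, uses only the `re_missing` pattern that is fully visible in the hunk (the truncated `re_takes` pattern is deliberately not reconstructed), and `needs_two` is a hypothetical callable.

```python
import re

# Pattern copied from the hunk above.
re_missing = (r'missing \d+ required (positional|keyword-only) '
              'arguments?')


def needs_two(match, extra):
    """Hypothetical replacement callable with one argument too many."""
    return extra


is_arity_error = False
message = ''
try:
    # re.sub calls the replacement with exactly one argument (the match).
    re.sub('f', needs_two, 'foo')
except TypeError as e:
    message = str(e)
    # An arity problem is detected by matching the exception text.
    is_arity_error = bool(re.search(re_missing, message))

print(is_arity_error, message)
```

Matching on exception text is fragile across interpreter versions, which is presumably why the patterns are written loosely.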
@@ -311,8 +313,12 @@ def str_replace(arr, pat, repl, n=-1, case=True, flags=0):
     pat : string
         Character sequence or regular expression
     repl : string or callable
-        Replacement string or a callable, it's passed the match object and
-        must return a replacement string to be used. See :func:`re.sub`.
+        Replacement string or a callable. The callable is passed the regex
+        match object and must return a replacement string to be used.
+        See :func:`re.sub`.
+
+        .. versionadded:: 0.20.0
+
     n : int, default -1 (all)
         Number of replacements to make from start
     case : boolean, default True
@@ -323,11 +329,52 @@ def str_replace(arr, pat, repl, n=-1, case=True, flags=0):
     Returns
     -------
     replaced : Series/Index of objects
+
+    Examples
+    --------
+    When ``repl`` is a string, every ``pat`` is replaced as with
+    :meth:`str.replace`. NaN value(s) in the Series are left as is.
+
+    >>> Series(['foo', 'fuz', np.nan]).str.replace('f', 'b')
+    0    boo
+    1    buz
+    2    NaN
+    dtype: object
+
+    When ``repl`` is a callable, it is called on every ``pat`` using
+    :func:`re.sub`. The callable should expect one positional argument
+    (a regex object) and return a string.
+
+    To get the idea:
+
+    >>> Series(['foo', 'fuz', np.nan]).str.replace('f', repr)
+    0    <_sre.SRE_Match object; span=(0, 1), match='f'>oo
+    1    <_sre.SRE_Match object; span=(0, 1), match='f'>uz
+    2                                                  NaN
+    dtype: object
+
+    Reverse every lowercase alphabetic word:
+
+    >>> repl = lambda m: m.group(0)[::-1]
+    >>> Series(['foo 123', 'bar baz', np.nan]).str.replace(r'[a-z]+', repl)
+    0    oof 123
+    1    rab zab
+    2        NaN
+    dtype: object
+
+    Using regex groups:
+
+    >>> pat = r"(?P<one>\w+) (?P<two>\w+) (?P<three>\w+)"
+    >>> repl = lambda m: m.group('two').swapcase()
+    >>> Series(['Foo Bar Baz', np.nan]).str.replace(pat, repl)
+    0    bAR
+    1    NaN
+    dtype: object
     """
 
-    # Check whether repl is valid (GH 13438)
+    # Check whether repl is valid (GH 13438, GH 15055)
     if not (is_string_like(repl) or callable(repl)):
-        raise TypeError("repl must be a string or function")
+        raise TypeError("repl must be a string or callable")
     use_re = not case or len(pat) > 1 or flags or callable(repl)
 
     if use_re:
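
Note on the `use_re` line: a callable `repl` always forces the regex path, while a single-character pattern with a string `repl` is replaced literally. The scalar sketch below illustrates that dispatch under stated assumptions: `replace_one` is a hypothetical helper, `isinstance(repl, str)` stands in for `is_string_like`, and everything after `if use_re:` is a guess at the behaviour, since the diff does not show that part of the function.

```python
import re


def replace_one(text, pat, repl, n=-1, case=True, flags=0):
    """Hypothetical scalar analogue of the dispatch in str_replace."""
    # Validation as in the diff: only strings and callables are accepted.
    if not (isinstance(repl, str) or callable(repl)):
        raise TypeError("repl must be a string or callable")

    # Condition copied from the diff.
    use_re = not case or len(pat) > 1 or flags or callable(repl)

    if use_re:
        # Assumed body: compile the pattern and delegate to re.sub.
        if not case:
            flags |= re.IGNORECASE
        count = n if n >= 0 else 0  # re.sub treats count=0 as "replace all"
        return re.compile(pat, flags=flags).sub(repl, text, count=count)

    # Assumed fallback: literal replacement for single-character patterns.
    return text.replace(pat, repl, n)


print(replace_one('foo', 'f', 'b'))                                 # boo
print(replace_one('foo 123', r'[a-z]+', lambda m: m.group(0)[::-1]))  # oof 123
print(replace_one('-$1', '$', ''))                                  # -1
```

The last call shows why the text.rst comment mentions escaping only "for >1 len patterns": in this sketch a lone `$` never reaches the regex engine.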
