Commit 6ef4be3

Liam3851 authored and jreback committed
ENH: Allow literal (non-regex) replacement using .str.replace pandas-dev#16808 (pandas-dev#19584)
1 parent 318a287 commit 6ef4be3

4 files changed, +105 -37 lines changed

doc/source/text.rst

+21-7
@@ -118,8 +118,8 @@ i.e., from the end of the string to the beginning of the string:

    s2.str.rsplit('_', expand=True, n=1)

-Methods like ``replace`` and ``findall`` take `regular expressions
-<https://docs.python.org/3/library/re.html>`__, too:
+``replace`` by default replaces `regular expressions
+<https://docs.python.org/3/library/re.html>`__:

 .. ipython:: python

@@ -146,12 +146,25 @@ following code will cause trouble because of the regular expression meaning of
    # We need to escape the special character (for >1 len patterns)
    dollars.str.replace(r'-\$', '-')

+.. versionadded:: 0.23.0
+
+If you do want literal replacement of a string (equivalent to
+:meth:`str.replace`), you can set the optional ``regex`` parameter to
+``False``, rather than escaping each character. In this case both ``pat``
+and ``repl`` must be strings:
+
+.. ipython:: python
+
+   # These lines are equivalent
+   dollars.str.replace(r'-\$', '-')
+   dollars.str.replace('-$', '-', regex=False)
+
+.. versionadded:: 0.20.0
+
 The ``replace`` method can also take a callable as replacement. It is called
 on every ``pat`` using :func:`re.sub`. The callable should expect one
 positional argument (a regex object) and return a string.

-.. versionadded:: 0.20.0
-
 .. ipython:: python

    # Reverse every lowercase alphabetic word
@@ -164,12 +177,12 @@ positional argument (a regex object) and return a string.
    repl = lambda m: m.group('two').swapcase()
    pd.Series(['Foo Bar Baz', np.nan]).str.replace(pat, repl)

+.. versionadded:: 0.20.0
+
 The ``replace`` method also accepts a compiled regular expression object
 from :func:`re.compile` as a pattern. All flags should be included in the
 compiled regular expression object.

-.. versionadded:: 0.20.0
-
 .. ipython:: python

    import re
@@ -186,6 +199,7 @@ regular expression object will raise a ``ValueError``.
     ---------------------------------------------------------------------------
     ValueError: case and flags cannot be set when pat is a compiled regex

+
 Indexing with ``.str``
 ----------------------

@@ -432,7 +446,7 @@ Method Summary
     :meth:`~Series.str.join`;Join strings in each element of the Series with passed separator
     :meth:`~Series.str.get_dummies`;Split strings on the delimiter returning DataFrame of dummy variables
     :meth:`~Series.str.contains`;Return boolean array if each string contains pattern/regex
-    :meth:`~Series.str.replace`;Replace occurrences of pattern/regex with some other string or the return value of a callable given the occurrence
+    :meth:`~Series.str.replace`;Replace occurrences of pattern/regex/string with some other string or the return value of a callable given the occurrence
     :meth:`~Series.str.repeat`;Duplicate values (``s.str.repeat(3)`` equivalent to ``x * 3``)
     :meth:`~Series.str.pad`;"Add whitespace to left, right, or both sides of strings"
     :meth:`~Series.str.center`;Equivalent to ``str.center``
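
The documentation change above contrasts escaping a regex metacharacter with passing ``regex=False``. As a rough illustration, the same distinction can be reproduced with the standard library alone, since the regex path goes through :func:`re.sub` and the literal path through :meth:`str.replace`. The sample values are an assumption here: they are meant to mirror the ``dollars`` Series used in the docs, whose definition is not part of this diff.

```python
import re

# Assumed sample values (the ``dollars`` definition is outside this diff).
values = ['12', '-$10', '$10,000']

# Unescaped, ``$`` is an end-of-string anchor, so '-$' only matches a
# trailing '-' and nothing in these values changes.
unescaped = [re.sub('-$', '-', v) for v in values]

# Escaping the metacharacter makes the regex match the two literal characters.
escaped = [re.sub(r'-\$', '-', v) for v in values]

# Plain str.replace is always literal, which is the behaviour regex=False
# selects in the diff above.
literal = [v.replace('-$', '-') for v in values]

print(unescaped)
print(escaped)
print(literal)
```

The last two lists are equal, which is the equivalence the new documentation paragraph describes.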

doc/source/whatsnew/v0.23.0.txt

+1
@@ -620,6 +620,7 @@ Other API Changes
 - Set operations (union, difference...) on :class:`IntervalIndex` with incompatible index types will now raise a ``TypeError`` rather than a ``ValueError`` (:issue:`19329`)
 - :class:`DateOffset` objects render more simply, e.g. ``<DateOffset: days=1>`` instead of ``<DateOffset: kwds={'days': 1}>`` (:issue:`19403`)
 - ``Categorical.fillna`` now validates its ``value`` and ``method`` keyword arguments. It now raises when both or none are specified, matching the behavior of :meth:`Series.fillna` (:issue:`19682`)
+- :func:`Series.str.replace` now takes an optional `regex` keyword which, when set to ``False``, uses literal string replacement rather than regex replacement (:issue:`16808`)

 .. _whatsnew_0230.deprecations:
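
For comparison with the new keyword, a literal match can also be obtained through the regex engine by escaping the pattern. The sketch below uses only the standard library; ``literal_sub`` is a hypothetical helper name introduced for illustration and is not part of pandas or of this commit.

```python
import re


def literal_sub(pat, repl, text):
    """Hypothetical helper: literal replacement routed through ``re.sub``."""
    # re.escape neutralises regex metacharacters in the pattern. Passing the
    # replacement through a function keeps backslashes in ``repl`` from being
    # interpreted as group references.
    return re.sub(re.escape(pat), lambda m: repl, text)


for text in ['f.o', 'foo', '-$10']:
    # Both approaches should agree with plain str.replace.
    print(text, literal_sub('f.', 'ba', text), text.replace('f.', 'ba'))
```

Passing ``regex=False`` avoids this extra escaping step, which is the convenience the entry above describes.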

pandas/core/strings.py

+62-30
@@ -306,7 +306,7 @@ def str_endswith(arr, pat, na=np.nan):
     return _na_map(f, arr, na, dtype=bool)


-def str_replace(arr, pat, repl, n=-1, case=None, flags=0):
+def str_replace(arr, pat, repl, n=-1, case=None, flags=0, regex=True):
     r"""
     Replace occurrences of pattern/regex in the Series/Index with
     some other string. Equivalent to :meth:`str.replace` or
@@ -337,25 +337,50 @@ def str_replace(arr, pat, repl, n=-1, case=None, flags=0):
     flags : int, default 0 (no flags)
         - re module flags, e.g. re.IGNORECASE
         - Cannot be set if `pat` is a compiled regex
+    regex : boolean, default True
+        - If True, assumes the passed-in pattern is a regular expression.
+        - If False, treats the pattern as a literal string
+        - Cannot be set to False if `pat` is a compiled regex or `repl` is
+          a callable.
+
+        .. versionadded:: 0.23.0

     Returns
     -------
     replaced : Series/Index of objects

+    Raises
+    ------
+    ValueError
+        * if `regex` is False and `repl` is a callable or `pat` is a compiled
+          regex
+        * if `pat` is a compiled regex and `case` or `flags` is set
+
     Notes
     -----
     When `pat` is a compiled regex, all flags should be included in the
-    compiled regex. Use of `case` or `flags` with a compiled regex will
-    raise an error.
+    compiled regex. Use of `case`, `flags`, or `regex=False` with a compiled
+    regex will raise an error.

     Examples
     --------
-    When `repl` is a string, every `pat` is replaced as with
-    :meth:`str.replace`. NaN value(s) in the Series are left as is.
+    When `pat` is a string and `regex` is True (the default), the given `pat`
+    is compiled as a regex. When `repl` is a string, it replaces matching
+    regex patterns as with :meth:`re.sub`. NaN value(s) in the Series are
+    left as is:
+
+    >>> pd.Series(['foo', 'fuz', np.nan]).str.replace('f.', 'ba', regex=True)
+    0    bao
+    1    baz
+    2    NaN
+    dtype: object

-    >>> pd.Series(['foo', 'fuz', np.nan]).str.replace('f', 'b')
-    0    boo
-    1    buz
+    When `pat` is a string and `regex` is False, every `pat` is replaced with
+    `repl` as with :meth:`str.replace`:
+
+    >>> pd.Series(['f.o', 'fuz', np.nan]).str.replace('f.', 'ba', regex=False)
+    0    bao
+    1    fuz
     2    NaN
     dtype: object

@@ -397,34 +422,41 @@ def str_replace(arr, pat, repl, n=-1, case=None, flags=0):
     1    bar
     2    NaN
     dtype: object
+
     """

     # Check whether repl is valid (GH 13438, GH 15055)
     if not (is_string_like(repl) or callable(repl)):
         raise TypeError("repl must be a string or callable")

     is_compiled_re = is_re(pat)
-    if is_compiled_re:
-        if (case is not None) or (flags != 0):
-            raise ValueError("case and flags cannot be set"
-                             " when pat is a compiled regex")
-    else:
-        # not a compiled regex
-        # set default case
-        if case is None:
-            case = True
-
-        # add case flag, if provided
-        if case is False:
-            flags |= re.IGNORECASE
-
-    use_re = is_compiled_re or len(pat) > 1 or flags or callable(repl)
-
-    if use_re:
-        n = n if n >= 0 else 0
-        regex = re.compile(pat, flags=flags)
-        f = lambda x: regex.sub(repl=repl, string=x, count=n)
+    if regex:
+        if is_compiled_re:
+            if (case is not None) or (flags != 0):
+                raise ValueError("case and flags cannot be set"
+                                 " when pat is a compiled regex")
+        else:
+            # not a compiled regex
+            # set default case
+            if case is None:
+                case = True
+
+            # add case flag, if provided
+            if case is False:
+                flags |= re.IGNORECASE
+        if is_compiled_re or len(pat) > 1 or flags or callable(repl):
+            n = n if n >= 0 else 0
+            compiled = re.compile(pat, flags=flags)
+            f = lambda x: compiled.sub(repl=repl, string=x, count=n)
+        else:
+            f = lambda x: x.replace(pat, repl, n)
     else:
+        if is_compiled_re:
+            raise ValueError("Cannot use a compiled regex as replacement "
+                             "pattern with regex=False")
+        if callable(repl):
+            raise ValueError("Cannot use a callable replacement when "
+                             "regex=False")
         f = lambda x: x.replace(pat, repl, n)

     return _na_map(f, arr)
@@ -1596,9 +1628,9 @@ def match(self, pat, case=True, flags=0, na=np.nan, as_indexer=None):
         return self._wrap_result(result)

     @copy(str_replace)
-    def replace(self, pat, repl, n=-1, case=None, flags=0):
+    def replace(self, pat, repl, n=-1, case=None, flags=0, regex=True):
         result = str_replace(self._data, pat, repl, n=n, case=case,
-                             flags=flags)
+                             flags=flags, regex=regex)
         return self._wrap_result(result)

     @copy(str_repeat)
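
The branching added to ``str_replace`` can be hard to follow in diff form, so here is a standalone sketch of the same dispatch using only the standard library. ``build_replacer`` is a hypothetical name introduced for illustration; it returns the per-string function that the diff calls ``f``, and it swaps the pandas helpers ``is_string_like`` and ``is_re`` for plain ``isinstance`` checks, which is an assumption about what those helpers test. NaN handling (``_na_map``) is left out.

```python
import re


def build_replacer(pat, repl, n=-1, case=None, flags=0, regex=True):
    """Hypothetical sketch: return a function applying the replacement to one string."""
    if not (isinstance(repl, str) or callable(repl)):
        raise TypeError("repl must be a string or callable")

    is_compiled_re = isinstance(pat, re.Pattern)

    if not regex:
        # Literal mode: compiled patterns and callables make no sense here.
        if is_compiled_re:
            raise ValueError("Cannot use a compiled regex as replacement "
                             "pattern with regex=False")
        if callable(repl):
            raise ValueError("Cannot use a callable replacement when "
                             "regex=False")
        return lambda x: x.replace(pat, repl, n)

    if is_compiled_re:
        if case is not None or flags != 0:
            raise ValueError("case and flags cannot be set"
                             " when pat is a compiled regex")
    else:
        if case is None:
            case = True
        if case is False:
            flags |= re.IGNORECASE

    if is_compiled_re or len(pat) > 1 or flags or callable(repl):
        # re.sub uses count=0 for "replace all", str.replace uses -1.
        count = n if n >= 0 else 0
        compiled = re.compile(pat, flags=flags)
        return lambda x: compiled.sub(repl, x, count=count)

    # Single-character string pattern with no flags: plain str.replace.
    return lambda x: x.replace(pat, repl, n)


print(build_replacer('f.', 'ba')('foo'))
print(build_replacer('f.', 'ba', regex=False)('foo'))
print(build_replacer('f.', 'ba', regex=False)('f.o'))
```

One consequence of the diff as written, reproduced in this sketch, is that a single-character string pattern with no flags still takes the ``str.replace`` shortcut even when ``regex=True``, so a lone ``'.'`` is treated literally on that path.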

pandas/tests/test_strings.py

+21
@@ -530,6 +530,27 @@ def test_replace_compiled_regex(self):
         exp = Series(['foObaD__baRbaD', NA])
         tm.assert_series_equal(result, exp)

+    def test_replace_literal(self):
+        # GH16808 literal replace (regex=False vs regex=True)
+        values = Series(['f.o', 'foo', NA])
+        exp = Series(['bao', 'bao', NA])
+        result = values.str.replace('f.', 'ba')
+        tm.assert_series_equal(result, exp)
+
+        exp = Series(['bao', 'foo', NA])
+        result = values.str.replace('f.', 'ba', regex=False)
+        tm.assert_series_equal(result, exp)
+
+        # Cannot do a literal replace if given a callable repl or compiled
+        # pattern
+        callable_repl = lambda m: m.group(0).swapcase()
+        compiled_pat = re.compile('[a-z][A-Z]{2}')
+
+        pytest.raises(ValueError, values.str.replace, 'abc', callable_repl,
+                      regex=False)
+        pytest.raises(ValueError, values.str.replace, compiled_pat, '',
+                      regex=False)
+
     def test_repeat(self):
         values = Series(['a', 'b', NA, 'c', NA, 'd'])
