Commit 397252f

dsaxton authored and Kevin D Smith committed
API: Deprecate regex=True default in Series.str.replace (pandas-dev#36695)
1 parent bb0c553 commit 397252f

6 files changed: +64 -39 lines changed

doc/source/user_guide/text.rst

+22 -25
@@ -255,7 +255,7 @@ i.e., from the end of the string to the beginning of the string:

    s2.str.rsplit("_", expand=True, n=1)

-``replace`` by default replaces `regular expressions
+``replace`` optionally uses `regular expressions
 <https://docs.python.org/3/library/re.html>`__:

 .. ipython:: python
@@ -265,35 +265,27 @@ i.e., from the end of the string to the beginning of the string:
        dtype="string",
    )
    s3
-   s3.str.replace("^.a|dog", "XX-XX ", case=False)
+   s3.str.replace("^.a|dog", "XX-XX ", case=False, regex=True)

-Some caution must be taken to keep regular expressions in mind! For example, the
-following code will cause trouble because of the regular expression meaning of
-``$``:
-
-.. ipython:: python
-
-   # Consider the following badly formatted financial data
-   dollars = pd.Series(["12", "-$10", "$10,000"], dtype="string")
-
-   # This does what you'd naively expect:
-   dollars.str.replace("$", "")
+.. warning::

-   # But this doesn't:
-   dollars.str.replace("-$", "-")
+    Some caution must be taken when dealing with regular expressions! The current behavior
+    is to treat single character patterns as literal strings, even when ``regex`` is set
+    to ``True``. This behavior is deprecated and will be removed in a future version so
+    that the ``regex`` keyword is always respected.

-   # We need to escape the special character (for >1 len patterns)
-   dollars.str.replace(r"-\$", "-")
+.. versionchanged:: 1.2.0

-If you do want literal replacement of a string (equivalent to
-:meth:`str.replace`), you can set the optional ``regex`` parameter to
-``False``, rather than escaping each character. In this case both ``pat``
-and ``repl`` must be strings:
+If you want literal replacement of a string (equivalent to :meth:`str.replace`), you
+can set the optional ``regex`` parameter to ``False``, rather than escaping each
+character. In this case both ``pat`` and ``repl`` must be strings:

 .. ipython:: python

+   dollars = pd.Series(["12", "-$10", "$10,000"], dtype="string")
+
    # These lines are equivalent
-   dollars.str.replace(r"-\$", "-")
+   dollars.str.replace(r"-\$", "-", regex=True)
    dollars.str.replace("-$", "-", regex=False)

 The ``replace`` method can also take a callable as replacement. It is called
@@ -310,7 +302,10 @@ positional argument (a regex object) and return a string.
        return m.group(0)[::-1]


-   pd.Series(["foo 123", "bar baz", np.nan], dtype="string").str.replace(pat, repl)
+   pd.Series(
+       ["foo 123", "bar baz", np.nan],
+       dtype="string"
+   ).str.replace(pat, repl, regex=True)

    # Using regex groups
    pat = r"(?P<one>\w+) (?P<two>\w+) (?P<three>\w+)"
@@ -320,7 +315,9 @@ positional argument (a regex object) and return a string.
        return m.group("two").swapcase()


-   pd.Series(["Foo Bar Baz", np.nan], dtype="string").str.replace(pat, repl)
+   pd.Series(["Foo Bar Baz", np.nan], dtype="string").str.replace(
+       pat, repl, regex=True
+   )

 The ``replace`` method also accepts a compiled regular expression object
 from :func:`re.compile` as a pattern. All flags should be included in the
@@ -331,7 +328,7 @@ compiled regular expression object.
    import re

    regex_pat = re.compile(r"^.a|dog", flags=re.IGNORECASE)
-   s3.str.replace(regex_pat, "XX-XX ")
+   s3.str.replace(regex_pat, "XX-XX ", regex=True)

 Including a ``flags`` argument when calling ``replace`` with a compiled
 regular expression object will raise a ``ValueError``.
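
The documentation change above contrasts pattern-based and literal replacement. As a rough, pandas-free illustration of that distinction, the sketch below applies Python's `re.sub` and `str.replace` to the `dollars` values from the diff. It assumes these standard-library calls mirror the per-element behavior of `regex=True` and `regex=False`; it does not check that assumption against pandas itself.

```python
import re

# Values taken from the documentation example in the diff.
dollars = ["12", "-$10", "$10,000"]

# Pattern-based replacement: "$" must be escaped to match a literal dollar sign.
as_regex = [re.sub(r"-\$", "-", s) for s in dollars]

# Literal replacement: no escaping needed.
as_literal = [s.replace("-$", "-") for s in dollars]

# Unescaped, "-$" means "a hyphen at the end of the string", so nothing changes.
unescaped = [re.sub("-$", "-", s) for s in dollars]

print(as_regex)    # ['12', '-10', '$10,000']
print(as_literal)  # ['12', '-10', '$10,000']
print(unescaped)   # ['12', '-$10', '$10,000']
```

The third result is the pitfall the removed documentation example described: an unescaped `$` anchors the pattern instead of matching a dollar sign.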

doc/source/whatsnew/v1.2.0.rst

+1
@@ -287,6 +287,7 @@ Deprecations
 - Deprecated indexing :class:`DataFrame` rows with datetime-like strings ``df[string]``, use ``df.loc[string]`` instead (:issue:`36179`)
 - Deprecated casting an object-dtype index of ``datetime`` objects to :class:`DatetimeIndex` in the :class:`Series` constructor (:issue:`23598`)
 - Deprecated :meth:`Index.is_all_dates` (:issue:`27744`)
+- The default value of ``regex`` for :meth:`Series.str.replace` will change from ``True`` to ``False`` in a future release. In addition, single character regular expressions will *not* be treated as literal strings when ``regex=True`` is set. (:issue:`24804`)
 - Deprecated automatic alignment on comparison operations between :class:`DataFrame` and :class:`Series`, do ``frame, ser = frame.align(ser, axis=1, copy=False)`` before e.g. ``frame == ser`` (:issue:`28759`)
 - :meth:`Rolling.count` with ``min_periods=None`` will default to the size of the window in a future version (:issue:`31302`)
 - Deprecated slice-indexing on timezone-aware :class:`DatetimeIndex` with naive ``datetime`` objects, to match scalar indexing behavior (:issue:`36148`)

pandas/core/reshape/melt.py

+1 -1
@@ -483,7 +483,7 @@ def melt_stub(df, stub: str, i, j, value_vars, sep: str):
             var_name=j,
         )
         newdf[j] = Categorical(newdf[j])
-        newdf[j] = newdf[j].str.replace(re.escape(stub + sep), "")
+        newdf[j] = newdf[j].str.replace(re.escape(stub + sep), "", regex=True)

         # GH17627 Cast numerics suffixes to int/float
         newdf[j] = to_numeric(newdf[j], errors="ignore")
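
`re.escape` matters in this call because the prefix being stripped is arbitrary text that is passed with `regex=True`. A minimal standard-library sketch of the same idea follows. The stub `A(weekly)`, the separator `-`, and the column name are made-up values for illustration, and `re.sub` stands in for the pandas call.

```python
import re

# Hypothetical stub and separator; the parentheses are regex metacharacters.
stub, sep = "A(weekly)", "-"
column = "A(weekly)-2010"

# Escaping makes the prefix match literally even though it is used as a regex.
suffix = re.sub(re.escape(stub + sep), "", column)

# Without escaping, "(weekly)" is read as a capture group, so the pattern
# looks for "Aweekly-" and the prefix is not removed.
unescaped = re.sub(stub + sep, "", column)

print(suffix)     # 2010
print(unescaped)  # A(weekly)-2010
```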

pandas/core/strings/accessor.py

+15 -1
@@ -1178,7 +1178,7 @@ def fullmatch(self, pat, case=True, flags=0, na=None):
         return self._wrap_result(result, fill_value=na, returns_string=False)

     @forbid_nonstring_types(["bytes"])
-    def replace(self, pat, repl, n=-1, case=None, flags=0, regex=True):
+    def replace(self, pat, repl, n=-1, case=None, flags=0, regex=None):
         r"""
         Replace each occurrence of pattern/regex in the Series/Index.

@@ -1296,6 +1296,20 @@ def replace(self, pat, repl, n=-1, case=None, flags=0, regex=True):
         2    NaN
         dtype: object
         """
+        if regex is None:
+            if isinstance(pat, str) and any(c in pat for c in ".+*|^$?[](){}\\"):
+                # warn only in cases where regex behavior would differ from literal
+                msg = (
+                    "The default value of regex will change from True to False "
+                    "in a future version."
+                )
+                if len(pat) == 1:
+                    msg += (
+                        " In addition, single character regular expressions will"
+                        "*not* be treated as literal strings when regex=True."
+                    )
+                warnings.warn(msg, FutureWarning, stacklevel=3)
+            regex = True
         result = self._array._str_replace(
             pat, repl, n=n, case=case, flags=flags, regex=regex
         )
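
The branch added above can be exercised in isolation. The following is a sketch only: `resolve_regex_default` and `REGEX_METACHARS` are hypothetical names, not part of pandas, and the helper copies the condition from the diff into a standalone function so the warning behavior can be observed without the accessor. Two deliberate differences from the diff are marked in comments: a space is added before `*not*` (the concatenated message in the commit reads `will*not*`), and `stacklevel` is 2 because there is no accessor wrapper in between.

```python
import warnings

# Same metacharacter set as the check in the diff.
REGEX_METACHARS = ".+*|^$?[](){}\\"


def resolve_regex_default(pat, regex=None):
    """Return the effective ``regex`` flag, warning when the default matters.

    Hypothetical standalone helper mirroring the branch added to ``replace``.
    """
    if regex is None:
        if isinstance(pat, str) and any(c in pat for c in REGEX_METACHARS):
            msg = (
                "The default value of regex will change from True to False "
                "in a future version."
            )
            if len(pat) == 1:
                # Space added before "*not*"; the diff's message omits it.
                msg += (
                    " In addition, single character regular expressions will "
                    "*not* be treated as literal strings when regex=True."
                )
            # stacklevel=2 (not 3 as in the diff): no wrapper frame here.
            warnings.warn(msg, FutureWarning, stacklevel=2)
        regex = True
    return regex


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    plain = resolve_regex_default("abc")        # no metacharacters: silent
    multi = resolve_regex_default("^.a|dog")    # warns
    single = resolve_regex_default(".")         # warns with the extra sentence
    explicit = resolve_regex_default(".", regex=False)  # explicit value: silent

print(len(caught))  # 2
```

Passing `regex` explicitly, as every updated call site in this commit does, skips the warning path entirely.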

pandas/tests/series/methods/test_replace.py

+11
@@ -449,3 +449,14 @@ def test_replace_with_compiled_regex(self):
         result = s.replace({regex: "z"}, regex=True)
         expected = pd.Series(["z", "b", "c"])
         tm.assert_series_equal(result, expected)
+
+    @pytest.mark.parametrize("pattern", ["^.$", "."])
+    def test_str_replace_regex_default_raises_warning(self, pattern):
+        # https://github.com/pandas-dev/pandas/pull/24809
+        s = pd.Series(["a", "b", "c"])
+        msg = r"The default value of regex will change from True to False"
+        if len(pattern) == 1:
+            msg += r".*single character regular expressions.*not.*literal strings"
+        with tm.assert_produces_warning(FutureWarning, check_stacklevel=False) as w:
+            s.str.replace(pattern, "")
+            assert re.match(msg, str(w[0].message))

pandas/tests/test_strings.py

+14 -12
@@ -984,11 +984,11 @@ def test_casemethods(self):
     def test_replace(self):
         values = Series(["fooBAD__barBAD", np.nan])

-        result = values.str.replace("BAD[_]*", "")
+        result = values.str.replace("BAD[_]*", "", regex=True)
         exp = Series(["foobar", np.nan])
         tm.assert_series_equal(result, exp)

-        result = values.str.replace("BAD[_]*", "", n=1)
+        result = values.str.replace("BAD[_]*", "", n=1, regex=True)
         exp = Series(["foobarBAD", np.nan])
         tm.assert_series_equal(result, exp)

@@ -997,15 +997,17 @@ def test_replace(self):
             ["aBAD", np.nan, "bBAD", True, datetime.today(), "fooBAD", None, 1, 2.0]
         )

-        rs = Series(mixed).str.replace("BAD[_]*", "")
+        rs = Series(mixed).str.replace("BAD[_]*", "", regex=True)
         xp = Series(["a", np.nan, "b", np.nan, np.nan, "foo", np.nan, np.nan, np.nan])
         assert isinstance(rs, Series)
         tm.assert_almost_equal(rs, xp)

         # flags + unicode
         values = Series([b"abcd,\xc3\xa0".decode("utf-8")])
         exp = Series([b"abcd, \xc3\xa0".decode("utf-8")])
-        result = values.str.replace(r"(?<=\w),(?=\w)", ", ", flags=re.UNICODE)
+        result = values.str.replace(
+            r"(?<=\w),(?=\w)", ", ", flags=re.UNICODE, regex=True
+        )
         tm.assert_series_equal(result, exp)

         # GH 13438
@@ -1023,7 +1025,7 @@ def test_replace_callable(self):

         # test with callable
         repl = lambda m: m.group(0).swapcase()
-        result = values.str.replace("[a-z][A-Z]{2}", repl, n=2)
+        result = values.str.replace("[a-z][A-Z]{2}", repl, n=2, regex=True)
         exp = Series(["foObaD__baRbaD", np.nan])
         tm.assert_series_equal(result, exp)

@@ -1049,7 +1051,7 @@ def test_replace_callable(self):
         values = Series(["Foo Bar Baz", np.nan])
         pat = r"(?P<first>\w+) (?P<middle>\w+) (?P<last>\w+)"
         repl = lambda m: m.group("middle").swapcase()
-        result = values.str.replace(pat, repl)
+        result = values.str.replace(pat, repl, regex=True)
         exp = Series(["bAR", np.nan])
         tm.assert_series_equal(result, exp)

@@ -1059,11 +1061,11 @@ def test_replace_compiled_regex(self):

         # test with compiled regex
         pat = re.compile(r"BAD[_]*")
-        result = values.str.replace(pat, "")
+        result = values.str.replace(pat, "", regex=True)
         exp = Series(["foobar", np.nan])
         tm.assert_series_equal(result, exp)

-        result = values.str.replace(pat, "", n=1)
+        result = values.str.replace(pat, "", n=1, regex=True)
         exp = Series(["foobarBAD", np.nan])
         tm.assert_series_equal(result, exp)

@@ -1072,7 +1074,7 @@ def test_replace_compiled_regex(self):
             ["aBAD", np.nan, "bBAD", True, datetime.today(), "fooBAD", None, 1, 2.0]
         )

-        rs = Series(mixed).str.replace(pat, "")
+        rs = Series(mixed).str.replace(pat, "", regex=True)
         xp = Series(["a", np.nan, "b", np.nan, np.nan, "foo", np.nan, np.nan, np.nan])
         assert isinstance(rs, Series)
         tm.assert_almost_equal(rs, xp)
@@ -1110,7 +1112,7 @@ def test_replace_literal(self):
         # GH16808 literal replace (regex=False vs regex=True)
         values = Series(["f.o", "foo", np.nan])
         exp = Series(["bao", "bao", np.nan])
-        result = values.str.replace("f.", "ba")
+        result = values.str.replace("f.", "ba", regex=True)
         tm.assert_series_equal(result, exp)

         exp = Series(["bao", "foo", np.nan])
@@ -3044,7 +3046,7 @@ def test_pipe_failures(self):

         tm.assert_series_equal(result, exp)

-        result = s.str.replace("|", " ")
+        result = s.str.replace("|", " ", regex=False)
         exp = Series(["A B C"])

         tm.assert_series_equal(result, exp)
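
The switch to `regex=False` in `test_pipe_failures` reflects that `|` is a regex metacharacter. The standard-library sketch below uses `re.sub` and `str.replace` as stand-ins for the two modes (an assumption for illustration, not a statement about pandas internals) to show how differently a bare `|` behaves once it is actually interpreted as a pattern.

```python
import re

text = "A|B|C"

# Literal replacement swaps each pipe for a space.
literal = text.replace("|", " ")

# As a regex, "|" is an alternation of two empty patterns, so it matches the
# empty string at every position and never consumes the pipe characters.
as_regex = re.sub("|", " ", text)

# Escaping restores the literal meaning under regex semantics.
escaped = re.sub(re.escape("|"), " ", text)

print(literal)         # A B C
print(repr(as_regex))  # ' A | B | C '
print(escaped)         # A B C
```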
@@ -3345,7 +3347,7 @@ def test_replace_moar(self):
         )
         tm.assert_series_equal(result, expected)

-        result = s.str.replace("^.a|dog", "XX-XX ", case=False)
+        result = s.str.replace("^.a|dog", "XX-XX ", case=False, regex=True)
         expected = Series(
             [
                 "A",
0 commit comments

Comments
 (0)