Commit 0f7ded2

BUG: Replace on Series/DataFrame stops replacing after first NA (pandas-dev#57865)
* update test for GH#56599
* bug: ser/df.replace only replaces first occurrence with NAs
* [pre-commit.ci] auto fixes from pre-commit.com hooks

  for more information, see https://pre-commit.ci
* add whatsnew
* fmt fix

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
1 parent 114a84d commit 0f7ded2

3 files changed, +17 -12 lines changed
doc/source/whatsnew/v3.0.0.rst

+1
@@ -298,6 +298,7 @@ Bug fixes
 - Fixed bug in :meth:`DataFrameGroupBy.apply` that was returning a completely empty DataFrame when all return values of ``func`` were ``None`` instead of returning an empty DataFrame with the original columns and dtypes. (:issue:`57775`)
 - Fixed bug in :meth:`Series.diff` allowing non-integer values for the ``periods`` argument. (:issue:`56607`)
 - Fixed bug in :meth:`Series.rank` that doesn't preserve missing values for nullable integers when ``na_option='keep'``. (:issue:`56976`)
+- Fixed bug in :meth:`Series.replace` and :meth:`DataFrame.replace` inconsistently replacing matching instances when ``regex=True`` and missing values are present. (:issue:`56599`)
 
 Categorical
 ^^^^^^^^^^^
pandas/core/array_algos/replace.py

+11 -10
@@ -93,17 +93,18 @@ def _check_comparison_types(
     )
 
     # GH#32621 use mask to avoid comparing to NAs
-    if isinstance(a, np.ndarray):
+    if isinstance(a, np.ndarray) and mask is not None:
         a = a[mask]
-
-    result = op(a)
-
-    if isinstance(result, np.ndarray) and mask is not None:
-        # The shape of the mask can differ to that of the result
-        # since we may compare only a subset of a's or b's elements
-        tmp = np.zeros(mask.shape, dtype=np.bool_)
-        np.place(tmp, mask, result)
-        result = tmp
+        result = op(a)
+
+        if isinstance(result, np.ndarray):
+            # The shape of the mask can differ to that of the result
+            # since we may compare only a subset of a's or b's elements
+            tmp = np.zeros(mask.shape, dtype=np.bool_)
+            np.place(tmp, mask, result)
+            result = tmp
+    else:
+        result = op(a)
 
     _check_comparison_types(result, a, b)
     return result
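
Note on the hunk above: one reading of the change is that the scatter step (`np.place`) is only safe when `result` was computed from the masked subset `a[mask]`. `np.place` fills the True positions of the mask with the first N values it is given, so scattering a full-length result shifts every match that follows a missing value. The sketch below is a pure-Python stand-in for that behaviour, not pandas code; `scatter` and `match` are hypothetical helper names introduced only for illustration.

```python
import re


def scatter(result, mask):
    """Hypothetical stand-in for np.place on a zero-filled boolean array:
    True positions of mask receive the leading values of result, in order;
    all other positions stay False."""
    it = iter(result)
    return [next(it) if keep else False for keep in mask]


def match(values, pattern):
    """Element-wise regex search; non-string entries never match."""
    return [isinstance(v, str) and re.search(pattern, v) is not None for v in values]


values = ["AA", None, "AA"]
mask = [v is not None for v in values]  # True where the value is not missing

# Compare only the non-missing subset, then scatter back to full length.
subset = [v for v, keep in zip(values, mask) if keep]
fixed = scatter(match(subset, "AA"), mask)

# Scatter a full-length result instead: the entry after the gap is misaligned.
misaligned = scatter(match(values, "AA"), mask)

print(fixed)       # [True, False, True]
print(misaligned)  # [True, False, False]
```

In the misaligned case the second "AA" picks up the result computed for the missing entry, which is consistent with the symptom in the commit title (replacement stopping after the first NA).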

pandas/tests/series/methods/test_replace.py

+5 -2
@@ -616,15 +616,18 @@ def test_replace_with_compiled_regex(self):
 
     def test_pandas_replace_na(self):
         # GH#43344
-        ser = pd.Series(["AA", "BB", "CC", "DD", "EE", "", pd.NA], dtype="string")
+        # GH#56599
+        ser = pd.Series(["AA", "BB", "CC", "DD", "EE", "", pd.NA, "AA"], dtype="string")
         regex_mapping = {
             "AA": "CC",
             "BB": "CC",
             "EE": "CC",
             "CC": "CC-REPL",
         }
         result = ser.replace(regex_mapping, regex=True)
-        exp = pd.Series(["CC", "CC", "CC-REPL", "DD", "CC", "", pd.NA], dtype="string")
+        exp = pd.Series(
+            ["CC", "CC", "CC-REPL", "DD", "CC", "", pd.NA, "CC"], dtype="string"
+        )
         tm.assert_series_equal(result, exp)
 
     @pytest.mark.parametrize(
