Skip to content

BUG: pd.Series.replace does not preserve the original dtype #33622

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
11 changes: 9 additions & 2 deletions pandas/core/internals/blocks.py
Original file line number Diff line number Diff line change
Expand Up @@ -1005,8 +1005,15 @@ def coerce_to_target_dtype(self, other):
we can also safely try to coerce to the same dtype
and will receive the same block
"""
# if we cannot then coerce to object
dtype, _ = infer_dtype_from(other, pandas_dtype=True)
if isinstance(other, str):
dtype = "string"
if is_dtype_equal(self.dtype, dtype):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please don't hard-code things like this, instead you can try to teach infer_dtype_from to do this

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry...I removed hard-code.

return self
else:
dtype = np.object_
else:
# if we cannot then coerce to object
dtype, _ = infer_dtype_from(other, pandas_dtype=True)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The suggestion from @jreback here was to make infer_dtype_from return StringDtype when appropriate, so that L1090-L1013 here wouldnt be necessary

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried to move into infer_dtype_from(), but a lot of tests fail. So, I didn't move.


if is_dtype_equal(self.dtype, dtype):
return self
Expand Down
19 changes: 18 additions & 1 deletion pandas/tests/series/methods/test_replace.py
Original file line number Diff line number Diff line change
Expand Up @@ -254,7 +254,7 @@ def test_replace2(self):
def test_replace_with_dictlike_and_string_dtype(self):
# GH 32621
s = pd.Series(["one", "two", np.nan], dtype="string")
expected = pd.Series(["1", "2", np.nan])
expected = pd.Series(["1", "2", np.nan], dtype="string")
result = s.replace({"one": "1", "two": "2"})
tm.assert_series_equal(expected, result)

Expand Down Expand Up @@ -406,3 +406,20 @@ def test_replace_only_one_dictlike_arg(self):
msg = "Series.replace cannot use dict-value and non-None to_replace"
with pytest.raises(ValueError, match=msg):
ser.replace(to_replace, value)

@pytest.mark.parametrize(
"series, to_replace, expected",
[
(
pd.Series(["one", "two"], dtype="string"),
{"one": "1", "two": "2"},
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add a bunch more tests which can hit the other EA types, e.g. Intervval, Period, Datetime w/tz

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added test cases.

"string",
),
(pd.Series([1, 2], dtype="int64"), {1: 10, 2: 20}, "int64"),
(pd.Series([True, False], dtype="bool"), {True: False}, "bool"),
],
)
def test_replace_dtype(self, series, to_replace, expected):
# GH 33484
result = str(series.replace(to_replace).dtype)
assert expected == result