REGR: DataFrame.replace when the replacement value was explicitly None #46404

Merged
merged 3 commits into from
Mar 19, 2022
1 change: 1 addition & 0 deletions doc/source/whatsnew/v1.4.2.rst
@@ -19,6 +19,7 @@ Fixed regressions
- Fixed memory performance regression in :meth:`Series.fillna` when called on a :class:`DataFrame` column with ``inplace=True`` (:issue:`46149`)
- Provided an alternative solution for passing custom Excel formats in :meth:`.Styler.to_excel`, which was a regression based on stricter CSS validation. Examples available in the documentation for :meth:`.Styler.format` (:issue:`46152`)
- Fixed regression in :meth:`DataFrame.replace` when a replacement value was also a target for replacement (:issue:`46306`)
- Fixed regression in :meth:`DataFrame.replace` when the replacement value was explicitly ``None`` when passed in a dictionary to ``to_replace`` (:issue:`45601`, :issue:`45836`)
- Fixed regression when setting values with :meth:`DataFrame.loc` losing :class:`MultiIndex` names if :class:`DataFrame` was empty before (:issue:`46317`)
-

7 changes: 7 additions & 0 deletions pandas/core/internals/blocks.py
@@ -777,6 +777,13 @@ def _replace_coerce(
                mask=mask,
            )
        else:
            if value is None:
Member

why here instead of in 'replace'?

Member Author

Because the recursion error occurs on main, and we normally fix backports by opening a PR against main and then backporting, rather than against the backport branch directly.

Also, we would split the blocks in Block.replace, which we didn't do on 1.3.5; the regression fix restores the previous behavior for now, see #45601 (comment).

I think if we do move this to replace after the recursion is fixed, we could also backport that as a bug fix if we think the block splitting is desirable for consistency on 1.4.x.

None handling is also slightly different in Block.replace than for a list-like, so I suspect it would need some other changes, which I'm happy to do as a follow-up on master.

This PR was a very targeted regression fix as a suitable backport for 1.4.x.

Member

that makes sense, thanks.

                # gh-45601, gh-45836
                nb = self.astype(np.dtype(object), copy=False)
                if nb is self and not inplace:
                    nb = nb.copy()
                putmask_inplace(nb.values, mask, value)
                return [nb]
            return self.replace(
                to_replace=to_replace, value=value, inplace=inplace, mask=mask
            )
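
For readers unfamiliar with the block internals, the new `value is None` branch can be approximated on a plain NumPy array. This is only a sketch, not the pandas implementation: `replace_with_none` is a hypothetical helper, and plain boolean-mask assignment stands in for pandas' internal `putmask_inplace`.

```python
import numpy as np


def replace_with_none(values, mask, inplace=False):
    """Hypothetical stand-in for the `value is None` branch (not pandas code)."""
    # Cast to object dtype so that None can be stored. With copy=False,
    # astype returns the very same array if it is already object dtype.
    out = values.astype(object, copy=False)
    # If no cast happened and the caller did not ask for in-place
    # modification, copy so the input is left untouched.
    if out is values and not inplace:
        out = out.copy()
    # Assumption: boolean-mask assignment plays the role of putmask_inplace.
    out[mask] = None
    return out


arr = np.array([1.0, np.nan, 3.0])
res = replace_with_none(arr, np.isnan(arr))
print(res.dtype, res)
```

The cast to object is what lets `None` survive as the replacement value instead of being coerced back to a dtype-specific missing value.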
14 changes: 14 additions & 0 deletions pandas/tests/frame/methods/test_replace.py
@@ -661,6 +661,20 @@ def test_replace_simple_nested_dict_with_nonexistent_value(self):
        result = df.replace({"col": {-1: "-", 1: "a", 4: "b"}})
        tm.assert_frame_equal(expected, result)

    def test_replace_NA_with_None(self):
Member

In all of the relevant examples, both the value being replaced and the replacement are NA. Are these the only affected cases?

Member Author

IIUC, for a list-like to_replace, None is treated explicitly at the moment, whereas with a scalar None the behavior is different in some cases. My understanding is that users are therefore using a dictionary to get the explicit replacement behavior. To make these consistent, we would need to deprecate this?

        # gh-45601
        df = DataFrame({"value": [42, None]}).astype({"value": "Int64"})
        result = df.replace({pd.NA: None})
        expected = DataFrame({"value": [42, None]}, dtype=object)
        tm.assert_frame_equal(result, expected)

    def test_replace_NAT_with_None(self):
        # gh-45836
        df = DataFrame([pd.NaT, pd.NaT])
        result = df.replace({pd.NaT: None, np.NaN: None})
        expected = DataFrame([None, None])
        tm.assert_frame_equal(result, expected)

    def test_replace_value_is_none(self, datetime_frame):
        orig_value = datetime_frame.iloc[0, 0]
        orig2 = datetime_frame.iloc[1, 0]
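
To try the gh-45601 scenario outside the test suite, the first added test can be restated with public imports only. This is a sketch: the result described in the comment is the one the test above expects on a pandas version that includes this fix, and other versions may behave differently.

```python
import pandas as pd

# Same frame as in test_replace_NA_with_None: a nullable Int64 column
# holding one missing value (pd.NA).
df = pd.DataFrame({"value": [42, None]}).astype({"value": "Int64"})

# Passing a dictionary to to_replace with an explicit None replacement.
result = df.replace({pd.NA: None})

# Per the test above, a fixed version yields an object column containing
# 42 and None; this is not guaranteed on versions without the fix.
print(result.dtypes)
print(result)
```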