REGR: DataFrame.replace when the replacement value was explicitly None #46404
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -661,6 +661,20 @@ def test_replace_simple_nested_dict_with_nonexistent_value(self): | |
result = df.replace({"col": {-1: "-", 1: "a", 4: "b"}}) | ||
tm.assert_frame_equal(expected, result) | ||
|
||
def test_replace_NA_with_None(self): | ||
Review comment: In all of the relevant examples, both the value being replaced and the replacement are NA. Are these the only affected cases? Reply: IIUC, for a list-like to_replace, None is treated explicitly at the moment, whereas with a scalar None the behavior is different in some cases. My understanding is that users are therefore using a dictionary to get the explicit replacement behavior. To make these consistent, would we need to deprecate this? |
||
# gh-45601 | ||
df = DataFrame({"value": [42, None]}).astype({"value": "Int64"}) | ||
result = df.replace({pd.NA: None}) | ||
expected = DataFrame({"value": [42, None]}, dtype=object) | ||
tm.assert_frame_equal(result, expected) | ||
|
||
def test_replace_NAT_with_None(self): | ||
# gh-45836 | ||
df = DataFrame([pd.NaT, pd.NaT]) | ||
result = df.replace({pd.NaT: None, np.NaN: None}) | ||
expected = DataFrame([None, None]) | ||
tm.assert_frame_equal(result, expected) | ||
|
||
def test_replace_value_is_none(self, datetime_frame): | ||
orig_value = datetime_frame.iloc[0, 0] | ||
orig2 = datetime_frame.iloc[1, 0] | ||
|
why here instead of in 'replace'?
Because the recursion error occurs on main, and we normally fix backports by opening a PR against main and then backporting, rather than opening one against the backport branch directly.
Also, we would split the blocks in Block.replace, which we didn't do on 1.3.5; the regression fix restores the previous behavior for now, see #45601 (comment).
I think if we do move this to replace after the recursion is fixed, we could also backport that as a bug fix if we think the block splitting is desirable, to be consistent for 1.4.x.
None handling is also slightly different in Block.replace than for a list-like, so I suspect it would need some other changes, which I'm happy to do as a follow-up on master.
This PR was a very targeted regression fix as a suitable backport for 1.4.x.
that makes sense, thanks.
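For readers following the thread, here is a minimal standalone sketch of the dict-style replacement that test_replace_NA_with_None exercises. It assumes a pandas release that includes this regression fix; the variable names are illustrative and the snippet is not part of the PR itself.

```python
import pandas as pd

# Nullable integer column; the missing entry is stored as pd.NA.
df = pd.DataFrame({"value": [42, None]}).astype({"value": "Int64"})

# Dict form: each key is a value to replace, each dict value is its replacement.
# Passing None as the replacement here is explicit, which is the case discussed above.
result = df.replace({pd.NA: None})

# Inspect the resulting values and dtype.
print(result["value"].tolist())
print(result["value"].dtype)
```

Per the test in this diff, the column is expected to come back with object dtype, since None cannot be stored in an Int64 column.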