Skip to content

BUG: RecursionError when attempting to replace np.nan values (#45725) #45749

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 11 commits into from
Feb 3, 2022
Merged
1 change: 1 addition & 0 deletions doc/source/whatsnew/v1.5.0.rst
Original file line number Diff line number Diff line change
Expand Up @@ -248,6 +248,7 @@ Conversion
- Bug in :meth:`Series.astype` and :meth:`DataFrame.astype` from floating dtype to unsigned integer dtype failing to raise in the presence of negative values (:issue:`45151`)
- Bug in :func:`array` with ``FloatingDtype`` and values containing float-castable strings incorrectly raising (:issue:`45424`)
- Bug when comparing string and datetime64ns objects causing ``OverflowError`` exception. (:issue:`45506`)
- Bug when attempting to replace ``numpy.nan`` values causing ``RecursionError``
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if i read the original issue correctly this then this was working in 1.4 so no actual change (its only a change in master), if so delete this note

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes you're right. I removed the note from the changelog.


Strings
^^^^^^^
Expand Down
1 change: 1 addition & 0 deletions pandas/core/internals/blocks.py
Original file line number Diff line number Diff line change
Expand Up @@ -669,6 +669,7 @@ def replace(
# go through replace_list

values = self.values
value = self._standardize_fill_value(value) # GH#45725

if isinstance(values, Categorical):
# TODO: avoid special-casing
Expand Down
8 changes: 8 additions & 0 deletions pandas/tests/frame/methods/test_replace.py
Original file line number Diff line number Diff line change
Expand Up @@ -661,6 +661,14 @@ def test_replace_simple_nested_dict_with_nonexistent_value(self):
result = df.replace({"col": {-1: "-", 1: "a", 4: "b"}})
tm.assert_frame_equal(expected, result)

@pytest.mark.parametrize("to_replace, value", [(np.nan, pd.NA), (np.nan, None)])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you also add np.nan as a target

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added, but had to change the tests to explicitly use object type even after replacing. Replacing np.nan with np.nan can cause an implicit type conversion to float64. There is still discussion going on about this, but for now this is the same behavior in pandas 1.4.0 and 1.3.5 so it shouldn't be changed in this bug fix.

def test_replace_numpy_nan(self, to_replace, value):
# GH#45725 ensure numpy.nan can be replaced with pandas.NA or None
df = DataFrame({"A": [to_replace]}, dtype=object)
result = df.replace({to_replace: value})
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you also test usage with df.replace(to_replace, value) or is this tested close by here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes done.

expected = DataFrame({"A": [value]}, dtype=object)
tm.assert_frame_equal(result, expected)

def test_replace_value_is_none(self, datetime_frame):
orig_value = datetime_frame.iloc[0, 0]
orig2 = datetime_frame.iloc[1, 0]
Expand Down
9 changes: 9 additions & 0 deletions pandas/tests/series/methods/test_replace.py
Original file line number Diff line number Diff line change
Expand Up @@ -36,6 +36,15 @@ def test_replace_explicit_none(self):
assert expected.iloc[-1] is None
tm.assert_series_equal(result, expected)

@pytest.mark.parametrize("to_replace, value", [(np.nan, pd.NA), (np.nan, None)])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same comments as above. consider just using the null_fixture which has all the nans

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nulls_fixture was a good solution. Implemented.

def test_replace_numpy_nan(self, to_replace, value):
# GH#45725 ensure numpy.nan can be replaced with pandas.NA or None
ser = pd.Series([to_replace], dtype=object)
result = ser.replace({to_replace: value})
expected = pd.Series([value], dtype=object)
tm.assert_series_equal(result, expected)
assert result.dtype == object

def test_replace_noop_doesnt_downcast(self):
# GH#44498
ser = pd.Series([None, None, pd.Timestamp("2021-12-16 17:31")], dtype=object)
Expand Down