Skip to content

REGR: Fix regression RecursionError when replacing numeric scalar with None #48234

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 12 commits into from
Oct 31, 2022
1 change: 1 addition & 0 deletions doc/source/whatsnew/v1.5.1.rst
Original file line number Diff line number Diff line change
Expand Up @@ -73,6 +73,7 @@ Fixed regressions
- Fixed Regression in :meth:`DataFrame.loc` when setting values as a :class:`DataFrame` with all ``True`` indexer (:issue:`48701`)
- Regression in :func:`.read_csv` causing an ``EmptyDataError`` when using an UTF-8 file handle that was already read from (:issue:`48646`)
- Regression in :func:`to_datetime` when ``utc=True`` and ``arg`` contained timezone naive and aware arguments raised a ``ValueError`` (:issue:`48678`)
- Fixed regression in :meth:`Series.replace` raising ``RecursionError`` with numeric dtype and when specifying ``value=None`` (:issue:`45725`)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Guess we should move to 1.5.2 now

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

- Fixed regression in :meth:`DataFrame.loc` raising ``FutureWarning`` when setting an empty :class:`DataFrame` (:issue:`48480`)
- Fixed regression in :meth:`DataFrame.describe` raising ``TypeError`` when result contains ``NA`` (:issue:`48778`)
- Fixed regression in :meth:`DataFrame.plot` ignoring invalid ``colormap`` for ``kind="scatter"`` (:issue:`48726`)
Expand Down
1 change: 1 addition & 0 deletions pandas/core/internals/blocks.py
Original file line number Diff line number Diff line change
Expand Up @@ -573,6 +573,7 @@ def replace(
# Note: the checks we do in NDFrame.replace ensure we never get
# here with listlike to_replace or value, as those cases
# go through replace_list
value = self._standardize_fill_value(value)

values = self.values

Expand Down
8 changes: 8 additions & 0 deletions pandas/tests/frame/methods/test_replace.py
Original file line number Diff line number Diff line change
Expand Up @@ -1496,6 +1496,14 @@ def test_replace_list_with_mixed_type(
result = obj.replace(box(to_replace), value)
tm.assert_equal(result, expected)

@pytest.mark.parametrize("val", [2, np.nan, 2.0])
def test_replace_value_none_dtype_numeric(self, val):
# GH#48231
df = DataFrame({"a": [1, val]})
result = df.replace(val, None)
expected = DataFrame({"a": [1, np.nan]})
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So just confirming this result shouldn't be this?

In [1]: pd.Series([1.0, np.nan]).replace(np.nan, value=None)
Out[1]:
0     1.0
1    None
dtype: object

In [2]: pd.__version__
Out[2]: '1.4.4'

Copy link
Member Author

@phofl phofl Oct 7, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if this is really what we want...

Wanted to check if ci passes like this and go from there

Especially since this does not handle the dict case

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gotcha. IMO if we go with the current result here, I think we should document that the original df/ser needs to be object type for None to appear.

Anecdotally, I've definitely seen replace(np.nan, None) or replace(np.nan, "") and people being surprised that np.nan remains.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cc @mroeschke This would restore the 1.4.x behavior for now

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIRC we had special logic for the case where the user specifically wants to replace one NA-like value with another (in this case np.nan->None). The non-special logic would do something like:

def replace(self, old, new):
    if is_valid_na_for_dtype(new, self.dtype):
        new = self. _standardize_fill_value(new)

which for float64 dtype would take the None back to np.nan, consistent with what we do with every other setitem-like method. But in this case it is what the user specifically doesn't want.

Not sure if this is helpful, but there it is.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah this would make more sense imo too, but would break previous behavior and fail a couple of tests too

tm.assert_frame_equal(result, expected)


class TestDataFrameReplaceRegex:
@pytest.mark.parametrize(
Expand Down
8 changes: 8 additions & 0 deletions pandas/tests/series/methods/test_replace.py
Original file line number Diff line number Diff line change
Expand Up @@ -667,3 +667,11 @@ def test_replace_different_int_types(self, any_int_numpy_dtype):
result = labs.replace(map_dict)
expected = labs.replace({0: 0, 2: 1, 1: 2})
tm.assert_series_equal(result, expected)

@pytest.mark.parametrize("val", [2, np.nan, 2.0])
def test_replace_value_none_dtype_numeric(self, val):
# GH#48231
ser = pd.Series([1, val])
result = ser.replace(val, None)
expected = pd.Series([1, np.nan])
tm.assert_series_equal(result, expected)