TST: Test for Dataframe.replace when column contains pd.NA (#47480) #49783

vsbits · 2022-11-19T00:14:26Z

closes BUG: DataFrame.replace fails to replace value when column contains pd.NA #47480
Tests added and passed if fixing a bug or adding a new feature
All code checks passed.

phofl · 2022-11-19T10:42:53Z

pandas/tests/frame/methods/test_replace.py

+        df.at[0, "A"] = pd.NA
+        expected = df.copy()
+        df["A"].replace(to_replace=1, value=100, inplace=True)
+        expected.at[1, "A"] = 100


Can you please construct expected explicitly?

No problem. Just pushed the edit.
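For illustration, a minimal sketch of what an explicitly constructed expected could look like. It is a standalone variant rather than the code in this PR: it assumes an Int64 column, uses the public pd.testing.assert_frame_equal, and calls the non-inplace DataFrame.replace instead of the df["A"].replace(..., inplace=True) from the diff.

```python
import pandas as pd


def test_replace_in_col_containing_na():
    # GH#47480; standalone variant of the test under review (no test class).
    df = pd.DataFrame({"A": [pd.NA, 1, 2]}, dtype="Int64")

    # Assumption: the non-inplace DataFrame.replace is used here instead of
    # the inplace Series.replace call shown in the diff.
    result = df.replace(to_replace=1, value=100)

    # expected is written out literally rather than derived from df.copy(),
    # so the assertion does not depend on any mutation of df.
    expected = pd.DataFrame({"A": [pd.NA, 100, 2]}, dtype="Int64")
    pd.testing.assert_frame_equal(result, expected)


test_replace_in_col_containing_na()
```

Writing expected as a literal keeps the test readable on its own: a reader sees the input and the expected output side by side.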

phofl · 2022-11-19T15:35:25Z

pandas/tests/frame/methods/test_replace.py

@@ -1503,6 +1503,13 @@ def test_replace_value_none_dtype_numeric(self, val):
        result = df.replace({val: None})
        tm.assert_frame_equal(result, expected)

+    def test_replace_in_col_containing_na(self):
+        # GH#47480
+        df = DataFrame({"A": [pd.NA, 1, 2]}, dtype="Int64")


We will need two tests if you want to construct df like this. The DataFrame in the issue had dtype object.

If you set pd.NA into a column with dtype="int64" it gets cast to object.

Edit: You could parametrize with dtype in [object, "Int64"]. I think it is worth having both tests

Question: Is df = pd.DataFrame({"A": [pd.NA, 1, 2]}, dtype=dtype) a valid way of constructing a DataFrame, or should it be avoided? It works fine when dtype is Int64 or Float64, but when dtype is None or explicitly object, the replace method raises an error:

    dtype = <class 'object'>

        @pytest.mark.parametrize("dtype", [object, "Int64"])
        def test_replace_in_col_containing_na(self, dtype):
            # GH#47480
            df = DataFrame({"A": [pd.NA, 1, 2]}, dtype=dtype)
    >       df["A"].replace(to_replace=1, value=100, inplace=True)

    tests/frame/methods/test_replace.py:1511:
    _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
    core/series.py:5104: in replace
        return super().replace(
    core/generic.py:7211: in replace
        new_data = self._mgr.replace(
    core/internals/managers.py:464: in replace
        return self.apply(
    core/internals/managers.py:350: in apply
        applied = getattr(b, f)(**kwargs)
    core/internals/blocks.py:561: in replace
        mask = missing.mask_missing(values, to_replace)
    _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

    arr = array([<NA>, 1, 2], dtype=object), values_to_mask = array(1)

        def mask_missing(arr: ArrayLike, values_to_mask) -> npt.NDArray[np.bool_]:
            """
            Return a masking array of same size/shape as arr
            with entries equaling any member of values_to_mask set to True

            Parameters
            ----------
            arr : ArrayLike
            values_to_mask: list, tuple, or scalar

            Returns
            -------
            np.ndarray[bool]
            """
            # When called from Block.replace/replace_list, values_to_mask is a scalar
            # known to be holdable by arr.
            # When called from Series._single_replace, values_to_mask is tuple or list
            dtype, values_to_mask = infer_dtype_from(values_to_mask)
            # error: Argument "dtype" to "array" has incompatible type "Union[dtype[Any],
            # ExtensionDtype]"; expected "Union[dtype[Any], None, type, _SupportsDType, str,
            # Union[Tuple[Any, int], Tuple[Any, Union[int, Sequence[int]]], List[Any],
            # _DTypeDict, Tuple[Any, Any]]]"
            values_to_mask = np.array(values_to_mask, dtype=dtype)  # type: ignore[arg-type]

            na_mask = isna(values_to_mask)
            nonna = values_to_mask[~na_mask]

            # GH 21977
            mask = np.zeros(arr.shape, dtype=bool)
            for x in nonna:
                if is_numeric_v_string_like(arr, x):
                    # GH#29553 prevent numpy deprecation warnings
                    pass
                else:
                    new_mask = arr == x
                    if not isinstance(new_mask, np.ndarray):
                        # usually BooleanArray
    >                   new_mask = new_mask.to_numpy(dtype=bool, na_value=False)
    E                   AttributeError: 'bool' object has no attribute 'to_numpy'
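The traceback shows that arr == x produced a plain bool instead of an array for the object-dtype column. One plausible explanation, not confirmed in this thread, is that the elementwise comparison runs into pd.NA, whose truth value is ambiguous. The sketch below illustrates that property and one NA-safe way to build the mask. The helper name na_safe_equal_mask is hypothetical, and this is not presented as the fix adopted in pandas.

```python
import numpy as np
import pandas as pd

# Comparing pd.NA with a value yields pd.NA, and pd.NA has no truth value.
comparison = pd.NA == 1  # this is pd.NA, not False
try:
    bool(pd.NA)
    na_truthiness_error = None
except TypeError as err:
    na_truthiness_error = err


def na_safe_equal_mask(arr, value):
    """Hypothetical helper: boolean mask of entries equal to value.

    Missing entries are excluded up front, so the == comparison never
    touches pd.NA.
    """
    not_missing = ~pd.isna(arr)
    mask = np.zeros(arr.shape, dtype=bool)
    mask[not_missing] = arr[not_missing] == value
    return mask


arr = np.array([pd.NA, 1, 2], dtype=object)
mask = na_safe_equal_mask(arr, 1)
```

Here mask marks only the second entry, and the missing entry is treated as not equal to the target value.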

> We will need two tests if you want to construct df like this. The DataFrame in the issue had dtype object.
>
> If you set pd.NA into a column with dtype="int64" it gets cast to object.

When I try it here, it looks like the column is cast to float64. Is that the expected behavior?

    >>> df = pd.DataFrame({'A': [0, 1, 2]})
    >>> df['A'].dtypes
    dtype('int64')
    >>> df.at[0, 'A'] = pd.NA
    >>> df['A'].dtypes
    dtype('float64')

The test passes when parametrized with dtype in ["Float64", "Int64"].
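As a rough sketch of the two passing cases, the loop below runs the same check for both nullable dtypes. It assumes a plain loop in place of pytest.mark.parametrize so that it runs without pytest, and it uses the non-inplace DataFrame.replace. The list name checked is introduced only for illustration.

```python
import pandas as pd

checked = []
for dtype in ["Int64", "Float64"]:
    # Same construction as in the test under review, once per nullable dtype.
    df = pd.DataFrame({"A": [pd.NA, 1, 2]}, dtype=dtype)
    result = df.replace(to_replace=1, value=100)

    # The missing entry stays missing and the dtype is preserved.
    expected = pd.DataFrame({"A": [pd.NA, 100, 2]}, dtype=dtype)
    pd.testing.assert_frame_equal(result, expected)
    checked.append(dtype)
```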

Interesting, I would have expected this to be cast to object. I will have a closer look tomorrow.
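Since the result of setting pd.NA into a plain int64 column is the open question here and may differ between pandas versions, a test can sidestep it by converting to a nullable dtype first. A minimal sketch, assuming the goal is only to obtain an integer column that contains pd.NA:

```python
import pandas as pd

# Start from a default int64 column, then opt in to the nullable Int64 dtype.
df = pd.DataFrame({"A": [0, 1, 2]}).astype({"A": "Int64"})

# Int64 can hold pd.NA directly, so no implicit cast is needed.
df.at[0, "A"] = pd.NA
resulting_dtype = str(df["A"].dtype)
```

With this setup the column dtype is stated by the test itself rather than inferred from casting rules.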

TST: Test for Dataframe.replace when column contains pd.NA (#47480)

0074cf2

phofl reviewed Nov 19, 2022

TST: Test for Dataframe.replace when column contains pd.NA (#47480)

1f6ea85

phofl reviewed Nov 19, 2022

vsbits closed this Nov 19, 2022

vsbits deleted the add-test branch November 20, 2022 19:27
