TST: Test for Dataframe.replace when column contains pd.NA (#47480) #49783

phofl · 2022-11-19T15:35:25Z

We will need two tests if you want to construct the df like this. The DataFrame in the issue had dtype object.

If you set pd.NA into a column with dtype="int64" it gets cast to object.

Edit: You could parametrize with dtype in [object, "Int64"]. I think it is worth having both tests.

vsbits · Nov 19, 2022

Question: Is `df = pd.DataFrame({"A": [pd.NA, 1, 2]}, dtype=dtype)` a valid way of constructing a DataFrame, or should it be avoided? With dtype Int64 or Float64 it works fine, but with dtype None or explicitly object, the replace method raises an error:

    dtype = <class 'object'>

        @pytest.mark.parametrize("dtype", [object, "Int64"])
        def test_replace_in_col_containing_na(self, dtype):
            # GH#47480
            df = DataFrame({"A": [pd.NA, 1, 2]}, dtype=dtype)
    >       df["A"].replace(to_replace=1, value=100, inplace=True)

    tests/frame/methods/test_replace.py:1511:
    _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
    core/series.py:5104: in replace
        return super().replace(
    core/generic.py:7211: in replace
        new_data = self._mgr.replace(
    core/internals/managers.py:464: in replace
        return self.apply(
    core/internals/managers.py:350: in apply
        applied = getattr(b, f)(**kwargs)
    core/internals/blocks.py:561: in replace
        mask = missing.mask_missing(values, to_replace)
    _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

    arr = array([<NA>, 1, 2], dtype=object), values_to_mask = array(1)

        def mask_missing(arr: ArrayLike, values_to_mask) -> npt.NDArray[np.bool_]:
            """
            Return a masking array of same size/shape as arr
            with entries equaling any member of values_to_mask set to True

            Parameters
            ----------
            arr : ArrayLike
            values_to_mask: list, tuple, or scalar

            Returns
            -------
            np.ndarray[bool]
            """
            # When called from Block.replace/replace_list, values_to_mask is a scalar
            # known to be holdable by arr.
            # When called from Series._single_replace, values_to_mask is tuple or list
            dtype, values_to_mask = infer_dtype_from(values_to_mask)
            # error: Argument "dtype" to "array" has incompatible type "Union[dtype[Any],
            # ExtensionDtype]"; expected "Union[dtype[Any], None, type, _SupportsDType, str,
            # Union[Tuple[Any, int], Tuple[Any, Union[int, Sequence[int]]], List[Any],
            # _DTypeDict, Tuple[Any, Any]]]"
            values_to_mask = np.array(values_to_mask, dtype=dtype)  # type: ignore[arg-type]

            na_mask = isna(values_to_mask)
            nonna = values_to_mask[~na_mask]

            # GH 21977
            mask = np.zeros(arr.shape, dtype=bool)
            for x in nonna:
                if is_numeric_v_string_like(arr, x):
                    # GH#29553 prevent numpy deprecation warnings
                    pass
                else:
                    new_mask = arr == x
                    if not isinstance(new_mask, np.ndarray):
                        # usually BooleanArray
    >                   new_mask = new_mask.to_numpy(dtype=bool, na_value=False)
    E                   AttributeError: 'bool' object has no attribute 'to_numpy'
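The traceback shows `arr == x` producing a plain `bool` for the object array, so the `to_numpy` branch meant for a `BooleanArray` fails. A plausible reading (our assumption, not confirmed in this thread) is that `pd.NA` cannot yield a per-element truth value, while masked dtypes return a `BooleanArray` that `mask_missing` knows how to convert. The sketch below only illustrates those `pd.NA` and `BooleanArray` semantics; it does not reproduce the object-dtype failure itself, which depends on the NumPy version.

```python
import pandas as pd

# pd.NA propagates through comparisons: the result is NA, not True/False.
na_cmp = pd.NA == 1
print(na_cmp is pd.NA)  # True

# pd.NA also refuses to be coerced to a bool, which an element-wise
# comparison that builds a NumPy bool array would need.
try:
    bool(pd.NA)
except TypeError as exc:
    print("TypeError:", exc)

# With a masked dtype, == returns a BooleanArray instead; converting it with
# na_value=False (as in the branch of mask_missing above) gives a NumPy mask.
masked = pd.array([pd.NA, 1, 2], dtype="Int64") == 1
mask = masked.to_numpy(dtype=bool, na_value=False)
print(mask.tolist())  # [False, True, False]
```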

vsbits · Nov 19, 2022

> If you set pd.NA into a column with dtype="int64" it gets cast to object.

Trying it here, it looks like it is cast to float64 instead. Is that the expected behavior?

    >>> df = pd.DataFrame({'A': [0, 1, 2]})
    >>> df['A'].dtypes
    dtype('int64')
    >>> df.at[0, 'A'] = pd.NA
    >>> df['A'].dtypes
    dtype('float64')
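For comparison, a small sketch of the two construction paths discussed here: without an explicit dtype, a list mixing `pd.NA` and integers is inferred as object (the dtype of the issue's DataFrame), while a nullable `Int64` column keeps its dtype when `pd.NA` is set into it. The variable names are illustrative only.

```python
import pandas as pd

# No explicit dtype: pd.NA mixed with ints is inferred as object.
inferred = pd.DataFrame({"A": [pd.NA, 1, 2]})
print(inferred["A"].dtype)  # object

# Nullable integer column: setting pd.NA keeps the Int64 dtype and
# records the value as missing instead of upcasting.
nullable = pd.DataFrame({"A": [0, 1, 2]}, dtype="Int64")
nullable.at[0, "A"] = pd.NA
print(nullable["A"].dtype)  # Int64
print(nullable["A"].isna().tolist())  # [True, False, False]
```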

The test passes when parametrizing with dtype in ["Float64", "Int64"].
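A minimal stand-alone sketch of the parametrized test over the two masked dtypes reported as passing. It is a hypothetical variant, not the PR's code: it loops over the dtypes instead of using `pytest.mark.parametrize`, uses the public `pd.testing.assert_frame_equal` instead of `tm`, and assigns the result back rather than calling `replace(..., inplace=True)` on `df["A"]`, so it does not depend on `df["A"]` being a view of `df`.

```python
import pandas as pd


def check_replace_in_col_containing_na(dtype: str) -> None:
    """Hypothetical helper mirroring test_replace_in_col_containing_na."""
    # GH#47480
    df = pd.DataFrame({"A": [pd.NA, 1, 2]}, dtype=dtype)
    # Assign the replaced column back instead of using inplace=True
    df["A"] = df["A"].replace(to_replace=1, value=100)
    expected = pd.DataFrame({"A": [pd.NA, 100, 2]}, dtype=dtype)
    pd.testing.assert_frame_equal(df, expected)


# The masked dtypes reported as passing in this thread
for dtype in ["Int64", "Float64"]:
    check_replace_in_col_containing_na(dtype)
```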

phofl · Nov 19, 2022

Interesting, I would have expected this to be cast to object. I will have a closer look tomorrow.

@@ -1503,6 +1503,13 @@ def test_replace_value_none_dtype_numeric(self, val):
             result = df.replace({val: None})
             tm.assert_frame_equal(result, expected)
+        def test_replace_in_col_containing_na(self):
+            # GH#47480
+            df = DataFrame({"A": [pd.NA, 1, 2]}, dtype="Int64")
+            df["A"].replace(to_replace=1, value=100, inplace=True)
+            expected = DataFrame({"A": [pd.NA, 100, 2]}, dtype="Int64")
+            tm.assert_frame_equal(df, expected)
     class TestDataFrameReplaceRegex:
         @pytest.mark.parametrize(