Skip to content

REGR: fillna on f32 column raising for f64 #43455

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 6 commits into from
Sep 9, 2021
Merged
Show file tree
Hide file tree
Changes from 2 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
3 changes: 3 additions & 0 deletions doc/source/whatsnew/v1.3.3.rst
Original file line number Diff line number Diff line change
Expand Up @@ -25,6 +25,8 @@ Fixed regressions
- Fixed regression in :func:`is_list_like` where objects with ``__iter__`` set to ``None`` would be identified as iterable (:issue:`43373`)
- Fixed regression in :meth:`.Resampler.aggregate` when used after column selection would raise if ``func`` is a list of aggregation functions (:issue:`42905`)
- Fixed regression in :meth:`DataFrame.corr` where Kendall correlation would produce incorrect results for columns with repeated values (:issue:`43401`)
- Fixed regression in :meth:`Series.fillna` raising ``TypeError`` when filling ``float32`` ``Series`` with list-like fill value having a larger dtype (like ``float64``) (:issue:`43424`)
-

.. ---------------------------------------------------------------------------

Expand All @@ -43,6 +45,7 @@ Performance improvements
Bug fixes
~~~~~~~~~
- Bug in :meth:`.DataFrameGroupBy.agg` and :meth:`.DataFrameGroupBy.transform` with ``engine="numba"`` where ``index`` data was not being correctly passed into ``func`` (:issue:`43133`)
- Bug in setting a ``float32`` :class:`Series` with a larger dtype (like ``float64``) not upcasting (:issue:`43424`)
-

.. ---------------------------------------------------------------------------
Expand Down
4 changes: 3 additions & 1 deletion pandas/core/array_algos/putmask.py
Original file line number Diff line number Diff line change
Expand Up @@ -114,7 +114,9 @@ def putmask_smart(values: np.ndarray, mask: np.ndarray, new) -> np.ndarray:

new = np.asarray(new)

if values.dtype.kind == new.dtype.kind:
values_kind = values.dtype.kind
new_kind = new.dtype.kind
if values_kind == new_kind or (values_kind == "f" and new_kind in ["i", "u"]):
# preserves dtype if possible
return _putmask_preserve(values, new, mask)

Expand Down
3 changes: 2 additions & 1 deletion pandas/core/dtypes/cast.py
Original file line number Diff line number Diff line change
Expand Up @@ -2208,10 +2208,11 @@ def can_hold_element(arr: ArrayLike, element: Any) -> bool:

elif dtype.kind == "f":
if tipo is not None:
# TODO: itemsize check?
if tipo.kind not in ["f", "i", "u"]:
# Anything other than float/integer we cannot hold
return False
elif dtype.itemsize < tipo.itemsize:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should the itemsize only be compared if dtype.kind and tipo.kind are the same? i.e. tipo.kind is also 'f'. How is the itemsize of an int or uint relevant?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for bringing this up. I think it ultimately comes down to -> if can_hold_element holds True, then using np.putmask is assumed to be safe. So itemsize of int or uint matters because numpy allows safe casting of i16 to f32 for example, but not i64 to f32.

But I think the itemsize check is not sufficient here, because numpy won't putmask int32 infloat32. Maybe np.can_cast should be used instead?

return False
elif not isinstance(tipo, np.dtype):
# i.e. nullable IntegerDtype or FloatingDtype;
# we can put this into an ndarray losslessly iff it has no NAs
Expand Down
18 changes: 18 additions & 0 deletions pandas/tests/indexing/test_indexing.py
Original file line number Diff line number Diff line change
Expand Up @@ -200,6 +200,24 @@ def test_setitem_dtype_upcast3(self):
assert is_float_dtype(left["foo"])
assert is_float_dtype(left["baz"])

@pytest.mark.parametrize("dtype", [np.int64, np.uint64, np.float64])
def test_setitem_dtype_float_listlike_upcasts(self, dtype, indexer_sli):
# GH-43424
result = Series([1.1, 1.1], dtype=np.float32)
indexer_sli(result)[[0]] = np.array([1], dtype=dtype)
expected = Series([1.0, 1.1], dtype=np.float64)
tm.assert_series_equal(result, expected)

@pytest.mark.parametrize(
"dtype,new_dtype", [(np.int8, np.int64), (np.uint16, np.uint32)]
)
def test_setitem_dtype_int_listlike_upcasts(self, dtype, new_dtype, indexer_sli):
# GH-43424
result = Series([1, 1], dtype=dtype)
indexer_sli(result)[[0]] = np.array([2], dtype=new_dtype)
expected = Series([2, 1], dtype=new_dtype)
tm.assert_series_equal(result, expected)

def test_dups_fancy_indexing(self):

# GH 3455
Expand Down
16 changes: 16 additions & 0 deletions pandas/tests/series/methods/test_fillna.py
Original file line number Diff line number Diff line change
Expand Up @@ -702,6 +702,22 @@ def test_fillna_categorical_raises(self):
with pytest.raises(TypeError, match=msg):
ser.fillna(DataFrame({1: ["a"], 3: ["b"]}))

@pytest.mark.parametrize("fill_type", [np.float64, np.uint64, np.int64])
def test_fillna_f32_upcast(self, fill_type):
# GH-43424
ser = Series([np.nan, 1.2], dtype=np.float32)
fill_values = Series([2, 2], dtype=fill_type)
result = ser.fillna(fill_values)
expected = Series([2.0, 1.2], dtype="float64")
tm.assert_series_equal(result, expected)

def test_fillna_f32_upcast_with_dict(self):
# GH-43424
ser = Series([np.nan, 1.2], dtype=np.float32)
result = ser.fillna({0: 1})
expected = Series([1.0, 1.2])
tm.assert_series_equal(result, expected)

# ---------------------------------------------------------------
# Invalid Usages

Expand Down