
BUG: Dtype was not changed when float was assigned to int column #37680

Closed
wants to merge 21 commits into from
Changes from 16 commits

1 change: 1 addition & 0 deletions doc/source/whatsnew/v1.2.0.rst
@@ -469,6 +469,7 @@ Indexing
- Bug in :meth:`Index.where` incorrectly casting numeric values to strings (:issue:`37591`)
- Bug in :meth:`Series.loc` and :meth:`DataFrame.loc` raises when numeric label was given for object :class:`Index` although label was in :class:`Index` (:issue:`26491`)
- Bug in :meth:`DataFrame.loc` returned requested key plus missing values when ``loc`` was applied to single level from :class:`MultiIndex` (:issue:`27104`)
- Bug in :meth:`DataFrame.at` and :meth:`Series.at` did not adjust dtype when float was assigned to integer column (:issue:`26395`, :issue:`20643`)
Member

did not adjust -> not adjusting


Missing
^^^^^^^
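For readers skimming the whatsnew entry, here is a minimal, stdlib-only sketch of the behaviour change it describes (GH 26395, GH 20643). It is a toy model, not pandas code: dtypes are plain strings, and infer_scalar_dtype, set_value_old and set_value_new are hypothetical helpers. The old path is modeled as a truncating cast into the existing int64 buffer; the new path upcasts the whole column to float64, which is what test_at_assign_float_to_int_frame expects.

```python
# Toy model, not pandas internals: a column is a list plus a dtype tag.


def infer_scalar_dtype(value):
    """Rough, hypothetical stand-in for inferring a dtype from a scalar."""
    if isinstance(value, bool):  # bool is a subclass of int, so test it first
        return "bool"
    if isinstance(value, int):
        return "int64"
    if isinstance(value, float):
        return "float64"
    return "object"


def set_value_old(values, dtype, pos, value):
    """Modeled pre-fix path: write into the int64 buffer, dropping the fraction."""
    new = list(values)
    new[pos] = int(value) if dtype == "int64" else value
    return new, dtype


def set_value_new(values, dtype, pos, value):
    """Modeled post-fix intent: keep the dtype if it matches, otherwise upcast."""
    value_dtype = infer_scalar_dtype(value)
    new = list(values)
    if value_dtype == dtype:
        new[pos] = value
        return new, dtype
    if {dtype, value_dtype} == {"int64", "float64"}:
        new = [float(v) for v in new]
        new[pos] = float(value)
        return new, "float64"
    new[pos] = value
    return new, "object"


# Mirrors test_at_assign_float_to_int_frame: column D = [0, 0, 0], set row "C".
print(set_value_old([0, 0, 0], "int64", 2, 44.5))  # ([0, 0, 44], 'int64')
print(set_value_new([0, 0, 0], "int64", 2, 44.5))  # ([0.0, 0.0, 44.5], 'float64')
```

The key design point is that the decision is made on the scalar's inferred dtype before writing, so a mismatch changes the column's dtype instead of silently converting the value.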
10 changes: 2 additions & 8 deletions pandas/core/frame.py
@@ -91,7 +91,6 @@
maybe_downcast_to_dtype,
maybe_infer_to_datetimelike,
maybe_upcast,
validate_numeric_casting,
)
from pandas.core.dtypes.common import (
ensure_int64,
@@ -3207,13 +3206,8 @@ def _set_value(self, index, col, value, takeable: bool = False):
return

series = self._get_item_cache(col)
engine = self.index._engine
loc = engine.get_loc(index)
validate_numeric_casting(series.dtype, value)

series._values[loc] = value
# Note: trying to use series._set_value breaks tests in
# tests.frame.indexing.test_indexing and tests.indexing.test_partial
self.index._engine.get_loc(index)
Member

why are you discarding loc?

Member Author

If get_loc raises a KeyError, we have to catch it here instead of in series._set_value, but we do not need loc.

Member

why bother calling get_loc here at all if it's going to be called again in series._set_value?

Member Author

We want to arrive in the except clause here when the index does not exist, instead of in the except clause in series._set_value.

Member

> We want to arrive in the except clause here, when index does not exist instead of the except clause in series._set_value

can you elaborate on this? is it because the except clause here does a takeable check whereas the one in _set_value does not? if so, could we just add that check there?

why is the deleted comment about "tests.frame.indexing.test_indexing and tests.indexing.test_partial" no longer relevant?

series._set_value(index, value, takeable)
except (KeyError, TypeError):
# set using a non-recursive method & reset the cache
if takeable:
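The get_loc discussion above turns on which except clause handles a missing label. Below is a toy, stdlib-only model of that control flow, under loudly stated assumptions: ToyFrame, _series_set_value, _frame_enlarge and the precheck flag are all hypothetical, and the real fallback (enlargement plus cache reset) is more involved. It shows why the eager lookup is kept even though its result is discarded: without it, the series-level handler swallows the KeyError and only one column grows.

```python
class ToyFrame:
    """Hypothetical model: a list of row labels plus a dict of column lists."""

    def __init__(self, index, data):
        self.index = list(index)
        self.data = {col: list(vals) for col, vals in data.items()}

    def get_loc(self, label):
        # Like Index.get_loc: position of the label, KeyError if absent.
        try:
            return self.index.index(label)
        except ValueError:
            raise KeyError(label) from None

    def _series_set_value(self, col, label, value):
        # Stand-in for Series._set_value. It has its own KeyError handler,
        # which enlarges only this one column and leaves the frame misaligned.
        vals = self.data[col]
        try:
            vals[self.get_loc(label)] = value
        except KeyError:
            vals.append(value)

    def _frame_enlarge(self, label, col, value):
        # Stand-in for the frame-level fallback: grows every column together.
        # Assumes `col` already exists.
        self.index.append(label)
        for vals in self.data.values():
            vals.append(None)
        self.data[col][-1] = value

    def set_value(self, label, col, value, precheck=True):
        try:
            if precheck:
                self.get_loc(label)  # result discarded; only here to raise KeyError
            self._series_set_value(col, label, value)
        except KeyError:
            self._frame_enlarge(label, col, value)


good = ToyFrame(["A", "B"], {"D": [0, 0], "E": [1, 1]})
good.set_value("C", "D", 7)
print(good.index, good.data)
# ['A', 'B', 'C'] {'D': [0, 0, 7], 'E': [1, 1, None]}

bad = ToyFrame(["A", "B"], {"D": [0, 0], "E": [1, 1]})
bad.set_value("C", "D", 7, precheck=False)
print(bad.index, bad.data)
# ['A', 'B'] {'D': [0, 0, 7], 'E': [1, 1]}
```

The reviewer's alternative (moving the relevant check into Series._set_value) would remove the double lookup; this sketch only illustrates the author's stated reason for keeping it.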
17 changes: 15 additions & 2 deletions pandas/core/series.py
@@ -42,6 +42,7 @@

from pandas.core.dtypes.cast import (
convert_dtypes,
infer_dtype_from_scalar,
maybe_cast_to_extension_array,
validate_numeric_casting,
)
@@ -50,10 +51,12 @@
is_bool,
is_categorical_dtype,
is_dict_like,
is_dtype_equal,
is_extension_array_dtype,
is_integer,
is_iterator,
is_list_like,
is_numeric_dtype,
is_object_dtype,
is_scalar,
validate_all_hashable,
@@ -1043,7 +1046,11 @@ def _set_with_engine(self, key, value):
# fails with AttributeError for IntervalIndex
loc = self.index._engine.get_loc(key)
validate_numeric_casting(self.dtype, value)
self._values[loc] = value
dtype, _ = infer_dtype_from_scalar(value)
if is_dtype_equal(self.dtype, dtype) or isna(value):
Member

probably want is_valid_nat_for_dtype instead of isna

self._values[loc] = value
else:
self.loc[key] = value
Member

shouldn't validate_numeric_casting just raise here?


def _set_with(self, key, value):
# other: fancy integer or otherwise
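The isna vs is_valid_nat_for_dtype suggestion on _set_with_engine can be illustrated with a stdlib-only toy. It is an assumption-laden sketch: NaT is a plain sentinel object, dtypes are strings, and is_na, any_na, valid_na_for_dtype and choose_path are hypothetical names. An isna-style check accepts any missing-value marker for the in-place fast path, whereas a dtype-aware check (the idea behind is_valid_nat_for_dtype) rejects a datetime-like NaT for a numeric column, so that value is routed to the .loc fallback instead.

```python
import math

# Hypothetical toy: dtypes are strings and NaT is a plain sentinel object.
NaT = object()


def infer_scalar_dtype(value):
    if isinstance(value, bool):
        return "bool"
    if isinstance(value, int):
        return "int64"
    if isinstance(value, float):
        return "float64"
    return "object"


def is_na(value):
    """Broad check in the spirit of isna: any missing-value marker counts."""
    return value is None or value is NaT or (
        isinstance(value, float) and math.isnan(value)
    )


def any_na(value, dtype):
    # isna-style predicate: ignores the target dtype entirely.
    return is_na(value)


def valid_na_for_dtype(value, dtype):
    # Dtype-aware predicate, modeled on the idea behind is_valid_nat_for_dtype:
    # a datetime-like NaT is not an acceptable missing value for a numeric column.
    if not is_na(value):
        return False
    if dtype in ("int64", "float64"):
        return value is not NaT
    return True


def choose_path(dtype, value, na_check):
    """Mimic the new _set_with_engine branch: fast in-place write or .loc fallback."""
    if infer_scalar_dtype(value) == dtype or na_check(value, dtype):
        return "in_place"
    return "fallback_to_loc"


print(choose_path("float64", NaT, any_na))              # in_place
print(choose_path("float64", NaT, valid_na_for_dtype))  # fallback_to_loc
print(choose_path("int64", 44.5, valid_na_for_dtype))   # fallback_to_loc
```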
@@ -1105,11 +1112,17 @@ def _set_value(self, label, value, takeable: bool = False):
takeable : interpret the index as indexers, default False
"""
try:
if takeable:
dtype, _ = infer_dtype_from_scalar(value, pandas_dtype=True)
if takeable and is_dtype_equal(self.dtype, dtype):
self._values[label] = value
elif takeable:
self.iloc[label] = value
else:
loc = self.index.get_loc(label)
validate_numeric_casting(self.dtype, value)
if not is_dtype_equal(self.dtype, dtype) and is_numeric_dtype(dtype):
Contributor

why do you need the is_numeric_dtype check here? isn't checking not is_dtype_equal enough?

Member Author

phofl, Nov 8, 2020

We have a test calling this function where the scalar is a string value, which is expected to raise a ValueError, hence this check. If the test is somehow erroneous, we could delete it.

The failing test is in this build:
https://dev.azure.com/pandas-dev/pandas/_build/results?buildId=47080&view=logs&j=ba13898e-1dfb-5ace-9966-8b7af3677790

Contributor

can you show the test?

Member Author

Contributor

can you integrate this with the higher-level if/else? these nested if/else blocks are harder to read, even if it means duplicating some code

Member Author

Done

self.loc[label] = value
return
self._values[loc] = value
except KeyError:

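To make the branch structure of the new Series._set_value and the is_numeric_dtype question easier to follow, here is a stdlib-only toy. All names (toy_set_value, upcast_and_set, coerce_into, NUMERIC) are hypothetical and dtypes are plain strings; it models the post-PR logic only loosely. A float is routed to the upcasting fallback, while a string still reaches the typed write and raises ValueError, which is the behaviour the test mentioned in this thread expects. Without the numeric guard, the string would be upcast to object instead of raising.

```python
# Hypothetical toy of the branching in Series._set_value above; not pandas code.
NUMERIC = {"int64", "float64"}


def infer_scalar_dtype(value):
    if isinstance(value, bool):
        return "bool"
    if isinstance(value, int):
        return "int64"
    if isinstance(value, float):
        return "float64"
    return "object"


def coerce_into(dtype, value):
    # Writing into a typed buffer converts the value; int("x") raises ValueError,
    # much as assigning a non-numeric string into a NumPy int64 array does.
    if dtype == "int64":
        return int(value)
    if dtype == "float64":
        return float(value)
    return value


def upcast_and_set(values, dtype, pos, value):
    # Stand-in for the .iloc/.loc fallback: int64 with float64 gives float64,
    # any other mismatch gives object.
    if {dtype, infer_scalar_dtype(value)} <= NUMERIC:
        new = [float(v) for v in values]
        new[pos] = float(value)
        return new, "float64"
    new = list(values)
    new[pos] = value
    return new, "object"


def toy_set_value(values, dtype, index, label, value, takeable=False):
    value_dtype = infer_scalar_dtype(value)
    if takeable and value_dtype == dtype:
        new = list(values)
        new[label] = value
        return new, dtype
    if takeable:
        return upcast_and_set(values, dtype, label, value)  # like self.iloc[label] = value
    if label not in index:
        raise KeyError(label)
    pos = index.index(label)
    if value_dtype != dtype and value_dtype in NUMERIC:
        return upcast_and_set(values, dtype, pos, value)  # like self.loc[label] = value
    new = list(values)
    new[pos] = coerce_into(dtype, value)  # non-numeric value may raise ValueError
    return new, dtype


index = ["a", "b", "c"]
print(toy_set_value([0, 1, 2], "int64", index, 1, 3.1, takeable=True))
# ([0.0, 3.1, 2.0], 'float64')
print(toy_set_value([0, 1, 2], "int64", index, "b", 3.1))
# ([0.0, 3.1, 2.0], 'float64')
```

The two calls mirror test_assign_float_to_int_series_takeable (iat and at on the same Series).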
30 changes: 30 additions & 0 deletions pandas/tests/indexing/test_at.py
@@ -126,3 +126,33 @@ def test_at_getitem_mixed_index_no_fallback(self):
ser.at[0]
with pytest.raises(KeyError, match="^4$"):
ser.at[4]


@pytest.mark.parametrize("func", ["at", "loc"])
def test_at_assign_float_to_int_frame(func):
# GH: 26395
obj = DataFrame([0, 0, 0], index=["A", "B", "C"], columns=["D"])
getattr(obj, func)["C", "D"] = 44.5
expected = DataFrame([0, 0, 44.5], index=["A", "B", "C"], columns=["D"])
tm.assert_frame_equal(obj, expected)


@pytest.mark.parametrize("func", ["at", "loc"])
def test_at_assign_float_to_int_series(func):
# GH: 26395
obj = Series([0, 0, 0], index=["A", "B", "C"])
getattr(obj, func)["C"] = 44.5
expected = Series([0, 0, 44.5], index=["A", "B", "C"])
tm.assert_series_equal(obj, expected)


def test_assign_float_to_int_series_takeable():
# GH: 20643
ser = Series([0, 1, 2], index=list("abc"))
ser.iat[1] = 3.1
expected = Series([0, 3.1, 2], index=list("abc"))
tm.assert_series_equal(ser, expected)

ser = Series([0, 1, 2], index=list("abc"))
ser.at["b"] = 3.1
tm.assert_series_equal(ser, expected)
42 changes: 6 additions & 36 deletions pandas/tests/indexing/test_coercion.py
@@ -114,31 +114,17 @@ def test_setitem_series_int64(self, val, exp_dtype, request):
obj = pd.Series([1, 2, 3, 4])
assert obj.dtype == np.int64

if exp_dtype is np.float64:
exp = pd.Series([1, 1, 3, 4])
self._assert_setitem_series_conversion(obj, 1.1, exp, np.int64)
mark = pytest.mark.xfail(reason="GH12747 The result must be float")
request.node.add_marker(mark)

exp = pd.Series([1, val, 3, 4])
self._assert_setitem_series_conversion(obj, val, exp, exp_dtype)

@pytest.mark.parametrize(
"val,exp_dtype", [(np.int32(1), np.int8), (np.int16(2 ** 9), np.int16)]
"val,exp_dtype", [(np.int32(1), np.int32), (np.int16(2 ** 9), np.int16)]
Contributor

this was downcasting before?

Member Author

Yes

)
def test_setitem_series_int8(self, val, exp_dtype, request):
obj = pd.Series([1, 2, 3, 4], dtype=np.int8)
assert obj.dtype == np.int8

if exp_dtype is np.int16:
exp = pd.Series([1, 0, 3, 4], dtype=np.int8)
self._assert_setitem_series_conversion(obj, val, exp, np.int8)
mark = pytest.mark.xfail(
reason="BUG: it must be pd.Series([1, 1, 3, 4], dtype=np.int16"
)
request.node.add_marker(mark)

exp = pd.Series([1, val, 3, 4], dtype=np.int8)
exp = pd.Series([1, val, 3, 4], dtype=exp_dtype)
self._assert_setitem_series_conversion(obj, val, exp, exp_dtype)

@pytest.mark.parametrize(
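The "this was downcasting before?" exchange concerns test_setitem_series_int8. Below is a stdlib-only toy of the two behaviours, assuming signed integer dtypes only; wrap_to, set_old and set_new are hypothetical helpers. The old path casts the scalar into the existing int8 buffer, so np.int16(2 ** 9) wraps to 0, exactly the [1, 0, 3, 4] int8 expectation the diff removes. The new parametrization expects the column to take the scalar's wider dtype (int32 for np.int32(1), int16 for np.int16(2 ** 9)).

```python
# Hypothetical toy covering signed integer dtypes only; not pandas code.
BITS = {"int8": 8, "int16": 16, "int32": 32, "int64": 64}


def wrap_to(dtype, value):
    """Two's-complement wraparound, like a C-style cast to a narrower int."""
    bits = BITS[dtype]
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >= 1 << (bits - 1) else value


def set_old(values, dtype, pos, value):
    # Old behaviour: cast the scalar into the existing buffer's dtype.
    new = list(values)
    new[pos] = wrap_to(dtype, value)
    return new, dtype


def set_new(values, dtype, pos, value, value_dtype):
    # New expectation: when the scalar's dtype differs, the column takes
    # the wider of the two integer dtypes.
    if value_dtype != dtype:
        dtype = max(dtype, value_dtype, key=BITS.__getitem__)
    new = list(values)
    new[pos] = value
    return new, dtype


print(set_old([1, 2, 3, 4], "int8", 1, 2 ** 9))           # ([1, 0, 3, 4], 'int8')
print(set_new([1, 2, 3, 4], "int8", 1, 2 ** 9, "int16"))  # ([1, 512, 3, 4], 'int16')
print(set_new([1, 2, 3, 4], "int8", 1, 1, "int32"))       # ([1, 1, 3, 4], 'int32')
```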
@@ -171,33 +157,17 @@ def test_setitem_series_complex128(self, val, exp_dtype):
@pytest.mark.parametrize(
"val,exp_dtype",
[
(1, np.int64),
(3, np.int64),
(1.1, np.float64),
(1 + 1j, np.complex128),
(1, "object"),
(3, "object"),
Contributor

so these were doing an inferred cast before?

Member Author

numpy inferred cast, yes

(1.1, "object"),
(1 + 1j, "object"),
(True, np.bool_),
],
)
def test_setitem_series_bool(self, val, exp_dtype, request):
obj = pd.Series([True, False, True, False])
assert obj.dtype == np.bool_

mark = None
if exp_dtype is np.int64:
exp = pd.Series([True, True, True, False])
self._assert_setitem_series_conversion(obj, val, exp, np.bool_)
mark = pytest.mark.xfail(reason="TODO_GH12747 The result must be int")
elif exp_dtype is np.float64:
exp = pd.Series([True, True, True, False])
self._assert_setitem_series_conversion(obj, val, exp, np.bool_)
mark = pytest.mark.xfail(reason="TODO_GH12747 The result must be float")
elif exp_dtype is np.complex128:
exp = pd.Series([True, True, True, False])
self._assert_setitem_series_conversion(obj, val, exp, np.bool_)
mark = pytest.mark.xfail(reason="TODO_GH12747 The result must be complex")
if mark is not None:
request.node.add_marker(mark)

exp = pd.Series([True, val, True, False])
self._assert_setitem_series_conversion(obj, val, exp, exp_dtype)

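The "inferred cast" exchange on test_setitem_series_bool can be illustrated with a toy; set_bool_old and set_bool_new are hypothetical helpers and dtypes are strings. The old result matches the removed xfail expectations ([True, True, True, False], still bool), and the new result matches the updated parametrization: object for 1, 3, 1.1 and 1 + 1j, bool only for True.

```python
# Hypothetical toy; dtypes are strings and the helpers are not pandas functions.


def set_bool_old(values, pos, value):
    # NumPy-style cast into a bool buffer: any nonzero number becomes True.
    new = list(values)
    new[pos] = bool(value)
    return new, "bool"


def set_bool_new(values, pos, value):
    # Keep the bool dtype only for a genuine bool; anything else goes to object.
    # isinstance(value, bool) is required here: bool subclasses int in Python.
    new = list(values)
    new[pos] = value
    return new, "bool" if isinstance(value, bool) else "object"


obj = [True, False, True, False]
for val in (1, 3, 1.1, 1 + 1j, True):
    print(val, set_bool_old(obj, 1, val)[1], set_bool_new(obj, 1, val)[1])
```

Because 1 == True in Python, comparing the stored values alone cannot distinguish the two outcomes; the dtype tag is what the tests effectively check.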
1 change: 0 additions & 1 deletion pandas/tests/indexing/test_partial.py
@@ -47,7 +47,6 @@ def test_partial_setting(self):
with pytest.raises(IndexError, match=msg):
s.iloc[3] = 5.0

msg = "index 3 is out of bounds for axis 0 with size 3"
Contributor

what is the message now?

Member Author

The messages for iat and iloc are now the same, hence the deletion of the redefinition.

with pytest.raises(IndexError, match=msg):
s.iat[3] = 5.0

8 changes: 8 additions & 0 deletions pandas/tests/series/indexing/test_setitem.py
@@ -257,3 +257,11 @@ def test_setitem_slice_into_readonly_backing_data():
series[1:3] = 1

assert not array.any()


def test_setitem_float_to_int():
# GH 20643
ser = Series([0, 1, 2], index=list("abc"))
ser["b"] = 3.1
expected = Series([0, 3.1, 2], index=list("abc"))
tm.assert_series_equal(ser, expected)