BUG: Dtype was not changed when float was assigned to int column #37680

Closed
1 change: 1 addition & 0 deletions doc/source/whatsnew/v1.2.0.rst
@@ -584,6 +584,7 @@ Indexing
- Bug in :meth:`DataFrame.loc` returned requested key plus missing values when ``loc`` was applied to single level from :class:`MultiIndex` (:issue:`27104`)
- Bug in indexing on a :class:`Series` or :class:`DataFrame` with a :class:`CategoricalIndex` using a listlike indexer containing NA values (:issue:`37722`)
- Bug in :meth:`DataFrame.xs` ignored ``droplevel=False`` for columns (:issue:`19056`)
- Bug in :meth:`DataFrame.at` and :meth:`Series.at` did not adjust dtype when float was assigned to integer column (:issue:`26395`, :issue:`20643`)
Member

did not adjust -> not adjusting


Missing
^^^^^^^
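For context, the bug described in the whatsnew entry comes down to writing a scalar straight into the integer array that backs the column. The following is a minimal NumPy-only sketch, not pandas code, and the variable names are illustrative. It shows the silent truncation and the upcast that the fix aims for:

```python
import numpy as np

# Writing a float into an int64 array in place keeps the dtype and
# silently drops the fractional part.
truncated = np.array([0, 0, 0], dtype=np.int64)
truncated[2] = 44.5

# Upcasting first (done by hand here) preserves the assigned value.
upcast = np.array([0, 0, 0], dtype=np.int64).astype(np.float64)
upcast[2] = 44.5

print(truncated, truncated.dtype)
print(upcast, upcast.dtype)
```

The first array is what a direct write such as `series._values[loc] = value` amounts to for an integer column; the second is the result the new tests in this PR expect.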
10 changes: 2 additions & 8 deletions pandas/core/frame.py
@@ -91,7 +91,6 @@
maybe_downcast_to_dtype,
maybe_infer_to_datetimelike,
maybe_upcast,
validate_numeric_casting,
)
from pandas.core.dtypes.common import (
ensure_int64,
@@ -3206,13 +3205,8 @@ def _set_value(self, index, col, value, takeable: bool = False):
return

series = self._get_item_cache(col)
engine = self.index._engine
loc = engine.get_loc(index)
validate_numeric_casting(series.dtype, value)

series._values[loc] = value
# Note: trying to use series._set_value breaks tests in
# tests.frame.indexing.test_indexing and tests.indexing.test_partial
self.index._engine.get_loc(index)
Member

why are you discarding loc?

Member Author

If get_loc raises a KeyError, we have to catch it here instead of in series._set_value, but we do not need loc itself.

Member

why bother calling get_loc here at all if it's going to be called again in series._set_value?

Member Author

We want to arrive in the except clause here, when index does not exist instead of the except clause in series._set_value

Member

> We want to arrive in the except clause here, when index does not exist instead of the except clause in series._set_value

Can you elaborate on this? Is it because the except clause here does a takeable check whereas the one in _set_value does not? If so, could we just add that check there?

Why is the deleted comment about "tests.frame.indexing.test_indexing and tests.indexing.test_partial" no longer relevant?

series._set_value(index, value, takeable)
except (KeyError, TypeError):
# set using a non-recursive method & reset the cache
if takeable:
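The exchange above about calling `get_loc` and discarding its result can be illustrated with a small pure-Python sketch. `ToySeries` and `frame_set_value` are hypothetical stand-ins invented for this illustration, not pandas classes, and the inner handler's behaviour (appending the new label) is an assumption chosen only to make the two handlers distinguishable:

```python
class ToySeries:
    """Hypothetical stand-in for a column; not the pandas Series API."""

    def __init__(self, labels, values):
        self.labels = list(labels)
        self.values = list(values)

    def get_loc(self, label):
        try:
            return self.labels.index(label)
        except ValueError:
            raise KeyError(label) from None

    def set_value(self, label, value):
        try:
            loc = self.get_loc(label)
        except KeyError:
            # Inner handler: deals with the missing label on its own terms.
            self.labels.append(label)
            self.values.append(value)
            return "inner handler"
        self.values[loc] = value
        return "set in place"


def frame_set_value(series, label, value):
    try:
        # Result discarded on purpose: the call exists only so that a
        # missing label raises KeyError here, before set_value runs.
        series.get_loc(label)
        return series.set_value(label, value)
    except KeyError:
        return "outer handler"


col = ToySeries(["A", "B"], [0, 0])
hit = frame_set_value(col, "A", 1)
miss = frame_set_value(col, "Z", 2)
print(hit, miss, col.labels)
```

The price of this pattern is a second lookup on every successful call, which is the reviewer's objection; moving the extra check into the inner handler would avoid it.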
23 changes: 21 additions & 2 deletions pandas/core/series.py
@@ -42,6 +42,7 @@

from pandas.core.dtypes.cast import (
convert_dtypes,
infer_dtype_from_scalar,
maybe_cast_to_extension_array,
validate_numeric_casting,
)
@@ -50,10 +51,12 @@
is_bool,
is_categorical_dtype,
is_dict_like,
is_dtype_equal,
is_extension_array_dtype,
is_integer,
is_iterator,
is_list_like,
is_numeric_dtype,
is_object_dtype,
is_scalar,
validate_all_hashable,
@@ -1043,7 +1046,15 @@ def _set_with_engine(self, key, value):
# fails with AttributeError for IntervalIndex
loc = self.index._engine.get_loc(key)
validate_numeric_casting(self.dtype, value)
self._values[loc] = value
dtype, _ = infer_dtype_from_scalar(value)
if is_dtype_equal(self.dtype, dtype):
self._values[loc] = value
else:
# This only raises when index contains tuples
try:
self.loc[key] = value
Contributor

this is a very strange change, why is it needed?

Member Author

@phofl, Nov 15, 2020

Yeah, that is pretty ugly, but a loc limitation unfortunately. If the index is a regular index where values are tuples, loc does not work.

Example:

obj = Series([1, 2], index=[(0,1), (1,2)])
obj[(0,1)] = "test"

This goes through the loc branch, and loc raises

KeyError: "None of [Int64Index([0, 1], dtype='int64')] are in the [index]"

Tuples are interpreted as components of a MultiIndex.

Member

_set_with_engine is a fastpath, really shouldn't be going through self.loc

the whole infer_dtype_from_scalar stuff seems like it should be handled as part of validate_numeric_casting

the issue this PR is addressing is only for when we are indexing an entire column, right?

except KeyError:
self._values[loc] = value
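The reviewer's suggestion above, keeping `_set_with_engine` a fast path and deciding up front whether the scalar fits the existing dtype, could look roughly like the NumPy-only sketch below. `needs_upcast` and `set_scalar` are hypothetical helpers written for this illustration; they are not pandas functions and do not reproduce `validate_numeric_casting` or `infer_dtype_from_scalar`. The sketch assumes numeric scalars only.

```python
import numpy as np


def needs_upcast(values: np.ndarray, value) -> bool:
    """True if promoting the array dtype with the scalar's dtype widens it."""
    scalar_dtype = np.asarray(value).dtype
    return np.result_type(values.dtype, scalar_dtype) != values.dtype


def set_scalar(values: np.ndarray, loc: int, value) -> np.ndarray:
    """Write in place when the dtype can hold the scalar, else copy and upcast."""
    if needs_upcast(values, value):
        # Slow path: allocate a new array with the promoted dtype.
        promoted = np.result_type(values.dtype, np.asarray(value).dtype)
        values = values.astype(promoted)
    # Fast path (or the freshly upcast copy): plain positional write.
    values[loc] = value
    return values


ints = np.array([0, 1, 2], dtype=np.int64)
same = set_scalar(ints, 0, 7)     # stays int64, written in place
wider = set_scalar(ints, 1, 3.1)  # returns a float64 copy
print(same is ints, wider, wider.dtype)
```

One limitation of this sketch: a plain Python `int` is treated as NumPy's default integer type, so assigning a small Python integer to an `int8` array would be flagged for upcasting even though the value fits. A production check would need to look at the value as well as the dtype.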

def _set_with(self, key, value):
# other: fancy integer or otherwise
@@ -1105,8 +1116,16 @@ def _set_value(self, label, value, takeable: bool = False):
takeable : interpret the index as indexers, default False
"""
try:
if takeable:
dtype, _ = infer_dtype_from_scalar(value, pandas_dtype=True)
if takeable and is_dtype_equal(self.dtype, dtype):
self._values[label] = value
elif takeable:
self.iloc[label] = value
elif not is_dtype_equal(self.dtype, dtype) and is_numeric_dtype(dtype):
loc = self.index.get_loc(label)
validate_numeric_casting(self.dtype, value)
self.loc[label] = value
return
else:
loc = self.index.get_loc(label)
validate_numeric_casting(self.dtype, value)
30 changes: 30 additions & 0 deletions pandas/tests/indexing/test_at.py
@@ -126,3 +126,33 @@ def test_at_getitem_mixed_index_no_fallback(self):
ser.at[0]
with pytest.raises(KeyError, match="^4$"):
ser.at[4]


@pytest.mark.parametrize("func", ["at", "loc"])
def test_at_assign_float_to_int_frame(func):
# GH: 26395
obj = DataFrame([0, 0, 0], index=["A", "B", "C"], columns=["D"])
getattr(obj, func)["C", "D"] = 44.5
expected = DataFrame([0, 0, 44.5], index=["A", "B", "C"], columns=["D"])
tm.assert_frame_equal(obj, expected)


@pytest.mark.parametrize("func", ["at", "loc"])
def test_at_assign_float_to_int_series(func):
# GH: 26395
obj = Series([0, 0, 0], index=["A", "B", "C"])
getattr(obj, func)["C"] = 44.5
expected = Series([0, 0, 44.5], index=["A", "B", "C"])
tm.assert_series_equal(obj, expected)


def test_assign_float_to_int_series_takeable():
# GH: 20643
ser = Series([0, 1, 2], index=list("abc"))
ser.iat[1] = 3.1
expected = Series([0, 3.1, 2], index=list("abc"))
tm.assert_series_equal(ser, expected)

ser = Series([0, 1, 2], index=list("abc"))
ser.at["b"] = 3.1
tm.assert_series_equal(ser, expected)
42 changes: 6 additions & 36 deletions pandas/tests/indexing/test_coercion.py
@@ -114,31 +114,17 @@ def test_setitem_series_int64(self, val, exp_dtype, request):
obj = pd.Series([1, 2, 3, 4])
assert obj.dtype == np.int64

if exp_dtype is np.float64:
exp = pd.Series([1, 1, 3, 4])
self._assert_setitem_series_conversion(obj, 1.1, exp, np.int64)
mark = pytest.mark.xfail(reason="GH12747 The result must be float")
request.node.add_marker(mark)

exp = pd.Series([1, val, 3, 4])
self._assert_setitem_series_conversion(obj, val, exp, exp_dtype)

@pytest.mark.parametrize(
"val,exp_dtype", [(np.int32(1), np.int8), (np.int16(2 ** 9), np.int16)]
"val,exp_dtype", [(np.int32(1), np.int32), (np.int16(2 ** 9), np.int16)]
Contributor

this was downcasting before?

Member Author

Yes

)
def test_setitem_series_int8(self, val, exp_dtype, request):
obj = pd.Series([1, 2, 3, 4], dtype=np.int8)
assert obj.dtype == np.int8

if exp_dtype is np.int16:
exp = pd.Series([1, 0, 3, 4], dtype=np.int8)
self._assert_setitem_series_conversion(obj, val, exp, np.int8)
mark = pytest.mark.xfail(
reason="BUG: it must be pd.Series([1, 1, 3, 4], dtype=np.int16"
)
request.node.add_marker(mark)

exp = pd.Series([1, val, 3, 4], dtype=np.int8)
exp = pd.Series([1, val, 3, 4], dtype=exp_dtype)
self._assert_setitem_series_conversion(obj, val, exp, exp_dtype)
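The changed expectation for `np.int32(1)` (from `np.int8` to `np.int32`) is consistent with choosing the result dtype from the two dtypes rather than from the value. The sketch below contrasts the two rules using NumPy only. `value_based` and `dtype_based` are hypothetical names for this illustration, and this is one reading of the thread, not a description of the pandas internals:

```python
import numpy as np


def value_based(dtype, value) -> bool:
    """True if the value itself fits in the target integer dtype."""
    info = np.iinfo(dtype)
    return info.min <= int(value) <= info.max


def dtype_based(dtype, value) -> np.dtype:
    """Result dtype from promoting the target dtype with the scalar's dtype."""
    return np.result_type(np.dtype(dtype), np.asarray(value).dtype)


for val in (np.int32(1), np.int16(2 ** 9)):
    print(
        type(val).__name__,
        "fits int8:", value_based(np.int8, val),
        "promoted:", dtype_based(np.int8, val),
    )
```

Under the value-based rule, `np.int32(1)` fits and the Series would stay `int8`, which matches the old parametrization. Under the dtype-based rule it becomes `int32`, which matches the new one. `np.int16(2 ** 9)` ends up as `int16` either way, since 512 does not fit in `int8`.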

@pytest.mark.parametrize(
@@ -171,33 +157,17 @@ def test_setitem_series_complex128(self, val, exp_dtype):
@pytest.mark.parametrize(
"val,exp_dtype",
[
(1, np.int64),
(3, np.int64),
(1.1, np.float64),
(1 + 1j, np.complex128),
(1, "object"),
(3, "object"),
Contributor

so these were doing an inferred cast before?

Member Author

numpy inferred cast, yes

(1.1, "object"),
(1 + 1j, "object"),
(True, np.bool_),
],
)
def test_setitem_series_bool(self, val, exp_dtype, request):
obj = pd.Series([True, False, True, False])
assert obj.dtype == np.bool_

mark = None
if exp_dtype is np.int64:
exp = pd.Series([True, True, True, False])
self._assert_setitem_series_conversion(obj, val, exp, np.bool_)
mark = pytest.mark.xfail(reason="TODO_GH12747 The result must be int")
elif exp_dtype is np.float64:
exp = pd.Series([True, True, True, False])
self._assert_setitem_series_conversion(obj, val, exp, np.bool_)
mark = pytest.mark.xfail(reason="TODO_GH12747 The result must be float")
elif exp_dtype is np.complex128:
exp = pd.Series([True, True, True, False])
self._assert_setitem_series_conversion(obj, val, exp, np.bool_)
mark = pytest.mark.xfail(reason="TODO_GH12747 The result must be complex")
if mark is not None:
request.node.add_marker(mark)

exp = pd.Series([True, val, True, False])
self._assert_setitem_series_conversion(obj, val, exp, exp_dtype)
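The "numpy inferred cast" mentioned in the thread above can be reproduced without pandas. In this minimal sketch (illustrative variable names), a non-boolean scalar written into a boolean array is coerced to `True`, which is what the removed xfail branches encoded as `[True, True, True, False]`. An object array keeps the value, matching the new `"object"` expectation:

```python
import numpy as np

# In-place write into a bool array: 1.1 is cast to bool and becomes True.
mask = np.array([True, False, True, False])
mask[1] = 1.1

# Converting to object first keeps the assigned value intact.
as_object = np.array([True, False, True, False]).astype(object)
as_object[1] = 1.1

print(mask, mask.dtype)
print(as_object, as_object.dtype)
```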

1 change: 0 additions & 1 deletion pandas/tests/indexing/test_partial.py
@@ -47,7 +47,6 @@ def test_partial_setting(self):
with pytest.raises(IndexError, match=msg):
s.iloc[3] = 5.0

msg = "index 3 is out of bounds for axis 0 with size 3"
Contributor

what is the message now?

Member Author

Messages for iat and iloc are now the same, hence the deletion of the redefinition

with pytest.raises(IndexError, match=msg):
s.iat[3] = 5.0

8 changes: 8 additions & 0 deletions pandas/tests/series/indexing/test_setitem.py
@@ -266,3 +266,11 @@ def test_setitem_slice_into_readonly_backing_data():
series[1:3] = 1

assert not array.any()


def test_setitem_float_to_int():
# GH 20643
ser = Series([0, 1, 2], index=list("abc"))
ser["b"] = 3.1
expected = Series([0, 3.1, 2], index=list("abc"))
tm.assert_series_equal(ser, expected)