
BUG: Dtype was not changed when float was assigned to int column #37680

Closed
wants to merge 21 commits into from
Changes from 6 commits
1 change: 1 addition & 0 deletions doc/source/whatsnew/v1.2.0.rst
@@ -465,6 +465,7 @@ Indexing
- Bug in indexing on a :class:`Series` or :class:`DataFrame` with a :class:`MultiIndex` with a level named "0" (:issue:`37194`)
- Bug in :meth:`Series.__getitem__` when using an unsigned integer array as an indexer giving incorrect results or segfaulting instead of raising ``KeyError`` (:issue:`37218`)
- Bug in :meth:`Index.where` incorrectly casting numeric values to strings (:issue:`37591`)
- Bug in :meth:`DataFrame.at` and :meth:`Series.at` not adjusting the dtype when a float was assigned to an integer column (:issue:`26395`)

Missing
^^^^^^^
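The bug described in this entry can be reproduced with NumPy alone. A minimal sketch, assuming that before this PR `.at` ended in a raw write into the column's int64 buffer (the `series._values[loc] = value` line in the frame.py hunk): NumPy casts the float to the array's existing dtype and silently truncates it. `set_with_upcast` is a hypothetical helper, not a pandas function, and `np.result_type` only stands in for pandas' own dtype inference.

```python
import numpy as np


def set_in_place(values: np.ndarray, loc: int, value) -> np.ndarray:
    # Mirrors a raw `series._values[loc] = value`: NumPy casts the scalar to
    # the array's existing dtype, so 44.5 written into int64 becomes 44.
    values[loc] = value
    return values


def set_with_upcast(values: np.ndarray, loc: int, value) -> np.ndarray:
    # Hypothetical helper: pick a dtype that can hold both the existing values
    # and the new scalar, copy into it if the dtype changes, then assign.
    target = np.result_type(values.dtype, np.asarray(value).dtype)
    if target != values.dtype:
        values = values.astype(target)
    values[loc] = value
    return values


truncated = set_in_place(np.array([0, 0, 0], dtype=np.int64), 2, 44.5)
upcast = set_with_upcast(np.array([0, 0, 0], dtype=np.int64), 2, 44.5)
print(truncated.dtype, truncated.tolist())  # int64 [0, 0, 44]
print(upcast.dtype, upcast.tolist())  # float64 [0.0, 0.0, 44.5]
```

The upcast result has the same values as the `expected` objects in the new tests in test_at.py (`[0, 0, 44.5]`), which is the behaviour the PR is after.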
10 changes: 2 additions & 8 deletions pandas/core/frame.py
@@ -91,7 +91,6 @@
maybe_downcast_to_dtype,
maybe_infer_to_datetimelike,
maybe_upcast,
validate_numeric_casting,
)
from pandas.core.dtypes.common import (
ensure_int64,
@@ -3202,13 +3201,8 @@ def _set_value(self, index, col, value, takeable: bool = False):
return

series = self._get_item_cache(col)
engine = self.index._engine
loc = engine.get_loc(index)
validate_numeric_casting(series.dtype, value)

series._values[loc] = value
# Note: trying to use series._set_value breaks tests in
# tests.frame.indexing.test_indexing and tests.indexing.test_partial
self.index._engine.get_loc(index)

Member

why are you discarding loc?

Member Author

If get_loc raises a KeyError, we have to catch it here instead of in series._set_value, but we do not need loc.

Member

why bother calling get_loc here at all if it's going to be called again in series._set_value?

Member Author

When the index label does not exist, we want to end up in the except clause here instead of the except clause in series._set_value.

Member

> We want to arrive in the except clause here, when index does not exist instead of the except clause in series._set_value

Can you elaborate on this? Is it because the except clause here does a takeable check whereas the one in _set_value does not? If so, could we just add that check there?

why is the deleted comment about "tests.frame.indexing.test_indexing and tests.indexing.test_partial" no longer relevant?
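As far as the thread explains it, the extra `get_loc` call is kept only for its side effect: a missing label raises `KeyError` inside the frame-level `try`, so the frame's own `except` handles it instead of the `except KeyError` in `Series._set_value`. A toy model in plain Python, assuming the Series-level fallback enlarges only that one Series; `ToySeries` and `ToyFrame` are hypothetical classes, not pandas internals.

```python
class ToySeries:
    """Hypothetical one-column stand-in: a label -> position map plus values."""

    def __init__(self, labels, values):
        self.positions = {label: i for i, label in enumerate(labels)}
        self.values = list(values)

    def get_loc(self, label):
        return self.positions[label]  # KeyError for a missing label

    def set_value(self, label, value):
        try:
            self.values[self.get_loc(label)] = value
        except KeyError:
            # Series-level fallback: enlarge *this* series only.
            self.positions[label] = len(self.values)
            self.values.append(value)


class ToyFrame:
    """Hypothetical frame: a dict of equally long ToySeries (columns must exist)."""

    def __init__(self, labels, columns):
        self.columns = {name: ToySeries(labels, vals) for name, vals in columns.items()}

    def set_value(self, label, col, value):
        series = self.columns[col]
        try:
            # Probe only for the side effect: a missing label raises here,
            # inside the frame-level try, so the frame-level except runs.
            series.get_loc(label)
            series.set_value(label, value)
        except KeyError:
            # Frame-level fallback: enlarge every column so lengths stay equal.
            for name, other in self.columns.items():
                other.set_value(label, value if name == col else None)

    def set_value_without_probe(self, label, col, value):
        # Same as above minus the probe: the series swallows the KeyError.
        series = self.columns[col]
        try:
            series.set_value(label, value)
        except KeyError:
            for name, other in self.columns.items():  # never reached for a new label
                other.set_value(label, value if name == col else None)


probed = ToyFrame(["A", "B"], {"x": [1, 2], "y": [3, 4]})
probed.set_value("C", "x", 5)
print({n: s.values for n, s in probed.columns.items()})
# {'x': [1, 2, 5], 'y': [3, 4, None]}

unprobed = ToyFrame(["A", "B"], {"x": [1, 2], "y": [3, 4]})
unprobed.set_value_without_probe("C", "x", 5)
print({n: s.values for n, s in unprobed.columns.items()})
# {'x': [1, 2, 5], 'y': [3, 4]}  <- columns now have different lengths
```

The reviewer's alternative (moving the takeable-aware handling into the Series-level `except`) would remove the need for the probe, at the cost of the Series knowing about its parent frame.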

series._set_value(index, value, takeable)
except (KeyError, TypeError):
# set using a non-recursive method & reset the cache
if takeable:
7 changes: 7 additions & 0 deletions pandas/core/series.py
@@ -42,6 +42,7 @@

from pandas.core.dtypes.cast import (
convert_dtypes,
infer_dtype_from_scalar,
maybe_cast_to_extension_array,
validate_numeric_casting,
)
@@ -50,10 +51,12 @@
is_bool,
is_categorical_dtype,
is_dict_like,
is_dtype_equal,
is_extension_array_dtype,
is_integer,
is_iterator,
is_list_like,
is_numeric_dtype,
is_object_dtype,
is_scalar,
validate_all_hashable,
@@ -1110,6 +1113,10 @@ def _set_value(self, label, value, takeable: bool = False):
else:
loc = self.index.get_loc(label)
validate_numeric_casting(self.dtype, value)
dtype, _ = infer_dtype_from_scalar(value, pandas_dtype=True)
if not is_dtype_equal(self.dtype, dtype) and is_numeric_dtype(dtype):

Contributor

Why do you need to check is_numeric_dtype here? Isn't the is_dtype_equal check alone enough?

Member Author

phofl, Nov 8, 2020

We have a test calling this function with a string scalar, which is expected to raise a ValueError; hence the check. If the test is erroneous, we could delete it.

The failing test is in this build:
https://dev.azure.com/pandas-dev/pandas/_build/results?buildId=47080&view=logs&j=ba13898e-1dfb-5ace-9966-8b7af3677790
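The effect of the `is_numeric_dtype` guard can be modelled without pandas. A sketch under loose assumptions: an in-place write behaves like Python's `int()`/`float()` (truncating floats and raising `ValueError` for a non-numeric string, roughly as NumPy does for an int64 array), and `infer_kind`, `cast` and `assign` are hypothetical stand-ins for `infer_dtype_from_scalar` and the two branches of `Series._set_value`. With the guard, a string still reaches the in-place write and raises the `ValueError` the test expects; without it, the string takes the upcast path and is stored silently.

```python
NUMERIC_KINDS = {"int", "float"}


def infer_kind(value):
    # Hypothetical, much-simplified stand-in for infer_dtype_from_scalar.
    if isinstance(value, float):
        return "float"
    if isinstance(value, int):
        return "int"
    return "object"


def cast(kind, value):
    # In-place writes coerce to the column's kind; "object" keeps values as-is.
    # int("a") raises ValueError, roughly like writing "a" into an int64 array.
    return {"int": int, "float": float}.get(kind, lambda v: v)(value)


def assign(column, loc, value, numeric_guard=True):
    """Return a new column dict {"kind": ..., "values": [...]} with value at loc."""
    kind = infer_kind(value)
    needs_upcast = kind != column["kind"]
    if numeric_guard:
        # The check under discussion: only numeric mismatches take the upcast path.
        needs_upcast = needs_upcast and kind in NUMERIC_KINDS
    if needs_upcast:
        # Upcast path (stands in for `self.loc[label] = value`): widen, then write.
        both_numeric = kind in NUMERIC_KINDS and column["kind"] in NUMERIC_KINDS
        new_kind = "float" if both_numeric else "object"
        values = [cast(new_kind, v) for v in column["values"]]
        values[loc] = cast(new_kind, value)
        return {"kind": new_kind, "values": values}
    # In-place path (stands in for `self._values[loc] = value`).
    values = list(column["values"])
    values[loc] = cast(column["kind"], value)
    return {"kind": column["kind"], "values": values}


ints = {"kind": "int", "values": [0, 0, 0]}
print(assign(ints, 2, 44.5))  # {'kind': 'float', 'values': [0.0, 0.0, 44.5]}
try:
    assign(ints, 2, "a")  # the string reaches the in-place write and is rejected
except ValueError as exc:
    print("ValueError:", exc)
print(assign(ints, 2, "a", numeric_guard=False))  # {'kind': 'object', 'values': [0, 0, 'a']}
```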

Contributor

Can you show the test?

Member Author

Contributor

Can you integrate this with the higher-level if/else? These nested if/else blocks are harder to read, even if it means duplicating some code.

Member Author

Done
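One possible shape for the flattening requested above, sketched on plain Python containers rather than on the real method: the nested dtype check becomes an `elif` of the top-level branch, at the cost of repeating the in-place write. `kind_of`, `write_in_place` and `set_value_flat` are hypothetical, only the int to float case from GH 26395 is modelled, and the surrounding `try/except KeyError` of the real `_set_value` is omitted.

```python
def kind_of(value):
    # Hypothetical, much-simplified stand-in for infer_dtype_from_scalar.
    if isinstance(value, float):
        return "float"
    if isinstance(value, int):
        return "int"
    return "object"


def write_in_place(column, loc, value):
    # Coerce to the column's existing kind, as writing into an ndarray would.
    caster = {"int": int, "float": float}.get(column["kind"], lambda v: v)
    column["values"][loc] = caster(value)


def set_value_flat(column, positions, key, value, takeable=False):
    """Toy setter: one flat if/elif/else instead of an if nested in the else."""
    needs_upcast = column["kind"] == "int" and kind_of(value) == "float"
    if takeable:
        # Positional branch: `key` is already a position. This toy does not
        # upcast here (iat/iloc were deferred to a follow-up issue).
        write_in_place(column, key, value)
    elif needs_upcast:
        # Label branch that must change the dtype: widen first, then write.
        column["kind"] = "float"
        column["values"] = [float(v) for v in column["values"]]
        write_in_place(column, positions[key], value)
    else:
        # Label branch with a compatible value: plain in-place write.
        write_in_place(column, positions[key], value)


positions = {"A": 0, "B": 1, "C": 2}
by_label = {"kind": "int", "values": [0, 0, 0]}
set_value_flat(by_label, positions, "C", 44.5)
print(by_label)  # {'kind': 'float', 'values': [0.0, 0.0, 44.5]}

by_position = {"kind": "int", "values": [0, 0, 0]}
set_value_flat(by_position, positions, 2, 44.5, takeable=True)
print(by_position)  # {'kind': 'int', 'values': [0, 0, 44]}
```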

self.loc[label] = value
return
self._values[loc] = value
except KeyError:

18 changes: 17 additions & 1 deletion pandas/tests/indexing/test_at.py
@@ -3,7 +3,7 @@
import numpy as np
import pytest

from pandas import DataFrame
from pandas import DataFrame, Series
import pandas._testing as tm


@@ -39,3 +39,19 @@ def test_at_with_duplicate_axes_requires_scalar_lookup(self):
df.at[1, ["A"]] = 1
with pytest.raises(ValueError, match=msg):
df.at[:, "A"] = 1


def test_at_assign_float_to_int_frame():

Contributor

jreback, Nov 8, 2020

Can you parametrize this on .loc for the same examples?

Contributor

Also test iat/iloc for the same cases (obviously using positions). If it becomes messy, that can be done in a follow-up.

Contributor

We would also want to test this for other dtypes, so that is definitely a follow-on; please open an issue for it.

Contributor

All that said, we might already have some tests for this (likely we do); we'd probably have to hunt a bit for them.

Member Author

Parametrizing the tests over loc, iloc and iat would be ugly. I will open an issue for the other dtypes and iloc/iat.
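A sketch of the parametrization discussed in this thread, written against a tiny stand-in rather than pandas so it runs without pandas installed. `ToyIntColumn`, `CASES` and `check_assign_float_to_int` are hypothetical; label-based setting plays the role of `.at`/`.loc` and positional setting the role of `.iat`/`.iloc`. The expected values follow the `expected` objects in the GH 26395 tests that follow (`[0, 0, 44.5]`).

```python
class ToyIntColumn:
    """Hypothetical stand-in for an int64 column that upcasts on float writes."""

    def __init__(self, labels, values):
        self.positions = {label: i for i, label in enumerate(labels)}
        self.values = [int(v) for v in values]
        self.kind = "int"

    def _write(self, loc, value):
        if self.kind == "int" and isinstance(value, float):
            # Upcast the whole column instead of truncating the new value.
            self.kind = "float"
            self.values = [float(v) for v in self.values]
        self.values[loc] = float(value) if self.kind == "float" else int(value)

    def set_by_label(self, label, value):  # plays the role of .at / .loc
        self._write(self.positions[label], value)

    def set_by_position(self, pos, value):  # plays the role of .iat / .iloc
        self._write(pos, value)


# One entry per indexer style; with pytest these would be the parametrize cases.
CASES = [("set_by_label", "C"), ("set_by_position", 2)]


def check_assign_float_to_int(setter_name, key):
    # GH 26395: writing 44.5 into an all-int column should upcast, not truncate.
    col = ToyIntColumn(["A", "B", "C"], [0, 0, 0])
    getattr(col, setter_name)(key, 44.5)
    assert col.kind == "float"
    assert col.values == [0.0, 0.0, 44.5]


for setter_name, key in CASES:
    check_assign_float_to_int(setter_name, key)
print("all cases passed")
```

With pytest available, `CASES` maps directly onto `@pytest.mark.parametrize("setter_name, key", CASES)` applied to the check function (renamed to start with `test_`), which keeps the label and positional variants in one test body.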

# GH: 26395
obj = DataFrame([0, 0, 0], index=["A", "B", "C"], columns=["D"])
obj.at["C", "D"] = 44.5
expected = DataFrame([0, 0, 44.5], index=["A", "B", "C"], columns=["D"])
tm.assert_frame_equal(obj, expected)


def test_at_assign_float_to_int_series():
# GH: 26395
obj = Series([0, 0, 0], index=["A", "B", "C"])
obj.at["C"] = 44.5
expected = Series([0, 0, 44.5], index=["A", "B", "C"])
tm.assert_series_equal(obj, expected)