-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
BUG: Dtype was not changed when float was assignt to int column #37680
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Had to check for numeric dtype, we could do this in |
pandas/core/series.py
Outdated
@@ -1110,6 +1113,10 @@ def _set_value(self, label, value, takeable: bool = False): | |||
else: | |||
loc = self.index.get_loc(label) | |||
validate_numeric_casting(self.dtype, value) | |||
dtype, _ = infer_dtype_from_scalar(value, pandas_dtype=True) | |||
if not is_dtype_equal(self.dtype, dtype) and is_numeric_dtype(dtype): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why do you need to check if is numeric here? isn't just not dtype equal enough?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have a test calling this function where scalar is a string value, which is expected to raise an ValueError, hence this check there. If the test is somehow erroneous, we could delete it.
https://dev.azure.com/pandas-dev/pandas/_build/results?buildId=47080&view=logs&j=ba13898e-1dfb-5ace-9966-8b7af3677790
The failing test is in this build
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you show the test
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
test_set_value_resize
in <pandas.tests.frame.indexing.test_set_value
.
Raising lines are
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pandas/tests/indexing/test_at.py
Outdated
@@ -39,3 +39,19 @@ def test_at_with_duplicate_axes_requires_scalar_lookup(self): | |||
df.at[1, ["A"]] = 1 | |||
with pytest.raises(ValueError, match=msg): | |||
df.at[:, "A"] = 1 | |||
|
|||
|
|||
def test_at_assign_float_to_int_frame(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you parameterize this on .loc for the same examples
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also testing iat/iloc for the same (obviously using the position). if its becomes messy then can do that in a followup.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also we would want to test this for other dtypes as well. so that is for sure a followon, pls open an issue for this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
all this said, we might have some tests for this already (likely we do), prob have to hunt a bit for them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Parametrized the tests for loc, iloc and iat would be an ugly parametrization. I will open an issue for dtypes and iloc/iat
� Conflicts: � pandas/tests/indexing/test_at.py
� Conflicts: � doc/source/whatsnew/v1.2.0.rst
� Conflicts: � pandas/tests/indexing/test_at.py
Found #20643 which was not fixed completly. Adjusted the code accordingly to fix this too. |
cc @jreback Two open points here:
Fixed a few tests, which were xfailing previously. |
series._values[loc] = value | ||
# Note: trying to use series._set_value breaks tests in | ||
# tests.frame.indexing.test_indexing and tests.indexing.test_partial | ||
self.index._engine.get_loc(index) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why are you discarding loc
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If get_loc raises a KeyError
we have to catch this here insted of in series._set_value
but we do not need loc
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why bother calling get_loc here at all if its going to be called again in series._set_value?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We want to arrive in the except clause here, when index does not exist instead of the except clause in series._set_value
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We want to arrive in the except clause here, when index does not exist instead of the except clause in series._set_value
can you elaborate on this? is it bc the except clause here does a takeable check whereas the one in _set_value does not? if so, could just add that check there?
why is the deleted comment about "tests.frame.indexing.test_indexing and tests.indexing.test_partial" no longer relevant?
pandas/core/series.py
Outdated
if is_dtype_equal(self.dtype, dtype) or isna(value): | ||
self._values[loc] = value | ||
else: | ||
self.loc[key] = value |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shouldnt validate_numeric_casting just raise here?
pandas/core/series.py
Outdated
@@ -1043,7 +1046,11 @@ def _set_with_engine(self, key, value): | |||
# fails with AttributeError for IntervalIndex | |||
loc = self.index._engine.get_loc(key) | |||
validate_numeric_casting(self.dtype, value) | |||
self._values[loc] = value | |||
dtype, _ = infer_dtype_from_scalar(value) | |||
if is_dtype_equal(self.dtype, dtype) or isna(value): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
probably want is_valid_nat_for_dtype instead of isna
My suggestion is to start with adding in validate_numeric_casting:
That won't fix all of this, but i think is the right place to collect the relevant logic. |
The issue was actually different. Did not recognize this a few days ago. This is only a problem, if the Index contains tuples as values, loc does not support this. So we get a |
yep agreed |
Ran indexing asvs
If I understood you correctly, you would have expected the opposite? |
exp = pd.Series([1, val, 3, 4]) | ||
self._assert_setitem_series_conversion(obj, val, exp, exp_dtype) | ||
|
||
@pytest.mark.parametrize( | ||
"val,exp_dtype", [(np.int32(1), np.int8), (np.int16(2 ** 9), np.int16)] | ||
"val,exp_dtype", [(np.int32(1), np.int32), (np.int16(2 ** 9), np.int16)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this was downcasting before?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes
(1.1, np.float64), | ||
(1 + 1j, np.complex128), | ||
(1, "object"), | ||
(3, "object"), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so these were doing an inferred cast before?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
numpy inferred cast, yes
@@ -47,7 +47,6 @@ def test_partial_setting(self): | |||
with pytest.raises(IndexError, match=msg): | |||
s.iloc[3] = 5.0 | |||
|
|||
msg = "index 3 is out of bounds for axis 0 with size 3" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what is the message now
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Messages for iat
and iloc
are now the same, hence the deletion of the redefinition
pandas/core/series.py
Outdated
else: | ||
loc = self.index.get_loc(label) | ||
validate_numeric_casting(self.dtype, value) | ||
if not is_dtype_equal(self.dtype, dtype) and is_numeric_dtype(dtype): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you integrate this with the higher level if/else, these nested if/else are harder to read, even if it means duplicating some code
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
else: | ||
# This only raises when index contains tuples | ||
try: | ||
self.loc[key] = value |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is a very strange change, why is it needed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, that is pretty ugly, but a loc
limitation unfortunately. If the index is a regular index where values are tuples, loc does not work.
Example:
obj = Series([1, 2], index=[(0,1), (1,2)])
obj[(0,1)] = "test"
This runs in the loc case and loc raises
KeyError: "None of [Int64Index([0, 1], dtype='int64')] are in the [index]"
Tuples are interpreted as components of MultiIndex
� Conflicts: � doc/source/whatsnew/v1.2.0.rst
doc/source/whatsnew/v1.2.0.rst
Outdated
@@ -584,6 +584,7 @@ Indexing | |||
- Bug in :meth:`DataFrame.loc` returned requested key plus missing values when ``loc`` was applied to single level from :class:`MultiIndex` (:issue:`27104`) | |||
- Bug in indexing on a :class:`Series` or :class:`DataFrame` with a :class:`CategoricalIndex` using a listlike indexer containing NA values (:issue:`37722`) | |||
- Bug in :meth:`DataFrame.xs` ignored ``droplevel=False`` for columns (:issue:`19056`) | |||
- Bug in :meth:`DataFrame.at` and :meth:`Series.at` did not adjust dtype when float was assigned to integer column (:issue:`26395`, :issue:`20643`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
did not adjust -> not adjusting
else: | ||
# This only raises when index contains tuples | ||
try: | ||
self.loc[key] = value |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
_set_with_engine is a fastpath, really shouldn't be going through self.loc
the whole infer_dtype_from_scalar stuff seems like it should be handled as part of validate_numeric_casting
the issue this PR is addressing is only for when we are indexing an entire column, right?
This pull request is stale because it has been open for thirty days with no activity. Please update or respond to this comment if you're still interested in working on this. |
@phofl can you merge master (even if this is on the back-burner, lets prevent it from drifting too far) |
� Conflicts: � doc/source/whatsnew/v1.2.0.rst � pandas/core/frame.py
Yeah have to get back to this... |
@phofl what's the status here? |
@phofl closing as stale. reopen when ready. |
black pandas
git diff upstream/master -u -- "*.py" | flake8 --diff
We have to keep the get_loc statement to check, if the series._set_value function does the trick. I added the dtype conversion there, to avoid implementing the same thing twice