BUG: Dtype was not changed when float was assignt to int column #37680

phofl · 2020-11-07T00:31:20Z

closes different type conversion results in assignments using DataFrame.loc and DataFrame.at #26395
closes BUG: .at/.iat and Series.__setitem__ do not upcast int to float #20643
tests added / passed
passes black pandas
passes git diff upstream/master -u -- "*.py" | flake8 --diff
whatsnew entry

We have to keep the get_loc statement to check whether the series._set_value function does the trick. I added the dtype conversion there to avoid implementing the same thing twice.

phofl · 2020-11-07T19:41:37Z

Had to check for numeric dtype. Could we do this in validate_numeric_casting?

jreback · 2020-11-07T23:59:04Z

pandas/core/series.py

@@ -1110,6 +1113,10 @@ def _set_value(self, label, value, takeable: bool = False):
            else:
                loc = self.index.get_loc(label)
                validate_numeric_casting(self.dtype, value)
+                dtype, _ = infer_dtype_from_scalar(value, pandas_dtype=True)
+                if not is_dtype_equal(self.dtype, dtype) and is_numeric_dtype(dtype):


why do you need to check if is numeric here? isn't just not dtype equal enough?

We have a test calling this function where the scalar is a string value, which is expected to raise a ValueError, hence the numeric check. If the test is somehow erroneous, we could delete it.

https://dev.azure.com/pandas-dev/pandas/_build/results?buildId=47080&view=logs&j=ba13898e-1dfb-5ace-9966-8b7af3677790
The failing test is in this build

can you show the test

test_set_value_resize in pandas.tests.frame.indexing.test_set_value.

Raising lines are

https://github.com/pandas-dev/pandas/blob/c5d105b1a32a02cce0e68ff964b6b30e9d682506/pandas/tests/frame/indexing/test_set_value.py#L39:L40
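To make the reason for the extra `is_numeric_dtype` check concrete, here is a toy model in plain Python rather than pandas internals. `ToyColumn`, `infer_kind`, and the `require_numeric` flag are hypothetical names introduced for illustration. The only things taken from the thread are the shape of the condition (take the upcast path only when the scalar's inferred dtype differs and is numeric) and the expectation that a string scalar still raises a ValueError.

```python
from numbers import Integral, Real

NUMERIC = ("int", "float")


def infer_kind(value):
    """Very rough stand-in for scalar dtype inference (assumption)."""
    if isinstance(value, bool):
        return "bool"
    if isinstance(value, Integral):
        return "int"
    if isinstance(value, Real):
        return "float"
    return "object"


class ToyColumn:
    """Hypothetical fixed-kind column; not a pandas class."""

    def __init__(self, values, kind):
        self.values = list(values)
        self.kind = kind

    def set_value(self, loc, value, require_numeric=True):
        kind = infer_kind(value)
        if kind != self.kind and (kind in NUMERIC or not require_numeric):
            # Upcast path: widen the column so the value fits unchanged.
            both_numeric = kind in NUMERIC and self.kind in NUMERIC
            self.kind = "float" if both_numeric else "object"
        cast = {"int": int, "float": float}.get(self.kind, lambda v: v)
        # Direct assignment: coercing to the column kind may raise ValueError.
        new_value = cast(value)
        self.values = [cast(v) for v in self.values]
        self.values[loc] = new_value
```

With the numeric check in place, `ToyColumn([1, 2], "int").set_value(0, 1.5)` widens the column to float, while assigning `"sam"` to a float column still raises ValueError. Passing `require_numeric=False` shows what dropping the check would do in this model: the string silently turns the column into an object column.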

jreback

cc @jbrockmendel

jreback · 2020-11-08T00:00:56Z

pandas/tests/indexing/test_at.py

@@ -39,3 +39,19 @@ def test_at_with_duplicate_axes_requires_scalar_lookup(self):
            df.at[1, ["A"]] = 1
        with pytest.raises(ValueError, match=msg):
            df.at[:, "A"] = 1
+
+
+def test_at_assign_float_to_int_frame():


can you parameterize this on .loc for the same examples

also testing iat/iloc for the same (obviously using the position). if it becomes messy then can do that in a followup.

also we would want to test this for other dtypes as well. so that is for sure a followon, pls open an issue for this.

all this said, we might have some tests for this already (likely we do), prob have to hunt a bit for them.

Parametrized the tests for loc. Adding iloc and iat would make for an ugly parametrization, so I will open an issue for dtypes and iloc/iat.

phofl · 2020-11-10T21:07:50Z

Found #20643, which was not fixed completely. Adjusted the code accordingly to fix this too.

phofl · 2020-11-10T22:10:49Z

cc @jreback

Two open points here:

If an integer/float is assigned to a boolean series, which dtype is expected? Currently it is converted to object.
Through dispatching to iloc when it is takeable but the dtypes are not equal in iat, we get a different error. Is this a problem?

Fixed a few tests, which were xfailing previously.

jbrockmendel · 2020-11-12T02:58:25Z

pandas/core/frame.py

-            series._values[loc] = value
-            # Note: trying to use series._set_value breaks tests in
-            #  tests.frame.indexing.test_indexing and tests.indexing.test_partial
+            self.index._engine.get_loc(index)


why are you discarding loc?

If get_loc raises a KeyError, we have to catch it here instead of in series._set_value, but we do not need loc.

why bother calling get_loc here at all if its going to be called again in series._set_value?

We want to arrive in the except clause here, when index does not exist instead of the except clause in series._set_value

> We want to arrive in the except clause here, when index does not exist instead of the except clause in series._set_value

can you elaborate on this? is it bc the except clause here does a takeable check whereas the one in _set_value does not? if so, could just add that check there?

why is the deleted comment about "tests.frame.indexing.test_indexing and tests.indexing.test_partial" no longer relevant?
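The exchange above comes down to which `except` clause handles a missing label. A minimal sketch of that control flow in plain Python follows. `ToySeries`, `ToyFrame`, the `probe` flag, and the `handled_by` attribute are hypothetical and only mimic the shape of the frame-level and series-level `_set_value` methods as described in this thread; they are not the pandas implementation.

```python
class ToySeries:
    """Hypothetical stand-in for a column; not pandas code."""

    def __init__(self, index, values):
        self.index = list(index)
        self.values = list(values)
        self.handled_by = None

    def get_loc(self, label):
        try:
            return self.index.index(label)
        except ValueError:
            raise KeyError(label) from None

    def set_value(self, label, value):
        try:
            loc = self.get_loc(label)
            self.values[loc] = value
        except KeyError:
            # Inner fallback: the series enlarges itself.
            self.handled_by = "series"
            self.index.append(label)
            self.values.append(value)


class ToyFrame:
    """Hypothetical one-column frame wrapping a ToySeries."""

    def __init__(self, index, column_values):
        self.column = ToySeries(index, column_values)
        self.handled_by = None

    def set_value(self, label, value, probe=True):
        try:
            if probe:
                # Result discarded on purpose: only the KeyError matters.
                self.column.get_loc(label)
            self.column.set_value(label, value)
        except KeyError:
            # Outer fallback: the frame decides how to enlarge.
            self.handled_by = "frame"
            self.column.index.append(label)
            self.column.values.append(value)
```

In this model, `ToyFrame(["a", "b"], [1, 2]).set_value("c", 3)` ends up in the frame-level handler, whereas the same call with `probe=False` is swallowed by the series-level handler. That is the trade-off being questioned: the probe costs a second lookup on every call but keeps the missing-label handling at the frame level.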

jbrockmendel · 2020-11-12T03:01:21Z

pandas/core/series.py

+        if is_dtype_equal(self.dtype, dtype) or isna(value):
+            self._values[loc] = value
+        else:
+            self.loc[key] = value


shouldnt validate_numeric_casting just raise here?

jbrockmendel · 2020-11-12T03:01:38Z

pandas/core/series.py

@@ -1043,7 +1046,11 @@ def _set_with_engine(self, key, value):
        # fails with AttributeError for IntervalIndex
        loc = self.index._engine.get_loc(key)
        validate_numeric_casting(self.dtype, value)
-        self._values[loc] = value
+        dtype, _ = infer_dtype_from_scalar(value)
+        if is_dtype_equal(self.dtype, dtype) or isna(value):


probably want is_valid_nat_for_dtype instead of isna

jbrockmendel · 2020-11-12T03:09:57Z

My suggestion is to start with adding in validate_numeric_casting:

    if issubclass(dtype.type, (np.integer, np.bool_)):
        if is_float(value) and np.isnan(value):
            raise ValueError("Cannot assign nan to integer series")
+        if is_float(value) and not value.is_integer():
+            raise ValueError("Cannot assign non-integer float to integer series")

That won't fix all of this, but i think is the right place to collect the relevant logic.
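As a rough illustration of the proposed check, here is a standalone sketch in plain Python. `validate_int_assignment` is a hypothetical name, and `isinstance(value, float)` stands in for the `is_float` helper. The suggestion above places the check inside `validate_numeric_casting`, which also takes the target dtype; this sketch assumes the target is already known to be an integer or boolean series.

```python
import math


def validate_int_assignment(value):
    """Toy version of the proposed check for an integer/boolean target."""
    if isinstance(value, float):
        if math.isnan(value):
            raise ValueError("Cannot assign nan to integer series")
        if not value.is_integer():
            # 1.5 would lose information if stored in an integer series.
            raise ValueError("Cannot assign non-integer float to integer series")


validate_int_assignment(3)    # plain integers pass through
validate_int_assignment(2.0)  # integral floats pass through
```

Under this sketch, `1.5` and `float("nan")` both raise ValueError, which would let a caller decide whether to upcast or to propagate the error.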

phofl · 2020-11-12T11:44:51Z

The issue was actually different; I did not recognize this a few days ago. This is only a problem if the Index contains tuples as values, because loc does not support this, so we get a KeyError. I changed this a bit now, which should make it clearer.

jbrockmendel · 2020-11-14T03:40:00Z

@phofl can you run the indexing asvs on this. we've optimized this portion of the code pretty hard and im not confident this is the right place to catch these cases

@jreback pls hold off on merging for a couple days so i can hopefully get a handle on this

jreback · 2020-11-14T04:04:10Z

yep agreed

phofl · 2020-11-15T12:57:15Z

Ran indexing asvs

      before           after         ratio
     [50ae0bfc]       [566d7692]
     <master>         <26395>   
+      8.45±0.3ms       10.2±0.4ms     1.20  indexing.NumericSeriesIndexing.time_loc_slice(<class 'pandas.core.indexes.numeric.UInt64Index'>, 'nonunique_monotonic_inc')
+      54.3±0.8μs         63.9±8μs     1.18  indexing.NonNumericSeriesIndexing.time_getitem_pos_slice('period', 'non_monotonic')
-         105±3μs       95.1±0.3μs     0.91  indexing.DataFrameNumericIndexing.time_iloc_dups
-         256±4ms          223±7ms     0.87  indexing.NumericSeriesIndexing.time_getitem_array(<class 'pandas.core.indexes.numeric.Float64Index'>, 'unique_monotonic_inc')
-         259±9ms          219±6ms     0.85  indexing.NumericSeriesIndexing.time_getitem_lists(<class 'pandas.core.indexes.numeric.Float64Index'>, 'unique_monotonic_inc')
-         253±8ms          214±8ms     0.85  indexing.NumericSeriesIndexing.time_loc_list_like(<class 'pandas.core.indexes.numeric.Float64Index'>, 'unique_monotonic_inc')
-        261±10ms          217±7ms     0.83  indexing.NumericSeriesIndexing.time_loc_array(<class 'pandas.core.indexes.numeric.Float64Index'>, 'unique_monotonic_inc')
-         211±1μs          171±2μs     0.81  indexing_engines.NumericEngineIndexing.time_get_loc((<class 'pandas._libs.index.Int8Engine'>, <class 'numpy.int8'>), 'monotonic_incr')
-         188±7ms          152±4ms     0.81  indexing.NumericSeriesIndexing.time_getitem_lists(<class 'pandas.core.indexes.numeric.Float64Index'>, 'nonunique_monotonic_inc')
-        270±10ms          216±7ms     0.80  indexing.NumericSeriesIndexing.time_getitem_list_like(<class 'pandas.core.indexes.numeric.Float64Index'>, 'unique_monotonic_inc')
-         184±2ms          146±4ms     0.79  indexing.NumericSeriesIndexing.time_getitem_array(<class 'pandas.core.indexes.numeric.Float64Index'>, 'nonunique_monotonic_inc')
-      42.5±0.9μs         26.4±1μs     0.62  indexing.NumericSeriesIndexing.time_getitem_scalar(<class 'pandas.core.indexes.numeric.Float64Index'>, 'nonunique_monotonic_inc')
-     1.89±0.05ms      1.09±0.04ms     0.58  indexing.NumericSeriesIndexing.time_getitem_list_like(<class 'pandas.core.indexes.numeric.Float64Index'>, 'nonunique_monotonic_inc')
-     1.62±0.04ms         910±40μs     0.56  indexing.NumericSeriesIndexing.time_loc_list_like(<class 'pandas.core.indexes.numeric.Float64Index'>, 'nonunique_monotonic_inc')
-        310±10μs          168±5μs     0.54  indexing.NumericSeriesIndexing.time_loc_slice(<class 'pandas.core.indexes.numeric.Float64Index'>, 'nonunique_monotonic_inc')
-         307±8μs          164±6μs     0.54  indexing.NumericSeriesIndexing.time_getitem_slice(<class 'pandas.core.indexes.numeric.Float64Index'>, 'nonunique_monotonic_inc')
-        90.1±2μs         47.7±1μs     0.53  indexing.NumericSeriesIndexing.time_loc_scalar(<class 'pandas.core.indexes.numeric.Float64Index'>, 'nonunique_monotonic_inc')

If I understood you correctly, you would have expected the opposite?
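For reading the table: to my understanding of asv's comparison output, the ratio column is the "after" timing divided by the "before" timing, so values below 1 mean the branch was faster and rows are prefixed with `-`, while values above 1 are prefixed with `+`. The short check below recomputes the ratio for three rows copied from the table (all in microseconds). The benchmark names are shortened and the `ratio` helper is only illustrative.

```python
# (before, after) timings in microseconds, copied from three rows above.
rows = {
    "time_getitem_pos_slice": (54.3, 63.9),
    "time_getitem_scalar": (42.5, 26.4),
    "time_loc_scalar": (90.1, 47.7),
}


def ratio(before, after):
    """Relative timing of the branch against master, rounded as in the table."""
    return round(after / before, 2)


ratios = {name: ratio(before, after) for name, (before, after) in rows.items()}
for name, value in ratios.items():
    verdict = "slower" if value > 1 else "faster"
    print(f"{name}: {value:.2f} ({verdict} on the branch)")
```

The recomputed values match the reported 1.18, 0.62 and 0.53. Because asv works from unrounded timings, other rows may differ from a hand calculation in the last digit.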

jreback · 2020-11-15T16:58:26Z

pandas/tests/indexing/test_coercion.py

        exp = pd.Series([1, val, 3, 4])
        self._assert_setitem_series_conversion(obj, val, exp, exp_dtype)

    @pytest.mark.parametrize(
-        "val,exp_dtype", [(np.int32(1), np.int8), (np.int16(2 ** 9), np.int16)]
+        "val,exp_dtype", [(np.int32(1), np.int32), (np.int16(2 ** 9), np.int16)]


this was downcasting before?

jreback · 2020-11-15T16:58:55Z

pandas/tests/indexing/test_coercion.py

-            (1.1, np.float64),
-            (1 + 1j, np.complex128),
+            (1, "object"),
+            (3, "object"),


so these were doing an inferred cast before?

numpy inferred cast, yes

jreback · 2020-11-15T16:59:06Z

pandas/tests/indexing/test_partial.py

@@ -47,7 +47,6 @@ def test_partial_setting(self):
        with pytest.raises(IndexError, match=msg):
            s.iloc[3] = 5.0

-        msg = "index 3 is out of bounds for axis 0 with size 3"


what is the message now

Messages for iat and iloc are now the same, hence the deletion of the redefinition

jreback · 2020-11-15T17:00:06Z

pandas/core/series.py

            else:
                loc = self.index.get_loc(label)
                validate_numeric_casting(self.dtype, value)
+                if not is_dtype_equal(self.dtype, dtype) and is_numeric_dtype(dtype):


can you integrate this with the higher level if/else, these nested if/else are harder to read, even if it means duplicating some code

jreback · 2020-11-15T17:00:52Z

pandas/core/series.py

+        else:
+            # This only raises when index contains tuples
+            try:
+                self.loc[key] = value


this is a very strange change, why is it needed?

Yeah, that is pretty ugly, but a loc limitation unfortunately. If the index is a regular index where values are tuples, loc does not work.

Example:

    from pandas import Series
    obj = Series([1, 2], index=[(0, 1), (1, 2)])
    obj[(0, 1)] = "test"

This runs in the loc case and loc raises

KeyError: "None of [Int64Index([0, 1], dtype='int64')] are in the [index]"

Tuples are interpreted as components of MultiIndex

jbrockmendel · 2020-11-19T22:58:38Z

doc/source/whatsnew/v1.2.0.rst

@@ -584,6 +584,7 @@ Indexing
 - Bug in :meth:`DataFrame.loc` returned requested key plus missing values when ``loc`` was applied to single level from :class:`MultiIndex` (:issue:`27104`)
 - Bug in indexing on a :class:`Series` or :class:`DataFrame` with a :class:`CategoricalIndex` using a listlike indexer containing NA values (:issue:`37722`)
 - Bug in :meth:`DataFrame.xs` ignored ``droplevel=False`` for columns (:issue:`19056`)
+- Bug in :meth:`DataFrame.at` and :meth:`Series.at` did not adjust dtype when float was assigned to integer column (:issue:`26395`, :issue:`20643`)


did not adjust -> not adjusting

jbrockmendel · 2020-11-19T23:05:12Z

pandas/core/series.py

+        else:
+            # This only raises when index contains tuples
+            try:
+                self.loc[key] = value


_set_with_engine is a fastpath, really shouldn't be going through self.loc

the whole infer_dtype_from_scalar stuff seems like it should be handled as part of validate_numeric_casting

the issue this PR is addressing is only for when we are indexing an entire column, right?

github-actions · 2020-12-20T00:15:22Z

This pull request is stale because it has been open for thirty days with no activity. Please update or respond to this comment if you're still interested in working on this.

jbrockmendel · 2021-01-03T01:26:03Z

@phofl can you merge master (even if this is on the back-burner, lets prevent it from drifting too far)

phofl · 2021-01-03T18:52:07Z

Yeah have to get back to this...

jbrockmendel · 2021-01-31T15:42:30Z

@phofl #39478 does something similar, same effect on test_setitem_series_bool. Can that approach be used here too?

simonjayhawkins · 2021-06-08T18:55:40Z

@phofl what's the status here?

simonjayhawkins · 2021-06-16T13:46:26Z

@phofl closing as stale. reopen when ready.

phofl added 6 commits November 7, 2020 00:54

Start fixing dtype setting

eafd9ea

Fix dtype setting for incompatible dtypes

c95dcc9

Add whatsnew

3788c4d

Fix imports

27d9e35

Change import order

23cc926

Add check for numeric dtype

cee9fc2

jreback requested changes Nov 7, 2020

jreback added the "Dtype Conversions" (unexpected or buggy dtype conversions) and "Indexing" (related to indexing on series/frames, not to indexes themselves) labels Nov 8, 2020

jreback requested changes Nov 8, 2020

phofl changed the title ~~BUG: Dtype was not changed when float was assignt do int column~~ BUG: Dtype was not changed when float was assignt to int column Nov 8, 2020

phofl added 2 commits November 8, 2020 01:14

Merge branch 'master' of https://github.com/pandas-dev/pandas into 26395

97e6da8

Conflicts: pandas/tests/indexing/test_at.py

Parametrize tests

d1bc870

phofl mentioned this pull request Nov 8, 2020

TST: Add tests for dtype changes when assigning values via indexing #37692

Closed

phofl added 5 commits November 9, 2020 23:20

Merge branch 'master' of https://github.com/pandas-dev/pandas into 26395

7e16465

Conflicts: doc/source/whatsnew/v1.2.0.rst

Improve dtype casting for at and iat

4d91e66

Adjust whatsnew

d608474

Merge branch 'master' of https://github.com/pandas-dev/pandas into 26395

1d332e8

Conflicts: pandas/tests/indexing/test_at.py

Run black

760c6a1

phofl added 2 commits November 10, 2020 23:01

Fix tests with new dtype conversion

870e927

Fix remaining test

fa8b331

Change if condition

903be31

jbrockmendel reviewed Nov 12, 2020

Fix if condition

566d769

jreback requested changes Nov 15, 2020

phofl added 2 commits November 15, 2020 18:40

Move if condition

916faa4

Merge branch 'master' of https://github.com/pandas-dev/pandas into 26395

b0699a9

Conflicts: doc/source/whatsnew/v1.2.0.rst

jbrockmendel reviewed Nov 19, 2020

github-actions bot added the Stale label Dec 20, 2020

phofl added 2 commits January 3, 2021 19:51

Merge branch 'master' of https://github.com/pandas-dev/pandas into 26395

33c8424

Conflicts: doc/source/whatsnew/v1.2.0.rst, pandas/core/frame.py

Move whatsnew

6e7ee82

simonjayhawkins closed this Jun 16, 2021

phofl deleted the 26395 branch April 27, 2023 19:52
