BUG: Fix bug in loc setitem changing the dtype when condition is False #37672

phofl · 2020-11-06T20:49:53Z

closes BUG: Loc changes dtype when condition is completly False #37550
tests added / passed
passes black pandas
passes git diff upstream/master -u -- "*.py" | flake8 --diff
whatsnew entry

This fixes the bug, but I am not sure if the solution is good enough.


jreback · 2020-11-08T03:06:34Z

pandas/core/internals/blocks.py

            else:
                # e.g. we are bool dtype and value is nan
                # TODO: watch out for case with listlike value and scalar/empty indexer
+                if is_list_like(value) and is_empty_indexer(indexer, np.array(value)):


can you put this as another elif case

jbrockmendel · 2020-11-08T23:04:16Z

pandas/core/internals/blocks.py

@@ -920,10 +920,11 @@ def setitem(self, indexer, value):

            elif lib.is_scalar(value) and not isna(value):
                dtype, _ = infer_dtype_from_scalar(value, pandas_dtype=True)
-
+            elif is_list_like(value) and is_empty_indexer(indexer, np.array(value)):


Any way to avoid the np.array call? I think that makes a copy if we have, e.g., a list.

I don't think so, is_empty_indexer requires an ndarray, while value may be a list.

The test is for loc.__setitem__. Is that (or, I guess, iloc.__setitem__) the only way we can get here? It may make sense to do this in setitem_with_indexer.

Yeah, I think that would be possible.
The spot would be in

pandas/pandas/core/indexing.py

Line 1567 in 524fc9c

for i, idx in enumerate(indexer):

I ran the tests in indexing, frame/indexing and series/indexing locally, and this seems to work. Let me know if we should move this.

you can use np.asarray, also can we check len of the indexer here first to short-cut?

Thx, used asarray. The indexer is a tuple containing an empty list, so unfortunately len does not work here.
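A small NumPy sketch of the two points in this thread, under the assumption that the indexer really is shaped as described above (the `([],)` value is written by hand, not captured from pandas): `np.asarray` avoids a copy only when `value` is already an ndarray (a list has to be converted either way), and a `len()` check on the indexer tuple cannot short-cut, because the tuple itself is never empty.

```python
import numpy as np

values = np.array([10, 20, 30])

# np.array copies by default, even when handed an existing ndarray
copied = np.array(values)
# np.asarray hands back the very same object when no conversion is needed
reused = np.asarray(values)

# a plain list has to be converted either way, so both calls allocate
from_list = np.asarray([10, 20, 30])

# the indexer described above: a tuple wrapping an empty list
indexer = ([],)
tuple_len = len(indexer)  # 1, so a "len(indexer) == 0" shortcut never fires
has_empty_component = any(len(ix) == 0 for ix in indexer)  # True
```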

jbrockmendel · 2020-11-08T23:06:59Z

pandas/tests/frame/indexing/test_setitem.py

@@ -298,6 +298,13 @@ def test_iloc_setitem_bool_indexer(self, klass):
        expected = DataFrame({"flag": ["x", "y", "z"], "value": [2, 3, 4]})
        tm.assert_frame_equal(df, expected)

+    def test_setitem_only_false_indexer_dtype_changed(self):


can you put this below in the class TestDataFrameSetItemBooleanMask?

which part(s) of the line df.loc[[False], ["b"]] = 10 - df["c"] are relevant to the bugfix here? e.g. does it matter that the right hand side is a Series? Would this be testing the right thing if it were series.loc[[False]]?

df.loc[[False], ["b"]] = 10 did not trigger the error. We need a Series on the rhs.

Moved the test


jbrockmendel · 2020-11-14T03:44:04Z

doc/source/whatsnew/v1.2.0.rst

@@ -469,6 +469,7 @@ Indexing
 - Bug in :meth:`Index.where` incorrectly casting numeric values to strings (:issue:`37591`)
 - Bug in :meth:`Series.loc` and :meth:`DataFrame.loc` raises when numeric label was given for object :class:`Index` although label was in :class:`Index` (:issue:`26491`)
 - Bug in :meth:`DataFrame.loc` returned requested key plus missing values when ``loc`` was applied to single level from :class:`MultiIndex` (:issue:`27104`)
+- Bug in :meth:`DataFrame.loc.__setitem__` changed dtype when indexer was completely ``False`` (:issue:`37550`)


changed -> changing

these should usually be participles

Thx, will follow this in the future

jbrockmendel · 2020-11-14T03:49:51Z

pandas/tests/frame/indexing/test_setitem.py

@@ -339,3 +339,13 @@ def test_setitem_boolean_mask(self, mask_type, float_frame):
        expected = df.copy()
        expected.values[np.array(mask)] = np.nan
        tm.assert_frame_equal(result, expected)
+
+    def test_setitem_only_false_indexer_dtype_changed(self):


can you comment and/or edit the test name to make clear

is iloc affected?

does it matter that you have ["b"] instead of, say, "b" or ":"?

comment that the bug is for the value being set being a Series obj

is Series affected?

Series is not affected. It only goes wrong with ["b"].

iloc is actually also affected, but it probably has to be fixed differently. In the loc case we get an empty indexer; in the iloc case we get an indexer like ([False],), which is not empty according to is_empty_indexer. We could either adjust is_empty_indexer or take a different approach here.
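To make the loc/iloc difference concrete, here is a sketch with two hypothetical helpers. `looks_empty` is a simplified stand-in for pandas' internal `is_empty_indexer` (it is not the real implementation): only a zero-length ndarray component counts as empty. `looks_empty_bool_aware` shows what "adjusting is_empty_indexer" could look like, by also treating an all-False boolean mask as empty.

```python
import numpy as np


def looks_empty(indexer):
    """Hypothetical, simplified stand-in for pandas' is_empty_indexer.

    Only a zero-length ndarray component counts as empty.
    """
    if not isinstance(indexer, tuple):
        indexer = (indexer,)
    return any(isinstance(ix, np.ndarray) and len(ix) == 0 for ix in indexer)


def looks_empty_bool_aware(indexer):
    """Hypothetical variant that also treats an all-False boolean mask as empty."""
    if not isinstance(indexer, tuple):
        indexer = (indexer,)
    for ix in indexer:
        arr = np.asarray(ix)
        if arr.ndim != 1:
            continue  # scalars, slices and 2D objects are not row masks here
        if len(arr) == 0 or (arr.dtype == np.bool_ and not arr.any()):
            return True
    return False


loc_style = (np.array([], dtype=np.intp),)  # shape of indexer described for loc
iloc_style = ([False],)                      # shape of indexer described for iloc
```

A real change would also have to keep masks with at least one True entry working, which the last test line below checks for the sketch.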


jreback · 2020-11-15T17:51:16Z

pandas/core/internals/blocks.py

@@ -920,10 +920,11 @@ def setitem(self, indexer, value):

            elif lib.is_scalar(value) and not isna(value):
                dtype, _ = infer_dtype_from_scalar(value, pandas_dtype=True)
-
+            elif is_list_like(value) and is_empty_indexer(indexer, np.array(value)):


you can use np.asarray, also can we check len of the indexer here first to short-cut?

jbrockmendel · 2020-11-16T17:19:56Z

pandas/tests/frame/indexing/test_setitem.py

+        expected = DataFrame({"a": ["a"], "b": [1], "c": [1]})
+        tm.assert_frame_equal(df, expected)
+
+        df.loc[[False], ["b"]] = 10 - 1


you can simplify to 9

jbrockmendel · 2020-11-16T17:20:40Z

pandas/tests/frame/indexing/test_setitem.py

+        # GH#37550
+        # Dtype is only changed when value to set is a Series
+        df = DataFrame({"a": ["a"], "b": [1], "c": [1]})
+        df.loc[[False], ["b"]] = 10 - df["c"]


can you make clear either in the test name and/or comment that/whether the type of indexer for ["b"] is relevant

Added comment and parametrized test
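The parametrized test idea can be approximated in plain Python: loop over several containers for the all-False row mask and check that neither a Series nor a scalar right-hand side changes the frame. This is a sketch assuming a recent pandas release that contains this fix; the helper name `check_all_false_setitem_keeps_dtype` is hypothetical, and the test actually committed in the PR may be structured differently.

```python
import numpy as np
import pandas as pd


def check_all_false_setitem_keeps_dtype(box):
    """Hypothetical helper mirroring the GH#37550 test idea for one mask container."""
    df = pd.DataFrame({"a": ["a"], "b": [1], "c": [1]})
    expected = df.copy()
    mask = box([False])  # all-False row mask, wrapped in the container under test

    # Series on the right-hand side: the case that used to change the dtype of "b"
    df.loc[mask, ["b"]] = 10 - df["c"]
    assert df.equals(expected)

    # scalar right-hand side: did not trigger the bug, checked for completeness
    df.loc[mask, ["b"]] = 9
    assert df.equals(expected)
    return df


results = {
    box.__name__: check_all_false_setitem_keeps_dtype(box)
    for box in (list, np.array, pd.Series)
}
```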

jbrockmendel · 2020-11-16T17:23:26Z

pandas/core/internals/blocks.py

@@ -928,10 +928,11 @@ def setitem(self, indexer, value):

            elif lib.is_scalar(value) and not isna(value):
                dtype, _ = infer_dtype_from_scalar(value, pandas_dtype=True)
-
+            elif is_list_like(value) and is_empty_indexer(indexer, np.asarray(value)):


I'm not sure this is the right place to catch this, since I think we are specifically interested in all-False boolean indexers coming through loc. Does this get here via setitem_with_indexer, and if so, can you point out specifically which branch(es)?

Through _setitem_with_indexer_split_path -> _setitem_with_indexer_2d_value -> _setitem_single_column

As mentioned in #37672 (comment), we could do that in setitem_with_indexer if that is better.

In _setitem_with_indexer_split_path (L1686-1688) we have a no-op block for all-False boolean masks. Could we move it up to before the 2D check? Would that do the trick?

One possible solution is moving up the lplane_indexer == 0 condition and removing the value part. This is probably the shortest fix and does not seem to break any tests. I don't know if it breaks untested behavior, but if the indexer has len 0, we probably do not want to set any values anyway.
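The ordering proposed here (check for an empty selection before any dtype handling) can be sketched with plain NumPy. `set_rows` is a hypothetical helper, not pandas code, and `np.result_type` stands in for pandas' own dtype-inference logic.

```python
import numpy as np


def set_rows(arr, row_positions, value):
    """Hypothetical setter showing the 'check for an empty selection first' ordering."""
    row_positions = np.asarray(row_positions, dtype=np.intp)
    if len(row_positions) == 0:
        # all-False mask: nothing will be written, so skip dtype coercion entirely
        return arr
    value = np.asarray(value)
    # upcast only when the incoming value needs a wider dtype (e.g. NaN into ints)
    target = np.result_type(arr, value)
    if target != arr.dtype:
        arr = arr.astype(target)
    arr[row_positions] = value
    return arr


ints = np.array([1, 2, 3])
untouched = set_rows(ints, [], [np.nan])  # empty selection: same object, dtype preserved
upcast = set_rows(ints, [0], [np.nan])    # real write: ints must become floats
```

Without the early return, the upcast would run even when nothing is written, which mirrors the symptom reported in #37550.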


jbrockmendel · 2020-11-17T22:42:04Z

pandas/tests/frame/indexing/test_setitem.py

+        expected = DataFrame({"a": ["a"], "b": [1], "c": [1]})
+        tm.assert_frame_equal(df, expected)
+
+        df.loc[[False], ["b"]] = 9


Should [False] be func(value) as above? If not, this might belong in a separate test.

jreback · 2021-05-21T17:36:59Z

@phofl can you rebase? I will look again.

phofl · 2021-05-23T22:31:07Z

merged master

jreback · 2021-05-26T12:43:40Z

@jbrockmendel ok here?

jreback · 2021-05-31T15:00:59Z

@phofl can you update

jbrockmendel · 2021-05-31T15:20:17Z

Looks like the ser[[]] = [1, 2, 3] thing hasn't been addressed.
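For comparison only, this is how plain NumPy treats assigning into an empty selection: a value that broadcasts (length 0 or 1) is accepted as a no-op, while a longer list raises `ValueError`. It says nothing definitive about what pandas does for `ser[[]] = [1, 2, 3]`.

```python
import numpy as np

arr = np.array([1, 2, 3])
empty = np.array([], dtype=np.intp)  # integer-typed, so NumPy accepts it as an index

arr[empty] = []   # zero values into zero slots: accepted
arr[empty] = [7]  # a length-1 value broadcasts to length 0: also accepted

try:
    arr[empty] = [1, 2, 3]  # shape (3,) cannot broadcast to the (0,) selection
    mismatch_raised = False
except ValueError:
    mismatch_raised = True
```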

jreback · 2021-05-31T15:32:56Z

OK, let's move this off the milestone for now; we can always do it after the RC.

jreback · 2021-10-04T00:14:12Z

@phofl status of this (merge master)

jreback · 2021-11-28T21:06:57Z

@phofl worth fixing up?

phofl · 2021-11-28T21:11:58Z

Yeah definitely.

I hope that I can get back to this Friday or Friday next week.

phofl · 2021-12-10T15:04:11Z

I moved the fix for the all False case. This fixes an inconsistency in _align_series and covers the all False indexer case.

I could not figure out the empty-list case. The values look exactly the same as in the all-False case. We will probably have to fix this someplace else, but I'm not sure right now.

I would propose opening an issue about that and going from there.

jreback · 2021-12-10T17:43:25Z

lgtm @jbrockmendel if any comments

jbrockmendel · 2021-12-10T18:06:04Z

will take a look this afternoon. (got booster yesterday, moving slow today)

jbrockmendel · 2021-12-11T23:16:11Z

pandas/core/indexing.py

@@ -2058,6 +2058,8 @@ def ravel(i):
            # we have a frame, with multiple indexers on both axes; and a
            # series, so need to broadcast (see GH5206)
            if sum_aligners == self.ndim and all(is_sequence(_) for _ in indexer):
+                if is_empty_indexer(indexer[0], ser._values):
+                    return ser._values.copy()


Evidently it works, but this seems like a weird place to handle this. Is my intuition wrong here?

This ensures that an indexer like (np.array([]), np.array([1])) is handled the same as (np.array([]), 1), hence I handled it here.
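A plain NumPy illustration of the equivalence described here (not the `_align_series` code itself): with an empty row indexer, pairing it with a column array via `np.ix_` or with a scalar column both select zero elements, so any assignment through either form is a no-op. The 2 x 3 array is an arbitrary stand-in for a block of values.

```python
import numpy as np

frame = np.arange(6).reshape(2, 3)  # stand-in for a 2 x 3 block of values
before = frame.copy()
rows = np.array([], dtype=np.intp)  # empty row indexer, as after an all-False mask

scalar_col = frame[rows, 1]                     # (array([]), 1)
array_col = frame[np.ix_(rows, np.array([1]))]  # (array([]), array([1])) as an outer index

# Writing through either form touches nothing and keeps the integer dtype
frame[rows, 1] = 99
frame[np.ix_(rows, np.array([1]))] = 99
```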

jbrockmendel · 2021-12-11T23:16:36Z

Does this relate to #43632?

phofl · 2021-12-11T23:19:41Z

I don't think so. By the time we reach the setitem logic, the all-False case is an empty indexer, not a boolean one.
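The distinction can be sketched with NumPy's own mask-to-position conversion: an all-False boolean mask becomes an empty integer array, and nothing in that result still marks it as boolean. Using `np.flatnonzero` for this step is an assumption for illustration; pandas' loc path has its own conversion code.

```python
import numpy as np

mixed_mask = np.array([True, False, True])
all_false_mask = np.array([False])

mixed_positions = np.flatnonzero(mixed_mask)          # positions 0 and 2
all_false_positions = np.flatnonzero(all_false_mask)  # no positions at all
```

This is why a check keyed on emptiness, rather than on the indexer being boolean, catches the loc case.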

jbrockmendel · 2021-12-13T16:03:03Z

pandas/tests/frame/indexing/test_setitem.py

+    def test_setitem_loc_empty_indexer_raises_with_non_empty_value(self, box):
+        # GH#37672
+        df = DataFrame({"a": ["a"], "b": [1], "c": [1]})
+        if box == Series:


Nitpick: "box is Series"; I can change this in my next CLN branch.

jbrockmendel

LGTM

jreback · 2021-12-13T16:53:45Z

thanks @phofl

phofl added 3 commits November 4, 2020 00:06

Fix bug in setitem when indexer was only false changed dtype

9df9c89

Merge branch 'master' of https://github.com/pandas-dev/pandas into 37550

6e0ec24

# Conflicts: # pandas/tests/frame/indexing/test_setitem.py

Add space

26b7dae

phofl added the Indexing label (Related to indexing on series/frames, not to indexes themselves) Nov 6, 2020

rhshadrach added the Bug label Nov 6, 2020

jreback requested changes Nov 8, 2020


Move condition to elif

11d27b1

jbrockmendel reviewed Nov 8, 2020


phofl added 3 commits November 9, 2020 00:15

Move test

a527342

Merge branch 'master' of https://github.com/pandas-dev/pandas into 37550

722cf9d

# Conflicts: # doc/source/whatsnew/v1.2.0.rst

Adjust test

345b3c1

jreback reviewed Nov 9, 2020


Merge branch 'master' of https://github.com/pandas-dev/pandas into 37550

8c57c09

# Conflicts: # doc/source/whatsnew/v1.2.0.rst

jbrockmendel reviewed Nov 14, 2020


phofl added 3 commits November 14, 2020 13:17

Fix whatsnew

3b92d31

Merge branch 'master' of https://github.com/pandas-dev/pandas into 37550

97203de

# Conflicts: # doc/source/whatsnew/v1.2.0.rst

Make test more clear

2f7bf13

jreback requested changes Nov 15, 2020


Use asarray

eb3b204

jbrockmendel reviewed Nov 16, 2020


phofl added 4 commits November 16, 2020 21:43

Parametrize test

9d59de4

Move if condition

b02e629

Merge branch 'master' of https://github.com/pandas-dev/pandas into 37550

b27c825

# Conflicts: # doc/source/whatsnew/v1.2.0.rst # pandas/core/indexing.py

Remove import

8edc7d0

jbrockmendel reviewed Nov 17, 2020


Merge branch 'master' of https://github.com/pandas-dev/pandas into 37550

6871ae9

jreback added this to the 1.3 milestone May 24, 2021

jreback removed this from the 1.3 milestone May 31, 2021

phofl added 3 commits December 10, 2021 15:13

Merge remote-tracking branch 'upstream/master' into 37550

56f4f76

# Conflicts: # doc/source/whatsnew/v1.3.0.rst # pandas/core/indexing.py

Move check

d334c7a

Split tests

f12ef06

Merge remote-tracking branch 'upstream/master' into 37550

131d168

jreback added this to the 1.4 milestone Dec 10, 2021

jreback approved these changes Dec 10, 2021


jbrockmendel reviewed Dec 11, 2021


jbrockmendel reviewed Dec 13, 2021


jbrockmendel approved these changes Dec 13, 2021


jreback merged commit 776857a into pandas-dev:master Dec 13, 2021

simonjayhawkins mentioned this pull request Feb 7, 2022

BUG: setting with loc breaks in 1.4.0 when indexing using an all-False boolean series #45778

Closed


phofl deleted the 37550 branch April 27, 2023 19:52
