REF: mask values in loc.__setitem__ with bool indexer #45501

jbrockmendel · 2022-01-20T18:09:48Z

closes #xxxx
tests added / passed
Ensure all linting tests pass, see here for how to run them
whatsnew entry

The motivation here is the 1-line change in Block.setitem, which will enable us to share Block.setitem with EABackedBlock.setitem.

cc @phofl thoughts on how to make this prettier or more robust?

phofl · 2022-01-20T18:51:53Z

pandas/core/indexing.py

+        # mask = np.array([True, False, True, False, True])
+        # ser[mask] = ser * 2  # <- works
+        # ser.loc[mask] = ser * 2  # <- works
+        # ser.iloc[mask] = ser * 2  # <- raises


loc and setitem align, while iloc does not. Hence this should not work for iloc. I think this was changed for 1.2 and was intentional.
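A minimal sketch of the distinction, reusing the mask from the diff comment. The Series contents and the variable names are assumptions for illustration. The iloc call is wrapped in a broad try/except because the diff comment only says it raises, without naming the exception.

```python
import numpy as np
import pandas as pd

ser = pd.Series([1, 2, 3, 4, 5])
mask = np.array([True, False, True, False, True])

# __setitem__ and .loc align the right-hand side on the index first,
# so only the labels selected by the mask are written.
via_setitem = ser.copy()
via_setitem[mask] = ser * 2

via_loc = ser.copy()
via_loc.loc[mask] = ser * 2

print(via_loc.tolist())  # [2, 2, 6, 4, 10]

# .iloc is purely positional, so five values are offered for three masked slots.
via_iloc = ser.copy()
try:
    via_iloc.iloc[mask] = ser * 2
    iloc_raised = False
except Exception as err:  # exception type not stated in the thread
    iloc_raised = True
    print(type(err).__name__)
```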

phofl · 2022-01-20T19:06:43Z

pandas/core/indexing.py

+                    if ndim == 1:
+                        value = value[pi]
+                        indexer = (newkey, icols)
+                        # TODO: is it sketchy that icols is an ndarray and value


I think this is ok, something like

    df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})
    df.loc[[False, True, True], ["a"]] = pd.Series([10, 20, 30])

is valid and runs through there

I guess. It still seems weird, as the analogous behavior with an ndarray doesn't work:

    arr = np.random.randn(3, 2)
    values = np.array([1, 2, 3])
    arr[:, 0] = values  # <- works
    arr[:, [0]] = values  # <- raises

This is odd yes
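For reference, a runnable version of the two snippets in this thread. The zero-filled array and the explicit reshape at the end are additions for illustration. The mismatch is between a `(3,)` value and a `(3, 1)` selection, which NumPy will not broadcast, whereas pandas aligns the 1-D Series on the row labels.

```python
import numpy as np
import pandas as pd

arr = np.zeros((3, 2))
values = np.array([1.0, 2.0, 3.0])

arr[:, 0] = values  # 1-D value into a 1-D selection: works

try:
    arr[:, [0]] = values  # selection has shape (3, 1); (3,) does not broadcast to it
    numpy_raised = False
except ValueError:
    numpy_raised = True

arr[:, [0]] = values[:, None]  # reshaping the value to (3, 1) is accepted

# pandas accepts the 1-D Series for a list-like column key and aligns on row labels
df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})
df.loc[[False, True, True], ["a"]] = pd.Series([10, 20, 30])
```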

phofl · 2022-01-20T19:09:55Z

pandas/core/indexing.py

+
+                    elif ndim == 2 and value.shape[1] == 1:
+                        if isinstance(value, ABCDataFrame):
+                            value = value.iloc[newkey]


This causes bugs:

    df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})
    df.loc[[False, False, True], ["a"]] = pd.DataFrame({"a": [10, 20, 30]}, index=[2, 1, 0])

This returns

        a  b
    0   1  4
    1   2  5
    2  10  6

on main, which is correct and

         a    b
    0  1.0  4.0
    1  2.0  5.0
    2  NaN  6.0

on this branch, which is wrong

This has to be aligned before applying iloc
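A small sketch of why the order matters, using only public `reindex` and `iloc` calls rather than the internal code path. The names `target_index`, `positional`, `aligned`, and `realigned` are made up for illustration.

```python
import numpy as np
import pandas as pd

target_index = pd.RangeIndex(3)
mask = np.array([False, False, True])
value = pd.DataFrame({"a": [10, 20, 30]}, index=[2, 1, 0])

# Positional masking picks the third row of value, which carries label 0.
positional = value.iloc[mask]

# Aligning first reorders value to the target's labels; the mask then picks label 2.
aligned = value.reindex(target_index).iloc[mask]

# The masked target row has label 2, so the positionally masked value
# no longer contains it, and aligning afterwards yields NaN.
realigned = positional.reindex(target_index[mask])
```

Here `positional` holds 30 under label 0, `aligned` holds 10 under label 2, and `realigned` is NaN, which matches the NaN seen on the branch.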

good catch, will update+test

So using align_series/align_frame fixes the examples you've provided, but breaks test_loc_setitem_boolean_mask_allfalse, test_setitem_loc_only_false_indexer_dtype_changed, and test_loc_setitem_all_false_boolean_two_blocks, all cases where mask.sum() == 0. ATM I can get the tests passing by putting in a check specific to that case, but it's definitely a kludge.

Hm, it looks like the align functions cannot deal with all-False boolean indexers, if I understand this correctly. This looks like a bug in there...

If you set the indexer to something like indexer = (np.array([False, False, False]), 0) you get the identity.

phofl · 2022-01-20T19:12:21Z

pandas/core/indexing.py

+                        if isinstance(value, ABCDataFrame):
+                            value = value.iloc[newkey]
+                        else:
+                            value = value[pi]


This case is for arrays?

phofl · 2022-01-20T19:19:34Z

pandas/core/indexing.py

+                    and len(icols) == 1
+                ):
+                    if ndim == 1:
+                        value = value[pi]


Same here as with iloc below.

    df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})
    df.loc[[False, False, True], ["a"]] = pd.Series([10, 11, 12], index=[2, 1, 0])

This should set 10 instead of NaN

phofl · 2022-01-20T19:25:22Z

pandas/core/indexing.py

+    @final
+    def _maybe_mask_setitem_value(self, indexer, value):
+        """
+        If we have obj.iloc[mask] = arraylike and arraylike has the same


I think you mean loc here?

I think arraylike is not correct; we do this only when the rhs has an index. Also, we align rather than apply the mask.

    df = pd.DataFrame({"a": [1, 2, 3], "b": 1})
    df.loc[[False, False, True], "b"] = np.array([10, 11, 12])

Raises, which is what I would have expected

I think this should continue to raise
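A runnable sketch of that expectation. The thread does not name the exception; `ValueError` is assumed here, as is the hand-masked call at the end, which is added only to show an assignment whose lengths agree.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": 1})
mask = [False, False, True]
values = np.array([10, 11, 12])

try:
    # Three values, one selected row, and no index to align on.
    df.loc[mask, "b"] = values
    raised = False
except ValueError:  # assumed exception type
    raised = True

# Masking the array by hand makes the lengths agree.
df.loc[mask, "b"] = values[np.array(mask)]
```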

phofl · 2022-01-20T19:28:32Z

pandas/core/indexing.py

+
+                if is_scalar_indexer(icols, self.ndim - 1) and ndim == 1:
+                    # e.g. test_loc_setitem_boolean_mask_allfalse
+                    value = value[pi]


Same as for the iloc case below

    df = pd.DataFrame({"a": [1, 2, 3], "b": 1})
    df.loc[[False, False, True], "a"] = pd.Series([10, 12, 13], index=[2, 1, 0])

Should also set 10
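The scalar-column and list-column cases raised in these comments can be checked side by side. This is a sketch of the behavior the review asks for (label 2 of the right-hand side lands in row 2), with variable names chosen for illustration.

```python
import pandas as pd

mask = [False, False, True]
value = pd.Series([10, 11, 12], index=[2, 1, 0])

# Scalar column key
df_scalar = pd.DataFrame({"a": [1, 2, 3], "b": 1})
df_scalar.loc[mask, "a"] = value

# List-like column key
df_list = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})
df_list.loc[mask, ["a"]] = value

print(df_scalar["a"].tolist())
print(df_list["a"].tolist())
```

In both cases the expected value in row 2 is 10 (the entry labelled 2), not 12 (the entry in third position) and not NaN.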

phofl · 2022-01-20T19:32:09Z

This is tricky, in case of Series/DataFrame I think reindexing the value should work. In case of objects without indexes (e.g. np.array) this should raise I think

jbrockmendel · 2022-01-20T22:42:15Z

This is tricky, in case of Series/DataFrame I think reindexing the value should work. In case of objects without indexes (e.g. np.array) this should raise I think

I like this idea, will update to let non-Series/DataFrame fall through.
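One way to picture the proposal is the sketch below. `reindex_value_for_mask` is a hypothetical helper written for this illustration, not the code in pandas/core/indexing.py: indexed values are reindexed to the masked labels, and anything without an index falls through untouched so that a length mismatch can still raise later.

```python
import numpy as np
import pandas as pd


def reindex_value_for_mask(target_index, mask, value):
    """Hypothetical helper: align indexed values to the rows a boolean mask selects."""
    mask = np.asarray(mask, dtype=bool)
    if isinstance(value, (pd.Series, pd.DataFrame)):
        # Keep only the labels the mask selects, in the target's order.
        return value.reindex(target_index[mask])
    # No index to align on: return the value unchanged.
    return value


index = pd.RangeIndex(3)
ser = pd.Series([10, 11, 12], index=[2, 1, 0])
arr = np.array([10, 11, 12])

picked = reindex_value_for_mask(index, [False, False, True], ser)
nothing = reindex_value_for_mask(index, [False, False, False], ser)
untouched = reindex_value_for_mask(index, [False, False, True], arr)
```

With an all-False mask the reindexed value is simply empty in this sketch; it does not reproduce the all-False special-casing discussed earlier in the review.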

jbrockmendel · 2022-01-22T00:25:17Z

Updated to address comments and add tests. This special-cases the all-False cases with FIXME comments to punt on #45501 (comment).

jreback · 2022-01-23T01:25:20Z

https://github.com/pandas-dev/pandas/runs/4909709364?check_suite_focus=true

jreback · 2022-01-31T00:05:03Z

status here?

jbrockmendel · 2022-01-31T00:08:09Z

status here?

ready to go. looking forward to some nice de-duplication this will allow.

REF: mask values in loc.__setitem__ with bool indexer

d7b22f3

phofl reviewed Jan 20, 2022

Merge branch 'main' into ref-bool_indexer

9e73633

jbrockmendel added 3 commits January 20, 2022 15:08

handle reindexing

92a7e3a

Merge branch 'main' into ref-bool_indexer

0c4d64e

Merge branch 'main' into ref-bool_indexer

f237aac

Merge branch 'main' into ref-bool_indexer

94c6761

jreback added the Indexing (Related to indexing on series/frames, not to indexes themselves) and Refactor (Internal refactoring of code) labels Jan 23, 2022

jbrockmendel added 4 commits January 22, 2022 17:36

Merge branch 'main' into ref-bool_indexer

b91fc34

delay masking

46f0f22

Merge branch 'main' into ref-bool_indexer

5b62674

Merge branch 'main' into ref-bool_indexer

65ec7d8

jreback added this to the 1.5 milestone Jan 25, 2022

jreback merged commit c64fbce into pandas-dev:main Jan 31, 2022

jbrockmendel deleted the ref-bool_indexer branch January 31, 2022 00:21

phofl pushed a commit to phofl/pandas that referenced this pull request Feb 14, 2022

REF: mask values in loc.__setitem__ with bool indexer (pandas-dev#45501)

3f7344e

phofl mentioned this pull request Feb 14, 2022

Regression: Series.loc.setitem raising with all false indexer and series on rhs #45784

Merged

4 tasks

yehoshuadimarsky pushed a commit to yehoshuadimarsky/pandas that referenced this pull request Jul 13, 2022

REF: mask values in loc.__setitem__ with bool indexer (pandas-dev#45501)

469e582

jorisvandenbossche mentioned this pull request Sep 22, 2022

REGR: Setting values with 'loc' and boolean mask mixes up values (all-True mask) #48701

Closed
