-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
REF: mask values in loc.__setitem__ with bool indexer #45501
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
pandas/core/indexing.py
Outdated
# mask = np.array([True, False, True, False, True]) | ||
# ser[mask] = ser * 2 # <- works | ||
# ser.loc[mask] = ser * 2 # <- works | ||
# ser.iloc[mask] = ser * 2 # <- raises |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
loc and setitem align, while iloc does not. Hence this should not work for iloc. This was changed for 1.2 I think and was intended
pandas/core/indexing.py
Outdated
if ndim == 1: | ||
value = value[pi] | ||
indexer = (newkey, icols) | ||
# TODO: is it sketchy that icols is an ndarray and value |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is ok, something like
df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})
df.loc[[False, True, True], ["a"]] = pd.Series([10, 20, 30])
is valid and runs through there
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i guess. still seems weird as the analogous behavior with ndarray doesn't work
arr = np.random.randn(3, 2)
values = np.array([1, 2, 3])
arr[:, 0] = values # <- works
arr[:, [0]] = values # <- raises
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is odd yes
pandas/core/indexing.py
Outdated
|
||
elif ndim == 2 and value.shape[1] == 1: | ||
if isinstance(value, ABCDataFrame): | ||
value = value.iloc[newkey] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This causes bugs:
df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})
df.loc[[False, False, True], ["a"]] = pd.DataFrame({"a": [10, 20, 30]}, index=[2, 1, 0])
This returns
a b
0 1 4
1 2 5
2 10 6
on main, which is correct and
a b
0 1.0 4.0
1 2.0 5.0
2 NaN 6.0
on this branch, which is wrong
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This has to be aligned before applying iloc
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good catch, will update+test
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so using align_series/align_frame fixes the examples you've provided, but breaks test_loc_setitem_boolean_mask_allfalse, test_setitem_loc_only_false_indexer_dtype_changed, test_loc_setitem_all_false_boolean_two_blocks, all cases where mask.sum() == 0. ATM i can get the tests passing by putting in a check specific to that case, but its definitely a kludge
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, it looks like the align functions can not deal with all False boolean indexers if I understand this correctly. This looks like a bug in there...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you set the indexer to something like indexer = (np.array([False, False, False]), 0)
you get the identity.
pandas/core/indexing.py
Outdated
if isinstance(value, ABCDataFrame): | ||
value = value.iloc[newkey] | ||
else: | ||
value = value[pi] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
´This case is for arrays?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes
pandas/core/indexing.py
Outdated
and len(icols) == 1 | ||
): | ||
if ndim == 1: | ||
value = value[pi] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here as with iloc below.
df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})
df.loc[[False, False, True], ["a"]] = pd.Series([10, 11, 12], index=[2, 1, 0])
This should set 10 instead of NaN
pandas/core/indexing.py
Outdated
@final | ||
def _maybe_mask_setitem_value(self, indexer, value): | ||
""" | ||
If we have obj.iloc[mask] = arraylike and arraylike has the same |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you mean loc here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think arraylike is not correct, we do this only when the rhs has an index. We align not apply the mask too
df = pd.DataFrame({"a": [1, 2, 3], "b": 1})
df.loc[[False, False, True], "b"] = np.array([10, 11, 12])
Raises, which is what I would have expected
I think this should continue to raise
pandas/core/indexing.py
Outdated
|
||
if is_scalar_indexer(icols, self.ndim - 1) and ndim == 1: | ||
# e.g. test_loc_setitem_boolean_mask_allfalse | ||
value = value[pi] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as for the iloc case below
df = pd.DataFrame({"a": [1, 2, 3], "b": 1})
df.loc[[False, False, True], "a"] = pd.Series([10, 12, 13], index=[2, 1, 0])
Should also set 10
This is tricky, in case of Series/DataFrame I think reindexing the value should work. In case of objects without indexes (e.g. np.array) this should raise I think |
I like this idea, will update to let non-Series/DataFrame fall through. |
Updated to address comments and add tests. special-cases all-False cases with FIXME comments to punt on #45501 (comment). |
status here? |
ready to go. looking forward to some nice de-duplication this will allow. |
The motivation here is the 1-line change in Block.setitem, which will enable us to share Block.setitem with EABackedBlock.setitem.
cc @phofl thoughts on how to make this prettier or more robust?