-
-
Notifications
You must be signed in to change notification settings - Fork 18.5k
BUG: Bug in loc raised ValueError when setting value via boolean list #37761
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
pandas/core/indexing.py
Outdated
@@ -602,7 +603,7 @@ def _get_setitem_indexer(self, key): | |||
""" | |||
Convert a potentially-label-based key into a positional indexer. | |||
""" | |||
if self.name == "loc": | |||
if self.name == "loc" or is_bool_dtype(np.array(key)): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is it only lists we want here, or would com.is_bool_indexer(key)
work?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is better, changed it :)
pandas/tests/indexing/test_loc.py
Outdated
@@ -1955,6 +1955,13 @@ def test_loc_setitem_dt64tz_values(self): | |||
result = s2["a"] | |||
assert result == expected | |||
|
|||
def test_loc_setitem_boolean_list(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lets put this in TestLocBooleanMask
can this be shared by DataFrame?
few words after the GH ref to the effect of "specifically list key, not arraylike"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done, added a DataFrame
part to the test, parametrization would be quite ugly here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also the original issue wa about iloc, can you test that as well (not sure if this is a different bug then)
pandas/tests/indexing/test_loc.py
Outdated
@@ -1644,6 +1644,18 @@ def test_loc_setitem_mask_td64_series_value(self): | |||
assert expected == result | |||
tm.assert_frame_equal(df, df_copy) | |||
|
|||
def test_loc_setitem_boolean_list(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you parameterize the rhs by array as well as list
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ahh i see the comment, but still think can parameterize.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
� Conflicts: � doc/source/whatsnew/v1.2.0.rst
cc @jreback Thanks very much, iloc actually was a different issue. The validation in |
pandas/tests/indexing/test_loc.py
Outdated
def test_loc_setitem_boolean_list(self, func, rhs_func): | ||
# GH#20438 testing specifically list key, not arraylike | ||
ser = Series([0, 1, 2]) | ||
getattr(ser, func)[[True, False, True]] = rhs_func([5, 10]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you also parameterize on the indexing type as well (e.g. list / array / pd.Index)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Index is not allowed for iloc, added the remaining ones
� Conflicts: � doc/source/whatsnew/v1.2.0.rst
doc/source/whatsnew/v1.2.0.rst
Outdated
@@ -476,6 +476,7 @@ Indexing | |||
- Bug in :meth:`DataFrame.loc` returned requested key plus missing values when ``loc`` was applied to single level from :class:`MultiIndex` (:issue:`27104`) | |||
- Bug in indexing on a :class:`Series` or :class:`DataFrame` with a :class:`CategoricalIndex` using a listlike indexer containing NA values (:issue:`37722`) | |||
- Bug in :meth:`DataFrame.xs` ignored ``droplevel=False`` for columns (:issue:`19056`) | |||
- Bug in :meth:`Series.loc` raised an Error when input was filtered with a boolean list and values to set were a list with lower dimension (:issue:`20438`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is there a specific exception we can say instead of Error?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ValueError, added it
pandas/tests/indexing/test_loc.py
Outdated
@pytest.mark.parametrize("indexing_func", [list, np.array]) | ||
@pytest.mark.parametrize("func", ["loc", "iloc"]) | ||
@pytest.mark.parametrize("rhs_func", [list, np.array]) | ||
def test_loc_setitem_boolean_list(self, func, rhs_func, indexing_func): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
its only the *-iloc-list cases that fail on master. at its heart, is this an iloc bug or a loc bug?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
iloc, thx.
pandas/tests/indexing/test_loc.py
Outdated
def test_loc_setitem_boolean_list(self, func, rhs_func, indexing_func): | ||
# GH#20438 testing specifically list key, not arraylike | ||
ser = Series([0, 1, 2]) | ||
getattr(ser, func)[indexing_func([True, False, True])] = rhs_func([5, 10]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in the np.array case, do we care if the ndarray has bool vs object dtype?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Object dtype is killed by https://github.com/pandas-dev/pandas/blob/b96665777c083516c9cdf21d25b7d9ccf70c81bd/pandas/core/indexers.py#L153:L157 and raises a ValueError too.
pandas/core/indexing.py
Outdated
@@ -602,7 +602,7 @@ def _get_setitem_indexer(self, key): | |||
""" | |||
Convert a potentially-label-based key into a positional indexer. | |||
""" | |||
if self.name == "loc": | |||
if self.name == "loc" or com.is_bool_indexer(key): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i haven't fully formed an opinion, but i think another alternative would be to properly handle boolean masks in Block.setitem.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are right, loc is not failing, this means we do not need this or. Removed it. Must have mixed something up before
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lists can be handled in Block.setitem already, but arrays from dtype object don't work. Lists are killed by the validation mentioned above in pandas.core.indexers
. If we let them through, setitem seems to be able to handle them in the iloc testcases
does this have a perf impact? IIRC is_bool_indexer isn't completely cheap |
Ran indexing asvs. Seems a bit weird
|
couple comments, needs rebase otherwise LGTM |
� Conflicts: � doc/source/whatsnew/v1.2.0.rst � pandas/tests/indexing/test_iloc.py
This pull request is stale because it has been open for thirty days with no activity. Please update or respond to this comment if you're still interested in working on this. |
� Conflicts: � doc/source/whatsnew/v1.2.0.rst � pandas/tests/indexing/test_iloc.py
ping @jbrockmendel ? |
This pull request is stale because it has been open for thirty days with no activity. Please update or respond to this comment if you're still interested in working on this. |
It looks like in master the test case raises in check_setitem_lengths. would it make sense to patch at that point? |
We could check for lists there too and count the True occurrences. Implemented it. |
� Conflicts: � pandas/tests/indexing/test_iloc.py
@phofl can you rebase (i am not sure when it was before), and ping on green. |
@jreback greenish, we have same number of failures as on master |
Thanks @phofl |
black pandas
git diff upstream/master -u -- "*.py" | flake8 --diff
Not quite sure if conversion to numpy array is necessary, but
is_dtype_bool
does not accept lists