BUG: unwrap setitem indexer for ArrowExtensionArray #50085

Merged: 11 commits, Jan 4, 2023
1 change: 1 addition & 0 deletions doc/source/whatsnew/v2.0.0.rst
@@ -710,6 +710,7 @@ Indexing
- Bug in :meth:`DataFrame.reindex` filling with wrong values when indexing columns and index for ``uint`` dtypes (:issue:`48184`)
- Bug in :meth:`DataFrame.loc` coercing dtypes when setting values with a list indexer (:issue:`49159`)
- Bug in :meth:`DataFrame.loc` raising ``ValueError`` with ``bool`` indexer and :class:`MultiIndex` (:issue:`47687`)
- Bug in :meth:`DataFrame.loc` raising ``IndexError`` when setting values for a pyarrow-backed column with a non-scalar indexer (:issue:`50085`)
- Bug in :meth:`DataFrame.__setitem__` raising ``ValueError`` when right hand side is :class:`DataFrame` with :class:`MultiIndex` columns (:issue:`49121`)
- Bug in :meth:`DataFrame.reindex` casting dtype to ``object`` when :class:`DataFrame` has single extension array column when re-indexing ``columns`` and ``index`` (:issue:`48190`)
- Bug in :func:`~DataFrame.describe` when formatting percentiles in the resulting index showed more decimals than needed (:issue:`46362`)
6 changes: 5 additions & 1 deletion pandas/core/internals/blocks.py
@@ -1669,7 +1669,11 @@ def _unwrap_setitem_indexer(self, indexer):
# Should never have length > 2. Caller is responsible for checking.
# Length 1 is reached vis setitem_single_block and setitem_single_column
# each of which pass indexer=(pi,)
if len(indexer) == 2:
if len(indexer) == 1:
Member

Not sure, but it might make more sense to handle this in ArrowExtensionArray.__setitem__?

Member Author

I was 50/50 on whether to put it here or in ArrowExtensionArray.__setitem__. I ended up leaning towards this, as it seems consistent with this method and may be helpful beyond ArrowExtensionArray (e.g. third-party EAs). But I'm happy to move it if __setitem__ is a better place for it.

Member

I am more concerned about our __setitem__ for Arrow diverging from the NumPy __setitem__ call, but I'm not sure this is correct either.

cc @jbrockmendel Could you weigh in here?

Member

I'd want to handle it correctly in the ArrowExtensionArray method. We could add a test to the extension base tests to check that this case is handled correctly.

Member Author

OK, I will change this to be handled within ArrowExtensionArray.__setitem__.

I guess the typing on ExtensionArray.__setitem__ and ArrowExtensionArray.__setitem__ should change as well?

def __setitem__(self, key: int | slice | np.ndarray, value: Any) -> None:

def __setitem__(self, key: int | slice | np.ndarray, value: Any) -> None:

Member

Yeah would make sense

    # GH50085 1D-only non-ndarray may not expect a tuple (e.g. pyarrow)
    indexer = indexer[0]

elif len(indexer) == 2:

    if all(isinstance(x, np.ndarray) and x.ndim == 2 for x in indexer):
        # GH#44703 went through indexing.maybe_convert_ix
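
The review thread above settles on unwrapping the length-1 tuple inside the array's own `__setitem__` rather than in the block code. A minimal sketch of that idea follows, using a hypothetical pure-Python container (`OneDimArray` and its `to_list` method are illustrative names, not part of pandas or pyarrow). It is not the code that was merged; it only shows how a 1D-only array might accept an indexer of the form `(pi,)`.

```python
from typing import Any, Sequence


class OneDimArray:
    """Hypothetical 1D-only container used purely for illustration."""

    def __init__(self, values: Sequence[Any]) -> None:
        self._data = list(values)

    def __setitem__(self, key: Any, value: Any) -> None:
        # Unwrap a length-1 tuple such as (positions,) so that callers
        # passing a block-style indexer still reach the 1D code path.
        if isinstance(key, tuple):
            if len(key) != 1:
                raise IndexError("too many indices for a 1D array")
            key = key[0]

        # Scalar position: plain assignment.
        if isinstance(key, int):
            self._data[key] = value
            return

        # Normalize slices and list-like indexers to explicit positions.
        if isinstance(key, slice):
            positions = list(range(*key.indices(len(self._data))))
        else:
            positions = [int(p) for p in key]

        # Broadcast a scalar value, or require matching lengths.
        if isinstance(value, (list, tuple)):
            if len(value) != len(positions):
                raise ValueError("length of values does not match indexer")
            new_values = list(value)
        else:
            new_values = [value] * len(positions)

        for pos, val in zip(positions, new_values):
            self._data[pos] = val

    def to_list(self) -> list:
        return list(self._data)


arr = OneDimArray([10, 20, 30])
arr[([0, 1],)] = [20, 10]  # tuple-wrapped list indexer, like indexer=(pi,)
print(arr.to_list())  # [20, 10, 30]
```

Keeping the unwrap in `__setitem__` means every caller gets the same behaviour, which is the argument made in the thread for not special-casing it in `_unwrap_setitem_indexer`.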
8 changes: 8 additions & 0 deletions pandas/tests/extension/base/setitem.py
@@ -441,3 +441,11 @@ def test_setitem_invalid(self, data, invalid_scalar):

        with pytest.raises((ValueError, TypeError), match=msg):
            data[:] = invalid_scalar

    def test_setitem_2d_values(self, data):
        # GH50085
        original = data.copy()
        df = pd.DataFrame({"a": data, "b": data})
        df.loc[[0, 1], :] = df.loc[[1, 0], :].values
        assert (df.loc[0, :] == original[1]).all()
        assert (df.loc[1, :] == original[0]).all()