Skip to content

BUG: unwrap setitem indexer for ArrowExtensionArray #50085

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 11 commits into from
Jan 4, 2023

Conversation

lukemanley
Copy link
Member

@lukemanley lukemanley commented Dec 6, 2022

Setting values in a pyarrow-backed column with a non-scalar indexer was raising.

Fixes the following:

import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]}, dtype="int64[pyarrow]")

# IndexError: index 1 is out of bounds for axis 0 with size 1
df.loc[[0, 1], "b"] = 0

@lukemanley lukemanley added Bug Indexing Related to indexing on series/frames, not to indexes themselves Arrow pyarrow functionality labels Dec 6, 2022
@@ -1669,7 +1669,11 @@ def _unwrap_setitem_indexer(self, indexer):
# Should never have length > 2. Caller is responsible for checking.
# Length 1 is reached vis setitem_single_block and setitem_single_column
# each of which pass indexer=(pi,)
if len(indexer) == 2:
if len(indexer) == 1:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure, but it might make more sense to handle this in setitem of the ArrowExtensionArray?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was 50/50 on whether to put it here or in ArrowExtensionArray.setitem. I ended up leaning towards this as it seems consistent with this method and may be helpful beyond ArrowExtensionArray (e.g. third-party EAs). But happy to move it if setitem is a better place for it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am more concerned with our setiteim for Arrow diverging from the numpy setitem call, but not sure if this is correct either.

cc @jbrockmendel Could you weigh in here?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

id want to handle it correctly in the ArrowArray method. could add a test to the extension base tests for handling this case correctly

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, I will change this to be handled within ArrowExtensionArray.__setitem__.

I guess the typing on ExtensionArray.__setitem__ and ArrowExtensionArray.__setitem__ should change as well?

def __setitem__(self, key: int | slice | np.ndarray, value: Any) -> None:

def __setitem__(self, key: int | slice | np.ndarray, value: Any) -> None:

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah would make sense

@lukemanley lukemanley changed the title BUG: unwrap setitem indexer for 1D-only extension arrays BUG: unwrap setitem indexer for ArrowExtensionArray Dec 17, 2022
@mroeschke mroeschke added this to the 2.0 milestone Jan 4, 2023
@mroeschke mroeschke merged commit 0a4ba64 into pandas-dev:main Jan 4, 2023
@mroeschke
Copy link
Member

Thanks @lukemanley

@lukemanley lukemanley deleted the ea-unwrap-setitem-indexer branch January 19, 2023 04:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Arrow pyarrow functionality Bug Indexing Related to indexing on series/frames, not to indexes themselves
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants