BUG: Index.__getitem__ returning wrong result with negative step for arrow #55832

Merged: 13 commits, Nov 22, 2023
1 change: 1 addition & 0 deletions doc/source/whatsnew/v2.1.3.rst
@@ -22,6 +22,7 @@ Fixed regressions
 Bug fixes
 ~~~~~~~~~
 - Bug in :meth:`DatetimeIndex.diff` raising ``TypeError`` (:issue:`55080`)
+- Bug in :meth:`Index.__getitem__` returning wrong result for Arrow dtypes and negative stepsize (:issue:`55832`)
 - Bug in :meth:`Index.isin` raising for Arrow backed string and ``None`` value (:issue:`55821`)

 .. ---------------------------------------------------------------------------
7 changes: 7 additions & 0 deletions pandas/core/arrays/arrow/array.py
@@ -553,6 +553,13 @@ def __getitem__(self, item: PositionalIndexer):
                 )
         # We are not an array indexer, so maybe e.g. a slice or integer
         # indexer. We dispatch to pyarrow.
+        if isinstance(item, slice):
+            if item.start == item.stop:
+                pass
+            elif item.start == -len(self) - 1:
Review comment from @lukemanley (Member), Nov 5, 2023:

Should this be item.start < -len(self)?

Not sure if you want to mimic NumPy behavior here, but NumPy allows slicing beyond the length of the array:

In [1]: import numpy as np

In [2]: np.array(["a", "b", "c"])[slice(-10, 10)]
Out[2]: array(['a', 'b', 'c'], dtype='<U1')

Reply from the PR author (Member):

Yeah that makes sense, updated
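
The clamping described in the comment above is not specific to NumPy: Python's built-in `slice.indices` applies the same rule to any sequence. The following is a minimal standard-library sketch, not code from the PR, and the variable name `letters` is illustrative. It also shows that the clamp depends on the sign of the step, which matters when deciding which out-of-range bounds can safely be treated as open-ended.

```python
letters = ["a", "b", "c"]

# Out-of-range bounds are clamped to the sequence, as in the NumPy example.
assert letters[slice(-10, 10)] == ["a", "b", "c"]

# slice.indices() reports the clamped (start, stop, step) for a given length.
assert slice(-10, 10).indices(len(letters)) == (0, 3, 1)

# With a negative step the lower clamp is -1 ("one before the first element"),
# so an out-of-range start gives an empty result, not the whole list.
assert slice(-10, None, -1).indices(len(letters)) == (-1, -1, -1)
assert letters[-10::-1] == []
```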

+                item = slice(None, item.stop, item.step)
+            elif item.stop == -len(self) - 1:
+                item = slice(item.start, None, item.step)
         value = self._pa_array[item]
         if isinstance(value, pa.ChunkedArray):
             return type(self)(value)
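
One way to read the `item.stop` branch in the hunk above: with a negative step, a stop that lies before the first element can be rewritten as `None` without changing the result. The sketch below illustrates that on plain Python lists, which stand in for the pyarrow array. The helper `open_ended_stop` is hypothetical and deliberately limited to the negative-step `stop` case; it is not the code merged in this PR.

```python
from itertools import product


def open_ended_stop(item: slice, length: int) -> slice:
    """Hypothetical helper: for a negative step, rewrite a stop that lies
    before the first element as None (open-ended)."""
    step = item.step
    if (
        step is not None
        and step < 0
        and item.stop is not None
        and item.stop < -length
    ):
        return slice(item.start, None, step)
    return item


data = list("bcdxy")

# stop == -len(data) - 1 is the value compared against in the diff above.
original = slice(4, -len(data) - 1, -1)
rewritten = open_ended_stop(original, len(data))
assert rewritten == slice(4, None, -1)
assert data[original] == data[rewritten] == list("yxdcb")

# Exhaustive check on this small input: the rewrite never changes the result.
bounds = [None, *range(-8, 9)]
for start, stop, step in product(bounds, bounds, [-2, -1, 1, 2, None]):
    s = slice(start, stop, step)
    assert data[s] == data[open_ended_stop(s, len(data))]
```

The exhaustive loop compares every rewritten slice against ordinary list slicing, so the equivalence is checked rather than assumed.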
14 changes: 11 additions & 3 deletions pandas/tests/indexes/object/test_indexing.py
@@ -4,6 +4,7 @@
 import pytest

 from pandas._libs.missing import is_matching_na
+import pandas.util._test_decorators as td

 import pandas as pd
 from pandas import Index
@@ -144,6 +145,13 @@ def test_get_indexer_non_unique_np_nats(self, np_nat_fixture, np_nat_fixture2):


 class TestSliceLocs:
+    @pytest.mark.parametrize(
+        "dtype",
+        [
+            "object",
+            pytest.param("string[pyarrow_numpy]", marks=td.skip_if_no("pyarrow")),
+        ],
+    )
     @pytest.mark.parametrize(
         "in_slice,expected",
         [
@@ -167,12 +175,12 @@ class TestSliceLocs:
             (pd.IndexSlice["m":"m":-1], ""),  # type: ignore[misc]
         ],
     )
-    def test_slice_locs_negative_step(self, in_slice, expected):
-        index = Index(list("bcdxy"))
+    def test_slice_locs_negative_step(self, in_slice, expected, dtype):
+        index = Index(list("bcdxy"), dtype=dtype)

         s_start, s_stop = index.slice_locs(in_slice.start, in_slice.stop, in_slice.step)
         result = index[s_start : s_stop : in_slice.step]
-        expected = Index(list(expected))
+        expected = Index(list(expected), dtype=dtype)
         tm.assert_index_equal(result, expected)

     def test_slice_locs_dup(self):