API/BUG: obj.str.slice(start:None:-1) #59710

Closed
jbrockmendel opened this issue Sep 4, 2024 · 2 comments · Fixed by #59724
Labels
Arrow (pyarrow functionality), Bug, Strings (String extension data type and string data)

@jbrockmendel
Member

jbrockmendel commented Sep 4, 2024

import pandas as pd
import pyarrow as pa

ser = pd.Series(["aafootwo", "aabartwo", pd.NA, "aabazqux"], dtype="string[pyarrow]")
ser2 = ser.astype(pd.ArrowDtype(pa.string()))

>>> ser.str.slice(None, None, -1)
0    owtoofaa
1    owtrabaa
2        <NA>
3    xuqzabaa
dtype: string

>>> ser2.str.slice(None, None, -1)
0       a
1       a
2    <NA>
3       a
dtype: string[pyarrow]

This looks to me like the ArrowEA version is wrong, can @mroeschke or @jorisvandenbossche confirm this is not intentional?

The ArrowEA version's only test is test_str_slice in tests/extension/test_arrow.py, which has no test cases with either stop=None or step<0. The StringDtype version has a test_slice in tests/strings/test_strings.py with somewhat better coverage.

@jbrockmendel added the Bug and Needs Triage labels Sep 4, 2024
@mroeschke
Member

Looks like ArrowStringArray falls back for the stop=None case compared to ArrowEA.

Both dispatch to https://arrow.apache.org/docs/python/generated/pyarrow.compute.utf8_slice_codeunits.html though, and I don't know if it's supposed to support negative steps. If not, I would say ArrowEA._str_slice has a bug and should have logic to swap the start and stop based on the sign of step.
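For illustration, here is a pure-Python sketch of one way the "a" results could arise. It assumes that a missing start is always translated to 0, whatever the sign of step; nothing in the thread confirms that this is what ArrowEA._str_slice does, and slice_with_zero_default is a hypothetical helper, not pandas code.

```python
def slice_with_zero_default(text, start=None, stop=None, step=None):
    """Hypothetical translation: a missing start always becomes 0."""
    if start is None:
        start = 0  # assumption: default ignores the sign of step
    if step is None:
        step = 1
    return text[start:stop:step]


values = ["aafootwo", "aabartwo", "aabazqux"]

# Starting at index 0 and walking backwards yields only the first character.
naive = [slice_with_zero_default(v, None, None, -1) for v in values]

# Plain Python slicing starts from the end when step is negative.
expected = [v[None:None:-1] for v in values]

print(naive)     # ['a', 'a', 'a']
print(expected)  # ['owtoofaa', 'owtrabaa', 'xuqzabaa']
```

Under that assumption the naive results match the ArrowDtype output reported above, and the plain-slicing results match the StringDtype output.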

@mroeschke added the Strings and Arrow labels and removed the Needs Triage label Sep 5, 2024
@jorisvandenbossche
Member

> I don't know if it's supposed to support negative steps.

According to the docstring you linked, it does. Trying it out, if you pass explicit start/stop it seems to work:

In [15]: pc.utf8_slice_codeunits(["aafootwo"], start=5, stop=3, step=-1)
Out[15]: 
<pyarrow.lib.StringArray object at 0x7fd9cd9ce200>
[
  "to"
]

In [16]: "aafootwo"[5:3:-1]
Out[16]: 'to'

So the problem might be in how we translate the user-passed arguments to pyarrow. With pyarrow you need to specify some start value; you cannot do slice(None, None, -1) as in Python. But explicitly saying to start at the end in that case seems to work again:

In [18]: "aafootwo"[None:None:-1]
Out[18]: 'owtoofaa'

In [19]: pc.utf8_slice_codeunits(["aafootwo"], start=-1, stop=None, step=-1)
Out[19]: 
<pyarrow.lib.StringArray object at 0x7fd9d7eadd20>
[
  "owtoofaa"
]
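The translation described above could be sketched roughly as follows. This is a minimal pure-Python illustration rather than the actual pandas change: fill_slice_defaults and slice_explicit_start are hypothetical names, and the second function only stands in for a kernel that requires an integer start.

```python
def fill_slice_defaults(start=None, stop=None, step=None):
    """Hypothetical helper: pick an explicit start based on the sign of step."""
    if step is None:
        step = 1
    if step == 0:
        raise ValueError("slice step cannot be zero")
    if start is None:
        # A negative step walks backwards, so begin at the last element.
        start = -1 if step < 0 else 0
    return start, stop, step


def slice_explicit_start(text, start, stop, step):
    """Stand-in for a kernel that requires an integer start."""
    if start is None:
        raise TypeError("start must be an integer")
    return text[start:stop:step]


# Compare against plain Python slicing, including stop=None and step<0.
for text in ["aafootwo", "a", ""]:
    for stop in [None, 0, 3, -2]:
        for step in [None, 1, 2, -1, -2]:
            args = fill_slice_defaults(None, stop, step)
            assert slice_explicit_start(text, *args) == text[None:stop:step]

print(slice_explicit_start("aafootwo", *fill_slice_defaults(None, None, -1)))  # owtoofaa
```

The loop covers the stop=None and negative-step combinations that the issue notes are missing from the existing test.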
