ENH: Implement str.extract for ArrowDtype #56334
)
tm.assert_frame_equal(result, expected)

result = ser.str.extract(r"[ab](?P<digit>\d)", expand=False)
Can we add coverage for when there are multiple groups and expand=False?
Sure, I parametrized the above test with expand=
Shouldn't the expand=False case only produce a Series?
Not necessarily:
https://pandas.pydata.org/docs/reference/api/pandas.Series.str.extract.html
If True, return DataFrame with one column per capture group. If False, return a Series/Index if there is one capture group or DataFrame if there are multiple capture groups.
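A minimal sketch of the rule quoted above, using a plain object-dtype Series rather than ArrowDtype so that it does not depend on this PR. The sample data and the two-group pattern are illustrative assumptions, not taken from the PR's test suite.

```python
import pandas as pd

# Illustrative data; "c3" does not match [ab], so it yields a missing value.
ser = pd.Series(["a1", "b2", "c3"])

# One capture group, expand=False: the result is a Series named after the group.
one_group = ser.str.extract(r"[ab](?P<digit>\d)", expand=False)

# One capture group, expand=True: the result is a one-column DataFrame.
one_group_df = ser.str.extract(r"[ab](?P<digit>\d)", expand=True)

# Two capture groups, expand=False: the result is still a DataFrame,
# with one column per capture group.
two_groups = ser.str.extract(r"(?P<letter>[ab])(?P<digit>\d)", expand=False)

print(type(one_group).__name__)
print(type(one_group_df).__name__)
print(type(two_groups).__name__, list(two_groups.columns))
```

So expand=False only collapses to a Series when the pattern has exactly one capture group.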
Ah OK - sorry, I misremembered how that works. I was somewhat expecting expand=False here to just return the struct array.
FWIW I'm OK with this now too, as it is definitely consistent with our current API. There may be a future where we actually want it to return a pyarrow struct, but we can always come back and do that later.
Yeah, agreed. For arrow types, I think in the future expand=False should return a struct typed Series.
thx
str.extract Method Not Implemented for pd.ArrowDtype(pa.string()) #56268