-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
Bug: fix Series.str.split when 'regex=None' for series having 'pd.ArrowDtype(pa.string())' dtype #58418
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Bug: fix Series.str.split when 'regex=None' for series having 'pd.ArrowDtype(pa.string())' dtype #58418
Changes from 6 commits
0349837
4948784
14c059c
f55ca62
f0c2097
6f93a8d
b9b3197
d919957
5fd5bcb
a7f8af2
20e5ebe
14fd864
cd802ac
ab5f337
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -339,14 +339,8 @@ def _str_split( | |
new_pat: str | re.Pattern | ||
if regex is True or isinstance(pat, re.Pattern): | ||
new_pat = re.compile(pat) | ||
elif regex is False: | ||
new_pat = pat | ||
# regex is None so link to old behavior #43563 | ||
else: | ||
if len(pat) == 1: | ||
new_pat = pat | ||
else: | ||
new_pat = re.compile(pat) | ||
new_pat = pat | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think we want to keep this, otherwise it would not be a bugfix for There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. As discussed above #58418 (comment), the bug is caused by that this logic is not implemented in the Moreover I think the existing tests have covered all combinations of parameters, and as long as they all pass, the behaviors should still be the same. |
||
|
||
if isinstance(new_pat, re.Pattern): | ||
if n is None or n == -1: | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would actually prefer to just add the
pd.ArrowDtype(pa.string())
to the existing string dtypes instead of copying and creating a new fixture. Guessing that causes a lot of other test failures?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I forgot that
pd.ArrowDtype(pa.string())
was not actually in the fixture, so my suggestion lead you a bit in the wrong way. Sorry!Right now adding this to the main
any_string_dtype
fixture will indeed give quite some failures, yes. I agree that it might be better to actually do that (and it would be interesting to see which tests actually fail), but that's for another PR / out of scope for this bug fix (doing so would also require removing some tests are now only exist for the arrow string dtype, to not keep things duplicated).There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, a lot of other tests need to be adjusted if adding ArrowDtype to the fixture.
So for this PR, should I just add test in
pandas/tests/extension/test_arrow.py
?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Opened #58495 so we can track the larger issue