-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
Bug pyarrow implementation of str.fullmatch matches partial string. issue #56652 #56691
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…ssue-pandas-dev#56652 changed array.py: Makes Series(["abc$abc]).str.fullmatch("abc\\$") give the same result as when dtype = pyarow.string as opposed to dtype = str. Issue reporter (Issue-pandas-dev#56652) requested change "//$" to "\$", but this resulted in DepreciationWarnng, so used "\\$" instead. Change test_arrow.py: updated test_str_fullmatch to account for edge cases where string starts with and ends with literal $ as well as ends with the string.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks for the PR!
Grepping the codebase for '//\$'
gives
./pandas/core/arrays/arrow/array.py
./pandas/core/arrays/string_arrow.py
Testing further - it looks like the first affects dtype=pd.ArrowDtype(pa.string())
while the second one affects dtype="string[pyarrow]"
. The fix should be applied to both of the above files.
Also while the test method states test_str_fullmatch
it seems to be testing ser.str.match
instead. The fullmatch tests are present in ./pandas/tests/strings/test_find_replace.py
Thanks @asishm , I can shortly make a commit which makes the changes in To clarify, from testing, the changes to ./pandas/core/arrays/arrow/array.py also affect |
Changed: string_arrow: because it shoud be conistent with other changes. Affects circumstance where dtype="string[pyarrow]" as opposed to dtype=pd.ArrowDtype(pa.string()) Changed: test_find_replace.py: Added unit test for the edge cases where the user wants o search for a string that ends in a literal dollar sign. The string_arrow.oy updates effect this also. Question: For consistency, I formatted test_fullmatch_dollar_literal() to match other unit tests in .py - Should I instead aim to implement @pytest.mark.parametrize, which is in consistent, but more in line with developer guidelines?
Updated PR. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks could you add a whatsnew note in v2.2.0.rst
under the Strings
section?
Thanks @mroeschke that update to |
…ntation-of-str.fullmatch-matches-partial-string.-Issue-pandas-dev#56652
Resolved merge conflict by rebasing off upstream main. |
Thanks @JackCollins91 |
Owee, I'm MrMeeseeks, Look at me. There seem to be a conflict, please backport manually. Here are approximate instructions:
And apply the correct labels and milestones. Congratulations — you did some good work! Hopefully your backport PR will be tested by the continuous integration and merged soon! Remember to remove the If these instructions are inaccurate, feel free to suggest an improvement. |
…atch matches partial string. issue pandas-dev#56652
….fullmatch matches partial string. issue #56652) (#56715) Backport PR #56691: Bug pyarrow implementation of str.fullmatch matches partial string. issue #56652 Co-authored-by: JackCollins91 <[email protected]>
Changed array.py: Makes Series(["abc$abc]).str.fullmatch("abc\$") give the same result as when dtype = pyarrow.string as opposed to dtype = str.
Issue reporter (Issue-#56652) requested change "//$" to "\$", but this resulted in DepreciationWarnng in pytest, so used "\\$" instead.
Change test_arrow.py: updated test_str_fullmatch to account for edge cases where string starts with and ends with literal $ as well as ends with the string.