
[ArrowStringArray] PERF: use pyarrow.compute.replace_substring(_regex) if available #41590


Merged: 10 commits, merged on May 26, 2021


simonjayhawkins
Member

[ 50.00%] ··· strings.Methods.time_replace    ok
[ 50.00%] ··· ============== ==========
                  dtype
              -------------- ----------
                   str        20.6±0ms
                  string      16.3±0ms
               arrow_string   1.83±0ms
              ============== ==========
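Read as ratios of the reported mean times, the table shows the `arrow_string` path running roughly an order of magnitude faster. These figures come from the single asv run above and will vary by machine.

$$
\frac{20.6\,\text{ms}}{1.83\,\text{ms}} \approx 11.3\times \ (\texttt{str}), \qquad
\frac{16.3\,\text{ms}}{1.83\,\text{ms}} \approx 8.9\times \ (\texttt{string})
$$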

Happy to break this up into smaller PRs if that makes it easier to review.
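The sketch below illustrates the idea in the title outside pandas. It assumes a recent pyarrow whose `pyarrow.compute.replace_substring` and `replace_substring_regex` accept `pattern`, `replacement` and `max_replacements` keywords. `replace_strings` is a hypothetical helper, not the pandas implementation. Here "available" is modeled, as an assumption, as "pyarrow can be imported"; pandas' own check may differ, for example by testing the pyarrow version. When pyarrow is missing, the helper falls back to Python's `re` module and `str.replace`.

```python
import re

# pyarrow is optional: the Arrow kernels are used only if it can be imported.
try:
    import pyarrow as pa
    import pyarrow.compute as pc
except ImportError:
    pa = pc = None


def replace_strings(values, pat, repl, n=-1, regex=True):
    """Replace ``pat`` with ``repl`` in every string of ``values``.

    Hypothetical helper (not pandas' code). ``n=-1`` replaces every match,
    ``n > 0`` at most ``n`` matches per string; None entries stay None.
    """
    if n == 0:
        # Zero replacements requested: nothing to do on either path.
        return list(values)

    if pc is not None:
        arr = pa.array(values, type=pa.string())
        func = pc.replace_substring_regex if regex else pc.replace_substring
        # In Arrow, max_replacements=-1 means "no limit".
        result = func(arr, pattern=pat, replacement=repl,
                      max_replacements=-1 if n < 0 else n)
        return result.to_pylist()

    # Fallback without pyarrow: Python's re module or str.replace.
    if regex:
        compiled = re.compile(pat)
        count = 0 if n < 0 else n  # re.sub: count=0 means "replace all"
        return [None if s is None else compiled.sub(repl, s, count=count)
                for s in values]
    return [None if s is None else s.replace(pat, repl, n) for s in values]
```

Arrow's regex kernels use RE2 syntax rather than Python's `re`. Patterns that rely on Python-only regex features would therefore need the fallback path.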

@simonjayhawkins simonjayhawkins added Performance Memory or execution speed performance Strings String extension data type and string data labels May 20, 2021
@simonjayhawkins simonjayhawkins added this to the 1.3 milestone May 20, 2021
@simonjayhawkins
Member Author

> happy to break up if easier to review.

Will do that, and will mark this as a draft for now.

@simonjayhawkins
Member Author

Closes #41602. Will split this off once the other precursor PRs are merged.
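The linked bug is about `case` being ignored when `regex=False` in `Series.str.replace`. Plain `str.replace` has no case-insensitive mode. One way to honour `case=False` for a literal pattern is to escape the pattern and use `re.IGNORECASE`. The following is a minimal sketch of that idea. `object_str_replace` is a hypothetical name, and this is not the pandas fix itself.

```python
import re


def object_str_replace(value, pat, repl, n=-1, case=True, regex=True):
    """Replace ``pat`` in one string, honouring ``case`` when ``regex=False``.

    Hypothetical illustration, not pandas' implementation. ``n=-1``
    replaces every match, ``n > 0`` at most ``n`` matches.
    """
    if n == 0:
        return value
    if regex or not case:
        flags = 0 if case else re.IGNORECASE
        # A literal pattern is escaped so the regex engine matches it verbatim.
        pattern = re.compile(pat if regex else re.escape(pat), flags=flags)
        # For a literal replacement, a function avoids backslash processing.
        replacement = repl if regex else (lambda _m: repl)
        count = 0 if n < 0 else n  # re.sub: count=0 means "replace all"
        return pattern.sub(replacement, value, count=count)
    # Case-sensitive literal replacement needs no regex at all.
    return value.replace(pat, repl, n)
```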

@simonjayhawkins simonjayhawkins linked an issue May 21, 2021 that may be closed by this pull request
@simonjayhawkins
Member Author

> closes #41602 , will break off once other pre-cursors merged

Opened #41606. Once that is merged, this PR is just the addition of the `_str_replace` method on ArrowStringArray.
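Adding the method on ArrowStringArray implies deciding which `str.replace` calls can go to the Arrow kernels and which must use the generic path. The sketch below shows such a guard. `can_use_arrow_replace` is hypothetical. The specific exclusions (compiled patterns, callable replacements, `case=False`, non-zero `flags`) are assumptions about what the Arrow kernels cannot express directly, not a quote of pandas' code.

```python
from __future__ import annotations

import re
from typing import Callable


def can_use_arrow_replace(
    pat: str | re.Pattern,
    repl: str | Callable,
    case: bool = True,
    flags: int = 0,
) -> bool:
    """Hypothetical guard: can this str.replace call go to the Arrow kernels?

    Assumption: replace_substring(_regex) take only a plain string pattern,
    a plain string replacement and a replacement count, so anything else is
    left to the generic fallback path.
    """
    if isinstance(pat, re.Pattern):
        return False  # a compiled Python regex cannot be passed to Arrow
    if callable(repl):
        return False  # per-match callables need Python-level iteration
    if not case or flags:
        return False  # no case-insensitivity or re flags in this sketch's fast path
    return True
```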

@simonjayhawkins simonjayhawkins changed the title [ArrowStringArray] use pyarrow.compute.replace_substring(_regex) if available [ArrowStringArray] PERF: use pyarrow.compute.replace_substring(_regex) if available May 21, 2021
@simonjayhawkins
Member Author

> opened #41606, after that is merged, this PR is just the addition of the method on ArrowStringArray

Will close this until then to help clear the queue.

@simonjayhawkins simonjayhawkins marked this pull request as ready for review May 25, 2021 14:43
@jreback jreback merged commit 43d450a into pandas-dev:master May 26, 2021
@jreback
Contributor

jreback commented May 26, 2021

@simonjayhawkins at some point we should have a whatsnew sub-section about the new Arrow-backed string method performance improvements.

@simonjayhawkins
Member Author

Sure. I think we should probably draw a line under the str accessor methods for now and concentrate on the dtype, docs and skipped tests. This work is performance-only: the str accessor methods all fall back to the existing implementation if a method is not implemented on the array, and we have parametrized tests for most of them. Will do that in parallel with preparing for the RC.
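One way to picture the fallback described above is ordinary method inheritance. The array type overrides only the methods it can accelerate and defers to generic implementations for everything else, while a parametrized-style test checks that both routes agree. This is a self-contained, hypothetical sketch. The class names and method set are illustrative, not pandas' classes.

```python
import re


class ObjectStrMethods:
    """Hypothetical generic string methods, applied element by element."""

    def __init__(self, values):
        self.values = list(values)

    def _map(self, func):
        # Missing values (None) are propagated unchanged.
        return [None if v is None else func(v) for v in self.values]

    def _str_upper(self):
        return self._map(str.upper)

    def _str_replace(self, pat, repl, regex=True):
        if regex:
            compiled = re.compile(pat)
            return self._map(lambda v: compiled.sub(repl, v))
        return self._map(lambda v: v.replace(pat, repl))


class FastStrMethods(ObjectStrMethods):
    """Hypothetical array type that overrides only what it accelerates.

    _str_upper is inherited, so it uses the generic version unchanged;
    _str_replace handles literal patterns itself and defers regex ones.
    """

    def _str_replace(self, pat, repl, regex=True):
        if regex:
            return super()._str_replace(pat, repl, regex)
        # Stand-in for a vectorized kernel (e.g. an Arrow compute function).
        return [None if v is None else v.replace(pat, repl) for v in self.values]


# Parametrized-style check: each case must agree across both classes.
data = ["a1", None, "ba2"]
cases = [
    ("_str_upper", ()),
    ("_str_replace", ("a", "-", False)),
    ("_str_replace", (r"\d", "#", True)),
]
for name, args in cases:
    expected = getattr(ObjectStrMethods(data), name)(*args)
    assert getattr(FastStrMethods(data), name)(*args) == expected, name
```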

@simonjayhawkins simonjayhawkins deleted the _str_replace branch May 26, 2021 08:21
@simonjayhawkins
Member Author

> @simonjayhawkins at some point should have a whatsnew sub-section about the new arrow perf string functions.

Added to #39908.

Labels: Performance (Memory or execution speed performance), Strings (String extension data type and string data)
Linked issue (may be closed by merging this pull request): BUG: case argument ignored when regex is False in Series.str.replace