[ArrowStringArray] PERF: Series.str.get_dummies #41455

Merged
jreback merged 1 commit into pandas-dev:master from the get_dummies branch
May 13, 2021

Conversation

simonjayhawkins
Member

Planning to remove the padding code from _wrap_result eventually, but until then we can skip it when get_dummies returns an integer array.
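For readers unfamiliar with the method, the following is a minimal sketch of what `Series.str.get_dummies` returns for two of the string dtypes mentioned in this PR. The helper name `dummies_for` and the sample data are hypothetical and only for illustration; the sketch does not show the `_wrap_result` internals discussed above.

```python
import pandas as pd


def dummies_for(dtype):
    """Return the indicator frame from Series.str.get_dummies for one dtype.

    Hypothetical helper for illustration; not part of pandas.
    """
    s = pd.Series(["a|b", "a", "a|c"], dtype=dtype)
    # Split each value on "|" and emit one indicator column per distinct token.
    return s.str.get_dummies("|")


object_result = dummies_for(object)    # plain object-dtype strings
string_result = dummies_for("string")  # pandas StringDtype

# Each row holds 1 in the columns for the tokens it contains and 0 elsewhere.
print(object_result)
```

The result is purely numeric indicator data rather than strings, which is presumably why the string-specific handling in `_wrap_result` can be bypassed for it.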

Adding tests and benchmarks as a precursor to potential changes to _wrap_result (#41372).

I have a working implementation for ArrowStringArray using pyarrow native functions, but it is slower than the object fallback, so I am leaving that for a follow-up.

       before           after         ratio
     [4ec6925c]       [091b0b02]
     <master>         <get_dummies>
-      2.58±0.02s         655±10ms     0.25  strings.Dummies.time_get_dummies('arrow_string')
-      2.58±0.03s          643±9ms     0.25  strings.Dummies.time_get_dummies('string')
-      2.59±0.07s          638±7ms     0.25  strings.Dummies.time_get_dummies('str')

SOME BENCHMARKS HAVE CHANGED SIGNIFICANTLY.
PERFORMANCE INCREASED.
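A rough way to reproduce this kind of measurement outside asv is sketched below with `timeit`. It is not the benchmark from the pandas suite: the data generator, the sizes, and the function names are assumptions, the "str" label is assumed to correspond to object-dtype strings, and the Arrow-backed case is left out so the sketch does not require pyarrow.

```python
import timeit

import pandas as pd


def make_series(n, dtype):
    """Build n strings of '|'-joined tokens (hypothetical benchmark data)."""
    values = [f"t{i % 50}|t{(i * 7) % 50}" for i in range(n)]
    return pd.Series(values, dtype=dtype)


def time_get_dummies(dtype, n=2_000, repeat=3):
    """Return the best-of-`repeat` wall time, in seconds, for one call."""
    s = make_series(n, dtype)
    runs = timeit.repeat(lambda: s.str.get_dummies("|"), number=1, repeat=repeat)
    return min(runs)


# Labels mirror the benchmark parameter names in the table above.
timings = {
    label: time_get_dummies(dtype)
    for label, dtype in [("str", object), ("string", "string")]
}
for label, seconds in timings.items():
    print(f"{label}: {seconds * 1000:.1f} ms")
```

Absolute numbers from a sketch like this will differ from the asv figures above, since the data size and machine differ; only the relative change between two pandas builds is meaningful.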

simonjayhawkins added the Performance (Memory or execution speed performance) and Strings (String extension data type and string data) labels on May 13, 2021
simonjayhawkins added this to the 1.3 milestone on May 13, 2021
jreback merged commit 3846040 into pandas-dev:master on May 13, 2021
@jreback
Contributor

jreback commented May 13, 2021

great

simonjayhawkins deleted the get_dummies branch on May 14, 2021 09:17
JulianWgs pushed a commit to JulianWgs/pandas that referenced this pull request Jul 3, 2021