PERF: StringArray from np.str_ array #49109

Merged: 6 commits into pandas-dev:main, Oct 18, 2022

lukemanley
Member

Performance improvement when constructing a StringArray from a NumPy array with dtype np.str_.

```
       before           after         ratio
     [fb19ddbf]       [a4b0d732]
     <main>           <string-array-np-str>
-      23.0±0.8ms      6.15±0.04ms     0.27  array.StringArray.time_from_np_str_array
```
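The comparison above can be approximated outside asv with a small timing script. This is only a sketch: the array size, the contents, and the helper names `from_np_str_array` and `from_np_object_array` are assumptions, since the PR text does not show the benchmark's setup. Timings vary by machine and pandas version, so no expected numbers are given.

```python
import timeit

import numpy as np
import pandas as pd

# Assumed size and contents; the actual benchmark setup is not shown in the PR.
N = 100_000
rng = np.random.default_rng(0)
values_str = rng.integers(0, 1000, size=N).astype("U")  # NumPy unicode dtype (np.str_ elements)
values_obj = values_str.astype(object)                   # same strings, object dtype


def from_np_str_array():
    """Build a string-dtype array from the np.str_ input (the path this PR targets)."""
    return pd.array(values_str, dtype="string")


def from_np_object_array():
    """Build a string-dtype array from the object-dtype input, for comparison."""
    return pd.array(values_obj, dtype="string")


# Total seconds for five constructions of each kind.
t_str = timeit.timeit(from_np_str_array, number=5)
t_obj = timeit.timeit(from_np_object_array, number=5)
print(f"np.str_ input: {t_str:.4f}s, object input: {t_obj:.4f}s")
```

Both helpers should produce equal arrays; only the construction cost differs.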

@lukemanley lukemanley added Performance Memory or execution speed performance Strings String extension data type and string data labels Oct 15, 2022
@lukemanley lukemanley changed the title PERF: StringArray from np.str_ PERF: StringArray from np.str_ array Oct 15, 2022
```diff
@@ -703,6 +703,10 @@ cpdef ndarray[object] ensure_string_array(
     if copy and result is arr:
         result = result.copy()
 
+    if util.is_array(arr) and issubclass(arr.dtype.type, np.str_):
```
Member

Should the np.asarray call above guarantee that util.is_array(arr) is always true?

Member Author

yep, you're right. I've removed the is_array check. thanks!
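The dtype test discussed in this thread can be illustrated in plain Python. This is a rough sketch rather than the Cython code from the PR: `all_elements_are_str` is a hypothetical helper, and `isinstance(..., np.ndarray)` stands in for `util.is_array` as an assumption. It shows that `np.asarray` always returns an ndarray, so a separate array check adds nothing, and that the `issubclass` test is true only for NumPy unicode dtypes.

```python
import numpy as np


def all_elements_are_str(values) -> bool:
    """Hypothetical helper mirroring the dtype test from the diff."""
    arr = np.asarray(values)
    # np.asarray always returns an ndarray, so an extra is-array check is redundant.
    assert isinstance(arr, np.ndarray)
    # True only when the array's scalar type is np.str_ (a '<U...' dtype).
    return issubclass(arr.dtype.type, np.str_)


assert all_elements_are_str(np.array(["a", "b"]))                      # unicode dtype
assert not all_elements_are_str(np.array(["a", None], dtype=object))  # object dtype
assert not all_elements_are_str(np.array([1, 2]))                      # integer dtype
```

When the test passes, every element is already a string, which is what lets the conversion skip per-element checks.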

@mroeschke mroeschke added this to the 2.0 milestone Oct 18, 2022
@mroeschke mroeschke merged commit 62e05f5 into pandas-dev:main Oct 18, 2022
@mroeschke
Member

Thanks @lukemanley

@lukemanley lukemanley deleted the string-array-np-str branch October 26, 2022 10:18
noatamir pushed a commit to noatamir/pandas that referenced this pull request Nov 9, 2022
* perf: string array from np.str_

* add test

* whatsnew

* cleanup