PERF: Series.str.get for pyarrow-backed strings #53152

lukemanley · 2023-05-09T10:43:27Z

Tests added and passed if fixing a bug or adding a new feature.
All code checks passed.
Added type annotations to new arguments/methods/functions.
Added an entry in the latest doc/source/whatsnew/v2.1.0.rst file if fixing a bug or adding a new feature.

import pandas as pd
import pandas._testing as tm
import pyarrow as pa

N = 1_000_000

ser = pd.Series(tm.makeStringIndex(N), dtype=pd.ArrowDtype(pa.string()))

%timeit ser.str.get(1)

# 70.1 ms ± 3.13 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)  -> main
# 37.7 ms ± 2.09 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)  -> PR

mroeschke · 2023-05-09T16:54:46Z

sort whatsnew entries alphabetically....................................................................Failed
- hook id: sort-whatsnew-items
- duration: 0.04s
- exit code: 1
- files were modified by this hook
pre-commit hook(s) made changes.
If you are seeing this message in CI, reproduce locally with: `pre-commit run --all-files`.
To run `pre-commit` as part of git workflow, use `pre-commit install`.
All changes made by hooks:
diff --git a/doc/source/whatsnew/v2.1.0.rst b/doc/source/whatsnew/v2.1.0.rst
index 44ba2ac..a7ad001 100644
--- a/doc/source/whatsnew/v2.1.0.rst
+++ b/doc/source/whatsnew/v2.1.0.rst
@@ -290,10 +290,10 @@ Performance improvements
 - Performance improvement in :meth:`.DataFrameGroupBy.groups` (:issue:`53088`)
 - Performance improvement in :meth:`DataFrame.loc` when selecting rows and columns (:issue:`53014`)
 - Performance improvement in :meth:`Series.corr` and :meth:`Series.cov` for extension dtypes (:issue:`52502`)
+- Performance improvement in :meth:`Series.str.get` for pyarrow-backed strings (:issue:`53152`)
 - Performance improvement in :meth:`Series.to_numpy` when dtype is a numpy float dtype and ``na_value`` is ``np.nan`` (:issue:`52430`)
 - Performance improvement in :meth:`~arrays.ArrowExtensionArray.to_numpy` (:issue:`52525`)
 - Performance improvement when doing various reshaping operations on :class:`arrays.IntegerArrays` & :class:`arrays.FloatingArray` by avoiding doing unnecessary validation (:issue:`53013`)
-- Performance improvement in :meth:`Series.str.get` for pyarrow-backed strings (:issue:`53152`)
 -
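
The reordering in the diff above is consistent with a plain lexicographic sort of the bullet lines. A rough sketch of that idea follows; it is not the actual `sort-whatsnew-items` hook, and `sort_bullet_runs` is a hypothetical helper name. The entries are copied from the diff.

```python
def sort_bullet_runs(lines):
    """Sort each run of consecutive '- ' bullet lines alphabetically.

    Hypothetical helper for illustration; non-bullet lines stay in place.
    """
    out, run = [], []
    for line in lines:
        if line.startswith("- "):
            run.append(line)
        else:
            # A non-bullet line ends the current run of bullets.
            out.extend(sorted(run))
            run = []
            out.append(line)
    out.extend(sorted(run))
    return out


entries = [
    "Performance improvements",
    "- Performance improvement in :meth:`Series.corr` and :meth:`Series.cov` for extension dtypes (:issue:`52502`)",
    "- Performance improvement in :meth:`Series.to_numpy` when dtype is a numpy float dtype and ``na_value`` is ``np.nan`` (:issue:`52430`)",
    "- Performance improvement in :meth:`Series.str.get` for pyarrow-backed strings (:issue:`53152`)",
]

sorted_entries = sort_bullet_runs(entries)
for line in sorted_entries:
    print(line)
```

Running `pre-commit run --all-files` locally, as the hook output suggests, applies the real fix.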

mroeschke · 2023-05-10T00:58:44Z

Thanks @lukemanley

* PERF: Series.str.get for pyarrow-backed strings
* whatsnew
* whatsnew
* whatsnew

PERF: Series.str.get for pyarrow-backed strings

e16f629

lukemanley added the Performance (memory or execution speed performance) and Arrow (pyarrow functionality) labels May 9, 2023

lukemanley added 2 commits May 9, 2023 06:43

whatsnew

a9dc545

whatsnew

7f2907b

lukemanley added 2 commits May 9, 2023 17:29

Merge remote-tracking branch 'upstream/main' into perf-arrow-str-get

0bcd4a2

whatsnew

6dcd717

mroeschke added this to the 2.1 milestone May 10, 2023

mroeschke approved these changes May 10, 2023

mroeschke merged commit a90fbc8 into pandas-dev:main May 10, 2023

Rylie-W pushed a commit to Rylie-W/pandas that referenced this pull request May 19, 2023

PERF: Series.str.get for pyarrow-backed strings (pandas-dev#53152)

bbf5614

* PERF: Series.str.get for pyarrow-backed strings
* whatsnew
* whatsnew
* whatsnew

lukemanley deleted the perf-arrow-str-get branch May 30, 2023 22:16

Daquisu pushed a commit to Daquisu/pandas that referenced this pull request Jul 8, 2023

PERF: Series.str.get for pyarrow-backed strings (pandas-dev#53152)

eb42a9d

* PERF: Series.str.get for pyarrow-backed strings
* whatsnew
* whatsnew
* whatsnew
