PERF: Indexing with pyarrow timestamp & duration dtypes #53368

Merged: 2 commits into pandas-dev:main from the perf-arrow-temporal-indexing branch on May 24, 2023

Conversation

lukemanley

pyarrow timestamps:

import pandas as pd

N = 100_000

idx = pd.Index(range(N), dtype="timestamp[s][pyarrow]")
idx2 = idx[::2]

%timeit idx.get_indexer_for(idx2)

# 264 ms ± 13 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)      -> main
# 1.1 ms ± 168 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)  -> PR

pyarrow durations:

import pandas as pd

N = 100_000

idx = pd.Index(range(N), dtype="duration[s][pyarrow]")
idx2 = idx[::2]

%timeit idx.get_indexer_for(idx2)

# 705 ms ± 21.5 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)        -> main
# 1.35 ms ± 83.9 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)  -> PR

@lukemanley added the Datetime (Datetime data dtype), Performance (Memory or execution speed performance), and Arrow (pyarrow functionality) labels on May 24, 2023
@phofl phofl left a comment

lgtm

@mroeschke added this to the 2.1 milestone on May 24, 2023
@mroeschke merged commit 8253d4e into pandas-dev:main on May 24, 2023
@mroeschke

Thanks @lukemanley

topper-123 pushed a commit to topper-123/pandas that referenced this pull request May 27, 2023
…3368)

* PERF: Indexing with pyarrow timestamp & duration dtypes

* whatsnew
@lukemanley lukemanley deleted the perf-arrow-temporal-indexing branch May 30, 2023 22:16
topper-123 pushed a commit to topper-123/pandas that referenced this pull request Jun 5, 2023
…3368)

* PERF: Indexing with pyarrow timestamp & duration dtypes

* whatsnew
Daquisu pushed a commit to Daquisu/pandas that referenced this pull request Jul 8, 2023
…3368)

* PERF: Indexing with pyarrow timestamp & duration dtypes

* whatsnew
3 participants