
CLN: ArrowStringArray._cmp_method - use ChunkedArray.to_numpy() #46417


Merged: 1 commit, Mar 18, 2022


lukemanley (Member) commented Mar 18, 2022

Addresses a minor TODO comment in ArrowStringArray._cmp_method by using ChunkedArray.to_numpy().

This also performs better:

import numpy as np
import pyarrow as pa

ca = pa.chunked_array([ 
    np.random.choice([0, 1], 1000),
    np.random.choice([0, 1], 1000),
], type=pa.bool_())

%timeit ca.to_pandas().values
45.9 µs ± 2.75 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

%timeit ca.to_numpy()
3.8 µs ± 273 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

lukemanley changed the title from "ArrowStringArray._cmp_method - use ChunkedArray.to_numpy()" to "CLN: ArrowStringArray._cmp_method - use ChunkedArray.to_numpy()" Mar 18, 2022
lukemanley added the Arrow (pyarrow functionality) label Mar 18, 2022
simonjayhawkins added the Performance (Memory or execution speed performance) label Mar 18, 2022
simonjayhawkins added this to the 1.5 milestone Mar 18, 2022
jreback merged commit d2478f5 into pandas-dev:main Mar 18, 2022
lukemanley deleted the arrowstringarray-todo branch March 20, 2022 23:18
yehoshuadimarsky pushed a commit to yehoshuadimarsky/pandas that referenced this pull request Jul 13, 2022
Labels: Arrow (pyarrow functionality), Performance (Memory or execution speed performance)