
PERF: get_block_type #52109

Merged: 2 commits, Mar 22, 2023

Conversation

@lukemanley (Member) commented Mar 21, 2023

cc @jbrockmendel - this may partly close #48212; however, I suspect the OP was referring to non-EAs, given the old version of pandas.

The performance improvement is mostly for EAs, where the .kind call can be a bottleneck.

import pyarrow as pa
import pandas as pd
from pandas.core.internals.blocks import get_block_type

%timeit get_block_type(pd.ArrowDtype(pa.float64()))
# 3.51 µs ± 440 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)    <- main
# 740 ns ± 5.19 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)  <- PR

%timeit get_block_type(pd.Float64Dtype())
# 1.3 µs ± 23.2 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)  <- main
# 289 ns ± 2.3 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)   <- PR
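
As a rough illustration of why an expensive `.kind` lookup matters for EAs, here is a minimal sketch using toy stand-in classes, not the pandas internals. It assumes the gain comes from running the `isinstance` checks before `.kind` is read, so extension dtypes never pay for the lookup. `ToyExtensionDtype`, `ToyNumpyDtype`, and both dispatch functions are hypothetical names invented for this example.

```python
class ToyExtensionDtype:
    """Hypothetical stand-in for an extension dtype whose ``kind`` is costly."""

    kind_accesses = 0  # counts how often ``kind`` is read

    @property
    def kind(self) -> str:
        type(self).kind_accesses += 1
        return "f"


class ToyNumpyDtype:
    """Hypothetical stand-in for a NumPy dtype with a cheap ``kind``."""

    def __init__(self, kind: str) -> None:
        self.kind = kind


def block_type_kind_first(dtype) -> str:
    # Reads ``kind`` up front, even when the isinstance branch decides the result.
    kind = dtype.kind
    if isinstance(dtype, ToyExtensionDtype):
        return "ExtensionBlock"
    if kind in "Mm":
        return "DatetimeLikeBlock"
    if kind in "fciub":
        return "NumericBlock"
    return "ObjectBlock"


def block_type_isinstance_first(dtype) -> str:
    # Extension dtypes return before ``kind`` is ever touched.
    if isinstance(dtype, ToyExtensionDtype):
        return "ExtensionBlock"
    kind = dtype.kind
    if kind in "Mm":
        return "DatetimeLikeBlock"
    if kind in "fciub":
        return "NumericBlock"
    return "ObjectBlock"


ea = ToyExtensionDtype()

block_type_kind_first(ea)
accesses_kind_first = ToyExtensionDtype.kind_accesses

ToyExtensionDtype.kind_accesses = 0
block_type_isinstance_first(ea)
accesses_isinstance_first = ToyExtensionDtype.kind_accesses
```

Both functions return the same block name for every input; only the number of `kind` reads for the extension dtype differs.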

@lukemanley added the Performance (memory or execution speed performance) and Internals (related to non-user accessible pandas implementation) labels Mar 21, 2023
@jbrockmendel (Member) left a comment

LGTM

#  than is_foo_dtype
kind = dtype.kind
if kind in ["M", "m"]:
    return DatetimeLikeBlock
elif kind in ["f", "c", "i", "u", "b"]:
Member

We can improve a little here by checking kind in "fciub" instead of the list.
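
A small standalone sketch of the suggestion above, using only the standard library. The helper names are made up for illustration. It checks that the string and list membership tests agree for single-character kind codes, and notes one caveat: `in` on a string is a substring test, so the two forms are only equivalent when `kind` is exactly one character.

```python
import timeit

NUMERIC_KINDS_LIST = ["f", "c", "i", "u", "b"]
NUMERIC_KINDS_STR = "fciub"


def is_numeric_kind_list(kind: str) -> bool:
    # Membership test against a list of one-character strings.
    return kind in NUMERIC_KINDS_LIST


def is_numeric_kind_str(kind: str) -> bool:
    # Substring test against a single string.
    return kind in NUMERIC_KINDS_STR


# Single-character NumPy-style kind codes used for the comparison.
SAMPLE_KINDS = "biufcmMOSUV"
agreement = all(
    is_numeric_kind_list(k) == is_numeric_kind_str(k) for k in SAMPLE_KINDS
)

# Caveat: the empty string and multi-character substrings match the string
# form but not the list form.
substring_caveat = ("" in NUMERIC_KINDS_STR, "fc" in NUMERIC_KINDS_STR)
list_caveat = ("" in NUMERIC_KINDS_LIST, "fc" in NUMERIC_KINDS_LIST)


def compare_timings(number: int = 1_000_000) -> dict:
    """Time both forms; absolute numbers depend on the machine and Python version."""
    return {
        "list": timeit.timeit(lambda: is_numeric_kind_list("b"), number=number),
        "str": timeit.timeit(lambda: is_numeric_kind_str("b"), number=number),
    }


if __name__ == "__main__":
    print(compare_timings())
```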

Member Author

updated

kind = dtype.kind

cls: type[Block]

if isinstance(dtype, SparseDtype):
Member

I think the SparseDtype check may no longer be needed.

Member Author

removed
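
One plausible reading of why a dedicated sparse check can be dropped, sketched with toy classes rather than the real pandas types: if the sparse dtype is a subclass of the general extension dtype and both branches pick the same block, the general `isinstance` check already covers it. The class and function names below are hypothetical, and the thread does not state that this is the actual reason.

```python
class ToyExtensionDtype:
    """Hypothetical base class standing in for a general extension dtype."""


class ToySparseDtype(ToyExtensionDtype):
    """Hypothetical subclass standing in for a sparse dtype."""


class ToyMaskedDtype(ToyExtensionDtype):
    """Another hypothetical extension dtype, for comparison."""


def block_type_with_sparse_check(dtype) -> str:
    if isinstance(dtype, ToySparseDtype):
        return "ExtensionBlock"
    elif isinstance(dtype, ToyExtensionDtype):
        return "ExtensionBlock"
    return "OtherBlock"


def block_type_without_sparse_check(dtype) -> str:
    # The subclass instance already satisfies the base-class check.
    if isinstance(dtype, ToyExtensionDtype):
        return "ExtensionBlock"
    return "OtherBlock"


samples = [ToySparseDtype(), ToyMaskedDtype(), object()]
results_with = [block_type_with_sparse_check(d) for d in samples]
results_without = [block_type_without_sparse_check(d) for d in samples]
```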

Member Author

Your suggested updates give a bit of an improvement for non-EAs as well:

import numpy as np
from pandas.core.internals.blocks import get_block_type

%timeit get_block_type(np.dtype('float64'))

# 724 ns ± 59.4 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)  -> main
# 590 ns ± 30 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)    -> PR

@jbrockmendel (Member)

ping on green

@lukemanley (Member Author)

> ping on green

green - thanks

@jbrockmendel merged commit 5c15588 into pandas-dev:main Mar 22, 2023

@jbrockmendel (Member)

thanks @lukemanley

@lukemanley added this to the 2.1 milestone Mar 22, 2023
@lukemanley deleted the perf-get-block-type branch April 18, 2023 11:03
Labels: Internals (related to non-user accessible pandas implementation), Performance (memory or execution speed performance)
Development

Successfully merging this pull request may close these issues.

PERF: get_block_type heavy use could benefit performance improvements
2 participants