PERF: get_block_type #52109

Merged Mar 22, 2023 (2 commits)
Changes from 1 commit
30 changes: 13 additions & 17 deletions pandas/core/internals/blocks.py
@@ -2329,7 +2329,7 @@ def maybe_coerce_values(values: ArrayLike) -> ArrayLike:
     return values


-def get_block_type(dtype: DtypeObj):
+def get_block_type(dtype: DtypeObj) -> type[Block]:
     """
     Find the appropriate Block subclass to use for the given values and dtype.

@@ -2341,30 +2341,26 @@ def get_block_type(dtype: DtypeObj):
     -------
     cls : class, subclass of Block
     """
-    # We use kind checks because it is much more performant
-    # than is_foo_dtype
-    kind = dtype.kind
-
-    cls: type[Block]
-
     if isinstance(dtype, SparseDtype):
Member:
I think the SparseDtype check may no longer be needed.

Member Author:
Removed.

Member Author:
Your suggested updates give a bit of an improvement for non-EAs as well:

import numpy as np
from pandas.core.internals.blocks import get_block_type

%timeit get_block_type(np.dtype('float64'))

# 724 ns ± 59.4 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)  -> main
# 590 ns ± 30 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)    -> PR
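
The timing above comes from IPython's %timeit against the real pandas function. As a rough, dependency-free way to see the same kind of comparison, the sketch below times two simplified stand-ins with the standard-library timeit module. The Block classes and the names block_type_assign, block_type_return, and best_ns_per_call are hypothetical; they only mirror the before and after shapes of the kind branches in this diff, so absolute numbers will not match those reported above and the size of any gap depends on the interpreter.

```python
from __future__ import annotations

import timeit


# Hypothetical stand-ins for the pandas Block subclasses (not the real classes).
class Block: ...
class DatetimeLikeBlock(Block): ...
class NumericBlock(Block): ...
class ObjectBlock(Block): ...


def block_type_assign(kind: str) -> type[Block]:
    """Pre-PR shape: assign to a local, then return once at the end."""
    cls: type[Block]
    if kind in ["M", "m"]:
        cls = DatetimeLikeBlock
    elif kind in ["f", "c", "i", "u", "b"]:
        cls = NumericBlock
    else:
        cls = ObjectBlock
    return cls


def block_type_return(kind: str) -> type[Block]:
    """Post-PR shape: return as soon as a branch matches."""
    if kind in ["M", "m"]:
        return DatetimeLikeBlock
    elif kind in ["f", "c", "i", "u", "b"]:
        return NumericBlock
    return ObjectBlock


def best_ns_per_call(func, kind: str, number: int = 100_000, repeat: int = 5) -> float:
    """Return the best observed time per call, in nanoseconds."""
    timings = timeit.repeat(lambda: func(kind), number=number, repeat=repeat)
    return min(timings) / number * 1e9


if __name__ == "__main__":
    # "f" is the kind code of a float dtype such as float64.
    for func in (block_type_assign, block_type_return):
        print(f"{func.__name__}: {best_ns_per_call(func, 'f'):.0f} ns per call")
```

Taking the minimum over several repeats is a common way to reduce noise from other processes; %timeit reports a mean and standard deviation instead, so the two are not directly comparable.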

         # Need this first(ish) so that Sparse[datetime] is sparse
-        cls = ExtensionBlock
+        return ExtensionBlock
     elif isinstance(dtype, DatetimeTZDtype):
-        cls = DatetimeTZBlock
+        return DatetimeTZBlock
     elif isinstance(dtype, PeriodDtype):
-        cls = NDArrayBackedExtensionBlock
+        return NDArrayBackedExtensionBlock
     elif isinstance(dtype, ExtensionDtype):
         # Note: need to be sure PandasArray is unwrapped before we get here
-        cls = ExtensionBlock
+        return ExtensionBlock

-    elif kind in ["M", "m"]:
-        cls = DatetimeLikeBlock
+    # We use kind checks because it is much more performant
+    # than is_foo_dtype
+    kind = dtype.kind
+    if kind in ["M", "m"]:
+        return DatetimeLikeBlock
     elif kind in ["f", "c", "i", "u", "b"]:
Member:
We can improve a little bit here by checking kind in "fciub" instead of the list.

Member Author:
Updated.
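
The suggestion above swaps list membership for substring membership. The sketch below, which uses plain strings rather than real dtype objects, illustrates why the two checks agree for single-character kind codes and where they would differ. The helper names are hypothetical, and the sample of kind codes is an assumption based on NumPy's documented dtype.kind values.

```python
# Sample of single-character kind codes (assumed from NumPy's dtype.kind docs).
SAMPLE_KINDS = ["b", "i", "u", "f", "c", "m", "M", "O", "S", "U", "V"]

NUMERIC_KINDS_LIST = ["f", "c", "i", "u", "b"]
NUMERIC_KINDS_STR = "fciub"


def is_numeric_kind_list(kind: str) -> bool:
    """Membership test against a list: matches whole elements only."""
    return kind in NUMERIC_KINDS_LIST


def is_numeric_kind_str(kind: str) -> bool:
    """Membership test against a string: matches any substring."""
    return kind in NUMERIC_KINDS_STR


def checks_agree(kinds) -> bool:
    """True if both helpers give the same answer for every input."""
    return all(is_numeric_kind_list(k) == is_numeric_kind_str(k) for k in kinds)


if __name__ == "__main__":
    # Single-character codes: both forms agree.
    print(checks_agree(SAMPLE_KINDS))  # True
    # Empty or multi-character strings are substrings of "fciub" but not
    # elements of the list, so the two forms disagree here.
    print(checks_agree(["", "fc"]))  # False
```

Because a kind code is always exactly one character, the substring form is a safe replacement in this function; the difference would only matter if arbitrary strings were passed in.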

-        cls = NumericBlock
-    else:
-        cls = ObjectBlock
-    return cls
+        return NumericBlock
+
+    return ObjectBlock


 def new_block_2d(