Skip to content

REF: call _block_shape from EABlock.make_block #33308

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 9 commits into from
Apr 10, 2020
2 changes: 0 additions & 2 deletions pandas/core/internals/__init__.py
Original file line number Diff line number Diff line change
Expand Up @@ -10,7 +10,6 @@
IntBlock,
ObjectBlock,
TimeDeltaBlock,
_block_shape,
_safe_reshape,
make_block,
)
Expand All @@ -34,7 +33,6 @@
"TimeDeltaBlock",
"_safe_reshape",
"make_block",
"_block_shape",
"BlockManager",
"SingleBlockManager",
"concatenate_block_managers",
Expand Down
28 changes: 11 additions & 17 deletions pandas/core/internals/blocks.py
Original file line number Diff line number Diff line change
Expand Up @@ -244,6 +244,8 @@ def make_block(self, values, placement=None) -> "Block":
"""
if placement is None:
placement = self.mgr_locs
if self.is_extension:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

my point is that is this necessary this (if), or are you saying that this is a perf issue? (e.g. we don't want to always call _block_shape

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ATM we are calling _block_shape in a bunch of places before calling ExtensionBlock.make_block. This is just moving all of those calls to one place

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we don't want to always call _block_shape

With some corner-case-caveats, we only want to call it if self.is_extension, and we pretty much always have to call it if self.is_extension (otherwise we could use make_block_same_class

values = _block_shape(values, ndim=self.ndim)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jbrockmendel is this not a conditional check?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is a conditional check, but self.is_extension doesnt tell us anything about whether values is an EA.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if we are calling .make_block from an EA block, don't we by definition know that we have extension block values?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no. e.g. if we call pd.isna(frame) that will end up with something like:

res_values = pd.isna(block.values)
return block.make_block(res_values)

which will be ndarray[bool]


return make_block(values, placement=placement, ndim=self.ndim)

Expand Down Expand Up @@ -355,13 +357,12 @@ def _split_op_result(self, result) -> List["Block"]:
nbs = []
for i, loc in enumerate(self.mgr_locs):
vals = result[i]
nv = _block_shape(vals, ndim=self.ndim)
block = self.make_block(values=nv, placement=[loc])
block = self.make_block(values=vals, placement=[loc])
nbs.append(block)
return nbs

if not isinstance(result, Block):
result = self.make_block(values=_block_shape(result, ndim=self.ndim))
result = self.make_block(result)

return [result]

Expand Down Expand Up @@ -1277,9 +1278,6 @@ def take_nd(self, indexer, axis: int, new_mgr_locs=None, fill_tuple=None):
def diff(self, n: int, axis: int = 1) -> List["Block"]:
""" return block for the diff of the values """
new_values = algos.diff(self.values, n, axis=axis, stacklevel=7)
# We use block_shape for ExtensionBlock subclasses, which may call here
# via a super.
new_values = _block_shape(new_values, ndim=self.ndim)
return [self.make_block(values=new_values)]

def shift(self, periods: int, axis: int = 0, fill_value=None):
Expand Down Expand Up @@ -2267,7 +2265,7 @@ def concat_same_type(self, to_concat):
values = values.astype(object, copy=False)
placement = self.mgr_locs if self.ndim == 2 else slice(len(values))

return self.make_block(_block_shape(values, self.ndim), placement=placement)
return self.make_block(values, placement=placement)
return super().concat_same_type(to_concat)

def fillna(self, value, limit=None, inplace=False, downcast=None):
Expand Down Expand Up @@ -2456,7 +2454,6 @@ def f(mask, val, idx):
# TODO: allow EA once reshape is supported
values = values.reshape(shape)

values = _block_shape(values, ndim=self.ndim)
return values

if self.ndim == 2:
Expand Down Expand Up @@ -2738,9 +2735,7 @@ def concat_same_type(self, to_concat):
)
placement = self.mgr_locs if self.ndim == 2 else slice(len(values))
# not using self.make_block_same_class as values can be object dtype
return self.make_block(
_block_shape(values, ndim=self.ndim), placement=placement
)
return self.make_block(values, placement=placement)

def replace(
self,
Expand Down Expand Up @@ -2859,16 +2854,15 @@ def _extend_blocks(result, blocks=None):
return blocks


def _block_shape(values, ndim=1, shape=None):
def _block_shape(values: ArrayLike, ndim: int = 1) -> ArrayLike:
""" guarantee the shape of the values to be at least 1 d """
if values.ndim < ndim:
if shape is None:
shape = values.shape
if not is_extension_array_dtype(values):
# TODO: https://github.com/pandas-dev/pandas/issues/23023
shape = values.shape
if not is_extension_array_dtype(values.dtype):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this needed? (as you are calling the check in the block constructor).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, we get a bunch of test failures without this check

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm, i think we DO want to remove this from inside block_shape though, no?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is the whole point of block_shape, im not clear on what you're suggesting

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you have a check IN the Block constructor AND in the _block_shape function itself.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you should have 1 or the other

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but we dont have a reshape call in the constructor

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

obviously, i think we should have 2D EAs and not have to hassle with any of this

# TODO(EA2D): https://github.com/pandas-dev/pandas/issues/23023
# block.shape is incorrect for "2D" ExtensionArrays
# We can't, and don't need to, reshape.
values = values.reshape(tuple((1,) + shape))
values = values.reshape(tuple((1,) + shape)) # type: ignore
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think OK to use cast after using is_* to avoid the #type: ignore

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are we now using cast instead of type: ignore? i went with the latter bc its easier to grep for and out on a list of Things To Fix

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://pandas.pydata.org/docs/development/contributing.html#style-guidelines specifically mentions the is_* case and suggests a refactor is the preferred route.

i went with the latter bc its easier to grep for and out on a list of Things To Fix

for me, this is OK. but I believe Jeff and Will oppose type:ignore as a reason to revist later. In the past, i've been in the minority and struggled to get PRs accepted. If you can see the benefits of a fix later strategy with type hints, now could be a good time to resurface this discussion.

pyre accepts this is a reasonable need for gradual typing, see https://pyre-check.org/docs/error-suppression.html#suppression-comment-types. it has pyre-fixme and pyre-ignore.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jbrockmendel #32785 was opened for this discussion. see #32730 (comment) cc @jorisvandenbossche

return values


Expand Down