-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
PERF: BlockManager.equals blockwise #35357
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
pandas/core/internals/managers.py
Outdated
return False | ||
left = self.blocks[0].values | ||
right = other.blocks[0].values | ||
def blk_func(left, right): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you make this module left
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you do this; make this a module level function
pandas/core/internals/ops.py
Outdated
@@ -34,21 +31,31 @@ def operate_blockwise( | |||
right_ea = not isinstance(rblk.values, np.ndarray) | |||
|
|||
lvals, rvals = _get_same_shape_values(blk, rblk, left_ea, right_ea) | |||
yield lvals, rvals, locs, left_ea, right_ea, rblk |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe yield a NamedTuple?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIRC there are issues with NamedTuple construction performance
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe wait a couple of days and use a dataclass.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIRC there are issues with NamedTuple construction performance
really? can you point to something that shows this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
May be out of date but I think this is where i got the impression that NamedTuple has perf issues: https://lwn.net/Articles/731423/
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i prefer readability here
unless it's way slower
pandas/core/internals/ops.py
Outdated
|
||
|
||
def blockwise_all(left: "BlockManager", right: "BlockManager", op) -> bool: | ||
for lvals, rvals, _, _, _, _ in _iter_block_pairs(left, right): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you add a doc-string (also using a NamedTuple makes this easier to read here)
pandas/core/internals/managers.py
Outdated
return False | ||
left = self.blocks[0].values | ||
right = other.blocks[0].values | ||
def blk_func(left, right): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you do this; make this a module level function
pandas/core/internals/ops.py
Outdated
) | ||
|
||
|
||
def _iter_block_pairs(left: "BlockManager", right: "BlockManager"): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you type the output
pandas/core/internals/ops.py
Outdated
def _iter_block_pairs(left: "BlockManager", right: "BlockManager"): | ||
def _iter_block_pairs( | ||
left: "BlockManager", right: "BlockManager" | ||
) -> Generator[BlockPairInfo, None, None]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Iterable (or Iterator) is adequate.
from https://mypy.readthedocs.io/en/stable/kinds_of_types.html#generators
A basic generator that only yields values can be annotated as having a return type of either Iterator[YieldType] or Iterable[YieldType].
pandas/core/internals/ops.py
Outdated
@@ -127,3 +134,17 @@ def blockwise_all(left: "BlockManager", right: "BlockManager", op) -> bool: | |||
if not res: | |||
return False | |||
return True | |||
|
|||
|
|||
def array_equals(left: ArrayLike, right: ArrayLike) -> bool: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
since the inputs are ArrayLike
and not block related, would it make sense to co-locate this with array_equivalent
in core\dtypes\missing.py
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good idea
updated+green |
pandas/core/dtypes/missing.py
Outdated
elif isinstance(left, ABCExtensionArray): | ||
return left.equals(right) | ||
elif isinstance(right, ABCExtensionArray): | ||
return False |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suppose this elif block is not needed since you already checked that the dtypes are equal?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good catch, will update
updated+green |
very nice |
No description provided.