-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
REF: dont consolidate in BlockManager.equals #34962
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With this change, we no longer use the Block.equals methods?
return False | ||
elif isinstance(left, ExtensionArray): | ||
if not left.equals(right): | ||
return False |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this can be return left.equals(right)
?
And same for array_equivalent
below?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, will update
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no, wait, not here. this is an "all" check, so we only return on False.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, yes, it's inside the for loop ..
Probably, yes |
I think this is correct, will double-check and update. |
return all( | ||
block.equals(oblock) for block, oblock in zip(self_blocks, other_blocks) | ||
) | ||
if self.ndim == 1: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you give some comments here (or at least a comment about the ordering of the checks)
else: | ||
return array_equivalent(blk.values, rblk.values) | ||
|
||
for i in range(len(self.items)): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
comment added
gentle ping |
@@ -389,7 +389,7 @@ def test_copy(self, mgr): | |||
|
|||
# copy assertion we either have a None for a base or in case of | |||
# some blocks it is an array (e.g. datetimetz), but was copied | |||
assert cp_blk.equals(blk) | |||
tm.assert_equal(cp_blk.values, blk.values) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should equals also work here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, Block.equals is removed
* REF: dont consolidate in BlockManager.equals * doctest fixup * Remove Block.equals * simplify, comments
This had a large performance regression in https://pandas.pydata.org/speed/pandas/index.html#frame_methods.Equals.time_frame_float_equal?commits=77f4a4d673e2d2b3f7396a94818771d332bd913c. Is that expected? |
Some slowdown can probably expected for this case since it went from block-wise to column-wise (while having a consolidated block). But I think a large part of the slowdown is due to
and this is quite a bit faster:
|
I suppose that a similar "fastpath" could be added to Also for DatetimeTZBlock, there was an override of
|
I think the thing to do here is optimize array_equivalent. |
see also #32339 |
Avoid side-effects.
Side-note: with EA.equals recently added to the interface, should we update array_equivalent to use it? ATM array_equivalent starts by casting both inputs to ndarray.