Skip to content

PERF: Split blocks in blk.delete #50148

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 17 commits into from
Jan 9, 2023
Merged

PERF: Split blocks in blk.delete #50148

merged 17 commits into from
Jan 9, 2023

Conversation

phofl
Copy link
Member

@phofl phofl commented Dec 9, 2022

       before           after         ratio
     [a047fb6c]       [e8c3a52a]
-      1.16±0.01s         962±10μs     0.00  indexing.Setitem.time_setitem_list
-         382±7ms         302±10μs     0.00  indexing.Setitem.time_setitem

@phofl phofl marked this pull request as draft December 9, 2022 11:36
@phofl phofl marked this pull request as ready for review December 9, 2022 15:55
@phofl
Copy link
Member Author

phofl commented Dec 9, 2022

cc @jbrockmendel would you mind having a look here?

@jbrockmendel
Copy link
Member

will look over the weekend (applies to many many pings in my inbox)

@@ -1531,11 +1531,11 @@ def putmask(self, mask, new) -> list[Block]:

return [self]

def delete(self, loc) -> Block:
def delete(self, loc) -> list[Block]:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for 2D EAs we probably want to do the same thing as we do for NumpyBlock?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we have any 2D EAs? I agree in general, but thought all of them were 1D

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see NDArrayBackedExtensionBlock

new_blocks: list[Block] = []

previous_loc = -1
for idx in loc:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this reminds me of BlockManager._slice_take_blocks_ax0 with only_slice=True if we passed e.g. slice_or_indexer = np.delete(np.arange(length), loc)). is it worth trying to de-duplicate?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure, this seemed to do something different than what we need. Tried to get it to work a while back, but didn't get it to work

@mroeschke mroeschke added Performance Memory or execution speed performance Internals Related to non-user accessible pandas implementation labels Dec 13, 2022
@jbrockmendel
Copy link
Member

looks like some mypy complaints

@phofl
Copy link
Member Author

phofl commented Dec 16, 2022

Thx, fixed mypy

Copy link
Member

@jbrockmendel jbrockmendel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM cc @mroeschke

@lithomas1 lithomas1 mentioned this pull request Jan 7, 2023
5 tasks
@mroeschke mroeschke added this to the 2.0 milestone Jan 9, 2023
@mroeschke mroeschke merged commit e7e3676 into pandas-dev:main Jan 9, 2023
@mroeschke
Copy link
Member

Thanks @phofl

@phofl phofl deleted the blk_delete branch January 12, 2023 08:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Internals Related to non-user accessible pandas implementation Performance Memory or execution speed performance
Projects
None yet
Development

Successfully merging this pull request may close these issues.

PERF: performance regression in assigning value to df[col_name]
4 participants