Skip to content

ENH: Add lazy copy to replace #50746

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 17 commits into from
Jan 17, 2023
Merged

ENH: Add lazy copy to replace #50746

merged 17 commits into from
Jan 17, 2023

Conversation

phofl
Copy link
Member

@phofl phofl commented Jan 14, 2023

  • closes #xxxx (Replace xxxx with the GitHub issue number)
  • Tests added and passed if fixing a bug or adding a new feature
  • All code checks passed.
  • Added type annotations to new arguments/methods/functions.
  • Added an entry in the latest doc/source/whatsnew/vX.X.X.rst file if fixing a bug or adding a new feature.

cc @lithomas1 as discussed on slack

value_getitem(val_locs), BlockPlacement(blk.mgr_locs[blk_locs])
)
if self.refs is not None:
self.refs.extend([self.refs[blkno_l]] * len(nbs_tup))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to clear the refs at self.refs[blkno_l] now, since first_nb should not have a reference block?

Copy link
Member Author

@phofl phofl Jan 14, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thx for catching, must have removed the line accidentially

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adjusted one of the tests to cover this

([0], np.array([-1, -2, -3], dtype=np.intp)),
([1], np.array([-1, -2, -3], dtype=np.intp)),
([5], np.array([-1, -2, -3], dtype=np.intp)),
([0, 1], np.array([-1, -2, -3], dtype=np.intp)),
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cc @lithomas1 I don't think that this is valid or can be reached in this way. I think the dimension of your values has to be the same as your indexer. This raises if blk.should_store_value is False, because it does not get broadcast.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right. Even for the non CoW, the iset operation doesn't fail(strangely), but printing the DataFrame afterwards does.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep you get a block that has dimension 2 according to block placement but the underlying array has only one column, doesn’t really error when creating but as soon as you access it youll get into trouble :)

Copy link
Member

@lithomas1 lithomas1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

@lithomas1 lithomas1 added this to the 2.0 milestone Jan 16, 2023
@lithomas1 lithomas1 merged commit b0c0d8a into pandas-dev:main Jan 17, 2023
@lithomas1
Copy link
Member

thanks @phofl.

@phofl
Copy link
Member Author

phofl commented Jan 17, 2023

Thx to you as well, you did most of the work :)

@phofl phofl deleted the enh_cow_replace branch January 17, 2023 16:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants