-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
ENH: Improve replace lazy copy handling #51658
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change | ||||||
---|---|---|---|---|---|---|---|---|
|
@@ -714,6 +714,8 @@ def replace_list( | |||||||
(x, y) for x, y in zip(src_list, dest_list) if self._can_hold_element(x) | ||||||||
] | ||||||||
if not len(pairs): | ||||||||
if using_cow: | ||||||||
return [self.copy(deep=False)] | ||||||||
# shortcut, nothing to replace | ||||||||
return [self] if inplace else [self.copy()] | ||||||||
|
||||||||
|
@@ -734,8 +736,9 @@ def replace_list( | |||||||
masks = [extract_bool_array(x) for x in masks] | ||||||||
|
||||||||
if using_cow and inplace: | ||||||||
# TODO(CoW): Optimize | ||||||||
rb = [self.copy()] | ||||||||
# Don't set up refs here, otherwise we will think that we have | ||||||||
# references when we check again later | ||||||||
rb = [self] | ||||||||
else: | ||||||||
rb = [self if inplace else self.copy()] | ||||||||
for i, (src, dest) in enumerate(pairs): | ||||||||
|
@@ -762,10 +765,16 @@ def replace_list( | |||||||
mask=m, # type: ignore[arg-type] | ||||||||
inplace=inplace, | ||||||||
regex=regex, | ||||||||
using_cow=using_cow, | ||||||||
) | ||||||||
if convert and blk.is_object and not all(x is None for x in dest_list): | ||||||||
# GH#44498 avoid unwanted cast-back | ||||||||
result = extend_blocks([b.convert(copy=True) for b in result]) | ||||||||
result = extend_blocks( | ||||||||
[ | ||||||||
b.convert(copy=True and not using_cow, using_cow=using_cow) | ||||||||
for b in result | ||||||||
] | ||||||||
) | ||||||||
new_rb.extend(result) | ||||||||
rb = new_rb | ||||||||
return rb | ||||||||
|
@@ -778,6 +787,7 @@ def _replace_coerce( | |||||||
mask: npt.NDArray[np.bool_], | ||||||||
inplace: bool = True, | ||||||||
regex: bool = False, | ||||||||
using_cow: bool = False, | ||||||||
) -> list[Block]: | ||||||||
""" | ||||||||
Replace value corresponding to the given boolean array with another | ||||||||
|
@@ -811,17 +821,24 @@ def _replace_coerce( | |||||||
if value is None: | ||||||||
# gh-45601, gh-45836, gh-46634 | ||||||||
if mask.any(): | ||||||||
nb = self.astype(np.dtype(object), copy=False) | ||||||||
if nb is self and not inplace: | ||||||||
has_ref = self.refs.has_reference() | ||||||||
nb = self.astype(np.dtype(object), copy=False, using_cow=using_cow) | ||||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
pandas/pandas/core/dtypes/astype.py Lines 136 to 138 in 29140d4
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. No this can create a view, would have to step through the code paths but discovered a bug when this is a view There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Haven't looked into this too closely, but I guess this would be a view when going from object -> object. Happy to let optimizing this be a follow up. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yeah this is the case where I was able to produce the bug. Run time wise this is more or less optimal right now, the actual checks are performed within astype anyway There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I was referring to the There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yeah this can still happen, but Astype should take care of returning a block without references if a copy already happened (check is not perfect yet though) There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Thanks for clarifying. |
||||||||
if (nb is self or using_cow) and not inplace: | ||||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. is the |
||||||||
nb = nb.copy() | ||||||||
elif inplace and has_ref and nb.refs.has_reference(): | ||||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. IIUC nb.refs.has_reference should only be True if we are already object dtype? |
||||||||
# no copy in astype and we had refs before | ||||||||
nb = nb.copy() | ||||||||
putmask_inplace(nb.values, mask, value) | ||||||||
return [nb] | ||||||||
if using_cow: | ||||||||
return [self.copy(deep=False)] | ||||||||
return [self] if inplace else [self.copy()] | ||||||||
return self.replace( | ||||||||
to_replace=to_replace, | ||||||||
value=value, | ||||||||
inplace=inplace, | ||||||||
mask=mask, | ||||||||
using_cow=using_cow, | ||||||||
) | ||||||||
|
||||||||
# --------------------------------------------------------------------- | ||||||||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIUC checking self.refs.has_reference here vs a few lines down can have different results in the case where self.dtype == object? does that make a difference?
if it doesnt make a difference, i think we can refactor this to share some code with some other CoW-handling