-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
ENH: Improve replace lazy copy handling #51658
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Can you consolidate the inplace replace with no refs and inplace replace with refs tests by parametrizing? Would make it a little easier to see implemented so far/missing. |
done |
nb = self.astype(np.dtype(object), copy=False) | ||
if nb is self and not inplace: | ||
has_ref = self.refs.has_reference() | ||
nb = self.astype(np.dtype(object), copy=False, using_cow=using_cow) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
copy=False
doesn't do anything here right since astyping (to object) will always make a copy?
pandas/pandas/core/dtypes/astype.py
Lines 136 to 138 in 29140d4
if copy or is_object_dtype(arr.dtype) or is_object_dtype(dtype): | |
# Explicit copy, or required since NumPy can't view from / to object. | |
return arr.astype(dtype, copy=True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No this can create a view, would have to step through the code paths but discovered a bug when this is a view
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Haven't looked into this too closely, but I guess this would be a view when going from object -> object.
Happy to let optimizing this be a follow up.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah this is the case where I was able to produce the bug. Run time wise this is more or less optimal right now, the actual checks are performed within astype anyway
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was referring to the nb.copy()
below. I'm a little worried that that might double copy if astype already copied.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah this can still happen, but Astype should take care of returning a block without references if a copy already happened (check is not perfect yet though)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for clarifying.
assert not np.shares_memory(get_array(df, "a"), arr_a) | ||
assert df._mgr._has_no_reference(0) | ||
assert view._mgr._has_no_reference(0) | ||
if isinstance(to_replace, list) and len(to_replace) == 0: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might be better to parametrize test_replace_emptylist
for inplace=True
/False
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added it to another test where it added less complexity
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM pending other comment.
nb = self.astype(np.dtype(object), copy=False) | ||
if nb is self and not inplace: | ||
has_ref = self.refs.has_reference() | ||
nb = self.astype(np.dtype(object), copy=False, using_cow=using_cow) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for clarifying.
…ndling) (#51679) Backport PR #51658: ENH: Improve replace lazy copy handling Co-authored-by: Patrick Hoefler <[email protected]>
if nb is self and not inplace: | ||
has_ref = self.refs.has_reference() | ||
nb = self.astype(np.dtype(object), copy=False, using_cow=using_cow) | ||
if (nb is self or using_cow) and not inplace: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is the nb is self
here ever possible?
nb = self.astype(np.dtype(object), copy=False, using_cow=using_cow) | ||
if (nb is self or using_cow) and not inplace: | ||
nb = nb.copy() | ||
elif inplace and has_ref and nb.refs.has_reference(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIUC nb.refs.has_reference should only be True if we are already object dtype?
@@ -811,17 +821,24 @@ def _replace_coerce( | |||
if value is None: | |||
# gh-45601, gh-45836, gh-46634 | |||
if mask.any(): | |||
nb = self.astype(np.dtype(object), copy=False) | |||
if nb is self and not inplace: | |||
has_ref = self.refs.has_reference() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIUC checking self.refs.has_reference here vs a few lines down can have different results in the case where self.dtype == object? does that make a difference?
if it doesnt make a difference, i think we can refactor this to share some code with some other CoW-handling
doc/source/whatsnew/vX.X.X.rst
file if fixing a bug or adding a new feature.