Skip to content

ENH: Improve replace lazy copy handling #51658

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 5 commits into from
Feb 27, 2023
Merged

Conversation

phofl
Copy link
Member

@phofl phofl commented Feb 26, 2023

@lithomas1
Copy link
Member

lithomas1 commented Feb 27, 2023

Can you consolidate the inplace replace with no refs and inplace replace with refs tests by parametrizing?

Would make it a little easier to see implemented so far/missing.

@phofl
Copy link
Member Author

phofl commented Feb 27, 2023

done

@phofl phofl added this to the 2.0 milestone Feb 27, 2023
nb = self.astype(np.dtype(object), copy=False)
if nb is self and not inplace:
has_ref = self.refs.has_reference()
nb = self.astype(np.dtype(object), copy=False, using_cow=using_cow)
Copy link
Member

@lithomas1 lithomas1 Feb 27, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

copy=False doesn't do anything here right since astyping (to object) will always make a copy?

if copy or is_object_dtype(arr.dtype) or is_object_dtype(dtype):
# Explicit copy, or required since NumPy can't view from / to object.
return arr.astype(dtype, copy=True)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No this can create a view, would have to step through the code paths but discovered a bug when this is a view

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Haven't looked into this too closely, but I guess this would be a view when going from object -> object.

Happy to let optimizing this be a follow up.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah this is the case where I was able to produce the bug. Run time wise this is more or less optimal right now, the actual checks are performed within astype anyway

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was referring to the nb.copy() below. I'm a little worried that that might double copy if astype already copied.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah this can still happen, but Astype should take care of returning a block without references if a copy already happened (check is not perfect yet though)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for clarifying.

assert not np.shares_memory(get_array(df, "a"), arr_a)
assert df._mgr._has_no_reference(0)
assert view._mgr._has_no_reference(0)
if isinstance(to_replace, list) and len(to_replace) == 0:
Copy link
Member

@lithomas1 lithomas1 Feb 27, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be better to parametrize test_replace_emptylist for inplace=True/False

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added it to another test where it added less complexity

Copy link
Member

@lithomas1 lithomas1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM pending other comment.

nb = self.astype(np.dtype(object), copy=False)
if nb is self and not inplace:
has_ref = self.refs.has_reference()
nb = self.astype(np.dtype(object), copy=False, using_cow=using_cow)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for clarifying.

@phofl phofl merged commit aeafdd6 into pandas-dev:main Feb 27, 2023
@phofl phofl deleted the cow_replace branch February 27, 2023 21:53
meeseeksmachine pushed a commit to meeseeksmachine/pandas that referenced this pull request Feb 27, 2023
phofl added a commit that referenced this pull request Feb 27, 2023
…ndling) (#51679)

Backport PR #51658: ENH: Improve replace lazy copy handling

Co-authored-by: Patrick Hoefler <[email protected]>
if nb is self and not inplace:
has_ref = self.refs.has_reference()
nb = self.astype(np.dtype(object), copy=False, using_cow=using_cow)
if (nb is self or using_cow) and not inplace:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is the nb is self here ever possible?

nb = self.astype(np.dtype(object), copy=False, using_cow=using_cow)
if (nb is self or using_cow) and not inplace:
nb = nb.copy()
elif inplace and has_ref and nb.refs.has_reference():
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIUC nb.refs.has_reference should only be True if we are already object dtype?

@@ -811,17 +821,24 @@ def _replace_coerce(
if value is None:
# gh-45601, gh-45836, gh-46634
if mask.any():
nb = self.astype(np.dtype(object), copy=False)
if nb is self and not inplace:
has_ref = self.refs.has_reference()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIUC checking self.refs.has_reference here vs a few lines down can have different results in the case where self.dtype == object? does that make a difference?

if it doesnt make a difference, i think we can refactor this to share some code with some other CoW-handling

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants