Skip to content

ENH: Add lazy copy to align #50432

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Dec 28, 2022
Merged

ENH: Add lazy copy to align #50432

merged 2 commits into from
Dec 28, 2022

Conversation

phofl
Copy link
Member

@phofl phofl commented Dec 24, 2022

@mroeschke
Copy link
Member

Whatsnew entry?

@phofl
Copy link
Member Author

phofl commented Dec 27, 2022

We did not add entries for any of those, but can discuss this maybe in the next dev call and then add all at once?

@mroeschke
Copy link
Member

Sounds good. IMO I think we should add an entry for these methods that get a CoW benefit

@phofl
Copy link
Member Author

phofl commented Dec 27, 2022

Sounds good to me, we should also add a whatsnew in 1.5.0 explaining how to try it out. I’ll make prs for this in the coming days after compiling the prs that were already merged

@mroeschke mroeschke added this to the 2.0 milestone Dec 28, 2022
@mroeschke mroeschke merged commit 032d112 into pandas-dev:main Dec 28, 2022
@mroeschke
Copy link
Member

Thanks @phofl

@phofl phofl deleted the cow_align branch December 28, 2022 19:05
@@ -9344,7 +9344,7 @@ def _align_series(
if is_series:
left = self._reindex_indexer(join_index, lidx, copy)
elif lidx is None or join_index is None:
left = self.copy() if copy else self
left = self.copy(deep=copy) if copy or copy is None else self
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this mean there are still cases where self is being returned? (if you manually specify copy=False)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In [1]: s1 = pd.Series([1, 2, 3])

In [2]: s2 = pd.Series([4, 5, 6])

In [3]: s11, s22 = s1.align(s2, copy=False)

In [4]: s11 is s1
Out[4]: True

In [5]: s11.iloc[0] = 10

In [6]: s1
Out[6]: 
0    10
1     2
2     3
dtype: int64

Of course, this is also because copy is already an existing keyword, so we don't just do self = self.copy(deep=None) under the hood as in other methods that in the past always returned a copy.
But for other methods that already have a copy keyword, I think for now we used copy=False to also mean shallow/lazy copy like copy=None with CoW enabled.
(it's also something we still need to discuss in general what to do with those copy keywords)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But for other methods that already have a copy keyword, I think for now we used copy=False to also mean shallow/lazy copy like copy=None with CoW enabled.

Actually, that's not fully correct. It seems this depends on case by case depending on how it is implemented, and we didn't consistently cover this case. Will open a dedicated issue for this topic: #50535

if copy:
return self.copy()
if copy or copy is None:
return self.copy(deep=copy)
return self
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants