Skip to content

ENH: Make shallow copy for align nocopy with CoW #50917

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 5 commits into from
Jan 27, 2023

Conversation

phofl
Copy link
Member

@phofl phofl commented Jan 21, 2023

xref #49473

  • closes #xxxx (Replace xxxx with the GitHub issue number)
  • Tests added and passed if fixing a bug or adding a new feature
  • All code checks passed.
  • Added type annotations to new arguments/methods/functions.
  • Added an entry in the latest doc/source/whatsnew/vX.X.X.rst file if fixing a bug or adding a new feature.

Comment on lines +9498 to +9501
if uses_cow:
left = self.copy(deep=copy)
else:
left = self.copy(deep=copy) if copy or copy is None else self
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should consider just doing left = self.copy(deep=copy) in both cases. In general, we always return new objects from methods even with copy=False (with swapaxes being another exception).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I agree, but would prefer to keep things separate. We have a couple of other methods where we return self in case of len(index)==0 for example, would like to change them all together if we want to.

Can put up a pr for this first before merging here if you prefer

Comment on lines 5430 to +5433
if (copy or copy is None) and new_data is self._mgr:
new_data = new_data.copy(deep=copy)
elif using_copy_on_write() and new_data is self._mgr:
new_data = new_data.copy(deep=copy)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if (copy or copy is None) and new_data is self._mgr:
new_data = new_data.copy(deep=copy)
elif using_copy_on_write() and new_data is self._mgr:
new_data = new_data.copy(deep=copy)
if (copy or copy is None or using_copy_on_write()) and new_data is self._mgr:
new_data = new_data.copy(deep=copy)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you also add an explicit test that covers this?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer my version, this is easier to read imo. The conditions are nicely aligned.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The align copy false test covers this

Copy link
Member

@jorisvandenbossche jorisvandenbossche Jan 24, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you also add an explicit test that covers this?

Sorry, that wasn't clear ;) This change is in _reindex_with_indexers, which is used by align, but is also used through reindex? (I assumed that align would use it through reindex, but that's actually not the case I see now, it is using it directly) And so I was wondering if this change is also needed for correct behaviour of reindex (and so if we missed something there), and if so, it would be good to also have a test for reindex that covers this.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah got you.
We can't get there through reindex, because we have early exits preventing us from reaching _reindex_with_indexers if both axes are equal to the object that is reindexed.

I also looked into avoiding this function call when using align, but there is nothing to be gained there. It makes more sense covering this here.

@@ -1137,3 +1137,38 @@ def test_isetitem(using_copy_on_write):
assert np.shares_memory(get_array(df, "c"), get_array(df2, "c"))
else:
assert not np.shares_memory(get_array(df, "c"), get_array(df2, "c"))


def test_align_copy_false(using_copy_on_write):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are already tests for align in this file, can you move them together?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@phofl phofl added this to the 2.0 milestone Jan 27, 2023
@phofl
Copy link
Member Author

phofl commented Jan 27, 2023

Merging, will open a separate pr for return self

@phofl phofl merged commit 7674874 into pandas-dev:main Jan 27, 2023
@phofl phofl deleted the cow_align_nocopy branch January 27, 2023 23:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants