Skip to content

ENH: Add lazy copy for swapaxes no op #50573

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 7 commits into from
Jan 12, 2023
Merged

Conversation

phofl
Copy link
Member

@phofl phofl commented Jan 4, 2023

@phofl
Copy link
Member Author

phofl commented Jan 4, 2023

Not totally sure what to do in the other copy case. This can return views in some cases, but copy is performed on the underlying array

if copy:
return self.copy()
return self
if not copy and copy is not None and not _using_copy_on_write():
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if not copy and copy is not None and not _using_copy_on_write():
if copy is False and not _using_copy_on_write():

@jorisvandenbossche
Copy link
Member

Not totally sure what to do in the other copy case. This can return views in some cases, but copy is performed on the underlying array

What I think we ideally should do is in case that self.values is a view on the underlying data (i.e. only if we have a single block I assume) and so the swapped values is also a view of the original data, in that case we should construct a new BlockManager with a ref to the original manager's block.

This can be done here using some internal APIs, but to do this properly, it might be easier to have this as a method on the Manager.

@phofl
Copy link
Member Author

phofl commented Jan 7, 2023

I've used ndarray_to_mgr for now, that's basically what we need here.

Copy link
Member

@jorisvandenbossche jorisvandenbossche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still think it's a bit cleaner to move this logic into the internals (we will need to same logic in DataFrame.transpose() as well), but fine for me to keep it like this for now.

typ="block",
)
assert isinstance(new_mgr, BlockManager)
assert isinstance(self._mgr, BlockManager)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

since this is in the if check above, not needed to repeat here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thought so as well, but mypy disagrees

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah .. When do we use a cast and when an assert to deal with such issues?

Copy link
Member Author

@phofl phofl Jan 11, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer the assert here, but not sure if there is a general rule

Edit: I prefer assert not cast, stupid mistake in the initial post

@jorisvandenbossche
Copy link
Member

we will need to same logic in DataFrame.transpose() as well

Another thought here: now that we are limited to max 2 dimensions in NDFrame, isn't swapaxes just the same as transpose? (for the case i and j are not the same) We could then maybe also consolidate those implementations (eg swapaxes calling transpose when appropriate)

@phofl
Copy link
Member Author

phofl commented Jan 11, 2023

Another thought here: now that we are limited to max 2 dimensions in NDFrame, isn't swapaxes just the same as transpose? (for the case i and j are not the same) We could then maybe also consolidate those implementations (eg swapaxes calling transpose when appropriate)

Sounds good! But would take a look as follow up, the implementations are quite different and want to understand how this would impact stuff. I think transpose is better, as it keeps dtypes more often if possible

@phofl
Copy link
Member Author

phofl commented Jan 12, 2023

Merging for now, happy to address comments in a follow up if there are any

@phofl phofl merged commit 8b96ef2 into pandas-dev:main Jan 12, 2023
@phofl phofl deleted the cow_swapapxes branch January 12, 2023 21:30
phofl added a commit to phofl/pandas that referenced this pull request Jan 12, 2023
phofl added a commit to phofl/pandas that referenced this pull request Jan 13, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants