-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
ENH: Avoid copying unnecessary columns in setitem by splitting blocks for CoW #51031
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
# Conflicts: # pandas/core/internals/managers.py # pandas/tests/internals/test_internals.py
# Conflicts: # pandas/core/internals/managers.py
I think this is ready, will look into the single block case separately. The split allows us to only copy columns that are modified |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I haven't looked (previously) at the _iset_split_block
logic in detail, but a few minor comments/questions
) | ||
def test_set_value_loc_copy_only_necessary_column(using_copy_on_write, method, val): | ||
# Only copy column that is modified, multi-block only for now | ||
df = DataFrame({"a": [1, 2, 3], "b": 1, "c": 1.5}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
df = DataFrame({"a": [1, 2, 3], "b": 1, "c": 1.5}) | |
df = DataFrame({"a": [1, 2, 3], "b": [4, 5, 6], "c": [0.1, 0.2, 0.3]}) |
(small nitpick, but almost every test in this file is already using that)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed
values = values.copy() | ||
else: | ||
values = values[[blk_loc]] | ||
self._iset_split_block(blkno, [blk_loc], values) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we still need to call this for the values.ndim == 1
case that already copied the values, or does this just also does the "wrapping in block and updating self.blocks" for that case?
And _iset_split_block
assumes that values
is always already copied, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes and Yes
Not calling it would duplicate the logic
def test_set_value_loc_copy_only_necessary_column(using_copy_on_write, method, val): | ||
# Only copy column that is modified, multi-block only for now |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
def test_set_value_loc_copy_only_necessary_column(using_copy_on_write, method, val): | |
# Only copy column that is modified, multi-block only for now | |
def test_set_value_copy_only_necessary_column(using_copy_on_write, method, val): | |
# When setting inplace, only copy column that is modified instead of the whole | |
# block (by splitting the block) | |
# TODO multi-block only for now |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed
df_orig = df.copy() | ||
view = df[:] | ||
|
||
df.loc[0, "a"] = val |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this use method
? Hmm, but that doesn't work for setitem. So you will have to parametrize with tuples of (method, indexer), so you can do method(df)[indexer] = val
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah that's probably why I missed it. That's a pita...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Apart from the 2 inline suggestions, looks good to me!
xref #48998
doc/source/whatsnew/vX.X.X.rst
file if fixing a bug or adding a new feature.