ENH: Add lazy copy to replace #50746

Merged
merged 17 commits into from
Jan 17, 2023
23 changes: 12 additions & 11 deletions pandas/tests/copy_view/test_internals.py
@@ -100,15 +100,15 @@ def test_switch_options():
 @pytest.mark.parametrize(
     "locs, arr",
     [
-        ([0], np.array([-1, -2, -3])),
-        ([1], np.array([-1, -2, -3])),
-        ([5], np.array([-1, -2, -3])),
-        ([0, 1], np.array([-1, -2, -3])),
-        ([0, 2], np.array([-1, -2, -3])),
-        ([0, 1, 2], np.array([-1, -2, -3])),
-        ([1, 2], np.array([-1, -2, -3])),
-        ([1, 3], np.array([-1, -2, -3])),
-        ([1, 3], np.array([[-1, -2, -3], [-4, -5, -6]]).T),
+        ([0], np.array([-1, -2, -3], dtype=np.intp)),
+        ([1], np.array([-1, -2, -3], dtype=np.intp)),
+        ([5], np.array([-1, -2, -3], dtype=np.intp)),
+        ([0, 1], np.array([-1, -2, -3], dtype=np.intp)),
Member Author

cc @lithomas1 I don't think this case is valid, or that it can be reached this way. I think the dimension of your values has to match the dimension of your indexer. This raises if blk.should_store_value is False, because the values do not get broadcast.

Member

You're right. Even without CoW, the iset operation doesn't fail (strangely), but printing the DataFrame afterwards does.

Member Author

Yep, you get a block that has dimension 2 according to its block placement, but the underlying array has only one column. It doesn't really error on creation, but as soon as you access it you'll get into trouble :)
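
The shape mismatch discussed in this thread can be sketched with NumPy alone. This is only an illustration, not pandas code: `values_match_locs` is a hypothetical helper, and the rule it encodes (a 1-D array for a single loc, otherwise a 2-D array of shape `(n_rows, len(locs))`, as in the `.T` case of the parametrization) is an assumption read off the test cases above.

```python
import numpy as np


def values_match_locs(locs, arr, n_rows):
    """Hypothetical check: does `arr` hold one column of length n_rows per loc?"""
    if arr.ndim == 1:
        # A 1-D array can only describe a single column.
        return len(locs) == 1 and arr.shape == (n_rows,)
    # Assumed layout for several columns: (n_rows, n_columns), like the `.T` case.
    return arr.ndim == 2 and arr.shape == (n_rows, len(locs))


one_col = np.array([-1, -2, -3], dtype=np.intp)
two_cols = np.array([[-1, -2, -3], [-4, -5, -6]], dtype=np.intp).T

print(values_match_locs([0], one_col, n_rows=3))      # True
print(values_match_locs([0, 1], one_col, n_rows=3))   # False: two locs, one column
print(values_match_locs([1, 3], two_cols, n_rows=3))  # True
```

Under that assumption, the `([0, 1], 1-D array)` case is the kind of input questioned here: two locations, but values for only one column.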

+        ([0, 2], np.array([-1, -2, -3], dtype=np.intp)),
+        ([0, 1, 2], np.array([-1, -2, -3], dtype=np.intp)),
+        ([1, 2], np.array([-1, -2, -3], dtype=np.intp)),
+        ([1, 3], np.array([-1, -2, -3], dtype=np.intp)),
+        ([1, 3], np.array([[-1, -2, -3], [-4, -5, -6]], dtype=np.intp).T),
     ],
 )
 def test_iset_splits_blocks_inplace(using_copy_on_write, locs, arr):
@@ -122,9 +122,10 @@ def test_iset_splits_blocks_inplace(using_copy_on_write, locs, arr):
             "c": [7, 8, 9],
             "d": [10, 11, 12],
             "e": [13, 14, 15],
-            "f": ["foo", "bar", "baz"],
-        }
+        },
+        dtype=np.intp,
     )
+    df["f"] = ["a", "b", "c"]
     df_orig = df.copy()
     df2 = df.copy(deep=None)  # Trigger a CoW (if enabled, otherwise makes copy)
     df2._mgr.iset(locs, arr, inplace=True)