Skip to content

TST: test explicit copy keyword with CoW enabled #50536

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 12 commits into from
Jan 21, 2023

Conversation

jorisvandenbossche
Copy link
Member

xref #50535, part about that we need to test the copy keyword as well in those CoW tests.


if using_copy_on_write:
if (using_copy_on_write and copy is not True) or copy is False:
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The current PR experiments with combining testing the copy=True/False/None keyword in those tests that we have been adding to test the CoW behaviour of those methods.

Combining it would give the least duplication, but, I have to say that the if expressions are getting quite complicated and hard to follow ..
For example here if (using_copy_on_write and copy is not True) or copy is False and below if using_copy_on_write or copy is not False and if using_copy_on_write and copy is not True, all slightly different.

Copy link
Member

@phofl phofl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yikes this got complicated :) Do you think moving these checks into separate functions makes it easier to read?

@jorisvandenbossche
Copy link
Member Author

That depends a bit on what exactly we would be testing. Maybe it is enough to restrict ourselves to just the method call, and have an additional test like this (for this example in addition to test_rename_columns):

@pytest.mark.parametrize("copy", [True, None, False])
def test_rename_columns_copy(using_copy_on_write, copy):
    df = DataFrame({"a": [1, 2, 3], "b": [4, 5, 6], "c": [0.1, 0.2, 0.3]})
    df2 = df.rename(columns=str.upper, copy=copy)

    if (using_copy_on_write and copy is not True) or copy is False:
        assert np.shares_memory(get_array(df2, "A"), get_array(df, "a"))
    else:
        assert not np.shares_memory(get_array(df2, "A"), get_array(df, "a"))

In that case it would probably keep it generally simpler (you still have the complex if statement here, but only once). I think it is not necessarily needed to also test the actual modification step (df2.iloc[0, 0] = 0) and the resulting behaviour of that, with the different options of copy. If we test that a copy=True already results in a copy of the data after the method, then we know for sure that modifying the data will not update the original (so no need to test this explicitly?) Similarly for non-COW and copy=False, if we test that df2 shares data with df, then we know that updating df2 might update df (and that's not necessarily something to repeatedly test here, as it is strictly speaking not related to the actual method calls like rename)

So yes, if we think the above test is sufficient for parametrizing with copy (in constrast to the combined test right now in the diff), then it will be simpler to have separate tests.

@jorisvandenbossche
Copy link
Member Author

@phofl thoughts? Shall I do the separate test?

@phofl
Copy link
Member

phofl commented Jan 11, 2023

I'll take a close look later today

@phofl phofl self-requested a review January 11, 2023 13:43
Copy link
Member

@phofl phofl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So just a quick question: Do you think it's possible to test the copy=True case in one test and parametrising over all functions? I guess it would be sufficient that the objects don't share memory here without checking the actual result? I think this would be easier to read.

Just asking if you think this is feasible

@jorisvandenbossche
Copy link
Member Author

Yeah, my example test above (#50536 (comment)) only does the memory check, so indeed we could parametrize that for several methods (to the extent they keep the same column names / structure, to easily refer to the column to check)

@jorisvandenbossche
Copy link
Member Author

@phofl added a parametrized test. I think I added all methods of the list in #50535 that fit in a single test function. align, swapaxes/transpose, merge, concat and to_numpy will have to be tested separately.

@phofl
Copy link
Member

phofl commented Jan 16, 2023

Looks good, can we add swapaxes doing a no-op?

@phofl phofl merged commit d1b7244 into pandas-dev:main Jan 21, 2023
@phofl
Copy link
Member

phofl commented Jan 21, 2023

Thx @jorisvandenbossche

@jorisvandenbossche jorisvandenbossche deleted the cow-copy-keyword branch January 21, 2023 09:06
pooja-subramaniam pushed a commit to pooja-subramaniam/pandas that referenced this pull request Jan 25, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants