-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
API / CoW: return read-only numpy arrays in .values/to_numpy() #51082
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
API / CoW: return read-only numpy arrays in .values/to_numpy() #51082
Conversation
This needs some more tests, and currently also doesn't yet handle EAs (eg np.asarray(Series[Int64])) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks good!
request.node.add_marker( | ||
pytest.mark.xfail(reason="transpose doesn't yet do CoW") | ||
) | ||
assert (float_frame.values[5:10] != 5).all() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
transpose if doing CoW now, I think you can remove the xfail (CI is failing as well here)
# 10264 | ||
df = DataFrame( | ||
np.zeros((5, 5), dtype="int64"), | ||
columns=["a", "b", "c", "d", "e"], | ||
index=range(5), | ||
) | ||
df["f"] = 0 | ||
df_orig = df.copy() | ||
# TODO(CoW) protect underlying values of being written to? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can remove the TODO?
Couple smaller comments. I think this is something we should document explicitly. But can do as follow up I guesss |
I added an explicit test for the failing The problem with
And so the In this case if the |
…s in .values/to_numpy()
… arrays in .values/to_numpy()) (#51933) Backport PR #51082: API / CoW: return read-only numpy arrays in .values/to_numpy() Co-authored-by: Joris Van den Bossche <[email protected]>
cc @seberg @phofl my comment above at #51082 (comment) is the case that I was remembering where our change to read-only broke a legit case of calling a numpy function on a Series object: normally, the np.ceil call should make it writeable, but because this is wrapped again in a Series (since |
The In either case, since we call it along the way, you are right that it would be fine to add an |
Context: with the Copy-on-Write implementation (see overview follow up issue #48998), we can avoid that mutating one pandas object doesn't update another pandas object. But users can still easily get a viewing numpy array, and mutate that one. And at that point, we don't have any control over how this mutation propagates (it might update more objects than just the one from which the user obtained it, for example if other Series/DataFrames were sharing data with this object with CoW).
This is a draft PR just starting to explore this (see how much fails in our own test suite, to get an idea of the impact)