Copy method does not make truly deep copies of dtype object arrays #12663
This is not supported. pandas objects are (generally) stored in numpy arrays, which, if they have object dtype, are not deep copied. It would be expensive to do this. Not really even sure of a use case for it; generally the actual data are scalar-type data (e.g. float, int, string), not actual Python objects themselves. Storing Python objects here is an anti-pattern. I suppose you could add a note to the doc-string.
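To illustrate the point about object-dtype arrays, here is a minimal sketch using NumPy directly rather than pandas internals. The variable names are made up for illustration. It assumes only that an object array stores references to Python objects, so copying the array copies the references and not the objects behind them.

```python
import copy

import numpy as np

# An object-dtype array stores references to Python objects.
inner = [4, 5, 6]
arr = np.array([1, "a", None], dtype=object)
arr[2] = inner  # assign by index so the list is stored as a single element

arr_copy = arr.copy()          # new array buffer, same element references
arr_deep = copy.deepcopy(arr)  # also copies the contained Python objects

shares_list_after_copy = arr_copy[2] is arr[2]
shares_list_after_deepcopy = arr_deep[2] is arr[2]

# Mutating the list through the array copy also changes what the original sees.
arr_copy[2].append(0)
```

After this runs, the list reached through arr has the extra element, while the list held by arr_deep does not.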
Thanks, that's helpful, I understand why the copy method behaves as it does. I'm wondering if it could be made clearer in the documentation, since the word "deep" seems at least slightly ambiguous here. I typically use a DataFrame for numbers, but sometimes I like to have a column that holds some other kind of meta-data in an object. This way I get all the benefits of indexing and merging and can keep the meta-data associated with the data (even if this effectively disables many of numpy's nice features and efficiencies).
@agartland sure, a doc-string update would be fine. You can even put your example there to make it clear (maybe slim it down a bit).
I added a note in the copy method docstring that should help. I didn't know where/how to add an example, but here's a slimmed-down version if you think it's useful:
@agartland For me the original is affected in both cases.

import copy
import pandas as pd

a = pd.Series([1, 'a', [4, 5, 6]])
b = a.copy(deep=True)
c = copy.deepcopy(a)
print(a)

# Changes to the copy.deepcopy don't affect the original.
c.loc[2].append(0)
print(a.loc[2], b.loc[2], c.loc[2])

# Changes to the a.copy(deep=True) are reflected in the original.
b.loc[2].append(-9)
print(a.loc[2], b.loc[2], c.loc[2])

Output:
Versions:
The copy method of pd.Series and pd.DataFrame has a parameter deep which claims to "Make a deep copy, i.e. also copy data". The example below seems to show that this isn't a truly deep copy (as in from copy import deepcopy). I can't seem to find the implementation in the source. I am wondering if this behavior below is expected, if something different should be done for dtype=object to make it a truly deep copy, or if we could at least add a note to the documentation that notes this behavior?

Thanks!
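For readers who do need a truly deep copy of an object column, one possible workaround is to deep-copy each element explicitly. This is only a sketch: deepcopy_object_series is a hypothetical helper written for this illustration, not a pandas API, and it assumes the Series is small enough that a Python-level loop is acceptable.

```python
import copy

import pandas as pd


def deepcopy_object_series(s):
    """Hypothetical helper (not part of pandas): copy a Series and every
    Python object it holds."""
    values = [copy.deepcopy(v) for v in s]
    return pd.Series(values, index=s.index.copy(), name=s.name, dtype=s.dtype)


a = pd.Series([1, "a", [4, 5, 6]])
b = a.copy(deep=True)          # copies the array, but elements are shared
c = deepcopy_object_series(a)  # copies the elements as well

b.loc[2].append(-9)  # visible through a, because the list is shared
c.loc[2].append(0)   # not visible through a, because c holds its own list
```

The element-wise loop is what makes this expensive for large columns, which is the cost mentioned in the discussion.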