BUG: should constructing Index from a Series make a copy? #42934
Comments
A similar example but with a DataFrame and
This probably doesn't fit in this thread, but one thing to consider alongside this issue is that while
I wasn't expecting the columns' names to be changed; I intuitively expected the constructor to create a new index object. I agree your examples should be corrected.
Hmm, since you are explicitly passing the same Index object as both the columns and the row index, this could perhaps be considered expected behaviour. I'm not really sure, though (and it is indeed a different issue).
Since properties of the index are assumed immutable, they are cached, and this can lead to invalid states:
gives
It seems that this issue caused random SIGSEGV and SIGBUS in my code (MacOS, pandas=1.1.4=py38hcf432d8_0 from conda_forge). Unfortunately, I cannot reproduce them in a minimal example. My code looks something like this:
This modifies the index, as described above. However, if I repeat the line
once more, the assignment sometimes works (apparently re-using the previous index values) and sometimes crashes the interpreter.
@ehansis - thanks for adding in here, however this appears to be a separate issue. If you add in
before and after the
@rhshadrach OK, thanks, I'll try to check that if I ever manage to reproduce the segmentation faults. Let me know if I can be of further help.
BTW, the other way around, when constructing a Series from an Index, we do take a copy to avoid such issues. See lines 401 to 409 in 57d8d3a.
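A minimal sketch of the Series-from-Index direction, to show what "taking a copy" protects against. The variable names and values here are illustrative assumptions, not taken from the pandas source:

```python
import pandas as pd

# Hypothetical data; names and values are illustrative only.
idx = pd.Index([1, 2, 3])

# Build a Series whose values come from the Index.
ser = pd.Series(idx)

# Mutate the Series afterwards.
ser.iloc[0] = 10

# The Index is expected to keep its original values, because the Series
# does not write through to the Index's data.
print(list(idx))
print(ser.tolist())
```

The point of the sketch is only that mutating the Series leaves the Index untouched in this direction; the issue under discussion is that the reverse direction did not offer the same guarantee.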
From a comment of @jbrockmendel at #41878 (comment):
In the above, we create an Index from a Series and then mutate the Series, which also updates the Index, even though an Index is assumed to be immutable.
Changing the example a bit, you can obtain wrong values with indexing this way:
So `ser[0]` still returns a result, even though that key no longer exists in the Series' index at that point.

I know that we generally consider this a user error when it is done with a numpy array (`idx = pd.Index(arr)`, then mutating the array), but here it happens using only high-level pandas objects. In that case, should we prevent this from happening?
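Until the constructor behaviour is settled, a user can guard against this on their own side. The following is a hedged sketch, not the fix proposed in this issue: the names and values are made up for illustration, and whether the default `pd.Index(ser)` shares memory with the Series is assumed to depend on the pandas version and configuration, so the sketch only reports it rather than relying on it.

```python
import numpy as np
import pandas as pd

# Hypothetical data; names and values are illustrative only.
ser = pd.Series([1, 2, 3])

# Whether the default constructor shares memory with the Series may vary
# between pandas versions and settings, so this is only reported.
shares = np.shares_memory(pd.Index(ser).to_numpy(), ser.to_numpy())
print("default pd.Index(ser) shares memory:", shares)

# Guard 1: request an explicit copy so the Index owns its data.
idx = pd.Index(ser, copy=True)
ser.iloc[0] = 10  # mutate the Series afterwards
print(list(idx))  # the copied Index keeps the original labels

# Guard 2: for a numpy array, mark it read-only before wrapping it.
arr = np.array([1, 2, 3])
arr.setflags(write=False)
arr_idx = pd.Index(arr)
try:
    arr[0] = 10
    mutated = True
except ValueError:
    # numpy refuses to write to a read-only array
    mutated = False
print(mutated, list(arr_idx))
```

Passing `copy=True` costs one extra allocation but makes the Index independent of later writes to the Series. The read-only flag does not copy anything; it turns an accidental in-place write to the array into an immediate error instead of a silently inconsistent Index.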