API / BUG: copy non-Index arrays in Index construction to avoid data corruption #51930
Would close #34364 and #42934
The idea here is that, since an Index is assumed to be immutable and caches things like the hashtable, we should prevent users from mutating its values through normal pandas functionality by making a copy when creating an Index from array-like data.
I made an exception for an object that is already an Index (since an Index is immutable, you can't mutate it through normal pandas functionality, and it will typically already have made a copy when it was first created). This should avoid too many copies on repeated `ensure_index` calls.

Still need to add more tests (covering the segfault case), docs, and do the same in the Index subclasses' `__new__`, in case we want something like this.

doc/source/whatsnew/vX.X.X.rst file if fixing a bug or adding a new feature.
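
To illustrate the reasoning in the description, here is a minimal pure-Python sketch of the failure mode and of the proposed copy-on-construction rule. It is not the pandas implementation: `ToyIndex`, its `copy` parameter, and its `get_loc` method are hypothetical stand-ins, and the standard-library `array` type plays the role of a mutable NumPy buffer.

```python
from array import array


class ToyIndex:
    """Hypothetical stand-in for an immutable index with a cached lookup table."""

    def __init__(self, data, copy=True):
        if isinstance(data, ToyIndex):
            # Already an index: it cannot be mutated through the public API,
            # so sharing its buffer is safe and avoids repeated copies.
            self._values = data._values
        elif copy:
            # Proposed behaviour: copy array-like input so the caller's
            # buffer and the index no longer share memory.
            self._values = array("q", data)
        else:
            # No copy: the index aliases the caller's buffer.
            self._values = data
        self._lookup = None  # lazily built cache, like the hashtable

    def __getitem__(self, i):
        return self._values[i]

    def get_loc(self, key):
        # Build the cache once; it is never invalidated afterwards.
        if self._lookup is None:
            self._lookup = {v: i for i, v in enumerate(self._values)}
        return self._lookup[key]


# Without a copy, mutating the source buffer leaves the cache stale.
buf = array("q", [10, 20, 30])
shared = ToyIndex(buf, copy=False)
shared.get_loc(20)          # populates the cache
buf[1] = 99                 # mutation outside the index
stale = shared[1] == 99 and shared.get_loc(20) == 1

# With a copy, the index is isolated from later mutation.
buf2 = array("q", [10, 20, 30])
safe = ToyIndex(buf2)
safe.get_loc(20)
buf2[1] = 99
isolated = safe[1] == 20 and safe.get_loc(20) == 1

# Re-wrapping an existing index shares the buffer instead of copying.
rewrapped = ToyIndex(safe)
no_extra_copy = rewrapped._values is safe._values
```

In the aliased case the cached lookup still reports position 1 for a value that is no longer stored there, which is the kind of inconsistency the copy is meant to rule out. The `isinstance` branch mirrors the exception described for inputs that are already an Index.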