-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
CoW: Add reference tracking to index when created from series #51803
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
if not copy and isinstance(data, (ABCSeries, Index)): | ||
refs = data._references |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Something I was also wondering if my related PR: this tackles it for Series and Index, but in theory we also have the problem with arrays:
arr = np.array([1, 2, 3])
ser = pd.Series(arr, index=arr)
And with then mutating ser
, you can also trigger faulty behaviour / crashes.
So while for Index/Series this avoids a copy, we might still want to copy anyway for other array-likes (like we are going to do in the DataFrame/Series constructors)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep exactly. But I wanted to wait for the pr that tackles this for the DataFrame case to get merged before adding this for index
Currently, this doesn't yet create new BlockValuesRefs objects, right? But only store then on the Index if the input data already has refs. For example, consider |
And the approach generally certainly looks good. It's a bit unfortunate that we will need to start keeping track of this in index operations as well (considering which ones gives views etc), while it would have been nice not to have to do this (but yeah, the only option then is to always copy data when creating an Index, if we want to optimize this, this is unavoidable). |
Yes exactly, we are doing a copy in this case right now if I am not mistaken, so I kept it out of here on purpose. But this is certainly a sensible follow up |
# Conflicts: # pandas/core/indexes/base.py
So you'd say this is generally ok to merge? |
For me, yes |
Rebasing once more |
merging to get started with the follow ups |
Owee, I'm MrMeeseeks, Look at me. There seem to be a conflict, please backport manually. Here are approximate instructions:
And apply the correct labels and milestones. Congratulations — you did some good work! Hopefully your backport PR will be tested by the continuous integration and merged soon! Remember to remove the If these instructions are inaccurate, feel free to suggest an improvement. |
…dex when created from series) (#52000)
|
||
Parameters | ||
---------- | ||
index: object |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is an Index object right? (even if we can't put it in the annotation)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes
doc/source/whatsnew/vX.X.X.rst
file if fixing a bug or adding a new feature.cc @jbrockmendel this takes a shot at #34364 under CoW. Still a rough poc that is missing tests for the other index classes, but would appreciate if you could take a quick look. This avoids modifying an index by modifying the object it was created from.