-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
API: DataFrame.take always returns a copy #30784
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@@ -3370,14 +3368,7 @@ class max_speed | |||
new_data = self._data.take( | |||
indices, axis=self._get_block_manager_axis(axis), verify=True | |||
) | |||
result = self._constructor(new_data).__finalize__(self) | |||
|
|||
# Maybe set copy if we didn't actually change the index. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not sure if its relevant, but DTI.take sometimes returns a view
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the case the indices can represent a slice, right?
Personally I don't think we should make that optimization, as IMO it is much clearer for take to always be a copy, but I suppose for Index which is supposed to be immutable it might matter less.
So the reason that this is (expectedly) failing, is due to missing SettingWithCopyWarnings. I suppose we are internally relying on the current So if we want to go through with this change, we would need to track all internal usage of |
return result | ||
See the docstring of `take` for full explanation of the parameters. | ||
""" | ||
return self.take(indices=indices, axis=axis, **kwargs) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am including a Series._take_with_is_copy
to override the generic one. This is because the defaults for the old is_copy
kwargs were different for Series (False) and DataFrame (True).
I am not fully sure if the code paths that use _take_with_is_copy
actually can have a Series calling it, but included this to be sure.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Your _take_with_is_copy
makes sense, thanks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you add a test that asserts that no SettingWithCopy warning is raised by .asof and .sample (e.g. the test cases that had to change and your example)
Added tests for sample and asof |
looks fine. needs a rebase. can you run some approporiate asv's to make sure doesn't hit perf much. |
The specific asv benchmark that had a regression because of this, indicates it is fixed:
0.25.3:
master:
PR:
|
great thanks @jorisvandenbossche |
Owee, I'm MrMeeseeks, Look at me. There seem to be a conflict, please backport manually. Here are approximate instructions:
And apply the correct labels and milestones. Congratulation you did some good work ! Hopefully your backport PR will be tested by the continuous integration and merged soon! If these instruction are inaccurate, feel free to suggest an improvement. |
@jorisvandenbossche if you can backport this. |
Closes #27357
This adds an internal version of
take
with the behaviour of setting_is_copy
for DataFrames that the publictake
did before, so this version can be used in the indexing code (where we want to keep track of parent dataframe with_is_copy
for SettingWithCopyWarnings).I named it
_take_with_is_copy
which is literally what it is doing, but happy to hear alternatives.This then updates the deprecation to fully ignore the keyword and indicate in the deprecation message the keyword has no effect anymore.
I checked #27349 and #30615 to ensure that where previously the internal version was used or
is_copy
was specified, now the appropriate function is used. I suppose that in some of the cases where I now use the internal_take_with_is_copy
this is not actually needed, but it's the safest thing anyway (it will do the same as it did before).