-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
Add __copy__ and __deepcopy__ to NDFrame #15444
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
pandas/core/generic.py
Outdated
@@ -3156,6 +3156,13 @@ def copy(self, deep=True): | |||
data = self._data.copy(deep=deep) | |||
return self._constructor(data).__finalize__(self) | |||
|
|||
__copy__ = copy |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor, but a child class that overrides copy
won't override __copy__
without a def __copy__...
here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure I understand -- def copy
is defined immediately above, and the PR adds the __copy__ = copy
alias. Are you saying that this won't work unless I also do def __copy__
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah this is better as def __copy__(self):
.
also let's change the same in pandas/indexes/base.py
. hopefully nothing will break.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@bmcfee but if a class (NDFrameChild
) inherits from NDFrame
and overrides .copy
: with the proposed code NDFrameChild.__copy__
remains NDFrame.copy
, rather than NDFFrameChild.copy
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Aha, I see, and agree with @jreback that it should probably be fixed in indexes/base
as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Your prior way seems more elegant (and faster)! Unfortunately Python fails the elegant = correct question here, though
pandas/core/generic.py
Outdated
@@ -3156,6 +3156,13 @@ def copy(self, deep=True): | |||
data = self._data.copy(deep=deep) | |||
return self._constructor(data).__finalize__(self) | |||
|
|||
__copy__ = copy |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah this is better as def __copy__(self):
.
also let's change the same in pandas/indexes/base.py
. hopefully nothing will break.
pandas/tests/frame/test_misc_api.py
Outdated
@@ -332,6 +332,15 @@ def test_deepcopy(self): | |||
for idx, value in compat.iteritems(series): | |||
self.assertNotEqual(self.frame['A'][idx], value) | |||
|
|||
def test_deepcopy_empty(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add tests instead in test_generic.py
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah this is better as def copy(self):.
also let's change the same in pandas/indexes/base.py. hopefully nothing will break.
Do you want both of those in this PR?
add tests instead in test_generic.py
Sure, I can move it. I put it here because I figured it ought to be near the other deepcopy
test and didn't see any other obvious place for it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you can move those as well
@@ -723,7 +723,8 @@ def copy(self, name=None, deep=False, dtype=None, **kwargs): | |||
new_index = new_index.astype(dtype) | |||
return new_index | |||
|
|||
__copy__ = copy | |||
def __copy__(self, **kwargs): | |||
return self.copy(**kwargs) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you add (and test) __deepcopy__
here as well
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
__deepcopy__
already exists in this class, see line 1483. Did you want me to move it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh, then just move it next to this one
@@ -1539,6 +1540,15 @@ def test_to_xarray(self): | |||
expected, | |||
check_index_type=False) | |||
|
|||
def test_deepcopy_empty(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
look at how the other tests are done. We need systematic testing on all structures. So move to Generic
and use _construct
and _compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
see my comment below on how test should be for this. you need to tests .copy()
, copy(...)
, .deepcopy()
, deepcopy()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm afraid I don't understand your comment here. Can you point me to a specific example to base the tests on?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, I think I might not have made the motivation for this test sufficiently clear.
It's not intended to be a full test of deepcopy
, for which _construct
might be useful. Rather, it's designed specifically to trigger the bug I mentioned in #15370 where the data/index are empty but not the columns. I don't see how _construct
would be appropriate for this, but maybe I'm doing it wrong?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@bmcfee well you are changing fundamental things, which need full testing.
_construct
is run with Series, DataFrame,Panel, to simply make a new one. Then you call a method/function and test if its a copy or not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jreback sure, but that would be a separate test from the one you said to move. That issue never affected series; as far as I can tell, there's no way that it could have since it depends on having columns.
I have no problem adding generic tests for copy/deepcopy, but I want to make sure it's the right test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@bmcfee you can add the test you wanted. but you are making fundamental changes. These HAVE to be tested on all classes that are affected. (and you ARE affecting everything)
pandas/tests/test_generic.py
Outdated
empty_frame = DataFrame(data=[], index=[], columns=['A']) | ||
empty_frame_copy = deepcopy(empty_frame) | ||
|
||
self.assertEqual(empty_frame, empty_frame_copy) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there are already generic tests in pandas/indexes/common.py
but need to expand them to include using
from copy import copy, deepcopy
....
copy(...)
deepcopy(..)
these should match self.copy()
and self.deepcopy()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to be clear, when you say self.deepcopy
do you mean self.__deepcopy__
? There is not currently an exposed function of that name.
Unrelated to this PR, I've noticed several other places where there are assignment method aliases in base classes, eg, |
@bmcfee if you want to make an issue would be great for other things |
I made the requested changes. At this point, it looks like there are a bunch of test failures that have nothing to do with this PR. Should I assume master is in a not-very-good state right now? |
not sure what you mean? master has been all green for a while |
@jreback appveyor had one build fail to install (?!); travis has one build failing on a panel4d setitem test, but only one of the builds. FWIW, I had an appveyor / conda install fail (checksum mismatch) on a separate project this morning, and restarting the build fixed it. Maybe there's just something screwy in conda land today. |
no your change causes this to fail |
Really? Nothing in the log indicates as such:
|
look on travis |
appveyor is just for confirmation builds |
Okay, I'm looking, and I don't understand anything about what's going on here, or what's different between the run where it passes (eg here) and the one where it fails (eg, here). Since the test log doesn't provide much useful diagnostic information, maybe you can help explain what the difference is so I'm not just stumbling in the dark? |
f832a44
to
bf36f35
Compare
Codecov Report
@@ Coverage Diff @@
## master #15444 +/- ##
==========================================
- Coverage 90.37% 90.36% -0.01%
==========================================
Files 135 135
Lines 49466 49473 +7
==========================================
+ Hits 44705 44707 +2
- Misses 4761 4766 +5
Continue to review full report at Codecov.
|
@bmcfee I rebased and added a commit. hopefully this should pass. the issue is that |
@jreback Thanks! |
This guy's green now -- what's next? |
thanks for the patience @bmcfee |
closes pandas-dev#15370 Author: Brian McFee <[email protected]> Author: Jeff Reback <[email protected]> Closes pandas-dev#15444 from bmcfee/deepcopy-ndframe and squashes the following commits: bf36f35 [Jeff Reback] TST: skip the panel4d deepcopy tests d58b1f6 [Brian McFee] added tests for copy and deepcopy 35f3e0f [Brian McFee] relocated Index.__deepcopy__ to live near __copy__ 1aea940 [Brian McFee] switched deepcopy test to using generic comparator 7e67e7d [Brian McFee] ndframe and index __copy__ are now proper methods 820664c [Brian McFee] moved deepcopy test to generic.py 9721041 [Brian McFee] added copy/deepcopy to ndframe, fixes pandas-dev#15370
git diff upstream/master | flake8 --diff