Skip to content

Add __copy__ and __deepcopy__ to NDFrame #15444

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 7 commits into from

Conversation

bmcfee
Copy link
Contributor

@bmcfee bmcfee commented Feb 17, 2017

@@ -3156,6 +3156,13 @@ def copy(self, deep=True):
data = self._data.copy(deep=deep)
return self._constructor(data).__finalize__(self)

__copy__ = copy
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor, but a child class that overrides copy won't override __copy__ without a def __copy__... here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure I understand -- def copy is defined immediately above, and the PR adds the __copy__ = copy alias. Are you saying that this won't work unless I also do def __copy__?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah this is better as def __copy__(self):.

also let's change the same in pandas/indexes/base.py. hopefully nothing will break.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bmcfee but if a class (NDFrameChild) inherits from NDFrame and overrides .copy: with the proposed code NDFrameChild.__copy__ remains NDFrame.copy, rather than NDFFrameChild.copy

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aha, I see, and agree with @jreback that it should probably be fixed in indexes/base as well.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your prior way seems more elegant (and faster)! Unfortunately Python fails the elegant = correct question here, though

@@ -3156,6 +3156,13 @@ def copy(self, deep=True):
data = self._data.copy(deep=deep)
return self._constructor(data).__finalize__(self)

__copy__ = copy
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah this is better as def __copy__(self):.

also let's change the same in pandas/indexes/base.py. hopefully nothing will break.

@@ -332,6 +332,15 @@ def test_deepcopy(self):
for idx, value in compat.iteritems(series):
self.assertNotEqual(self.frame['A'][idx], value)

def test_deepcopy_empty(self):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add tests instead in test_generic.py

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah this is better as def copy(self):.

also let's change the same in pandas/indexes/base.py. hopefully nothing will break.

Do you want both of those in this PR?

add tests instead in test_generic.py

Sure, I can move it. I put it here because I figured it ought to be near the other deepcopy test and didn't see any other obvious place for it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can move those as well

@jreback jreback added Compat pandas objects compatability with Numpy or Python functions Reshaping Concat, Merge/Join, Stack/Unstack, Explode labels Feb 17, 2017
@@ -723,7 +723,8 @@ def copy(self, name=None, deep=False, dtype=None, **kwargs):
new_index = new_index.astype(dtype)
return new_index

__copy__ = copy
def __copy__(self, **kwargs):
return self.copy(**kwargs)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add (and test) __deepcopy__ here as well

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

__deepcopy__ already exists in this class, see line 1483. Did you want me to move it?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh, then just move it next to this one

@@ -1539,6 +1540,15 @@ def test_to_xarray(self):
expected,
check_index_type=False)

def test_deepcopy_empty(self):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

look at how the other tests are done. We need systematic testing on all structures. So move to Generic and use _construct and _compare

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see my comment below on how test should be for this. you need to tests .copy(), copy(...), .deepcopy(), deepcopy()

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm afraid I don't understand your comment here. Can you point me to a specific example to base the tests on?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, I think I might not have made the motivation for this test sufficiently clear.

It's not intended to be a full test of deepcopy, for which _construct might be useful. Rather, it's designed specifically to trigger the bug I mentioned in #15370 where the data/index are empty but not the columns. I don't see how _construct would be appropriate for this, but maybe I'm doing it wrong?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bmcfee well you are changing fundamental things, which need full testing.

_construct is run with Series, DataFrame,Panel, to simply make a new one. Then you call a method/function and test if its a copy or not.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jreback sure, but that would be a separate test from the one you said to move. That issue never affected series; as far as I can tell, there's no way that it could have since it depends on having columns.

I have no problem adding generic tests for copy/deepcopy, but I want to make sure it's the right test.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bmcfee you can add the test you wanted. but you are making fundamental changes. These HAVE to be tested on all classes that are affected. (and you ARE affecting everything)

empty_frame = DataFrame(data=[], index=[], columns=['A'])
empty_frame_copy = deepcopy(empty_frame)

self.assertEqual(empty_frame, empty_frame_copy)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there are already generic tests in pandas/indexes/common.py but need to expand them to include using

from copy import copy, deepcopy

....
copy(...)
deepcopy(..)

these should match self.copy() and self.deepcopy()

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to be clear, when you say self.deepcopy do you mean self.__deepcopy__? There is not currently an exposed function of that name.

@bmcfee
Copy link
Contributor Author

bmcfee commented Feb 18, 2017

Unrelated to this PR, I've noticed several other places where there are assignment method aliases in base classes, eg, Index.__bool__. I'm not interested in fixing that in this PR, but maybe someone else ought to take a look down the road.

@jreback
Copy link
Contributor

jreback commented Feb 18, 2017

@bmcfee if you want to make an issue would be great for other things

@bmcfee
Copy link
Contributor Author

bmcfee commented Feb 18, 2017

I made the requested changes. At this point, it looks like there are a bunch of test failures that have nothing to do with this PR. Should I assume master is in a not-very-good state right now?

@jreback
Copy link
Contributor

jreback commented Feb 18, 2017

not sure what you mean?

master has been all green for a while

@bmcfee
Copy link
Contributor Author

bmcfee commented Feb 18, 2017

@jreback appveyor had one build fail to install (?!); travis has one build failing on a panel4d setitem test, but only one of the builds.

FWIW, I had an appveyor / conda install fail (checksum mismatch) on a separate project this morning, and restarting the build fixed it. Maybe there's just something screwy in conda land today.

@jreback
Copy link
Contributor

jreback commented Feb 18, 2017

no your change causes this to fail

@bmcfee
Copy link
Contributor Author

bmcfee commented Feb 18, 2017

no your change causes this to fail

Really? Nothing in the log indicates as such:

installing requirements from ci\requirements-2.7-64.run - done"
conda install -n pandas (conda build ci\appveyor.recipe -q --output)
conda : Error: no such directory: C:\projects\pandas-465\ci\appveyor.recipe
At line:1 char:26
+ conda install -n pandas (conda build ci\appveyor.recipe -q --output)
+                          ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : NotSpecified: (Error: no such ...appveyor.recipe:String) [], RemoteException
    + FullyQualifiedErrorId : NativeCommandError
 
conda : 
At line:1 char:1
+ conda install -n pandas (conda build ci\appveyor.recipe -q --output)
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : NotSpecified: (:String) [], RemoteException
    + FullyQualifiedErrorId : NativeCommandError
 
CondaValueError: Value error: too few arguments, must supply command line package specs or --file
Command executed with exception: 
CondaValueError: Value error: too few arguments, must supply command line package specs or --file

@jreback
Copy link
Contributor

jreback commented Feb 18, 2017

look on travis

@jreback
Copy link
Contributor

jreback commented Feb 18, 2017

appveyor is just for confirmation builds
they are behind and i has to cancel a few

@bmcfee
Copy link
Contributor Author

bmcfee commented Feb 18, 2017

look on travis

Okay, I'm looking, and I don't understand anything about what's going on here, or what's different between the run where it passes (eg here) and the one where it fails (eg, here). Since the test log doesn't provide much useful diagnostic information, maybe you can help explain what the difference is so I'm not just stumbling in the dark?

@codecov-io
Copy link

codecov-io commented Feb 20, 2017

Codecov Report

Merging #15444 into master will decrease coverage by -0.01%.
The diff coverage is 83.33%.

@@            Coverage Diff             @@
##           master   #15444      +/-   ##
==========================================
- Coverage   90.37%   90.36%   -0.01%     
==========================================
  Files         135      135              
  Lines       49466    49473       +7     
==========================================
+ Hits        44705    44707       +2     
- Misses       4761     4766       +5
Impacted Files Coverage Δ
pandas/core/generic.py 96.29% <83.33%> (-0.05%)
pandas/indexes/base.py 96.24% <83.33%> (ø)
pandas/core/common.py 91.02% <ø> (-0.34%)
pandas/util/testing.py 81.91% <ø> (-0.19%)
pandas/indexes/multi.py 96.65% <ø> (-0.09%)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 12f2c6a...bf36f35. Read the comment docs.

@jreback
Copy link
Contributor

jreback commented Feb 20, 2017

@bmcfee I rebased and added a commit. hopefully this should pass.

the issue is that test_generic.py runs tests on Panel4D (which we are deprecating), and it is testing on that, so the addtl _consturct shows a warning, and later tests thus fail. easy enough just to skip that particular test.

@bmcfee
Copy link
Contributor Author

bmcfee commented Feb 20, 2017

@jreback Thanks!

@jreback jreback added this to the 0.20.0 milestone Feb 20, 2017
@bmcfee
Copy link
Contributor Author

bmcfee commented Feb 20, 2017

This guy's green now -- what's next?

@jreback jreback closed this in 0b4fdf9 Feb 20, 2017
@jreback
Copy link
Contributor

jreback commented Feb 20, 2017

thanks for the patience @bmcfee

AnkurDedania pushed a commit to AnkurDedania/pandas that referenced this pull request Mar 21, 2017
closes pandas-dev#15370

Author: Brian McFee <[email protected]>
Author: Jeff Reback <[email protected]>

Closes pandas-dev#15444 from bmcfee/deepcopy-ndframe and squashes the following commits:

bf36f35 [Jeff Reback] TST: skip the panel4d deepcopy tests
d58b1f6 [Brian McFee] added tests for copy and deepcopy
35f3e0f [Brian McFee] relocated Index.__deepcopy__ to live near __copy__
1aea940 [Brian McFee] switched deepcopy test to using generic comparator
7e67e7d [Brian McFee] ndframe and index __copy__ are now proper methods
820664c [Brian McFee] moved deepcopy test to generic.py
9721041 [Brian McFee] added copy/deepcopy to ndframe, fixes pandas-dev#15370
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Compat pandas objects compatability with Numpy or Python functions Reshaping Concat, Merge/Join, Stack/Unstack, Explode
Projects
None yet
Development

Successfully merging this pull request may close these issues.

deepcopy failure on empty dataframes with non-empty column set (numpy 1.12 compatibility)
4 participants