Skip to content

BUG: yield correct Series subclass in df.iterrows() (#13977) #13978

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed

Conversation

kernc
Copy link
Contributor

@kernc kernc commented Aug 12, 2016

@jreback
Copy link
Contributor

jreback commented Aug 12, 2016

pls read the instructions. we explicity call for tests.

@kernc kernc force-pushed the iterrows-with-constructor-sliced branch from 392d057 to eebbaf4 Compare August 12, 2016 19:13
@kernc
Copy link
Contributor Author

kernc commented Aug 12, 2016

Ah, sorry. I made sure existing tests passed. Now I also added a new one.

@kernc kernc force-pushed the iterrows-with-constructor-sliced branch from eebbaf4 to 7777704 Compare August 12, 2016 19:16
@codecov-io
Copy link

codecov-io commented Aug 12, 2016

Current coverage is 85.27% (diff: 100%)

Merging #13978 into master will decrease coverage by <.01%

@@             master     #13978   diff @@
==========================================
  Files           139        139          
  Lines         50455      50447     -8   
  Methods           0          0          
  Messages          0          0          
  Branches          0          0          
==========================================
- Hits          43026      43018     -8   
  Misses         7429       7429          
  Partials          0          0          

Powered by Codecov. Last update 6645b2b...2820315

@@ -693,8 +693,9 @@ def iterrows(self):

"""
columns = self.columns
_Series = self._constructor_sliced
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

call this klass

@jreback jreback added the Bug label Aug 13, 2016
@jreback
Copy link
Contributor

jreback commented Aug 13, 2016

minor changes. ping when green.

@kernc kernc force-pushed the iterrows-with-constructor-sliced branch 4 times, most recently from 4c61879 to 9b78a5a Compare August 14, 2016 19:50
@kernc
Copy link
Contributor Author

kernc commented Aug 15, 2016

Revised. Hope it's ok.

I did notice some more Series and DataFrame calls where self._constructor_sliced or self._constructor might make sense (e.g. in DataFrame.apply() and submethods). What is the rationale for ever using Series instead of self._constructor_sliced?

# GH 13977
df = tm.SubclassedDataFrame({'a': [1]})
for i, row in df.iterrows():
tm.assert_series_equal(row, df.loc[i])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can u also add assertIsInstance explicitly to check it is SubclassedSeries? assert_series_equal only checks class identity, not check what it is.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even though this is already tested in test_indexing_sliced which tests df.loc?

@sinhrks
Copy link
Member

sinhrks commented Aug 15, 2016

Thx for the update. Yes fixing relevant cases are appreciated (OK for separate PR). The remaining normal constructor is mostly because of historical reason, and can be replaced to constructor....

@kernc
Copy link
Contributor Author

kernc commented Aug 15, 2016

But what are relevant cases? If DataFrame._constructor must always be a subtype of DataFrame (and so on for _constructor_sliced and _constructor_expanddim), why not just do a mass renaming of DataFrame into self._constructor (etc.) in all methods?

Using Python's inspect module, one could even write a simple test ensuring no references to Series, DataFrame and Panel are hardcoded. 💡

@sinhrks
Copy link
Member

sinhrks commented Aug 15, 2016

It's simply because no one can take a time to do it (add tests). Note that sigatures must be changed sometimes, see #13208.

@kernc kernc force-pushed the iterrows-with-constructor-sliced branch 2 times, most recently from b62559e to 2794c37 Compare August 15, 2016 20:46
@kernc
Copy link
Contributor Author

kernc commented Aug 15, 2016

Thanks, will look into it.

@kernc
Copy link
Contributor Author

kernc commented Aug 15, 2016

Adding just __finalize__() made tests no longer pass.

I added a RFC commit with one proposed solution. It basically makes only DataFrame inherit DataFrame's _metadata properties, Series Series', and so on. I don't know whether this opposes any established behavior / documentation. Doesn't seem to. The tests will pass. 🙏

@jreback
Copy link
Contributor

jreback commented Aug 15, 2016

the inheriting of metadata should be on a like object. However this would belong in another PR.

@sinhrks
Copy link
Member

sinhrks commented Aug 16, 2016

@kernc thx for theck. I've thought finalize has been generally called, sorry.

@kernc kernc force-pushed the iterrows-with-constructor-sliced branch from 2794c37 to 8d0e8fc Compare August 16, 2016 00:48
@kernc
Copy link
Contributor Author

kernc commented Aug 16, 2016

So this appears to be it for this PR. I'll be opening the other one shortly. Thanks.

@kernc kernc force-pushed the iterrows-with-constructor-sliced branch 2 times, most recently from e052101 to 2820315 Compare August 22, 2016 19:01
@kernc kernc force-pushed the iterrows-with-constructor-sliced branch from 2820315 to 47f5894 Compare August 24, 2016 21:23
@kernc kernc force-pushed the iterrows-with-constructor-sliced branch from 47f5894 to 9aaac80 Compare August 24, 2016 21:26
@jreback jreback added this to the 0.19.0 milestone Aug 24, 2016
@jreback
Copy link
Contributor

jreback commented Aug 24, 2016

lgtm. ping on green.

@kernc
Copy link
Contributor Author

kernc commented Aug 26, 2016

So this is green. I'm not sure why Xcode build is marked as failed.

@jreback
Copy link
Contributor

jreback commented Aug 26, 2016

thanks!

@jreback jreback closed this in 042b6f0 Aug 26, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

df.iterrows() constructs Series instead of its proper subclass
4 participants