Skip to content

CLN: replace usage internally of .iteritems with .items #26114

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Jul 10, 2019

Conversation

topper-123
Copy link
Contributor

@topper-123 topper-123 commented Apr 16, 2019

  • xref CLN: remove PY2 #25725
  • tests added / passed
  • passes git diff upstream/master -u -- "*.py" | flake8 --diff
  • whatsnew entry

Now that Python2 is being removed, .iteritems should be deprecated, and .items should be used instead.

EDIT: This PR ended up not deprecating .iteritems, but only change its usage internally with using .items

@topper-123 topper-123 force-pushed the deprecate_iteritems branch from d68d4f8 to 397bf66 Compare April 17, 2019 01:13
@topper-123 topper-123 marked this pull request as ready for review April 17, 2019 01:15
@topper-123
Copy link
Contributor Author

The error is unrelated to this PR:

CondaHTTPError: HTTP 504 GATEWAY_TIMEOUT for url <https://conda.anaconda.org/c3i_test/noarch/repodata.json>
Elapsed: 00:59.894676
CF-RAY: 4c8a8f7d1f385558-ORD
A remote server error occurred when trying to retrieve this URL.
A 500-type error (e.g. 500, 501, 502, 503, etc.) indicates the server failed to
fulfill a valid request.  The problem may be spurious, and will resolve itself if you
try your request again.  If the problem persists, consider notifying the maintainer
of the remote server.

@codecov
Copy link

codecov bot commented Apr 17, 2019

Codecov Report

Merging #26114 into master will decrease coverage by 0.01%.
The diff coverage is 90%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #26114      +/-   ##
==========================================
- Coverage   91.99%   91.97%   -0.02%     
==========================================
  Files         175      175              
  Lines       52384    52392       +8     
==========================================
+ Hits        48189    48190       +1     
- Misses       4195     4202       +7
Flag Coverage Δ
#multiple 90.53% <90%> (-0.01%) ⬇️
#single 40.73% <30%> (-0.14%) ⬇️
Impacted Files Coverage Δ
pandas/core/panel.py 35.43% <0%> (ø) ⬆️
pandas/core/reshape/pivot.py 96.54% <100%> (ø) ⬆️
pandas/io/json/table_schema.py 98.3% <100%> (ø) ⬆️
pandas/core/indexes/multi.py 95.61% <100%> (ø) ⬆️
pandas/plotting/_core.py 83.76% <100%> (ø) ⬆️
pandas/core/util/hashing.py 98.4% <100%> (ø) ⬆️
pandas/core/reshape/reshape.py 99.56% <100%> (ø) ⬆️
pandas/core/strings.py 98.86% <100%> (ø) ⬆️
pandas/core/sparse/frame.py 95.49% <100%> (ø) ⬆️
pandas/io/formats/style.py 96.68% <100%> (ø) ⬆️
... and 7 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 68b1da7...397bf66. Read the comment docs.

@codecov
Copy link

codecov bot commented Apr 17, 2019

Codecov Report

Merging #26114 into master will increase coverage by 0.07%.
The diff coverage is 90%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #26114      +/-   ##
==========================================
+ Coverage   91.79%   91.86%   +0.07%     
==========================================
  Files         180      174       -6     
  Lines       50934    50700     -234     
==========================================
- Hits        46756    46577     -179     
+ Misses       4178     4123      -55
Flag Coverage Δ
#multiple 90.4% <90%> (-0.08%) ⬇️
#single 41.78% <30%> (-0.32%) ⬇️
Impacted Files Coverage Δ
pandas/core/panel.py 17.61% <0%> (+1.74%) ⬆️
pandas/core/reshape/pivot.py 96.52% <100%> (ø) ⬆️
pandas/io/json/table_schema.py 98.3% <100%> (ø) ⬆️
pandas/core/indexes/multi.py 95.67% <100%> (+0.81%) ⬆️
pandas/plotting/_core.py 83.77% <100%> (-5.61%) ⬇️
pandas/core/util/hashing.py 98.41% <100%> (ø) ⬆️
pandas/core/reshape/reshape.py 99.56% <100%> (ø) ⬆️
pandas/core/strings.py 98.92% <100%> (-0.02%) ⬇️
pandas/core/sparse/frame.py 95.63% <100%> (+2.06%) ⬆️
pandas/io/formats/style.py 96.68% <100%> (-0.47%) ⬇️
... and 115 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update c407b73...4fec810. Read the comment docs.

@jreback jreback added API Design Deprecate Functionality to remove in pandas labels Apr 17, 2019
@jreback
Copy link
Contributor

jreback commented Apr 17, 2019

yeah I guess we need to deprecate this. let me have a look.

@jorisvandenbossche
Copy link
Member

Is it needed to deprecate this?

I understand it that from the stdlib python 2/3 point of view, it makes sense: items + iteritems were originally modelled after dict's items and iteritems methods, and now with python 3 only, we only need to keep items.

However, leaving that aspect aside, from a consistency within pandas, I think having the methods iteritems / itertuples / iterrows is also nice. And I personally prefer to keep this consistency compared to the one with dict.

@topper-123
Copy link
Contributor Author

topper-123 commented Apr 17, 2019

@jorisvandenbossche

iterrows and itertuples are code smells really and if encountered in a code base, the instinct should be to try to refactor into something vectorized.

iteritems/items OTOH is idiomatic and useful for a lot of problem types. So I don't have a problem with distancing these methods from each other by just deprecating iteritems. The grouping of these methods is deceptional and if you've learned that iteritems is a good method you may be lead to believe that iterrows and itertuples can be used in a similar manner, but really you shouldn't in 99% of the cases.

The dict argument is also important IMO, as having DataFrames/Series be dict-like is a nice pedagogical concept, and keepingiteritems will require more explanation for the user (it's a python2 concept used to avoid returning lists, but that we chose to keep in order to keep similarity with iterrows and itertuples (but don't use those, because...), but it's really the same as items, so just use items ...).

So, I think removing iteritems will make the basic pandas concepts a bit easier to learn/use and will encourage healthier habits for non-expert users.

@jreback
Copy link
Contributor

jreback commented Apr 20, 2019

I actually like this deprecation; it clearly delineates that that we are iterating over columns, rather than the row-iterators; plus it does make sense post python 2 for more compatibility in Series with dict.

@jorisvandenbossche ?

@jreback jreback added this to the 0.25.0 milestone Apr 20, 2019
Copy link
Member

@WillAyd WillAyd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm on board with this one

@jorisvandenbossche
Copy link
Member

My main problem with deprecating this is that we have consistently documented to use iteritems() instead of items() (in the user guide, in the docstrings, in see also's, ... even the docstring of items() uses iteritems() in the example. items() is basically not mentioned a single time). So ideally, we should have thought about this much earlier to update that ...
Because now we would want to deprecate that method that is documented in favor of an alias probably almost no one uses.

@topper-123
Copy link
Contributor Author

A solution to your concern could be to keep the deprecations in 0.25 in v1.0 (+ maybe 2 minor versions)? That would not be unreasonable anyway for changes so close to 1.0.

Another alternative could be to make items the "advertised" method in 0.25, and just have a warning text in the iteritems doc string about a future deprecation, without actually deprecating it? This would be kind of like we did with tolist when to_list was added.

Alternatively IMO advertising DataFrames/Seried having a dict-like interface should probably be de-emphasised. E.g. the phrase here

Consistent with the dict-like interface, iteritems() iterates through key-value pairs

isn't really true for Pandas on Python3 (while it was true for Pandas on Python2).

@WillAyd
Copy link
Member

WillAyd commented May 10, 2019

@jorisvandenbossche AFAICT there's only a handful of references to iteritems in the user guide (specifically four in the basics file) so I would think this is a pretty easy deprecation? Bundling it with dropping Python2 also seems to make a lot of sense from an API perspective.

If you feel strongly against this then something we can discuss at the sprint, but just wanted to offer that counterpoint of this I think not being too hard of a deprecation

@simonjayhawkins
Copy link
Member

@topper-123 can you merge master.

Conflicting files
doc/source/whatsnew/v0.25.0.rst
pandas/core/generic.py
pandas/core/indexes/multi.py

@topper-123 topper-123 force-pushed the deprecate_iteritems branch 2 times, most recently from 5d4b238 to 86c220a Compare June 1, 2019 11:45
Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

small comment, otherwise lgtm.

@@ -1524,10 +1524,10 @@ To iterate over the rows of a DataFrame, you can use the following methods:

df

iteritems
items
~~~~~~~~~
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this needs to be same length as the header

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, done.

@jreback
Copy link
Contributor

jreback commented Jun 1, 2019

@jorisvandenbossche are you still -1 on this?

@topper-123 topper-123 force-pushed the deprecate_iteritems branch 2 times, most recently from 9657a34 to 136c6c0 Compare June 4, 2019 01:05
@WillAyd
Copy link
Member

WillAyd commented Jul 1, 2019

@topper-123 can you merge master?

@jorisvandenbossche still objecting to this?

@jreback jreback removed this from the 0.25.0 milestone Jul 2, 2019
@topper-123 topper-123 force-pushed the deprecate_iteritems branch from 136c6c0 to ec3e1a2 Compare July 2, 2019 23:24
@jreback
Copy link
Contributor

jreback commented Jul 3, 2019

@TomAugspurger what do you think about this? I am +0 on this, @jorisvandenbossche ?

@TomAugspurger
Copy link
Contributor

TomAugspurger commented Jul 3, 2019 via email

@jorisvandenbossche
Copy link
Member

Sorry for the slow reply here. I don't feel stronly about it, so you can count me as a -0 as well (or -0.2 :-))

Basically I feel still the way as I said above (#26114 (comment)), that it feels "strange" to deprecate the method that we have always documented in favor of something that was basically not mentioned in the docs.

(BTW, apart from actually deprecation, we could also already start with changing the docs)

@topper-123 topper-123 force-pushed the deprecate_iteritems branch from 4fec810 to f83f906 Compare July 3, 2019 17:00
@topper-123 topper-123 force-pushed the deprecate_iteritems branch 3 times, most recently from 0de0e84 to 556dcc0 Compare July 10, 2019 09:57
@topper-123 topper-123 force-pushed the deprecate_iteritems branch from 556dcc0 to 5307152 Compare July 10, 2019 12:18
@pandas-dev pandas-dev deleted a comment from pep8speaks Jul 10, 2019
@topper-123
Copy link
Contributor Author

I've taken out the proposed deprecation.

I would like to get the rest of the PR in though, so internally we use items rather than iteritems. If no one objects, I'll merge the PR in a few days (without deprecating iteritems).

@topper-123 topper-123 added this to the 0.25.0 milestone Jul 10, 2019
@topper-123 topper-123 changed the title DEPR: deprecate DataFrame.iteritems and Series.iteritems CLN: replace usage internally of .iteritems with .items Jul 10, 2019
@topper-123 topper-123 added Clean and removed API Design Deprecate Functionality to remove in pandas labels Jul 10, 2019
@TomAugspurger
Copy link
Contributor

TomAugspurger commented Jul 10, 2019 via email

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • do we need a note in whatsnew? (just saying this is now not recommended)?
  • can we add a code_check to prohitbit this?

Series.items
Series.iteritems
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

did you mean to add this back?

Copy link
Contributor Author

@topper-123 topper-123 Jul 10, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, as we’re not deprecating iteritems anymore?

EDIT: iteritems, not itertools....

@@ -826,6 +826,11 @@ def iteritems(self):
for i, k in enumerate(self.columns):
yield k, self._ixs(i, axis=1)

@Appender(items.__doc__)
def iteritems(self):
for key, value in self.items():
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you don't want to return self.items() here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I've changed that.

The issue was that items is a generator, while iteritems is a function, that returns a generator. So flake8 requires the items doc string to use "Yields" and the iteritems doc string to use "Returns".

This is a bit silly in either case, and it feels off to do stuff just to satisfy a linter.

@jreback
Copy link
Contributor

jreback commented Jul 10, 2019

cool thanks, ping on green.

@topper-123
Copy link
Contributor Author

topper-123 commented Jul 10, 2019

* do we need a note in whatsnew? (just saying this is now not recommended)?

* can we add a code_check to prohibit this?

I can look into these two in a follow-up.

@jreback jreback merged commit 472af55 into pandas-dev:master Jul 10, 2019
@jreback
Copy link
Contributor

jreback commented Jul 10, 2019

thanks @topper-123

@topper-123 topper-123 deleted the deprecate_iteritems branch July 10, 2019 19:32
Copy link
Member

@jorisvandenbossche jorisvandenbossche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you clarify in the iteritems docstring that it is an alias of items, and that items is preferred or something like that ?

if self.columns.is_unique and hasattr(self, "_item_cache"):
for k in self.columns:
yield k, self._get_item_cache(k)
else:
for i, k in enumerate(self.columns):
yield k, self._ixs(i, axis=1)

@Appender(_shared_docs["items"] % "Returns\n -------")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both return a generator, so why the different docstring here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's the "return " part of the method that makes flake8 think this is not a generator, so it won't allow a "Yields" section in the iteritems doc string.

I think flake8 is tecnically right, but it makes for uglier code in this case.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On second thought, I think maybe if the Appender is followed by a # noqa comment, we can make flake8 ignore that section. I'll try it tomorrow.

This is index for Series, columns for DataFrame and so on.
This is index for Series and columns for DataFrame.

Returns
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fr the other docstring you used Yields ?

"""
for h in self._info_axis:
yield h, self[h]

@Appender(items.__doc__)
def iteritems(self):
return self.items()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, I think both Series and DataFrame have their own implementation, so this one is nowhere used. So can rather be removed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll look into it,

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants