Make itertuples really an iterator/generator in implementation, not just return type #20783
Comments
looks like a generator to me
|
That's just because you return a
It looks like iterator because of |
You can test the issue here by doing:

    import pandas

    d = pandas.DataFrame({'a': range(100000000)})
    for a in d.itertuples(index=False, name=None):
        print(a)

Do this in the Python interpreter. Note the time it takes to create |
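The delay described above can be measured rather than eyeballed. The following is a minimal sketch, not pandas code: `time_to_first_item`, `eager_rows` and `lazy_rows` are hypothetical names introduced here for illustration, and the two row sources are pure-Python stand-ins for "build everything first" versus "build on demand".

```python
import time


def time_to_first_item(make_iterable):
    """Return (first_item, seconds) for creating an iterable and pulling one item.

    `make_iterable` is a zero-argument callable, so that any up-front work
    done while creating the iterable is included in the measurement.
    """
    start = time.perf_counter()
    first = next(iter(make_iterable()))
    elapsed = time.perf_counter() - start
    return first, elapsed


def eager_rows(n):
    # Stand-in for the behavior reported in this issue (an assumption, not
    # pandas internals): every row is built before an iterator is returned.
    return iter([(i,) for i in range(n)])


def lazy_rows(n):
    # Stand-in for the requested behavior: each row is built only when asked for.
    return ((i,) for i in range(n))


first_eager, t_eager = time_to_first_item(lambda: eager_rows(200_000))
first_lazy, t_lazy = time_to_first_item(lambda: lazy_rows(200_000))
print(f"eager: first={first_eager} after {t_eager:.6f}s")
print(f"lazy:  first={first_lazy} after {t_lazy:.6f}s")
```

With pandas installed, the same helper could be pointed at the call from the comment above, for example `time_to_first_item(lambda: d.itertuples(index=False, name=None))`.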
So what you describe is an implementation detail, and if you want to fix it, please submit a PR. |
An implementation detail which blows up memory and performance? Anything can be made to look like an iterator. But if it does not really behave like an iterator, it is not an iterator. I think this is a bug. Please reopen this. Then I or somebody else can make a pull request. |
This goes even deeper. Iterating over a series also constructs a list internally:

    import pandas

    d = pandas.DataFrame({'a': range(100000000)})
    for a in d['a']:
        print(a)

This will also have a large delay before it starts sending results back. |
I agree that ideally the iteration would be lazier (although it is not really a priority issue for me), and since we would accept a PR to fix it, let's keep the issue open. |
@mitar you are welcome to submit a PR, however, this method follows exactly the pandas paradigm. We create a new copy of things then hand it back to you, here the handing back is an iterator. If you can optimize this, great, but you are fighting standard practice. Further you may slow things down by doing this in the common case. |
This would be a welcome fix if possible. @mitar a couple things to watch out for, which we just hit with
|
I made: #20796 |
Calling |
I think that the PR #20796 is ready to be reviewed. |
itertuples is not really an iterator/generator and constructs a copy of the whole DataFrame in memory. Ideally it would return just an iterator and construct each row as it is being iterated over.
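To make the requested behavior concrete, here is a minimal pure-Python sketch of row-by-row construction. It is an illustration only, not the pandas implementation and not the change made in #20796; `lazy_itertuples` is a hypothetical name, and a plain dict of column name to iterable stands in for a DataFrame.

```python
def lazy_itertuples(columns):
    """Return an iterator of row tuples from a mapping of column name -> iterable.

    zip pulls one value from each column per step, so a row only exists
    once the caller asks for it and no copy of the data is made up front.
    """
    return zip(*columns.values())


# range objects are lazy themselves, so nothing of size 100000000 is
# ever materialized in memory here.
rows = lazy_itertuples({
    "a": range(100000000),
    "b": range(100000000),
})

first = next(rows)  # available immediately
print(first)
```

The design choice is simply to hand back an iterator over the columns instead of an iterator over a prebuilt list of rows, so memory use stays constant and the first row arrives without delay.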