-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
ENH: Implement __iter__ for Rolling and Expanding #34201
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Hello @charlesdong1991! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found: There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻 Comment last updated at 2020-05-17 08:43:55 UTC |
Could you also test:
|
@mroeschke thanks, I added those two cases of tests, and any further feedbacks are very welcome! |
pandas/core/window/rolling.py
Outdated
window_size = len(start) | ||
for i in range(window_size): | ||
result = obj.iloc[slice(start[i], end[i])] | ||
if result.count().min() >= iter_threshold: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO since min_periods
is supposed to influence the the aggregation result, and we're just returning the window here, I think we should always return the result and not filter results based on iter_threshold
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
emm, i am not sure about this actually, because df.rolling(window=2, min_periods=3).sum()
for instance is not allowed, and will raise a ValueError
because min_periods has to be equal or smaller than window in aggregation, that is why i think maybe it makes a bit more sense to have the minimum here between window
and min_periods
.
but I also do not know if this matters in iter
? maybe raise an error here, does it sound more reasonable?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right I am thinking it doesn't matter in __iter__
because the aggregation hasn't happened yet (sum
), and it's up to the user to decide what they want to do with each window.
From the user perspective, I can see a potential source of confusion if not all the windows are returned.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, leaving it to users sounds more convincing!
thanks, @mroeschke ! i have updated the PR to remove this check (also no error raising) here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, thanks! Might want to document this behavior in computation.rst
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure, added! thanks for the reviews! @mroeschke
|
||
.. versionadded:: 1.1.0 | ||
|
||
It is allowed to iterate over the Series and DataFrame with the ``rolling`` and ``expanding`` window are |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe just:
``Rolling`` and ``Expanding`` objects now support iteration.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shortened
It is allowed to iterate over the Series and DataFrame with the ``rolling`` and ``expanding`` window are | ||
support. | ||
|
||
Be noted that for ``rolling``, even if ``min_periods`` is greater than ``window`` size , we can still iterate |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this sentence can just note that min_periods
is ignored. Aggregation won't raise an error since we're just returning DataFrame or Series objects.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks, rephrased
Be noted that for ``rolling``, even if ``min_periods`` is greater than ``window`` size , we can still iterate | ||
all the result out, although it will raise an error during aggregation. | ||
|
||
.. ipython:: python |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Think showing the dataframe case is enough. Also think we should print(i)
in the iteration
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
okay, removed series case, and also slightly change the ipython block to get web/doc check green.
thanks @charlesdong1991 very nice! |
In which pandas version it will appear? The |
1.1 which should come out in a few months |
black pandas
git diff upstream/master -u -- "*.py" | flake8 --diff