Skip to content

ENH: Implement __iter__ for Rolling and Expanding #34201

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 23 commits into from
May 17, 2020

Conversation

charlesdong1991
Copy link
Member

@charlesdong1991 charlesdong1991 commented May 15, 2020

@pep8speaks
Copy link

pep8speaks commented May 15, 2020

Hello @charlesdong1991! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2020-05-17 08:43:55 UTC

@mroeschke
Copy link
Member

Could you also test:

  • rolling with an offset: 1D
  • where on is specified

@mroeschke mroeschke added Enhancement Window rolling, ewma, expanding labels May 16, 2020
@charlesdong1991
Copy link
Member Author

@mroeschke thanks, I added those two cases of tests, and any further feedbacks are very welcome!

@charlesdong1991 charlesdong1991 requested a review from mroeschke May 16, 2020 09:20
window_size = len(start)
for i in range(window_size):
result = obj.iloc[slice(start[i], end[i])]
if result.count().min() >= iter_threshold:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO since min_periods is supposed to influence the the aggregation result, and we're just returning the window here, I think we should always return the result and not filter results based on iter_threshold

Copy link
Member Author

@charlesdong1991 charlesdong1991 May 16, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

emm, i am not sure about this actually, because df.rolling(window=2, min_periods=3).sum() for instance is not allowed, and will raise a ValueError because min_periods has to be equal or smaller than window in aggregation, that is why i think maybe it makes a bit more sense to have the minimum here between window and min_periods.

but I also do not know if this matters in iter? maybe raise an error here, does it sound more reasonable?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right I am thinking it doesn't matter in __iter__ because the aggregation hasn't happened yet (sum), and it's up to the user to decide what they want to do with each window.

From the user perspective, I can see a potential source of confusion if not all the windows are returned.

Copy link
Member Author

@charlesdong1991 charlesdong1991 May 16, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, leaving it to users sounds more convincing!

thanks, @mroeschke ! i have updated the PR to remove this check (also no error raising) here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, thanks! Might want to document this behavior in computation.rst

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure, added! thanks for the reviews! @mroeschke

@charlesdong1991 charlesdong1991 requested a review from mroeschke May 16, 2020 20:14

.. versionadded:: 1.1.0

It is allowed to iterate over the Series and DataFrame with the ``rolling`` and ``expanding`` window are
Copy link
Member

@mroeschke mroeschke May 16, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe just:

``Rolling`` and ``Expanding`` objects now support iteration.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shortened

It is allowed to iterate over the Series and DataFrame with the ``rolling`` and ``expanding`` window are
support.

Be noted that for ``rolling``, even if ``min_periods`` is greater than ``window`` size , we can still iterate
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this sentence can just note that min_periods is ignored. Aggregation won't raise an error since we're just returning DataFrame or Series objects.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks, rephrased

Be noted that for ``rolling``, even if ``min_periods`` is greater than ``window`` size , we can still iterate
all the result out, although it will raise an error during aggregation.

.. ipython:: python
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Think showing the dataframe case is enough. Also think we should print(i) in the iteration

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

okay, removed series case, and also slightly change the ipython block to get web/doc check green.

@charlesdong1991 charlesdong1991 requested a review from mroeschke May 17, 2020 09:27
@jreback jreback added this to the 1.1 milestone May 17, 2020
@jreback jreback merged commit 83530bd into pandas-dev:master May 17, 2020
@jreback
Copy link
Contributor

jreback commented May 17, 2020

thanks @charlesdong1991 very nice!

@charlesdong1991 charlesdong1991 deleted the iter_window branch May 19, 2020 08:10
@banderlog
Copy link

In which pandas version it will appear?

The pandas-1.04 still shows NotImplementedError

@mroeschke
Copy link
Member

1.1 which should come out in a few months

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Enhancement Window rolling, ewma, expanding
Projects
None yet
Development

Successfully merging this pull request may close these issues.

ENH: implement __iter__ for window objects
5 participants