DOC: Add example for pandas.DataFrame.rolling() with on #50139

Merged
merged 5 commits into from
Dec 14, 2022
Changes from 3 commits
23 changes: 23 additions & 0 deletions pandas/core/window/rolling.py
@@ -1110,6 +1110,29 @@ class Window(BaseWindow):
2 2.958621
3 NaN
4 NaN

**on**

Rolling sum with a window length of 2 on a specific column.
Member

2 days?


>>> df = pd.DataFrame({'A': [1, 3, 5, np.nan, 7],
...                    'B': [2, 4, np.nan, 6, 8]})

>>> df
     A    B
0  1.0  2.0
1  3.0  4.0
2  5.0  NaN
3  NaN  6.0
4  7.0  8.0

>>> df.rolling(2, on='A').sum()
Member

wouldn't this give the same result even without on='A'? might be more illustrative to have an example in which on= makes a difference

Contributor Author

Without on='A', the result will be like this:

>>> df.rolling(2).sum()
     A     B
0  NaN   NaN
1  4.0   6.0
2  8.0   NaN
3  NaN   NaN
4  NaN  14.0

Should I add this result for comparison?
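
For reference, the two calls can be compared directly. This is a minimal sketch, assuming a pandas version in which an integer window accepts a non-datetime `on` column (as the docstring example under review does); the variable names `without_on` and `with_on` are illustrative only. It suggests that with an integer window, `on='A'` leaves the values of 'B' unchanged and only excludes 'A' from the aggregation.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [1, 3, 5, np.nan, 7],
                   "B": [2, 4, np.nan, 6, 8]})

# Default: every column is aggregated over a window of 2 rows.
without_on = df.rolling(2).sum()

# With on='A': column 'A' is not aggregated and is carried through as-is.
with_on = df.rolling(2, on="A").sum()

# 'B' is identical in both results (NaNs in the same positions).
same_b = without_on["B"].equals(with_on["B"])

# 'A' in the second result matches the original column.
a_untouched = with_on["A"].equals(df["A"])

print(same_b, a_untouched)
```

So for a fixed-size integer window the difference shows up only in the `on` column itself, which is why a time-based window makes a more illustrative example.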

Member

@MarcoGorelli Dec 13, 2022

I was thinking more something like

In [62]: df = pd.DataFrame({
    ...:     'A': pd.to_datetime(['2020-01-01', '2020-01-01', '2020-01-02']),
    ...:     'B': [1, 2, 3],
    ...:     },
    ...:     index=pd.date_range('2020', periods=3))

Here, with rolling('D'), the values of 'B' differ depending on whether you pass on='A' or rely on the default, which rolls over the index.

Contributor Author

Oh, I see what that means. How about the following:

>>> df = pd.DataFrame({
...     'A': [pd.to_datetime('2020-01-01'),
...           pd.to_datetime('2020-01-01'),
...           pd.to_datetime('2020-01-02'),],
...     'B': [1, 2, 3], },
...     index=pd.date_range('2020', periods=3))

>>> df
                    A  B
2020-01-01 2020-01-01  1
2020-01-02 2020-01-01  2
2020-01-03 2020-01-02  3

>>> df['B'].rolling('2D').sum()  # to avoid warning when sum on 'A'
2020-01-01    1.0
2020-01-02    3.0
2020-01-03    5.0
Freq: D, Name: B, dtype: float64

>>> df.rolling('2D', on='A').sum()   # value of 'B' is differ from above
                    A    B
2020-01-01 2020-01-01  1.0
2020-01-02 2020-01-01  3.0
2020-01-03 2020-01-02  6.0
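
The difference in the last row (5.0 versus 6.0) comes from which timestamps define the window. A rough sketch of the bookkeeping, assuming the default right-closed window `(t - 2D, t]` and that only rows up to the current position are considered; `manual_sum` is a hypothetical helper written for this illustration, not part of pandas:

```python
import pandas as pd

df = pd.DataFrame(
    {
        "A": pd.to_datetime(["2020-01-01", "2020-01-01", "2020-01-02"]),
        "B": [1, 2, 3],
    },
    index=pd.date_range("2020", periods=3),
)


def manual_sum(times, values, pos, window=pd.Timedelta("2D")):
    """Sum values in rows 0..pos whose timestamp lies in (t - window, t]."""
    end = times[pos]
    start = end - window
    total = 0.0
    for i in range(pos + 1):
        if start < times[i] <= end:
            total += values[i]
    return float(total)


values = list(df["B"])

# Window defined by the index: 2020-01-01, 2020-01-02, 2020-01-03.
by_index = [manual_sum(list(df.index), values, i) for i in range(len(df))]

# Window defined by column 'A': 2020-01-01, 2020-01-01, 2020-01-02.
by_a = [manual_sum(list(df["A"]), values, i) for i in range(len(df))]

print(by_index)
print(by_a)
```

For the last row, the index-based window is (2020-01-01, 2020-01-03], which excludes the first row, while the 'A'-based window is (2019-12-31, 2020-01-02], which covers all three rows.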

Member

@MarcoGorelli Dec 13, 2022

# value of 'B' differs from above

other than that, looks fine

Contributor Author

Thanks for the correction, my English skills are a little rusty.

     A     B
0  1.0   NaN
1  3.0   6.0
2  5.0   NaN
3  NaN   NaN
4  7.0  14.0
"""

_attributes = [