BUG: Series.rolling min_period is ignored and NA behaves strangely #26996

Closed
ghost opened this issue Jun 22, 2019 · 1 comment · Fixed by #30923

ghost commented Jun 22, 2019

Using current d39a6de

This may be two separate issues, I can't tell.

Case 1.

import numpy as np
import pandas as pd

# Ten values, the first five missing
ser = pd.Series(range(10), dtype="float64")
ser.iloc[:5] = np.nan
print(ser)
print(ser.rolling(3, min_periods=3).count())

returns

0    0.0
1    0.0
2    0.0
3    0.0
4    0.0
5    1.0
6    2.0
7    3.0
8    3.0
9    3.0
dtype: float64

I expected the first few results to be NA. Perhaps related:

ser.rolling(3, min_periods=999).count()

doesn't raise an exception, even though min_periods exceeds the window size.
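To make the expectation in Case 1 concrete, here is a minimal sketch of one possible reading of it: count the non-NA values in each window, but only report a count once the window holds min_periods rows, and reject a min_periods larger than the window. The helper `count_non_na` is hypothetical and written only for illustration; it is not pandas' implementation, and other readings of min_periods for count are possible.

```python
import numpy as np
import pandas as pd


def count_non_na(ser, window, min_periods):
    """Hypothetical helper (not a pandas API): non-NA count per fixed window.

    Assumption: min_periods refers to the number of rows in the window,
    so windows that are not yet "full enough" produce NaN.
    """
    if min_periods > window:
        raise ValueError(
            f"min_periods {min_periods} must be <= window {window}"
        )
    # 1.0 where a value is present, 0.0 where it is missing
    mask = ser.notna().astype("float64")
    # The mask itself has no NaN, so min_periods only masks short windows
    return mask.rolling(window, min_periods=min_periods).sum()


ser = pd.Series(range(10), dtype="float64")
ser.iloc[:5] = np.nan

expected = count_non_na(ser, window=3, min_periods=3)
print(expected)
```

Under this reading, only the first two rows are NA (their windows hold fewer than three rows); rows 2 to 4 report 0.0 because their windows are full but contain no non-NA values.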

Case 2.

For a Series with a DatetimeIndex and an offset-based rolling window, the behavior is also strange, but in a different way.

import numpy as np
import pandas as pd

# Six consecutive days, two rows per day
dates = ["2001-01-01"]*2 + ["2001-01-02"]*2 + ["2001-01-03"]*2 + ["2001-01-04"]*2 + ["2001-01-05"]*2 + ["2001-01-06"]*2
# Start from an all-NaN float Series, then set three values by position
ser = pd.Series(np.nan, index=pd.DatetimeIndex(dates), dtype="float64")
ser.iloc[0] = 111
ser.iloc[2] = 222
ser.iloc[6] = 333
res = ser.rolling("2D", min_periods=1).count()
df = pd.DataFrame(dict(data=ser, count=res))
print(df)

result

             data  count
2001-01-01  111.0    1.0  # OK. The day has one non-NA value,
2001-01-01    NaN    1.0  # and min_periods is satisfied
2001-01-02  222.0    2.0  # OK
2001-01-02    NaN    2.0  # OK
2001-01-03    NaN    1.0  # why is this 1.0? this date has no values in it.
2001-01-03    NaN    1.0  # again
2001-01-04  333.0    1.0  # OK
2001-01-04    NaN    1.0  # OK
2001-01-05    NaN    1.0  # why is this 1.0? this date has no values in it.
2001-01-05    NaN    1.0  # again
2001-01-06    NaN    NaN  # why is this NaN? min_periods is satisfied
2001-01-06    NaN    NaN  # and the non-NA count should be 0.
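As a cross-check on the annotations above, the non-NA count per window can be computed by brute force. This is only a sketch: it assumes the window for a row at time t is the right-closed interval (t - 2D, t], and the helper `brute_force_count` is hypothetical, written for illustration rather than taken from pandas.

```python
import numpy as np
import pandas as pd

# Same data as Case 2: six consecutive days, two rows per day
dates = pd.date_range("2001-01-01", periods=6, freq="D").repeat(2)
ser = pd.Series(np.nan, index=dates, dtype="float64")
ser.iloc[0] = 111
ser.iloc[2] = 222
ser.iloc[6] = 333


def brute_force_count(ser, offset):
    """Hypothetical helper: non-NA count over the window (t - offset, t]."""
    width = pd.Timedelta(offset)
    counts = []
    for t in ser.index:
        # Assumption: every row whose timestamp falls in (t - width, t]
        in_window = (ser.index > t - width) & (ser.index <= t)
        counts.append(int(ser[in_window].notna().sum()))
    return pd.Series(counts, index=ser.index)


manual = brute_force_count(ser, "2D")
print(pd.DataFrame(dict(data=ser, manual_count=manual)))
```

Under that assumption, the 1.0 on 2001-01-03 and 2001-01-05 would be explained by the previous day's value still falling inside the two-day window, so the rows that remain in question are the last two, where the brute-force count is 0 rather than NaN.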

Encountered in the course of #26959.

WillAyd commented Jun 23, 2019

Generally, it looks like count may be returning 0 instead of NA for non-frequency data. It should return NA in these cases, as you mention, so we would certainly take a PR if you see a way to make it work.

@WillAyd WillAyd added Bug Window rolling, ewma, expanding labels Jun 23, 2019
@WillAyd WillAyd added this to the Contributions Welcome milestone Jun 23, 2019
@jreback jreback modified the milestones: Contributions Welcome, 1.0.0 Jan 20, 2020