BUG: ewm*() functions interpret min_periods off by one? #7884

Closed
seth-p opened this issue Jul 31, 2014 · 3 comments · Fixed by #7898
Labels
API Design Numeric Operations Arithmetic, Comparison, and Logical operations
seth-p commented Jul 31, 2014

In the examples below with min_periods=2, expanding_mean and expanding_std start returning values once there are two observations, but ewma and ewmstd do so only once there are three. Is this intentional? It looks like a bug to me.

In [1]: from pandas import Series, expanding_mean, expanding_std, ewma, ewmstd, show_versions

In [2]: s = Series(range(4))

In [3]: expanding_mean(s, min_periods=2)
Out[3]:
0    NaN
1    0.5
2    1.0
3    1.5
dtype: float64

In [4]: expanding_std(s, min_periods=2)
Out[4]:
0         NaN
1    0.707107
2    1.000000
3    1.290994
dtype: float64

In [5]: ewma(s, min_periods=2, halflife=3.)
Out[5]:
0         NaN
1         NaN
2    1.152678
3    1.784530
dtype: float64

In [6]: ewmstd(s, min_periods=2, halflife=3.)
Out[6]:
0         NaN
1         NaN
2    0.856493
3    1.162100
dtype: float64

In [7]: show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.4.1.final.0
python-bits: 64
OS: Windows
OS-release: 7
machine: AMD64
processor: Intel64 Family 6 Model 58 Stepping 9, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None

pandas: 0.14.1
nose: 1.3.3
Cython: 0.20.2
numpy: 1.9.0b1
scipy: 0.14.0
statsmodels: 0.5.0
IPython: 2.1.0
sphinx: 1.2.2
patsy: 0.3.0
scikits.timeseries: None
dateutil: 2.2
pytz: 2014.4
bottleneck: 0.8.0
tables: 3.1.1
numexpr: 2.4
matplotlib: 1.3.1
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
httplib2: None
apiclient: None
rpy2: None
sqlalchemy: 0.9.7
pymysql: None
psycopg2: None
@seth-p seth-p changed the title BUG: ewm*() functions interpret min_periods of by one? BUG: ewm*() functions interpret min_periods off by one? Jul 31, 2014
seth-p commented Jul 31, 2014

In particular, these lines in ewma() look fishy to me:

        first_index = _first_valid_index(v)
        result[first_index: first_index + min_periods] = NaN

I would have expected something like:

        if min_periods > 0:
            first_index = _first_valid_index(v)
            result[first_index: first_index + min_periods - 1] = NaN
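
The difference between the two slice bounds can be sketched in plain Python. This is only an illustration, not the pandas source: `mask_leading` and its `off_by_one_fix` flag are hypothetical names, and a list comprehension stands in for `_first_valid_index`.

```python
import math

def mask_leading(values, min_periods, off_by_one_fix):
    """Hypothetical helper: blank out leading results after the first valid input."""
    result = list(values)
    # Position of the first non-NaN input; stands in for _first_valid_index(v).
    first_index = next(
        (i for i, x in enumerate(values) if not math.isnan(x)), len(values)
    )
    if off_by_one_fix:
        # Proposed bound: mask min_periods - 1 entries (none when min_periods is 0).
        stop = first_index + max(min_periods - 1, 0)
    else:
        # Bound quoted from ewma(): masks min_periods entries.
        stop = first_index + min_periods
    for i in range(first_index, min(stop, len(result))):
        result[i] = math.nan
    return result

values = [0.0, 1.0, 2.0, 3.0]
current = mask_leading(values, 2, off_by_one_fix=False)
proposed = mask_leading(values, 2, off_by_one_fix=True)
nan_count_current = sum(math.isnan(x) for x in current)
nan_count_proposed = sum(math.isnan(x) for x in proposed)
```

With min_periods=2, the quoted bound masks the first two entries, so the first value appears at the third observation; the proposed bound masks only the first entry, which matches what expanding_mean does above.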

seth-p commented Jul 31, 2014

Note that the meaning of min_periods is inconsistent between, say, ewma and rolling_mean. In ewma it affects only the beginning of the series and ignores whether values are missing; in rolling_mean it refers to the number of non-null observations. This is documented as such, but it seems confusing.

For example,

In [20]: ewma(Series([5,None,None,None,None,5]), halflife=3., min_periods=3)
Out[20]:
0   NaN
1   NaN
2   NaN
3     5
4     5
5     5

returns 5 for the final 3 entries, even though there are not three non-null values in the whole series.
Again, this is documented as such, but I would have thought that the meaning of min_periods in ewma (and all the other ewm* functions) should be the same as in the rolling_* functions.
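
To make the two interpretations concrete, here is a rough plain-Python sketch. It is not how pandas computes ewma; `ew_mean`, `mask_by_position`, and `mask_by_count` are hypothetical helpers, and the weighting scheme is an assumption chosen only so that the example runs on its own.

```python
import math

NAN = math.nan

def ew_mean(values, halflife):
    """Exponentially weighted mean that skips NaN inputs (weights decay by position)."""
    decay = 0.5 ** (1.0 / halflife)
    num = den = 0.0
    out = []
    for x in values:
        num *= decay
        den *= decay
        if not math.isnan(x):
            num += x
            den += 1.0
        out.append(num / den if den > 0 else NAN)
    return out

def mask_by_position(values, result, min_periods):
    """Position-based rule: NaN for min_periods entries starting at the first valid input."""
    first = next((i for i, x in enumerate(values) if not math.isnan(x)), len(values))
    return [NAN if first <= i < first + min_periods else r
            for i, r in enumerate(result)]

def mask_by_count(values, result, min_periods):
    """Count-based rule: NaN until min_periods non-null inputs have been seen."""
    out, seen = [], 0
    for x, r in zip(values, result):
        if not math.isnan(x):
            seen += 1
        out.append(r if seen >= min_periods else NAN)
    return out

values = [5.0, NAN, NAN, NAN, NAN, 5.0]
raw = ew_mean(values, halflife=3.0)
by_position = mask_by_position(values, raw, min_periods=3)
by_count = mask_by_count(values, raw, min_periods=3)
```

Under the position-based rule the last three entries come out as 5, as in the output above; under the count-based rule every entry is NaN, because the series never accumulates three non-null values.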

seth-p commented Aug 1, 2014

Also, I just noticed that ewmvar, ewmstd, ewmvol, ewmcov, rolling_var, and rolling_std return 0.0 for a single value (assuming min_periods=0), whereas Series.std, Series.var, ewmcorr, expanding_cov, expanding_corr, expanding_std, expanding_vol, expanding_var, rolling_cov, and rolling_corr all return NaN for a single value. Yikes. I think all of these should return NaN for a single value.

I created #7900 to address this.
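
For illustration only, the two possible answers for a single value can be reproduced with a plain-Python variance. This is a sketch, not the pandas code path: `variance` is a hypothetical helper, and tying the 0.0 versus NaN split to the choice of divisor is an assumption rather than something established in this thread.

```python
import math

def variance(values, ddof):
    """Variance with divisor n - ddof; NaN when that divisor is not positive."""
    n = len(values)
    if n - ddof <= 0:
        return math.nan
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - ddof)

single = [3.0]
biased = variance(single, ddof=0)    # divisor n = 1
unbiased = variance(single, ddof=1)  # divisor n - 1 = 0, so undefined
```

Here `biased` is 0.0 and `unbiased` is NaN. The sample (n - 1) estimate is undefined for one observation, which is the argument for returning NaN consistently.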
