Skip to content

BUG: ewma() weights incorrect when some values are missing #7603

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from

Conversation

seth-p
Copy link
Contributor

@seth-p seth-p commented Jun 28, 2014

Closes #7543.

@jreback
Copy link
Contributor

jreback commented Jun 29, 2014

can u put the example here before and after

@seth-p
Copy link
Contributor Author

seth-p commented Jun 29, 2014

Here's "before". Note that the final value in the result (the weighted average of 1 and 100) is the same in [2] and [3] and the same in [4] and [5], i.e. the None in the middle does not affect the calculation.

In [1]: from pandas import Series, ewma

In [2]: ewma(Series([1., 100.]), com=2.5, adjust=True)
Out[2]:
0     1.00
1    58.75
dtype: float64

In [3]: ewma(Series([1., None, 100.]), com=2.5, adjust=True)
Out[3]:
0     1.00
1     1.00
2    58.75
dtype: float64

In [4]: ewma(Series([1., 100.]), com=2.5, adjust=False)
Out[4]:
0     1.000000
1    29.285714
dtype: float64

In [5]: ewma(Series([1., None, 100.]), com=2.5, adjust=False)
Out[5]:
0     1.000000
1     1.000000
2    29.285714
dtype: float64

Here's the comparable "after". Note that now the weighted average of 1 and 100 in [3] differs from [2], and that that in [5] differs from [4]. (Note also that in the absence of missing values, the new values are identical to the old, e.g. [2] and [4].)

In [1]: from pandas import Series, ewma

In [2]: ewma(Series([1., 100.]), com=2.5, adjust=True)
Out[2]:
0     1.00
1    58.75
dtype: float64

In [3]: ewma(Series([1., None, 100.]), com=2.5, adjust=True)
Out[3]:
0     1.000000
1     1.000000
2    66.554054
dtype: float64

In [4]: ewma(Series([1., 100.]), com=2.5, adjust=False)
Out[4]:
0     1.000000
1    29.285714
dtype: float64

In [5]: ewma(Series([1., None, 100.]), com=2.5, adjust=False)
Out[5]:
0     1.000000
1     1.000000
2    36.538462
dtype: float64

Finally, the following code shows the weights used by the "old" and "new" methods to calculate the weighted average of 1 and 100 in the presence of a missing value in between them, i.e. [3] and [5] above. First, in [11] and [12] I reproduce the old calculations (which simply ignore the missing value). Then in [13] and [14] I reproduce the new calculations. Note that the only difference is in whether the weight of the two-periods-ago value is (1.-alpha) (old method) or (1.-alpha)**2 (new method).

In [9]: def fake_ewma_using_weights(s, w):
   ...:         return (s.multiply(w).cumsum() / w.cumsum()).fillna(method='ffill')
   ...:

In [10]: alpha = 1. / (2.5 + 1.)

In [11]: fake_ewma_using_weights(Series([1., None, 100.]), Series([(1.-alpha), None, 1.])) # adjust=True, old method
Out[11]:
0     1.00
1     1.00
2    58.75
dtype: float64

In [12]: fake_ewma_using_weights(Series([1., None, 100.]), Series([(1.-alpha), None, alpha])) # adjust=False, old method
Out[12]:
0     1.000000
1     1.000000
2    29.285714
dtype: float64

In [13]: fake_ewma_using_weights(Series([1., None, 100.]), Series([(1.-alpha)**2, None, 1.])) # adjust=True, new method
Out[13]:
0     1.000000
1     1.000000
2    66.554054
dtype: float64

In [14]: fake_ewma_using_weights(Series([1., None, 100.]), Series([(1.-alpha)**2, None, alpha])) # adjust=False, new method
Out[14]:
0     1.000000
1     1.000000
2    36.538462
dtype: float64

@jreback jreback added this to the 0.15.0 milestone Jul 6, 2014
@jreback
Copy link
Contributor

jreback commented Jul 21, 2014

@seth-p this is technically an API change, yes?

@seth-p
Copy link
Contributor Author

seth-p commented Jul 21, 2014

Yes, I guess this should be considered an API change.

On Jul 21, 2014, at 8:17 AM, jreback [email protected] wrote:

@seth-p this is technically an API change, yes?


Reply to this email directly or view it on GitHub.

@jreback
Copy link
Contributor

jreback commented Jul 21, 2014

ok, can you add a release note (API section), with an example showing what is changing? thxs

@seth-p
Copy link
Contributor Author

seth-p commented Jul 21, 2014

Will do.

On Jul 21, 2014, at 9:15 AM, jreback [email protected] wrote:

ok, can you add a release note (API section), with an example showing what is changing? thxs


Reply to this email directly or view it on GitHub.

@seth-p
Copy link
Contributor Author

seth-p commented Jul 23, 2014

I've updated v0.15.0.txt to reflect this API change. Apologies if it doesn't build properly, as I can't build the docs (using Python 3.4 on Windows).

I also fixed the the description of #7766, which appeared to have been truncated.

@jreback
Copy link
Contributor

jreback commented Jul 23, 2014

no you don't need to do anythjng like that

that's the point of the ipython block

it runs the code and generates the output
the only reason I has to do it with the other one it's it's a code-block which is basically verbatim pasting

reverse all that out pls

@seth-p
Copy link
Contributor Author

seth-p commented Jul 23, 2014

Oh..... I see. OK, will fix.

@seth-p
Copy link
Contributor Author

seth-p commented Jul 23, 2014

OK, I reversed that. I hope I got it more or less right...

@jreback
Copy link
Contributor

jreback commented Jul 23, 2014

I am thinking that this should be a keyword to control this behavior (mainly to provide backward compat
)

maybe dropna=True (meaning use the new weighting scheme)
and False use the old

how hard to allow this?

@seth-p
Copy link
Contributor Author

seth-p commented Jul 23, 2014

Hmm. Probably not too hard, though I would think (a) dropna=True would be the old behavior, and (b) the default should be dropna=False (new behavior). [Though maybe "ignorena" is more descriptive than "dropna", as "dropna" suggests (at least to me) that what's being done is equivalent to ewma(s.dropna()), which isn't quite the case.]

Personally I think the old behavior is just plain wrong, but I can live with it if you want it to be the default (and maybe we can change the default in the future...).

BTW, is there a forum for discussing such things? I've seen discussions like this in the comments of issues, but I can't imagine that has a wide audience. And my post to https://groups.google.com/forum/#!topic/pydata/woxbB8na3C8 didn't elicit too much feedback...

@jreback
Copy link
Contributor

jreback commented Jul 23, 2014

I find it easier to discuss here
granted the audience is not necessarily that wide - if it's a big enough deal can send to the mailing list

I am ok with having this new behavior the default
just want to have a way to have compat if needed

ignore_na is ok for an argument

@seth-p
Copy link
Contributor Author

seth-p commented Jul 23, 2014

OK, will try to add ignore_na=False to all the ewm*() functions.

@seth-p
Copy link
Contributor Author

seth-p commented Jul 24, 2014

I added ignore_na=False to all ewm*() functions. Let me know how this looks.

@seth-p
Copy link
Contributor Author

seth-p commented Jul 24, 2014

One of the Travis builds failed, but I can't say I understand why: https://travis-ci.org/pydata/pandas/jobs/30710499.
The others all passed: https://travis-ci.org/pydata/pandas/builds/30710494.


Prior to 0.15.0

.. code-block:: python
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is block is not necessary (as the ignore_na arg shows what you are doing (you can put a comment their),

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Deleted this block.

@jreback
Copy link
Contributor

jreback commented Jul 24, 2014

looks good...ping when green

@seth-p
Copy link
Contributor Author

seth-p commented Jul 24, 2014

All is well — The Travis CI build passed

@jreback
Copy link
Contributor

jreback commented Jul 24, 2014

merged via 24b309f

thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
API Design Numeric Operations Arithmetic, Comparison, and Logical operations
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BUG: ewma() doesn't adjust weights properly for missing values
2 participants