
Commit 3da1f79

Merge pull request #7926 from seth-p/ewmvar_bias_correction
API/BUG/ENH: ewmvar/cov debiasing factors; add 'adjust' to ewmvar/std/vol/cov/corr; ewm*() min_periods
2 parents e52efb6 + f82f396 commit 3da1f79


5 files changed (+718, -152 lines changed)


doc/source/computation.rst

+94-36
@@ -413,6 +413,8 @@ columns using ``ix`` indexing:
413413
@savefig rolling_corr_pairwise_ex.png
414414
correls.ix[:, 'A', 'C'].plot()
415415
416+
.. _stats.moments.expanding:
417+
416418
Expanding window moment functions
417419
---------------------------------
418420
A common alternative to rolling statistics is to use an *expanding* window,
@@ -485,60 +487,79 @@ relative impact of an individual data point. As an example, here is the
485487
@savefig expanding_mean_frame.png
486488
expanding_mean(ts).plot(style='k')
487489
490+
.. _stats.moments.exponentially_weighted:
491+
488492
Exponentially weighted moment functions
489493
---------------------------------------
490494

491-
A related set of functions are exponentially weighted versions of many of the
492-
above statistics. A number of EW (exponentially weighted) functions are
493-
provided using the blending method. For example, where :math:`y_t` is the
494-
result and :math:`x_t` the input, we compute an exponentially weighted moving
495-
average as
495+
A related set of functions consists of exponentially weighted versions of several of
496+
the above statistics. A number of expanding EW (exponentially weighted)
497+
functions are provided:
498+
499+
.. csv-table::
500+
:header: "Function", "Description"
501+
:widths: 20, 80
502+
503+
``ewma``, EW moving average
504+
``ewmvar``, EW moving variance
505+
``ewmstd``, EW moving standard deviation
506+
``ewmcorr``, EW moving correlation
507+
``ewmcov``, EW moving covariance
508+
509+
In general, a weighted moving average is calculated as
496510

497511
.. math::
498512
499-
y_t = (1 - \alpha) y_{t-1} + \alpha x_t
513+
y_t = \frac{\sum_{i=0}^t w_i x_{t-i}}{\sum_{i=0}^t w_i},
500514
501-
One must have :math:`0 < \alpha \leq 1`, but rather than pass :math:`\alpha`
502-
directly, it's easier to think about either the **span**, **center of mass
503-
(com)** or **halflife** of an EW moment:
515+
where :math:`x_t` is the input and :math:`y_t` is the result.
516+
517+
The EW functions support two variants of exponential weights:
518+
The default, ``adjust=True``, uses the weights :math:`w_i = (1 - \alpha)^i`.
519+
When ``adjust=False`` is specified, moving averages are calculated as
504520

505521
.. math::
506522
507-
\alpha =
508-
\begin{cases}
509-
\frac{2}{s + 1}, s = \text{span}\\
510-
\frac{1}{1 + c}, c = \text{center of mass}\\
511-
1 - \exp^{\frac{\log 0.5}{h}}, h = \text{half life}
523+
y_0 &= x_0 \\
524+
y_t &= (1 - \alpha) y_{t-1} + \alpha x_t,
525+
526+
which is equivalent to using weights
527+
528+
.. math::
529+
530+
w_i = \begin{cases}
531+
\alpha (1 - \alpha)^i & \text{if } i < t \\
532+
(1 - \alpha)^i & \text{if } i = t.
512533
\end{cases}
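To make the two weighting schemes concrete, here is a minimal pure-Python sketch. It is not the pandas implementation and assumes a plain list of floats with no missing values; the function names ``ewma_adjusted``, ``ewma_recursive`` and ``ewma_recursive_as_weights`` are hypothetical. It checks that the ``adjust=False`` recursion agrees with its explicit-weight form, and shows that the two variants differ on a short history.

```python
def ewma_adjusted(xs, alpha):
    """adjust=True: y_t = sum_i w_i x_{t-i} / sum_i w_i, with w_i = (1 - alpha)**i."""
    out = []
    for t in range(len(xs)):
        weights = [(1 - alpha) ** i for i in range(t + 1)]
        num = sum(w * xs[t - i] for i, w in enumerate(weights))
        out.append(num / sum(weights))
    return out


def ewma_recursive(xs, alpha):
    """adjust=False: y_0 = x_0 and y_t = (1 - alpha) * y_{t-1} + alpha * x_t."""
    out = [xs[0]]
    for x in xs[1:]:
        out.append((1 - alpha) * out[-1] + alpha * x)
    return out


def ewma_recursive_as_weights(xs, alpha):
    """adjust=False via explicit weights: alpha*(1-alpha)**i for i < t, (1-alpha)**t for i = t."""
    out = []
    for t in range(len(xs)):
        weights = [alpha * (1 - alpha) ** i for i in range(t)] + [(1 - alpha) ** t]
        # These weights already sum to 1, so dividing by their sum is a no-op.
        out.append(sum(w * xs[t - i] for i, w in enumerate(weights)) / sum(weights))
    return out


xs = [1.0, 2.0, 0.0, 4.0]
alpha = 1.0 / 3.0  # equivalent to com = 2
print(ewma_adjusted(xs, alpha))   # second entry is (2 + 2/3) / (1 + 2/3) = 1.6
print(ewma_recursive(xs, alpha))  # second entry is (2/3) * 1 + (1/3) * 2 = 4/3
```

Both variants start at :math:`x_0`; they diverge afterwards because ``adjust=True`` renormalizes the finite set of weights at every step, while ``adjust=False`` gives the oldest observation the whole remaining weight :math:`(1 - \alpha)^t`.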
513534
514535
.. note::
515536

516-
the equation above is sometimes written in the form
537+
These equations are sometimes written in terms of :math:`\alpha' = 1 - \alpha`, e.g.
538+
539+
.. math::
517540
518-
.. math::
541+
y_t = \alpha' y_{t-1} + (1 - \alpha') x_t.
519542
520-
y_t = \alpha' y_{t-1} + (1 - \alpha') x_t
543+
One must have :math:`0 < \alpha \leq 1`, but rather than pass :math:`\alpha`
544+
directly, it's easier to think about either the **span**, **center of mass
545+
(com)** or **halflife** of an EW moment:
521546

522-
where :math:`\alpha' = 1 - \alpha`.
547+
.. math::
523548
524-
You can pass one of the three to these functions but not more. **Span**
549+
\alpha =
550+
\begin{cases}
551+
\frac{2}{s + 1}, & s = \text{span}\\
552+
\frac{1}{1 + c}, & c = \text{center of mass}\\
553+
1 - \exp\left(\frac{\log 0.5}{h}\right), & h = \text{half life}
554+
\end{cases}
555+
556+
Exactly one of the three must be passed to the EW functions. **Span**
525557
corresponds to what is commonly called a "20-day EW moving average" for
526558
example. **Center of mass** has a more physical interpretation: it is the average lag, in periods, of the exponential weights over an infinitely long history. For example,
527559
**span** = 20 corresponds to **com** = 9.5. **Halflife** is the period of
528-
time for the exponential weight to reduce to one half. Here is the list of
529-
functions available:
530-
531-
.. csv-table::
532-
:header: "Function", "Description"
533-
:widths: 20, 80
534-
535-
``ewma``, EW moving average
536-
``ewmvar``, EW moving variance
537-
``ewmstd``, EW moving standard deviation
538-
``ewmcorr``, EW moving correlation
539-
``ewmcov``, EW moving covariance
560+
time for the exponential weight to reduce to one half.
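The conversions above can be sketched in a few lines of plain Python. The helpers ``alpha_from`` and ``mean_lag`` are hypothetical (they are not part of pandas), and ``mean_lag`` truncates the infinite sum of normalized weights :math:`\alpha (1 - \alpha)^i` after a fixed number of terms. The sketch confirms that **span** = 20 and **com** = 9.5 give the same :math:`\alpha`, and that the center of mass equals the mean lag of the weights.

```python
import math


def alpha_from(span=None, com=None, halflife=None):
    """Return alpha for exactly one of span, com, or halflife."""
    if sum(p is not None for p in (span, com, halflife)) != 1:
        raise ValueError("pass exactly one of span, com, halflife")
    if span is not None:
        return 2.0 / (span + 1)
    if com is not None:
        return 1.0 / (1 + com)
    # Half-life h: after h periods the weight (1 - alpha)**h has decayed to 0.5.
    return 1 - math.exp(math.log(0.5) / halflife)


def mean_lag(alpha, n_terms=10_000):
    """Average lag of the normalized weights alpha*(1-alpha)**i, truncated after n_terms."""
    return sum(i * alpha * (1 - alpha) ** i for i in range(n_terms))


print(alpha_from(span=20), alpha_from(com=9.5))  # both equal 2/21
print(mean_lag(alpha_from(com=9.5)))            # approximately 9.5
```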
540561

541-
Here are an example for a univariate time series:
562+
Here is an example for a univariate time series:
542563

543564
.. ipython:: python
544565
@@ -548,8 +569,45 @@ Here are an example for a univariate time series:
548569
@savefig ewma_ex.png
549570
ewma(ts, span=20).plot(style='k')
550571
551-
.. note::
572+
All the EW functions have a ``min_periods`` argument, which has the same
573+
meaning as it does for all the ``expanding_*`` and ``rolling_*`` functions:
574+
no output values will be set until at least ``min_periods`` non-null values
575+
are encountered in the (expanding) window.
576+
(This is a change from versions prior to 0.15.0, in which the ``min_periods``
577+
argument affected only the ``min_periods`` consecutive entries starting at the
578+
first non-null value.)
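The difference between the two ``min_periods`` rules can be sketched by computing only which result entries are kept, ignoring the EW values themselves. This is an illustration under stated assumptions, not pandas code: ``None`` stands in for a null value, and the names ``keep_new`` and ``keep_old`` are hypothetical. The series is the one used in the v0.15.0 release notes.

```python
def keep_new(xs, min_periods):
    """0.15.0 rule: keep entry t once xs[:t+1] holds at least min_periods non-null values."""
    keep, count = [], 0
    for x in xs:
        if x is not None:  # None stands in for a null value
            count += 1
        keep.append(count >= min_periods)
    return keep


def keep_old(xs, min_periods):
    """Pre-0.15.0 rule: blank the min_periods entries starting at the first non-null value."""
    first = next((i for i, x in enumerate(xs) if x is not None), len(xs))
    return [i >= first + min_periods for i in range(len(xs))]


s = [1, None, None, None, 2, 3]
print(keep_old(s, 2))  # [False, False, True, True, True, True]
print(keep_new(s, 2))  # [False, False, False, False, True, True]
```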
579+
580+
All the EW functions also have an ``ignore_na`` argument, which determines how
581+
intermediate null values affect the calculation of the weights.
582+
When ``ignore_na=False`` (the default), weights are calculated based on absolute
583+
positions, so that intermediate null values affect the result.
584+
When ``ignore_na=True`` (which reproduces the behavior in versions prior to 0.15.0),
585+
weights are calculated by ignoring intermediate null values.
586+
For example, assuming ``adjust=True``, if ``ignore_na=False``, the weighted
587+
average of ``3, NaN, 5`` would be calculated as
588+
589+
.. math::
590+
591+
\frac{(1-\alpha)^2 \cdot 3 + 1 \cdot 5}{(1-\alpha)^2 + 1}
592+
593+
If instead ``ignore_na=True``, the weighted average would be calculated as
594+
595+
.. math::
596+
597+
\frac{(1-\alpha) \cdot 3 + 1 \cdot 5}{(1-\alpha) + 1}.
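The two formulas above follow from a single rule about how lags are counted, sketched here for the last position of a series with ``adjust=True`` weights. The function ``weighted_last`` is hypothetical, uses ``None`` for a missing value, and assumes at least one non-missing value; :math:`\alpha = 0.5` is chosen only to make the arithmetic easy to check.

```python
def weighted_last(xs, alpha, ignore_na):
    """adjust=True weighted average at the last position of xs; None marks a missing value.

    Assumes xs contains at least one non-missing value.
    """
    num = den = 0.0
    lag = 0
    for x in reversed(xs):
        if x is None:
            if not ignore_na:
                lag += 1  # ignore_na=False: a gap still counts as one period of decay
            continue
        w = (1 - alpha) ** lag
        num += w * x
        den += w
        lag += 1
    return num / den


alpha = 0.5
print(weighted_last([3.0, None, 5.0], alpha, ignore_na=False))  # (0.25*3 + 5) / 1.25, about 4.6
print(weighted_last([3.0, None, 5.0], alpha, ignore_na=True))   # (0.5*3 + 5) / 1.5, about 4.333
```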
598+
599+
The ``ewmvar``, ``ewmstd``, and ``ewmcov`` functions have a ``bias`` argument,
600+
specifying whether the result should contain biased or unbiased statistics.
601+
For example, if ``bias=True``, ``ewmvar(x)`` is calculated as
602+
``ewmvar(x) = ewma(x**2) - ewma(x)**2``;
603+
whereas if ``bias=False`` (the default), the biased variance statistics
604+
are scaled by debiasing factors
605+
606+
.. math::
607+
608+
\frac{\left(\sum_{i=0}^t w_i\right)^2}{\left(\sum_{i=0}^t w_i\right)^2 - \sum_{i=0}^t w_i^2}.
552609
553-
The EW functions perform a standard adjustment to the initial observations
554-
whereby if there are fewer observations than called for in the span, those
555-
observations are reweighted accordingly.
610+
(For :math:`w_i = 1`, this reduces to the usual :math:`N / (N - 1)` factor,
611+
with :math:`N = t + 1`.)
612+
See http://en.wikipedia.org/wiki/Weighted_arithmetic_mean#Weighted_sample_variance
613+
for further details.
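As a sketch of the ``bias`` argument, the function below computes the EW variance at the last position of a series with ``adjust=True`` weights and no missing values: first the biased statistic ``ewma(x**2) - ewma(x)**2``, then, for ``bias=False``, the debiasing factor above. The name ``ew_variance`` is hypothetical and this is not the pandas implementation. With :math:`\alpha = 0` all weights are 1, so the result reduces to the ordinary sample variance.

```python
def ew_variance(xs, alpha, bias=False):
    """EW variance at the last position of xs, with adjust=True weights w_i = (1 - alpha)**i."""
    n = len(xs)
    weights = [(1 - alpha) ** i for i in range(n)]  # w_i applies to xs[n - 1 - i]
    sw = sum(weights)
    mean = sum(w * xs[n - 1 - i] for i, w in enumerate(weights)) / sw
    mean_sq = sum(w * xs[n - 1 - i] ** 2 for i, w in enumerate(weights)) / sw
    biased = mean_sq - mean ** 2  # ewma(x**2) - ewma(x)**2
    if bias:
        return biased
    sw2 = sum(w * w for w in weights)
    if sw ** 2 == sw2:  # a single observation: the debiasing factor is undefined
        return float("nan")
    return biased * sw ** 2 / (sw ** 2 - sw2)


print(ew_variance([1.0, 2.0, 3.0, 4.0], 0.0))          # about 1.6667, the usual sample variance
print(ew_variance([1.0, 2.0], 1.0 / 3.0, bias=True))   # about 0.24
print(ew_variance([1.0, 2.0], 1.0 / 3.0, bias=False))  # about 0.24 * 25/12 = 0.5
```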

doc/source/v0.15.0.txt

+105-25
@@ -83,25 +83,8 @@ API changes
8383

8484
rolling_min(s, window=10, min_periods=5)
8585

86-
- :func:`ewma`, :func:`ewmstd`, :func:`ewmvol`, :func:`ewmvar`, :func:`ewmcorr`, and :func:`ewmcov`
87-
now have an optional ``ignore_na`` argument.
88-
When ``ignore_na=False`` (the default), missing values are taken into account in the weights calculation.
89-
When ``ignore_na=True`` (which reproduces the pre-0.15.0 behavior), missing values are ignored in the weights calculation.
90-
(:issue:`7543`)
91-
92-
.. ipython:: python
93-
94-
ewma(Series([None, 1., 100.]), com=2.5)
95-
ewma(Series([1., None, 100.]), com=2.5, ignore_na=True) # pre-0.15.0 behavior
96-
ewma(Series([1., None, 100.]), com=2.5, ignore_na=False) # default
97-
98-
- :func:`ewma`, :func:`ewmstd`, :func:`ewmvol`, :func:`ewmvar`, :func:`ewmcorr`, and :func:`ewmcov`
99-
now set to ``NaN`` the first ``min_periods-1`` entries of the result (for ``min_periods>1``).
100-
Previously the first ``min_periods`` entries of the result were set to ``NaN``.
101-
The new behavior accords with the existing documentation. (:issue:`7884`)
102-
10386
- :func:`rolling_max`, :func:`rolling_min`, :func:`rolling_sum`, :func:`rolling_mean`, :func:`rolling_median`,
104-
:func:`rolling_std`, :func:`rolling_var`, :func:`rolling_skew`, :func:`rolling_kurt`, and :func:`rolling_quantile`,
87+
:func:`rolling_std`, :func:`rolling_var`, :func:`rolling_skew`, :func:`rolling_kurt`, :func:`rolling_quantile`,
10588
:func:`rolling_cov`, :func:`rolling_corr`, :func:`rolling_corr_pairwise`,
10689
:func:`rolling_window`, and :func:`rolling_apply` with ``center=True`` previously would return a result of the same
10790
structure as the input ``arg`` with ``NaN`` in the final ``(window-1)/2`` entries.
@@ -112,27 +95,75 @@ API changes
11295

11396
.. code-block:: python
11497

115-
In [7]: rolling_sum(Series(range(5)), window=3, min_periods=0, center=True)
98+
In [7]: rolling_sum(Series(range(4)), window=3, min_periods=0, center=True)
11699
Out[7]:
117100
0 1
118101
1 3
119102
2 6
120-
3 9
121-
4 NaN
103+
3 NaN
122104
dtype: float64
123-
124-
New behavior (note final value is ``7 = sum([3, 4, NaN])``):
105+
106+
New behavior (note final value is ``5 = sum([2, 3, NaN])``):
125107

126108
.. ipython:: python
127109

128-
rolling_sum(Series(range(5)), window=3, min_periods=0, center=True)
110+
rolling_sum(Series(range(4)), window=3, min_periods=0, center=True)
129111

130112
- Removed ``center`` argument from :func:`expanding_max`, :func:`expanding_min`, :func:`expanding_sum`,
131113
:func:`expanding_mean`, :func:`expanding_median`, :func:`expanding_std`, :func:`expanding_var`,
132114
:func:`expanding_skew`, :func:`expanding_kurt`, :func:`expanding_quantile`, :func:`expanding_count`,
133115
:func:`expanding_cov`, :func:`expanding_corr`, :func:`expanding_corr_pairwise`, and :func:`expanding_apply`,
134116
as the results produced when ``center=True`` did not make much sense. (:issue:`7925`)
135117

118+
- :func:`ewma`, :func:`ewmstd`, :func:`ewmvol`, :func:`ewmvar`, :func:`ewmcov`, and :func:`ewmcorr`
119+
now interpret ``min_periods`` in the same manner that the ``rolling_*`` and ``expanding_*`` functions do:
120+
a given result entry will be ``NaN`` if the (expanding, in this case) window does not contain
121+
at least ``min_periods`` values. The previous behavior was to set to ``NaN`` the ``min_periods`` entries
122+
starting with the first non-``NaN`` value. (:issue:`7977`)
123+
124+
Prior behavior (note that values start at index ``2``, which is ``min_periods`` after index ``0``,
125+
the index of the first non-empty value):
126+
127+
.. ipython:: python
128+
129+
s = Series([1, None, None, None, 2, 3])
130+
131+
.. code-block:: python
132+
133+
In [51]: ewma(s, com=3., min_periods=2)
134+
Out[51]:
135+
0 NaN
136+
1 NaN
137+
2 1.000000
138+
3 1.000000
139+
4 1.571429
140+
5 2.189189
141+
dtype: float64
142+
143+
New behavior (note that values start at index ``4``, the location of the second non-empty value, since ``min_periods=2``):
144+
145+
.. ipython:: python
146+
147+
ewma(s, com=3., min_periods=2)
148+
149+
- :func:`ewmstd`, :func:`ewmvol`, :func:`ewmvar`, :func:`ewmcov`, and :func:`ewmcorr`
150+
now have an optional ``adjust`` argument, just like :func:`ewma` does,
151+
affecting how the weights are calculated.
152+
The default value of ``adjust`` is ``True``, which is backwards-compatible.
153+
See :ref:`Exponentially weighted moment functions <stats.moments.exponentially_weighted>` for details. (:issue:`7911`)
154+
155+
- :func:`ewma`, :func:`ewmstd`, :func:`ewmvol`, :func:`ewmvar`, :func:`ewmcov`, and :func:`ewmcorr`
156+
now have an optional ``ignore_na`` argument.
157+
When ``ignore_na=False`` (the default), missing values are taken into account in the weights calculation.
158+
When ``ignore_na=True`` (which reproduces the pre-0.15.0 behavior), missing values are ignored in the weights calculation.
159+
(:issue:`7543`)
160+
161+
.. ipython:: python
162+
163+
ewma(Series([None, 1., 8.]), com=2.)
164+
ewma(Series([1., None, 8.]), com=2., ignore_na=True) # pre-0.15.0 behavior
165+
ewma(Series([1., None, 8.]), com=2., ignore_na=False) # new default
166+
136167
- Bug in passing a ``DatetimeIndex`` with a timezone that was not being retained in DataFrame construction from a dict (:issue:`7822`)
137168

138169
In prior versions this would drop the timezone.
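For the ``center=True`` change to the ``rolling_*`` functions described above, the following plain-Python sketch contrasts the two output shapes. ``old_centered_rolling_sum`` shifts a trailing result back by half a window and pads with ``NaN``, which is one way to reproduce the old output shown above; it is not necessarily how pandas computed it. All three function names are hypothetical and an odd window is assumed.

```python
def trailing_rolling_sum(xs, window, min_periods):
    """Sum of xs[i-window+1 .. i]; NaN when fewer than min_periods values fall inside the series."""
    out = []
    for i in range(len(xs)):
        vals = xs[max(0, i - window + 1): i + 1]
        out.append(sum(vals) if len(vals) >= min_periods else float("nan"))
    return out


def centered_rolling_sum(xs, window, min_periods):
    """Window centered on i (odd window assumed); positions past either end count as missing."""
    half = (window - 1) // 2
    out = []
    for i in range(len(xs)):
        vals = xs[max(0, i - half): i + half + 1]
        out.append(sum(vals) if len(vals) >= min_periods else float("nan"))
    return out


def old_centered_rolling_sum(xs, window, min_periods):
    """Old output shape: shift the trailing result back by half a window, padding with NaN."""
    half = (window - 1) // 2
    return trailing_rolling_sum(xs, window, min_periods)[half:] + [float("nan")] * half


xs = list(range(4))
print(old_centered_rolling_sum(xs, 3, 0))  # [1, 3, 6, nan]
print(centered_rolling_sum(xs, 3, 0))      # [1, 3, 6, 5]
```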
@@ -583,12 +614,61 @@ Bug Fixes
583614
- Bug in ``DataFrame.plot`` with ``subplots=True`` may draw unnecessary minor xticks and yticks (:issue:`7801`)
584615
- Bug in ``StataReader`` which did not read variable labels in 117 files due to difference between Stata documentation and implementation (:issue:`7816`)
585616
- Bug in ``StataReader`` where strings were always converted to 244 characters-fixed width irrespective of underlying string size (:issue:`7858`)
586-
- Bug in ``expanding_cov``, ``expanding_corr``, ``rolling_cov``, ``rolling_cov``, ``ewmcov``, and ``ewmcorr``
617+
618+
- Bug in :func:`expanding_cov`, :func:`expanding_corr`, :func:`rolling_cov`, :func:`rolling_corr`, :func:`ewmcov`, and :func:`ewmcorr`
587619
returning results with columns sorted by name and producing an error for non-unique columns;
588620
now handles non-unique columns and returns columns in original order
589621
(except for the case of two DataFrames with ``pairwise=False``, where behavior is unchanged) (:issue:`7542`)
590622
- Bug in :func:`rolling_count` and ``expanding_*`` functions unnecessarily producing error message for zero-length data (:issue:`8056`)
591623
- Bug in :func:`rolling_apply` and :func:`expanding_apply` interpreting ``min_periods=0`` as ``min_periods=1`` (:issue:`8080`)
624+
- Bug in :func:`expanding_std` and :func:`expanding_var` for a single value producing a confusing error message (:issue:`7900`)
625+
- Bug in :func:`rolling_std` and :func:`rolling_var` for a single value producing ``0`` rather than ``NaN`` (:issue:`7900`)
626+
627+
- Bug in the calculation of debiasing factors in :func:`ewmstd`, :func:`ewmvol`, :func:`ewmvar`, and :func:`ewmcov`
628+
when ``bias=False`` (the default).
629+
Previously an incorrect constant factor was used, based on ``adjust=True``, ``ignore_na=True``,
630+
and an infinite number of observations.
631+
Now a different factor is used for each entry, based on the actual weights
632+
(analogous to the usual ``N/(N-1)`` factor).
633+
In particular, for a single point a value of ``NaN`` is returned when ``bias=False``,
634+
whereas previously a value of (approximately) ``0`` was returned.
635+
636+
For example, consider the following pre-0.15.0 results for ``ewmvar(..., bias=False)``,
637+
and the corresponding debiasing factors:
638+
639+
.. ipython:: python
640+
641+
s = Series([1., 2., 0., 4.])
642+
643+
.. code-block:: python
644+
645+
In [69]: ewmvar(s, com=2., bias=False)
646+
Out[69]:
647+
0 -2.775558e-16
648+
1 3.000000e-01
649+
2 9.556787e-01
650+
3 3.585799e+00
651+
dtype: float64
652+
653+
In [70]: ewmvar(s, com=2., bias=False) / ewmvar(s, com=2., bias=True)
654+
Out[70]:
655+
0 1.25
656+
1 1.25
657+
2 1.25
658+
3 1.25
659+
dtype: float64
660+
661+
Note that entry ``0`` is approximately 0, and the debiasing factors are a constant 1.25.
662+
By comparison, the following 0.15.0 results have a ``NaN`` for entry ``0``,
663+
and the debiasing factors are decreasing (towards 1.25):
664+
665+
.. ipython:: python
666+
667+
ewmvar(s, com=2., bias=False)
668+
ewmvar(s, com=2., bias=False) / ewmvar(s, com=2., bias=True)
669+
670+
See :ref:`Exponentially weighted moment functions <stats.moments.exponentially_weighted>` for details. (:issue:`7912`)
671+
592672
- Bug in ``DataFrame.plot`` and ``Series.plot`` may ignore ``rot`` and ``fontsize`` keywords (:issue:`7844`)
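The per-entry debiasing factors in the ``ewmvar`` fix (:issue:`7912`) above can be checked with a short sketch, assuming ``adjust=True`` weights :math:`w_i = (1 - \alpha)^i`, :math:`\alpha = 1 / (1 + \text{com})`, and no missing values. ``debias_factors`` and ``old_constant_factor`` are hypothetical helpers. The second one is the infinite-observation limit of the factor, :math:`(1 + 2c) / (2c)`, which matches the constant 1.25 shown for ``com=2``.

```python
def debias_factors(n, com):
    """(sum w)^2 / ((sum w)^2 - sum w^2) for adjust=True weights at t = 0..n-1."""
    alpha = 1.0 / (1 + com)
    factors = []
    for t in range(n):
        w = [(1 - alpha) ** i for i in range(t + 1)]
        s1, s2 = sum(w), sum(x * x for x in w)
        # A single observation gives s1**2 == s2, so the factor is undefined (NaN).
        factors.append(float("nan") if s1 ** 2 == s2 else s1 ** 2 / (s1 ** 2 - s2))
    return factors


def old_constant_factor(com):
    """Limit of the factor for infinitely many observations: (1 + 2*com) / (2*com)."""
    return (1 + 2 * com) / (2 * com)


print(debias_factors(4, 2.0))    # approximately [nan, 2.083, 1.583, 1.425]
print(old_constant_factor(2.0))  # 1.25
```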
593673

594674
