@@ -312,14 +312,36 @@ We provide a number of common statistical functions:
     :meth:`~Rolling.median`, Arithmetic median of values
     :meth:`~Rolling.min`, Minimum
     :meth:`~Rolling.max`, Maximum
-    :meth:`~Rolling.std`, Bessel-corrected sample standard deviation
-    :meth:`~Rolling.var`, Unbiased variance
+    :meth:`~Rolling.std`, Sample standard deviation
+    :meth:`~Rolling.var`, Sample variance
     :meth:`~Rolling.skew`, Sample skewness (3rd moment)
     :meth:`~Rolling.kurt`, Sample kurtosis (4th moment)
     :meth:`~Rolling.quantile`, Sample quantile (value at %)
     :meth:`~Rolling.apply`, Generic apply
-    :meth:`~Rolling.cov`, Unbiased covariance (binary)
-    :meth:`~Rolling.corr`, Correlation (binary)
+    :meth:`~Rolling.cov`, Sample covariance (binary)
+    :meth:`~Rolling.corr`, Sample correlation (binary)
+
+.. _computation.window_variance.caveats:
+
+.. note::
+
+   Please note that :meth:`~Rolling.std` and :meth:`~Rolling.var` use the sample
+   variance formula by default, i.e. the sum of squared differences is divided by
+   ``window_size - 1`` and not by ``window_size`` during averaging. In statistics,
+   we use the sample variance when the dataset is drawn from a larger population that we
+   don't have access to. Using it implies that the data in our window is a
+   random sample from the population, and we are interested not in the variance
+   inside the specific window but in the variance of some general window that
+   our windows represent. In this situation, using the sample variance formula
+   results in an unbiased estimator and so is preferred.
+
+   Usually, we are instead interested in the variance of each window as we slide
+   it over the data, and in this case we should specify ``ddof=0`` when calling
+   these methods to use population variance instead of sample variance. Using
+   sample variance under these circumstances would result in a biased estimator
+   of the quantity we are trying to determine.
+
+   The same caveats apply to using any supported statistical sample methods.

 .. _stats.rolling_apply:

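The difference between the two divisors described in the note above can be checked with a small sketch. The series values and the window size of 3 are made up for illustration; the only pandas behavior relied on is the ``ddof`` parameter of ``Rolling.var``, which the note itself mentions.

```python
import numpy as np
import pandas as pd

# Illustrative data; the values and window size are arbitrary assumptions.
s = pd.Series([1.0, 2.0, 4.0, 7.0, 11.0])
window = 3

# Default: sample variance, sum of squared differences divided by window - 1.
sample_var = s.rolling(window).var()

# ddof=0: population variance, divided by window.
population_var = s.rolling(window).var(ddof=0)

# For full windows the two differ only by the factor (window - 1) / window.
ratio = (window - 1) / window
print(pd.DataFrame({"sample": sample_var, "population": population_var}))
```

For a fixed window size the population variance is always the sample variance scaled by ``(window - 1) / window``, so the gap between the two shrinks as the window grows.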
@@ -360,8 +382,8 @@ and their default values are set to ``False``, ``True`` and ``False`` respective
 .. note::

    In terms of performance, **the first time a function is run using the Numba engine will be slow**
-   as Numba will have some function compilation overhead. However, ``rolling`` objects will cache
-   the function and subsequent calls will be fast. In general, the Numba engine is performant with
+   as Numba will have some function compilation overhead. However, the compiled functions are cached,
+   and subsequent calls will be fast. In general, the Numba engine is performant with
    a larger amount of data points (e.g. 1+ million).

 .. code-block:: ipython
@@ -848,14 +870,23 @@ Method summary
     :meth:`~Expanding.median`, Arithmetic median of values
     :meth:`~Expanding.min`, Minimum
     :meth:`~Expanding.max`, Maximum
-    :meth:`~Expanding.std`, Unbiased standard deviation
-    :meth:`~Expanding.var`, Unbiased variance
-    :meth:`~Expanding.skew`, Unbiased skewness (3rd moment)
-    :meth:`~Expanding.kurt`, Unbiased kurtosis (4th moment)
+    :meth:`~Expanding.std`, Sample standard deviation
+    :meth:`~Expanding.var`, Sample variance
+    :meth:`~Expanding.skew`, Sample skewness (3rd moment)
+    :meth:`~Expanding.kurt`, Sample kurtosis (4th moment)
     :meth:`~Expanding.quantile`, Sample quantile (value at %)
     :meth:`~Expanding.apply`, Generic apply
-    :meth:`~Expanding.cov`, Unbiased covariance (binary)
-    :meth:`~Expanding.corr`, Correlation (binary)
+    :meth:`~Expanding.cov`, Sample covariance (binary)
+    :meth:`~Expanding.corr`, Sample correlation (binary)
+
+.. note::
+
+   Using sample variance formulas for :meth:`~Expanding.std` and
+   :meth:`~Expanding.var` comes with the same caveats as using them with rolling
+   windows. See :ref:`this section <computation.window_variance.caveats>` for more
+   information.
+
+   The same caveats apply to using any supported statistical sample methods.

 .. currentmodule:: pandas

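The same check can be sketched for expanding windows, where the divisor changes at every step because the window keeps growing. The data below is invented for illustration, and the observation count ``n`` assumes the series has no missing values.

```python
import numpy as np
import pandas as pd

# Illustrative data with no missing values (an assumption of this sketch).
s = pd.Series([2.0, 4.0, 4.0, 5.0, 10.0])

# Default: sample variance, divided by (observations so far - 1).
sample_var = s.expanding(min_periods=2).var()

# ddof=0: population variance, divided by the number of observations so far.
population_var = s.expanding(min_periods=2).var(ddof=0)

# Number of observations in each expanding window: 1, 2, 3, ...
n = pd.Series(np.arange(1, len(s) + 1), index=s.index, dtype=float)

# Rescaling the sample variance by (n - 1) / n recovers the population variance.
rescaled = sample_var * (n - 1) / n
print(pd.DataFrame({"sample": sample_var, "population": population_var}))
```

Because ``n`` grows with each step, the two estimates converge toward each other further along the series, unlike the fixed-ratio case for rolling windows.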