@@ -59,6 +59,19 @@ The ``Series`` object has a method ``cov`` to compute covariance between series
 Analogously, ``DataFrame`` has a method ``cov`` to compute pairwise covariances
 among the series in the DataFrame, also excluding NA/null values.
 
+.. _computation.covariance.caveats:
+
+.. note::
+
+    Assuming the missing data are missing at random, this results in an estimate
+    for the covariance matrix which is unbiased. However, for many applications
+    this estimate may not be acceptable because the estimated covariance matrix
+    is not guaranteed to be positive semi-definite. This could lead to
+    estimated correlations having absolute values which are greater than one,
+    and/or a non-invertible covariance matrix. See `Estimation of covariance
+    matrices <http://en.wikipedia.org/w/index.php?title=Estimation_of_covariance_matrices>`_
+    for more details.
+
 .. ipython:: python
 
    frame = DataFrame(randn(1000, 5), columns=['a', 'b', 'c', 'd', 'e'])
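
The caveat added in the hunk above can be reproduced with a few lines of plain Python. The sketch below is not part of the commit. It mimics pairwise-complete estimation with a hypothetical helper, `pairwise_cov`, marks missing values with `None`, and uses toy data chosen so that each pair of columns overlaps on different rows. Under those assumptions the resulting matrix has a negative 2x2 principal minor, so it is not positive semi-definite, and the correlation implied by its entries is greater than one.

```python
import math


def pairwise_cov(a, b):
    """Sample covariance over the rows where both a and b are observed.

    Hypothetical helper for illustration; None marks a missing value.
    """
    pairs = [(u, v) for u, v in zip(a, b) if u is not None and v is not None]
    n = len(pairs)
    mean_a = sum(u for u, _ in pairs) / n
    mean_b = sum(v for _, v in pairs) / n
    return sum((u - mean_a) * (v - mean_b) for u, v in pairs) / (n - 1)


# Toy data: each pair of columns is jointly observed on only two rows.
x = [1, 2, None, None, 1, 2]
y = [1, 2, 1, 2, None, None]
z = [None, None, 1, 2, 2, 1]
cols = [x, y, z]

# Every entry uses its own set of complete rows, as in pairwise estimation.
cov = [[pairwise_cov(p, q) for q in cols] for p in cols]

# Correlation implied by the matrix entries (about 1.5 for this data).
implied_corr_xy = cov[0][1] / math.sqrt(cov[0][0] * cov[1][1])

# A negative principal minor means the matrix is not positive semi-definite.
minor_xy = cov[0][0] * cov[1][1] - cov[0][1] ** 2

print(implied_corr_xy, minor_xy)
```

``DataFrame.cov`` excludes NA values pair by pair in the same spirit, so a frame with a similar missingness pattern can show the same effect. The numbers here are illustrative only.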
@@ -99,6 +112,12 @@ correlation methods are provided:
 
 All of these are currently computed using pairwise complete observations.
 
+.. note::
+
+    Please see the :ref:`caveats <computation.covariance.caveats>` associated
+    with this method of calculating correlation matrices in the
+    :ref:`covariance section <computation.covariance>`.
+
 .. ipython:: python
 
    frame = DataFrame(randn(1000, 5), columns=['a', 'b', 'c', 'd', 'e'])
@@ -325,11 +344,14 @@ Binary rolling moments
 two ``Series`` or any combination of ``DataFrame/Series`` or
 ``DataFrame/DataFrame``. Here is the behavior in each case:
 
-- two ``Series``: compute the statistic for the pairing
+- two ``Series``: compute the statistic for the pairing.
 - ``DataFrame/Series``: compute the statistics for each column of the DataFrame
-  with the passed Series, thus returning a DataFrame
-- ``DataFrame/DataFrame``: compute statistic for matching column names,
-  returning a DataFrame
+  with the passed Series, thus returning a DataFrame.
+- ``DataFrame/DataFrame``: by default compute the statistic for matching column
+  names, returning a DataFrame. If the keyword argument ``pairwise=True`` is
+  passed then compute the statistic for each pair of columns, returning a
+  ``Panel`` whose ``items`` are the dates in question (see :ref:`the next section
+  <stats.moments.corr_pairwise>`).
 
 For example:
 
@@ -340,20 +362,42 @@ For example:
 
 .. _stats.moments.corr_pairwise:
 
-Computing rolling pairwise correlations
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Computing rolling pairwise covariances and correlations
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
-In financial data analysis and other fields it's common to compute correlation
-matrices for a collection of time series. More difficult is to compute a
-moving-window correlation matrix. This can be done using the
-``rolling_corr_pairwise`` function, which yields a ``Panel`` whose ``items``
-are the dates in question:
+In financial data analysis and other fields it's common to compute covariance
+and correlation matrices for a collection of time series. Often one is also
+interested in moving-window covariance and correlation matrices. This can be
+done by passing the ``pairwise`` keyword argument, which in the case of
+``DataFrame`` inputs will yield a ``Panel`` whose ``items`` are the dates in
+question. In the case of a single DataFrame argument the ``pairwise`` argument
+can even be omitted:
+
+.. note::
+
+    Missing values are ignored and each entry is computed using the pairwise
+    complete observations. Please see the :ref:`covariance section
+    <computation.covariance>` for :ref:`caveats
+    <computation.covariance.caveats>` associated with this method of
+    calculating covariance and correlation matrices.
 
 .. ipython:: python
 
-   correls = rolling_corr_pairwise(df, 50)
+   covs = rolling_cov(df[['B','C','D']], df[['A','B','C']], 50, pairwise=True)
+   covs[df.index[-50]]
+
+.. ipython:: python
+
+   correls = rolling_corr(df, 50)
    correls[df.index[-50]]
 
+.. note::
+
+    Prior to version 0.14 this was available through ``rolling_corr_pairwise``
+    which is now simply syntactic sugar for calling ``rolling_corr(...,
+    pairwise=True)`` and deprecated. This is likely to be removed in a future
+    release.
+
 You can efficiently retrieve the time series of correlations between two
 columns using ``ix`` indexing:
 
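
Separately from the patch itself, the difference between the default and ``pairwise=True`` behaviour for two DataFrame-like inputs can be sketched without pandas. Everything below is illustrative and not taken from the commit: `rolling_cov_sketch` is a hypothetical helper, columns are plain lists, only full windows are emitted, and missing values are not handled. The returned dict is keyed by each window's end label, loosely mirroring a ``Panel`` whose ``items`` are the dates.

```python
from itertools import product


def cov(a, b):
    """Sample covariance of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    return sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b)) / (n - 1)


def rolling_cov_sketch(left, right, index, window, pairwise=False):
    """Map each window-end label to {(left_col, right_col): covariance}.

    Hypothetical helper: `left` and `right` are dicts of column name -> list.
    """
    if pairwise:
        pairs = list(product(left, right))  # every column pairing
    else:
        pairs = [(c, c) for c in left if c in right]  # matching names only
    out = {}
    for end in range(window, len(index) + 1):
        rows = slice(end - window, end)  # the rows in the current window
        out[index[end - 1]] = {
            (p, q): cov(left[p][rows], right[q][rows]) for p, q in pairs
        }
    return out


index = ["d1", "d2", "d3", "d4"]
left = {"B": [1.0, 2.0, 4.0, 7.0], "C": [2.0, 1.0, 3.0, 0.0]}
right = {"A": [0.0, 1.0, 0.0, 1.0], "B": [1.0, 2.0, 4.0, 7.0]}

matched = rolling_cov_sketch(left, right, index, window=3)
every_pair = rolling_cov_sketch(left, right, index, window=3, pairwise=True)

print(sorted(matched["d4"]))     # only the shared column name
print(sorted(every_pair["d4"]))  # all left/right combinations
```

With matching names only ``B`` is paired with ``B``, whereas the pairwise variant produces one entry per combination of a left and a right column for every date.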