Commit aa166bf

Merge pull request #4950 from snth/pairwise
Pairwise versions for rolling_cov, ewmcov and expanding_cov
2 parents 1ff776a + 1fcb94e commit aa166bf

4 files changed: +295 -177 lines changed
doc/source/computation.rst (+56 -12)
@@ -59,6 +59,19 @@ The ``Series`` object has a method ``cov`` to compute covariance between series
 Analogously, ``DataFrame`` has a method ``cov`` to compute pairwise covariances
 among the series in the DataFrame, also excluding NA/null values.

+.. _computation.covariance.caveats:
+
+.. note::
+
+   Assuming the missing data are missing at random this results in an estimate
+   for the covariance matrix which is unbiased. However, for many applications
+   this estimate may not be acceptable because the estimated covariance matrix
+   is not guaranteed to be positive semi-definite. This could lead to
+   estimated correlations having absolute values which are greater than one,
+   and/or a non-invertible covariance matrix. See `Estimation of covariance
+   matrices <http://en.wikipedia.org/w/index.php?title=Estimation_of_covariance_matrices>`_
+   for more details.
+
 .. ipython:: python

    frame = DataFrame(randn(1000, 5), columns=['a', 'b', 'c', 'd', 'e'])
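The caveat added in the note of this hunk can be reproduced on a small, deliberately contrived frame. The sketch below is not part of the commit: it assumes a current pandas and NumPy, and the column names ``x``, ``y``, ``z`` and the data are invented for illustration. Each pair of columns is jointly observed on a different set of rows, so the pairwise-complete estimate stitches together three mutually inconsistent relationships.

```python
import numpy as np
import pandas as pd

nan = np.nan
# Hypothetical data: each pair of columns is jointly observed on different rows.
frame = pd.DataFrame({
    "x": [1, 2, 3, nan, nan, nan, 1, 2, 3],
    "y": [1, 2, 3, 1, 2, 3, nan, nan, nan],
    "z": [nan, nan, nan, 1, 2, 3, 3, 2, 1],
})

# DataFrame.cov excludes NA/null values pair by pair.
cov = frame.cov()

# Smallest eigenvalue of the symmetric estimate; a negative value means the
# matrix is not positive semi-definite.
min_eig = np.linalg.eigvalsh(cov.to_numpy()).min()

# Correlation implied by the covariance matrix entries for x and y.
implied_corr_xy = cov.loc["x", "y"] / np.sqrt(cov.loc["x", "x"] * cov.loc["y", "y"])

print(cov)
print("smallest eigenvalue:", min_eig)
print("implied corr(x, y):", implied_corr_xy)
```

On this data the smallest eigenvalue is negative and the correlation implied by the covariance entries exceeds one, which are the two symptoms the note warns about. Real data rarely fails this dramatically, so checking the eigenvalues is a reasonable precaution before inverting such a matrix.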
@@ -99,6 +112,12 @@ correlation methods are provided:

 All of these are currently computed using pairwise complete observations.

+.. note::
+
+   Please see the :ref:`caveats <computation.covariance.caveats>` associated
+   with this method of calculating correlation matrices in the
+   :ref:`covariance section <computation.covariance>`.
+
 .. ipython:: python

    frame = DataFrame(randn(1000, 5), columns=['a', 'b', 'c', 'd', 'e'])
@@ -325,11 +344,14 @@ Binary rolling moments
 two ``Series`` or any combination of ``DataFrame/Series`` or
 ``DataFrame/DataFrame``. Here is the behavior in each case:

-- two ``Series``: compute the statistic for the pairing
+- two ``Series``: compute the statistic for the pairing.
 - ``DataFrame/Series``: compute the statistics for each column of the DataFrame
-  with the passed Series, thus returning a DataFrame
-- ``DataFrame/DataFrame``: compute statistic for matching column names,
-  returning a DataFrame
+  with the passed Series, thus returning a DataFrame.
+- ``DataFrame/DataFrame``: by default compute the statistic for matching column
+  names, returning a DataFrame. If the keyword argument ``pairwise=True`` is
+  passed then computes the statistic for each pair of columns, returning a
+  ``Panel`` whose ``items`` are the dates in question (see :ref:`the next section
+  <stats.moments.corr_pairwise>`).

 For example:
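The hunk stops before the documentation's own example. As a rough sketch of the cases in the bullet list, and assuming a current pandas release (where the module-level ``rolling_cov`` functions and ``Panel`` no longer exist and the same options are exposed on ``.rolling(...)``), the behaviours look roughly like this. The frames ``left`` and ``right``, the window of 5 and the random data are made up for illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
# Hypothetical inputs: two frames sharing the column names A and B.
left = pd.DataFrame(rng.standard_normal((20, 2)), columns=["A", "B"])
right = pd.DataFrame(rng.standard_normal((20, 2)), columns=["A", "B"])
window = 5

# Two Series: one statistic per window, returned as a Series.
series_pair = left["A"].rolling(window).cov(right["A"])

# DataFrame/Series: each column of the frame against the Series.
frame_series = left.rolling(window).cov(right["A"])

# DataFrame/DataFrame with pairwise=False: matching column names only.
matched = left.rolling(window).cov(right, pairwise=False)

# DataFrame/DataFrame with pairwise=True: every column pair. Current pandas
# returns a DataFrame with a MultiIndex on the rows rather than a Panel.
all_pairs = left.rolling(window).cov(right, pairwise=True)

print(type(series_pair).__name__, series_pair.shape)
print(type(frame_series).__name__, frame_series.shape)
print(matched.columns.tolist(), matched.shape)
print(all_pairs.index.nlevels, all_pairs.shape)
```

The method-based spelling is a substitution for the functions named in the diff, not the API this commit documents. The ``pairwise`` keyword has the same meaning in both.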

@@ -340,20 +362,42 @@ For example:

 .. _stats.moments.corr_pairwise:

-Computing rolling pairwise correlations
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Computing rolling pairwise covariances and correlations
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

-In financial data analysis and other fields it's common to compute correlation
-matrices for a collection of time series. More difficult is to compute a
-moving-window correlation matrix. This can be done using the
-``rolling_corr_pairwise`` function, which yields a ``Panel`` whose ``items``
-are the dates in question:
+In financial data analysis and other fields it's common to compute covariance
+and correlation matrices for a collection of time series. Often one is also
+interested in moving-window covariance and correlation matrices. This can be
+done by passing the ``pairwise`` keyword argument, which in the case of
+``DataFrame`` inputs will yield a ``Panel`` whose ``items`` are the dates in
+question. In the case of a single DataFrame argument the ``pairwise`` argument
+can even be omitted:
+
+.. note::
+
+   Missing values are ignored and each entry is computed using the pairwise
+   complete observations. Please see the :ref:`covariance section
+   <computation.covariance>` for :ref:`caveats
+   <computation.covariance.caveats>` associated with this method of
+   calculating covariance and correlation matrices.

 .. ipython:: python

-   correls = rolling_corr_pairwise(df, 50)
+   covs = rolling_cov(df[['B','C','D']], df[['A','B','C']], 50, pairwise=True)
+   covs[df.index[-50]]
+
+.. ipython:: python
+
+   correls = rolling_corr(df, 50)
    correls[df.index[-50]]

+.. note::
+
+   Prior to version 0.14 this was available through ``rolling_corr_pairwise``
+   which is now simply syntactic sugar for calling ``rolling_corr(...,
+   pairwise=True)`` and deprecated. This is likely to be removed in a future
+   release.
+
 You can efficiently retrieve the time series of correlations between two
 columns using ``ix`` indexing:
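The hunk is cut off before the example that the last sentence introduces, and ``ix`` indexing on a ``Panel`` is not available in current pandas. As a hedged sketch of the same idea with today's API (an assumption about the reader's environment, not what this commit documents), ``df.rolling(window).corr()`` returns a DataFrame whose rows carry a MultiIndex of date and column label, from which one pair's series can be sliced with ``xs``. The frame, its date index and the window of 50 are illustrative.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
# Hypothetical data standing in for the docs' ``df``.
index = pd.date_range("2000-01-01", periods=200, freq="D")
df = pd.DataFrame(rng.standard_normal((200, 4)), index=index, columns=list("ABCD"))

# With a single DataFrame, pairwise output is the default for corr().
correls = df.rolling(50).corr()

# Correlation matrix for one date (rows and columns are the column labels).
last_matrix = correls.loc[df.index[-1]]

# Time series of the A/B correlation: fix the second index level at "B",
# then pick column "A".
a_vs_b = correls.xs("B", level=1)["A"]

print(last_matrix)
print(a_vs_b.tail())
```

Slicing the stacked result this way plays the role that ``Panel`` item and ``ix`` lookups play in the original text. The first 49 entries of ``a_vs_b`` are NaN because the window is not yet full.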

doc/source/v0.14.0.txt (+13)
@@ -183,6 +183,19 @@ These are out-of-bounds selections

   Because of the default `align` value changes, coordinates of bar plots are now located on integer values (0.0, 1.0, 2.0 ...). This is intended to make bar plot be located on the same coordinates as line plot. However, bar plot may differ unexpectedly when you manually adjust the bar location or drawing area, such as using `set_xlim`, `set_ylim`, etc. In these cases, please modify your script to meet with new coordinates.

+- ``pairwise`` keyword was added to the statistical moment functions
+  ``rolling_cov``, ``rolling_corr``, ``ewmcov``, ``ewmcorr``,
+  ``expanding_cov``, ``expanding_corr`` to allow the calculation of moving
+  window covariance and correlation matrices (:issue:`4950`). See
+  :ref:`Computing rolling pairwise covariances and correlations
+  <stats.moments.corr_pairwise>` in the docs.
+
+  .. ipython:: python
+
+     df = DataFrame(np.random.randn(10,4),columns=list('ABCD'))
+     covs = rolling_cov(df[['A','B','C']], df[['B','C','D']], 5, pairwise=True)
+     covs[df.index[-1]]
+

 MultiIndexing Using Slicers
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~
