Commit 3ee62d3

API: df.rolling(..).corr()/cov() when pairwise=True to return MI DataFrame
xref pandas-dev#15601
1 parent a23365b commit 3ee62d3

4 files changed: +297 -205 lines

doc/source/computation.rst

+15-5
@@ -505,13 +505,18 @@ two ``Series`` or any combination of ``DataFrame/Series`` or
 - ``DataFrame/DataFrame``: by default compute the statistic for matching column
   names, returning a DataFrame. If the keyword argument ``pairwise=True`` is
   passed then computes the statistic for each pair of columns, returning a
-  ``Panel`` whose ``items`` are the dates in question (see :ref:`the next section
+  ``MultiIndexed DataFrame`` whose ``index`` are the dates in question (see :ref:`the next section
   <stats.moments.corr_pairwise>`).

 For example:

 .. ipython:: python

+   df = pd.DataFrame(np.random.randn(1000, 4),
+                     index=pd.date_range('1/1/2000', periods=1000),
+                     columns=['A', 'B', 'C', 'D'])
+   df = df.cumsum()
+
    df2 = df[:20]
    df2.rolling(window=5).corr(df2['B'])

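The hunk above contrasts two result shapes. Below is a minimal sketch of both, assuming a recent pandas release (where the 2-level layout introduced by this commit is what `rolling(...).corr()` returns) and synthetic random-walk data in place of the documentation's `df`; the seed and variable names are illustrative only.

```python
import numpy as np
import pandas as pd

# Synthetic random-walk data, shaped like the documentation example
rng = np.random.default_rng(0)
df = pd.DataFrame(
    rng.standard_normal((1000, 4)),
    index=pd.date_range("2000-01-01", periods=1000),
    columns=["A", "B", "C", "D"],
).cumsum()
df2 = df.iloc[:20]

# DataFrame/Series: one rolling correlation per column, same shape as df2
vs_series = df2.rolling(window=5).corr(df2["B"])

# DataFrame/DataFrame with pairwise=True: one row per (date, column) pair,
# held in a 2-level MultiIndex whose outer level is the date
pairwise = df2.rolling(window=5).corr(df2, pairwise=True)

print(vs_series.shape)         # (20, 4)
print(pairwise.shape)          # (80, 4)
print(pairwise.index.nlevels)  # 2
```

With `pairwise=True`, 20 dates times 4 columns give 80 rows, and the outer index level repeats each date once per column.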
@@ -520,11 +525,16 @@ For example:
 Computing rolling pairwise covariances and correlations
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

+.. warning::
+
+   Prior to version 0.20.0 if ``pairwise=True`` was passed, a ``Panel`` would be returned.
+   This will now return a 2-level MultiIndexed DataFrame, see the whatsnew :ref:`here <whatsnew_0200.api_breaking.rolling_pairwise>`
+
 In financial data analysis and other fields it's common to compute covariance
 and correlation matrices for a collection of time series. Often one is also
 interested in moving-window covariance and correlation matrices. This can be
 done by passing the ``pairwise`` keyword argument, which in the case of
-``DataFrame`` inputs will yield a ``Panel`` whose ``items`` are the dates in
+``DataFrame`` inputs will yield a ``MultiIndexed DataFrame`` whose ``index`` are the dates in
 question. In the case of a single DataFrame argument the ``pairwise`` argument
 can even be omitted:

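The hunk above ends by noting that `pairwise` can be omitted for a single DataFrame. A quick way to check that, assuming a recent pandas release in which calling `.cov()` or `.corr()` with no `other` argument computes every column pair; the data is synthetic and the names are illustrative.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
df = pd.DataFrame(
    rng.standard_normal((200, 3)),
    index=pd.date_range("2000-01-01", periods=200),
    columns=["A", "B", "C"],
).cumsum()

# No `other` argument: every column is paired with every column
implicit = df.rolling(window=50).cov()

# The same computation, spelled out explicitly
explicit = df.rolling(window=50).cov(df, pairwise=True)

# Raises AssertionError if the two results differ
pd.testing.assert_frame_equal(implicit, explicit)
print(implicit.shape)  # (600, 3)
```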
@@ -539,12 +549,12 @@ can even be omitted:
 .. ipython:: python

    covs = df[['B','C','D']].rolling(window=50).cov(df[['A','B','C']], pairwise=True)
-   covs[df.index[-50]]
+   covs.iloc[-50].unstack()

 .. ipython:: python

    correls = df.rolling(window=50).corr()
-   correls[df.index[-50]]
+   correls.iloc[-50].unstack()

 You can efficiently retrieve the time series of correlations between two
 columns using ``.loc`` indexing:
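In the 2-level layout each date spans several rows (one per column), so a positional `.iloc[-50]` addresses a single (date, column) row rather than a whole date. A sketch of label-based selection instead, assuming a recent pandas release and synthetic data: `.loc[date]` on the outer level returns that date's full matrix.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(2)
df = pd.DataFrame(
    rng.standard_normal((1000, 4)),
    index=pd.date_range("2000-01-01", periods=1000),
    columns=["A", "B", "C", "D"],
).cumsum()

covs = df[["B", "C", "D"]].rolling(window=50).cov(
    df[["A", "B", "C"]], pairwise=True
)
correls = df.rolling(window=50).corr()

date = df.index[-50]

# Partial indexing on the outer (date) level drops it and returns the
# matrix for that date
cov_matrix = covs.loc[date]      # 3x3: B/C/D paired with A/B/C
corr_matrix = correls.loc[date]  # 4x4, symmetric, ones on the diagonal
print(corr_matrix.round(3))
```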
@@ -557,7 +567,7 @@ columns using ``.loc`` indexing:
 .. ipython:: python

    @savefig rolling_corr_pairwise_ex.png
-   correls.loc[:, 'A', 'C'].plot()
+   correls[('A', 'C')].plot()

 .. _stats.aggregate:

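The hunk above replaces the Panel-style `.loc[:, 'A', 'C']` in the plotting example. With rows laid out as (date, column), one way to pull out the time series of correlations between `A` and `C` is to cross-section the inner index level with `xs`, or to `unstack` that level into the columns. This sketch assumes a recent pandas release with the same layout and synthetic data; the direct `Series.rolling(...).corr(...)` call is only a cross-check.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(3)
df = pd.DataFrame(
    rng.standard_normal((500, 4)),
    index=pd.date_range("2000-01-01", periods=500),
    columns=["A", "B", "C", "D"],
).cumsum()

correls = df.rolling(window=50).corr()

# Option 1: keep the rows whose inner label is "A", then take column "C"
a_vs_c = correls.xs("A", level=1)["C"]

# Option 2: move the inner level into the columns; ("C", "A") is column C
# of the rows labelled A (the pair order does not matter for self-correlation)
a_vs_c_wide = correls.unstack(level=1)[("C", "A")]

# Cross-check against a direct rolling correlation of the two Series
direct = df["A"].rolling(window=50).corr(df["C"])
```

Either Series can then be passed to `.plot()` as in the documentation example.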
doc/source/whatsnew/v0.20.0.txt

+46
@@ -13,6 +13,8 @@ Highlights include:
 - The ``.ix`` indexer has been deprecated, see :ref:`here <whatsnew_0200.api_breaking.deprecate_ix>`
 - Switched the test framework to `pytest`_ (:issue:`13097`)
 - A new orient for JSON serialization, ``orient='table'``, that uses the Table Schema spec, see :ref:`here <whatsnew_0200.enhancements.table_schema>`
+- Window Binary Corr/Cov operations return a MultiIndex DataFrame rather than a Panel, see :ref:`here <whatsnew_0200.api_breaking.rolling_pairwise>`
+

 .. _pytest: http://doc.pytest.org/en/latest/

@@ -715,6 +717,50 @@ New Behavior:

    df.groupby('A').agg([np.mean, np.std, np.min, np.max])

+.. _whatsnew_0200.api_breaking.rolling_pairwise:
+
+Window Binary Corr/Cov operations return a MultiIndex DataFrame
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+A binary window operation, like ``.corr()`` or ``.cov()``, when operating on a ``.rolling(..)``, ``.expanding(..)``, or ``.ewm(..)`` object,
+will now return a 2-level ``MultiIndexed DataFrame`` rather than a ``Panel``. These are equivalent in function,
+but MultiIndexed DataFrames enjoy more support in pandas.
+See the section on :ref:`Windowed Binary Operations <stats.moments.binary>` for more information. (:issue:`15677`)
+
+.. ipython:: python
+
+   np.random.seed(1234)
+   df = DataFrame(np.random.rand(100, 2),
+                  columns=['A', 'B'],
+                  index=pd.date_range('20160101', periods=100, freq='D'))
+   df
+
+Old Behavior:
+
+.. code-block:: ipython
+
+   In [2]: df.rolling(12).corr()
+   Out[2]:
+   <class 'pandas.core.panel.Panel'>
+   Dimensions: 100 (items) x 2 (major_axis) x 2 (minor_axis)
+   Items axis: 2016-01-01 00:00:00 to 2016-04-09 00:00:00
+   Major_axis axis: A to B
+   Minor_axis axis: A to B
+
+New Behavior:
+
+.. ipython:: python
+
+   res = df.rolling(12).corr()
+   res
+
+Retrieving a correlation matrix for a specified index
+
+.. ipython:: python
+
+   res.iloc[-1].unstack()
+
+
 .. _whatsnew_0200.api_breaking.hdfstore_where:

 HDFStore where string comparison
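The entry above says the two layouts are equivalent in function. One way to see this, assuming a recent pandas release in which the rows come out grouped date by date, is to reshape the new result back into the dates x columns x columns array that the old `Panel` held (100 x 2 x 2 here) and compare one slice with label-based `.loc` access. The reshape depends on that row-ordering assumption, which the in-code check guards.

```python
import numpy as np
import pandas as pd

# Same data as the whatsnew example
np.random.seed(1234)
df = pd.DataFrame(np.random.rand(100, 2),
                  columns=["A", "B"],
                  index=pd.date_range("20160101", periods=100, freq="D"))

res = df.rolling(12).corr()  # 100 dates x 2 columns = 200 rows

# Assumes rows are grouped date by date, so the values reshape into the
# dates x columns x columns cube the old Panel exposed
n_dates, n_cols = len(df.index), len(df.columns)
cube = res.to_numpy().reshape(n_dates, n_cols, n_cols)

# Label-based access to the last date's 2x2 matrix
last = res.loc[df.index[-1]]

# Sanity check of the row-ordering assumption
assert np.allclose(cube[-1], last.to_numpy())
print(last)
```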

pandas/core/window.py

+25-2
@@ -1652,7 +1652,8 @@ def _cov(x, y):


 def _flex_binary_moment(arg1, arg2, f, pairwise=False):
-    from pandas import Series, DataFrame, Panel
+    from pandas import Series, DataFrame
+
     if not (isinstance(arg1, (np.ndarray, Series, DataFrame)) and
             isinstance(arg2, (np.ndarray, Series, DataFrame))):
         raise TypeError("arguments to moment function must be of type "
@@ -1703,12 +1704,34 @@ def dataframe_from_int_dict(data, frame_template):
                         else:
                             results[i][j] = f(*_prep_binary(arg1.iloc[:, i],
                                                             arg2.iloc[:, j]))
+
+                # TODO: not the most efficient (perf-wise)
+                # though not bad code-wise
+                from pandas import Panel, MultiIndex
                 p = Panel.from_dict(results).swapaxes('items', 'major')
                 if len(p.major_axis) > 0:
                     p.major_axis = arg1.columns[p.major_axis]
                 if len(p.minor_axis) > 0:
                     p.minor_axis = arg2.columns[p.minor_axis]
-                return p
+
+                if len(p.items):
+                    result = pd.concat(
+                        [p.iloc[i].T for i in range(len(p.items))],
+                        keys=p.items)
+                else:
+
+                    result = DataFrame(
+                        index=MultiIndex(levels=[arg1.columns, arg2.index],
+                                         labels=[[], []]),
+                        columns=arg1.columns,
+                        dtype='float64')
+
+                # reset our names
+                result.columns.name = None
+                result.index.names = ['major', 'minor']
+
+                return result
+
             else:
                 raise ValueError("'pairwise' is not True/False")
         else:
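The new branch above turns the `Panel` into a frame by concatenating one transposed per-date slice with `keys=p.items`, so each date becomes the outer index level. Since `Panel` has since been removed from pandas, here is a sketch of the same concat-with-keys layout built straight from a dict of rolling Series. `pairwise_to_frame` is a hypothetical helper, not pandas API, and like the TODO comment it favors clarity over speed (one scalar lookup per date and pair). The comparison target assumes a recent pandas release.

```python
import numpy as np
import pandas as pd


def pairwise_to_frame(results, left_cols, right_cols, dates):
    """Hypothetical helper: lay out per-pair rolling results as a 2-level frame.

    ``results[(lc, rc)]`` is a Series indexed by ``dates``. Each date becomes
    a block whose rows are ``right_cols`` and whose columns are ``left_cols``;
    concatenating the blocks with ``keys=dates`` adds the outer date level.
    """
    blocks = [
        pd.DataFrame(
            [[results[(lc, rc)].loc[d] for lc in left_cols] for rc in right_cols],
            index=right_cols,
            columns=left_cols,
        )
        for d in dates
    ]
    return pd.concat(blocks, keys=dates)


rng = np.random.default_rng(4)
df = pd.DataFrame(
    rng.standard_normal((30, 3)),
    index=pd.date_range("2000-01-01", periods=30),
    columns=["A", "B", "C"],
).cumsum()

cols = list(df.columns)
# One rolling correlation Series per (left, right) column pair
results = {
    (lc, rc): df[lc].rolling(window=5).corr(df[rc])
    for lc in cols
    for rc in cols
}
frame = pairwise_to_frame(results, cols, cols, df.index)

# Should line up with pandas' own pairwise result
expected = df.rolling(window=5).corr()
```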
