BUG: exponential moving window covariance fails for multiIndexed DataFrame #34943

Merged
merged 13 commits into from
Jun 25, 2020
1 change: 1 addition & 0 deletions doc/source/whatsnew/v1.1.0.rst
@@ -978,6 +978,7 @@ MultiIndex

 - Bug in :meth:`MultiIndex.intersection` was not guaranteed to preserve order when ``sort=False``. (:issue:`31325`)
 - Bug in :meth:`DataFrame.truncate` was dropping :class:`MultiIndex` names. (:issue:`34564`)
+- Bug in :meth:`DataFrame.ewm.cov` was throwing ``AssertionError`` for :class:`MultiIndex` inputs (:issue:`34440`)

.. ipython:: python

5 changes: 4 additions & 1 deletion pandas/core/window/common.py
@@ -179,7 +179,10 @@ def dataframe_from_int_dict(data, frame_template):
         result.index = MultiIndex.from_product(
             arg2.columns.levels + [result_index]
         )
-        result = result.reorder_levels([2, 0, 1]).sort_index()
+        # GH 34440
+        num_levels = len(result.index.levels)
+        new_order = [num_levels - 1] + list(range(num_levels - 1))
+        result = result.reorder_levels(new_order).sort_index()
     else:
         result.index = MultiIndex.from_product(
             [range(len(arg2.columns)), range(len(result_index))]
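
The change above replaces a hard-coded level order with one computed from the number of index levels. A minimal sketch of that permutation follows; the helper name `last_level_first_order` and the sample index are illustrative assumptions, not part of the patch, and `idx.nlevels` is used here in place of `len(result.index.levels)`.

```python
import pandas as pd


def last_level_first_order(num_levels: int) -> list:
    """Hypothetical helper: permutation that moves the last level to the front."""
    return [num_levels - 1] + list(range(num_levels - 1))


# Two column levels plus the result index: same as the old hard-coded order.
order_3 = last_level_first_order(3)

# Three column levels plus the result index: the situation reported in GH 34440.
idx = pd.MultiIndex.from_product([["a", "b"], ["x", "y"], [0, 1], [10, 20, 30]])
order_4 = last_level_first_order(idx.nlevels)
reordered = idx.reorder_levels(order_4)

print(order_3)  # [2, 0, 1]
print(order_4)  # [3, 0, 1, 2]
print(reordered.nlevels)
```

With exactly three levels the computed order equals `[2, 0, 1]`, so the old behavior is unchanged in that case; with any other number of levels the fixed list no longer matches the index, which is consistent with the failure described in the issue.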
32 changes: 31 additions & 1 deletion pandas/tests/window/test_pairwise.py
@@ -3,7 +3,7 @@
 import numpy as np
 import pytest

-from pandas import DataFrame, Series, date_range
+from pandas import DataFrame, MultiIndex, Series, date_range
 import pandas._testing as tm
 from pandas.core.algorithms import safe_sort

@@ -189,3 +189,33 @@ def test_corr_freq_memory_error(self):
         result = s.rolling("12H").corr(s)
         expected = Series([np.nan] * 5, index=date_range("2020", periods=5))
         tm.assert_series_equal(result, expected)

+    def test_cov_mulittindex(self):
+        # GH 34440
+
+        # create a DataFrame with three-level MultiIndex columns
+        columns = MultiIndex.from_product([["a", "b"], ["x", "y"], [0, 1]])
+        index = range(3)
+        df = DataFrame(
+            np.random.normal(size=(len(index), len(columns))),
+            index=index,
+            columns=columns,
+        )
+
+        # flatten the column MultiIndex, then compute the covariance
+        df_fi = df.copy()
+        df_fi.columns = ["".join(str(c) for c in col) for col in df_fi.columns.values]
+        cov_fi = df_fi.ewm(alpha=0.1).cov()
+        cov_fi.index = [
+            "".join(str(symbol) for symbol in row_label) for row_label in cov_fi.index
+        ]
+
+        # compute the covariance first, then flatten its MultiIndex
+        df_mi = df.copy()
+        cov_mi = df_mi.ewm(alpha=0.1).cov()
+        cov_mi.columns = ["".join(str(c) for c in col) for col in cov_mi.columns.values]
+        cov_mi.index = [
+            "".join(str(symbol) for symbol in row_label) for row_label in cov_mi.index
+        ]
+
+        tm.assert_frame_equal(cov_fi, cov_mi)
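
Outside the test suite, the fixed behavior can be checked with a short script. This is a sketch that assumes a pandas version containing this fix (1.1.0 or later); the seeded generator and the variable names are illustrative choices, not taken from the PR.

```python
import numpy as np
import pandas as pd

# Three-level column MultiIndex, as in the test above.
columns = pd.MultiIndex.from_product([["a", "b"], ["x", "y"], [0, 1]])
rng = np.random.default_rng(0)  # seeded only so the sketch is repeatable
df = pd.DataFrame(rng.normal(size=(3, len(columns))), columns=columns)

# Pairwise exponentially weighted covariance; before the fix this call
# raised an AssertionError for column indexes with more than two levels.
cov = df.ewm(alpha=0.1).cov()

# One row per (original row label, column key); one column per column key.
print(cov.shape)
print(cov.index.nlevels, cov.columns.nlevels)

# The block for the last row label is an 8 x 8 covariance matrix.
last_block = cov.loc[2].to_numpy()
print(np.allclose(last_block, last_block.T))
```

The result index carries the original row label as its first level followed by the three column levels, which is the ordering produced by moving the last level to the front.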