Skip to content

BUG: pd.crosstab fails when passed multiple columns, margins True and normalize True #35150

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 11 commits into from
Aug 19, 2020
Merged
1 change: 1 addition & 0 deletions doc/source/whatsnew/v1.2.0.rst
Original file line number Diff line number Diff line change
Expand Up @@ -258,6 +258,7 @@ Reshaping

- Bug in :meth:`DataFrame.pivot_table` with ``aggfunc='count'`` or ``aggfunc='sum'`` returning ``NaN`` for missing categories when pivoted on a ``Categorical``. Now returning ``0`` (:issue:`31422`)
- Bug in :func:`union_indexes` where input index names are not preserved in some cases. Affects :func:`concat` and :class:`DataFrame` constructor (:issue:`13475`)
- Bug in func :meth:`crosstab` when using multiple columns with ``margins=True`` and ``normalize=True`` (:issue:`35144`)
-

Sparse
Expand Down
7 changes: 3 additions & 4 deletions pandas/core/reshape/pivot.py
Original file line number Diff line number Diff line change
Expand Up @@ -670,12 +670,11 @@ def _normalize(table, normalize, margins: bool, margins_name="All"):
# keep index and column of pivoted table
table_index = table.index
table_columns = table.columns
last_ind_or_col = table.iloc[-1, :].name

# check if margin name is in (for MI cases) or equal to last
# check if margin name is not in (for MI cases) and not equal to last
# index/column and save the column and index margin
if (margins_name not in table.iloc[-1, :].name) | (
margins_name != table.iloc[:, -1].name
):
if (margins_name not in last_ind_or_col) & (margins_name != last_ind_or_col):
raise ValueError(f"{margins_name} not in pivoted DataFrame")
column_margin = table.iloc[:-1, -1]
index_margin = table.iloc[-1, :-1]
Expand Down
45 changes: 45 additions & 0 deletions pandas/tests/reshape/test_crosstab.py
Original file line number Diff line number Diff line change
Expand Up @@ -698,3 +698,48 @@ def test_margin_normalize(self):
names=["A", "B"],
)
tm.assert_frame_equal(result, expected)

def test_margin_normalize_multiple_columns(self):
# GH 35144
# use multiple columns with margins and normalization
df = DataFrame(
{
"A": ["foo", "foo", "foo", "foo", "foo", "bar", "bar", "bar", "bar"],
"B": ["one", "one", "one", "two", "two", "one", "one", "two", "two"],
"C": [
"small",
"large",
"large",
"small",
"small",
"large",
"small",
"small",
"large",
],
"D": [1, 2, 2, 3, 3, 4, 5, 6, 7],
"E": [2, 4, 5, 5, 6, 6, 8, 9, 9],
}
)
result = crosstab(
index=df.C,
columns=[df.A, df.B],
margins=True,
margins_name="margin",
normalize=True,
)
expected = DataFrame(
[
[0.111111, 0.111111, 0.222222, 0.000000, 0.444444],
[0.111111, 0.111111, 0.111111, 0.222222, 0.555556],
[0.222222, 0.222222, 0.333333, 0.222222, 1.0],
],
index=["large", "small", "margin"],
)
expected.columns = MultiIndex(
levels=[["bar", "foo", "margin"], ["", "one", "two"]],
codes=[[0, 0, 1, 1, 2], [1, 2, 1, 2, 0]],
names=["A", "B"],
)
expected.index.name = "C"
tm.assert_frame_equal(result, expected)