BUG: pd.crosstab fails when passed multiple columns, margins True and normalize True #35150

Merged 11 commits on Aug 19, 2020
1 change: 1 addition & 0 deletions doc/source/whatsnew/v1.1.0.rst
@@ -1125,6 +1125,7 @@ Reshaping
- Bug in :meth:`Series.where` with an empty Series and empty ``cond`` having non-bool dtype (:issue:`34592`)
- Fixed regression where :meth:`DataFrame.apply` would raise ``ValueError`` for elements with ``S`` dtype (:issue:`34529`)
- Bug in :meth:`DataFrame.append` leading to sorting columns even when ``sort=False`` is specified (:issue:`35092`)
- Bug in func :meth:`crosstab` when using multiple columns with ``margins=True`` and ``normalize=True`` (:issue:`35144`)
Contributor:
Can you move this to 1.2?


Sparse
^^^^^^
4 changes: 2 additions & 2 deletions pandas/core/reshape/pivot.py
@@ -671,9 +671,9 @@ def _normalize(table, normalize, margins: bool, margins_name="All"):
table_index = table.index
table_columns = table.columns

- # check if margin name is in (for MI cases) or equal to last
+ # check if margin name is not in (for MI cases) and not equal to last
  # index/column and save the column and index margin
- if (margins_name not in table.iloc[-1, :].name) | (
+ if (margins_name not in table.iloc[-1, :].name) & (
Contributor:
Can you pull out table_name = table.iloc[-1, :].name and use that in the comparison?

Member Author:
Done.

margins_name != table.iloc[:, -1].name
):
raise ValueError(f"{margins_name} not in pivoted DataFrame")
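
A minimal sketch of why the operator change matters, not the library's actual code path. It assumes a hand-built table with MultiIndex columns standing in for the pivoted result; the data and the variable names `not_in_row`, `not_equal_col`, `raises_with_or` and `raises_with_and` are illustrative only.

```python
import pandas as pd

margins_name = "All"

# Hand-built stand-in for the pivoted table (illustrative data only).
columns = pd.MultiIndex.from_tuples(
    [("bar", "one"), ("bar", "two"), (margins_name, "")], names=["A", "B"]
)
table = pd.DataFrame(
    [[1, 2, 3], [4, 5, 9], [5, 7, 12]],
    index=pd.Index(["large", "small", margins_name], name="C"),
    columns=columns,
)

# Label of the last row: a plain string.
last_row_name = table.iloc[-1, :].name
# Label of the last column: a tuple, because the columns are a MultiIndex.
last_col_name = table.iloc[:, -1].name

# The two halves of the condition shown in the diff.
not_in_row = margins_name not in last_row_name
not_equal_col = margins_name != last_col_name

# Old form (|) versus new form (&) of the check.
raises_with_or = not_in_row | not_equal_col
raises_with_and = not_in_row & not_equal_col

print(last_row_name, last_col_name)
print("old check raises:", raises_with_or)
print("new check raises:", raises_with_and)
```

With this table the row label is the string "All" while the column label is the tuple ("All", ""), so the inequality half is always true for MultiIndex columns. The `|` form therefore raises even though the margin is present, and the `&` form does not.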
25 changes: 25 additions & 0 deletions pandas/tests/reshape/test_crosstab.py
@@ -698,3 +698,28 @@ def test_margin_normalize(self):
names=["A", "B"],
)
tm.assert_frame_equal(result, expected)

# GH 35144
# use multiple columns with margins and normalization
result = crosstab(
index=df.C,
columns=[df.A, df.B],
margins=True,
margins_name="margin",
normalize=True,
)
expected = DataFrame(
[
[0.111111, 0.111111, 0.222222, 0.000000, 0.444444],
[0.111111, 0.111111, 0.111111, 0.222222, 0.555556],
[0.222222, 0.222222, 0.333333, 0.222222, 1.0],
],
index=["large", "small", "margin"],
)
expected.columns = MultiIndex(
levels=[["bar", "foo", "margin"], ["", "one", "two"]],
codes=[[0, 0, 1, 1, 2], [1, 2, 1, 2, 0]],
names=["A", "B"],
)
expected.index.name = "C"
tm.assert_frame_equal(result, expected)
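
For readers who want to try the scenario outside the test suite, here is a small standalone sketch of the same call pattern. The `df` fixture used by the test is not shown in this diff, so the frame below is made-up data, and the sketch assumes a pandas version that includes this fix.

```python
import pandas as pd

# Made-up data; the real test fixture is not part of this diff.
df = pd.DataFrame(
    {
        "A": ["foo", "foo", "bar", "bar"],
        "B": ["one", "two", "one", "two"],
        "C": ["small", "large", "small", "large"],
    }
)

# Multiple columns combined with margins and normalization,
# the combination reported in GH 35144.
result = pd.crosstab(
    index=df.C,
    columns=[df.A, df.B],
    margins=True,
    margins_name="margin",
    normalize=True,
)

print(result)
```

The result should have one row per value of `C` plus a "margin" row, and one column per (`A`, `B`) pair plus a margin column, with the bottom-right cell equal to 1.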