crosstabs doesn't work with margin and normalize together #27500

Closed
min2bro opened this issue Jul 21, 2019 · 2 comments · Fixed by #27663
Labels: Bug, MultiIndex, Reshaping (Concat, Merge/Join, Stack/Unstack, Explode)

min2bro commented Jul 21, 2019

Code Sample, a copy-pastable example if possible

import pandas as pd

df = pd.DataFrame({
    "A": ["foo", "foo", "foo", "foo", "foo",
          "bar", "bar", "bar", "bar"],
    "B": ["one", "one", "one", "two", "two",
          "one", "one", "two", "two"],
    "C": ["small", "large", "large", "small",
          "small", "large", "small", "small",
          "large"],
    "D": [1, 2, 2, 3, 3, 4, 5, 6, 7],
    "E": [2, 4, 5, 5, 6, 6, 8, 9, 9],
})

pd.crosstab([df.A, df.B], df.C, margins=True, margins_name='Sub-Total', normalize=0)

Problem description

pandas.crosstab:
According to the documentation for the normalize parameter, "If margins is True, will also normalize margin values." However, when I pass normalize together with margins=True and set margins_name to a custom string, it throws the following exception:

KeyError: "['Sub-Total'] not found in axis"

where Sub-Total is the string passed as margins_name.

Expected Output

Normalized values for the margin rows and columns.
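As a rough illustration of the expected result, the row-normalized table can be computed by hand: build the counts with margins first (without normalize), then divide each row by its own margin total. This is only a sketch of a possible workaround, not how pandas implements normalize internally; the names counts and row_shares are made up for the example, and it assumes the margin column can be dropped after the division because it would be 1.0 in every row.

```python
import pandas as pd

df = pd.DataFrame({
    "A": ["foo", "foo", "foo", "foo", "foo", "bar", "bar", "bar", "bar"],
    "B": ["one", "one", "one", "two", "two", "one", "one", "two", "two"],
    "C": ["small", "large", "large", "small", "small",
          "large", "small", "small", "large"],
})

# Step 1: plain counts with margins, but without normalize.
counts = pd.crosstab([df.A, df.B], df.C,
                     margins=True, margins_name="Sub-Total")

# Step 2: divide every row (including the "Sub-Total" margin row) by its
# row total, which is stored in the "Sub-Total" margin column.
row_shares = (
    counts.drop(columns="Sub-Total")
          .div(counts["Sub-Total"], axis=0)
)

# Each row of row_shares now sums to 1, margin row included.
print(row_shares)
```

The last row of row_shares holds the normalized margin values, which is what the report expects normalize=0 to produce alongside margins=True.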

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.6.3.final.0
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 142 Stepping 9, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None

pandas: 0.24.2
pytest: 3.2.1
pip: 18.0
setuptools: 39.1.0
Cython: 0.26.1
numpy: 1.15.4
scipy: 1.0.0
pyarrow: 0.8.0
xarray: 0.10.7
IPython: 6.1.0
sphinx: 1.6.3
patsy: 0.4.1
dateutil: 2.6.1
pytz: 2017.2
blosc: None
bottleneck: 1.2.1
tables: 3.4.2
numexpr: 2.6.2
feather: None
matplotlib: 2.1.0
openpyxl: 2.4.8
xlrd: 1.1.0
xlwt: 1.3.0
xlsxwriter: 1.0.2
lxml.etree: 4.1.0
bs4: 4.6.0
html5lib: 0.9999999
sqlalchemy: 1.1.13
pymysql: None
psycopg2: 2.7.5 (dt dec pq3 ext lo64)
jinja2: 2.9.6
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None
gcsfs: None
None

Member

WillAyd commented Jul 21, 2019

I think the issue here is the combination of normalizing across the rows with a MultiIndex. Note that the following works:

pd.crosstab([df.A],df.C, margins=True, normalize=0)

But the same call with the MultiIndex you are creating doesn't:

pd.crosstab([df.A, df.B],df.C, margins=True, normalize=0)

The source for that is here. If you'd like to investigate and submit a PR to patch it, that would certainly be welcome:

if normalize is not False:
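To check which of the two calls above fails on a given installation, a small helper along these lines may be useful. It is only a diagnostic sketch: try_crosstab is a hypothetical name, and the outcome depends on the pandas version (the report is against 0.24.2, and the issue was later closed by #27663), so no particular output is assumed here.

```python
import pandas as pd

df = pd.DataFrame({
    "A": ["foo", "foo", "foo", "foo", "foo", "bar", "bar", "bar", "bar"],
    "B": ["one", "one", "one", "two", "two", "one", "one", "two", "two"],
    "C": ["small", "large", "large", "small", "small",
          "large", "small", "small", "large"],
})


def try_crosstab(index):
    """Return "ok" if the call succeeds, else the exception class name."""
    try:
        pd.crosstab(index, df.C, margins=True, normalize=0)
        return "ok"
    except Exception as exc:  # broad on purpose: this is only a diagnostic
        return type(exc).__name__


results = {
    "single index": try_crosstab([df.A]),
    "MultiIndex": try_crosstab([df.A, df.B]),
}
print(pd.__version__, results)
```

If the "MultiIndex" entry reports an exception name while "single index" reports "ok", the installation is affected by the behavior described in this issue.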

@WillAyd WillAyd added Bug MultiIndex Reshaping Concat, Merge/Join, Stack/Unstack, Explode labels Jul 21, 2019
@WillAyd WillAyd added this to the Contributions Welcome milestone Jul 21, 2019
Member

charlesdong1991 commented Jul 30, 2019

I think I found the bug; I will submit a PR tonight.
