BUG: crosstab cannot normalize multiple columns for the index #15150

hurcy · 2017-01-18T08:04:03Z

Code Sample, a copy-pastable example if possible

import numpy as np
import pandas as pd

df = pd.DataFrame({'A': ['one', 'one', 'two', 'three'] * 6, 'B': ['A', 'B', 'C'] * 8, 'C': ['foo', 'foo', 'foo', 'bar', 'bar', 'bar'] * 4, 'D': np.random.randn(24), 'E': np.random.randn(24)})

pd.crosstab([df.A, df.B], df.C, values=df.D, aggfunc=np.sum, normalize=True, margins=True)

Problem description

This raises "ValueError: labels ['All'] not contained in axis".

Currently, crosstab can only normalize when a single column is passed for the index.
I expected crosstab to normalize when multiple columns are passed for the index, but I got the ValueError above.

Expected Output

A normalized result, including the margins.

Output of `pd.show_versions()`


INSTALLED VERSIONS

commit: None
python: 2.7.12.final.0
python-bits: 64
OS: Linux
OS-release: 4.4.0-47-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: None.None

pandas: 0.19.1
nose: None
pip: 9.0.1
setuptools: 28.8.0
Cython: None
numpy: 1.11.2
scipy: 0.18.1
statsmodels: None
xarray: None
IPython: 5.1.0
sphinx: None
patsy: None
dateutil: 2.6.0
pytz: 2016.7
blosc: None
bottleneck: None
tables: None
numexpr: None
matplotlib: 1.5.3
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: 1.1.4
pymysql: None
psycopg2: None
jinja2: 2.8
boto: None
pandas_datareader: None
None


jorisvandenbossche · 2017-01-18T09:11:41Z

@hurcy Thanks for the report!

I think the reason is the following: specifying two columns for the index creates a MultiIndex (the same problem will arise for the columns). The normalizing code is not robust to that: it tries to access the 'All' column/row, which isn't present, because in a MultiIndex the margin is the ('All', '') column/row.

Patches to fix this are welcome!
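To illustrate the label mismatch described above, here is a minimal sketch that builds the un-normalized table and inspects the margin labels. It uses plain counts instead of the random `D` values so the result is deterministic; the variable names are only for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "A": ["one", "one", "two", "three"] * 6,
    "B": ["A", "B", "C"] * 8,
    "C": ["foo", "foo", "foo", "bar", "bar", "bar"] * 4,
})

# Plain counts with margins and no normalization.
table = pd.crosstab([df.A, df.B], df.C, margins=True)

# With two index columns the row margin is keyed by a tuple,
# while the single-level columns keep the plain string label.
row_margin_label = table.index[-1]
col_margin_label = table.columns[-1]

print(table.index.nlevels)  # 2
print(row_margin_label)     # ('All', '')
print(col_margin_label)     # All
```

Code that looks up the margin row by the bare string label therefore needs to account for the tuple form when the index has more than one level.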

cmohl2013 · 2017-05-28T16:19:03Z

Just ran into the same bug.

Possible solution: always call pivot_table in crosstab with margins=False, then
do the normalization, and finally call _add_margins if margins=True.

I could try to fix it that way.
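A rough user-level sketch of that ordering (tabulate without margins, normalize, then attach the margins) could look like the following. It is not the pandas internals and does not call `_add_margins`; the helper name `normalized_crosstab_with_margins` is hypothetical, and it only covers normalizing over the whole table with a two-level index.

```python
import pandas as pd


def normalized_crosstab_with_margins(index, columns, margins_name="All"):
    """Hypothetical helper: normalize first, add margins afterwards."""
    # Step 1: plain counts, no margins.
    counts = pd.crosstab(index, columns)

    # Step 2: normalize over the grand total.
    normalized = counts / counts.to_numpy().sum()

    # Step 3a: column margin holding the row sums.
    normalized[margins_name] = normalized.sum(axis=1)

    # Step 3b: row margin holding the column sums, keyed by a tuple
    # padded with empty strings to match the MultiIndex depth.
    key = (margins_name,) + ("",) * (normalized.index.nlevels - 1)
    margin_row = pd.DataFrame(
        [normalized.sum(axis=0).to_numpy()],
        index=pd.MultiIndex.from_tuples([key], names=normalized.index.names),
        columns=normalized.columns,
    )
    return pd.concat([normalized, margin_row])


df = pd.DataFrame({
    "A": ["one", "one", "two", "three"] * 6,
    "B": ["A", "B", "C"] * 8,
    "C": ["foo", "foo", "foo", "bar", "bar", "bar"] * 4,
})

result = normalized_crosstab_with_margins([df.A, df.B], df.C)
print(result)
```

Because the margins are computed from the already normalized body, the bottom-right cell sums to 1 and no lookup of an existing 'All' label is needed.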


mroeschke · 2019-10-14T01:03:26Z

This looks fixed in master. Could use a test.

In [210]: pd.crosstab([df.A, df.B], df.C, values=df.D, aggfunc=np.sum, normalize=True, margins=True)
     ...:
Out[210]:
C             bar       foo       All
A     B
one   A  0.344761  0.235317  0.580078
      B  0.085762  0.217868  0.303630
      C  0.218771 -0.180590  0.038180
three A -0.151981  0.000000 -0.151981
      B  0.000000 -0.292231 -0.292231
      C  0.131227  0.000000  0.131227
two   A  0.000000  0.456823  0.456823
      B  0.026522  0.000000  0.026522
      C  0.000000 -0.092249 -0.092249
All      0.655062  0.344938  1.000000

In [211]: pd.__version__
Out[211]: '0.26.0.dev0+555.gf7d162b18'
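A regression test along these lines might be a starting point. This is only a sketch: the test name is made up, it is not necessarily what was eventually added to the test suite, and it replaces the random `D` values with a fixed range so the checks are reproducible. It assumes a pandas version where the call above no longer raises.

```python
import numpy as np
import pandas as pd


def test_crosstab_normalize_multiindex_margins():
    # Hypothetical test name; same shape of data as the original report.
    df = pd.DataFrame({
        "A": ["one", "one", "two", "three"] * 6,
        "B": ["A", "B", "C"] * 8,
        "C": ["foo", "foo", "foo", "bar", "bar", "bar"] * 4,
        "D": np.arange(1.0, 25.0),  # deterministic stand-in for randn
    })

    # This call raised ValueError in 0.19.1.
    result = pd.crosstab(
        [df.A, df.B], df.C,
        values=df.D, aggfunc="sum",
        normalize=True, margins=True,
    )

    # Margin labels: tuple for the two-level index, string for columns.
    assert result.index[-1] == ("All", "")
    assert result.columns[-1] == "All"

    # The body sums to 1 and so does the grand-total cell.
    assert np.isclose(result.iloc[:-1, :-1].to_numpy().sum(), 1.0)
    assert np.isclose(result.iloc[-1, -1], 1.0)
    return result


result = test_crosstab_normalize_multiindex_margins()
```

The string `"sum"` is passed as `aggfunc` here instead of `np.sum`; both request the same aggregation.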

jorisvandenbossche added the Bug and Reshaping (Concat, Merge/Join, Stack/Unstack, Explode) labels Jan 18, 2017

jorisvandenbossche added this to the Next Major Release milestone Jan 18, 2017

cmohl2013 mentioned this issue Jun 5, 2017

BUG #15150 normalization of crosstable with multiindex and margins #16599

Closed

4 tasks

cmohl2013 pushed a commit to cmohl2013/pandas that referenced this issue Jun 14, 2017

pandas-dev#15150 added conditional calculation of crosstable margins …

bbb979c

…based on normalization type, corrected expected margin values in test_margin_dropna

toobaz added a commit to toobaz/pandas that referenced this issue Jul 19, 2017

Hack to fix pandas-dev#15150

0d4f3e3

toobaz added a commit to toobaz/pandas that referenced this issue Jul 19, 2017

Hack to fix pandas-dev#15150

3c58106

toobaz added a commit to toobaz/pandas that referenced this issue Jul 19, 2017

Hack to fix pandas-dev#15150

a6af3ac

cmohl2013 pushed a commit to cmohl2013/pandas that referenced this issue Jul 25, 2017

pandas-dev#15150 added conditional calculation of crosstable margins …

66ef8df

…based on normalization type, corrected expected margin values in test_margin_dropna

mroeschke added the good first issue and Needs Tests (Unit test(s) needed to prevent regressions) labels and removed the Bug and Reshaping (Concat, Merge/Join, Stack/Unstack, Explode) labels Oct 14, 2019

mroeschke mentioned this issue Jan 21, 2020

TST: Add more regression tests for fixed issues #31171

Merged

10 tasks

jreback modified the milestones: Contributions Welcome, 1.1 Jan 21, 2020

mroeschke closed this as completed in #31171 Jan 21, 2020
