Pivot Table Margins aren't correctly showing NaN values despite dummy variables #30351

pshMsGsAZL2n07ohWBCAGX9cC · 2019-12-19T17:26:29Z

Code Sample, a copy-pastable example if possible

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

fig = plt.figure()

data = [['Auto', 'A', 10], ['Auto', 'A', 2], ['Telco', 'B', 15], ['Manf', 'C'],
        ['Retail', 'A', ], ['Retail', 'B', ]] 
df = pd.DataFrame(data, columns = ['INDUSTRY', 'PRODUCT', 'COST'])

table_byindustry = pd.pivot_table(df, index=['INDUSTRY'], columns=['PRODUCT'], values='COST',
                                  fill_value=0, margins=True, margins_name='Total', aggfunc='count')
ax = sns.heatmap(table_byindustry, cmap='RdYlGn', linewidths=.1, annot=True, vmin=0, vmax=100, fmt="g", xticklabels=True)
ax.set_ylabel("Industry", fontsize=12)
ax.set_xlabel("Product", fontsize=12, labelpad=1)
ax.xaxis.tick_bottom()
ax.set_ylim(bottom=5,top=0)

Problem description

Pivot Table Margins ("Total") isn't showing 0 values for rows, yet will do so for columns (e.g., Product C). As a result, the heatmap is empty for the "Total" column related to the rows ('Manf' and 'Retail') in my example. It is treating those rows as NaN for Margin, and setting fill_value at 0 doesn't fix it. See attached image. Thank you for your efforts to fix this issue.

Expected Output

Based on the color map, there should be a color for every cell in the pivot table, including the Margins. In addition, if the value is 0, it should be shown when the margins=True. It seems as if the treatment is inconsistent (i.e., view the total for Product C vs. the totals for 'Manf' and 'Retail' in Industry).

Output of `pd.show_versions()`

INSTALLED VERSIONS

commit : None
python : 3.7.5.final.0
python-bits : 64
OS : Linux
OS-release : 4.19.88-1-MANJARO
machine : x86_64
processor :
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8

pandas : 0.25.3
numpy : 1.17.4
pytz : 2019.3
dateutil : 2.8.1
pip : 19.3.1
setuptools : 42.0.2.post20191203
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : 0.4.0
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 2.10.3
IPython : 7.10.2
pandas_datareader: None
bs4 : 4.8.1
bottleneck : None
fastparquet : None
gcsfs : None
lxml.etree : None
matplotlib : 3.1.1
numexpr : None
odfpy : None
openpyxl : 3.0.2
pandas_gbq : None
pyarrow : 0.13.0
pytables : None
s3fs : None
scipy : 1.3.2
seaborn : 0.9.0
sqlalchemy : None
tables : None
xarray : None
xlrd : 1.2.0
xlwt : 1.3.0
xlsxwriter : None

![bug_image_19Dec2019](https://user-images.githubusercontent.com/56904329/71194005-0816e600-2283-11ea-90b8-7fc69a79e28d.png)

The text was updated successfully, but these errors were encountered:

TomAugspurger · 2019-12-20T16:54:10Z

I'm not sure how the plotting stuff is related, but based on the title this sounds like #3729. Let me know if not.

pshMsGsAZL2n07ohWBCAGX9cC · 2019-12-24T02:00:07Z

No, this issue is different from #3729. This issue isn’t dropping entire NaN rows like #3729, and my issue already uses the dummy variables suggested in that post as a workaround. My issue is dropping a NaN total in the margin row (but not column), while the dummy variables appear fine. I would suggest that you re-open it. Thanks again, and happy holidays.

pshMsGsAZL2n07ohWBCAGX9cC · 2019-12-24T04:22:40Z

Forgot to add in first clarification. This issue occurs without doing a heatmap plot. That is, the row margin total shows NaN, despite zeros for the row elements. Column margin total doesn’t have this issue.

pshMsGsAZL2n07ohWBCAGX9cC · 2019-12-29T20:42:59Z

I'm not sure how the plotting stuff is related, but based on the title this sounds like #3729. Let me know if not.

Tom, I added a clarification about a week ago, saying that I didn't think it was a duplicate. Can you please take a look? Apologies in advance, as I'm relatively new to github and don't understand exactly how issues are "re-opened". Thanks.

TomAugspurger · 2019-12-30T12:00:20Z

@pshMsGsAZL2n07ohWBCAGX9cC can you make a minimal example? http://matthewrocklin.com/blog/work/2018/02/28/minimal-bug-reports Is seaborn / matplotlib necessary, or does just

In [2]: table_byindustry
Out[2]:
PRODUCT   A  B  C  Total
INDUSTRY
Auto      2  0  0    2.0
Manf      0  0  0    NaN
Retail    0  0  0    NaN
Telco     0  1  0    1.0
Total     2  1  0    3.0

display the issue? Is it that the total for Manf and Retail are NaN, rather than 0?

pshMsGsAZL2n07ohWBCAGX9cC · 2019-12-31T02:54:48Z

@TomAugspurger Seaborn and matplotlib aren't necessary, although that's how I first saw the problem. The dataframe (table_byindustry) shows the issue, which you correctly summarized. Total for the Manf and Retail rows are NaN, rather than 0. Thanks for re-opening it and sharing how I can better report future bugs with minimal information.

anthropikos · 2024-04-07T00:59:37Z

Seems like this behavior is due to margin calculation using the groupby() method and na are default dropped by groupby().

My workaround is to just use isna() on the pivot_table output.

pshMsGsAZL2n07ohWBCAGX9cC · 2024-04-09T01:56:15Z

@aoot Thanks for the workaround. At the time, I wanted a table that would properly show rows with zero values (e.g., Manf and Retail) rather than NaNs without the need to handle edge cases (e.g., use isna() to identify such rows then replace them with zeros). Looking back, I probably should have address the issue in data prep/cleansing before creating the table. However, my expectation was that pivot_table() and groupby() would be more Excel-like in their behavior with respect to this issue ;) Thanks again for sharing.

TomAugspurger closed this as completed Dec 20, 2019

TomAugspurger added the Duplicate Report Duplicate issue or pull request label Dec 20, 2019

TomAugspurger added this to the No action milestone Dec 20, 2019

pshMsGsAZL2n07ohWBCAGX9cC changed the title ~~Pivot Table Margins aren't correctly showing NaN values~~ Pivot Table Margins aren't correctly showing NaN values despite dummy variables Dec 24, 2019

pshMsGsAZL2n07ohWBCAGX9cC mentioned this issue Dec 29, 2019

Re-Opening: Pivot Table Margins aren't correctly showing NaN values despite dummy variables #30549

Closed

TomAugspurger reopened this Dec 30, 2019

mroeschke added Bug Reshaping Concat, Merge/Join, Stack/Unstack, Explode and removed Duplicate Report Duplicate issue or pull request labels Apr 3, 2020

mroeschke removed this from the No action milestone Oct 13, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Pivot Table Margins aren't correctly showing NaN values despite dummy variables #30351

Pivot Table Margins aren't correctly showing NaN values despite dummy variables #30351

pshMsGsAZL2n07ohWBCAGX9cC commented Dec 19, 2019 •

edited by TomAugspurger

Loading

INSTALLED VERSIONS

TomAugspurger commented Dec 20, 2019

pshMsGsAZL2n07ohWBCAGX9cC commented Dec 24, 2019

pshMsGsAZL2n07ohWBCAGX9cC commented Dec 24, 2019

pshMsGsAZL2n07ohWBCAGX9cC commented Dec 29, 2019

TomAugspurger commented Dec 30, 2019

pshMsGsAZL2n07ohWBCAGX9cC commented Dec 31, 2019 •

edited

Loading

anthropikos commented Apr 7, 2024

pshMsGsAZL2n07ohWBCAGX9cC commented Apr 9, 2024

Pivot Table Margins aren't correctly showing NaN values despite dummy variables #30351

Pivot Table Margins aren't correctly showing NaN values despite dummy variables #30351

Comments

pshMsGsAZL2n07ohWBCAGX9cC commented Dec 19, 2019 • edited by TomAugspurger Loading

Code Sample, a copy-pastable example if possible

Problem description

Expected Output

Output of pd.show_versions()

INSTALLED VERSIONS

TomAugspurger commented Dec 20, 2019

pshMsGsAZL2n07ohWBCAGX9cC commented Dec 24, 2019

pshMsGsAZL2n07ohWBCAGX9cC commented Dec 24, 2019

pshMsGsAZL2n07ohWBCAGX9cC commented Dec 29, 2019

TomAugspurger commented Dec 30, 2019

pshMsGsAZL2n07ohWBCAGX9cC commented Dec 31, 2019 • edited Loading

anthropikos commented Apr 7, 2024

pshMsGsAZL2n07ohWBCAGX9cC commented Apr 9, 2024

pshMsGsAZL2n07ohWBCAGX9cC commented Dec 19, 2019 •

edited by TomAugspurger

Loading

Output of `pd.show_versions()`

pshMsGsAZL2n07ohWBCAGX9cC commented Dec 31, 2019 •

edited

Loading