Skip to content

Pivot Table Margins aren't correctly showing NaN values despite dummy variables #30351

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
pshMsGsAZL2n07ohWBCAGX9cC opened this issue Dec 19, 2019 · 8 comments
Labels
Bug Reshaping Concat, Merge/Join, Stack/Unstack, Explode

Comments

@pshMsGsAZL2n07ohWBCAGX9cC
Copy link

pshMsGsAZL2n07ohWBCAGX9cC commented Dec 19, 2019

Code Sample, a copy-pastable example if possible

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

fig = plt.figure()

data = [['Auto', 'A', 10], ['Auto', 'A', 2], ['Telco', 'B', 15], ['Manf', 'C'],
        ['Retail', 'A', ], ['Retail', 'B', ]] 
df = pd.DataFrame(data, columns = ['INDUSTRY', 'PRODUCT', 'COST'])

table_byindustry = pd.pivot_table(df, index=['INDUSTRY'], columns=['PRODUCT'], values='COST',
                                  fill_value=0, margins=True, margins_name='Total', aggfunc='count')
ax = sns.heatmap(table_byindustry, cmap='RdYlGn', linewidths=.1, annot=True, vmin=0, vmax=100, fmt="g", xticklabels=True)
ax.set_ylabel("Industry", fontsize=12)
ax.set_xlabel("Product", fontsize=12, labelpad=1)
ax.xaxis.tick_bottom()
ax.set_ylim(bottom=5,top=0)

Problem description

Pivot Table Margins ("Total") isn't showing 0 values for rows, yet will do so for columns (e.g., Product C). As a result, the heatmap is empty for the "Total" column related to the rows ('Manf' and 'Retail') in my example. It is treating those rows as NaN for Margin, and setting fill_value at 0 doesn't fix it. See attached image. Thank you for your efforts to fix this issue.

Expected Output

Based on the color map, there should be a color for every cell in the pivot table, including the Margins. In addition, if the value is 0, it should be shown when the margins=True. It seems as if the treatment is inconsistent (i.e., view the total for Product C vs. the totals for 'Manf' and 'Retail' in Industry).

Output of pd.show_versions()

INSTALLED VERSIONS

commit : None
python : 3.7.5.final.0
python-bits : 64
OS : Linux
OS-release : 4.19.88-1-MANJARO
machine : x86_64
processor :
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8

pandas : 0.25.3
numpy : 1.17.4
pytz : 2019.3
dateutil : 2.8.1
pip : 19.3.1
setuptools : 42.0.2.post20191203
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : 0.4.0
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 2.10.3
IPython : 7.10.2
pandas_datareader: None
bs4 : 4.8.1
bottleneck : None
fastparquet : None
gcsfs : None
lxml.etree : None
matplotlib : 3.1.1
numexpr : None
odfpy : None
openpyxl : 3.0.2
pandas_gbq : None
pyarrow : 0.13.0
pytables : None
s3fs : None
scipy : 1.3.2
seaborn : 0.9.0
sqlalchemy : None
tables : None
xarray : None
xlrd : 1.2.0
xlwt : 1.3.0
xlsxwriter : None

![bug_image_19Dec2019](https://user-images.githubusercontent.com/56904329/71194005-0816e600-2283-11ea-90b8-7fc69a79e28d.png)
@TomAugspurger
Copy link
Contributor

I'm not sure how the plotting stuff is related, but based on the title this sounds like #3729. Let me know if not.

@TomAugspurger TomAugspurger added the Duplicate Report Duplicate issue or pull request label Dec 20, 2019
@TomAugspurger TomAugspurger added this to the No action milestone Dec 20, 2019
@pshMsGsAZL2n07ohWBCAGX9cC
Copy link
Author

No, this issue is different from #3729. This issue isn’t dropping entire NaN rows like #3729, and my issue already uses the dummy variables suggested in that post as a workaround. My issue is dropping a NaN total in the margin row (but not column), while the dummy variables appear fine. I would suggest that you re-open it. Thanks again, and happy holidays.

@pshMsGsAZL2n07ohWBCAGX9cC pshMsGsAZL2n07ohWBCAGX9cC changed the title Pivot Table Margins aren't correctly showing NaN values Pivot Table Margins aren't correctly showing NaN values despite dummy variables Dec 24, 2019
@pshMsGsAZL2n07ohWBCAGX9cC
Copy link
Author

Forgot to add in first clarification. This issue occurs without doing a heatmap plot. That is, the row margin total shows NaN, despite zeros for the row elements. Column margin total doesn’t have this issue.

@pshMsGsAZL2n07ohWBCAGX9cC
Copy link
Author

I'm not sure how the plotting stuff is related, but based on the title this sounds like #3729. Let me know if not.

Tom, I added a clarification about a week ago, saying that I didn't think it was a duplicate. Can you please take a look? Apologies in advance, as I'm relatively new to github and don't understand exactly how issues are "re-opened". Thanks.

@TomAugspurger
Copy link
Contributor

@pshMsGsAZL2n07ohWBCAGX9cC can you make a minimal example? http://matthewrocklin.com/blog/work/2018/02/28/minimal-bug-reports Is seaborn / matplotlib necessary, or does just

In [2]: table_byindustry
Out[2]:
PRODUCT   A  B  C  Total
INDUSTRY
Auto      2  0  0    2.0
Manf      0  0  0    NaN
Retail    0  0  0    NaN
Telco     0  1  0    1.0
Total     2  1  0    3.0

display the issue? Is it that the total for Manf and Retail are NaN, rather than 0?

@TomAugspurger TomAugspurger reopened this Dec 30, 2019
@pshMsGsAZL2n07ohWBCAGX9cC
Copy link
Author

pshMsGsAZL2n07ohWBCAGX9cC commented Dec 31, 2019

@TomAugspurger Seaborn and matplotlib aren't necessary, although that's how I first saw the problem. The dataframe (table_byindustry) shows the issue, which you correctly summarized. Total for the Manf and Retail rows are NaN, rather than 0. Thanks for re-opening it and sharing how I can better report future bugs with minimal information.

@mroeschke mroeschke added Bug Reshaping Concat, Merge/Join, Stack/Unstack, Explode and removed Duplicate Report Duplicate issue or pull request labels Apr 3, 2020
@mroeschke mroeschke removed this from the No action milestone Oct 13, 2022
@anthropikos
Copy link

Seems like this behavior is due to margin calculation using the groupby() method and na are default dropped by groupby().

My workaround is to just use isna() on the pivot_table output.

@pshMsGsAZL2n07ohWBCAGX9cC
Copy link
Author

@aoot Thanks for the workaround. At the time, I wanted a table that would properly show rows with zero values (e.g., Manf and Retail) rather than NaNs without the need to handle edge cases (e.g., use isna() to identify such rows then replace them with zeros). Looking back, I probably should have address the issue in data prep/cleansing before creating the table. However, my expectation was that pivot_table() and groupby() would be more Excel-like in their behavior with respect to this issue ;) Thanks again for sharing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Reshaping Concat, Merge/Join, Stack/Unstack, Explode
Projects
None yet
Development

No branches or pull requests

4 participants