-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
Pivot Table Margins aren't correctly showing NaN values despite dummy variables #30351
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
I'm not sure how the plotting stuff is related, but based on the title this sounds like #3729. Let me know if not. |
No, this issue is different from #3729. This issue isn’t dropping entire NaN rows like #3729, and my issue already uses the dummy variables suggested in that post as a workaround. My issue is dropping a NaN total in the margin row (but not column), while the dummy variables appear fine. I would suggest that you re-open it. Thanks again, and happy holidays. |
Forgot to add in first clarification. This issue occurs without doing a heatmap plot. That is, the row margin total shows NaN, despite zeros for the row elements. Column margin total doesn’t have this issue. |
Tom, I added a clarification about a week ago, saying that I didn't think it was a duplicate. Can you please take a look? Apologies in advance, as I'm relatively new to github and don't understand exactly how issues are "re-opened". Thanks. |
@pshMsGsAZL2n07ohWBCAGX9cC can you make a minimal example? http://matthewrocklin.com/blog/work/2018/02/28/minimal-bug-reports Is seaborn / matplotlib necessary, or does just In [2]: table_byindustry
Out[2]:
PRODUCT A B C Total
INDUSTRY
Auto 2 0 0 2.0
Manf 0 0 0 NaN
Retail 0 0 0 NaN
Telco 0 1 0 1.0
Total 2 1 0 3.0 display the issue? Is it that the total for Manf and Retail are NaN, rather than 0? |
@TomAugspurger Seaborn and matplotlib aren't necessary, although that's how I first saw the problem. The dataframe (table_byindustry) shows the issue, which you correctly summarized. Total for the Manf and Retail rows are NaN, rather than 0. Thanks for re-opening it and sharing how I can better report future bugs with minimal information. |
Seems like this behavior is due to margin calculation using the My workaround is to just use |
@aoot Thanks for the workaround. At the time, I wanted a table that would properly show rows with zero values (e.g., Manf and Retail) rather than NaNs without the need to handle edge cases (e.g., use isna() to identify such rows then replace them with zeros). Looking back, I probably should have address the issue in data prep/cleansing before creating the table. However, my expectation was that pivot_table() and groupby() would be more Excel-like in their behavior with respect to this issue ;) Thanks again for sharing. |
Code Sample, a copy-pastable example if possible
Problem description
Pivot Table Margins ("Total") isn't showing 0 values for rows, yet will do so for columns (e.g., Product C). As a result, the heatmap is empty for the "Total" column related to the rows ('Manf' and 'Retail') in my example. It is treating those rows as NaN for Margin, and setting fill_value at 0 doesn't fix it. See attached image. Thank you for your efforts to fix this issue.
Expected Output
Based on the color map, there should be a color for every cell in the pivot table, including the Margins. In addition, if the value is 0, it should be shown when the margins=True. It seems as if the treatment is inconsistent (i.e., view the total for Product C vs. the totals for 'Manf' and 'Retail' in Industry).
Output of
pd.show_versions()
INSTALLED VERSIONS
commit : None
python : 3.7.5.final.0
python-bits : 64
OS : Linux
OS-release : 4.19.88-1-MANJARO
machine : x86_64
processor :
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8
pandas : 0.25.3
numpy : 1.17.4
pytz : 2019.3
dateutil : 2.8.1
pip : 19.3.1
setuptools : 42.0.2.post20191203
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : 0.4.0
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 2.10.3
IPython : 7.10.2
pandas_datareader: None
bs4 : 4.8.1
bottleneck : None
fastparquet : None
gcsfs : None
lxml.etree : None
matplotlib : 3.1.1
numexpr : None
odfpy : None
openpyxl : 3.0.2
pandas_gbq : None
pyarrow : 0.13.0
pytables : None
s3fs : None
scipy : 1.3.2
seaborn : 0.9.0
sqlalchemy : None
tables : None
xarray : None
xlrd : 1.2.0
xlwt : 1.3.0
xlsxwriter : None
The text was updated successfully, but these errors were encountered: