Skip to content

BUG: Fix pivot_table margins to include NaN groups when dropna=False #61524

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

iabhi4
Copy link
Contributor

@iabhi4 iabhi4 commented May 31, 2025

Fix incorrect margin computation in pivot_table when index or columns contain NA values

This PR fixes an issue where the "All" row or column (i.e., margins=True) in pd.pivot_table does not account for rows that contain NA values in the index or column dimensions. These rows were incorrectly excluded from the overall aggregation used to compute the margin, leading to incorrect totals.

The fix modifies the margin calculation to ensure that rows with NA values are included in the aggregation, consistent with how the data is treated in the main table when dropna=False.

@iabhi4
Copy link
Contributor Author

iabhi4 commented May 31, 2025

This change fixes the margin behavior in pivot_table when dropna=False, but as expected, it also affects related logic, a few tests are now failing, especially in test_crosstab, where the old margin behavior is still being asserted.

Before I go ahead and update those tests to match the new behavior, I just wanted to double-check if this is the direction we want to take, treating dropna=False more consistently in how margins are computed. If this looks good, I’ll go ahead and fix up the failing tests accordingly

@rhshadrach rhshadrach added Reshaping Concat, Merge/Join, Stack/Unstack, Explode Bug Missing-data np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate labels May 31, 2025
@rhshadrach
Copy link
Member

@iabhi4 - thanks for putting this up. The crosstab failures here look like bugs to me as well. E.g. in the penultimate column

b    one        two        NaN       All
c   dull shiny dull shiny dull shiny    
a                                       
bar    1     0    1     0    0     0   2
foo    2     0    1     1    0     1   5
All    3     0    2     1    0     0   7

All should be 1, as this PR is now producing.

@iabhi4
Copy link
Contributor Author

iabhi4 commented May 31, 2025

@rhshadrach Thanks for confirming! I’ve updated the test cases to reflect the corrected behavior

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Missing-data np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate Reshaping Concat, Merge/Join, Stack/Unstack, Explode
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BUG: margin for pivot_table is incorrect with NA column/index values
2 participants