-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
BUG:Pivot table drops column/index names=nan when dropna=false #16142
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@@ -548,10 +548,6 @@ def _validate_categories(cls, categories, fastpath=False): | |||
|
|||
if not fastpath: | |||
|
|||
# Categories cannot contain NaN. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you have some unintentional changes in here? This shouldn't be removed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel like the Index, Index([None, u'A', u'B'], dtype='object')
, needs to be passed to Categorical
when doing MultiIndex
, as when dropna=False
, None
could also be the index/column name. Or I didn't get this correctly?
@@ -159,15 +159,15 @@ def pivot_table(data, values=None, index=None, columns=None, aggfunc='mean', | |||
if isinstance(table, DataFrame): | |||
table = table.sort_index(axis=1) | |||
|
|||
if fill_value is not None: | |||
table = table.fillna(value=fill_value, downcast='infer') | |||
|
|||
if margins: | |||
if dropna: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I remove this if dropna
, most of the tests pass (including a fix for this one) other than
df = pd.DataFrame({'a': [1, 2, 2, 2, 2, np.nan],
'b': [3, 3, 4, 4, 4, 4]})
actual = pd.crosstab(df.a, df.b, margins=True, dropna=False)
expected = pd.DataFrame([[1, 0, 1], [1, 3, 4], [2, 4, 6]])
expected.index = Index([1.0, 2.0, 'All'], name='a')
expected.columns = Index([3, 4, 'All'], name='b')
Here's the result and expected
(Pdb) pp actual
b 3 4 All
a
1.0 1 0 1
2.0 1 3 4
All 2 3 5
(Pdb) pp expected
b 3 4 All
a
1.0 1 0 1
2.0 1 3 4
All 2 4 6
You have more experience with this section of the code than I do, but the margins on the expected
look incorrect to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're definitely right. I think it should be (if dropna=False
):
b | 3 | 4 | All |
---|---|---|---|
a | |||
1.0 | 1 | 0 | 1 |
2.0 | 1 | 3 | 4 |
np.nan | 0 | 1 | 1 |
All | 2 | 4 | 6 |
- I need to fix the np.nan as it is still being ignored even with the current fix when
dropna=False
(i.e. the fix only works forNone
) - Actually I didn't get why removing
dropna
here would help yet. I'll check closer.
this is doing similar things to changes in #12607 |
Ah I see. That is a much larger change that the original issue I was looking at :) |
I think the change in |
Some tests will be failing and many are actually different/separate problems. I already located several and will update soon. |
can you rebase and update? |
needs a rebase. if you'd like to continue, pls comment. |
git diff upstream/master --name-only -- '*.py' | flake8 --diff
clean-up of Pivot table drops column/index names=nan when dropna=false #14246