BUG:Pivot table drops column/index names=nan when dropna=false #16142

OXPHOS · 2017-04-26T06:36:21Z

- closes #14072 (pivot_table margins bottom-left total does not correspond to other content when dropna=False)
- tests added / passed
- passes git diff upstream/master --name-only -- '*.py' | flake8 --diff
- whatsnew entry
- clean-up of #14246 (Pivot table drops column/index names=nan when dropna=false)

TomAugspurger · 2017-04-26T18:34:26Z

pandas/core/categorical.py

@@ -548,10 +548,6 @@ def _validate_categories(cls, categories, fastpath=False):
        if not fastpath:
-            # Categories cannot contain NaN.

Do you have some unintentional changes in here? This shouldn't be removed.

OXPHOS · 2017-04-27

I think the Index, Index([None, u'A', u'B'], dtype='object'), needs to be passed to Categorical when building the MultiIndex, because when dropna=False, None could also be an index/column name. Or did I misunderstand something?
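For readers following the point about the removed check, here is a minimal sketch (not code from this PR) of how pandas treats missing values in a Categorical. It assumes a pandas release that enforces the validation discussed above: missing entries are allowed among the values, where they receive the sentinel code -1, but a null is rejected as a category. The variable names are illustrative only.

```python
import numpy as np
import pandas as pd

# Missing entries are allowed among the *values*; they get the sentinel code -1
# and do not show up in the categories.
cat = pd.Categorical(["A", None, "B"])
print(cat.categories.tolist())  # ['A', 'B']
print(cat.codes.tolist())       # [0, -1, 1]

# Passing NaN as a *category* is rejected by the validation step.
try:
    pd.Categorical(["A"], categories=["A", np.nan])
    rejected = False
except ValueError:
    rejected = True
print(rejected)  # True
```

This is why a None label cannot simply ride along as a category without touching the validation that the diff removes.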

TomAugspurger · 2017-04-27T12:00:57Z

pandas/core/reshape/pivot.py

@@ -159,15 +159,15 @@ def pivot_table(data, values=None, index=None, columns=None, aggfunc='mean',
    if isinstance(table, DataFrame):
        table = table.sort_index(axis=1)

-    if fill_value is not None:
-        table = table.fillna(value=fill_value, downcast='infer')
-
    if margins:
        if dropna:


If I remove this if dropna, most of the tests pass (including a fix for this one) other than

    df = pd.DataFrame({'a': [1, 2, 2, 2, 2, np.nan], 'b': [3, 3, 4, 4, 4, 4]})
    actual = pd.crosstab(df.a, df.b, margins=True, dropna=False)
    expected = pd.DataFrame([[1, 0, 1], [1, 3, 4], [2, 4, 6]])
    expected.index = Index([1.0, 2.0, 'All'], name='a')
    expected.columns = Index([3, 4, 'All'], name='b')

Here's the result and expected

    (Pdb) pp actual
    b    3  4  All
    a
    1.0  1  0    1
    2.0  1  3    4
    All  2  3    5
    (Pdb) pp expected
    b    3  4  All
    a
    1.0  1  0    1
    2.0  1  3    4
    All  2  4    6

You have more experience with this section of the code than I do, but the margins on the expected look incorrect to me.

OXPHOS · 2017-04-27

You're definitely right. I think it should be (if dropna=False):

    b       3  4  All
    a
    1.0     1  0    1
    2.0     1  3    4
    np.nan  0  1    1
    All     2  4    6

I need to fix the np.nan case, as it is still ignored when dropna=False even with the current fix (i.e. the fix only works for None).

Actually, I don't yet see why removing the dropna check here would help. I'll take a closer look.
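The arithmetic behind the two candidate margins can be checked without pandas. The sketch below is a plain-Python illustration, not the pivot_table implementation; `label` and `margins` are hypothetical helpers introduced only for this example. It counts the same six observations once with the NaN row kept and once with it dropped.

```python
import math
from collections import Counter

a = [1, 2, 2, 2, 2, float("nan")]
b = [3, 3, 4, 4, 4, 4]


def label(x):
    # NaN never compares equal to itself, so map it to a stable label.
    return "NaN" if isinstance(x, float) and math.isnan(x) else x


def margins(rows, cols, dropna):
    """Hypothetical helper: return (row totals, column totals, grand total)."""
    pairs = [(label(r), label(c)) for r, c in zip(rows, cols)]
    if dropna:
        # Drop every observation that has a missing key.
        pairs = [p for p in pairs if "NaN" not in p]
    row_totals = Counter(r for r, _ in pairs)
    col_totals = Counter(c for _, c in pairs)
    return dict(row_totals), dict(col_totals), len(pairs)


kept = margins(a, b, dropna=False)
dropped = margins(a, b, dropna=True)
print(kept)     # ({1: 1, 2: 4, 'NaN': 1}, {3: 2, 4: 4}, 6)
print(dropped)  # ({1: 1, 2: 4}, {3: 2, 4: 3}, 5)
```

Column totals of 2, 4, 6 are only consistent with a body that includes the NaN row, while 2, 3, 5 are the totals of the body once that row is dropped. The old expected frame mixes the two: margins from the first case, body from the second.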

jreback · 2017-04-27T12:33:01Z

this is doing similar things to changes in #12607

TomAugspurger · 2017-04-27T12:35:21Z

> this is doing similar things to changes in #12607

Ah, I see. That is a much larger change than the original issue I was looking at :)

OXPHOS · 2017-04-27T17:16:27Z

I think the change in Cython is definitely required. The problem is how to pass dropna to it without disturbing too many existing structures.
I just reset my development environment and am trying to use Anaconda Python 2.7. Interestingly, numerous tests failed for me even on the master branch, so I only ran the pivot and groupby tests locally. I'll do more research over the weekend.
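For context on why the grouping layer matters here, this is a small sketch of the underlying groupby behavior, using the same frame as the crosstab test above. It is not part of this PR. It assumes pandas 1.1 or later, where `groupby` accepts a `dropna` keyword; that keyword did not exist when this thread was written.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 2, 2, 2, np.nan], "b": [3, 3, 4, 4, 4, 4]})

# Default behavior: the row whose key is NaN is silently left out.
default_sizes = df.groupby("a").size()
print(default_sizes.sum())  # 5

# Assumption: pandas >= 1.1, where groupby accepts dropna=False.
with_nan = df.groupby("a", dropna=False).size()
print(with_nan.sum())  # 6
```

Any fix at the pivot_table level has to get the NaN key through this grouping step, which is the difficulty described above.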

OXPHOS · 2017-05-01T06:49:12Z

Some tests are still failing, and many of the failures are actually separate problems. I have already located several and will update soon.

jreback · 2017-06-10T19:03:59Z

can you rebase and update?

jreback · 2017-07-26T23:58:51Z

needs a rebase. if you'd like to continue, pls comment.

TomAugspurger reviewed Apr 26, 2017

TomAugspurger reviewed Apr 27, 2017

OXPHOS changed the title ~~Fix 14072 pivot_table dropna~~ BUG:Pivot table drops column/index names=nan when dropna=false Apr 27, 2017

jreback added the Reshaping Concat, Merge/Join, Stack/Unstack, Explode label Apr 27, 2017

OXPHOS added 2 commits April 30, 2017 18:48

Fix 14072 pivot_table dropna

db45f86

fix v2

99c240c

OXPHOS force-pushed the pivot_table_dropna branch from 8a2fcb0 to 99c240c Compare May 1, 2017 06:46

jreback closed this Jul 26, 2017
