BUG: pivot_table over Categorical Columns

Nicholas Ver Halen · jreback · commit ed2a2e499454 · 2017-03-04T16:41:19.000-05:00
closes #15193

Author: Nicholas Ver Halen <verhalenn@yahoo.com>

Closes #15511 from verhalenn/issue15193 and squashes the following commits:

bf0fdeb [Nicholas Ver Halen] Added description to code change.
adf8616 [Nicholas Ver Halen] Added whatsnew for issue 15193
a643267 [Nicholas Ver Halen] Added test for issue 15193
d605251 [Nicholas Ver Halen] Made sure pivot_table propped na columns
diff --git a/doc/source/whatsnew/v0.20.0.txt b/doc/source/whatsnew/v0.20.0.txt
@@ -735,6 +735,7 @@ Bug Fixes
 
 
 - Bug in ``pd.merge_asof()`` where ``left_index``/``right_index`` together caused a failure when ``tolerance`` was specified (:issue:`15135`)
+- Bug in ``DataFrame.pivot_table()`` where ``dropna=True`` would not drop all-NaN columns when the columns were of ``category`` dtype (:issue:`15193`)
 
 
 - Bug in ``pd.read_hdf()`` passing a ``Timestamp`` to the ``where`` parameter with a non date column (:issue:`15492`)
diff --git a/pandas/tests/tools/test_pivot.py b/pandas/tests/tools/test_pivot.py
@@ -86,6 +86,39 @@ def test_pivot_table_dropna(self):
         tm.assert_index_equal(pv_col.columns, m)
         tm.assert_index_equal(pv_ind.index, m)
 
+    def test_pivot_table_dropna_categoricals(self):
+        # GH 15193
+        categories = ['a', 'b', 'c', 'd']
+
+        df = DataFrame({'A': ['a', 'a', 'a', 'b', 'b', 'b', 'c', 'c', 'c'],
+                        'B': [1, 2, 3, 1, 2, 3, 1, 2, 3],
+                        'C': range(0, 9)})
+
+        df['A'] = df['A'].astype('category', ordered=False,
+                                 categories=categories)
+        result_true = df.pivot_table(index='B', columns='A', values='C',
+                                     dropna=True)
+        expected_columns = Series(['a', 'b', 'c'], name='A')
+        expected_columns = expected_columns.astype('category', ordered=False,
+                                                   categories=categories)
+        expected_index = Series([1, 2, 3], name='B')
+        expected_true = DataFrame([[0.0, 3.0, 6.0],
+                                   [1.0, 4.0, 7.0],
+                                   [2.0, 5.0, 8.0]],
+                                  index=expected_index,
+                                  columns=expected_columns,)
+        tm.assert_frame_equal(expected_true, result_true)
+
+        result_false = df.pivot_table(index='B', columns='A', values='C',
+                                      dropna=False)
+        expected_columns = Series(['a', 'b', 'c', 'd'], name='A')
+        expected_false = DataFrame([[0.0, 3.0, 6.0, np.NaN],
+                                    [1.0, 4.0, 7.0, np.NaN],
+                                    [2.0, 5.0, 8.0, np.NaN]],
+                                   index=expected_index,
+                                   columns=expected_columns,)
+        tm.assert_frame_equal(expected_false, result_false)
+
     def test_pass_array(self):
         result = self.data.pivot_table(
             'D', index=self.data.A, columns=self.data.C)
diff --git a/pandas/tools/pivot.py b/pandas/tools/pivot.py
@@ -175,6 +175,10 @@ def pivot_table(data, values=None, index=None, columns=None, aggfunc='mean',
     if len(index) == 0 and len(columns) > 0:
         table = table.T
 
+    # GH 15193 Make sure empty columns are removed if dropna=True
+    if isinstance(table, DataFrame) and dropna:
+        table = table.dropna(how='all', axis=1)
+
     return table
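
The effect of the added `dropna(how='all', axis=1)` step can be illustrated with a small sketch modelled on the test case in this commit. It is not part of the patch. It assumes a recent pandas release, so it builds the categorical column with `pd.Categorical` rather than the `astype('category', categories=...)` form used in the test, and it passes `observed=False` explicitly so that the unused category `'d'` takes part in the grouping. The variable names `kept`, `manual`, and `dropped` are illustrative only.

```python
import pandas as pd

# Category "d" is declared but never occurs in the data.
categories = ["a", "b", "c", "d"]

df = pd.DataFrame({
    "A": pd.Categorical(list("aaabbbccc"), categories=categories),
    "B": [1, 2, 3, 1, 2, 3, 1, 2, 3],
    "C": range(9),
})

# dropna=False keeps the unused category as an all-NaN column.
kept = df.pivot_table(index="B", columns="A", values="C",
                      dropna=False, observed=False)

# The same call that the patch applies inside pivot_table when dropna=True:
# drop every column whose values are all NaN.
manual = kept.dropna(how="all", axis=1)

# dropna=True is expected to give the same set of columns as `manual`.
dropped = df.pivot_table(index="B", columns="A", values="C",
                         dropna=True, observed=False)

print(list(kept.columns))
print(list(manual.columns))
print(list(dropped.columns))
```

Under these assumptions, `kept` carries a column for `'d'` filled with NaN, while `manual` and `dropped` contain only the columns for `'a'`, `'b'`, and `'c'`.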
 
 
