Skip to content

KeyError when using pivot_table with an aggfunc and and empty column #9186

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
brandonkane opened this issue Jan 2, 2015 · 2 comments · Fixed by #30646
Closed

KeyError when using pivot_table with an aggfunc and and empty column #9186

brandonkane opened this issue Jan 2, 2015 · 2 comments · Fixed by #30646
Labels
good first issue Needs Tests Unit test(s) needed to prevent regressions
Milestone

Comments

@brandonkane
Copy link

This is a bit of an edge case, but in Pandas 0.15.2 when you try to pivot on an empty column you should get back an empty dataframe. This is the behaviour when the default aggregation function is used, but if you specify an aggfunc argument it fails.

Setup dataframe with empty column:

>>> df = pd.DataFrame({'A': [2,2,3,3,2], 'id': [5,6,7,8,9], 'C':['p', 'q', 'q', 'p', 'q'], 'D':[None, None, None, None, None]})
>>> df
   A  C     D  id
0  2  p  None   5
1  2  q  None   6
2  3  q  None   7
3  3  p  None   8
4  2  q  None   9

Expected behaviour when pivoting on the empty column:

>>> df.pivot_table(index='A', columns='D', values='id')
Empty DataFrame
Columns: []
Index: []

But if you specify an aggfunc it blows up:

>>> df.pivot_table(index='A', columns='D', values='id', aggfunc=np.size)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/data/virtualenv/default/lib/python2.7/site-packages/pandas/util/decorators.py", line 88, in wrapper
    return func(*args, **kwargs)
  File "/data/virtualenv/default/lib/python2.7/site-packages/pandas/util/decorators.py", line 88, in wrapper
    return func(*args, **kwargs)
  File "/data/virtualenv/default/lib/python2.7/site-packages/pandas/tools/pivot.py", line 151, in pivot_table
    table = table[values[0]]
  File "/data/virtualenv/default/lib/python2.7/site-packages/pandas/core/frame.py", line 1780, in __getitem__
    return self._getitem_column(key)
  File "/data/virtualenv/default/lib/python2.7/site-packages/pandas/core/frame.py", line 1787, in _getitem_column
    return self._get_item_cache(key)
  File "/data/virtualenv/default/lib/python2.7/site-packages/pandas/core/generic.py", line 1068, in _get_item_cache
    values = self._data.get(item)
  File "/data/virtualenv/default/lib/python2.7/site-packages/pandas/core/internals.py", line 2849, in get
    loc = self.items.get_loc(item)
  File "/data/virtualenv/default/lib/python2.7/site-packages/pandas/core/index.py", line 1402, in get_loc
    return self._engine.get_loc(_values_from_object(key))
  File "pandas/index.pyx", line 134, in pandas.index.IndexEngine.get_loc (pandas/index.c:3812)
  File "pandas/index.pyx", line 154, in pandas.index.IndexEngine.get_loc (pandas/index.c:3692)
  File "pandas/hashtable.pyx", line 696, in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:12299)
  File "pandas/hashtable.pyx", line 704, in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:12250)
KeyError: 'id'
@jreback
Copy link
Contributor

jreback commented Jan 2, 2015

Problem is you are wanting to have columns that have None (the actual value), which is not a valid label. So I suspect that the empty frame that you have returned is actually wrong as well. You could raise a ValueError or simply number the columns in the returned frame.

Pull-requests accepted

In [1]: df = pd.DataFrame({'A': [2,2,3,3,2], 'id': [5,6,7,8,9], 'C':['p', 'q', 'q', 'p', 'q'], 'D':[None, None, None, None, None]})

In [4]: df['D'] = 'foo'

In [5]: df
Out[5]: 
   A  C    D  id
0  2  p  foo   5
1  2  q  foo   6
2  3  q  foo   7
3  3  p  foo   8
4  2  q  foo   9

In [6]: df.pivot_table(index='A', columns='D', values='id')
Out[6]: 
D       foo
A          
2  6.666667
3  7.500000

In [7]: df.pivot_table(index='A', columns='D', values='id', aggfunc=np.size)
Out[7]: 
D  foo
A     
2    3
3    2
In [8]: df = pd.DataFrame({'A': [2,2,3,3,2], 'id': [5,6,7,8,9], 'C':['p', 'q', 'q', 'p', 'q'], 'D':[None, None, None, None, None]})

In [9]: df.pivot_table(index='A', columns='D', values='id')
Out[9]: 
Empty DataFrame
Columns: []
Index: []

In [10]: df.pivot_table(index='A', columns='D', values='id', aggfunc=np.size)
KeyError: 'id'

@jreback jreback added Bug Error Reporting Incorrect or improved errors from pandas labels Jan 2, 2015
@jreback jreback added this to the 0.16.0 milestone Jan 2, 2015
@jreback jreback modified the milestones: 0.16.0, Next Major Release Mar 6, 2015
@mroeschke
Copy link
Member

This looks fixed on master. Could use a regression test.

In [25]: df.pivot_table(index='A', columns='D', values='id', aggfunc=np.size)
Out[25]:
Empty DataFrame
Columns: []
Index: []

In [26]: pd.__version__
Out[26]: '0.26.0.dev0+490.g9cfb8b55b'

@mroeschke mroeschke added good first issue Needs Tests Unit test(s) needed to prevent regressions and removed Bug Error Reporting Incorrect or improved errors from pandas labels Oct 6, 2019
@simonjayhawkins simonjayhawkins modified the milestones: Contributions Welcome, 1.0 Jan 3, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
good first issue Needs Tests Unit test(s) needed to prevent regressions
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants