pivot_table passes junk to aggfunc when value column does not exist #10326

Closed
edparcell opened this issue Jun 10, 2015 · 5 comments · Fixed by #30646
Labels: good first issue; Needs Tests (unit test(s) needed to prevent regressions)

Comments

@edparcell

Expected behavior is that pivot_table would throw an exception explaining the missing column. Instead pivot_table passes DataFrames to aggfunc before checking.

Example:

import pandas as pd

def agg(l):
    it = iter(l)
    try:
        value = next(it)
    except StopIteration:
        raise Exception("0 items in iterator")
    try:
        next(it)
        raise Exception("More than 1 item in iterator")
    except StopIteration:
        return value

foo = pd.DataFrame({"X": [0, 0, 1, 1], "Y": [0, 1, 0, 1], "Z": [10, 20, 30, 40]})
foo.pivot_table('Z', 'X', 'Y', aggfunc=agg)

gives, as expected:

Y   0   1
X        
0  10  20
1  30  40

But the following throws an Exception from agg, rather than an exception from pivot_table about the missing value column:

foo.pivot_table('notpresent', 'X', 'Y', aggfunc=agg)
@jreback
Contributor

jreback commented Jun 11, 2015

It will exclude the not-found columns in the values, see here. This should probably raise an intelligible message saying that the column is not found (though I'm not sure whether this behavior is relied upon elsewhere).

@jreback added the Reshaping (Concat, Merge/Join, Stack/Unstack, Explode), Difficulty Novice, and Error Reporting (Incorrect or improved errors from pandas) labels on Jun 11, 2015
@jreback added this to the Next Major Release milestone on Jun 11, 2015
@IamGianluca

I'm working on this now. Should we raise an error saying that some columns are missing and stop before calling the function passed in the 'aggfunc' argument? That way we won't rely on the end user to check whether the results are what they expected. After all, this is clearly a case of wrong arguments being passed to the function.
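A minimal sketch of that idea, written as a user-side wrapper rather than a change to pandas itself. The names `checked_pivot_table` and `recording_sum` are hypothetical and not part of pandas, and the wrapper assumes `values`, `index`, and `columns` are each a single column label:

```python
import pandas as pd


def checked_pivot_table(df, values, index, columns, aggfunc):
    """Hypothetical helper: check that every label exists before aggregating."""
    requested = [values, index, columns]
    missing = [label for label in requested if label not in df.columns]
    if missing:
        # Fail early so that aggfunc is never called with unexpected input.
        raise KeyError(f"columns not found in DataFrame: {missing}")
    return df.pivot_table(
        values=values, index=index, columns=columns, aggfunc=aggfunc
    )


foo = pd.DataFrame({"X": [0, 0, 1, 1], "Y": [0, 1, 0, 1], "Z": [10, 20, 30, 40]})

calls = []  # records every invocation of the aggregation function


def recording_sum(group):
    calls.append(len(group))
    return group.sum()


error_message = None
try:
    checked_pivot_table(foo, "notpresent", "X", "Y", recording_sum)
except KeyError as exc:
    error_message = str(exc)

# A valid request passes straight through to pivot_table.
result = checked_pivot_table(foo, "Z", "X", "Y", "sum")
```

With the missing label, the wrapper raises before `recording_sum` is ever invoked, so `calls` stays empty and the message names the offending column.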

@chris-b1
Contributor

From SO: this crashes Python for me (2.7 win64, master). May be related? @IamGianluca

df = pd.DataFrame({
    'Student': ['A',  'B', 'B'],
    'Assessor': ['C',  'D', 'D'],
    'Score': ['foo', 'bar', 'foo']})
df = df.pivot_table(
    index='Student',
    columns='Assessor',
    values='Score',
    aggfunc=lambda x: x)

@IamGianluca

@chris-b1 thanks for posting this on GitHub.

It also crashes on Python 3.4. I haven't had time to fully investigate it, but I probably will tomorrow.

@mroeschke
Member

This is raising a more sensible error now. Guess this could use a test.

In [60]: foo.pivot_table('notpresent', 'X', 'Y', aggfunc=agg)
    ...:
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-60-874ca1e41f27> in <module>
----> 1 foo.pivot_table('notpresent', 'X', 'Y', aggfunc=agg)

~/pandas-mroeschke/pandas/core/frame.py in pivot_table(self, values, index, columns, aggfunc, fill_value, margins, dropna, margins_name, observed)
   5981             dropna=dropna,
   5982             margins_name=margins_name,
-> 5983             observed=observed,
   5984         )
   5985

~/pandas-mroeschke/pandas/core/reshape/pivot.py in pivot_table(data, values, index, columns, aggfunc, fill_value, margins, dropna, margins_name, observed)
     70         for i in values:
     71             if i not in data:
---> 72                 raise KeyError(i)
     73
     74         to_filter = []

KeyError: 'notpresent'
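
A regression test for this could look roughly like the sketch below. It is only an illustration, not the test that was eventually merged: the test name and the recording aggfunc are made up, and it uses a plain try/except instead of `pytest.raises` so that it runs without pytest. It assumes a pandas version that raises `KeyError` for a missing values column, as in the traceback above.

```python
import pandas as pd


def test_pivot_table_missing_values_column_raises():
    df = pd.DataFrame(
        {"X": [0, 0, 1, 1], "Y": [0, 1, 0, 1], "Z": [10, 20, 30, 40]}
    )
    calls = []  # filled in only if pandas invokes the aggfunc

    def recording_aggfunc(group):
        calls.append(group)
        return group.iloc[0]

    try:
        df.pivot_table(
            values="notpresent", index="X", columns="Y", aggfunc=recording_aggfunc
        )
    except KeyError as exc:
        # The error should name the missing column.
        assert "notpresent" in str(exc)
    else:
        raise AssertionError("expected KeyError for a missing values column")

    # The check should happen before any data reaches the aggfunc.
    assert len(calls) == 0


test_pivot_table_missing_values_column_raises()
```

The second assertion covers the original complaint: the missing column should be reported before any data is passed to aggfunc.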

@mroeschke added the Needs Tests (Unit test(s) needed to prevent regressions) label and removed the Effort Low, Error Reporting (Incorrect or improved errors from pandas), and Reshaping (Concat, Merge/Join, Stack/Unstack, Explode) labels on Oct 14, 2019
@simonjayhawkins modified the milestones: Contributions Welcome, 1.0 on Jan 3, 2020