DataFrame Groupby Named Aggregation Using Lambda Results in a KeyError, Dict-of-Dicts Approach Works Fine #27863

Closed
tbarket opened this issue Aug 12, 2019 · 4 comments

@tbarket

tbarket commented Aug 12, 2019

Code Sample, a copy-pastable example if possible

import numpy as np
import pandas as pd
data = [['tom', 10, 'blue', 1000, 'a'], ['nick', 15, 'blue', 2000, 'b'], ['julie', 14, 'green', 3000, 'a'], ['bob', 11, 'green', 4000, 'a'], ['cindy', 16, 'red', 5000, 'b']]
df = pd.DataFrame(data, columns = ['Name', 'Age', 'Color', 'Num', 'Letter'])

# Dict-style renaming works fine:
df.groupby(by='Color').agg({'Num': {'SumNum'  : np.sum, 'SumNumIfLetterA': lambda x: x[df.iloc[x.index].Letter=='a'].sum()}})

Out[2]: 
         Num                
      SumNum SumNumIfLetterA
Color                       
blue    3000            1000
green   7000            7000
red     5000               0

# Named Aggregation using function also works fine:
def simplefunc(x):
    return x[df.iloc[x.index].Letter=='a'].sum()
df.groupby(by='Color').agg(SumNum = ('Num', np.sum), SumNumIfLetterA = ('Num', simplefunc))

Out[3]: 
       SumNum  SumNumIfLetterA
Color                         
blue     3000             1000
green    7000             7000
red      5000                0

# Named Aggregation using identical lambda throws a KeyError:
df.groupby(by='Color').agg(SumNum = ('Num', np.sum), SumNumIfLetterA = ('Num', lambda x: x[df.iloc[x.index].Letter=='a'].sum()))

Problem description

A lambda function that previously worked fine with the dict-of-dicts approach no longer works with Named Aggregation. Passing a regular function to Named Aggregation works, but replacing that function with an identical lambda raises a KeyError. Is this expected behavior, or is it a bug?

Expected Output

       SumNum  SumNumIfLetterA
Color                         
blue     3000             1000
green    7000             7000
red      5000                0

Output of pd.show_versions()

INSTALLED VERSIONS
------------------
commit : None
python : 3.7.4.final.0
python-bits : 64
OS : Windows
OS-release : 10
machine : AMD64
processor : Intel64 Family 6 Model 158 Stepping 9, GenuineIntel
byteorder : little
LC_ALL : None
LANG : None
LOCALE : None.None

pandas : 0.25.0
numpy : 1.16.4
pytz : 2019.1
dateutil : 2.8.0
pip : 19.1.1
setuptools : 41.0.1
Cython : 0.29.12
pytest : 5.0.1
hypothesis : None
sphinx : 2.1.2
blosc : None
feather : None
xlsxwriter : 1.1.8
lxml.etree : 4.3.4
html5lib : 1.0.1
pymysql : None
psycopg2 : None
jinja2 : 2.10.1
IPython : 7.7.0
pandas_datareader: None
bs4 : 4.7.1
bottleneck : 1.2.1
fastparquet : None
gcsfs : None
lxml.etree : 4.3.4
matplotlib : 3.1.0
numexpr : 2.6.9
odfpy : None
openpyxl : 2.6.2
pandas_gbq : None
pyarrow : None
pytables : None
s3fs : None
scipy : 1.3.0
sqlalchemy : 1.3.5
tables : 3.5.2
xarray : None
xlrd : 1.2.0
xlwt : 1.3.0
xlsxwriter : 1.1.8

@jorisvandenbossche
Member

@TomAugspurger I thought you did a follow-up PR (after the named aggregation PR) to deal with lambdas?

@jorisvandenbossche
Member

From #27519, it seems lambdas already work, but only if that is the only function for that specific column.

So this works:

In [34]: df.groupby(by='Color').agg(SumNumIfLetterA = ('Num', lambda x: x[df.iloc[x.index].Letter=='a'].sum()))                                                                               
Out[34]: 
       SumNumIfLetterA
Color                 
blue              1000
green             7000
red                  0

but once you include another aggregation for the same column (whether it is a lambda or not), it errors.
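
For anyone hitting this in the meantime, here is a possible workaround sketch that avoids lambdas altogether. It precomputes a masked copy of the column and then uses plain string aggregations, so each named output draws on its own column. The helper column name `NumIfA` is hypothetical (it is not from the original report), and this is only one way to get the expected output, not a fix for the underlying issue.

```python
import pandas as pd

data = [
    ["tom", 10, "blue", 1000, "a"],
    ["nick", 15, "blue", 2000, "b"],
    ["julie", 14, "green", 3000, "a"],
    ["bob", 11, "green", 4000, "a"],
    ["cindy", 16, "red", 5000, "b"],
]
df = pd.DataFrame(data, columns=["Name", "Age", "Color", "Num", "Letter"])

# Hypothetical helper column: keep Num where Letter == 'a', otherwise 0.
masked = df.assign(NumIfA=df["Num"].where(df["Letter"] == "a", 0))

# Named aggregation with string aggregators only, so no lambda is involved.
result = masked.groupby("Color").agg(
    SumNum=("Num", "sum"),
    SumNumIfLetterA=("NumIfA", "sum"),
)
print(result)
```

This also drops the reliance on `df.iloc[x.index]`, which only lines up with the right rows when the frame has a default integer index.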

@TomAugspurger
Contributor

Duplicate of #27519 I think.

@TomAugspurger
Contributor

@tbarket let us know if you're interested in working on that.
