"SpecificationError: nested dictionary is ambiguous in aggregation" in a certain case of groupby-aggregation #25471

Closed
Khris777 opened this issue Feb 28, 2019 · 6 comments · Fixed by #50079
Labels
good first issue Groupby Needs Tests Unit test(s) needed to prevent regressions

Khris777 commented Feb 28, 2019

All of these examples for using agg work fine:

import pandas as pd
df = pd.DataFrame({"A":['A','A','B','B','B'],
                   "B":[1,2,1,1,2],
                   "C":[9,8,7,6,5]})
df.groupby('A')[['B','C']].agg({'B':'sum','C':'count'})
df.groupby('A')[['B','C']].agg({'B':['sum','count'],'C':'count'})
df.groupby('A')[['B']].agg('sum')
df.groupby('A')['B'].agg('sum')

This one emits a FutureWarning, as mentioned here:

df.groupby('A')['B'].agg({'B':['sum','count']})

This one works just fine:

df.groupby('A')[['B','C']].agg({'B':'sum'})

But this one throws an error (I'm aware this expression isn't necessary):

df.groupby('A')[['B']].agg({'B':'sum'})

SpecificationError: nested dictionary is ambiguous in aggregation

Why does it throw this error?

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.6.8.final.0
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 79 Stepping 1, GenuineIntel
byteorder: little
LC_ALL: None
LANG: en
LOCALE: None.None

pandas: 0.24.1
pytest: 3.9.1
pip: 19.0.1
setuptools: 40.8.0
Cython: 0.29.5
numpy: 1.15.4
scipy: 1.2.0
pyarrow: None
xarray: None
IPython: 7.2.0
sphinx: 1.8.4
patsy: 0.5.1
dateutil: 2.7.5
pytz: 2018.9
blosc: None
bottleneck: 1.2.1
tables: 3.4.4
numexpr: 2.6.9
feather: None
matplotlib: 3.0.2
openpyxl: 2.6.0
xlrd: 1.2.0
xlwt: 1.3.0
xlsxwriter: 1.1.2
lxml.etree: 4.3.1
bs4: 4.7.1
html5lib: 1.0.1
sqlalchemy: 1.2.18
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: 0.2.1
pandas_gbq: None
pandas_datareader: None
gcsfs: None

@TomAugspurger
Contributor

Not sure, probably a bug. If you're interested in debugging further, let us know.

@stuarteberg
Contributor

I can confirm in pandas 0.25.0.

These work fine:

In [27]: df = pd.DataFrame({'x': [1,1,2], 'y': [3,4,4]})

In [28]: df
Out[28]:
   x  y
0  1  3
1  1  4
2  2  4

In [29]: df.groupby('x').agg({'x': 'sum'})
Out[29]:
   x
x
1  2
2  2

In [30]: df.groupby('y').agg({'x': 'sum'})
Out[30]:
   x
y
3  1
4  3

But this raises an error:

In [31]: df.groupby('y')[['x']].agg({'x': 'sum'})

...

SpecificationError: nested dictionary is ambiguous in aggregation

And if the .agg() call only requires a single column, but you provide more than one column when you select columns from the groupby, then you get a weird result. (Why does the result below include a y column at all?)

In [34]: df.groupby('y')[['x', 'y']].agg({'x': 'sum'})
Out[34]:
   x
   x  y
y
3  1  3
4  3  8
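
For anyone who hits this on an affected version, here is a minimal sketch of two spellings that avoid combining a column selection with a dict. It reuses the small frame from this comment; the variable names `selected` and `via_dict` are only illustrative. The first form mirrors the `[['B']].agg('sum')` call the original report lists as working, and the second is the `In [30]` call above.

```python
import pandas as pd

df = pd.DataFrame({"x": [1, 1, 2], "y": [3, 4, 4]})

# The selection already picks the column, so a plain function name is enough.
selected = df.groupby("y")[["x"]].agg("sum")

# Alternatively, skip the selection and let the dict choose the column.
via_dict = df.groupby("y").agg({"x": "sum"})

print(selected)
print(via_dict)
```

Both calls aggregate only `x`, so neither depends on how a dict is interpreted against an explicit column selection.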

@SaiSridhar783

df.groupby('A')['B'].agg({'B':['sum','count']})

Here, the column 'B' is selected as a Series, so a dict keyed by column name cannot be used to choose what to aggregate.
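
To illustrate the Series versus DataFrame distinction, the sketch below uses the frame from the original report. Single brackets give a `SeriesGroupBy`, a list of columns gives a `DataFrameGroupBy`. For the Series case, a list of functions or named aggregation (keyword arguments, available since pandas 0.25 as far as I know) gets several aggregates without a column-keyed dict. The names `series_gb`, `frame_gb`, `total` and `n` are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({"A": ["A", "A", "B", "B", "B"],
                   "B": [1, 2, 1, 1, 2],
                   "C": [9, 8, 7, 6, 5]})

# Single brackets select one column: a Series-like groupby object.
series_gb = df.groupby("A")["B"]
# A list of columns keeps a DataFrame-like groupby object.
frame_gb = df.groupby("A")[["B"]]
print(type(series_gb).__name__, type(frame_gb).__name__)

# For the Series case, a list of functions needs no column key.
by_list = series_gb.agg(["sum", "count"])

# Named aggregation: each keyword becomes an output column name.
by_name = series_gb.agg(total="sum", n="count")

print(by_list)
print(by_name)
```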

@mroeschke
Member

This looks to work on master now. It could use a test.

In [26]: import pandas as pd
    ...: df = pd.DataFrame({"A":['A','A','B','B','B'],
    ...:                    "B":[1,2,1,1,2],
    ...:                    "C":[9,8,7,6,5]})

In [27]: df.groupby('A')[['B']].agg({'B':'sum'})
Out[27]:
   B
A
A  3
B  4
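
Since this comment asks for a test, here is a minimal sketch of what a regression test could look like. The function name is hypothetical and is not taken from the pandas test suite; the data and the expected values come from the session shown in this comment. It assumes a pandas version where the call no longer raises.

```python
import pandas as pd
import pandas.testing as tm


def test_groupby_list_selection_dict_agg():
    # Hypothetical test name; frame copied from the original report.
    df = pd.DataFrame({"A": ["A", "A", "B", "B", "B"],
                       "B": [1, 2, 1, 1, 2],
                       "C": [9, 8, 7, 6, 5]})

    # This call used to raise SpecificationError on pandas 0.24 / 0.25.
    result = df.groupby("A")[["B"]].agg({"B": "sum"})

    # Expected values match the Out[27] output above.
    expected = pd.DataFrame({"B": [3, 4]},
                            index=pd.Index(["A", "B"], name="A"))
    tm.assert_frame_equal(result, expected)


test_groupby_list_selection_dict_agg()
```

Under pytest the final call would be dropped, since the runner collects the function by its `test_` prefix.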

@mroeschke mroeschke added good first issue Needs Tests Unit test(s) needed to prevent regressions labels Jun 27, 2021
@lakmalpin lakmalpin removed their assignment Dec 6, 2022
@talukder-sowrov
Contributor

We are a university team looking for a good first issue.

@talukder-sowrov
Contributor

Take
