pandas.core.base.DataError: No numeric types to aggregate #34403

Closed
ishgupta opened this issue May 27, 2020 · 7 comments
Labels
Needs Info Clarification about behavior needed to assess issue

Comments

@ishgupta

ishgupta commented May 27, 2020

Aggregating a boolean field with mean() no longer works in the latest version. Is there a new way to average boolean columns, or do they have to be converted to int/float first?

data[group_fields + [bool_field]].groupby(group_fields).mean() raises the error in the title.
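For readers on a release where this error appears, the int/float conversion raised in the question can be sketched as follows. This is only an illustration: the column names `region` and `is_active` and the sample data are hypothetical stand-ins for `group_fields` and `bool_field`, and the sketch assumes the boolean column contains no missing values.

```python
import pandas as pd

# Hypothetical names and data, chosen only for illustration.
group_fields = ["region"]
bool_field = "is_active"

data = pd.DataFrame(
    {
        "region": ["north", "north", "south", "south"],
        "is_active": [True, False, True, True],
    }
)

# Cast the boolean column to float before grouping so that mean()
# operates on a numeric dtype (assumes no missing values in the column).
subset = data[group_fields + [bool_field]].astype({bool_field: "float64"})
share_true = subset.groupby(group_fields).mean()
print(share_true)
```

The result is the fraction of True values per group, which is usually what averaging a boolean column is meant to express.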

@MarcoGorelli
Member

Thanks for the report - to expedite resolution, could you include a reproducible example https://matthewrocklin.com/blog/work/2018/02/28/minimal-bug-reports ?

@TomAugspurger TomAugspurger added the Needs Info Clarification about behavior needed to assess issue label May 27, 2020
@dsaxton
Member

dsaxton commented May 27, 2020

I think this should be fixed after #34056

@TomAugspurger
Contributor

Sounds like it. Let us know if it's not fixed by #34056 @ishgupta.

@ishgupta
Author

ishgupta commented Jun 4, 2020

Hi @TomAugspurger, thanks for sharing that.
My case is similar to the issue in #34056, with one addition.

With the code below, sum() works fine on a boolean column, but mean() raises an error.
[ @MarcoGorelli ]
import pandas as pd
df = pd.DataFrame({"a": [1] * 50_000 + [2] * 50_000, "b": pd.array([True] * 100_000)})
df.groupby("a").sum()  # works fine
df.groupby("a").mean()  # DataError: No numeric types to aggregate
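Since sum() is reported to work here, one possible interim workaround is to compute the mean as sum divided by count. The sketch below reuses the reproducer from this comment. It assumes that sum() and count() on the grouped boolean column behave as described in the report, which has not been verified against 1.0.4.

```python
import pandas as pd

df = pd.DataFrame({"a": [1] * 50_000 + [2] * 50_000, "b": pd.array([True] * 100_000)})

grouped = df.groupby("a")["b"]
# Fraction of True values per group, computed without calling mean().
# count() ignores missing values, so this matches a mean that skips NA.
fraction_true = grouped.sum() / grouped.count()
print(fraction_true)
```

Unlike mean(), this returns a Series rather than a DataFrame, because a single column is selected before aggregating.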

@dsaxton
Member

dsaxton commented Jun 4, 2020

@ishgupta I just checked to confirm and this does indeed work on master:

[ins] In [3]: df = pd.DataFrame({"a": [1] * 50_000 + [2] * 50_000, "b": pd.array([True] * 100_000)})

[ins] In [4]: df.groupby("a").sum()
Out[4]: 
       b
a       
1  50000
2  50000

[ins] In [5]: df.groupby("a").mean()
Out[5]: 
      b
a      
1  True
2  True

[ins] In [6]: pd.__version__
Out[6]: '1.1.0.dev0+1758.g035e1fe83'

@ishgupta
Author

ishgupta commented Jun 5, 2020

@dsaxton Thank you for taking the time to respond to my query.
I am using the latest released version of pandas, which is 1.0.4, and yes, it is older than the development version you're using. :)

Could you tell me whether a stable release of v1.1.0 is available on PyPI?

@TomAugspurger
Contributor

TomAugspurger commented Jun 5, 2020 via email
