Sum of grouped bool column has inconsistent type #7001
Comments
This is a dupe of #3752, but I like your examples better, so will keep this issue! It's possible to fix, but it hasn't been high on the list of priorities.
As for getting float64 instead of int64 as the result, a possible workaround is to use:
>>> pd.DataFrame([True,True]).groupby(lambda x: 0).agg(pd.np.count_nonzero)[0]
0    2
Name: 0, dtype: int64
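A different way to get a predictable integer result, sketched here as an alternative to the `count_nonzero` workaround, is to cast the bool column to an integer dtype before grouping. This may also be useful because `pd.np` is not available in recent pandas releases. The frame and the column names `key` and `flag` are made up for illustration.

```python
import pandas as pd

# Hypothetical example data: a grouping key and a bool column.
df = pd.DataFrame({
    "key": ["a", "a", "b", "b", "b"],
    "flag": [True, False, True, True, False],
})

# Cast to int64 first, so the grouped sum operates on integers
# rather than on bools.
true_counts = df["flag"].astype("int64").groupby(df["key"]).sum()

print(true_counts)
print(true_counts.dtype)
```

The cast costs one extra copy of the column, but the result dtype no longer depends on how many True values each group happens to contain.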
For some additional context: sometimes the user may not know they are dealing with a bool type. This may occur when performing a groupby on the result of pd.get_dummies, which may return columns of type uint8, but not always. If get_dummies returns uint16, the issue above is not triggered, and dummies_result.groupby(...).sum() returns int types. If any of the counts in dummies is small enough, the groupby result will be float.
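For the get_dummies case, one defensive option is to request an explicit integer dtype for the dummy columns, so the grouped sum does not depend on whatever dtype get_dummies picks by default. This is only a sketch: the `group` and `color` columns are invented, and it assumes a pandas version in which `get_dummies` accepts the `dtype` argument.

```python
import pandas as pd

# Hypothetical example data.
df = pd.DataFrame({
    "group": ["x", "x", "y"],
    "color": ["red", "blue", "red"],
})

# Ask for int64 dummy columns instead of relying on the default dtype.
dummies = pd.get_dummies(df["color"], dtype="int64")

# Count occurrences of each color per group.
counts = dummies.groupby(df["group"]).sum()

print(counts)
print(counts.dtypes)
```

Checking `dummies.dtypes` before the groupby is a cheap way to notice when the input is not the type you expected.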
This is really very confusing, as it means some code might work as expected on some data while running into an error on other data. I would very much appreciate it if this could be fixed.
Summing a bool column after a groupby gives a bool result until there are two or more True values, at which point it becomes a float64. Seems like it should always be an (unsigned?) integer. A straight sum without a groupby always gives an int64. This is with 0.13.1.
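A minimal script for checking the behaviour described above on a given pandas install might look like the following. The helper name `grouped_bool_sum` and the column name `flag` are made up for this sketch, and the dtypes it prints will depend on the pandas version, so no expected output is shown.

```python
import pandas as pd

def grouped_bool_sum(values):
    """Sum a bool column with every row placed in a single group."""
    frame = pd.DataFrame({"flag": values})
    # Group on a constant key so all rows fall into one group.
    return frame.groupby(lambda _: 0)["flag"].sum()

# Compare a plain sum with a grouped sum for one, two and mixed True values.
for values in ([True], [True, True], [True, False, True]):
    plain = pd.Series(values).sum()
    grouped = grouped_bool_sum(values)
    print(values, "plain:", type(plain).__name__, "grouped dtype:", grouped.dtype)
```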