BUG: Groupby transform with missing groups #8955
Comments
Agreed on the consistency. I think a pass-through grouping is probably the most intuitive. Care to do a PR to fix?
This is still a bug (v0.20.3), though interestingly the Series now works correctly for np.mean (which crashed for the OP) and crashes for pd.Series.mean. #9941 was a similar issue that got fixed recently.
@kcarnold : Interesting... is the stack trace the same as before? BTW, given what you said, we should probably add tests to confirm your first statement, though a PR to patch the still-crashing behavior is welcome!
@jreback : Well, you have your consistency now it seems. They both work with |
Now all four ops do not raise, but give inconsistent results. Using I think the issue is here: pandas/pandas/core/groupby/generic.py Lines 559 to 567 in 53810fd
where it is incorrect to use The result when dropna=False is consistent and correct.
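A minimal sketch of the dropna=False case mentioned above, assuming a pandas release that supports the `dropna` argument of `groupby` (1.1 or later). The column names and values are made up for illustration and are not taken from the thread. With dropna=False the rows whose key is NaN form a group of their own, so both the string alias and the lambda have a well-defined group to aggregate over.

```python
import numpy as np
import pandas as pd

# Hypothetical data: the last two rows have a missing group key.
df = pd.DataFrame(
    {
        "key": ["a", "a", np.nan, np.nan],
        "val": [1.0, 3.0, 10.0, 20.0],
    }
)

# dropna=False keeps NaN as its own group instead of discarding those rows.
grouped = df.groupby("key", dropna=False)["val"]

# Built-in (string alias) transformation versus a user-supplied function.
by_alias = grouped.transform("mean")
by_lambda = grouped.transform(lambda x: x.mean())

print(pd.DataFrame({"alias": by_alias, "lambda": by_lambda}))
```

On a recent release I would expect both columns to agree, with the two NaN-keyed rows sharing one group mean.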
All four results are now consistent and correct, according to
Tests were added to
In a groupby/transform when some of the groups are missing, should the transformed values be set to missing (my preference), left unchanged, or should this be an error? Currently the behavior is inconsistent between Series and Frames, and between cythonized and non-cythonized transformations.
For a Series with a non-cythonized transformation, the values are left unchanged:
For a Series with cythonized functions, it's an error (this changed between 0.14.1 and 0.15.0):
For DataFrames, the results are the opposite:
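A minimal sketch of the four combinations described above (Series or DataFrame, built-in or user-supplied transformation), using made-up data. The names `keys`, `values`, `frame`, and `try_transform` are hypothetical, and using the string alias "mean" to stand in for the cythonized path is an assumption. What happens to the row with the missing key depends on the pandas version: older releases raised or passed the value through for some combinations, so the helper catches exceptions rather than assuming a particular outcome.

```python
import numpy as np
import pandas as pd

# Hypothetical data: the third row has a missing group key.
keys = pd.Series(["a", "a", np.nan, "b"], name="key")
values = pd.Series([1.0, 3.0, 10.0, 5.0], name="val")
frame = pd.DataFrame({"val": values})


def try_transform(obj, func):
    """Run groupby(keys).transform(func); return the result or the exception raised."""
    try:
        return obj.groupby(keys).transform(func)
    except Exception as exc:  # some older versions raised here
        return exc


results = {
    "Series, string alias": try_transform(values, "mean"),
    "Series, lambda": try_transform(values, lambda x: x.mean()),
    "DataFrame, string alias": try_transform(frame, "mean"),
    "DataFrame, lambda": try_transform(frame, lambda x: x.mean()),
}

for label, res in results.items():
    print(label)
    print(res)
    print()
```

Comparing the printed row at index 2 across the four cases shows whether a given installation treats the missing group consistently.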