DOC: Order of groups in groupby and head method #17775
See the docs: http://pandas-docs.github.io/pandas-docs-travis/groupby.html#filtration.
I suppose the fact that manual iteration over the groups happens in sorted order could be better documented. Can you submit a PR to that effect? You may use your example in a warning or note box in http://pandas-docs.github.io/pandas-docs-travis/groupby.html#iterating-through-groups. I agree it could be slightly unexpected.
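A minimal sketch of the iteration order being discussed, assuming a recent pandas release. The frame below is hypothetical example data, stored in reverse alphabetical order so that sorted and unsorted iteration differ visibly.

```python
import pandas as pd

# Hypothetical example data: rows stored in reverse alphabetical order of the key
df = pd.DataFrame({
    "species": ["virginica", "versicolor", "setosa"],
    "sepal_length": [5.7, 5.6, 5.1],
})

# With the default sort=True, manual iteration visits group keys in sorted order
iterated_keys = [name for name, _ in df.groupby("species")]
print(iterated_keys)  # ['setosa', 'versicolor', 'virginica']

# With sort=False, keys are visited in order of first appearance
unsorted_keys = [name for name, _ in df.groupby("species", sort=False)]
print(unsorted_keys)  # ['virginica', 'versicolor', 'setosa']
```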
This is true and is well documented. The problem I find is not with iterating through groups but with
If you subtract the first value from the sum, you should get the sum of the other two, but since the order is different, you do not.
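The mismatch can be reproduced with a small, made-up frame (the key/value column names and numbers are hypothetical, and a recent pandas release is assumed). Subtracting the head(1) values positionally from the per-group sums pairs the wrong groups, while letting pandas align on the group key gives the sum of the remaining rows.

```python
import pandas as pd

# Hypothetical data: group "b" appears before group "a" in row order
df = pd.DataFrame({
    "key": ["b", "b", "a", "a"],
    "value": [10, 1, 20, 2],
})
grouped = df.groupby("key")

# Reducer: indexed by the group key, sorted (a, b) because sort=True by default
totals = grouped["value"].sum()  # a -> 22, b -> 11

# Filter: head(1) keeps original rows in original order (b first, then a)
firsts = grouped.head(1)  # rows 0 (b, 10) and 2 (a, 20)

# Positional subtraction pairs a's total with b's first value, which is wrong
positional = totals.to_numpy() - firsts["value"].to_numpy()
print(positional)  # [12 -9]

# Indexing the filtered rows by key lets pandas align the two results correctly
aligned = totals - firsts.set_index("key")["value"]
print(aligned["a"], aligned["b"])  # 2 1
```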
Hey @Ifnister, so to be clear:
# Setup
import pandas as pd

data = [
    {'species': 'setosa', 'sepal_length': 5.1},
    {'species': 'versicolor', 'sepal_length': 5.6},
    {'species': 'virginica', 'sepal_length': 5.7},
]
# Rows end up in reverse alphabetical order of species
df = pd.DataFrame(data).sort_values('species', ascending=False)
df
This is what you currently get using df.groupby('species').head()
However, this is what you get if you use df.groupby('species').sepal_length.sum()
And you expect the output of
Is that right?
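As a sketch of the two outputs being compared in this comment, assuming a recent pandas release, the setup data can be reused to print the row order that head() returns next to the index order that sum() returns.

```python
import pandas as pd

# Same setup data as in the comment: rows sorted in descending species order
data = [
    {"species": "setosa", "sepal_length": 5.1},
    {"species": "versicolor", "sepal_length": 5.6},
    {"species": "virginica", "sepal_length": 5.7},
]
df = pd.DataFrame(data).sort_values("species", ascending=False)

# head() is a filter: rows come back in their original (descending) order
head_order = df.groupby("species").head()["species"].tolist()

# sum() is a reducer: the result is indexed by the sorted group keys
sum_order = df.groupby("species")["sepal_length"].sum().index.tolist()

print(head_order)  # ['virginica', 'versicolor', 'setosa']
print(sum_order)   # ['setosa', 'versicolor', 'virginica']
```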
There is a big problem with the docstrings here for
Edit: I see that @Ifnister already pointed this out. I think it would make a lot more sense to actually do
Head is a filter; sort is only applied to reducers within groupby. To my knowledge this isn't documented, but I haven't checked. I think documentation on this should be added to the
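The filter-versus-reducer distinction can be sketched as follows, assuming a recent pandas release and hypothetical data. head(n) behaves like a boolean row mask built from each row's position within its group (cumcount), so it keeps the original index and row order, whereas a reducer such as mean() returns one row per group indexed by the sorted keys.

```python
import pandas as pd

# Hypothetical data with interleaved groups
df = pd.DataFrame({
    "key": ["b", "a", "b", "a", "b"],
    "value": [1, 2, 3, 4, 5],
})

# Filter: whole original rows, original order, original index labels
first_two = df.groupby("key").head(2)

# Equivalent boolean mask: keep rows whose position within their group is < 2
mask = df.groupby("key").cumcount() < 2
print(first_two.equals(df[mask]))  # True

# Reducer: one row per group, indexed by the sorted group keys
means = df.groupby("key")["value"].mean()

print(first_two.index.tolist())  # [0, 1, 2, 3]
print(means.index.tolist())      # ['a', 'b']
```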
When sort=True is passed to groupby (which is the default), the groups will be in sorted order. If you loop through them, they are in sorted order; if you compute the mean, std, etc., they are in sorted order; but if you use the head method, they are NOT in sorted order. Is this expected? I have not found this behaviour documented.
I think it is confusing, and it has caused me a headache: I was combining the output of the mean and head methods in the same DataFrame, and since the data was not sorted beforehand, the results were getting mixed up because of these ordering differences. I have pandas 0.20.3.
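One way to combine mean and head output safely is sketched below, assuming a recent pandas release and hypothetical data: either index the head(1) rows by the group key before joining, or use the reducer first(), which is indexed the same way as mean(). Building the DataFrame from a dict of Series aligns them on the key rather than on position.

```python
import pandas as pd

# Hypothetical data, not sorted by species
df = pd.DataFrame({
    "species": ["virginica", "setosa", "virginica", "setosa"],
    "sepal_length": [6.3, 5.1, 5.8, 4.9],
})
grouped = df.groupby("species")
means = grouped["sepal_length"].mean()  # indexed by sorted species

# Option 1: index the head(1) rows by the group key, then align on that key
first_rows = grouped.head(1).set_index("species")["sepal_length"]
combined = pd.DataFrame({"mean": means, "first": first_rows})

# Option 2: use the reducer first(), which shares mean()'s key-based index
combined_alt = pd.DataFrame({"mean": means, "first": grouped["sepal_length"].first()})

print(combined.loc["setosa", "first"], combined.loc["virginica", "first"])  # 5.1 6.3
```

Note that first() returns the first non-null value in each group, so it can differ from head(1) when a column contains missing values.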