GroupBy.apply repeats the first group N times when slicing and sorting in the applied function #25892
Comments
Why would you ever try to modify in place inside a groupby? This makes no sense. As for the double evaluations, you should try on master, as this was just merged.
Why does it make no sense? In each group there is some data processing, and then I want to sort the group data object itself. Of course I can use inplace=False and get a copy of the group data, but why should I? Shouldn't I be able to assume each group is a different object? As for the double evaluations issue: yes, I'm aware of it, thanks.
Modifying the contents of a groupby object is akin to modifying the contents of a collection you are iterating over. It is unpredictable and considered bad practice. In general,
if you want to sort a column you assign the sorted version back to itself:
Similarly, here you should not sort the group in place but return the sorted group as a new object. The documentation gives a warning that
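The code that originally accompanied this comment did not survive, so the following is only a minimal sketch of the pattern being recommended, not the commenter's own snippet. The frame, the column names `A` and `B`, and the helper `sort_group` are hypothetical. It sorts by assigning the result to a new name and has the applied function return a sorted copy of each group instead of calling `sort_values(..., inplace=True)`.

```python
import pandas as pd

# Hypothetical data, not taken from the issue.
df = pd.DataFrame({"A": ["x", "x", "y", "y"], "B": [2, 1, 4, 3]})

# Outside a groupby: bind the sorted result to a name instead of mutating.
df_sorted = df.sort_values("B")


def sort_group(group):
    # Return a new, sorted frame; the object passed in by apply is not modified.
    return group.sort_values("B")


# Selecting [["B"]] keeps the grouping column out of the frames passed to sort_group.
out = df.groupby("A", group_keys=False)[["B"]].apply(sort_group)
```

Because `sort_group` never mutates its argument, the result does not depend on whether pandas hands the function a view or a copy of the underlying data.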
Right, the first group appearing more than once should have been resolved in #24748, and the comments above give a good explanation of the problems with modifications made during iteration.
Closing as no action.
Hi,
When using GroupBy.apply with a function that calls pandas.DataFrame.sort_values() with inplace=True and also slices the data-frame, the first group is evaluated N times (N is the number of groups).
See the following code and output:
As you can see, out_sort_no_slice, out_no_sort_slice and out_no_sort_no_slice get the expected outputs, while the output of out_sort_slice contains the first group 4 times. While debugging, you can see the first group evaluated 4 times inside func.
Notice that the data is already sorted by field 'A' and the slicing part keeps all the data in the data-frame.
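The reporter's code and output are not preserved here, so the block below is a hypothetical reconstruction of the described setup, not the original script. It assumes four groups already sorted by 'A' and a slice that keeps every row. The `key` column, the values, and the non-mutating body of `func` are all assumptions. It shows one way to check that every input row appears exactly once in the result when the function slices and sorts without inplace=True.

```python
import pandas as pd

# Hypothetical data: four groups, already sorted by "A" within each group.
df = pd.DataFrame(
    {
        "key": ["a", "a", "b", "b", "c", "c", "d", "d"],
        "A": [1, 2, 3, 4, 5, 6, 7, 8],
    }
)


def func(group):
    # Slice that keeps all rows, mirroring the report's description.
    kept = group[group["A"] >= 0]
    # Sort into a new object rather than using inplace=True.
    return kept.sort_values("A")


out = df.groupby("key", group_keys=False)[["A"]].apply(func)

# Each original row label should occur exactly once in the output.
rows_seen_once = sorted(out.index.tolist()) == df.index.tolist()
```

If `rows_seen_once` were False, some group would have been repeated or dropped, which is the symptom described for out_sort_slice.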
I'm using Anaconda Python 3.7 with Pandas 0.24.2 on Mac.
Thanks!