-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
BUG: Fix groupby.apply #28662
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
BUG: Fix groupby.apply #28662
Conversation
Somewhat unrelated, but does |
Correct the --recursive option only actually checks Even though one can explicitly sort a given pyx file eg. running cc. @jbrockmendel to comment on if its worth us bothering to isort |
@alimcmaster1, yes this is intended, for now. There has been an open ticket for some time to add full Cython awareness (cimports, cdefs, etc...) but for now it remains incomplete. Because of this, we currently intentionally don't automatically include pyx files. Once this ticket is resolved, isort will include pyx files by default. Hope this is helpful! ~Timothy |
I've tried to make these internally consistent, but isort wont work on these since they are not valid python. update looks like I was wrong about isort not working on them. still for now "get it close enough manually" makes sense to me |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm @jreback
lgtm. can you rebase and small comment. ping on green. |
Another approach we could take would be to update some combination of libreduction._check_result_array and libreduction._extract_result to check for Index objects and redirect us back to the pure-python implementation, which already does this corrrectly. |
@dsaxton - feels like this is almost there? Seems like the test failures mentioned above are now resolved. |
can you merge master and will look again |
thanks @dsaxton not generally happy with the way we do reductions in cython, but a larger discussion / refactor effort. |
@dsaxton this caused a performance regression for some of our benchmarks, eg https://pandas.pydata.org/speed/pandas/#groupby.Apply.time_copy_overhead_single_col?commits=6efc2379-b9de33e3. Now, it might be this is unavoidable since the extra copy is to avoid a bug, but can you take a look at that benchmark? |
@jorisvandenbossche Interesting, it looks like the copy was previously avoided because each group was getting copied by the function in the benchmark, so the identity check on the indexes wasn't triggering ( Not really sure what the best approach would be here; would you recommend adding an explicit check like |
black pandas
git diff upstream/master -u -- "*.py" | flake8 --diff
Makes sure that the output of
groupby.apply
is built up by value instead of by reference inreduction.pyx
to avoid the behavior from #28652.