BUG: Fix groupby.apply #28662

dsaxton · 2019-09-27T23:27:41Z

closes #28652 (groupby indexing is giving the wrong index)
tests added / passed
passes black pandas
passes git diff upstream/master -u -- "*.py" | flake8 --diff
whatsnew entry

Makes sure that the output of groupby.apply is built up by value instead of by reference in reduction.pyx to avoid the behavior from #28652.
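
A minimal sketch of the by-reference versus by-value distinction, written in plain Python rather than the actual Cython in reduction.pyx. `Chunk`, `collect_by_reference`, and `collect_by_value` are hypothetical names used only for illustration, and the shared, in-place refilled buffer is an assumption about the general pattern, not a copy of the pandas code:

```python
class Chunk:
    """Hypothetical reusable buffer, refilled in place for each group."""

    def __init__(self):
        self.index = []


def collect_by_reference(groups):
    chunk = Chunk()
    results = []
    for labels in groups:
        chunk.index[:] = labels      # overwrite the shared buffer in place
        results.append(chunk.index)  # stores a reference to the same list
    return results


def collect_by_value(groups):
    chunk = Chunk()
    results = []
    for labels in groups:
        chunk.index[:] = labels
        results.append(list(chunk.index))  # stores an independent copy
    return results


groups = [[0, 1], [2, 3]]
by_reference = collect_by_reference(groups)
by_value = collect_by_value(groups)
```

In this sketch every entry of `by_reference` is the same list object and holds the last group's labels, while `by_value` keeps one distinct entry per group.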

pandas/_libs/reduction.pyx

pandas/tests/frame/test_apply.py

dsaxton · 2019-09-28T14:37:04Z

Somewhat unrelated, but does isort skip over pyx files during checks? I've noticed it doesn't like the import order / formatting of reduction.pyx.

pandas/tests/frame/test_apply.py

alimcmaster1 · 2019-09-29T20:03:55Z

> Somewhat unrelated, but does isort skip over pyx files during checks? I've noticed it doesn't like the import order / formatting of reduction.pyx.

Correct, the --recursive option only checks .py and .pyi files, as per the isort code here.

That said, one can explicitly sort a given pyx file, e.g. running isort pandas/_libs/reduction.pyx works. Maybe @timothycrosley knows if this is expected behaviour?

cc @jbrockmendel to comment on whether it's worth us bothering to isort .pyx files?

doc/source/whatsnew/v1.0.0.rst

timothycrosley · 2019-10-01T04:10:55Z

@alimcmaster1, yes this is intended, for now. There has been an open ticket for some time to add full Cython awareness (cimports, cdefs, etc...) but for now it remains incomplete. Because of this, we currently intentionally don't automatically include pyx files. Once this ticket is resolved, isort will include pyx files by default.

Hope this is helpful!

~Timothy

pandas/tests/frame/test_apply.py

pandas/_libs/reduction.pyx

jbrockmendel · 2019-10-01T15:09:27Z

> to comment on whether it's worth us bothering to isort .pyx files?

I've tried to make these internally consistent, but isort won't work on these since they are not valid Python.

Update: looks like I was wrong about isort not working on them. Still, for now "get it close enough manually" makes sense to me.

WillAyd

lgtm @jreback

pandas/_libs/reduction.pyx

jreback · 2019-10-22T13:05:42Z

lgtm. can you rebase and add a small comment? ping on green.

pandas/_libs/reduction.pyx

jbrockmendel · 2019-11-19T23:48:42Z

Another approach we could take would be to update some combination of libreduction._check_result_array and libreduction._extract_result to check for Index objects and redirect us back to the pure-python implementation, which already does this correctly.
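
A rough sketch of that "detect and fall back" idea, in plain Python. None of these names exist in pandas: `FakeIndex`, `NeedsSlowPath`, `fast_apply`, `slow_apply`, and `apply_with_fallback` are all hypothetical, and the real checks in libreduction may look quite different:

```python
class NeedsSlowPath(Exception):
    """Hypothetical signal that the fast path should not handle a result."""


class FakeIndex(list):
    """Hypothetical stand-in for an Index-like result."""


def fast_apply(func, groups):
    results = []
    for group in groups:
        res = func(group)
        if isinstance(res, FakeIndex):
            # Bail out instead of storing an object the fast path mishandles.
            raise NeedsSlowPath
        results.append(res)
    return results


def slow_apply(func, groups):
    # Each call gets its own fresh copy of the group, so results never alias.
    return [func(list(group)) for group in groups]


def apply_with_fallback(func, groups):
    try:
        return fast_apply(func, groups), "fast"
    except NeedsSlowPath:
        return slow_apply(func, groups), "slow"


scalar_result = apply_with_fallback(sum, [[1, 2], [3]])
index_result = apply_with_fallback(FakeIndex, [[1, 2], [3]])
```

The trade-off in this sketch is that the fast path stays untouched for scalar results, at the cost of redoing the work on the slower path whenever an Index-like result shows up.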

alimcmaster1 · 2019-12-23T04:10:39Z

@dsaxton - feels like this is almost there? Seems like the test failures mentioned above are now resolved.

jreback · 2019-12-27T19:59:19Z

can you merge master and I will look again

jreback · 2020-01-01T16:21:49Z

thanks @dsaxton

not generally happy with the way we do reductions in cython, but that is a larger discussion / refactor effort.

jorisvandenbossche · 2020-01-06T15:50:31Z

@dsaxton this caused a performance regression for some of our benchmarks, eg https://pandas.pydata.org/speed/pandas/#groupby.Apply.time_copy_overhead_single_col?commits=6efc2379-b9de33e3. Now, it might be this is unavoidable since the extra copy is to avoid a bug, but can you take a look at that benchmark?

dsaxton · 2020-01-07T00:30:08Z

> @dsaxton this caused a performance regression for some of our benchmarks, eg https://pandas.pydata.org/speed/pandas/#groupby.Apply.time_copy_overhead_single_col?commits=6efc2379-b9de33e3. Now, it might be this is unavoidable since the extra copy is to avoid a bug, but can you take a look at that benchmark?

@jorisvandenbossche Interesting, it looks like the copy was previously avoided because each group was getting copied by the function in the benchmark, so the identity check on the indexes wasn't triggering (if piece.index is chunk.index).

Not really sure what the best approach would be here; would you recommend adding an explicit check like piece.index is not chunk.index somewhere so that we know copying isn't necessary, or is that getting a bit too messy?
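
To make the question concrete, here is a minimal sketch of copying only when the result still shares its index with the reused chunk. It is plain Python with hypothetical names (`Frame`, `collect`), not the code in reduction.pyx, and it assumes the chunk is a shared buffer refilled in place:

```python
class Frame:
    """Hypothetical minimal object carrying only an index."""

    def __init__(self, index):
        self.index = index


def collect(func, groups):
    chunk = Frame([])  # shared buffer, refilled in place for each group
    results = []
    n_copies = 0
    for labels in groups:
        chunk.index[:] = labels
        piece = func(chunk)
        if piece.index is chunk.index:
            # The result still aliases the shared buffer, so copy it.
            piece = Frame(list(piece.index))
            n_copies += 1
        results.append(piece.index)
    return results, n_copies


groups = [[0, 1], [2, 3]]
# Function that returns the chunk itself: a copy is needed for every group.
aliased, aliased_copies = collect(lambda c: c, groups)
# Function that already copies, as in the benchmark: no extra copy is made.
copied, copied_copies = collect(lambda c: Frame(list(c.index)), groups)
```

Both calls produce the same per-group results in this sketch; only the number of extra copies differs.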

Daniel Saxton added 4 commits September 27, 2019 18:08

Add groupby.apply test

9376d3e

Copy before append

b093f98

Add whatsnew entry

ef968a5

Blacken

714f296

jbrockmendel reviewed Sep 27, 2019

pandas/_libs/reduction.pyx (outdated)

gfyoung added the Bug and Groupby labels Sep 28, 2019

gfyoung reviewed Sep 28, 2019

pandas/tests/frame/test_apply.py (outdated)

gfyoung reviewed Sep 28, 2019

pandas/tests/frame/test_apply.py (outdated)

gfyoung added the Indexing label (related to indexing on series/frames, not to indexes themselves) Sep 28, 2019

Daniel Saxton added 2 commits September 28, 2019 09:22

Update test

abd1e36

Use is_scalar check

0d46af0

gfyoung reviewed Sep 29, 2019

pandas/tests/frame/test_apply.py (outdated)

Edit test

980b239

gfyoung approved these changes Sep 29, 2019

WillAyd requested changes Oct 1, 2019

doc/source/whatsnew/v1.0.0.rst (outdated)

jreback requested changes Oct 1, 2019

pandas/tests/frame/test_apply.py (outdated)

jreback requested changes Oct 1, 2019

pandas/_libs/reduction.pyx (outdated)

WillAyd mentioned this pull request Oct 1, 2019

Added the GH Issue number note to Writing Tests in Docs #28705

Closed

1 task

Daniel Saxton added 7 commits October 1, 2019 17:50

Move test into groupby

80b2860

Fix

d07f576

Remove self

d74f53c

Merge branch 'master' into grp-apply

612f5b0

Merge branch 'master' into grp-apply

328ea12

Copy once

9673af5

Check for copy attribute

a1bbd9f

Edit release note

b3efdf8

WillAyd approved these changes Oct 21, 2019

jreback reviewed Oct 22, 2019

pandas/_libs/reduction.pyx

jreback added this to the 1.0 milestone Oct 22, 2019

Daniel Saxton added 3 commits October 22, 2019 18:12

Add comment

f387083

Merge branch 'master' into grp-apply

5042c1b

Merge branch 'master' into grp-apply

aae177f

jreback reviewed Oct 25, 2019

pandas/_libs/reduction.pyx

Merge branch 'master' into grp-apply

cdcb02f

jbrockmendel reviewed Nov 12, 2019

pandas/_libs/reduction.pyx (outdated)

Move import

8fcad2b

Merge branch 'master' into grp-apply

7468de8

Merge branch 'master' into grp-apply

882440b

jreback approved these changes Jan 1, 2020

jreback merged commit 7c9042a into pandas-dev:master Jan 1, 2020

dsaxton deleted the grp-apply branch January 1, 2020 18:52

hweecat pushed a commit to hweecat/pandas that referenced this pull request Jan 1, 2020

BUG: Fix groupby.apply (pandas-dev#28662)

65c9cb8

jbrockmendel mentioned this pull request Jan 31, 2020

TypeError: copy() takes no keyword arguments #31441

Closed

fjetter mentioned this pull request Feb 3, 2020

Groupby apply result shape depends on internal apply path #31612

Closed

simonjayhawkins mentioned this pull request Apr 4, 2020

groupby apply failed on dataframe with DatetimeIndex #26182

Closed
