Skip to content

BUG: Ensure same index is returned for slow and fast path in groupby.apply #31613

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 5 commits into from
May 27, 2020

Conversation

fjetter
Copy link
Member

@fjetter fjetter commented Feb 3, 2020

This fixes the internal check to be consistent with the slow apply path

@pep8speaks
Copy link

pep8speaks commented Feb 3, 2020

Hello @fjetter! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2020-05-27 12:48:36 UTC

@fjetter fjetter force-pushed the bugfix/fast_slow_index branch from 7169acf to ffbcbe3 Compare February 3, 2020 13:17
@TomAugspurger TomAugspurger added this to the 1.0.1 milestone Feb 3, 2020
@fjetter fjetter force-pushed the bugfix/fast_slow_index branch 3 times, most recently from 8838d36 to ad1ec2f Compare February 4, 2020 08:34
@@ -851,3 +851,39 @@ def test_apply_function_returns_non_pandas_non_scalar(function, expected_values)
result = df.groupby("groups").apply(function)
expected = pd.Series(expected_values, index=pd.Index(["A", "B"], name="groups"))
tm.assert_series_equal(result, expected)


def test_apply_fast_slow_identical():
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see if we have similar fast/slow tests; place these by them (if not ok).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are a few tests further up the file wich affect similar code paths, e.g. test_fast_apply or test_group_apply_once_per_group which seems a suitable place

@jreback
Copy link
Contributor

jreback commented Feb 5, 2020

are any of the other issues you referenced closed by this? (aside from the ones you listed)

@jreback jreback added the Bug label Feb 5, 2020
@jreback
Copy link
Contributor

jreback commented Feb 5, 2020

is this a regression from 0.25.x?

@fjetter
Copy link
Member Author

fjetter commented Feb 5, 2020

are any of the other issues you referenced closed by this? (aside from the ones you listed)

The issues I referenced in the issue are already closed. I primarily referenced them to highlight this as a hotspot.

is this a regression from 0.25.x

I believe this is not a regression and has been around for a while now. I'll check anyhow to confirm.

  • check if it a regression

@jorisvandenbossche
Copy link
Member

If this is not a regression, let's keep it for 1.1 then, I would say

@TomAugspurger
Copy link
Contributor

Agreed.

@jorisvandenbossche jorisvandenbossche modified the milestones: 1.0.1, 1.1 Feb 5, 2020
@jreback
Copy link
Contributor

jreback commented Feb 9, 2020

also pls merge master

@jbrockmendel
Copy link
Member

I've found that if I disable fast_apply, then there is a test failure in tests.groupby.test_whitelist for tshift. any idea why this doesnt affect that?

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks ok, but have comments

@jreback
Copy link
Contributor

jreback commented Apr 10, 2020

can you merge master and ping on green and we can look

@fjetter
Copy link
Member Author

fjetter commented Apr 15, 2020

sorry, was a bit swamped. Will rebase and address the comments

@fjetter fjetter force-pushed the bugfix/fast_slow_index branch from ad1ec2f to ec34b4a Compare April 15, 2020 09:14
@fjetter fjetter changed the title groupy apply: Ensure same index is returned for slow and fast path BUG: Ensure same index is returned for slow and fast path in groupby.apply Apr 15, 2020
@fjetter fjetter force-pushed the bugfix/fast_slow_index branch 3 times, most recently from 2187f4d to 0f6f8a2 Compare April 15, 2020 12:13
@fjetter
Copy link
Member Author

fjetter commented Apr 15, 2020

@jreback I rebased but it seems there is a (probably unrelated) test failure on the azure pipelines with test_tab_complete_* tests. Is this a known issue?

@jbrockmendel
Copy link
Member

there is a (probably unrelated) test failure on the azure pipelines with test_tab_complete_* tests. Is this a known issue?

It is unrelated, but not yet trouble-shot.

@simonjayhawkins
Copy link
Member

there is a (probably unrelated) test failure on the azure pipelines with test_tab_complete_* tests. Is this a known issue?

It is unrelated, but not yet trouble-shot.

@fjetter #33567 has been fixed. can you merge upstream/master.

@fjetter fjetter force-pushed the bugfix/fast_slow_index branch from 0f6f8a2 to 8503c04 Compare May 25, 2020 07:40
@fjetter
Copy link
Member Author

fjetter commented May 25, 2020

I rebased on master and kept the changelog for 1.1.0

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good, if you can update the whatsnew note a bit (ok to make a bit longer; re-reading this is not that common i think), but still want to signal the intentions to the user a bit more.

@@ -827,6 +827,7 @@ Groupby/resample/rolling
- Bug in :meth:`GroupBy.agg`, :meth:`GroupBy.transform`, and :meth:`GroupBy.resample` where subclasses are not preserved (:issue:`28330`)
- Bug in :meth:`GroupBy.rolling.apply` ignores args and kwargs parameters (:issue:`33433`)
- Bug in :meth:`DataFrameGroupby.std` and :meth:`DataFrameGroupby.sem` would modify grouped-by columns when ``as_index=False`` (:issue:`10355`)
- Bug in :meth:`core.groupby.DataFrameGroupBy.apply` where the result shape was incorrect when the output index was not identical to the input index (:issue:`31612`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you make this more user informative, e.g. what case does this happen

@@ -827,6 +827,7 @@ Groupby/resample/rolling
- Bug in :meth:`GroupBy.agg`, :meth:`GroupBy.transform`, and :meth:`GroupBy.resample` where subclasses are not preserved (:issue:`28330`)
- Bug in :meth:`GroupBy.rolling.apply` ignores args and kwargs parameters (:issue:`33433`)
- Bug in :meth:`DataFrameGroupby.std` and :meth:`DataFrameGroupby.sem` would modify grouped-by columns when ``as_index=False`` (:issue:`10355`)
- Bug in :meth:`core.groupby.DataFrameGroupBy.apply` where the result shape was incorrect when the output index was not identical to the input index (:issue:`31612`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you also list the other closed issue here #14927

@jreback
Copy link
Contributor

jreback commented May 25, 2020

ping on green.

@fjetter
Copy link
Member Author

fjetter commented May 27, 2020

Trying to rewrite the changelog. pretty difficult to actually list all tickets affected by this since most users don't go into depths that far.
There is also an umbrella issue which tracks these kind of things which we may be able to close now #13056 another opinion appreciated)

@jreback
Copy link
Contributor

jreback commented May 27, 2020

Trying to rewrite the changelog. pretty difficult to actually list all tickets affected by this since most users don't go into depths that far.
There is also an umbrella issue which tracks these kind of things which we may be able to close now #13056 another opinion appreciated)

what you wrote is ok, the idea is really to just list the affected tickets so that users can click thru if desired. when we close an issue we want it referenced in the whatsnew and a test to confirm :->

@jreback jreback merged commit c382a19 into pandas-dev:master May 27, 2020
@jreback
Copy link
Contributor

jreback commented May 27, 2020

thanks @fjetter very nice!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
8 participants