BUG: DataFrameGroupBy.transform and ngroup do not work with cumcount #27858

charlesdong1991 · 2019-08-11T15:01:39Z

closes BUG: Groupby.transform('cumcount') fails #27472
tests added / passed
passes black pandas
passes git diff upstream/master -u -- "*.py" | flake8 --diff
whatsnew entry

doc/source/whatsnew/v0.25.1.rst

TomAugspurger · 2019-08-12T18:18:28Z

pandas/tests/groupby/test_transform.py

+def test_transformation_kernels_length(func):
+    # This test is to evaluate if after transformation, the index
+    # of transformed data is still the same with original DataFrame
+    # TODO: exceptions are fillna, tshift and corrwith


Ideally, these would be xfails in the param list

pytest.param('fillna', marks=pytest.mark.xfail(reason='reason')),

Can you do that here?

Sure, nice tip, thanks! I also found that not all transform functions are tested (e.g. fillna and tshift). Is it okay if I open another PR to add tests for the untested functions? Some of them might have bugs. @TomAugspurger

Yep, that would be great.
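For reference, a minimal sketch of the xfail-in-the-param-list pattern suggested above. The test name, the sample frame, and the reason string are illustrative assumptions rather than the code from this PR; `fillna` is marked only because the thread flags it as a potential problem.

```python
import pandas as pd
import pytest


@pytest.mark.parametrize(
    "func",
    [
        "cumsum",
        "rank",
        "shift",
        # Known or suspected failure: mark it xfail inside the param list
        # instead of excluding it in a comment. The reason is a placeholder.
        pytest.param(
            "fillna",
            marks=pytest.mark.xfail(reason="placeholder: link the tracking issue"),
        ),
    ],
)
def test_transform_keeps_index(func):
    # Hypothetical sample data, not taken from the PR.
    df = pd.DataFrame({"A": ["x", "x", "y"], "B": [1, 2, 3]})
    result = df.groupby("A").transform(func)
    # A transformation should return one row per input row.
    assert result.index.equals(df.index)
```

Keeping the failing case in the list means it is still collected and reported, so it shows up as soon as the underlying behaviour changes.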

pep8speaks · 2019-08-13T17:22:42Z

Hello @charlesdong1991! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2019-08-23 19:38:32 UTC

charlesdong1991 · 2019-08-13T18:25:22Z

Yeah, you are right: pushing to my PR again somehow got the flaky tests to pass! @TomAugspurger

TomAugspurger · 2019-08-13T18:53:32Z

pandas/tests/groupby/test_transform.py

+        "rank",
+        "shift",
+        "ngroup",
+        pytest.param("fillna", marks=pytest.mark.xfail(reason="TODO: potential bug")),


Can you find / open GitHub issues for each of these and refer to them in the reason?

Yeah, I am actually not sure this is a bug, so I added "potential" in front. Some functions in transform have never been tested, and these three are all among the untested ones. I am working on a PR to add tests for those cases; since this PR is not yet finished, I was planning to wait for its review and submit the other PR afterwards. But I will of course open an issue and refer these three to it!

doc/source/whatsnew/v0.25.1.rst

TomAugspurger · 2019-08-14T20:23:07Z

@jbrockmendel can you take a glance at this?

jbrockmendel · 2019-08-14T21:34:58Z

will do

pandas/core/groupby/generic.py

pandas/tests/groupby/test_transform.py

charlesdong1991 · 2019-08-15T08:26:10Z

Is the flaky test failure back 😢? I also saw the same failure in another PR. @TomAugspurger

TomAugspurger · 2019-08-15T13:13:36Z

Restarted the azure builds.

charlesdong1991 · 2019-08-15T14:19:37Z

Thanks Tom, looks okay except for one thing: what is codecov/project about, and how do I fix it? @TomAugspurger 😅

TomAugspurger · 2019-08-15T14:23:06Z

You can ignore that.

charlesdong1991 · 2019-08-19T16:35:14Z

any follow-up review? ^^ @TomAugspurger

charlesdong1991 · 2019-08-21T20:30:12Z

This is probably not going to be merged today, so shall I move the whatsnew note to 1.0.0? @TomAugspurger

TomAugspurger · 2019-08-21T20:45:36Z

Yes, that's probably best. I haven't had a chance to go through it again.

charlesdong1991 · 2019-08-21T20:51:41Z

OK, I will move it after 0.25.1 is released.

WillAyd · 2019-08-23T13:25:16Z

pandas/core/groupby/generic.py

-            if func in base.cythonized_kernels:
+
+            # transformation are added as well since they are broadcasted already
+            if func in base.cythonized_kernels or func in base.transformation_kernels:


Instead of this, can't you just add cumcount to the transformation list? This somewhat blurs the line between cythonized_kernels, which I think describes performance ops, and transformation_kernels, which describes output shape.
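To illustrate the output-shape distinction raised here, a small sketch using public pandas calls only. The frame and column names are made up for the example, and it does not reproduce the internal kernel lists in `pandas/core/groupby/base.py`.

```python
import pandas as pd

# Hypothetical sample data for illustration.
df = pd.DataFrame({"key": ["a", "a", "b", "a"], "val": [10, 20, 30, 40]})
grouped = df.groupby("key")

# A reduction returns one row per group.
reduced = grouped["val"].sum()

# cumcount and ngroup return one row per input row, aligned to df.index,
# which is the shape a transformation is expected to have.
counted = grouped.cumcount()
numbered = grouped.ngroup()

print(len(reduced), len(counted), len(numbered))
```

Because `cumcount` and `ngroup` already come back aligned to the original index, they behave like transformations in terms of shape, independent of how they are implemented.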

WillAyd · 2019-08-23T13:27:30Z

pandas/tests/groupby/test_transform.py

+@pytest.mark.parametrize(
+    "func",
+    [
+        "backfill",


Is there not already a fixture for these we can leverage?

ok, will change! thanks for review!

WillAyd · 2019-08-23T13:27:53Z

pandas/tests/groupby/test_transform.py

@@ -1074,3 +1076,58 @@ def test_transform_lambda_with_datetimetz():
        name="time",
    )
    assert_series_equal(result, expected)
+
+
+def test_transform_cumcount_ngroup():


Is this not already covered by test below? Ideally we can rely on fixtures rather than one-off tests like this

It's a bit different from the one below: the test below only ensures that, after transform, the index of the transformed result is the same as that of the original dataset.

But you are right, this should not be a one-off test. I will check whether there is already a fixture for this. @WillAyd

WillAyd · 2019-08-23T14:00:38Z

Can you also move to v1.0.0?

charlesdong1991 · 2019-08-23T14:02:44Z

Can you also move to v1.0.0?

yeah, of course.. was planning to do so

jreback

Can you merge master and address the comments?

jreback · 2019-09-08T16:18:08Z

pandas/core/groupby/base.py

@@ -158,6 +159,7 @@ def _gotitem(self, key, ndim, subset=None):
        "rank",
        "shift",
        "tshift",
+        "ngroup",


Can you put it in alphabetical order?

jreback · 2019-09-08T16:18:37Z

pandas/core/groupby/base.py

@@ -120,7 +122,6 @@ def _gotitem(self, key, ndim, subset=None):
        "mean",
        "median",
        "min",
-        "ngroup",


leave this one

jreback · 2019-09-08T16:20:08Z

also needs a whatsnew note (1.0, groupby bug fix)

jreback · 2019-10-06T22:47:48Z

@charlesdong1991 can you merge master and address any open comments?

jbrockmendel · 2019-10-28T20:10:27Z

@charlesdong1991 can you rebase

charlesdong1991 added 9 commits December 3, 2018 17:43

remove \n from docstring

7e461a1

fix conflicts

1314059

Merge remote-tracking branch 'upstream/master'

8bcb313

Fix issue 27472

1bcf325

Merge remote-tracking branch 'upstream/master' into fix_issue_27472

5c96549

prettier

42fdb0b

Fix issue 27468

9b1af14

Add code commit

d520bcc

Update whatsnew

c298eaf

charlesdong1991 changed the title ~~BUG: DataFrameGroupBy.transform does not work with cumcount~~ BUG: DataFrameGroupBy.transform and ngroup do not work with cumcount Aug 12, 2019

charlesdong1991 added 2 commits August 12, 2019 20:03

fix conflict

e2c51aa

revert change

c66ec83

TomAugspurger reviewed Aug 12, 2019

charlesdong1991 added 4 commits August 12, 2019 20:37

Code change based on review

4dc07eb

Fix test

9d60bbb

Fix linting error

0378a74

try to push again and pass flaky test

2a8e1ed

resumbit pr

9920344

TomAugspurger reviewed Aug 13, 2019

Code change based on review

9012e53

TomAugspurger added the Groupby label Aug 14, 2019

TomAugspurger added this to the 0.25.1 milestone Aug 14, 2019

jbrockmendel reviewed Aug 15, 2019

pandas/core/groupby/generic.py

jbrockmendel reviewed Aug 15, 2019

pandas/tests/groupby/test_transform.py

jbrockmendel reviewed Aug 15, 2019

pandas/tests/groupby/test_transform.py

Code change based on review

8d360a2

charlesdong1991 added 2 commits August 15, 2019 09:15

Merge remote-tracking branch 'upstream/master' into fix_issue_27472

8a371c6

Fix linting

e54f024

Merge remote-tracking branch 'upstream/master' into fix_issue_27472

5a33608

TomAugspurger modified the milestones: 0.25.1, 1.0 Aug 22, 2019

WillAyd requested changes Aug 23, 2019

remove from whatsnew

45c5339

jreback requested changes Sep 8, 2019

jreback added the Bug label Sep 8, 2019

charlesdong1991 closed this Oct 29, 2019

TomAugspurger mentioned this pull request Nov 12, 2019

BUG: Groupby.transform('cumcount') fails #27472

Closed
