Skip to content

BUG: DataFrameGroupBy.transform and ngroup do not work with cumcount #27858

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed

Conversation

charlesdong1991
Copy link
Member

@charlesdong1991 charlesdong1991 changed the title BUG: DataFrameGroupBy.transform does not work with cumcount BUG: DataFrameGroupBy.transform and ngroup do not work with cumcount Aug 12, 2019
def test_transformation_kernels_length(func):
# This test is to evaluate if after transformation, the index
# of transformed data is still the same with original DataFrame
# TODO: exceptions are fillna, tshfit and corrwith
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally, these would be xfails in the param list

    pytest.param('fillna', marks=pytest.mark.xfail('reason'),

Can you do that here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, nice tip! thanks! Also i found out not all transform functions are tested, e.g. fillna or tshift etc, is it okay I open another PR to add tests for the untested function? (some might have bug) @TomAugspurger

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, that would be great.

@pep8speaks
Copy link

pep8speaks commented Aug 13, 2019

Hello @charlesdong1991! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2019-08-23 19:38:32 UTC

@charlesdong1991
Copy link
Member Author

yeah, you are right, pushing to my pr again will somehow pass flaky tests! @TomAugspurger

"rank",
"shift",
"ngroup",
pytest.param("fillna", marks=pytest.mark.xfail(reason="TODO: potential bug")),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you find / open GitHub issues for each of these and refer to them in the reason?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, actually I am not sure this is a bug, so add potential in front. Actually, there are some functions in transform that never got tested, those three are all among untested funcs. I am working on a pr now to add tests for those cases, since this PR is yet finished, I was supposed to wait for the review in this PR and afterwards submit the other PR. But I will of course open issue and refer those three to the new issue!

@TomAugspurger TomAugspurger added this to the 0.25.1 milestone Aug 14, 2019
@TomAugspurger
Copy link
Contributor

@jbrockmendel can you take a glance at this?

@jbrockmendel
Copy link
Member

will do

@charlesdong1991
Copy link
Member Author

is flaky test failure back 😢? also saw the same failure in other PR @TomAugspurger

@TomAugspurger
Copy link
Contributor

Restarted the azure builds.

@charlesdong1991
Copy link
Member Author

Thanks Tom, looks okay except one thing: what is codecov/project about? how to fix such issue? @TomAugspurger 😅

@TomAugspurger
Copy link
Contributor

TomAugspurger commented Aug 15, 2019 via email

@charlesdong1991
Copy link
Member Author

any follow-up review? ^^ @TomAugspurger

@charlesdong1991
Copy link
Member Author

charlesdong1991 commented Aug 21, 2019

probably this is not gonna be merged today, then shall I move whatsnew note to 1.0.0? @TomAugspurger

@TomAugspurger
Copy link
Contributor

TomAugspurger commented Aug 21, 2019 via email

@charlesdong1991
Copy link
Member Author

ok, will move after the 0.25.1 releases

@TomAugspurger TomAugspurger modified the milestones: 0.25.1, 1.0 Aug 22, 2019
if func in base.cythonized_kernels:

# transformation are added as well since they are broadcasted already
if func in base.cythonized_kernels or func in base.transformation_kernels:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of this can't you just add cumcount to the transformation list? This somewhat blurs the line between cythonized_kernels which I think describe performance ops and transformation_kernels which describe output shape

@pytest.mark.parametrize(
"func",
[
"backfill",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there not already a fixture for these we can leverage?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, will change! thanks for review!

@@ -1074,3 +1076,58 @@ def test_transform_lambda_with_datetimetz():
name="time",
)
assert_series_equal(result, expected)


def test_transform_cumcount_ngroup():
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this not already covered by test below? Ideally we can rely on fixtures rather than one-off tests like this

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's a bit different than below... the test below is just to ensure after transform, the index of transformed result is the same as the original dataset.

But you are right, this should not be in one-off tests... will look for if there is fixture for this already @WillAyd

@WillAyd
Copy link
Member

WillAyd commented Aug 23, 2019

Can you also move to v1.0.0?

@charlesdong1991
Copy link
Member Author

Can you also move to v1.0.0?

yeah, of course.. was planning to do so

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you merge master and update to comments

@@ -158,6 +159,7 @@ def _gotitem(self, key, ndim, subset=None):
"rank",
"shift",
"tshift",
"ngroup",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you put it in the alphabetical order

@@ -120,7 +122,6 @@ def _gotitem(self, key, ndim, subset=None):
"mean",
"median",
"min",
"ngroup",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

leave this one

@jreback jreback added the Bug label Sep 8, 2019
@jreback
Copy link
Contributor

jreback commented Sep 8, 2019

also needs a whatsnew note (1.0, groupby bug fix)

@jreback
Copy link
Contributor

jreback commented Oct 6, 2019

@charlesdong1991 can you merge master and update to any open comments

@jbrockmendel
Copy link
Member

@charlesdong1991 can you rebase

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BUG: Groupby.transform('cumcount') fails
6 participants