-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
BUG: DataFrameGroupBy.transform and ngroup do not work with cumcount #27858
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
BUG: DataFrameGroupBy.transform and ngroup do not work with cumcount #27858
Conversation
def test_transformation_kernels_length(func): | ||
# This test is to evaluate if after transformation, the index | ||
# of transformed data is still the same with original DataFrame | ||
# TODO: exceptions are fillna, tshfit and corrwith |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ideally, these would be xfails in the param list
pytest.param('fillna', marks=pytest.mark.xfail('reason'),
Can you do that here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, nice tip! thanks! Also i found out not all transform functions are tested, e.g. fillna
or tshift
etc, is it okay I open another PR to add tests for the untested function? (some might have bug) @TomAugspurger
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, that would be great.
Hello @charlesdong1991! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found: There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻 Comment last updated at 2019-08-23 19:38:32 UTC |
yeah, you are right, pushing to my pr again will somehow pass flaky tests! @TomAugspurger |
"rank", | ||
"shift", | ||
"ngroup", | ||
pytest.param("fillna", marks=pytest.mark.xfail(reason="TODO: potential bug")), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you find / open GitHub issues for each of these and refer to them in the reason?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, actually I am not sure this is a bug, so add potential
in front. Actually, there are some functions in transform
that never got tested, those three are all among untested funcs. I am working on a pr now to add tests for those cases, since this PR is yet finished, I was supposed to wait for the review in this PR and afterwards submit the other PR. But I will of course open issue and refer those three to the new issue!
@jbrockmendel can you take a glance at this? |
will do |
is flaky test failure back 😢? also saw the same failure in other PR @TomAugspurger |
Restarted the azure builds. |
Thanks Tom, looks okay except one thing: what is |
You can ignore that.
…On Thu, Aug 15, 2019 at 9:19 AM Kaiqi Dong ***@***.***> wrote:
Thanks Tom, looks okay except one thing: what is codecov/project about?
how to fix such issue? @TomAugspurger <https://github.com/TomAugspurger>
😅
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#27858?email_source=notifications&email_token=AAKAOIX2X44TKQ7AFY5S243QEVQYHA5CNFSM4IK373YKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOD4L56JY#issuecomment-521658151>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AAKAOIQ3U353VUT77DYHMRLQEVQYHANCNFSM4IK373YA>
.
|
any follow-up review? ^^ @TomAugspurger |
probably this is not gonna be merged today, then shall I move whatsnew note to 1.0.0? @TomAugspurger |
Yes, that's probably best. I haven't had a chance to go through it again.
…On Wed, Aug 21, 2019 at 3:30 PM Kaiqi Dong ***@***.***> wrote:
if this is not gonna be merged today, then shall I move whatsnew note to
1.0.0? @TomAugspurger <https://github.com/TomAugspurger>
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#27858?email_source=notifications&email_token=AAKAOIX5VHOLTYPDD7TYZ3DQFWQV7A5CNFSM4IK373YKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOD43A4JA#issuecomment-523636260>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AAKAOISBW4IMYL7QBNGGL2DQFWQV7ANCNFSM4IK373YA>
.
|
ok, will move after the 0.25.1 releases |
if func in base.cythonized_kernels: | ||
|
||
# transformation are added as well since they are broadcasted already | ||
if func in base.cythonized_kernels or func in base.transformation_kernels: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of this can't you just add cumcount to the transformation list? This somewhat blurs the line between cythonized_kernels
which I think describe performance ops and transformation_kernels
which describe output shape
@pytest.mark.parametrize( | ||
"func", | ||
[ | ||
"backfill", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there not already a fixture for these we can leverage?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, will change! thanks for review!
@@ -1074,3 +1076,58 @@ def test_transform_lambda_with_datetimetz(): | |||
name="time", | |||
) | |||
assert_series_equal(result, expected) | |||
|
|||
|
|||
def test_transform_cumcount_ngroup(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this not already covered by test below? Ideally we can rely on fixtures rather than one-off tests like this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's a bit different than below... the test below is just to ensure after transform
, the index of transformed result is the same as the original dataset.
But you are right, this should not be in one-off tests... will look for if there is fixture for this already @WillAyd
Can you also move to v1.0.0? |
yeah, of course.. was planning to do so |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you merge master and update to comments
@@ -158,6 +159,7 @@ def _gotitem(self, key, ndim, subset=None): | |||
"rank", | |||
"shift", | |||
"tshift", | |||
"ngroup", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you put it in the alphabetical order
@@ -120,7 +122,6 @@ def _gotitem(self, key, ndim, subset=None): | |||
"mean", | |||
"median", | |||
"min", | |||
"ngroup", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
leave this one
also needs a whatsnew note (1.0, groupby bug fix) |
@charlesdong1991 can you merge master and update to any open comments |
@charlesdong1991 can you rebase |
black pandas
git diff upstream/master -u -- "*.py" | flake8 --diff