Skip to content

BUG Fixing columns dropped from multi index in group by transform GH4… #47840

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 13 commits into from
Aug 17, 2022

Conversation

mattB1989
Copy link
Contributor

@mattB1989 mattB1989 commented Jul 24, 2022

…7787

@pep8speaks
Copy link

pep8speaks commented Jul 24, 2022

Hello @mattB1989! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2022-08-16 17:05:43 UTC

@mattB1989
Copy link
Contributor Author

@github-actions pre-commit

@mattB1989
Copy link
Contributor Author

@mroeschke This is similar to #47672 but targets a different use case

@mattB1989
Copy link
Contributor Author

@rhshadrach let me know if there is anything else you'd like me to change

Copy link
Member

@rhshadrach rhshadrach left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the changes!

@rhshadrach
Copy link
Member

@mattB1989 - what do you think of this:

def test_group_on_empty_multiindex(transformation_func):
    # GH 47787
    # Ensure empty vs non-empty frame/series gives same results
    df = DataFrame(
        data=[[1, Timestamp("today"), 3, 4]],
        columns=["col_1", "col_2", "col_3", "col_4"],
    )
    df = df.set_index(["col_1", "col_2"])
    args = get_groupby_method_args(transformation_func, df)

    result = df.iloc[:0].groupby(["col_1"]).transform(transformation_func, *args)
    expected = df.groupby(["col_1"]).transform(transformation_func, *args)[:0]
    if transformation_func in ("diff", "shift"):
        expected = expected.astype(int)
    tm.assert_equal(result, expected)

    result = df["col_3"].iloc[:0].groupby(["col_1"]).transform(transformation_func, *args)
    expected = df["col_3"].groupby(["col_1"]).transform(transformation_func, *args).iloc[:0]
    if transformation_func in ("diff", "shift"):
        expected = expected.astype(int)
    tm.assert_equal(result, expected)

@mattB1989
Copy link
Contributor Author

@mattB1989 - what do you think of this:

def test_group_on_empty_multiindex(transformation_func):
    # GH 47787
    # Ensure empty vs non-empty frame/series gives same results
    df = DataFrame(
        data=[[1, Timestamp("today"), 3, 4]],
        columns=["col_1", "col_2", "col_3", "col_4"],
    )
    df = df.set_index(["col_1", "col_2"])
    args = get_groupby_method_args(transformation_func, df)

    result = df.iloc[:0].groupby(["col_1"]).transform(transformation_func, *args)
    expected = df.groupby(["col_1"]).transform(transformation_func, *args)[:0]
    if transformation_func in ("diff", "shift"):
        expected = expected.astype(int)
    tm.assert_equal(result, expected)

    result = df["col_3"].iloc[:0].groupby(["col_1"]).transform(transformation_func, *args)
    expected = df["col_3"].groupby(["col_1"]).transform(transformation_func, *args).iloc[:0]
    if transformation_func in ("diff", "shift"):
        expected = expected.astype(int)
    tm.assert_equal(result, expected)

That's much better than what I have. Committed it now

@mattB1989
Copy link
Contributor Author

there seems to be issues unrelated with the pr - not sure if I can rerun those manually

@rhshadrach
Copy link
Member

The 32-bit tests are failing for diff and shift:

tm.assert_equal(result, expected)
E       AssertionError: Attributes of DataFrame.iloc[:, 0] (column name="col_3") are different
E       
E       Attribute "dtype" are different
E       [left]:  int64
E       [right]: int32

Can you set the dtype to be np.int64 instead.

Copy link
Member

@rhshadrach rhshadrach left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm; failure is unrelated:

ERROR pandas/tests/tseries/offsets/test_dst.py - ValueError: could not conver...
ERROR pandas/tests/tseries/offsets/test_dst.py - ValueError: could not conver...

@rhshadrach rhshadrach requested a review from mroeschke August 17, 2022 01:22
@mroeschke mroeschke added this to the 1.5 milestone Aug 17, 2022
@mroeschke mroeschke merged commit 89d024e into pandas-dev:main Aug 17, 2022
@mroeschke
Copy link
Member

Thanks @mattB1989

@mattB1989 mattB1989 deleted the issue47787 branch August 17, 2022 05:28
@rhshadrach
Copy link
Member

Thanks for all the work here @mattB1989!

noatamir pushed a commit to noatamir/pandas that referenced this pull request Nov 9, 2022
pandas-dev#47840)

* BUG Fixing columns dropped from multi index in group by transform GH47787

* fixing pep8 issues

* testing series as well as dataframe

* fixing typo

* adding a timestamp in the index so tshift fails with the right error

* fixing formatting

* using the module assert

* adding a test on the dataframe

* improve test post review

* typo fix

* explicitly casting to int

Co-authored-by: matt <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BUG: Groupby on a partial index with fillna drops the grouped by key when the dataframe is empty
4 participants