Skip to content

TST: groupby of idxmax/min with mixed types #40795

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 10 commits into from
Apr 14, 2021
22 changes: 22 additions & 0 deletions pandas/tests/frame/test_reductions.py
Original file line number Diff line number Diff line change
Expand Up @@ -1025,6 +1025,28 @@ def test_idxmax_mixed_dtype(self):
expected = Series([0, 2, 1, 2], index=[1, 2, 3, 4])
tm.assert_series_equal(result, expected)

@pytest.mark.parametrize(
"func_name, expected_value",
[("idxmax", [0, 4]), ("idxmin", [0, 5])],
)
def test_idxmax_idxmin_convert_dtypes(self, func_name, expected_value):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just for consistency with other tests, and you rename func_name -> op

df = DataFrame(
{
"ID": [100, 100, 100, 200, 200, 200],
"value": [0, 0, 0, 1, 2, 0],
}
)

df = df.convert_dtypes().groupby("ID")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of using convert_dtypes, better to just specify dtype="Int64" in DataFrame construction.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd just like to confirm on this point - in the discussion I had with Marco in issue #40346, I understood the spirit of this new check as to replicate the conditions under which the original bug was found (in using convert_dtypes before idxmax) to make sure the bug was fixed. Would it not be preferable, in that case, to keep using the method which caused the original bug?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @rhshadrach , I'd just like to follow-up on the above - please let me know, and would be happy to push through all the updates together

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the reminder here, sorry it fell off my radar. From #40346 (comment) I believe @MarcoGorelli is just looking to create a DataFrame with nullable integers. My suggested change accomplishes this, but with using less of the pandas API. In general, it's desirable for tests to use as little of the API (besides what they are testing!) as possible.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, thanks @rhshadrach! I'll push the changes through

func = getattr(df, func_name)

result = func()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can just call in the line above, no need for the intermediate func.

expected = DataFrame(
{"value": expected_value},
index=Index([100, 200], dtype="object", name="ID"),
)
tm.assert_frame_equal(result, expected)

def test_idxmax_dt64_multicolumn_axis1(self):
dti = date_range("2016-01-01", periods=3)
df = DataFrame({3: dti, 4: dti[::-1]})
Expand Down