Skip to content

TST: groupby of idxmax/min with mixed types #40795

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 10 commits into from
Apr 14, 2021

Conversation

bsun94
Copy link
Contributor

@bsun94 bsun94 commented Apr 6, 2021

@@ -1025,6 +1025,27 @@ def test_idxmax_mixed_dtype(self):
expected = Series([0, 2, 1, 2], index=[1, 2, 3, 4])
tm.assert_series_equal(result, expected)

# with convert dtypes
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please make a new test for this.

Could you try a parametrization?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure thing, I'll separate them out and use parametrization

@@ -1025,6 +1025,46 @@ def test_idxmax_mixed_dtype(self):
expected = Series([0, 2, 1, 2], index=[1, 2, 3, 4])
tm.assert_series_equal(result, expected)

@pytest.mark.parametrize(
"ID, value, expected_index, expected_value",
[([100, 100, 100, 200, 200, 200], [0, 0, 0, 1, 2, 0], [100, 200], [0, 4])],
Copy link
Member

@phofl phofl Apr 7, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please parametrize over ecpected values and the applied function only, you will need only one test then

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks @phofl , will redo and push

@jreback jreback changed the title 40346 groupby error TST: groupby of idxmax/min with mixed types Apr 7, 2021
@jreback jreback added Groupby Testing pandas testing functions or related to the test suite labels Apr 7, 2021
Copy link
Member

@rhshadrach rhshadrach left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR, looks good! Just some style/consistency nit-picks.

}
)

df = df.convert_dtypes().groupby("ID")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of using convert_dtypes, better to just specify dtype="Int64" in DataFrame construction.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd just like to confirm on this point - in the discussion I had with Marco in issue #40346, I understood the spirit of this new check as to replicate the conditions under which the original bug was found (in using convert_dtypes before idxmax) to make sure the bug was fixed. Would it not be preferable, in that case, to keep using the method which caused the original bug?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @rhshadrach , I'd just like to follow-up on the above - please let me know, and would be happy to push through all the updates together

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the reminder here, sorry it fell off my radar. From #40346 (comment) I believe @MarcoGorelli is just looking to create a DataFrame with nullable integers. My suggested change accomplishes this, but with using less of the pandas API. In general, it's desirable for tests to use as little of the API (besides what they are testing!) as possible.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, thanks @rhshadrach! I'll push the changes through

"func_name, expected_value",
[("idxmax", [0, 4]), ("idxmin", [0, 5])],
)
def test_idxmax_idxmin_convert_dtypes(self, func_name, expected_value):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just for consistency with other tests, and you rename func_name -> op

df = df.convert_dtypes().groupby("ID")
func = getattr(df, func_name)

result = func()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can just call in the line above, no need for the intermediate func.

@jreback jreback added this to the 1.3 milestone Apr 14, 2021
@jreback jreback merged commit 32d650d into pandas-dev:master Apr 14, 2021
@jreback
Copy link
Contributor

jreback commented Apr 14, 2021

thanks @bsun94

yeshsurya pushed a commit to yeshsurya/pandas that referenced this pull request Apr 21, 2021
JulianWgs pushed a commit to JulianWgs/pandas that referenced this pull request Jul 3, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Groupby Testing pandas testing functions or related to the test suite
Projects
None yet
Development

Successfully merging this pull request may close these issues.

TST: groupby of idxmax/min with mixed types
4 participants