-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
TST: groupby of idxmax/min with mixed types #40795
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
bsun94
commented
Apr 6, 2021
- closes TST: groupby of idxmax/min with mixed types #40346
- tests added / passed
- Ensure all linting tests pass, see here for how to run them
- whatsnew entry - didn't think this was necessary, just added a test case to a solved problem.
…cel file, and fixed typo via pandas-dev#38990
…eets.rst Co-authored-by: Aidan Feldman <[email protected]>
…st_idxmax_mixed_dtypes for pandas-dev#40346
@@ -1025,6 +1025,27 @@ def test_idxmax_mixed_dtype(self): | |||
expected = Series([0, 2, 1, 2], index=[1, 2, 3, 4]) | |||
tm.assert_series_equal(result, expected) | |||
|
|||
# with convert dtypes |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please make a new test for this.
Could you try a parametrization?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure thing, I'll separate them out and use parametrization
@@ -1025,6 +1025,46 @@ def test_idxmax_mixed_dtype(self): | |||
expected = Series([0, 2, 1, 2], index=[1, 2, 3, 4]) | |||
tm.assert_series_equal(result, expected) | |||
|
|||
@pytest.mark.parametrize( | |||
"ID, value, expected_index, expected_value", | |||
[([100, 100, 100, 200, 200, 200], [0, 0, 0, 1, 2, 0], [100, 200], [0, 4])], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please parametrize over ecpected values and the applied function only, you will need only one test then
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks @phofl , will redo and push
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR, looks good! Just some style/consistency nit-picks.
} | ||
) | ||
|
||
df = df.convert_dtypes().groupby("ID") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of using convert_dtypes
, better to just specify dtype="Int64"
in DataFrame construction.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd just like to confirm on this point - in the discussion I had with Marco in issue #40346, I understood the spirit of this new check as to replicate the conditions under which the original bug was found (in using convert_dtypes
before idxmax
) to make sure the bug was fixed. Would it not be preferable, in that case, to keep using the method which caused the original bug?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @rhshadrach , I'd just like to follow-up on the above - please let me know, and would be happy to push through all the updates together
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the reminder here, sorry it fell off my radar. From #40346 (comment) I believe @MarcoGorelli is just looking to create a DataFrame with nullable integers. My suggested change accomplishes this, but with using less of the pandas API. In general, it's desirable for tests to use as little of the API (besides what they are testing!) as possible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, thanks @rhshadrach! I'll push the changes through
"func_name, expected_value", | ||
[("idxmax", [0, 4]), ("idxmin", [0, 5])], | ||
) | ||
def test_idxmax_idxmin_convert_dtypes(self, func_name, expected_value): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just for consistency with other tests, and you rename func_name -> op
df = df.convert_dtypes().groupby("ID") | ||
func = getattr(df, func_name) | ||
|
||
result = func() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can just call in the line above, no need for the intermediate func
.
thanks @bsun94 |