-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
BUG: Index.get_loc raising incorrect error, closes #29189 #29700
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from 2 commits
e6ea4da
f3e6147
d4eb26d
a57f197
edcf818
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -1951,3 +1951,13 @@ def test_groupby_only_none_group(): | |
expected = pd.Series([np.nan], name="x") | ||
|
||
tm.assert_series_equal(actual, expected) | ||
|
||
|
||
def test_groupby_duplicate_index(): | ||
# GH#29189 the groupby call here used to raise | ||
ser = pd.Series([2, 5, 6, 8], index=[2.0, 4.0, 4.0, 5.0]) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Do we not already have a similar test elsewhere? Not a blocker for this PR but feels like we should have a parametrized or more broad-reaching test for this in groupby There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. That's a good question and I dont have a good answer. i'll take a look tomorrow; most likely parametrizing/de-duplicating will need a dedicated pass There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It looks like this affects the exceptions we get for other Index subclasses too, so I'll prioritize fleshing this out in the AM |
||
gb = ser.groupby(level=0) | ||
|
||
result = gb.mean() | ||
expected = pd.Series([2, 5.5, 8], index=[2.0, 4.0, 5.0]) | ||
tm.assert_series_equal(result, expected) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm what do you think about just catching the TypeError in groupby? Seems a little strange to catch and re-raise as a KeyError
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My knee-jerk reaction is that this works against the direction we've been working on for a few weeks in groupby. But if there's a compelling case that we can't catch at a lower level, it definitely beats not-catching
I guess that depends on the official/desired signature/purpose of get_loc (the docstring doesnt say anything about what it raises), but I think the behavior below is probably not what we want:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not super familiar with those aspects of indexing but would have expected all to raise a TypeError when the type of the object being passed in is incompatible with the dtype and a KeyError when valid type but simply not present. probably more of a @jreback question
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no, we specificially catch everything in
.get_loc
, which to enable it to always return an indexer (might be -1).