Skip to content

BUG: Index.get_loc raising incorrect error, closes #29189 #29700

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 5 commits into from
Nov 25, 2019
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion doc/source/whatsnew/v1.0.0.rst
Original file line number Diff line number Diff line change
Expand Up @@ -453,7 +453,7 @@ Indexing
- Bug in :meth:`Float64Index.astype` where ``np.inf`` was not handled properly when casting to an integer dtype (:issue:`28475`)
- :meth:`Index.union` could fail when the left contained duplicates (:issue:`28257`)
- :meth:`Index.get_indexer_non_unique` could fail with `TypeError` in some cases, such as when searching for ints in a string index (:issue:`28257`)
-
- Bug in :meth:`Float64Index.get_loc` incorrectly raising ``TypeError`` instead of ``KeyError`` (:issue:`29189`)

Missing
^^^^^^^
Expand Down
8 changes: 6 additions & 2 deletions pandas/_libs/index.pyx
Original file line number Diff line number Diff line change
Expand Up @@ -141,8 +141,12 @@ cdef class IndexEngine:

if self.is_monotonic_increasing:
values = self._get_index_values()
left = values.searchsorted(val, side='left')
right = values.searchsorted(val, side='right')
try:
left = values.searchsorted(val, side='left')
right = values.searchsorted(val, side='right')
except TypeError:
# e.g. GH#29189 get_loc(None) with a Float64Index
raise KeyError(val)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm what do you think about just catching the TypeError in groupby? Seems a little strange to catch and re-raise as a KeyError

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm what do you think about just catching the TypeError in groupby

My knee-jerk reaction is that this works against the direction we've been working on for a few weeks in groupby. But if there's a compelling case that we can't catch at a lower level, it definitely beats not-catching

Seems a little strange to catch and re-raise as a KeyError

I guess that depends on the official/desired signature/purpose of get_loc (the docstring doesnt say anything about what it raises), but I think the behavior below is probably not what we want:

ser = pd.Series([2, 5, 6, 8], index=[2.0, 4.0, 4.0, 5.0])
ser2 = ser.set_axis(ser.index.astype("int64"))

>>> ser[None]   # <-- TypeError
>>> ser2[None]   # <-- IndexError

# if we slice so as to not have duplicates...
>>> ser[::2][None]  # <-- KeyError
>>> ser2[::2][None]  # <-- KeyError

# if we slice so as to not be monotonic increasing...
>>> ser[::-1][None]  # <-- KeyError
>>> ser2[::-1][None]  # <-- KeyError

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not super familiar with those aspects of indexing but would have expected all to raise a TypeError when the type of the object being passed in is incompatible with the dtype and a KeyError when valid type but simply not present. probably more of a @jreback question

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no, we specificially catch everything in .get_loc, which to enable it to always return an indexer (might be -1).


diff = right - left
if diff == 0:
Expand Down
10 changes: 10 additions & 0 deletions pandas/tests/groupby/test_groupby.py
Original file line number Diff line number Diff line change
Expand Up @@ -1953,6 +1953,16 @@ def test_groupby_only_none_group():
tm.assert_series_equal(actual, expected)


def test_groupby_duplicate_index():
# GH#29189 the groupby call here used to raise
ser = pd.Series([2, 5, 6, 8], index=[2.0, 4.0, 4.0, 5.0])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we not already have a similar test elsewhere? Not a blocker for this PR but feels like we should have a parametrized or more broad-reaching test for this in groupby

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good question and I dont have a good answer. i'll take a look tomorrow; most likely parametrizing/de-duplicating will need a dedicated pass

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like this affects the exceptions we get for other Index subclasses too, so I'll prioritize fleshing this out in the AM

gb = ser.groupby(level=0)

result = gb.mean()
expected = pd.Series([2, 5.5, 8], index=[2.0, 4.0, 5.0])
tm.assert_series_equal(result, expected)


@pytest.mark.parametrize("bool_agg_func", ["any", "all"])
def test_bool_aggs_dup_column_labels(bool_agg_func):
# 21668
Expand Down
13 changes: 13 additions & 0 deletions pandas/tests/indexing/test_indexing.py
Original file line number Diff line number Diff line change
Expand Up @@ -1209,3 +1209,16 @@ def test_1tuple_without_multiindex():
result = ser[key]
expected = ser[key[0]]
tm.assert_series_equal(result, expected)


def test_duplicate_index_mistyped_key_raises_keyerror():
# GH#29189 float_index.get_loc(None) should raise KeyError, not TypeError
ser = pd.Series([2, 5, 6, 8], index=[2.0, 4.0, 4.0, 5.0])
with pytest.raises(KeyError):
ser[None]

with pytest.raises(KeyError):
ser.index.get_loc(None)

with pytest.raises(KeyError):
ser.index._engine.get_loc(None)