BUG: Fix Series.get() for ExtensionArray and Categorical #20885

Merged: 9 commits, May 9, 2018
20 changes: 13 additions & 7 deletions pandas/core/indexes/base.py
@@ -3081,13 +3081,19 @@ def get_value(self, series, key):
        # if we have something that is Index-like, then
        # use this, e.g. DatetimeIndex
        s = getattr(series, '_values', None)
        if isinstance(s, (ExtensionArray, Index)) and is_scalar(key):
            try:
                return s[key]
Contributor

If you just use .get_loc(key) on an Index as well, does this work? (Perf should be similar.) That way we can avoid separating the two cases.

            except (IndexError, ValueError):

                # invalid type as an indexer
                pass
        if is_scalar(key):
            if isinstance(s, (Index, ExtensionArray)):
Contributor

This could be an `and` here.

Contributor Author

Fixed.

                # GH 20825
                # Unify Index and ExtensionArray treatment
                # First try to convert the key to a location
                # If that fails, see if key is an integer, and
                # try that
                try:
                    iloc = self.get_loc(key)
                    return s[iloc]
                except KeyError:
Contributor

What happened to the IndexError case? Is that not possible now?

                    if is_integer(key):
                        return s[key]

        s = com._values_from_object(series)
        k = com._values_from_object(key)
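
The lookup order described in the added comment (label first, then positional for integer keys) can be sketched in plain Python. This is a toy model under stated assumptions, not the pandas implementation: `lookup`, `labels`, and `values` are hypothetical names, `list.index` stands in for `Index.get_loc`, and a non-integer miss raises `KeyError` here instead of falling through to the remaining code in `get_value`.

```python
from numbers import Integral


def lookup(labels, values, key):
    """Toy label-first lookup with a positional fallback (not pandas code)."""
    try:
        # Stand-in for Index.get_loc(key): translate a label into a position.
        loc = labels.index(key)
        return values[loc]
    except ValueError:
        # The label was not found. If the key is an integer, try it as a
        # position instead; an out-of-range position raises IndexError.
        if isinstance(key, Integral) and not isinstance(key, bool):
            return values[key]
        raise KeyError(key)


# Label 4 sits at position 2, so the label lookup wins over position 4.
print(lookup([0, 2, 4, 6], ["a", "b", "c", "d"], 4))
# With string labels, the integer key 4 falls back to position 4.
print(lookup(list("abcdef"), [10, 11, 12, 13, 14, 15], 4))
```

In this sketch an out-of-range integer on a non-integer index surfaces as `IndexError`, which is the case the review question above asks about.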
25 changes: 25 additions & 0 deletions pandas/tests/extension/base/getitem.py
@@ -117,6 +117,31 @@ def test_getitem_slice(self, data):
        result = data[slice(1)]  # scalar
        assert isinstance(result, type(data))

    def test_get(self, data):
        # GH 20882
        s = pd.Series(data, index=[2 * i for i in range(len(data))])
        assert s.get(4) == s.iloc[2]

        result = s.get([4, 6])
        expected = s.iloc[[2, 3]]
        self.assert_series_equal(result, expected)

        result = s.get(slice(2))
        expected = s.iloc[[0, 1]]
        self.assert_series_equal(result, expected)

        s = pd.Series(data[:6], index=list('abcdef'))
        assert s.get('c') == s.iloc[2]

Contributor

Could you also add a test for a slice like s.get(slice(2))? That seems to be valid as far as .get is concerned.

Contributor Author

Regarding your suggestion, I don't think you need all of those changes. (I also don't think they will work, because self._engine.get_value() returns an item from the passed ndarray and you are using that to index into the series itself.) Things already work with my approach when the key is a slice or a list of indices. The issue is when the key is a single value, which is what my fix takes care of.

I'll push some additional tests.

        result = s.get(slice('b', 'd'))
        expected = s.iloc[[1, 2, 3]]
        self.assert_series_equal(result, expected)

Contributor

Can you add some cases with an out-of-range integer?

Contributor Author

Done.

        result = s.get('Z')
        assert result is None

        assert s.get(4) == s.iloc[4]

    def test_take_sequence(self, data):
        result = pd.Series(data)[[0, 1, 3]]
        assert result.iloc[0] == data[0]
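
The `s.get('Z')` assertion in `test_get` relies on `.get` turning a failed lookup into a default value rather than an exception. A minimal sketch of that wrapper pattern follows, assuming `.get` simply delegates to `__getitem__` and catches lookup errors. The helper name `get_or_default` is hypothetical, and the containers used are plain Python objects, not pandas ones.

```python
def get_or_default(container, key, default=None):
    """Return container[key], or `default` if the lookup fails (toy sketch)."""
    try:
        return container[key]
    except (KeyError, ValueError, IndexError):
        # Missing label, invalid key, or out-of-range position.
        return default


# A missing label yields the default instead of raising KeyError.
print(get_or_default({"a": 1}, "Z"))
# An out-of-range integer position also yields the default.
print(get_or_default([1, 2, 3], 10, default=-1))
```

Catching `IndexError` alongside `KeyError` is what lets an out-of-range integer return the default, which is the case the reviewer asked to have covered.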