BUG: iloc raising for ea series #50171

Merged
merged 3 commits into from
Dec 13, 2022
Changes from 2 commits
1 change: 1 addition & 0 deletions doc/source/whatsnew/v2.0.0.rst
@@ -753,6 +753,7 @@ Indexing
- Bug in :meth:`DataFrame.loc` raising ``ValueError`` with ``bool`` indexer and :class:`MultiIndex` (:issue:`47687`)
- Bug in :meth:`DataFrame.__setitem__` raising ``ValueError`` when right hand side is :class:`DataFrame` with :class:`MultiIndex` columns (:issue:`49121`)
- Bug in :meth:`DataFrame.reindex` casting dtype to ``object`` when :class:`DataFrame` has single extension array column when re-indexing ``columns`` and ``index`` (:issue:`48190`)
- Bug in :meth:`DataFrame.iloc` raising ``IndexError`` when indexer is a :class:`Series` with numeric extension array dtype (:issue:`49521`)
- Bug in :func:`~DataFrame.describe` when formatting percentiles in the resulting index showed more decimals than needed (:issue:`46362`)
- Bug in :meth:`DataFrame.compare` does not recognize differences when comparing ``NA`` with value in nullable dtypes (:issue:`48939`)
-
5 changes: 4 additions & 1 deletion pandas/core/indexing.py
@@ -1481,7 +1481,10 @@ def _validate_key(self, key, axis: AxisInt):
# so don't treat a tuple as a valid indexer
raise IndexingError("Too many indexers")
elif is_list_like_indexer(key):
arr = np.array(key)
if isinstance(key, ABCSeries):
arr = key._values
Contributor

How about integer arrays with pd.NA?

Member Author

That is validated later.

Contributor

@topper-123, Dec 11, 2022

The errors are different for Series and IntegerArray:

>>> import pandas as pd
>>> df = pd.DataFrame([[0, 1, 2, 3, 4], [5, 6, 7, 8, 9]])
>>> iarr = pd.array([0, 1, 2, pd.NA], dtype=pd.Int64Dtype())
>>> df.iloc[:, iarr]
IndexError: .iloc requires numeric indexers, got [0 1 2 <NA>]
>>> df.iloc[:, pd.Series(iarr)]
ValueError: cannot convert to 'int64'-dtype NumPy array with missing values. Specify an appropriate 'na_value' for this dtype.

I think the ValueError should be raised in both cases?

Member Author

I guess you tried the Series case on main? I am getting consistent errors on my PR.

In general I would prefer a better-suited error that points to NA in iloc, but I will probably do that in a follow-up. We will have to check this later, because not all cases get here and this does not cover the Series.iloc case.

Contributor

@topper-123, Dec 11, 2022

Sorry, wrong snippet. It's the non-NA array:

>>> import pandas as pd
>>> df = pd.DataFrame([[0, 1, 2, 3, 4], [5, 6, 7, 8, 9]])
>>> iarr = pd.array([0, 1, 2], dtype=pd.Int64Dtype())
>>> df.iloc[:, iarr]
IndexError: .iloc requires numeric indexers, got [0 1]

Member Author

Ah, got you. Fixed that as well.
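
For readers on a pandas version without this fix, the thread above suggests a caller-side workaround: convert a nullable-integer indexer to a plain int64 NumPy array before passing it to `.iloc`. The sketch below is only an illustration of that idea, not the change made in this PR; `to_positional` is a hypothetical helper name, and it assumes the indexer holds integer positions.

```python
import numpy as np
import pandas as pd


def to_positional(indexer):
    """Convert a list-like positional indexer to a plain int64 NumPy array.

    Hypothetical helper, not part of pandas. A nullable-integer indexer
    that contains pd.NA raises ValueError during the conversion.
    """
    if isinstance(indexer, pd.Series):
        # Work on the underlying array rather than the Series wrapper.
        indexer = indexer.array
    if isinstance(indexer, pd.api.extensions.ExtensionArray):
        return indexer.to_numpy(dtype="int64")
    # Assumes the values are already integers.
    return np.asarray(indexer, dtype="int64")


df = pd.DataFrame([[0, 1, 2, 3, 4], [5, 6, 7, 8, 9]])
rows = pd.Series([1], dtype="Int64")
cols = pd.Series([0, 1], dtype="Int64")

# Same selection as in test_iloc_ea_series_indexer, with converted indexers.
result = df.iloc[to_positional(rows), to_positional(cols)]
```

Converting up front also makes the missing-value case fail early with the `ValueError` quoted above, rather than with an error that depends on which `.iloc` code path is hit.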

else:
arr = np.array(key)
len_axis = len(self.obj._get_axis(axis))

# check that the key has a numeric dtype
7 changes: 7 additions & 0 deletions pandas/tests/frame/indexing/test_indexing.py
@@ -1437,6 +1437,13 @@ def test_loc_rhs_empty_warning(self):
df.loc[:, "a"] = rhs
tm.assert_frame_equal(df, expected)

def test_iloc_ea_series_indexer(self):
# GH#49521
df = DataFrame([[0, 1, 2, 3, 4], [5, 6, 7, 8, 9]])
result = df.iloc[Series([1], dtype="Int64"), Series([0, 1], dtype="Int64")]
expected = DataFrame([[5, 6]], index=[1])
tm.assert_frame_equal(result, expected)
Contributor

I'm pretty sure this would fail if a pd.NA was in the indexers?

Member Author

Yep, it does, which is fine. I will add a test.
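
A check for the `pd.NA` case could look roughly like the sketch below. It is not the test that was added to the PR: the helper name `iloc_rejects_na_indexer` is hypothetical, and because the exception type differs between code paths and pandas versions (the thread shows both `IndexError` and `ValueError`), the sketch accepts either.

```python
import pandas as pd


def iloc_rejects_na_indexer(df, indexer):
    """Return True if df.iloc[:, indexer] raises for this indexer."""
    try:
        df.iloc[:, indexer]
    except (ValueError, IndexError):
        # Assumption: which of the two is raised depends on the pandas
        # version and on whether the indexer is a Series or an array.
        return True
    return False


df = pd.DataFrame([[0, 1, 2, 3, 4], [5, 6, 7, 8, 9]])
na_indexer = pd.Series([0, pd.NA], dtype="Int64")

# Check both the Series indexer and its underlying extension array.
series_raises = iloc_rejects_na_indexer(df, na_indexer)
array_raises = iloc_rejects_na_indexer(df, na_indexer.array)
```

In a real test suite this would more likely use `pytest.raises` with a `match` pattern once the exact error message is settled.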


@pytest.mark.parametrize("indexer", [True, (True,)])
@pytest.mark.parametrize("dtype", [bool, "boolean"])
def test_loc_bool_multiindex(self, dtype, indexer):