API: Raise on iloc indexing with a non-integer based boolean mask (GH3631) #3635

Merged
merged 2 commits into from
May 19, 2013
4 changes: 4 additions & 0 deletions RELEASE.rst
@@ -70,6 +70,9 @@ pandas 0.11.1
- Add ``squeeze`` keyword to ``groupby`` to allow reduction from
DataFrame -> Series if groups are unique. Regression from 0.10.1,
partial revert on (GH2893_) with (GH3596_)
- ``iloc`` now raises when boolean indexing with a label-based indexer mask,
e.g. a boolean Series, even one with integer labels. Since ``iloc`` is purely
positional, the labels on the Series are not alignable (GH3631_)

**Bug Fixes**

@@ -182,6 +185,7 @@ pandas 0.11.1
.. _GH3624: https://github.com/pydata/pandas/issues/3624
.. _GH3626: https://github.com/pydata/pandas/issues/3626
.. _GH3601: https://github.com/pydata/pandas/issues/3601
.. _GH3631: https://github.com/pydata/pandas/issues/3631
.. _GH1512: https://github.com/pydata/pandas/issues/1512


8 changes: 0 additions & 8 deletions doc/source/indexing.rst
@@ -68,7 +68,6 @@ three types of multi-axis indexing.
- An integer e.g. ``5``
- A list or array of integers ``[4, 3, 0]``
- A slice object with ints ``1:7``
- A boolean array

See more at :ref:`Selection by Position <indexing.integer>`

@@ -291,7 +290,6 @@ The ``.iloc`` attribute is the primary access method. The following are valid in
- An integer e.g. ``5``
- A list or array of integers ``[4, 3, 0]``
- A slice object with ints ``1:7``
- A boolean array

.. ipython:: python

@@ -329,12 +327,6 @@ Select via integer list

   df1.iloc[[1,3,5],[1,3]]

Select via boolean array

.. ipython:: python

   df1.iloc[:,df1.iloc[0]>0]

For slicing rows explicitly (equiv to deprecated ``df.irow(slice(1,3))``).

.. ipython:: python
22 changes: 22 additions & 0 deletions doc/source/v0.11.1.txt
@@ -40,6 +40,27 @@ API changes
# no squeezing (the default, and behavior in 0.10.1)
df2.groupby("val1").apply(func)

- ``iloc`` now raises when boolean indexing with a label-based indexer mask,
  e.g. a boolean Series, even one with integer labels. Since ``iloc`` is purely
  positional, the labels on the Series are not alignable (GH3631_)

  This case is rarely used, and there are plenty of alternatives. This preserves the
  ``iloc`` API to be *purely* positional based.

  .. ipython:: python

     df = DataFrame(range(5), list('ABCDE'), columns=['a'])
     mask = (df.a%2 == 0)
     mask

     # this is what you should use
     df.loc[mask]

     # this will work as well
     df.iloc[mask.values]

  ``df.iloc[mask]`` will raise a ``ValueError``.


Enhancements
~~~~~~~~~~~~
@@ -74,3 +95,4 @@ on GitHub for a complete list.
.. _GH3435: https://github.com/pydata/pandas/issues/3435
.. _GH1512: https://github.com/pydata/pandas/issues/1512
.. _GH2285: https://github.com/pydata/pandas/issues/2285
.. _GH3631: https://github.com/pydata/pandas/issues/3631
11 changes: 10 additions & 1 deletion pandas/core/indexing.py
@@ -775,7 +775,14 @@ class _iLocIndexer(_LocationIndexer):
    _exception = IndexError

    def _has_valid_type(self, key, axis):
        return isinstance(key, slice) or com.is_integer(key) or com._is_bool_indexer(key) or _is_list_like(key)
        if com._is_bool_indexer(key):
            if hasattr(key,'index') and isinstance(key.index,Index):
                if key.index.inferred_type == 'integer':
                    raise NotImplementedError("iLocation based boolean indexing on an integer type is not available")
                raise ValueError("iLocation based boolean indexing cannot use an indexable as a mask")
            return True

        return isinstance(key, slice) or com.is_integer(key) or _is_list_like(key)

    def _getitem_tuple(self, tup):

@@ -811,9 +818,11 @@ def _get_slice_axis(self, slice_obj, axis=0):
    def _getitem_axis(self, key, axis=0):

        if isinstance(key, slice):
            self._has_valid_type(key,axis)
            return self._get_slice_axis(key, axis=axis)

        elif com._is_bool_indexer(key):
            self._has_valid_type(key,axis)
            return self._getbool_axis(key, axis=axis)

        # a single integer or a list of integers
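
The decision order of the new check in `_has_valid_type` can be tried out without pandas. What follows is only a minimal sketch under stated assumptions, not the pandas implementation: `FakeSeries`, `is_bool_mask` and `validate_iloc_key` are hypothetical stand-ins for a boolean Series, `com._is_bool_indexer` and the patched method, and the `inferred_type == 'integer'` test is approximated by checking that every label is an `int`.

```python
# Illustrative stand-ins only; none of these names exist in pandas.

class FakeSeries(object):
    """A boolean mask that carries labels, loosely like a boolean Series."""

    def __init__(self, values, index):
        self.values = list(values)
        self.index = list(index)

    def __iter__(self):
        return iter(self.values)


def is_bool_mask(key):
    """Rough stand-in for com._is_bool_indexer: a non-empty iterable of bools."""
    if isinstance(key, (int, slice, str)):
        return False
    try:
        items = list(key)
    except TypeError:
        return False
    return len(items) > 0 and all(isinstance(v, bool) for v in items)


def validate_iloc_key(key):
    """Follow the same branch order as the patched _has_valid_type."""
    if is_bool_mask(key):
        # Plain lists also expose an .index *method*, so the type of the
        # attribute is checked, much as the patch checks isinstance(key.index, Index).
        labels = getattr(key, 'index', None)
        if isinstance(labels, list):
            # Approximation of key.index.inferred_type == 'integer'.
            if all(isinstance(lab, int) and not isinstance(lab, bool) for lab in labels):
                raise NotImplementedError(
                    "iLocation based boolean indexing on an integer type "
                    "is not available")
            raise ValueError(
                "iLocation based boolean indexing cannot use an indexable "
                "as a mask")
        return True
    # Non-boolean keys: slices, single integers, or list-likes.
    return isinstance(key, (slice, int, list, tuple))


# An unlabelled mask passes; a labelled one is rejected.
print(validate_iloc_key([True, False, True]))  # True
try:
    validate_iloc_key(FakeSeries([True, False], ['A', 'B']))
except ValueError as err:
    print(err)
```

In this sketch the labelled mask is rejected before any selection happens, which matches the diff's choice to call `_has_valid_type` at the top of the slice and boolean branches of `_getitem_axis`.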
54 changes: 54 additions & 0 deletions pandas/tests/test_indexing.py
@@ -888,6 +888,60 @@ def test_multi_assign(self):
        df2.ix[mask, cols]= dft.ix[mask, cols].values
        assert_frame_equal(df2,expected)

    def test_iloc_mask(self):

        # GH 3631, iloc with a mask (of a series) should raise
        df = DataFrame(range(5), list('ABCDE'), columns=['a'])
        mask = (df.a%2 == 0)
        self.assertRaises(ValueError, df.iloc.__getitem__, tuple([mask]))
        mask.index = range(len(mask))
        self.assertRaises(NotImplementedError, df.iloc.__getitem__, tuple([mask]))

        # ndarray ok
        result = df.iloc[np.array([True] * len(mask),dtype=bool)]
        assert_frame_equal(result,df)

        # the possibilities
        locs = np.arange(4)
        nums = 2**locs
        reps = map(bin, nums)
        df = DataFrame({'locs':locs, 'nums':nums}, reps)

        expected = {
            (None,'') : '0b1100',
            (None,'.loc') : '0b1100',
            (None,'.iloc') : '0b1100',
            ('index','') : '0b11',
            ('index','.loc') : '0b11',
            ('index','.iloc') : 'iLocation based boolean indexing cannot use an indexable as a mask',
            ('locs','') : 'Unalignable boolean Series key provided',
            ('locs','.loc') : 'Unalignable boolean Series key provided',
            ('locs','.iloc') : 'iLocation based boolean indexing on an integer type is not available',
            }

        import warnings
        warnings.filterwarnings(action='ignore', category=UserWarning)
        result = dict()
        for idx in [None, 'index', 'locs']:
            mask = (df.nums>2).values
            if idx:
                mask = Series(mask, list(reversed(getattr(df, idx))))
            for method in ['', '.loc', '.iloc']:
                try:
                    if method:
                        accessor = getattr(df, method[1:])
                    else:
                        accessor = df
                    ans = str(bin(accessor[mask]['nums'].sum()))
                except Exception, e:
                    ans = str(e)

                key = tuple([idx,method])
                r = expected.get(key)
                if r != ans:
                    raise AssertionError("[%s] does not match [%s], received [%s]" %
                                         (key,ans,r))
        warnings.filterwarnings(action='always', category=UserWarning)

if __name__ == '__main__':
    import nose
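
The `'0b1100'` and `'0b11'` entries in the `expected` table of `test_iloc_mask` come from two different readings of the same mask. The following pandas-free sketch reproduces the arithmetic, assuming (as the table implies) that a plain array mask is applied by position while a mask labelled with the frame's index is aligned on those labels. The helper names `select_by_position` and `select_by_label` are hypothetical and exist only for this illustration.

```python
# Data from the test: nums = 2**[0, 1, 2, 3], labelled by their binary strings.
nums = [2 ** i for i in range(4)]   # [1, 2, 4, 8]
labels = [bin(n) for n in nums]     # ['0b1', '0b10', '0b100', '0b1000']
mask = [n > 2 for n in nums]        # [False, False, True, True]


def select_by_position(values, bool_mask):
    """Positional masking: the i-th flag decides the i-th row."""
    return [v for v, keep in zip(values, bool_mask) if keep]


def select_by_label(values, row_labels, labelled_mask):
    """Label alignment: each row looks up its own label in the mask."""
    return [v for v, lab in zip(values, row_labels) if labelled_mask[lab]]


# Unlabelled mask: rows 2 and 3 are kept, 4 + 8 = 12.
positional = bin(sum(select_by_position(nums, mask)))

# Mask labelled with the reversed index, as in
# Series(mask, list(reversed(df.index))) from the test.
reversed_mask = dict(zip(reversed(labels), mask))
# Alignment keeps the rows labelled '0b1' and '0b10', 1 + 2 = 3.
aligned = bin(sum(select_by_label(nums, labels, reversed_mask)))

print(positional)  # 0b1100
print(aligned)     # 0b11
```

The two results differ for the same flags, which is the ambiguity the patch removes: a labelled mask passed to `.iloc` is rejected, and the caller states the positional intent explicitly with `mask.values`.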