
Inconsistent behavior when boolean mask has inconsistent size with a Series #27049

Closed
glemaitre opened this issue Jun 26, 2019 · 2 comments

@glemaitre
Contributor

Code Sample, a copy-pastable example if possible

import numpy as np
import pandas as pd

series = pd.Series(np.arange(3))
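print(series.iloc[[True, False, True, False]])  # 4-entry mask, extra entry is False: succeeds
print(series.iloc[[True, False, True, True]])   # 4-entry mask, extra entry is True: raises IndexError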
0    0
2    2
dtype: int64
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
~/miniconda3/lib/python3.7/site-packages/pandas/core/indexing.py in _getbool_axis(self, key, axis)
   1517         try:
-> 1518             return self.obj._take(inds, axis=axis)
   1519         except Exception as detail:

~/miniconda3/lib/python3.7/site-packages/pandas/core/series.py in _take(self, indices, axis, is_copy)
   3925         indices = ensure_platform_int(indices)
-> 3926         new_index = self.index.take(indices)
   3927 

~/miniconda3/lib/python3.7/site-packages/pandas/core/indexes/base.py in take(self, indices, axis, allow_fill, fill_value, **kwargs)
    798                 raise ValueError(msg.format(self.__class__.__name__))
--> 799             taken = self.values.take(indices)
    800         return self._shallow_copy(taken)

IndexError: index 3 is out of bounds for size 3

During handling of the above exception, another exception occurred:

IndexError                                Traceback (most recent call last)
<ipython-input-1-f9cf8f154f3b> in <module>
      4 series = pd.Series(np.arange(3))
      5 print(series.iloc[[True, False, True, False]])
----> 6 print(series.iloc[[True, False, True, True]])

~/miniconda3/lib/python3.7/site-packages/pandas/core/indexing.py in __getitem__(self, key)
   1498 
   1499             maybe_callable = com.apply_if_callable(key, self.obj)
-> 1500             return self._getitem_axis(maybe_callable, axis=axis)
   1501 
   1502     def _is_scalar_access(self, key):

~/miniconda3/lib/python3.7/site-packages/pandas/core/indexing.py in _getitem_axis(self, key, axis)
   2215         if com.is_bool_indexer(key):
   2216             self._validate_key(key, axis)
-> 2217             return self._getbool_axis(key, axis=axis)
   2218 
   2219         # a list of integers

~/miniconda3/lib/python3.7/site-packages/pandas/core/indexing.py in _getbool_axis(self, key, axis)
   1518             return self.obj._take(inds, axis=axis)
   1519         except Exception as detail:
-> 1520             raise self._exception(detail)
   1521 
   1522     def _get_slice_axis(self, slice_obj, axis=None):

IndexError: index 3 is out of bounds for size 3

Problem description

When a boolean mask whose length differs from the Series is passed, filtering succeeds if the out-of-bounds mask values are False, but raises the proper IndexError if any of these values is True.

This result is surprising and somewhat inconsistent. I would have expected an error in all cases, because the length of the boolean mask does not match the length of the Series.
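The traceback suggests that, in pandas 0.24.2, `iloc` first turns the boolean mask into integer positions and then calls `take` on them (`_getbool_axis` → `_take` → `Index.take`). The following pure-Python model is a simplified sketch of that two-step behaviour under that assumption, not pandas' actual code; `mask_to_positions` and `take` are hypothetical helpers. It shows why a trailing `False` is silently dropped while a trailing `True` becomes an out-of-bounds position.

```python
def mask_to_positions(mask):
    # Hypothetical helper: keep only the positions where the mask is True.
    # Trailing False entries vanish here, so the mask's extra length is
    # never noticed by the next step.
    return [i for i, flag in enumerate(mask) if flag]


def take(values, positions):
    # Hypothetical helper: bounds are checked only for the positions that
    # were actually selected, mirroring the IndexError in the traceback.
    n = len(values)
    for pos in positions:
        if not 0 <= pos < n:
            raise IndexError(f"index {pos} is out of bounds for size {n}")
    return [values[pos] for pos in positions]


values = [0, 1, 2]
print(take(values, mask_to_positions([True, False, True, False])))  # [0, 2]
try:
    take(values, mask_to_positions([True, False, True, True]))
except IndexError as exc:
    print(exc)  # index 3 is out of bounds for size 3
```

In this model the length mismatch is only detected as a side effect of an out-of-range position, which is exactly the inconsistency described above.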

Expected Output

Raise an error whenever the size of the boolean mask is inconsistent with the size of the Series being filtered.
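A minimal sketch of the requested behaviour, again in plain Python and assuming a hypothetical `strict_boolean_filter` helper (not a pandas API): the mask length is validated before anything is selected, so both masks from the code sample are rejected. The choice of `IndexError` and the message wording are illustrative.

```python
def strict_boolean_filter(values, mask):
    # Hypothetical helper: reject any mask whose length differs from the
    # data, regardless of whether the surplus entries are True or False.
    if len(mask) != len(values):
        raise IndexError(
            f"Boolean index has wrong length: {len(mask)} instead of {len(values)}"
        )
    # Lengths match, so every element pairs with exactly one mask value.
    return [v for v, keep in zip(values, mask) if keep]


values = [0, 1, 2]
print(strict_boolean_filter(values, [True, False, True]))  # [0, 2]
for mask in ([True, False, True, False], [True, False, True, True]):
    try:
        strict_boolean_filter(values, mask)
    except IndexError as exc:
        print(exc)  # Boolean index has wrong length: 4 instead of 3
```

Checking the length up front makes the outcome independent of the mask's contents, which is the consistency the report asks for.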

Output of pd.show_versions()

/home/lemaitre/miniconda3/lib/python3.7/site-packages/psycopg2/__init__.py:144: UserWarning: The psycopg2 wheel package will be renamed from release 2.8; in order to keep installing from binary please use "pip install psycopg2-binary" instead. For details see: <http://initd.org/psycopg/docs/install.html#binary-install-from-pypi>.
  """)

INSTALLED VERSIONS
------------------
commit: None
python: 3.7.3.final.0
python-bits: 64
OS: Linux
OS-release: 4.15.0-50-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.24.2
pytest: 4.5.0
pip: 19.1.1
setuptools: 41.0.1
Cython: 0.29.7
numpy: 1.16.3
scipy: 1.2.1
pyarrow: 0.11.1
xarray: None
IPython: 7.5.0
sphinx: None
patsy: 0.5.1
dateutil: 2.8.0
pytz: 2019.1
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: 3.0.3
openpyxl: None
xlrd: 1.2.0
xlwt: None
xlsxwriter: None
lxml.etree: None
bs4: None
html5lib: 1.0.1
sqlalchemy: 1.3.0
pymysql: None
psycopg2: 2.7.7 (dt dec pq3 ext lo64)
jinja2: 2.10.1
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None
gcsfs: None
@TomAugspurger
Contributor

Did scikit-learn run into this recently? @NicolasHug reported this too 😄

#26911 fixes it. I'll make sure it's in today.

@glemaitre
Contributor Author

Oops, sorry, he was faster than me. This is what happens when you want to check IRL with a pandas dev (@jorisvandenbossche ;)). Closing, then. Thanks for the fix.
